China – Alibaba, the Chinese tech and e-commerce giant, has officially joined the AI race with the launch of Qwen2.5-VL, an open-source multimodal model that builds on the capabilities of its predecessor, Qwen2-VL.
In a blog post, Alibaba said that Qwen2.5-VL shows strong multimodal capabilities, excelling at understanding text, charts, diagrams, and layouts in images. It can also analyse videos longer than an hour, answer questions about them, and pinpoint specific segments.
The model can also convert unstructured data from invoices, forms, or tables into organised formats like JSON, making it useful for automating tasks such as processing financial or legal documents.
Alibaba also claimed that by combining parsing and localisation features, Qwen2.5-VL can act as a visual agent, helping users perform tasks like checking the weather or booking a flight by guiding the use of different tools on computers and mobile devices.
The company further revealed that its flagship model, Qwen2.5-VL-72B-Instruct, performs competitively across a range of benchmarks, including document and diagram reading, visual question answering, college-level math, video understanding, and visual agent tasks.
Alibaba and the Qwen team have also introduced Qwen2.5-Max, a large-scale Mixture-of-Experts (MoE) model they claim outperforms DeepSeek V3 in key areas like coding, general tasks, and human preferences. They also stated it has shown competitive results in other assessments, including tests on college-level knowledge.
“Qwen2.5-Max outperforms … almost across the board GPT-4o, DeepSeek-V3, and Llama-3.1-405B,” Alibaba’s cloud unit said in an announcement on its official WeChat account, referring to leading AI models from OpenAI, DeepSeek, and Meta, Reuters reported.
Both Qwen2.5-Max and Qwen2.5-VL are now accessible via Qwen Chat, Alibaba’s conversational AI platform, where users can interact with the models, explore features, and perform tasks like searching. Additionally, developers can access the Qwen2.5-Max API through Alibaba Cloud.
The release of Alibaba’s Qwen2.5 models comes after DeepSeek launched its AI assistant, powered by the DeepSeek-V3 model, on January 10, followed by the January 20 release of its R1 model. The R1 launch has sparked significant discussion around the AI boom and increased the pressure on AI firms to upgrade their own models.