Alibaba launches maths-specific AI models said to outperform LLMs from OpenAI, Google

The new Qwen2-Math large language models are expected to help solve complex maths problems


Alibaba Group Holding’s maths-specific large language models further burnish the company’s artificial intelligence credentials. Photo: Shutterstock

Ann Cao in Shanghai

Published: 9:30am, 10 Aug 2024

Alibaba Group Holding is aiming to raise the bar in artificial intelligence (AI) development by launching a group of maths-specific large language models (LLMs) called Qwen2-Math, which the e-commerce giant claims can outperform OpenAI’s GPT-4o in that field.

“Over the past year, we have dedicated significant efforts to researching and enhancing the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems,” the Qwen team, part of Alibaba’s cloud computing unit, said in a post published on developer platform GitHub on Thursday. Alibaba owns the South China Morning Post.

The latest LLMs – the technology underpinning generative AI services like ChatGPT – were built on the Qwen2 LLMs released by Alibaba in June and cover three models that differ in their number of parameters – a machine-learning term for variables present in an AI system during training, which help establish how data prompts yield the desired output.

The model with the largest parameter count, Qwen2-Math-72B-Instruct, outperformed leading US-developed LLMs in maths benchmarks, according to the Qwen team’s post. Those included GPT-4o, Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro and Meta Platforms’ Llama-3.1-405B.

“We hope that Qwen2-Math can contribute to the community for solving complex mathematical problems,” the post said.

The Tongyi Qianwen family of large language models, also known as Qwen, from Alibaba Group Holding’s cloud computing unit now includes maths-specific LLMs. Photo: Shutterstock

The Qwen2-Math AI models were tested on both English and Chinese maths benchmarks, according to the post. These included GSM8K, a data set of 8,500 high-quality linguistically diverse grade school maths problems; OlympiadBench, a high-level bilingual multimodal scientific benchmark; and the gaokao, the mainland’s daunting university entrance examination.
