Exclusive | PolyU’s top AI scientist Yang Hongxia seeks to revolutionise LLM development in Hong Kong

  • Yang Hongxia said she wants to change the resource-intensive process of training LLMs into a decentralised, machine-learning paradigm

Artificial intelligence scientist Yang Hongxia said small AI models, once put together, can outperform the most advanced large language models in specific domains. Photo: Sina
Wency Chen in Shanghai
Mainland Chinese artificial intelligence (AI) scientist Yang Hongxia, a newly appointed professor at Hong Kong Polytechnic University (PolyU), is on a mission to revolutionise the development of large language models (LLMs) – the technology underpinning generative AI services like ChatGPT – from her base in the city.
In an interview with the South China Morning Post, Yang – who previously worked on AI models at ByteDance and Alibaba Group Holding’s Damo Academy – said she wants to change the existing resource-intensive process of training LLMs into a decentralised, machine-learning paradigm. Alibaba owns the South China Morning Post.
“The rapid advances in generative AI, spearheaded by OpenAI’s GPT series, have unlocked immense possibilities,” she said. “Yet this progress comes with significant disparities in resource allocation.”
Yang said LLM development has so far relied mostly on deploying advanced and expensive graphics processing units (GPUs), from the likes of Nvidia and Advanced Micro Devices, in data centres to process vast amounts of raw data, which has given deep-pocketed Big Tech companies and well-funded start-ups a major advantage.
The entrance to the Hung Hom campus of Hong Kong Polytechnic University, where artificial intelligence scientist Yang Hongxia serves as a professor at the Department of Computing. Photo: Sun Yeung

Yang said she and her colleagues propose a “model-over-models” approach to LLM development. That calls for a decentralised paradigm in which developers train smaller models across thousands of specific domains, including code generation, advanced data analysis and specialised AI agents.
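In the abstract, the approach described above resembles a router-plus-experts architecture: many small, domain-specialised models sit behind a dispatcher that sends each query to the most relevant one. Below is a minimal Python sketch of that general pattern only; the model names, the keyword-based routing rule and every function here are hypothetical illustrations, not PolyU's or Yang's actual system.

```python
# Toy "model-over-models" router: each query goes to a small
# domain-specialised model instead of one monolithic LLM.
# All names and the routing rule below are hypothetical.

from typing import Callable, Dict

# Stand-ins for small models, each trained on one narrow domain.
def code_model(prompt: str) -> str:
    return f"[code-generation model] handling: {prompt}"

def data_analysis_model(prompt: str) -> str:
    return f"[data-analysis model] handling: {prompt}"

def agent_model(prompt: str) -> str:
    return f"[AI-agent model] handling: {prompt}"

DOMAIN_MODELS: Dict[str, Callable[[str], str]] = {
    "code": code_model,
    "data": data_analysis_model,
    "agent": agent_model,
}

def route(prompt: str) -> str:
    """Pick a domain expert with a naive keyword rule; a real
    system would use a learned router or classifier instead."""
    lowered = prompt.lower()
    for keyword, model in DOMAIN_MODELS.items():
        if keyword in lowered:
            return model(prompt)
    return agent_model(prompt)  # fallback expert

if __name__ == "__main__":
    print(route("Write code to parse a CSV file"))
    print(route("Summarise this data table"))
```

A production system would replace the keyword rule with a learned router and back each expert with a genuinely fine-tuned small model, but the division of labour is the same: many narrow specialists rather than one giant general-purpose model.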
