China’s AI aspirations boosted as public data allowed for labelling

New initiative encourages collaboration between government and enterprises to annotate and train data for large language models tailored to government use

A woman walks past a building with an exterior wall painted to look like a circuit board in China’s Hangzhou, Zhejiang province. Photo: AFP

For the first time, China has released a plan allowing public data to be used for labelling, as the nation seeks to boost its fast-growing digital economy and accelerate artificial intelligence (AI) development amid intensifying international rivalry.

According to a 13-point circular jointly released by four government bodies on Monday, China will promote the systematic labelling and utilisation of public data, while addressing the data needs of key sectors such as agriculture, manufacturing and information technology.

“[The government should] support cross-sectoral, cross-regional and cross-administrative-level use of public data; encourage collaboration between government and enterprises in data labelling and training for large language models tailored to government affairs; and also promote the inclusion of data-labelling services into government procurement,” said the circular, jointly issued by the National Development and Reform Commission (NDRC), the National Bureau of Statistics, the Ministry of Finance, and the Ministry of Human Resources and Social Security.

It added that the labelling of public data should be conducted in an orderly manner and in accordance with the law.

Data annotation – the process of categorising and labelling different data types such as text, audio, images and video – is often considered a foundation for enabling AI systems to produce accurate and reliable outcomes. The lack of high-quality data has been one of the challenges in developing large language models.

The data-annotation sector was valued at 80 billion yuan (US$10.91 billion) in 2023, and annotation is widely used in emerging fields such as autonomous driving, the low-altitude economy, smart manufacturing and intelligent healthcare.

According to the guidelines, China aims for the data-annotation industry to reach a compound annual growth rate of 20 per cent by 2027.
