Singapore builds ChatGPT-alike to better represent Southeast Asian languages, cultures
- Trained on data in 11 languages including Vietnamese and Thai, the open-source large language model is the first in a family of LLMs named SEA-LION
- It aims to lessen reliance on Western models developed for English – but concerns remain about bias, historical revisionism and censorship
Like millions worldwide, Southeast Asians have been trying out large language models such as Meta’s Llama 2 and Mistral AI’s models – but in their native Bahasa Indonesia or Thai. The result has usually been gibberish, or replies in English.
Trained on data in 11 Southeast Asian languages including Vietnamese, Thai and Bahasa Indonesia, the open-source SEA-LION model is a cheaper and more efficient option for the region’s businesses, governments and academia, said Leslie Teo at AI Singapore.
“Do we want to force every person in Southeast Asia to adapt to the machine, or do we want to make it more accessible so people in the region can make full use of the technology without having to be an English speaker?” he said.
“We are not trying to compete with the big LLMs; we are trying to complement them, so there can be better representation of us,” said Teo, senior director for AI products.
There are over 7,000 languages spoken worldwide. Yet LLMs such as OpenAI’s GPT-4 and Meta’s Llama 2, which are used to build AI systems such as chatbots and other tools, have largely been developed for, and trained on, the English language.