The dawn of ChatGPT equivalents in non-English-speaking regions

Debraj Manna
Aug 13, 2024

Reading time

2 mins

Since OpenAI launched its AI Chatbot, ChatGPT, for the public, there has been an uproar about all things generative AI. But how do these AI models vary across the globe? Are there any requirements for having country-specific AI alternatives? This article delves into the current scenario of regional AI alternatives and how they are changing the world.

Chatbots like ChatGPT are powered by large language models (LLMs): deep neural networks that learn from available data (in this case, text) and generate output that resembles how humans write or speak. Although LLMs like ChatGPT are pre-trained, user queries and feedback may also be used to refine later versions of the model. Because these models depend on the data they are trained on, what they “know” depends entirely on what information is available to them: their responses reflect both the pre-training data and whatever has been learned from interactions with users.

While learning from user interactions gives these LLMs several advantages, it also raises data security concerns. One way to mitigate the issue is government regulation; another is to use AI chatbots specific to a region.

Because linguistic nuances and cultural specificities vary with every region and language, an LLM trained mainly in one language is often hard to use in a place where a different language is spoken. The global dominance of English means that big tech companies mostly build LLMs trained primarily on English text. While most well-known LLMs have been trained on, and generate output mainly in, English, several have lately started to strive for mastery of other languages.

Regional alternatives to ChatGPT include China’s ChatGLM, South Korea’s Clova X, India’s Dhenu 1.0, and others.

Regional AI chatbots

ChatGLM

The unavailability of ChatGPT in China and the linguistic nuances of Chinese call for an LLM based in the country. Developed by Zhipu AI and Tsinghua University, ChatGLM is a bilingual language model that works in Chinese and English. Because it was pre-trained on available Chinese data and validated by Chinese speakers, it is less likely to oversimplify or ignore the nuances of the Chinese language than an LLM trained mainly on English. This also helps ChatGLM provide correct information related to the nation.

Clova X

Naver has developed an AI chatbot, Clova X, powered by an LLM, HyperClova X, which is trained specifically on Korean. The company has also launched Cue, an AI-based search engine. With a thorough understanding of Korean culture, Clova X aims to provide the best-suited service to the people of Korea. Because Clova X is connected to several other services, users can search for travel destinations, make restaurant reservations, and book cab services through it.

Dhenu 1.0

Dhenu 1.0 is an Indian AI chatbot trained specifically to provide agricultural solutions for India. It is bilingual in Hindi and English and gives farmers in India information, in a conversational style, about diseases of essential crops like rice, maize, and wheat. With information on geography, climate, and crops, Dhenu 1.0 bridges the language gap and targets a key profession with its applications.

Many companies are specifically creating AI LLMs for non-English-speaking countries, including Japan, China, and Southeast Asian countries. Several Indian companies have built LLMs trained in Indian languages like Telugu, Tamil, Kannada, Hindi, Bangla, etc. Some of these LLMs are Navarasa 2.0, Bhashini, and Kannada Llama.

Challenges in developing country-specific LLMs

While developing country-specific LLMs could be the need of the hour to help everyone use AI-based capabilities, several challenges stand in the way of creating them.

Scarcity of available data on the internet in a regional language: LLMs require pre-training data to learn from and to answer user queries, and gathering large amounts of data in a single regional language can be challenging.

Language understanding by deconstructing sentences: Sentence syntax varies across languages, down to whether spaces are used between words at all. This can be tricky to navigate when less information is available.

Difficulty in achieving comparable prowess with other global AI chatbots: Because they are targeted to a region, regional AI chatbots often have a limited number of users working on them and even fewer providing feedback.
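The point above about spaces between words can be illustrated with a small Python sketch. It is only an illustration, not how any of the chatbots mentioned here actually process text: the helper `whitespace_tokens` and the two example sentences are assumptions made for this example. Splitting on spaces separates an English sentence into words, but it returns a Chinese sentence of roughly the same meaning as one unbroken piece.

```python
def whitespace_tokens(sentence: str) -> list[str]:
    """Split a sentence on whitespace, the naive notion of a 'word'."""
    return sentence.split()


# Two illustrative sentences with roughly the same meaning.
english = "I like learning languages"
chinese = "我喜欢学习语言"  # written without spaces between words

english_tokens = whitespace_tokens(english)
chinese_tokens = whitespace_tokens(chinese)

print(english_tokens)  # four separate words
print(chinese_tokens)  # the whole sentence as a single item

# Falling back to individual characters gives pieces, but a character
# is not the same thing as a word, so word boundaries are still unknown.
chinese_chars = list(chinese)
print(len(chinese_chars))  # 7
```

A model for such a language therefore needs a segmentation or tokenization step that is itself learned from data in that language, which ties this challenge back to the scarcity of data.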

As regional AI chatbots develop further, open-source versions should make the models accessible to most of the targeted population, who will also be able to download them and train them on in-house data for specific applications. As countries develop new ideas, it will be worth seeing how the regional AI chatbots of the future shape our world.