The UAE government has introduced a new large language model (LLM) designed specifically for Arabic and developed in Abu Dhabi. The aim is to bring one of the world’s most widely spoken languages into the mainstream AI landscape.
Meet Jais – UAE’s own LLM
Known as Jais, this open-source bilingual Arabic-English model was created through collaboration between Inception, a subsidiary of Abu Dhabi’s AI firm G42, Mohammed bin Zayed University of Artificial Intelligence, and Silicon Valley’s Cerebras Systems.
The developers assert that Jais is more accurate than existing large language models (LLMs) for Arabic. The model can be downloaded from the machine-learning platform Hugging Face.
The unveiling of Jais marks a progressive stride in encouraging the scientific and computational communities to direct more attention toward non-English LLMs, akin to initiatives seen in Japan and India, according to Andrew Jackson, the CEO of Inception.
In an interview with a local news agency, Jackson elaborated, “We envision Jais as highly valuable for generative applications, such as formulating responses to queries, generating documents, performing translations, composing emails, and even dispensing advice and recommendations.”
Jais is adept at capturing the subtleties within various Arabic dialects and possesses the capability to comprehend language, context, and cultural allusions, which renders it notably more precise and contextually pertinent compared to other models, as stated by the collaborating companies.
Named ‘Jais’ after the tallest peak in Ras Al Khaimah, UAE, the model was designed for governmental use, as well as for sectors spanning finance, energy, climate, and healthcare.
Mainly developed for the UAE government
Numerous public and private entities within the UAE have joined as launch partners for Jais, including the Ministry of Foreign Affairs, the Ministry of Industry and Advanced Technology, the Department of Health – Abu Dhabi, ADNOC, Etihad Airways, FAB, and e&, the technology conglomerate previously known as Etisalat.
Jais was trained on Condor Galaxy, described as the “world’s largest AI supercomputer,” which G42 and Cerebras established in July. The training data comprised 116 billion Arabic tokens and 279 billion English tokens. The model continues to expand as more Arabic content is gathered to create new instruction sets.
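To put the reported token counts in proportion, the short Python sketch below works out the total and the share of each language. Only the two figures (116 billion and 279 billion) come from the article; the function and variable names are illustrative, and the calculation says nothing about how the data was actually sampled during training.

```python
# Token counts reported in the article, in billions.
ARABIC_TOKENS_B = 116
ENGLISH_TOKENS_B = 279


def token_mix(arabic_b, english_b):
    """Return the total token count and each language's share of it."""
    total = arabic_b + english_b
    return {
        "total_b": total,
        "arabic_share": arabic_b / total,
        "english_share": english_b / total,
    }


mix = token_mix(ARABIC_TOKENS_B, ENGLISH_TOKENS_B)
print(f"Total: {mix['total_b']} billion tokens")
print(f"Arabic share: {mix['arabic_share']:.1%}")
print(f"English share: {mix['english_share']:.1%}")
```

On these figures, the corpus totals 395 billion tokens, of which Arabic makes up roughly 29 percent and English roughly 71 percent.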
The need for local languages in LLMs and AI
Arabic is spoken by more than 400 million people worldwide, according to WorldData. It is the official language of 22 nations and is partially spoken in 11 others. However, its online presence remains limited: Arabic accounts for only around 1 percent of the content on the internet, according to data shared by the collaborating companies.
Jackson said that Jais would help raise this figure. He noted, “We’re initiating a project to collect more Arabic data from offline sources. This initiative has already been launched earnestly.”
He added, “We’re also exploring novel methods to synthesize Arabic content and translate existing English content into Arabic. Although we have a long way to go, optimism is crucial as we vigorously advance.”
Overall, generative AI has opened a new battleground within the tech sector, with companies striving to establish an early advantage and expand their presence in the field.
The availability of LLMs stands to help these companies, particularly as developers continue to enhance AI capabilities.
“Speed performance is a priority for developers, not only because it expedites the introduction of new models to the community, production, or market, but also because it empowers data scientists and machine learning researchers to promptly implement and iterate various models,” emphasized Jackson.
from Firstpost Tech Latest News https://ift.tt/fdsAIUr