« The language and culture are two sides of the same coin, » wrote the linguist Emile Benveniste. In this sense, the young developers believe that languages with limited resources on the Internet, such as Darija, are « an essential element of cultural and identity expression. »
Twelve Moroccan and international researchers, from EMINES-Mohammed VI Polytechnic University (Ben Guerir), Mohamed bin Zayed University of Artificial Intelligence (Abu Dhabi), KTH Royal Institute of Technology (Stockholm), AtlasIA (a Moroccan group of ‘Darija NLP Practitioners’), and the École Polytechnique (Palaiseau), announced on September 26th the launch of Atlas-Chat, the first artificial intelligence (AI) specifically designed for Darija, the dialect spoken by the majority of Moroccans.
Named in homage to the Atlas Mountains, « Atlas-Chat is the first collection of models designed for Darija, and more broadly for an Arabic dialect. The Atlas-Chat models are developed using instruction-tuning, which involves preparing a dataset of instruction-response pairs and using it to fine-tune (further train) an existing model, » explains Yousef Khoubrane from EMINES-UM6P, who is part of the research team and responds on their behalf.
« The model thus learns from a vast dataset: over 458,000 instruction-response pairs, » he adds. The researchers aim to « optimally respond to the various user instructions. »
An adapted, adaptable, and confidential AI
Atlas-Chat is « the first open-source model capable of understanding and speaking Darija, » confirms Yousef Khoubrane, emphasizing that there are also « advanced proprietary models in Darija like GPT-4 and Claude Sonnet 3.5, » but they have limitations: « These models are expensive to use, are not accessible for customization, and pose confidentiality issues, » explains the researcher. « Indeed, your data may be shared with the parent companies of the models, such as OpenAI and Anthropic, among others. » On the other hand, « you can download Atlas-Chat and use it locally at no cost and with no risk to confidentiality, as your data remains on your machine without being shared on the internet, » the researcher proudly notes.
Atlas-Chat is also adaptable to specific needs and tasks. « If you have a dataset specific to a use case, you can fine-tune the model to adapt it to your tasks, » he suggests.
A range of free services
Atlas-Chat is an AI that aims to break away from paid models, thus benefiting a large audience of speakers of this « low-resource » dialect on the internet. « The models are available on HuggingFace,
so anyone can download them to their machine, as long as it is capable of hosting them, » the researcher explains, adding that « the usage code is also available on the same page of the model on HuggingFace, » and that « all you need to do is pass your prompt to the model for it to generate the appropriate response. »
This new technology, which « can understand instructions in both Darija and English, although the responses are in Darija, » offers additional services. It can answer questions, summarize texts, rephrase ideas, explain concepts, translate between Darija, Arabic, French, and English, as well as transcribe between the two writing systems of Darija: Arabic letters and Latin letters.

Thus, Atlas-Chat « will be particularly useful for Moroccan users with limited proficiency in other languages, for businesses, for their clients speaking Darija, or even for automating the processing of data in Darija to extract specific information, » emphasizes the researcher.
Gain visibility
“The idea is not simply to create a technological model, but to contribute to the presence and promotion of Darija on the Internet”
« The goal is therefore not simply to create a technological model, but to contribute to the presence and promotion of Darija on the Internet, » emphasizes the researcher. The free access and availability of this technology « will allow Moroccan speakers to use their language in a digital environment, » which is significant, as it can « enhance the visibility of Darija online, thus contributing to the preservation and promotion of this language and the culture it carries, » concludes Yousef Khoubrane.
Written in French by Karim El Haddady, edited in English by Eric Nielson
