422 Million Speakers. 0.6% of the Internet. Now, Three Sovereign AI Models
An aggregated analysis based on public benchmark data, official model documentation, and Usetech’s experience.
Sovereign AI development in the GCC is redefining how global AI power is distributed, as regional models emerge alongside Western and Chinese systems in the race for digital and computational sovereignty.
There is a number that tells the whole story: 0.6%.
That is the share of online content written in Arabic, a language spoken by 422 million people, including 313 million native speakers. For comparison, English accounts for over 50% of web content, yet native English speakers make up a far smaller share of the world's population. The gap is not a quirk of history. It is a structural deficit that has shaped, and constrained, every AI system trained on internet data.
Large language models inherit the biases of their training data. A model trained predominantly on English text thinks in English, reasons in English categories, and fails in Arabic in ways that are hard to see until you are the Arabic speaker on the other end of the conversation. A 2025 study found that LLMs perform significantly worse in Arabic than in English on key educational tasks like tutoring and feedback. In healthcare, law, and government services — the sectors where AI is being deployed fastest across the GCC — that performance gap is not an inconvenience. It is a risk.
The Gulf states decided not to wait for OpenAI to solve this. They built their own models.
Why Sovereign AI in the GCC Matters Globally
Before examining who is building what, it is worth understanding the linguistic terrain they are navigating.
Arabic is not one language in the computational sense. It is a family: Modern Standard Arabic (MSA) is the written formal register used across 22 countries; regional dialects — Gulf, Levantine, Egyptian, Maghrebi — can be mutually unintelligible. A model trained on MSA newspaper text will struggle with the colloquial Gulf Arabic used in a customer service chat. A model trained on Egyptian dialect data will misfire in Saudi contexts.
Other estimates are higher than the 0.6% figure but tell the same story: Arabic content constitutes only about 2–3% of global digital content, even though Arabic speakers represent roughly 5% of internet users. More strikingly, more than 70% of Saudi nationals prefer Arabic content, yet Arabic makes up only 1% of the total content they actually access online. The demand is real. The supply is not.
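The mismatch between content share and user share can be expressed as a simple ratio. The following is a minimal sketch that uses only the percentages quoted in this section; the function name `representation_ratio` is hypothetical and introduced here for illustration.

```python
def representation_ratio(content_share_pct: float, user_share_pct: float) -> float:
    """Content share divided by user share; 1.0 would mean parity."""
    if user_share_pct <= 0:
        raise ValueError("user_share_pct must be positive")
    return content_share_pct / user_share_pct


# Figures quoted in the text: Arabic is about 2-3% of digital content,
# while Arabic speakers are roughly 5% of internet users.
low = representation_ratio(2.0, 5.0)
high = representation_ratio(3.0, 5.0)
print(f"Arabic content per unit of user share: {low:.1f} to {high:.1f}")
```

On these figures, Arabic has roughly half the content its share of internet users would suggest at parity.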
Much of the Arabic-language data available for AI training today consists of translated English content, often missing cultural nuances and failing to reflect real-world language use. Training an Arabic AI on translated content is the equivalent of training an English model on Shakespeare: technically in the language, practically alien.
This is the problem the GCC’s three sovereign Arabic LLMs — Falcon (UAE), ALLaM (Saudi Arabia), and Fanar (Qatar) — were built to solve. Each takes a different approach. Each reflects the strategic priorities of its home country.
The Three Sovereign AI Models Emerging in the GCC
Falcon-H1 Arabic — UAE: Performance as National Statement
The Technology Innovation Institute (TII) in Abu Dhabi has been in the Arabic LLM race longer than anyone else in the region. The Falcon family began as a general multilingual model; the Arabic-specific branch reflects a deliberate strategic pivot.
Falcon-H1 Arabic, launched January 5, 2026, is built on a hybrid Mamba-Transformer architecture, a departure from the pure transformer approach that has dominated LLM development since 2017. The architectural choice is significant: Mamba-Transformer hybrids can handle long-context tasks at lower computational cost, which matters for Arabic text that routinely involves complex morphological analysis across extended passages.
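The cost argument rests on a general property of the two layer types: self-attention scales quadratically with sequence length, while a state-space (Mamba-style) scan scales linearly. The sketch below is a deliberately simplified cost model, not a measurement of Falcon-H1 Arabic. The hidden size, state size, and constant factors are assumptions chosen only for illustration.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Rough cost of attention scores plus value mixing: O(n^2 * d)."""
    return 2 * seq_len * seq_len * d_model


def ssm_cost(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough cost of a linear-time state-space scan: O(n * d * state)."""
    return seq_len * d_model * d_state


D_MODEL = 4096  # assumed hidden size, illustration only
D_STATE = 16    # assumed SSM state size, illustration only

for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n, D_MODEL) / ssm_cost(n, D_MODEL, D_STATE)
    print(f"seq_len={n:>7}: attention/ssm cost ratio ~ {ratio:,.0f}")
```

Under this model the ratio grows in proportion to sequence length, which is why replacing some attention layers with state-space layers pays off most on long inputs.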
The benchmark results are striking. The 7B model scores an average of 71.47% on the Open Arabic LLM Leaderboard, surpassing all models up to approximately 10B parameters — including Qatar’s Fanar-1-9B and Saudi Arabia’s HUMAIN ALLaM 7B. The 34B model scores 75.36%, outperforming even 70B+ parameter systems including China’s Qwen2.5 72B and Meta’s Llama-3.3 70B.
Outperforming models that are twice your size on a fraction of the compute is not an incremental improvement. It is an architectural statement — and a commercially relevant one, since smaller, more efficient models are cheaper to deploy at scale.
The earlier Falcon-Arabic 7B model, released in May 2025, had already established the methodological approach: rather than training from scratch, TII adapted a strong multilingual foundation — Falcon 3-7B — and extended it with 32,000 Arabic-specific tokens to better capture morphology and dialectal variation. The model was trained on high-quality native Arabic corpora, not translated data. It excels in general knowledge, Arabic grammar, mathematical reasoning, and understanding the rich diversity of Arabic dialects.
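To see why adding Arabic-specific tokens helps, consider a toy greedy longest-match tokenizer. This is a minimal sketch, not TII's tokenizer or training procedure: the vocabularies, the handful of Arabic tokens, and the function name `tokenize` are all hypothetical stand-ins for the 32,000-token extension described above.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match tokenization; unknown text falls back to single characters."""
    max_len = max((len(t) for t in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy base vocabulary with Latin subwords only (hypothetical).
base_vocab = {"the", "ing", "tion", " "}
# Hypothetical Arabic-specific additions, for illustration only.
arabic_tokens = {"ال", "كتاب", "مكتبة", "ون", "ات"}
extended_vocab = base_vocab | arabic_tokens

sentence = "الكتاب في المكتبة"
before = tokenize(sentence, base_vocab)
after = tokenize(sentence, extended_vocab)
print(len(before), "tokens before extension")
print(len(after), "tokens after extension")
```

Fewer tokens per sentence means more text fits in the same context window and each token carries more morphological information. In a real model, the new tokens would also need new embedding rows and further training before they become useful.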
In the Arabic LLM landscape, three main approaches exist: training from scratch, adapting multilingual models, or using models that natively support Arabic alongside other languages. Adapted and natively multilingual models have consistently outperformed models trained from scratch in both efficiency and capability. Falcon-H1 Arabic’s results suggest that the adaptation-and-architecture approach is currently the winning strategy.
What this means for the UAE’s AI strategy: TII’s model is not just a language tool. It is a demonstration that Gulf-based research institutions can produce frontier AI capable of outperforming American and Chinese models on a specific, commercially valuable task. That demonstration has geopolitical weight beyond the benchmark leaderboard.
ALLaM 34B — Saudi Arabia: Scale as Sovereign Mission
Saudi Arabia’s approach to Arabic AI is inseparable from its national AI strategy. ALLaM (Arabic Large Language Model) began as a government research project under SDAIA, Saudi Arabia’s data and AI authority, in 2023. When HUMAIN was established as a PIF company in May 2025, it formally took ownership of the ALLaM roadmap. The continuity from government research to national champion company is not accidental; it is a sequence Saudi Arabia has followed deliberately.
HUMAIN Chat, launched August 25, 2025, is powered by ALLaM 34B — described by HUMAIN as built for the more than 400 million Arabic speakers and 2 billion Muslims worldwide who have been underserved by generative AI. The framing is deliberate: this is not a regional product; it is a global one, addressed to a population that no other AI company has specifically designed for.
The technical ambition matches the rhetorical one. ALLaM 34B was built on the largest known dataset of Arabic language content, consisting of over 500 billion tokens, and refined with input from 600-plus domain specialists and 250 evaluators to ensure cultural authenticity. The dataset scale matters: Arabic AI’s core technical problem is data scarcity, and 500 billion tokens of curated Arabic text is a resource that most organizations cannot replicate.
ALLaM 34B has been independently verified by Cohere on the MMLU benchmark as the most advanced Arabic LLM built in the Arab world. Its features include real-time web search, speech input across multiple Arabic dialects, seamless bilingual switching between Arabic and English within the same conversation, and full compliance with Saudi Arabia’s Personal Data Protection Law (PDPL).
That last feature — PDPL compliance by design — reflects something important about ALLaM’s positioning. It is not a model deployed in Saudi Arabia; it is a model built in Saudi Arabia, for Saudi Arabia, with Saudi data regulations embedded from the beginning. The distinction matters for enterprise customers in regulated sectors.
What this means for Saudi Arabia’s AI strategy: HUMAIN’s stated ambition is not to build a better chatbot. It is to establish Saudi Arabia as the default provider of AI infrastructure for Arabic-speaking markets globally. ALLaM 34B is the model-layer expression of that ambition. The 1.9 GW of data centers planned by 2030 is the infrastructure-layer expression. Both are needed; neither is sufficient without the other.
Fanar 2.0 — Qatar: Depth Over Scale
Qatar’s approach is the most distinctive of the three — and the most misunderstood if viewed through the lens of parameter count or benchmark rankings alone.
Fanar was launched in December 2024 at the inaugural World Summit AI in Doha, developed by the Qatar Computing Research Institute (QCRI) at Hamad Bin Khalifa University. The original model was trained on 1.3 trillion tokens, of which 40% was Arabic language data — with particular attention to Qatari colloquial dialect and Islamic knowledge, domains that other models address poorly if at all.
Fanar 2.0, launched December 9, 2025, at the second World Summit AI in Doha, represents a qualitative leap. The model was built on 256 NVIDIA H100 GPUs with no dependency on external AI providers. The full Fanar 2.0 platform covers Arabic language, speech, vision, Islamic knowledge, classical poetry, translation, and agentic reasoning.
That list deserves attention. Most Arabic AI efforts focus on text generation and translation. Fanar 2.0 includes Fanar-Diwan for classical Arabic poetry generation, FanarShaheen for bilingual Arabic-English translation, Fanar-Sadiq for Islamic knowledge, and Oryx-IVU for Arabic-aware image and video understanding. Together, these components cover modalities that most Arabic AI efforts have not yet addressed.
Fanar 2.0 operates within a fully closed ecosystem to guarantee data privacy and protection, whilst delivering deep understanding of Arabic dialects and cultural terminology. And work has already commenced on Fanar 3.0, planned for December 2026 — a cadence of annual releases that mirrors the development pace of leading Western AI labs.
What this means for Qatar’s AI strategy: Fanar is not trying to win a benchmark leaderboard. It is trying to be the most culturally complete Arabic AI platform in existence. Fanar-Sadiq for Islamic knowledge — covering Quranic text, hadith, and jurisprudential reasoning — addresses a use case that affects more than 2 billion people globally, in a domain where a culturally misaligned AI response is not merely unhelpful but potentially harmful. That specialization is a strategic choice that no Western lab has made and no GCC competitor has replicated at the same depth.
The Real Competition: Not Each Other
A common framing of the Arabic LLM race positions Falcon, ALLaM, and Fanar as competitors. This is partially true at the benchmark level, but it misses the more important competitive dynamic.
The three models are building different capabilities for different use cases, and they are doing so on a timeline that is converging with — not trailing — global frontier AI development. The real competition is not UAE vs. Saudi Arabia vs. Qatar. It is the GCC vs. the default: continuing to rely on Western models that were not built for Arabic, deployed on infrastructure that is not sovereign, and governed by contracts that are not aligned with GCC regulatory frameworks.
The Arabic AI gap is not merely a language problem. It is a data inclusion problem with direct consequences for sectors like healthcare, education, and government services. An AI healthcare triage system that misunderstands Gulf Arabic dialect is not a minor inconvenience; it is a patient safety issue. An AI tutoring system that performs significantly worse in Arabic than in English does not merely underperform; it widens educational inequality.
The aiXplain Arabic LLM Benchmark Report, released in June 2025, evaluated 12 LLMs including open and closed models like SILMA, Jais, and ALLaM across 11 real-world tasks such as question answering, reasoning, summarization, and translation. The results confirmed something that GCC practitioners already knew from operational experience: task-specific performance varies significantly across models, and the “best” Arabic LLM depends heavily on the use case. There is no single winner. There is a landscape of specialized capabilities.
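In practice, "no single winner" means model selection becomes a per-task lookup rather than a single ranking. The sketch below illustrates the idea with placeholder model names and made-up scores. None of the numbers come from the aiXplain report or any other benchmark, and `best_model_per_task` is a hypothetical helper.

```python
# Hypothetical scores for illustration only; these are NOT figures from the
# aiXplain report or any other benchmark.
scores = {
    "model_a": {"question_answering": 0.81, "summarization": 0.64, "translation": 0.70},
    "model_b": {"question_answering": 0.74, "summarization": 0.77, "translation": 0.69},
    "model_c": {"question_answering": 0.69, "summarization": 0.71, "translation": 0.83},
}


def best_model_per_task(table: dict[str, dict[str, float]]) -> dict[str, str]:
    """Return the top-scoring model for each task in the table."""
    tasks = {task for per_model in table.values() for task in per_model}
    return {
        task: max(table, key=lambda m: table[m].get(task, float("-inf")))
        for task in sorted(tasks)
    }


winners = best_model_per_task(scores)
for task, model in winners.items():
    print(f"{task}: {model}")
```

With a table like this, each task can end up with a different top model, which is the pattern the report describes.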
What This Means for Companies Deploying AI in the Region
The Arabic LLM race has direct operational implications for any organization building AI-powered products or services in GCC markets.
Model selection is now a meaningful decision, not a default. Until recently, enterprise AI deployments in the GCC defaulted to GPT-4 or Claude for lack of alternatives. That is no longer the case. Falcon-H1 Arabic, ALLaM 34B, and Fanar 2.0 each offer capabilities in specific domains — dialect comprehension, cultural alignment, Islamic knowledge, regulatory compliance — that Western general-purpose models do not match. For regulated industries in particular, the question is not “can we use a Western model?” but “can we justify not using a sovereign one?”
Dialect support is not a nice-to-have. ALLaM 34B supports speech input across multiple Arabic dialects. Fanar 2.0 explicitly targets dialectal variation in both text and audio. The Arab region has 348 million internet users representing 70.2% of the total population of 496 million — and the majority of them communicate in regional dialects, not Modern Standard Arabic. A voice or chat interface that only handles MSA is not reaching most of the market.
Cultural alignment is measurable, not aspirational. The SalamahBench benchmark, introduced in 2026, evaluates the safety of Arabic language models across 8,170 prompts in 12 categories aligned with the MLCommons Safety Hazard Taxonomy. As safety and cultural alignment benchmarks mature, the performance gap between Western models and Arabic-native models will become easier to quantify — and harder to ignore in procurement decisions.
The sovereignty-performance trade-off is closing. The historical argument for using Western models was capability: GPT-4 simply performed better. Falcon-H1 Arabic’s benchmark results — outperforming Meta’s Llama 3.3 70B with a 34B model — demonstrate that this argument is weakening on the dimension that matters most: Arabic-language performance. As sovereign models reach performance parity on general tasks while maintaining domain-specific advantages, the case for defaulting to Western alternatives in GCC deployments becomes progressively harder to make.
Usetech perspective: In our experience working with enterprise clients across the UAE and Saudi Arabia, the decision to use a sovereign Arabic model versus a Western general-purpose model is rarely made on technical grounds alone. It involves regulatory alignment, data residency requirements, and — particularly in government and healthcare — the ability to demonstrate that the AI system understands the cultural context of the users it serves. Sovereign Arabic models increasingly win not just on compliance grounds, but on the trust that cultural alignment creates with end users. That trust is harder to quantify than a benchmark score, and more durable.
The Bigger Picture
The Arabic LLM race is not primarily a technology story. It is a story about who gets to participate in the AI economy on their own terms.
Arabic-speaking communities have been among the most underserved by mainstream generative AI — not because their needs are less important, but because the training data infrastructure of the AI industry was built around English. The GCC’s investment in sovereign Arabic models is, among other things, a correction of that asymmetry.
The correction is happening faster than most Western observers have noticed. In 2023, there were no production-ready Arabic LLMs with sovereign infrastructure behind them. By early 2026, there are three — each with distinct capabilities, each backed by sovereign capital, each on an annual development cadence that matches global frontier AI timelines.
The Arabic LLM race was always a GCC story. The rest of the world is only now beginning to read it.
Methodology note: This analysis aggregates benchmark data from the Open Arabic LLM Leaderboard, aiXplain’s Arabic LLM Benchmark Report (June 2025), and the SalamahBench safety evaluation framework; official model documentation from TII, HUMAIN/SDAIA, and QCRI; and public reporting from Middle East AI News, Arabian Business, Economy Middle East, and the Saudi Press Agency. Usetech perspectives reflect professional judgment based on enterprise AI deployments in the GCC and should be read as informed operational experience, not primary research. All figures are current as of May 2026.
