Our Story

From automation obsession to sovereign Moroccan AI.

One person. Massive data pipelines. Custom models. This is the timeline of how Sawalni — the first large language model built for Moroccan and North African languages — came into existence.

The Seeds (2000s–2022)

Childhood

First A* Algorithm

Omar implements his first pathfinding algorithm. Revelation: automated decision-making emerging from a few lines of code.

2012–2017

Early NLP Experiments

Prompted by the voice-assistant craze, and by the awkwardness of switching to English just to talk to Cortana, Omar experiments with spaCy for Moroccan Arabic. The time was not right.

November 2022

The ChatGPT Moment

ChatGPT launches. Omar spends a week sleeping very little, exploring its understanding of Moroccan Arabic. He projects scenarios in which non-English speakers risk being left behind.

"LLMs need data. Lots of it. And I knew how to get data" — years of RPA and web scraping become a superpower.

Data & Language ID (Jan–Jun 2023)

Early 2023

The Data Pipeline

Implements massive data-gathering using Monitoro. Quickly discovers: filtering Moroccan Arabic from the web requires its own AI.

5B+

tokens collected

Spring 2023

Gherbal — "The Sieve"

Creates a language identification model for ~50 languages. Beats the state of the art for Moroccan Arabic. Later acknowledged in Hugging Face's FineWeb2 paper.

First application of the Crescendo method: small seed → bootstrap → iterate → scale. Each model enables the next.

2023

Sawtone & Daktilo

Tackles the Darija transliteration problem: no two people write the same word the same way. Builds phonetic embeddings (Sawtone) and an LLM-based transliterator (Daktilo).

2023

Tarjamli — Translation Pipeline

Builds a complete translate → score → transliterate pipeline using NLLB-200 as seed. Creates instruction data at scale for the first time.
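As a rough illustration of the translate → score → transliterate shape described above, here is a minimal sketch. It is not Tarjamli's code: `build_pairs`, `min_score`, and the three toy stages are hypothetical stand-ins (an English-to-French word lookup instead of NLLB-200, a lexicon-coverage score, and accent stripping in place of real transliteration), chosen only so the example runs with the standard library.

```python
import unicodedata

# Hypothetical toy lexicon; a real pipeline would call a translation model here.
LEXICON = {"hello": "bonjour", "summer": "été"}

def toy_translate(src):
    """Stand-in translator: word-by-word lookup, unknown words pass through."""
    return " ".join(LEXICON.get(word, word) for word in src.split())

def toy_score(src, target):
    """Stand-in quality score: share of words the lexicon actually translated."""
    src_words, tgt_words = src.split(), target.split()
    hits = sum(LEXICON.get(s) == t for s, t in zip(src_words, tgt_words))
    return hits / max(len(src_words), 1)

def toy_transliterate(text):
    """Stand-in transliterator: strip accents down to plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def build_pairs(sources, translate, score, transliterate, min_score=0.5):
    """Translate each source, drop low-scoring outputs, transliterate the rest."""
    pairs = []
    for src in sources:
        target = translate(src)
        if score(src, target) < min_score:
            continue  # low-quality translation: keep it out of the dataset
        pairs.append((src, target, transliterate(target)))
    return pairs

pairs = build_pairs(
    ["hello summer", "hello zzz qqq"],
    toy_translate, toy_score, toy_transliterate,
)
print(pairs)
```

Passing the stages in as functions keeps the filtering logic independent of whichever translator, scorer, or transliterator is plugged in.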

The First Moroccan LLM (July 2023)

July 2023

Sawalni v1 — First in History

Only 8 months after ChatGPT, the first Moroccan LLM is born. Extremely basic — awkward conversations, funny recipe inventions — but unmistakably alive.

8 months

from ChatGPT to first Moroccan LLM

July 2023

Second Demo

A second early demo showing Sawalni responding in Darija Arabizi.

Public Momentum (2024)

Early 2024

Tarjamli.ma Launch

The first translation app for Moroccans — matching Google Translate UX while supporting Darija in Arabizi for the first time.

Spring 2024

Academic Circuit

Presents at the International Conference of Moroccan Arabic (University of Navarra, Spain). Mentors at an AI hackathon at 1337 coding school.

Summer 2024

National TV Coverage

Sawalni v2 featured on Moroccan national television. Momentum building, but still far from a shipping product.

The Quality Leap (2024–2025)

2024–2025

Sawalni v4 — The Personality Model

First version to supersede Tarjamli and Daktilo. Testers felt attachment to its personality. But poor tool-calling and inconsistent instruction following demand a new approach.

2025

Custom Tokenizer & Knowledge Distillation

Technical breakthroughs: instruction residuals make pretraining affordable, and a custom tokenizer delivers a large quality boost.

"LLMs are simply big balls of math. The same algebra I studied in secondary school, except now vectors map into something almost palpable."

2025

Wikilangs

Sponsored by Featherless, Wikilangs bootstraps AI basics for 300+ languages — so future Sawalni-like projects have a head start.

300+

languages via Wikilangs

The Current Era (2026)

2026

Sawalni v5/v6 — Sovereign, Agentic, Multilingual

Full language support across Darija, Hassaniya, Tachelhit, Tarifit, Central Atlas Tamazight, MSA, French, English, Spanish and more. Agentic with 200+ tool types. Live at sawalni.com.

11+

language varieties supported

2026

Published Research

Paper on phonetic embeddings published (doi: 10.14746/linpo.2025.67.1.8). Gherbal work presented at TIM'24, University Hassan II.

The Crescendo Method

Each model enables the next. Small seed → bootstrap → iterate → scale.

Gherbal → clean data → Sawtone → Daktilo → Tarjamli → instruction data → Sawalni
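The seed → bootstrap → iterate → scale loop resembles what is often called self-training. The sketch below is one possible reading of that idea, not the actual Crescendo implementation: the character-bigram classifier, the `crescendo` function, the `threshold` parameter, and the English/French toy data are all assumptions made for illustration.

```python
from collections import Counter

def ngrams(text, n=2):
    """Character n-grams of a lowercased string."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(labeled):
    """Build one n-gram count profile per label."""
    profiles = {}
    for text, label in labeled:
        profiles.setdefault(label, Counter()).update(ngrams(text))
    return profiles

def predict(profiles, text):
    """Return (best_label, confidence); confidence is the winner's share of hits."""
    grams = ngrams(text)
    scores = {
        label: sum(1 for g in grams if g in profile)
        for label, profile in profiles.items()
    }
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total

def crescendo(seed, pool, rounds=3, threshold=0.7):
    """Grow a labeled set from a small seed by repeatedly self-labeling the pool."""
    labeled = list(seed)
    remaining = list(pool)
    for _ in range(rounds):
        profiles = train(labeled)          # bootstrap: fit on what we have
        confident, uncertain = [], []
        for text in remaining:
            label, conf = predict(profiles, text)
            if label is not None and conf >= threshold:
                confident.append((text, label))
            else:
                uncertain.append(text)
        if not confident:                  # nothing new learned: stop iterating
            break
        labeled.extend(confident)          # scale: the next round trains on more data
        remaining = uncertain
    return labeled, remaining

seed = [("the cat is on the table", "en"), ("le chat est sur la table", "fr")]
pool = ["the dog is on the bed", "le chien est sur le lit", "12345"]
labeled, remaining = crescendo(seed, pool)
print(len(seed), "seed examples grew to", len(labeled), "labeled examples")
```

The confidence threshold is the main design lever in a loop like this: set too low, early mistakes get baked into later rounds; set too high, the labeled set never grows.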

What's Next

  • Scale the Sawalni formula to a much larger model → frontier-grade performance.
  • Graduate from technology demonstrator to daily driver for millions of Moroccans.
  • Eliminate “translationese” — currently limited by single-annotator scale.
  • Voice support exploration.
  • Amazigh language quality improvement — more native annotators needed.