Meta’s Fundamental AI Research (FAIR) team is introducing Omnilingual ASR, a suite of models that provide automatic speech recognition for over 1,600 languages, including 500 low-resource languages that have never been transcribed by AI before.
The company says most current ASR systems focus on a limited set of high-resource languages, which exacerbates the digital divide. Meta positions the new release as a significant step toward a truly universal transcription system.
Omnilingual ASR introduces an “LLM-ASR” model that pairs a speech encoder with an LLM-style transformer decoder. Meta says it achieves state-of-the-art performance, with character error rates below 10 per cent for 78 per cent of the languages it covers.
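For context, character error rate (CER) is the character-level edit distance between a model’s output and a reference transcript, divided by the reference length, so a CER below 10 per cent means fewer than one wrong character in every ten. A minimal, self-contained illustration in Python (the example strings are invented):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,            # hypothesis omitted r (deletion)
                curr[j - 1] + 1,        # hypothesis added h (insertion)
                prev[j - 1] + (r != h)  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


# One dropped character in an 18-character reference gives a CER of ~5.6%.
print(cer("omnilingual speech", "omnilingal speech"))
```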
New languages with minimal data
A key feature of the new framework is its ability to learn new languages from minimal data. Meta says this shifts the paradigm for extending coverage: users can provide just a “handful” of paired audio-text samples and get usable transcription quality. This in-context learning capability removes the need for large-scale training data or access to high-end compute.
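Conceptually, the workflow resembles the sketch below. All names here are illustrative assumptions rather than the released Omnilingual ASR interface; the point is that a few paired audio-text examples condition the decoder at inference time, with no fine-tuning and no large training run.

```python
# Hypothetical sketch of extending coverage to a new language via
# in-context learning. PairedExample, transcribe_with_context and
# model.generate are invented names, not the released API.
from dataclasses import dataclass


@dataclass
class PairedExample:
    audio_path: str   # short recording in the new language
    transcript: str   # its ground-truth text


def transcribe_with_context(model, target_audio: str,
                            examples: list[PairedExample]) -> str:
    """Assumed interface: few-shot examples are passed as context
    alongside the target utterance and decoded in one call."""
    context = [(ex.audio_path, ex.transcript) for ex in examples]
    return model.generate(audio=target_audio, context=context)  # hypothetical call


# A "handful" of pairs is the only language-specific data required.
few_shot = [
    PairedExample("greeting.wav", "..."),  # transcripts elided here
    PairedExample("market.wav", "..."),
    PairedExample("weather.wav", "..."),
]
# text = transcribe_with_context(model, "new_utterance.wav", few_shot)
```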
Alongside the models, Meta is open-sourcing Omnilingual wav2vec 2.0, a new 7B-parameter self-supervised speech representation model that can be reused for other speech-related tasks. The company is also releasing the Omnilingual ASR Corpus, a collection of transcribed speech in 350 underserved languages, curated in collaboration with global partners, including Mozilla Foundation’s Common Voice.
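As a rough sketch of how such an encoder is typically used downstream, the snippet below extracts frame-level speech representations through the standard Hugging Face transformers wav2vec 2.0 interface. The checkpoint identifier is a placeholder assumption, not a confirmed path, and Meta’s release may ship its own loading code instead.

```python
# Sketch: extracting self-supervised speech representations from a
# wav2vec 2.0-style encoder for a downstream task. The MODEL_ID below
# is a placeholder, not a confirmed checkpoint name.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

MODEL_ID = "facebook/omnilingual-wav2vec2"  # placeholder identifier

extractor = Wav2Vec2FeatureExtractor.from_pretrained(MODEL_ID)
model = Wav2Vec2Model.from_pretrained(MODEL_ID)
model.eval()

# One second of 16 kHz silence stands in for a real recording.
waveform = torch.zeros(16000)
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**inputs).last_hidden_state  # (1, frames, dim)

# These frame embeddings could feed a downstream classifier, for
# example language identification or keyword spotting.
print(hidden_states.shape)
```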
The models are being released under a permissive Apache 2.0 license in a range of sizes, from lightweight 300M versions for on-device use to the 7B models that offer top-tier accuracy.