Omnilingual ASR
Photo credit: Meta

Meta’s Fundamental AI Research (FAIR) team is introducing Omnilingual ASR, a suite of models that provide automatic speech recognition for over 1,600 languages, including 500 low-resource languages that have never been transcribed by AI before.

The company says most current ASR systems focus on a limited set of high-resource languages, which exacerbates the digital divide. It describes the new system as a significant step toward truly universal transcription.

Omnilingual ASR introduces an “LLM-ASR” model that uses an LLM-style transformer decoder. This system achieves state-of-the-art performance, with character error rates below 10 per cent for 78 per cent of the languages it covers.
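For readers unfamiliar with the metric, character error rate (CER) is commonly computed as the number of character-level edits (insertions, deletions and substitutions) needed to turn the system's transcript into the reference transcript, divided by the length of the reference. The sketch below illustrates that standard definition only. It is not Meta's evaluation code, the function names are hypothetical, and real evaluations may normalise text (case, punctuation, whitespace) differently before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character edits / number of reference characters."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in an 11-character reference.
    cer = character_error_rate("hello world", "hallo world")
    print(f"CER: {cer:.1%}")
```

In this toy case a single substituted character out of 11 gives a CER of roughly 9 per cent, which would fall just under the 10 per cent threshold cited above.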

New languages with minimal data

A key feature of the new framework is its ability to learn new languages with minimal data. Meta says this shifts the paradigm for adding languages, as users can provide just a “handful” of paired audio-text samples to get usable transcription quality. This in-context learning capability removes the need for large-scale training data or access to high-end compute.

Alongside the models, Meta is open-sourcing Omnilingual wav2vec 2.0, a new 7B-parameter self-supervised speech representation model that can be used for other speech-related tasks. The company is also releasing the Omnilingual ASR Corpus, a collection of transcribed speech in 350 underserved languages, curated in collaboration with global partners, including Mozilla Foundation’s Common Voice.

The models are being released under the permissive Apache 2.0 license in a range of sizes, from lightweight 300M-parameter versions for on-device use to 7B-parameter models that offer top-tier accuracy.
