Mohamed Hassan/PXHere

Machine translation technology has created a destructive cycle affecting small-language Wikipedia editions, where artificial intelligence systems train on poorly translated content and subsequently produce increasingly degraded translations.

The problem emerged prominently when Kenneth Wehr took over management of Greenlandic Wikipedia four years ago, MIT Technology Review reports. He discovered that virtually all of its content had been created by non-speakers using machine translators, which forced him to delete most of the articles.

The 26-year-old German, who studied Greenlandic after becoming fascinated with the autonomous Danish territory, found pages riddled with elementary errors, including an entry claiming Canada contained only 41 inhabitants. Many articles featured meaningless word strings generated by translation systems unable to process the language properly.

Volunteers managing four African language editions estimate that between 40 and 60 per cent of their Wikipedia articles consist of uncorrected machine translations. Analysis of the Wikipedia edition in Inuktitut, an Indigenous Canadian language related to Greenlandic, suggests over two-thirds of substantial pages contain machine-generated portions.

The phenomenon creates what researchers term a “linguistic doom loop” where AI systems learn from Wikipedia’s flawed content, subsequently producing worse translations that generate more corrupted pages. Wikipedia often represents the largest online linguistic resource for minority languages, making it a primary training source for translation models.

Kevin Scannell, a former Saint Louis University computer science professor who develops software for endangered languages, explained that AI models depend entirely on available text. “These models are built on raw data. They will try and learn everything about a language from scratch. There is no other input. There are no grammar books. There are no dictionaries. There is nothing other than the text that is inputted.”

Research indicates Wikipedia comprised over half the training data for AI translation models covering several African languages in 2020, whilst 2022 German studies found Wikipedia was the sole accessible online source for 27 under-resourced languages.

Abdulkadir Abdulkadir, who manages Fulfulde Wikipedia for pastoralists across the Sahel region, spends three hours daily correcting machine-translated agricultural information that could harm farmers if left uncorrected. Google Translate wrongly renders the Fulfulde word for January as June, whilst ChatGPT claims it means August or September.

“It is going to be terrible, honestly,” Abdulkadir said regarding the language’s future. “Totally, completely no future.”

The crisis threatens languages already facing displacement pressures. Lucy Iwuala, who contributes to Igbo Wikipedia, views her work as cultural preservation. “This is my culture. This is who I am,” she said. “That is the essence of it all: to ensure that you are not erased.”
