Wikimedia launches free AI vector database to challenge Big Tech dominance

Wikimedia Deutschland has launched a free vector database enabling developers to build generative AI applications using Wikidata’s 119 million open knowledge entries, marking the first time this data can be used directly for AI development.

The Embedding Project went live today at https://wd-vectordb.toolforge.org. It converts Wikidata's structured data into vector embeddings, numerical representations that can be searched by meaning, so that applications built on large language models can retrieve relevant entries through retrieval-augmented generation (RAG). The service supports searches in English, French and Arabic, with Spanish and Mandarin to follow by year end.
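Retrieval-augmented generation works in two steps: find the stored entries whose vectors are closest to the query's vector, then hand those entries to the language model as context. The sketch below illustrates that flow under stated assumptions: a toy word-count vector stands in for the Jina AI embedding model, a three-item dictionary stands in for the Astra DB store, and the item texts and function names (`embed`, `retrieve`, `build_prompt`) are hypothetical, not the project's API. Keeping each Wikidata ID next to its retrieved fact is what lets an answer be traced back to its source.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated Wikidata-style items (ID -> label and description).
ITEMS = {
    "Q42": "Douglas Adams, English writer and humorist",
    "Q64": "Berlin, capital and largest city of Germany",
    "Q90": "Paris, capital and largest city of France",
}


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Indexing step: embed each item once, keyed by its ID so results stay traceable.
INDEX = {qid: embed(text) for qid, text in ITEMS.items()}


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the IDs of the k items most similar to the query."""
    q = embed(query)
    return sorted(INDEX, key=lambda qid: cosine(q, INDEX[qid]), reverse=True)[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Assemble a RAG prompt: retrieved facts with their IDs, then the question."""
    context = "\n".join(f"[{qid}] {ITEMS[qid]}" for qid in retrieve(query, k))
    return f"Answer using only these sources and cite their IDs:\n{context}\n\nQuestion: {query}"


print(build_prompt("What is the capital of Germany?", k=1))
```

With `k=1`, the printed prompt contains a single source line, `[Q64] Berlin, capital and largest city of Germany`, followed by the question.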

The database uses a hybrid search approach that combines vector search, keyword search and descriptive queries, with built-in reranking to surface the most relevant results. The underlying data is kept current by around 24,000 volunteers worldwide who maintain and expand Wikidata each month.
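The article does not say how the search signals are merged. One common way to combine ranked lists from different retrievers is reciprocal rank fusion (RRF), in which each document earns 1/(k + rank) from every list it appears in, with k = 60 a widely used default. The sketch below assumes RRF and pairs a toy word-vector similarity (standing in for vector search) with a toy term-match count (standing in for keyword search). The documents and function names are hypothetical, and the descriptive-query signal and the reranking model are left out; this illustrates hybrid search in general, not the Embedding Project's implementation.

```python
import math
import re
from collections import Counter

# Hypothetical, abbreviated documents (Wikidata ID -> text); not real Astra DB data.
DOCS = {
    "Q64": "Berlin, capital and largest city of Germany",
    "Q90": "Paris, capital and largest city of France",
    "Q183": "Germany, country in Central Europe",
}


def words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def vector_rank(query: str) -> list[str]:
    """Toy 'vector search': rank by cosine similarity of word-count vectors."""
    q = Counter(words(query))

    def score(doc_id: str) -> float:
        d = Counter(words(DOCS[doc_id]))
        dot = sum(n * d[w] for w, n in q.items())
        norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
        return dot / norm if norm else 0.0

    return sorted(DOCS, key=score, reverse=True)


def keyword_rank(query: str) -> list[str]:
    """Toy 'keyword search': rank by how many distinct query words a document contains."""
    q = set(words(query))
    return sorted(DOCS, key=lambda doc_id: len(q & set(words(DOCS[doc_id]))), reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) across lists."""
    scores: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    return [doc_id for doc_id, _ in scores.most_common()]


def hybrid_search(query: str, top_n: int = 2) -> list[str]:
    fused = reciprocal_rank_fusion([vector_rank(query), keyword_rank(query)])
    # A production system would pass `fused` to a reranking model here (omitted).
    return fused[:top_n]


query = "largest city in Germany"
print(vector_rank(query))   # cosine favours the short Q183 entry over Q90
print(keyword_rank(query))
print(hybrid_search(query))
```

Here the two retrievers disagree about second place: cosine similarity favours the short Germany entry, while the term count ties the Paris and Germany entries. RRF rewards documents that rank well in both lists, which keeps Berlin on top; a reranking model would then refine the order of the fused candidates.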

“We want to create an infrastructure that enables everyone to develop generative AI applications based on verifiable, free and open data,” says Lydia Pintscher, Portfolio Lead at Wikimedia Deutschland. “This is an important step toward a digital world in which technologies for the benefit of society are not a footnote but the norm.”

The project aims to reduce AI hallucinations by grounding answers in verifiable data, to increase transparency through traceable sourcing, and to offer more current information than models limited to their static training data. The codebase is available under an open licence.

Wikimedia Deutschland has been developing the project since September 2024 in collaboration with DataStax, an IBM company that provides AI and data solutions, and Berlin-based Jina AI, which supplies the embedding system that transforms Wikidata into vectors. The data is stored in DataStax's Astra DB vector database.

A free webinar on 9 October will present practical applications and usage tips for developers interested in the technology.
