Wikipedia

Wikimedia Deutschland has launched a free vector database enabling developers to build generative AI applications using Wikidata’s 119 million open knowledge entries, marking the first time this data can be used directly for AI development.

The Embedding Project went live today at https://wd-vectordb.toolforge.org. It translates Wikidata’s structured data into vectors that large language models can use through retrieval-augmented generation. The technology supports searches in English, French and Arabic, with Spanish and Mandarin to follow by year end.
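To illustrate the retrieval-augmented generation step in general terms, the sketch below assembles a prompt from entries that a search has already returned. It does not call the project’s service, and the function name `build_grounded_prompt`, the entry fields and the prompt wording are all assumptions made for this example.

```python
def build_grounded_prompt(question, entries):
    """Assemble a prompt asking a model to answer only from the cited entries.

    `entries` is a list of dicts with hypothetical keys: id, label, description.
    """
    context = "\n".join(
        f"[{e['id']}] {e['label']}: {e['description']}" for e in entries
    )
    return (
        "Answer using only the entries below and cite their IDs.\n\n"
        f"Entries:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Hand-written example entry standing in for a retrieved search result.
entries = [
    {
        "id": "Q64",
        "label": "Berlin",
        "description": "capital and largest city of Germany",
    }
]
prompt = build_grounded_prompt("What is the capital of Germany?", entries)
print(prompt)
```

Because each entry keeps its identifier in the prompt, an answer can point back to the item it relied on, which is the kind of traceable sourcing the project describes.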

The database employs a hybrid search approach that combines vector search, keyword search, and descriptive queries, with built-in reranking to surface the most relevant results. Around 24,000 volunteers worldwide maintain and expand Wikidata each month.
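A hybrid search of the kind described above can be sketched in a few lines. The article does not say how the project merges or reranks its results, so this example swaps in reciprocal rank fusion, a common way to combine ranked lists. The bag-of-words vectors are a toy stand-in for learned embeddings, and the three entries and all function names are illustrative assumptions.

```python
import math
from collections import Counter

# Toy corpus: hand-written descriptions, loosely modelled on Wikidata items.
DOCS = {
    "Q42": "Douglas Adams English writer and humorist",
    "Q64": "Berlin capital and largest city of Germany",
    "Q183": "Germany country in Central Europe",
}


def tokenize(text):
    return text.lower().split()


def bow_vector(text):
    """Stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query):
    """Rank documents by cosine similarity to the query vector."""
    q = bow_vector(query)
    return sorted(DOCS, key=lambda d: cosine(q, bow_vector(DOCS[d])), reverse=True)


def keyword_rank(query):
    """Rank documents by the number of query terms they contain."""
    terms = set(tokenize(query))
    return sorted(DOCS, key=lambda d: len(terms & set(tokenize(DOCS[d]))), reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each list adds 1 / (k + position) to a document's score."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_search(query):
    return reciprocal_rank_fusion([vector_rank(query), keyword_rank(query)])


print(hybrid_search("capital of Germany"))
```

A document that ranks well in both lists ends up first in the fused list, while one that matches on only a single signal is pushed down rather than dropped.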

“We want to create an infrastructure that enables everyone to develop generative AI applications based on verifiable, free and open data,” says Lydia Pintscher, Portfolio Lead at Wikimedia Deutschland. “This is an important step toward a digital world in which technologies for the benefit of society are not a footnote but the norm.”

The project aims to reduce AI hallucinations by providing verified data sources, increasing transparency through traceable sourcing, and offering more current information than statically trained models. The codebase is available under an open licence.

Wikimedia Deutschland has developed the project since September 2024 in collaboration with DataStax, an IBM company that provides AI and data solutions, and Berlin-based Jina AI, which supplies the embedding system that transforms Wikidata into vectors. DataStax’s Astra DB vector database stores the data.

A free webinar on 9 October will demonstrate practical applications and usage tips for developers interested in the technology.
