
Wikimedia Deutschland has launched a free vector database enabling developers to build generative AI applications using Wikidata’s 119 million open knowledge entries, marking the first time this data can be used directly for AI development.

The Embedding Project went live today at https://wd-vectordb.toolforge.org and translates Wikidata’s structured data into vectors that large language models can process through retrieval augmented generation. The technology supports searches in English, French and Arabic, with Spanish and Mandarin to follow by year end.
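To make the retrieval augmented generation idea concrete, here is a minimal sketch in Python. It is not the project's code: the three sample entries are trimmed, illustrative stand-ins for Wikidata items, the bag-of-words `embed` function is a toy substitute for a real embedding model, and the helper names (`to_text`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
from collections import Counter

# Illustrative sample entries; real Wikidata items carry many more statements.
ITEMS = [
    {"id": "Q42", "label": "Douglas Adams", "description": "English writer and humorist"},
    {"id": "Q64", "label": "Berlin", "description": "capital and largest city of Germany"},
    {"id": "Q90", "label": "Paris", "description": "capital and largest city of France"},
]

def to_text(item):
    """Flatten a structured entry into a sentence-like string."""
    return f"{item['label']}: {item['description']}"

def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    cleaned = text.lower().replace(":", " ").replace("?", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, k=1):
    """Return the k entries whose text is most similar to the query."""
    q = embed(query)
    ranked = sorted(ITEMS, key=lambda it: cosine(q, embed(to_text(it))), reverse=True)
    return ranked[:k]

def build_prompt(question, k=1):
    """Assemble a prompt that cites retrieved entries by their item ID."""
    context = "\n".join(f"[{it['id']}] {to_text(it)}" for it in retrieve(question, k))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the capital of Germany?"))
```

The prompt would then be passed to a language model of the developer's choice. Keeping the item ID next to each retrieved passage is what lets an answer be traced back to its source.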

The database employs a hybrid search approach that combines vector search, keyword search, and descriptive queries, with built-in reranking to surface the most relevant results. Around 24,000 volunteers worldwide maintain and expand Wikidata each month.
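One common way to merge the results of separate searches is reciprocal rank fusion, sketched below in Python. The article does not say how the project combines or reranks its results, so the technique is an assumption. The function name, the constant `k=60`, and the two hit lists are likewise hypothetical, and a model-based reranker is omitted.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists; a higher fused score means more relevant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks near the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical outputs of two independent searches over the same index.
vector_hits = ["Q64", "Q90", "Q42"]
keyword_hits = ["Q64", "Q183", "Q90"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Entries that both searches return rise to the top, while an entry found by only one search still stays in the list with a lower score.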

“We want to create an infrastructure that enables everyone to develop generative AI applications based on verifiable, free and open data,” says Lydia Pintscher, Portfolio Lead at Wikimedia Deutschland. “This is an important step toward a digital world in which technologies for the benefit of society are not a footnote but the norm.”

The project aims to reduce AI hallucinations by providing verified data sources, to increase transparency through traceable sourcing, and to offer more current information than statically trained models. The codebase is available under an open licence.

Wikimedia Deutschland has developed the project since September 2024 in collaboration with DataStax, an IBM company that provides AI and data solutions, and Berlin-based Jina AI, which supplies the embedding system that transforms Wikidata into vectors. DataStax’s Astra DB vector database stores the data.

A free webinar on 9 October will demonstrate practical applications and offer usage tips for developers interested in the technology.
