UNC research team check a plant specimen at the UNC Herbarium. Photo credit: Shanna Oberreiter

Billions of plant specimens currently gathering dust in museum cabinets could soon be accessible to scientists worldwide after researchers successfully used artificial intelligence to automate the digitisation of natural history collections.

A new study from the University of North Carolina at Chapel Hill demonstrates that large language models (LLMs) can determine the original collection locations of plant specimens with near-human accuracy, solving a manual bottleneck that has kept vast amounts of ecological data offline.

The research team found that AI tools could complete this “georeferencing” process with an error margin of less than 10 kilometres whilst operating significantly faster and more cost-effectively than traditional methods.
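One common way to express a georeferencing error in kilometres is the great-circle distance between a predicted coordinate and a trusted reference coordinate. The article does not say how the study computed its error figure, so the sketch below is only an illustration under that assumption. The function names, the sample coordinates and the use of the haversine formula are all hypothetical choices, not the authors' method.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def within_tolerance(predicted, reference, tolerance_km=10.0):
    """True if the predicted (lat, lon) lies less than tolerance_km from the reference."""
    return haversine_km(*predicted, *reference) < tolerance_km


# Hypothetical example: a predicted point 0.05 degrees of latitude from its reference.
predicted = (35.90, -79.05)
reference = (35.95, -79.05)
print(round(haversine_km(*predicted, *reference), 2), within_tolerance(predicted, reference))
```

The 10 km default mirrors the threshold quoted above. A real evaluation would also need to decide how to treat records whose label descriptions are too vague to yield a single point.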

Natural history collections are vital for tracking biodiversity loss, understanding species movement under climate change and analysing ecosystem shifts. However, of the estimated two-to-three billion herbarium specimens worldwide, only a small fraction have been digitised.

Without digital records and precise spatial data, these physical archives remain essentially useless for modern large-scale ecological research. Traditional georeferencing relies on manual interpretation, specialised software or multiple rounds of expert review, a process that has proven too slow and expensive to handle the global backlog.

Biggest bottlenecks

“Our study explores how large language models can take on one of the biggest bottlenecks in digitising plant collections,” said Yuyang Xie, first author and postdoctoral researcher in the Department of Biology at UNC. “We are pioneering the use of these tools for georeferencing, a breakthrough that will accelerate the digitisation of plant specimens and unlock new possibilities for ecological research.”

The study set out to determine whether AI could automate one of the most time-consuming steps in digitisation. The results confirmed that LLMs could outperform existing methods in terms of accuracy, efficiency, and scalability.

By accurately interpreting location descriptions from specimen labels, the technology allows researchers to rapidly process millions of records that would otherwise take decades to digitise manually.

“Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” said Xiao Feng, corresponding author and assistant professor in the Department of Biology at UNC. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”

“This technology allows us to unlock millions of records that are currently sitting in cabinets,” said Xie. “With the power of LLMs, we can rapidly digitise plant specimen data that will be critical for addressing global environmental challenges.”
