A UNC research team checks a plant specimen at the UNC Herbarium. Photo credit: Shanna Oberreiter

Billions of plant specimens currently gathering dust in museum cabinets could soon be accessible to scientists worldwide after researchers successfully used artificial intelligence to automate the digitisation of natural history collections.

A new study from the University of North Carolina at Chapel Hill demonstrates that large language models (LLMs) can determine the original collection locations of plant specimens with near-human accuracy, solving a manual bottleneck that has kept vast amounts of ecological data offline.

The research team found that AI tools could complete this “georeferencing” process with an error margin of less than 10 kilometres whilst operating significantly faster and more cost-effectively than traditional methods.
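To make the 10-kilometre figure concrete, the sketch below shows one common way to measure georeferencing error: the great-circle (haversine) distance between a predicted coordinate and a trusted reference coordinate. This is an illustration only; the article does not say which distance metric the study used. The function names and the sample coordinates are hypothetical, and the 10 km threshold is simply the figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """True if the predicted (lat, lon) lies less than threshold_km from the reference."""
    return haversine_km(*predicted, *reference) < threshold_km


# Hypothetical coordinates, for illustration only (not data from the study).
reference = (35.9132, -79.0558)   # a human-georeferenced locality
predicted = (35.9500, -79.0500)   # a model-suggested locality for the same label

error_km = haversine_km(*predicted, *reference)
print(f"error: {error_km:.1f} km, within 10 km: {within_threshold(predicted, reference)}")
```

Averaging or taking the median of such distances across many specimens would give a single summary error for a collection, though whether the study summarised its results that way is not stated here.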

Natural history collections are vital for tracking biodiversity loss, understanding species movement under climate change and analysing ecosystem shifts. However, of the estimated two to three billion herbarium specimens worldwide, only a small fraction have been digitised.

Without digital records and precise spatial data, these physical archives remain essentially useless for modern large-scale ecological research. Traditional georeferencing relies on manual interpretation, specialised software or multiple rounds of expert review — a process that has proven too slow and expensive to handle the global backlog.

Biggest bottlenecks

“Our study explores how large language models can take on one of the biggest bottlenecks in digitising plant collections,” said Yuyang Xie, first author and postdoctoral researcher in the Department of Biology at UNC. “We are pioneering the use of these tools for georeferencing, a breakthrough that will accelerate the digitisation of plant specimens and unlock new possibilities for ecological research.”

The study set out to test whether AI could automate one of the most time-consuming steps in digitisation. The results confirmed that LLMs could outperform existing methods in accuracy, efficiency and scalability.

By accurately interpreting location descriptions from specimen labels, the technology allows researchers to rapidly process millions of records that would otherwise take decades to digitise manually.

“Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” said Xiao Feng, corresponding author and assistant professor in the Department of Biology at UNC. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”

“This technology allows us to unlock millions of records that are currently sitting in cabinets,” said Xie. “With the power of LLMs, we can rapidly digitise plant specimen data that will be critical for addressing global environmental challenges.”
