A UNC research team checks a plant specimen at the UNC Herbarium. Photo credit: Shanna Oberreiter

Billions of plant specimens currently gathering dust in museum cabinets could soon be accessible to scientists worldwide after researchers successfully used artificial intelligence to automate the digitisation of natural history collections.

A new study from the University of North Carolina at Chapel Hill demonstrates that large language models (LLMs) can determine the original collection locations of plant specimens with near-human accuracy, solving a manual bottleneck that has kept vast amounts of ecological data offline.

The research team found that AI tools could complete this “georeferencing” process with an error margin of less than 10 kilometres whilst operating significantly faster and more cost-effectively than traditional methods.
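The article does not say exactly how the 10-kilometre error was measured. A common way to score georeferencing is to take the great-circle distance between the coordinates a method predicts and a reference point, such as one assigned by an expert. The sketch below assumes that approach and uses the haversine formula on a spherical Earth. The function names, the use of the median as a summary statistic and the coordinates in the usage line are illustrative assumptions, not the UNC team's code or data.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_threshold(predicted, reference, threshold_km=10.0):
    """Hypothetical check: is a predicted (lat, lon) within threshold_km of the reference?"""
    return haversine_km(*predicted, *reference) < threshold_km


def median_error_km(pairs):
    """Hypothetical summary: median error over (predicted, reference) coordinate pairs."""
    return statistics.median(haversine_km(*p, *r) for p, r in pairs)


# Illustrative coordinates only: a made-up prediction near Chapel Hill, NC.
print(within_threshold((35.95, -79.10), (35.9132, -79.0558)))  # → True
```

Reporting a median rather than a mean keeps a handful of badly misplaced specimens from dominating the summary, though the study may have used a different statistic.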

Natural history collections are vital for tracking biodiversity loss, understanding species movement under climate change and analysing ecosystem shifts. However, of the estimated two to three billion herbarium specimens worldwide, only a small fraction have been digitised.

Without digital records and precise spatial data, these physical archives remain essentially useless for modern large-scale ecological research. Traditional georeferencing relies on manual interpretation, specialised software or multiple rounds of expert review — a process that has proven too slow and expensive to handle the global backlog.

Biggest bottlenecks

“Our study explores how large language models can take on one of the biggest bottlenecks in digitising plant collections,” said Yuyang Xie, first author and postdoctoral researcher in the Department of Biology at UNC. “We are pioneering the use of these tools for georeferencing, a breakthrough that will accelerate the digitisation of plant specimens and unlock new possibilities for ecological research.”

The study set out to answer whether AI could automate one of the most time-consuming steps in digitisation. The results confirmed that LLMs could outperform existing methods in accuracy, efficiency and scalability.

By accurately interpreting location descriptions from specimen labels, the technology allows researchers to rapidly process millions of records that would otherwise take decades to digitise manually.

“Recent advances in LLMs can potentially transform the georeferencing process, making it faster and more accurate,” said Xiao Feng, corresponding author and assistant professor in the Department of Biology at UNC. “This gives researchers unprecedented opportunities to advance our understanding of global biodiversity distributions.”

“This technology allows us to unlock millions of records that are currently sitting in cabinets,” said Xie. “With the power of LLMs, we can rapidly digitise plant specimen data that will be critical for addressing global environmental challenges.”
