Because artificial intelligence chatbots learn by devouring massive amounts of unfiltered internet text, critics have long argued that they merely parrot words without truly understanding reality. However, a new study claims these models do, in fact, possess an internal understanding of how the real world operates.

According to research from Brown University — set to be presented at the International Conference on Learning Representations in Brazil — modern AI language models encode the causal constraints of reality in a way that closely mirrors human judgment.

The study looked “under the hood” of several major open-source language models to see if they could mathematically distinguish between scenarios that are commonplace, unlikely, impossible, or complete nonsense.

Testing the AI’s common sense

To test the models, the researchers fed them sentences describing events with varying levels of real-world plausibility:

  • Commonplace: “Someone cooled a drink with ice.”
  • Unlikely: “Someone cooled a drink with snow.”
  • Impossible: “Someone cooled a drink with fire.”
  • Nonsense: “Someone cooled a drink with yesterday.”

For each input, the researchers used an approach known as “mechanistic interpretability” to examine the resulting mathematical states generated inside the AI model.

“Mechanistic interpretability can be appropriately characterised as something like neuroscience for AI systems,” explained Michael Lepori, a Brown University PhD candidate who led the work. “It seeks to reverse-engineer what the model is doing when exposed to a particular input. You could kind of think about it as understanding what is encoded in the ‘brain state’ of the machine.”

Mapping the digital brain

The experiments were repeated across several different language models, including OpenAI’s GPT-2, Meta’s Llama 3.2, and Google’s Gemma 2.

The team found that models with more than two billion parameters develop distinct mathematical patterns, or “vectors,” that correlate strongly with each plausibility category. These internal vectors could separate even the most similar categories, such as improbable versus impossible events, with roughly 85 per cent accuracy.
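For readers curious what “separating two categories with a vector” can look like in practice, here is a minimal Python sketch. It is not the Brown team’s method or data: the “internal states” are synthetic random vectors, and the difference-of-means probe, the function names and the cluster settings are all assumptions chosen for illustration.

```python
import random

def mean_vector(vectors):
    """Average a list of equal-length vectors, component by component."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_difference_of_means(improbable, impossible):
    """Return a direction pointing from the 'improbable' mean to the
    'impossible' mean, plus a threshold halfway between the two means."""
    mu_a, mu_b = mean_vector(improbable), mean_vector(impossible)
    direction = [b - a for a, b in zip(mu_a, mu_b)]
    midpoint = [(a + b) / 2 for a, b in zip(mu_a, mu_b)]
    return direction, dot(direction, midpoint)

def predict(vector, direction, threshold):
    """Classify one vector by which side of the threshold it projects onto."""
    return "impossible" if dot(vector, direction) > threshold else "improbable"

def synthetic_states(center, n, rng, noise=1.0):
    """Stand-in for model activations: Gaussian noise around a center.
    Real experiments would read these vectors out of a language model."""
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(n)]

rng = random.Random(0)
dim = 16                               # hypothetical, far smaller than a real model
improbable_center = [0.0] * dim        # assumed cluster centers
impossible_center = [0.6] * dim

train_a = synthetic_states(improbable_center, 200, rng)
train_b = synthetic_states(impossible_center, 200, rng)
direction, threshold = fit_difference_of_means(train_a, train_b)

test_a = synthetic_states(improbable_center, 200, rng)
test_b = synthetic_states(impossible_center, 200, rng)
correct = sum(predict(v, direction, threshold) == "improbable" for v in test_a)
correct += sum(predict(v, direction, threshold) == "impossible" for v in test_b)
accuracy = correct / (len(test_a) + len(test_b))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The printed accuracy only reflects how far apart the two synthetic clusters were placed. It says nothing about any real model.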

The models also tracked human uncertainty closely.

When evaluating the ambiguous statement, “Someone cleaned the floor with a hat,” human survey respondents were split evenly on whether the act was “impossible” or just “unlikely”. When the researchers analysed the AI’s internal vectors, they found a similar split in the model’s own representation of the sentence.

“What we show is that the models actually capture that human uncertainty pretty well,” Lepori said. “In cases where, say, 50% of people said a statement was impossible and 50% said it was improbable, the models were assigning roughly 50% probability as well.”
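The comparison Lepori describes can be pictured with a few lines of Python. This is a hedged sketch with made-up numbers: the vote counts, the two scores and the use of a two-way softmax to turn scores into a probability are assumptions for illustration, not figures or procedures from the study.

```python
import math

def human_fraction(votes, label):
    """Share of survey respondents who chose the given label."""
    return votes.count(label) / len(votes)

def model_probability(score_impossible, score_improbable):
    """Two-way softmax: convert a pair of scores into the probability
    assigned to 'impossible'. The scores are hypothetical stand-ins for
    whatever a probe would read off the model's internal vectors."""
    top = max(score_impossible, score_improbable)
    e_imp = math.exp(score_impossible - top)
    e_improb = math.exp(score_improbable - top)
    return e_imp / (e_imp + e_improb)

# Invented survey: respondents split evenly on one sentence.
votes = ["impossible"] * 10 + ["improbable"] * 10
p_human = human_fraction(votes, "impossible")

# Invented scores that are nearly tied, so the probability lands near one half.
p_model = model_probability(score_impossible=0.1, score_improbable=0.0)

gap = abs(p_human - p_model)
print(f"human: {p_human:.2f}  model: {p_model:.2f}  gap: {gap:.2f}")
```

A small gap across many sentences is what it would mean for a model to capture human uncertainty “pretty well”.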

The researchers believe that continuing to map these digital “brain states” will be crucial for developing smarter, more trustworthy AI systems in the future.