
Because artificial intelligence chatbots learn by devouring massive amounts of unfiltered internet text, critics have long argued that they merely parrot words without truly understanding reality. However, a new study claims these models do, in fact, possess an internal understanding of how the real world operates.

According to research from Brown University — set to be presented at the International Conference on Learning Representations in Brazil — modern AI language models encode the causal constraints of reality in a way that closely mirrors human judgment.

The study looked “under the hood” of several major open-source language models to see whether their internal states mathematically distinguish scenarios that are commonplace, unlikely, impossible, or complete nonsense.

Testing the AI’s common sense

To test the models, the researchers fed them sentences describing events with varying levels of real-world plausibility:

  • Commonplace: “Someone cooled a drink with ice.”
  • Unlikely: “Someone cooled a drink with snow.”
  • Impossible: “Someone cooled a drink with fire.”
  • Nonsense: “Someone cooled a drink with yesterday.”

For each input, the researchers used an approach known as “mechanistic interpretability” to examine the mathematical states the model generates internally as it processes the sentence.

“Mechanistic interpretability can be appropriately characterised as something like neuroscience for AI systems,” explained Michael Lepori, a Brown University PhD candidate who led the work. “It seeks to reverse-engineer what the model is doing when exposed to a particular input. You could kind of think about it as understanding what is encoded in the ‘brain state’ of the machine.”
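To make the quote concrete, here is a minimal sketch of that kind of “brain state” readout. It is an illustration of the general technique, not the authors’ code: the choice of GPT-2, the Hugging Face transformers library, and reading the final layer’s last-token vector are all illustrative assumptions.

```python
# A minimal sketch of extracting a model's internal states, assuming the
# Hugging Face transformers library. The model (GPT-2) and the decision to
# read the final layer's last-token vector are illustrative assumptions,
# not the paper's exact setup.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

sentences = {
    "commonplace": "Someone cooled a drink with ice.",
    "unlikely": "Someone cooled a drink with snow.",
    "impossible": "Someone cooled a drink with fire.",
    "nonsense": "Someone cooled a drink with yesterday.",
}

states = {}
with torch.no_grad():
    for label, text in sentences.items():
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs, output_hidden_states=True)
        # Final layer, final token: a 768-dimensional vector for GPT-2 small.
        states[label] = outputs.hidden_states[-1][0, -1]

for label, vec in states.items():
    print(label, vec.shape)
```

Vectors like these are the raw material for the analyses described below.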

Mapping the digital brain

The experiments were repeated across several different language models, including OpenAI’s GPT-2, Meta’s Llama 3.2, and Google’s Gemma 2.

The team discovered that models containing more than two billion parameters develop highly distinct mathematical patterns, or “vectors,” that strongly correlate with each plausibility category. Astonishingly, these internal vectors could distinguish between even the most similar categories — such as improbable versus impossible events — with roughly 85 per cent accuracy.
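One standard way to test whether such vectors encode a category is to train a linear “probe” on them. The sketch below shows the general recipe with the scikit-learn library and random stand-in data in place of real hidden states; it illustrates the shapes and API, not necessarily the paper’s exact probing method.

```python
# A hedged sketch of a linear probe over hidden-state vectors. The data here
# is random stand-in data; real inputs would be vectors like those extracted
# above, each labelled with its sentence's plausibility category.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 768))    # 400 sentences, one 768-dim state each
y = rng.integers(0, 4, size=400)   # 0=commonplace, 1=unlikely, 2=impossible, 3=nonsense

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))
```

On real hidden states rather than random ones, held-out accuracy of roughly 85 per cent on the hardest category pairs is the kind of result the study reports.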

More striking still, the models’ internal states also tracked human uncertainty.

When evaluating the ambiguous statement “Someone cleaned the floor with a hat”, human survey respondents were split evenly on whether the act was “impossible” or merely “unlikely”. When the researchers analysed the models’ internal vectors, they found the same split encoded there.

“What we show is that the models actually capture that human uncertainty pretty well,” Lepori said. “In cases where, say, 50% of people said a statement was impossible and 50% said it was improbable, the models were assigning roughly 50% probability as well.”
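The probe view gives a natural way to express that graded judgment: the classifier’s predicted probabilities. A self-contained sketch, again with stand-in data rather than the study’s, for the two contested categories:

```python
# Sketch only: a two-class probe over random stand-in vectors, assuming the
# same 768-dimensional hidden states as before. Real vectors would come from
# the model, as in the earlier extraction example.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 768))    # stand-in hidden states
y = rng.integers(0, 2, size=200)   # 0 = "improbable", 1 = "impossible"
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Hypothetical hidden state for "Someone cleaned the floor with a hat".
hat_state = rng.normal(size=(1, 768))
p_improbable, p_impossible = probe.predict_proba(hat_state)[0]
print(f"improbable: {p_improbable:.2f}, impossible: {p_impossible:.2f}")
```

A roughly even probability split from the probe would mirror the roughly even split among the human respondents.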

The researchers believe that continuing to map these digital “brain states” will be crucial for developing smarter, more trustworthy AI systems in the future.
