AI chatbots may internally encode how the real world works, study suggests

Because artificial intelligence chatbots learn by devouring massive amounts of unfiltered internet text, critics have long argued that they merely parrot words without truly understanding reality. However, a new study claims these models do, in fact, possess an internal understanding of how the real world operates.

According to research from Brown University — set to be presented at the International Conference on Learning Representations in Brazil — modern AI language models encode the causal constraints of reality in a way that closely mirrors human judgment.

The study looked “under the hood” of several major open-source language models to see if they could mathematically distinguish between scenarios that are commonplace, unlikely, impossible, or complete nonsense.

Testing the AI’s common sense

To test the models, the researchers fed them sentences describing events with varying levels of real-world plausibility:

Commonplace: “Someone cooled a drink with ice.”
Unlikely: “Someone cooled a drink with snow.”
Impossible: “Someone cooled a drink with fire.”
Nonsense: “Someone cooled a drink with yesterday.”

For each input, the researchers used an approach known as “mechanistic interpretability” to examine the internal numerical states, or activations, that the sentence produced inside the AI model.

“Mechanistic interpretability can be appropriately characterised as something like neuroscience for AI systems,” explained Michael Lepori, a Brown University PhD candidate who led the work. “It seeks to reverse-engineer what the model is doing when exposed to a particular input. You could kind of think about it as understanding what is encoded in the ‘brain state’ of the machine.”

Mapping the digital brain

The experiments were repeated across several different language models, including OpenAI’s GPT-2, Meta’s Llama 3.2, and Google’s Gemma 2.

The team found that models with more than two billion parameters develop distinct mathematical patterns, or “vectors,” in their internal states that correlate strongly with each plausibility category. These vectors could distinguish between even the most similar categories, such as improbable versus impossible events, with roughly 85 per cent accuracy.
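To give a rough sense of how a single vector can separate two categories, here is a minimal Python sketch of a difference-of-means probe. It is an illustration only and not the study's method: the "internal states" are synthetic random numbers rather than activations from a real model, and every name and setting in it (`fit_direction`, `classify`, `synthetic_states`, the 16 dimensions, the cluster centres) is an assumption made up for this example.

```python
import random

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_direction(improbable, impossible):
    """Difference-of-means probe: a direction plus a midpoint threshold."""
    mu_improbable, mu_impossible = mean(improbable), mean(impossible)
    direction = [q - p for p, q in zip(mu_improbable, mu_impossible)]
    midpoint = [(p + q) / 2 for p, q in zip(mu_improbable, mu_impossible)]
    return direction, dot(direction, midpoint)

def classify(vector, direction, threshold):
    """Label a vector by which side of the threshold its projection falls on."""
    return "impossible" if dot(vector, direction) > threshold else "improbable"

def synthetic_states(center, count, rng):
    """Stand-in for real model activations: Gaussian noise around a centre."""
    return [[c + rng.gauss(0.0, 1.0) for c in center] for _ in range(count)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
DIM = 16                # hypothetical size; real models use far larger states
improbable_center = [0.0] * DIM
impossible_center = [0.5] * DIM

# Fit the probe on one batch of synthetic states.
train_improbable = synthetic_states(improbable_center, 200, rng)
train_impossible = synthetic_states(impossible_center, 200, rng)
direction, threshold = fit_direction(train_improbable, train_impossible)

# Score it on a fresh batch the probe has not seen.
test_set = (
    [(v, "improbable") for v in synthetic_states(improbable_center, 200, rng)]
    + [(v, "impossible") for v in synthetic_states(impossible_center, 200, rng)]
)
correct = sum(classify(v, direction, threshold) == label for v, label in test_set)
accuracy = correct / len(test_set)
print(f"held-out accuracy: {accuracy:.2f}")
```

The accuracy this prints reflects only how far apart the two synthetic clusters were placed. It says nothing about any real language model.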

The models’ internal states also tracked human uncertainty closely.

When evaluating the ambiguous statement, “Someone cleaned the floor with a hat,” human survey respondents were split evenly on whether the act was “impossible” or just “unlikely”. When researchers analysed the AI’s internal vectors, they found the models were similarly divided, assigning roughly equal probability to each category.

“What we show is that the models actually capture that human uncertainty pretty well,” Lepori said. “In cases where, say, 50% of people said a statement was impossible and 50% said it was improbable, the models were assigning roughly 50% probability as well.”
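The comparison Lepori describes can be sketched in a few lines of Python. This is a hedged illustration, not the researchers' code: the function names are hypothetical, the probe scores are invented, and so are the human fractions for the snow and fire sentences. Only the even split on the hat sentence comes from the study as reported here. The sketch assumes a probe that outputs a single score, which a logistic function turns into a probability of "impossible".

```python
import math

def probe_probability(score):
    """Logistic function: map a probe score to P("impossible")."""
    return 1.0 / (1.0 + math.exp(-score))

# (sentence, hypothetical probe score, fraction of people answering "impossible")
# All scores are invented; the 0.10 and 0.90 fractions are also invented.
examples = [
    ("Someone cooled a drink with snow.", -2.0, 0.10),
    ("Someone cleaned the floor with a hat.", 0.0, 0.50),
    ("Someone cooled a drink with fire.", 2.0, 0.90),
]

def mean_absolute_gap(rows):
    """Average distance between the model's probability and the human fraction."""
    gaps = [abs(probe_probability(score) - human) for _, score, human in rows]
    return sum(gaps) / len(gaps)

for sentence, score, human in examples:
    print(f"{sentence} model={probe_probability(score):.2f} human={human:.2f}")
print(f"mean absolute gap: {mean_absolute_gap(examples):.3f}")
```

A small gap would mean the model's probabilities sit close to the share of people who gave each answer, which is the sense of "capturing uncertainty" in the quote above.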

The researchers believe that continuing to map these digital “brain states” will be crucial for developing smarter, more trustworthy AI systems in the future.
