Artificial intelligence systems can significantly improve their ability to tackle unfamiliar problems by mimicking the human trait of “inner speech”, according to research published in Neural Computation.
Scientists from the Okinawa Institute of Science and Technology (OIST) have demonstrated that AI models perform better when they combine self-directed “mumbling” with short-term memory structures. This architecture allows the system to organise information and “generalise” across different tasks, rather than relying solely on specific training examples.
The study addresses a persistent challenge in machine learning: “content-agnostic information processing”, or the ability to solve problems without having encountered the exact scenario previously.
“This study highlights the importance of self-interactions in how we learn,” says Dr. Jeffrey Queißer, staff scientist within OIST’s Cognitive Neurorobotics Research Unit. “By structuring training data in a way that teaches our system to talk to itself, we show that learning is shaped not only by the architecture of our AI systems, but by the interaction dynamics embedded within our training procedures.”
Memory slots and mumbling targets
The researchers focused on the AI models’ memory architecture, specifically examining how “working memory” aids task generalisation. To test this, they simulated tasks of varying difficulty, such as reversing the order of patterns or regenerating them entirely.
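The paper’s exact task battery isn’t reproduced here, but tasks of this kind are straightforward to picture as input/target sequence pairs. The Python sketch below is purely illustrative; the function and task names are invented:

```python
import numpy as np

def make_task(task, length=5, dim=8, rng=None):
    """Build one input/target pair for a toy pattern task.
    'reverse'    -> emit the input patterns in reverse order
    'regenerate' -> reproduce the input patterns verbatim
    (Function and task names are invented for illustration.)"""
    rng = rng or np.random.default_rng()
    x = rng.integers(0, 2, size=(length, dim)).astype(np.float32)
    if task == "reverse":
        y = x[::-1].copy()
    elif task == "regenerate":
        y = x.copy()
    else:
        raise ValueError(f"unknown task: {task}")
    return x, y

x, y = make_task("reverse")   # x[0] should equal y[-1]
```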
The study found that systems equipped with multiple “working memory slots” — temporary containers for holding pieces of information — outperformed standard models on these tricky cognitive challenges.
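The article doesn’t spell out how these slots are implemented. One common reading of multiple working-memory slots is a bank of separately addressable state vectors that a controller writes to and reads from via soft attention, in the spirit of slot- and key-value-memory models. The PyTorch sketch below follows that assumption; every class name and dimension is hypothetical:

```python
import torch
import torch.nn as nn

class SlotMemory(nn.Module):
    """A bank of n temporary 'slots' that a controller reads and writes
    through soft attention. All names and sizes are illustrative."""

    def __init__(self, n_slots=4, slot_dim=32, ctrl_dim=64):
        super().__init__()
        self.init_slots = nn.Parameter(torch.zeros(n_slots, slot_dim))
        self.write_key = nn.Linear(ctrl_dim, slot_dim)  # addresses the slot to write
        self.write_val = nn.Linear(ctrl_dim, slot_dim)  # content to store
        self.read_key = nn.Linear(ctrl_dim, slot_dim)   # addresses the slot to read

    def forward(self, h, slots):
        # h: (batch, ctrl_dim) controller state; slots: (batch, n_slots, slot_dim)
        w = torch.softmax(slots @ self.write_key(h).unsqueeze(-1), dim=1)
        slots = slots + w * self.write_val(h).unsqueeze(1)   # soft write
        r = torch.softmax(slots @ self.read_key(h).unsqueeze(-1), dim=1)
        read = (r * slots).sum(dim=1)                        # soft read
        return read, slots

mem = SlotMemory()
h = torch.randn(2, 64)                                   # dummy controller state
slots = mem.init_slots.expand(2, -1, -1).contiguous()    # per-example slot bank
read_vec, slots = mem(h, slots)                          # read_vec: (2, 32)
```

Soft attention keeps the read and write operations differentiable, so the slot contents can be trained end to end with the rest of the network.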
However, the real breakthrough came when the team added “self-mumbling targets”. By instructing the system to effectively talk to itself a set number of times before acting, the researchers observed significantly better performance, particularly during multitasking scenarios or tasks involving many steps.
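How the self-mumbling targets are formalised isn’t detailed here; one plausible reading is that the training targets are augmented so the model must first rehearse the input a fixed number of times before its answer is scored. A minimal numpy sketch under that assumption, with all names hypothetical:

```python
import numpy as np

def add_mumble_targets(x, y, n_mumbles=2):
    """Prefix the targets with n_mumbles rehearsals of the input, so the
    model must 'talk to itself' before its answer is scored. This is an
    illustrative reading of self-mumbling targets, not the paper's code."""
    mumble_phase = np.tile(x, (n_mumbles, 1))      # rehearse the input n times
    targets = np.concatenate([mumble_phase, y], axis=0)
    pad = np.zeros((len(targets) - len(x), x.shape[1]), dtype=x.dtype)
    inputs = np.concatenate([x, pad], axis=0)      # silence after the prompt
    return inputs, targets

x = np.random.default_rng(0).integers(0, 2, (5, 8)).astype(np.float32)
inputs, targets = add_mumble_targets(x, x[::-1].copy())   # 'reverse' task pair
```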
Efficiency with sparse data
Crucially for enterprise applications, the method reduces reliance on massive training datasets. The “inner speech” architecture lets models learn effectively even from scarce data, offering a potential answer to the high compute and data costs of training large models.
“Our combined system is particularly exciting because it can work with sparse data instead of the extensive data sets usually required to train such models for generalisation,” says Dr. Queißer. “It provides a complementary, lightweight alternative.”
The research adopts an interdisciplinary approach, blending developmental neuroscience and psychology with machine learning. The team argues that to mirror human developmental learning, AI systems need to account for external factors found in “messier” real-world environments.
The researchers plan to test the approach in more dynamic settings. Dr. Queißer notes that this knowledge could be applied to develop household or agricultural robots capable of functioning in complex, noisy worlds where rapid task switching is essential.