AIs working on chunks of information
Photo credit: theFreesheet/Google ImageFX

Researchers from Mila, McGill University and Microsoft Research have solved a significant bottleneck in training AI reasoning models, cutting computational costs by three-quarters through a technique called Markovian Thinking that breaks reasoning into manageable chunks rather than processing everything at once.

The breakthrough addresses an expensive problem. When AI models reason through complex issues, they typically build up enormous chains of thought that can stretch to tens of thousands of tokens. Every new token forces the model to attend to everything that came before, so the computational cost grows quadratically with the length of the reasoning chain.
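A back-of-envelope sketch makes the difference concrete. The function names and the simple token-attention cost model below are illustrative assumptions, not the paper's actual accounting, but they show why full-context reasoning scales quadratically while a fixed-size window scales linearly:

```python
def full_context_cost(n_tokens):
    # Each new token attends to all previous tokens: 1 + 2 + ... + n,
    # which is n(n+1)/2 -- quadratic in the reasoning length.
    return n_tokens * (n_tokens + 1) // 2

def chunked_cost(n_tokens, chunk=8000):
    # If the context resets at every chunk boundary, attention length
    # never exceeds the chunk size, so total cost grows linearly.
    full_chunks, remainder = divmod(n_tokens, chunk)
    return full_chunks * full_context_cost(chunk) + full_context_cost(remainder)

# Illustrative comparison at the 96,000-token scale mentioned in the article:
# chunked processing does roughly one-twelfth of the attention work.
print(full_context_cost(96_000) / chunked_cost(96_000))
```

Doubling the reasoning length doubles the chunked cost but quadruples the full-context cost, which is why the gap widens as models think longer.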

Markovian Thinking takes a different approach by restructuring the reinforcement learning environment so the AI maintains a constant-size state regardless of how long it thinks. The researchers instantiated this paradigm with Delethink, which organises reasoning into fixed chunks of 8,000 tokens. At each chunk boundary the context resets, and the model must write a summary of its progress to carry forward. Think of it as a student solving an extended maths problem page by page, jotting down key findings at the end of each page rather than constantly rereading from the beginning.
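The loop described above can be sketched in a few lines. Everything here is a hypothetical stand-in: `generate` is a placeholder for any language-model call, and the carryover format (truncating the last part of each chunk) is an assumption in place of the model-written summary Delethink actually uses:

```python
def generate(prompt, max_tokens):
    # Stub: a real implementation would call a language model here.
    return f"[model continues: {prompt[-40:]}...]"[:max_tokens]

def markovian_reason(question, chunk_size=8000, max_chunks=3):
    carryover = ""   # constant-size state passed between chunks
    trace = []
    for _ in range(max_chunks):
        # The context is only the question plus the latest carryover,
        # never the full reasoning history -- this is the Markovian step.
        context = question if not carryover else f"{question}\nProgress so far: {carryover}"
        chunk = generate(context, max_tokens=chunk_size)
        trace.append(chunk)
        # At the chunk boundary the model would summarise its progress;
        # here we simply truncate as a placeholder for that summary.
        carryover = chunk[-200:]
    return trace
```

Because the context never grows beyond one chunk plus a bounded carryover, per-step compute stays flat no matter how many chunks the model thinks through.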

Striking results

A 1.5 billion parameter model trained with Delethink can reason through 24,000 tokens whilst processing only 8,000 at a time, matching or beating traditional methods that process all 24,000 tokens continuously. The computational savings prove substantial: training a model to handle 94,000-token reasoning chains requires 27 months of H100 GPU time using standard approaches, compared to just seven months with Delethink.

The system keeps improving where traditional methods plateau. Researchers pushed one model to reason through 96,000 tokens, reaching 49 per cent accuracy on 2024’s notoriously difficult AIME mathematics competition with solutions averaging 36,000 tokens long.

Perhaps most surprisingly, existing AI models already know how to think this way. Analysis shows reasoning models from 1.5 billion to 120 billion parameters naturally produce these Markovian traces without special training. A 120 billion parameter model demonstrated robust Markovian thinking across PhD-level questions, coding challenges, mathematics competitions and crossword puzzles.

The research, led by Milad Aghajohari, Kamran Chitsaz and Amirhossein Kazemnejad, demonstrates “that decoupling thinking length from context size can, in principle, let next-generation reasoning models think for millions of tokens”. The researchers describe the reinforcement learning environment, often treated as fixed, as “a powerful lever for progress”.

The technique works alongside other efficiency methods, making it immediately practical for existing AI infrastructure without requiring architectural changes.
