AIs working on chunks of information
Photo credit: theFreesheet/Google ImageFX

Researchers from Mila, McGill University and Microsoft Research have solved a significant bottleneck in training AI reasoning models, cutting computational costs by three-quarters through a technique called Markovian Thinking that breaks reasoning into manageable chunks rather than processing everything at once.

The breakthrough addresses an expensive problem. When AI models reason through complex tasks, they typically build up enormous chains of thought that can stretch to tens of thousands of tokens. Because attention spans the entire chain, every new token forces the model to reprocess everything that came before, so computational cost grows quadratically with thinking length.
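A rough back-of-envelope model makes the difference concrete. Assuming per-token cost proportional to current context length (a standard simplification for attention; the function names and numbers here are ours, not the paper's), capping the context at a fixed chunk size turns quadratic growth into linear growth:

```python
# Illustrative cost model: per-token work proportional to context length.
# This is a simplification for intuition, not the paper's accounting.

def full_context_cost(n_tokens: int) -> int:
    """Total work when every new token attends to all prior tokens (~n^2/2)."""
    return sum(t for t in range(1, n_tokens + 1))

def chunked_cost(n_tokens: int, chunk: int = 8_000) -> int:
    """Total work when the context is capped at a fixed chunk size (~n*chunk)."""
    return sum(min(t, chunk) for t in range(1, n_tokens + 1))

n = 96_000
ratio = full_context_cost(n) / chunked_cost(n)  # roughly 6x less work at 96k tokens
```

Under this toy model, the longer the reasoning chain, the larger the saving, which is consistent with the gap widening at the 94,000-token scale the researchers report.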

Markovian Thinking takes a different approach by restructuring the reinforcement learning environment so the AI maintains a constant-size state regardless of how long it thinks. The researchers instantiated this paradigm with Delethink, which organises reasoning into fixed chunks of 8,000 tokens. At each boundary, the system resets, and the model must write a summary of its progress to carry forward. Think of it as a student solving an extended maths problem by working through it page by page, jotting down key findings at the end of each page rather than constantly rereading from the beginning.
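The chunk-and-carry-forward loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the prompt format, the `FINAL:`/`SUMMARY:` markers and the function names are our assumptions, and `generate` stands in for any language-model call.

```python
# Minimal sketch of a chunked, Markovian reasoning loop (Delethink-style).
# Prompt format and FINAL:/SUMMARY: markers are illustrative assumptions.

CHUNK_TOKENS = 8_000  # fixed per-chunk thinking budget

def markovian_think(question: str, generate, max_chunks: int = 12) -> str:
    """Reason in fixed-size chunks, carrying only a bounded summary forward.

    `generate` is any callable (prompt, max_tokens) -> str; a real system
    would call a language model here.
    """
    carryover = ""  # constant-size state: the only memory across chunks
    for _ in range(max_chunks):
        prompt = (
            f"Problem: {question}\n"
            f"Progress carried over: {carryover}\n"
            "Continue reasoning. End with either 'FINAL: <answer>' "
            "or 'SUMMARY: <state to carry forward>'."
        )
        chunk = generate(prompt, CHUNK_TOKENS)
        if "FINAL:" in chunk:
            return chunk.split("FINAL:", 1)[1].strip()
        # Chunk boundary: the context resets and only the summary survives,
        # so each step depends only on the previous state -- the Markov property.
        carryover = chunk.split("SUMMARY:", 1)[-1].strip()
    return carryover  # budget exhausted; return the last carried state
```

The key point is that `carryover` is bounded: however long the model thinks overall, no single generation step ever sees more than one chunk's worth of context.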

Striking results

A 1.5 billion parameter model trained with Delethink can reason through 24,000 tokens whilst only processing 8,000 at a time, matching or beating traditional methods that process all 24,000 tokens continuously. The computational savings are substantial: training a model to handle 94,000-token reasoning chains requires 27 months of H100 GPU time using standard approaches, compared with just seven months using Delethink.

The system keeps improving where traditional methods plateau. Researchers pushed one model to reason through 96,000 tokens, reaching 49 per cent accuracy on the notoriously difficult 2024 AIME mathematics competition with solutions averaging 36,000 tokens.

Perhaps most surprisingly, existing AI models already know how to think this way. Analysis shows reasoning models from 1.5 billion to 120 billion parameters naturally produce these Markovian traces without special training. A 120 billion parameter model demonstrated robust Markovian thinking across PhD-level questions, coding challenges, mathematics competitions and crossword puzzles.

The research, led by Milad Aghajohari, Kamran Chitsaz and Amirhossein Kazemnejad, demonstrates “that decoupling thinking length from context size can, in principle, let next-generation reasoning models think for millions of tokens”. The researchers describe the reinforcement learning environment, often treated as fixed, as “a powerful lever for progress”.

The technique works alongside other efficiency methods, making it immediately practical for existing AI infrastructure without requiring architectural changes.
