Adding a single sentence to prompts makes AI models generate responses up to 2.1 times more diverse without sacrificing quality, solving a problem known as mode collapse that limits creative output.

Research published on arXiv introduced Verbalized Sampling, a training-free prompting method that instructs models to generate probability distributions over candidate responses. The method works by adding phrases such as “Generate five responses with their corresponding probabilities, sampled from the full distribution” to standard prompts.
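As a rough illustration of how little the method requires, the sketch below appends the quoted instruction to an ordinary prompt and reads the candidates back. It is not the researchers' package: the function names, the use of a digit in place of the word "five", and the `<probability> | <response>` reply format are all assumptions made for this example, and no real model is called.

```python
import random

# Instruction wording taken from the article; "{k}" replaces the word "five".
VS_INSTRUCTION = (
    "Generate {k} responses with their corresponding probabilities, "
    "sampled from the full distribution."
)


def build_vs_prompt(task: str, k: int = 5) -> str:
    """Append the verbalized-sampling instruction to a standard prompt."""
    return f"{task.strip()}\n\n{VS_INSTRUCTION.format(k=k)}"


def parse_candidates(reply: str) -> list[tuple[str, float]]:
    """Parse lines of the ASSUMED form '<probability> | <response text>'.

    Lines that do not match this hypothetical format are skipped.
    """
    candidates = []
    for line in reply.splitlines():
        prob, sep, text = line.partition("|")
        if not sep:
            continue
        try:
            candidates.append((text.strip(), float(prob)))
        except ValueError:
            continue
    return candidates


def pick_one(candidates: list[tuple[str, float]], rng: random.Random) -> str:
    """Draw one response, weighted by the probabilities the model stated."""
    texts, weights = zip(*candidates)
    return rng.choices(texts, weights=weights, k=1)[0]
```

In use, the string returned by `build_vs_prompt` would be sent to whichever chat model is to hand, and the reply passed to `parse_candidates`. Real replies will not necessarily follow the pipe-separated layout assumed here, so the parser would need adapting to the format actually requested.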

Researchers from Northeastern University, Stanford University and West Virginia University identified typicality bias in preference data as the fundamental cause of mode collapse. Human annotators systematically favour conventional text due to cognitive tendencies, leading aligned models to prioritise typical responses even when many high-quality options exist.

The team analysed more than 70,000 social media posts from US senators during 2018, along with multiple preference datasets, to verify the bias. They found that human raters favoured responses that were more typical for base models, regardless of correctness.

“LLMs’ potentials are not fully unlocked yet,” said Weiyan Shi, an assistant professor at Northeastern University. “As shown in our paper, prompt optimisation can be guided by thinking about how LLMs are trained and aligned, and can be proved theoretically.”

Comprehensive experiments showed that Verbalized Sampling significantly improved performance across creative writing, dialogue simulation, open-ended question answering and synthetic data generation. In creative writing, the method increased diversity by 1.6-2.1 times over direct prompting and improved human evaluation scores by 25.7 per cent.

For story generation using the prompt “Without a goodbye”, direct prompting produced formulaic breakup scenes while Verbalized Sampling yielded narratives involving cosmic events, silent emails and music stopping mid-dance. The method recovered 66.8 per cent of the base model’s original diversity after alignment training, compared to just 23.8 per cent retention with direct prompting.

The researchers tested the method on models including GPT-4.1, Gemini-2.5-Pro, Claude-4-Sonnet and Llama-3.1-70B-Instruct. The method proved model-agnostic and required no access to model internals or additional training.

Larger models showed greater gains from Verbalized Sampling, with diversity improvements 1.5 to 2 times stronger than those seen in smaller models. For synthetic data generation, the method improved downstream performance on mathematics tasks, with fine-tuned models achieving 37.5 per cent average accuracy compared to 30.6 per cent using direct prompting.

The method allows users to tune diversity by adjusting probability thresholds in the prompt without changing decoding parameters. The researchers released the method as a Python package with integration for LangChain under an Apache 2.0 licence.
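The threshold idea can be sketched in the same spirit. The article does not give the exact wording the researchers use, so the extra sentence below, the function name and the `max_probability` parameter are hypothetical; the point is only that the control sits in the prompt text rather than in decoding settings such as temperature.

```python
from typing import Optional


def build_thresholded_prompt(
    task: str, k: int = 5, max_probability: Optional[float] = None
) -> str:
    """Build a verbalized-sampling prompt with an optional probability cap.

    The threshold sentence is an illustrative assumption, not the wording
    from the paper. A lower cap asks the model for less typical responses.
    """
    if max_probability is not None and not 0.0 < max_probability <= 1.0:
        raise ValueError("max_probability must be in the interval (0, 1]")

    instruction = (
        f"Generate {k} responses with their corresponding probabilities, "
        "sampled from the full distribution."
    )
    if max_probability is not None:
        # Hypothetical phrasing for the diversity control.
        instruction += (
            f" Each response should have a probability below {max_probability}."
        )
    return f"{task.strip()}\n\n{instruction}"
```

Calling `build_thresholded_prompt("Tell me a joke about coffee.", max_probability=0.1)` would request five candidates that the model itself rates as unlikely, while omitting the argument falls back to the plain instruction.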
