
Adding a single sentence to prompts makes AI models generate responses up to 2.1 times more diverse without sacrificing quality, solving a problem known as mode collapse that limits creative output.

Research published on arXiv introduced Verbalized Sampling, a training-free prompting method that instructs models to generate probability distributions over candidate responses. The method works by adding phrases such as “Generate five responses with their corresponding probabilities, sampled from the full distribution” to standard prompts.
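In practice, applying the technique is just string manipulation around an existing prompt. The sketch below shows one way to wrap a task with the instruction sentence quoted above; the helper name and the exact prompt layout are our own illustration, not the released package's API.

```python
def verbalized_sampling_prompt(task: str, k: int = 5) -> str:
    """Append the Verbalized Sampling instruction (as quoted in the paper)
    to an ordinary task prompt. No model access or training is required."""
    instruction = (
        f"Generate {k} responses with their corresponding probabilities, "
        "sampled from the full distribution."
    )
    return f"{task}\n\n{instruction}"


# Example: the story prompt discussed later in the article.
prompt = verbalized_sampling_prompt(
    "Write an opening line for a short story titled 'Without a goodbye'."
)
print(prompt)
```

The wrapped prompt is then sent to any chat model as-is, which is why the method is training-free and model-agnostic.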

Researchers from Northeastern University, Stanford University and West Virginia University identified typicality bias in preference data as the fundamental cause of mode collapse. Human annotators systematically favour conventional text due to cognitive tendencies, leading aligned models to prioritise typical responses even when many high-quality options exist.

The team analysed more than 70,000 social media posts from US senators during 2018 and multiple preference datasets to verify the bias. They found that human raters favoured responses more typical for base models independent of correctness.

“LLMs’ potentials are not fully unlocked yet,” said Weiyan Shi, an assistant professor at Northeastern University. “As shown in our paper, prompt optimisation can be guided by thinking about how LLMs are trained and aligned, and can be proved theoretically.”

Comprehensive experiments showed that Verbalized Sampling significantly improved performance across creative writing, dialogue simulation, open-ended question answering and synthetic data generation. In creative writing, the method increased diversity by 1.6 to 2.1 times over direct prompting and improved human evaluation scores by 25.7 per cent.

For story generation using the prompt “Without a goodbye”, direct prompting produced formulaic breakup scenes while Verbalized Sampling yielded narratives involving cosmic events, silent emails and music stopping mid-dance. The method recovered 66.8 per cent of the base model’s original diversity after alignment training, compared to just 23.8 per cent retention with direct prompting.

The researchers tested the method on models including GPT-4.1, Gemini-2.5-Pro, Claude-4-Sonnet and Llama-3.1-70B-Instruct. The method proved model-agnostic and required no access to model internals or additional training.

Larger models showed greater gains from Verbalized Sampling, with diversity improvements 1.5 to 2 times stronger than those of smaller models. For synthetic data generation, the method improved downstream performance on mathematics tasks, with fine-tuned models achieving 37.5 per cent average accuracy compared to 30.6 per cent using direct prompting.

The method allows users to tune diversity by adjusting probability thresholds in the prompt without changing decoding parameters. The researchers released the method as a Python package with integration for LangChain under an Apache 2.0 licence.
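Because the threshold lives in the prompt text rather than in decoding parameters such as temperature, tuning it is again a matter of editing the instruction. The sketch below illustrates the idea with a hypothetical helper; the exact threshold wording the paper and package use may differ.

```python
def tuned_vs_prompt(task: str, k: int = 5, max_prob: float = 0.10) -> str:
    """Verbalized Sampling prompt with a probability cap.

    Lowering max_prob asks the model for lower-probability (tail)
    candidates, increasing diversity without touching temperature
    or other decoding settings.
    """
    instruction = (
        f"Generate {k} responses with their corresponding probabilities, "
        "sampled from the full distribution. "
        f"Only include responses whose probability is below {max_prob}."
    )
    return f"{task}\n\n{instruction}"


# A stricter cap nudges the model further into the tail of its distribution.
print(tuned_vs_prompt("Tell me a joke about coffee.", max_prob=0.05))
```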
