
OpenAI has introduced GDPval, a new evaluation system measuring artificial intelligence model performance against human professionals across 44 occupations, with results suggesting frontier AI models are approaching expert-level capabilities in economically valuable work.

GPT-5 was rated as better than or as good as industry experts in 40.6 per cent of tasks, whilst Anthropic's Claude Opus 4.1 scored highest at 49 per cent, across tasks spanning the healthcare, finance, manufacturing, and government sectors, reports TechCrunch.

The benchmark marks a shift from academic tests toward evaluating real-world professional work. OpenAI's GDPval covers the nine industries that contribute most to US gross domestic product, testing models on 1,320 specialised tasks crafted by professionals with an average of 14 years of experience in their respective fields.

Unlike traditional AI benchmarks, GDPval tasks include reference files and context, with expected deliverables spanning documents, slides, diagrams, spreadsheets, and multimedia content. Tasks range from legal briefs and engineering blueprints to customer support conversations and nursing care plans, reflecting actual workplace responsibilities.

OpenAI's evaluation process uses expert graders who blindly compare AI-generated outputs with human-produced work across all 44 occupations, from software developers and lawyers to registered nurses and mechanical engineers. The company then averages each model's win rate against the human-produced deliverables to establish its performance metrics.
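The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's grading code: it assumes each blind comparison yields a single verdict ("win", "tie" or "loss") from the model's point of view, counts wins and ties together, and weights every occupation equally when averaging. The function names and sample verdicts are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, Literal, Mapping, Sequence

# Hypothetical verdict labels: one per blind comparison, from the model's side.
Verdict = Literal["win", "tie", "loss"]


def wins_and_ties_rate(verdicts: Iterable[Verdict]) -> float:
    """Percentage of comparisons where the model's deliverable was judged
    better than ('win') or as good as ('tie') the human expert's."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded comparisons")
    return 100.0 * (counts["win"] + counts["tie"]) / total


def mean_across_occupations(by_occupation: Mapping[str, Sequence[Verdict]]) -> float:
    """Assumption: compute a rate per occupation, then take the unweighted mean."""
    return mean(wins_and_ties_rate(v) for v in by_occupation.values())


if __name__ == "__main__":
    # Made-up verdicts for illustration only; these are not GDPval results.
    sample = {
        "lawyers": ["win", "loss"],                          # 50.0
        "registered nurses": ["tie", "tie", "loss", "win"],  # 75.0
    }
    print(mean_across_occupations(sample))  # prints 62.5
```

Whether GDPval weights occupations equally or pools every task is an assumption here; pooling would simply pass all verdicts to `wins_and_ties_rate` in one call.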

Progress appears substantial across OpenAI's model iterations. GPT-4o, released roughly 15 months earlier, achieved only 13.7 per cent wins and ties against human experts, meaning GPT-5's score is nearly three times as high after about 15 months of development.
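As a quick check of the "nearly triple" comparison, dividing the two reported wins-and-ties rates gives a ratio just under three. The variable names are illustrative; the figures are the ones quoted in this article.

```python
# Reported wins-and-ties rates, in per cent.
gpt5_rate = 40.6
gpt4o_rate = 13.7

ratio = gpt5_rate / gpt4o_rate
print(f"{ratio:.2f}")  # prints 2.96
```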

“[Because] the model is getting good at some of these things,” OpenAI chief economist Dr Aaron Chatterji told TechCrunch, “people in those jobs can now use the model, increasingly as capabilities get better, to offload some of their work and do potentially higher value things.”

OpenAI acknowledges current limitations, noting that GDPval represents “an early step that doesn’t reflect the full nuance of many economic tasks.” The evaluation focuses on one-shot assessments rather than interactive workflows requiring context building or multiple drafts, which characterise much real-world professional work.

The company tested multiple frontier models, including GPT-4o, o4-mini, OpenAI o3, GPT-5, Claude Opus 4.1, Gemini 2.5 Pro, and Grok 4. OpenAI attributes Claude’s strong performance partly to superior aesthetics in document formatting and slide layout, whilst crediting GPT-5’s strength in domain-specific knowledge accuracy.

Future versions of GDPval will cover more occupations, industries, and interactive task types, as OpenAI aims to better measure progress in knowledge work that involves navigating ambiguity and refining work over multiple iterations.
