Poetry cracking AI.
Photo credit: theFreesheet/Nano Banana Pro

The world’s most advanced artificial intelligence systems are being easily manipulated into generating malware and bomb-making instructions simply by asking them to rhyme.

A new study by DEXAI – Icaro Lab and Sapienza University of Rome reveals that “adversarial poetry” functions as a universal master key against AI safety filters, successfully bypassing guardrails in 62 per cent of cases across 25 frontier models.

The researchers found that billion-dollar safety alignment strategies — which typically train models to refuse harmful prose requests — crumble when the same request is structured as a poem.

“We observe a structurally similar failure mode: poetic formatting can reliably bypass alignment constraints,” the authors wrote. “These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms.”

The ‘Scale Paradox’

In a finding the researchers call the “scale paradox”, the study revealed that smarter, larger models were often more easily tricked than their smaller, less capable counterparts.

When tested with 20 manually curated poetic prompts, Google’s top-tier gemini-2.5-pro did not refuse a single request, yielding a 100 per cent attack success rate. DeepSeek’s chat-v3.1 and v3.2-exp followed, each with a 95 per cent attack success rate.

Conversely, smaller models proved surprisingly resilient. OpenAI’s gpt-5-nano maintained a 0 per cent failure rate, whilst Anthropic’s claude-haiku-4.5 yielded only 10 per cent unsafe outputs.

The researchers suggest this inverse relationship exists because larger models possess greater “interpretive sophistication”, allowing them to decode the complex metaphors of the attack and prioritise the creative instruction over their safety training. Smaller models, by contrast, may simply fail to understand the poem, resulting in a default refusal.

Universal vulnerability

To verify that the vulnerability wasn’t limited to a few hand-crafted rhymes, the team used an automated meta-prompt to convert 1,200 harmful queries from the MLCommons safety benchmark into verse.

This poetic transformation triggered a massive spike in successful jailbreaks. The average attack success rate across all providers jumped from a prose baseline of 8.08 per cent to 43.07 per cent when the same requests were formatted as poetry.

The vulnerability effectively unlocked every restricted domain tested. Privacy-related prompts saw the most dramatic collapse in safety, with successful attacks increasing by 44.71 percentage points.

Requests related to CBRN (Chemical, Biological, Radiological, Nuclear) threats saw a 38.32 percentage point increase in successful generation, whilst non-violent crime prompts rose by 39.35 percentage points.
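The figures above are percentage-point differences, not multiples: each domain’s poetic attack success rate is compared against its prose baseline by simple subtraction. A minimal sketch of that arithmetic, using the article’s provider-wide average as the worked example (the per-domain baselines are not given in the text, so only the average pair is filled in):

```python
# Attack success rates reported in the study, in per cent:
# (prose baseline, poetic reformulation). Only the provider-wide
# average pair appears in the article; per-domain baselines would
# slot in here if known.
results = {
    "all providers (average)": (8.08, 43.07),
}

def pp_increase(baseline: float, poetic: float) -> float:
    """Percentage-point difference between the two attack success rates."""
    return round(poetic - baseline, 2)

for domain, (prose, poem) in results.items():
    print(f"{domain}: +{pp_increase(prose, poem)} pp")
# The average jump works out to +34.99 percentage points.
```

Read this way, the privacy domain’s +44.71 points and CBRN’s +38.32 points are absolute shifts in how often attacks succeed, which is why even a single-digit prose baseline can end up as a roughly fivefold increase once the request is set in verse.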

Systemic failure

The study indicates that the vulnerability is systemic rather than provider-specific. Models trained via Reinforcement Learning from Human Feedback (RLHF), Constitutional AI, and other leading alignment methods all exhibited the weakness.

“The surface form alone is sufficient to move inputs outside the operational distribution on which refusal mechanisms have been optimised,” the researchers concluded.

This suggests that while AI companies have spent years teaching models to recognise harmful “requests”, they have failed to teach them to recognise harmful “concepts” when disguised by style or meter.
