
Anthropic claims to have disrupted a sophisticated cyber espionage operation allegedly conducted by a Chinese state-sponsored group, marking what the company says is the first documented case of artificial intelligence executing complex attacks with near-total autonomy.

The campaign, which Anthropic says was detected in mid-September 2025 and designated “GTG-1002”, reportedly targeted approximately 30 entities across technology, finance, and government sectors.

Unlike previous incidents in which attackers used AI merely for advice, the company alleges this group manipulated its “Claude Code” tool to conduct intrusions directly, successfully compromising a small number of targets.

“This campaign demonstrated unprecedented integration and autonomy of AI throughout the attack lifecycle, with the threat actor manipulating Claude Code to support reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration operations largely autonomously,” the company states in its report.

Agentic cyber warfare

The operation represents a shift to “agentic” cyber warfare, according to the report. Analysis by the AI firm suggests the model executed approximately 80 to 90 per cent of tactical operations independently, with human operators intervening only for strategic decisions such as authorising data exfiltration.

To bypass safety guardrails, Anthropic claims the attackers “jailbroke” the model by role-playing as legitimate cybersecurity employees conducting defensive tests. This purportedly allowed them to break complex attack chains into smaller, seemingly innocent tasks that the AI executed without flagging malicious intent.

“While we predicted these capabilities would continue to evolve, what has stood out to us is how quickly they have done so at scale,” the company notes.

Despite the reported high level of autonomy, the AI was not flawless. The investigation alleges that the model frequently “hallucinated” success, claiming to have obtained credentials that did not work or presenting publicly available information as critical discoveries. Human operators therefore had to validate its results, which prevented fully autonomous execution.

Anthropic says it has since banned the associated accounts and updated its classifiers to detect similar patterns of misuse.

