Security researchers think they just watched the first real cyberattack run by an AI.

The firm Sysdig published a forensic report on an intrusion that broke into a vulnerable cloud server, hopped to an internal database, and dumped its entire contents in 113 seconds. The full attack chain, from initial access to cleanup, took under an hour.

What made researchers point at AI wasn't the speed alone; it was the pattern. The attacker fired off 12 cloud API calls across 11 different IP addresses in 22 seconds, fanning out through Cloudflare Workers to dodge rate limits. SSH sessions came in short bursts, with a fresh command block roughly every ten seconds. That is too fast for a human at the keyboard and too adaptive for a static script.
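
For defenders wondering what that fan-out looks like in log data, here is a minimal sketch of a sliding-window check. It is not Sysdig's detection logic. The `ApiCall` record and the `find_fanout_bursts` function are hypothetical names, and the default thresholds are simply the figures from the report (12 calls, 11 IPs, 22 seconds) used as an assumption, not tuned values.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiCall:
    """Hypothetical log record: one cloud API call."""
    timestamp: float  # seconds, any consistent clock
    source_ip: str


def find_fanout_bursts(calls, window_s=22.0, min_calls=12, min_ips=11):
    """Flag windows where many calls arrive from many distinct IPs.

    Returns a list of (window_start, window_end, n_calls, n_ips) tuples,
    one for each call that completes a window meeting both thresholds.
    """
    window = deque()
    hits = []
    for call in sorted(calls, key=lambda c: c.timestamp):
        window.append(call)
        # Drop calls that have aged out of the sliding window.
        while call.timestamp - window[0].timestamp > window_s:
            window.popleft()
        distinct_ips = {c.source_ip for c in window}
        if len(window) >= min_calls and len(distinct_ips) >= min_ips:
            hits.append(
                (window[0].timestamp, call.timestamp,
                 len(window), len(distinct_ips))
            )
    return hits
```

A check like this only catches the spread across addresses; it says nothing about whether a human, a script, or a model is driving the calls, and real thresholds would depend on what normal traffic looks like for a given account.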

"We are not watching AI replace attackers. We are watching attackers replace their scripts with AI," said Michael Clark, Director of Threat Research at Sysdig.

The vulnerability itself was nothing exotic. CVE-2026-39987 was a pre-auth flaw in marimo, a Python notebook tool, that let anyone open a terminal session without logging in. The notable part is the timing. The first exploitation attempt landed 9 hours and 41 minutes after the advisory went public, and there was no proof-of-concept code floating around to copy from. Somebody, or something, read the advisory and wrote a working exploit from scratch in less than a workday.

Ryan Dewhurst at watchTowr put the broader trend simply: AI is already cutting down the time it takes to find, validate, and weaponize software flaws. Insurance analytics firm CyberCube said the same thing a different way in its H1 briefing, arguing that AI is compressing the attack lifecycle to the point where impact lands before defenders can even detect it.

This is the part of the conversation that has shifted in a few months. We wrote last week about Anthropic's classified Mythos work with the NSA, where the model was finding zero-day flaws in major operating systems and browsers that nobody else had caught. That was the optimistic version of this story: AI as a force multiplier for defenders. What Sysdig is describing is the same capability showing up on the wrong side of the keyboard.

And the models powering all of this aren't getting harder to abuse. Cisco's AI threat research team tested 15 frontier closed models from OpenAI, Anthropic, Google, Amazon and xAI and found that none of them could be called safe under sustained, multi-turn pressure. The numbers told the story:

  • Attack success rates climbed sharply when researchers stopped firing single prompts and held a back-and-forth with the model. Gemini 3 Pro's success rate jumped 55 percentage points between single-turn and multi-turn testing.
  • Even the best performer in the cohort, Amazon's Nova 2 Lite, still failed nearly 8% of the time.
  • Eight of the 15 models showed gaps of 15 percentage points or more between single-turn and multi-turn results, meaning the safety scores most labs publish are based on the wrong test.

"Real adversaries won't stop at the first refusal," said Amy Chang, who leads AI threat and security research at Cisco. "They will build, adapt and apply pressure."

Into the Valley

The cybersecurity industry has been pricing AI risk as a future problem. The marimo attack moves it onto the calendar. Defenders have spent the last year buying AI tools that scan faster and patch faster, on the assumption they'd stay a step ahead of whatever attackers were brewing. A 113-second database dump suggests that lead is already gone, and the only real question now is whether the next AI agent we hear about belongs to a red team or a real attacker.