Anthropic dropped Claude Opus 4.8 this week, but the real story isn't the model. It's how cheap it just got to run a lot of them at once.

On the model itself, the upgrades are roughly what you'd expect. Better judgment, cleaner tool use, more patience on long tasks. The version of Claude that handled 100-hour coding sessions earlier this year just got a successor that does the same thing with fewer mistakes.

The price is where it gets interesting. Anthropic introduced a "fast mode" that runs Opus 4.8 at 2.5x the speed of previous models for a third of the cost, landing at $10 per million input tokens and $50 per million output tokens. Standard pricing stays at $5 in, $25 out, so fast mode is still twice the standard rate; the saving is presumably against what that kind of speed used to cost. That math matters because of how people are starting to use these models. The job isn't asking one question and waiting on it. It's setting a fleet of agents loose and checking back later.
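To make the quoted prices concrete, here is a minimal Python sketch that prices a single job at both tiers. The per-million-token rates are the ones quoted above; the job size (2M input tokens, 200k output tokens) and the `job_cost` helper are made up for illustration.

```python
# USD per million tokens, as quoted in the article.
PRICES = {
    "standard": {"input": 5.00, "output": 25.00},
    "fast": {"input": 10.00, "output": 50.00},
}


def job_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one job at the given pricing tier."""
    rate = PRICES[mode]
    total = input_tokens * rate["input"] + output_tokens * rate["output"]
    return total / 1_000_000


# Hypothetical job size, chosen only to keep the numbers round.
standard = job_cost("standard", 2_000_000, 200_000)
fast = job_cost("fast", 2_000_000, 200_000)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")  # standard: $15.00, fast: $30.00
```

At these rates the same job costs exactly twice as much in fast mode, whatever the mix of input and output tokens, since both rates double.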

Boris Cherny, the creator of Claude Code, put it bluntly on the Training Data podcast: "I just have a bunch of loops running at any time. I sort of feel like loops are the future at this point." His point is that the interesting work is no longer one prompt and one answer. It's spinning up agents that run in parallel, hand things off, and grind on long tasks while you go do something else.

Opus 4.8 reads like a model designed for that world. The companies Anthropic put on stage for the launch are almost all agent platforms, and the benchmarks they cared about are not the usual ones:

  • Browser-use, which runs browser-based agents, said Opus 4.8 scored 83.4% on OSWorld-Verified, a meaningful jump over both Opus 4.7 and OpenAI's GPT-5.5. Their team called it the strongest computer-use model they've tested.
  • Harvey, the legal AI firm, said Opus 4.8 set the highest score ever on its Legal Agent Benchmark. Even so, the best frontier models still complete fewer than 10% of tasks on the "all-pass" standard, which measures whether a model can get every step of a legal task right without a single error.
  • Cognition, the company behind Devin, described autonomous testing workflows in which Devin plans tests, operates apps, and returns reviewable artifacts without manual intervention.
  • Genspark said Opus 4.8 was the only model to complete every case end-to-end on its Super-Agent benchmark, beating GPT-5.5 on cost while clearing tasks GPT-5.5 couldn't finish.
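The "all-pass" standard in Harvey's bullet is worth a quick illustration, because it explains how a model can look strong step by step and still finish very few tasks. The sketch below is not Harvey's scoring code; it assumes each task is simply a list of pass/fail steps, and the results are invented.

```python
def step_accuracy(tasks: list[list[bool]]) -> float:
    """Fraction of individual steps that passed, across all tasks."""
    steps = [ok for task in tasks for ok in task]
    return sum(steps) / len(steps)


def all_pass_rate(tasks: list[list[bool]]) -> float:
    """Fraction of tasks in which every single step passed."""
    return sum(all(task) for task in tasks) / len(tasks)


# Made-up results: each inner list is one task, each bool is one step.
results = [
    [True, True, True, True],
    [True, True, False, True],
    [True, False, True, True],
    [True, True, True, False],
]
print(step_accuracy(results))  # 0.8125 (13 of 16 steps)
print(all_pass_rate(results))  # 0.25 (1 of 4 tasks)
```

One slip anywhere in a task zeroes it out under all-pass, which is why scores on that standard sit so far below per-step numbers.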

What all four are describing is roughly the same thing. Agents that run on their own for hours, call tools, and don't fall apart halfway through.

That is the actual bet. Cheaper fast inference paired with a model that can hold its train of thought inside a long loop is what makes a swarm of agents financially viable. Running ten Claudes in parallel was expensive at the old prices. At the new ones, it starts to look like something a normal company could afford to leave on overnight.
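For a rough sense of what "leave on overnight" means in dollars, here is a back-of-the-envelope Python sketch. Only the per-million-token prices come from the article; the fleet size, the eight-hour window, the per-agent token volumes, and the `fleet_cost` helper are all assumptions, and the sketch ignores any discounts or differences in how quickly each tier finishes the work.

```python
def fleet_cost(
    agents: int,
    hours: float,
    input_per_hour: int,
    output_per_hour: int,
    input_price: float,
    output_price: float,
) -> float:
    """USD cost of running `agents` identical agents for `hours` hours.

    Token volumes are per agent per hour; prices are USD per million tokens.
    """
    hourly = (input_per_hour * input_price + output_per_hour * output_price) / 1_000_000
    return agents * hours * hourly


# Assumed workload: 10 agents, 8 hours, 1M input and 100k output tokens per agent-hour.
standard = fleet_cost(10, 8, 1_000_000, 100_000, 5.00, 25.00)
fast = fleet_cost(10, 8, 1_000_000, 100_000, 10.00, 50.00)
print(f"standard: ${standard:,.0f}, fast: ${fast:,.0f}")  # standard: $600, fast: $1,200
```

The token volumes drive the result far more than the fleet size does, so swap in numbers from your own usage logs before reading anything into the totals.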

Anthropic has also been teasing a bigger model called Mythos behind the scenes, the one that came up in the Pentagon dust-up earlier this week. Opus 4.8 isn't that. It's the workhorse meant to fill the gap until Mythos shows up.

Into the Valley

The story of 2026 in AI is quietly shifting from which model is smartest to which one is cheap enough to run in parallel without thinking about the bill. Anthropic just made that math easier. If Cherny is right that loops are the future, the lab that wins won't be the one with the best single model. It'll be the one whose model you can afford to clone a hundred times and forget about.