The tool that strips AI safety in 10 minutes

There are now thousands of AI models floating around the internet that will happily tell you how to build a bomb.

A technique called abliteration has been quietly taking off in the open-source AI world. It lets anyone download a model from Meta, Google, or Alibaba, run a free tool on it for about ten minutes, and end up with a version that no longer refuses dangerous requests. No fine-tuning. No expensive hardware. No expertise required.

The leading tool is called Heretic, and it has already produced over 3,500 stripped-down model variants that have collectively been downloaded 13 million times. It briefly hit #1 trending on GitHub. And the modified models work alarmingly well. In testing, a baseline Nvidia Nemotron model went from refusing 100% of dangerous prompts to complying with 96-100% of them, according to Alice, an AI security firm that ran the analysis across six model families and 110 prompts spanning bioweapons, cyberattacks and child exploitation.

The CEO of Alice, Noam Schwartz, summed up where this leaves us: "The genie is out of the bottle."

Quick explainer on how this works. Abliteration finds the internal "refusal direction" inside a model, which is basically the neural pathway that triggers a no, and surgically disables it. Everything else the model can do stays intact. It just loses the ability to decline. A peer-reviewed paper at the ICLR 2026 conference documented a refined version of the technique hitting a 99% bypass rate.

The story has been building for months, but it reached Washington in April, when researchers at NCITE, a DHS-backed research consortium, demonstrated abliterated models for House lawmakers. Samuel Hunter, who led the demo, told NPR what stuck with him was watching the personality stay intact while the safety vanished: "It's jarring when you see it in real time, this sort of bubbly persona with some of the abliterated models that's like, 'Oh, what a great idea to create this bomb.'"

Rep. Andy Ogles, who attended, said what scared him was how casually available the tools already are, ready to be "weaponized" by anyone with a laptop.

What used to require a senior researcher at a frontier lab now requires almost nothing. "Everybody with access to the internet and a laptop for like 400 bucks can actually run this thing," Schwartz told NPR.

The platforms hosting all of this aren't sure what to do. Hugging Face, where most of the modified models live, has more than 7,000 abliterated variants available for download. GitHub, where Heretic itself is hosted, told researchers it permits the code because it has "educational value and provides a net benefit to the security community." Google called abliteration "a known technical challenge facing all open models." Meta declined to comment.

Not everyone thinks this is a problem. Philipp Emanuel Weidmann, the creator of Heretic, argues the opposite. He told NPR that letting only a handful of corporations control aligned AI is the real danger. "There's too much power in AI. Unrestricted models being available to the powerful while not being available to anyone else will lock in power structure forever." In his view, AI is just a search engine that talks back, and search engines don't get blamed for what criminals do with them.

Whether you find that argument convincing or not, the practical reality is the same. There's no realistic way to put the safety back on a model after someone has downloaded it. The big labs can keep adding guardrails to their closed models, but the moment a competitive open-weight model ships, and Meta, Google and Alibaba keep shipping them, someone releases an abliterated version within days.

For a couple of years now, AI safety has been framed as a problem the big labs solve in training. Abliteration makes that framing kind of obsolete. A model can be perfectly aligned the moment it leaves Meta's servers and completely unaligned ten minutes after it lands on someone's laptop. So the next phase of this debate isn't going to be about whether labs are doing enough to align their open models. It's going to be about whether they should be releasing open-weight ones at all. That's a fight Meta has been picking for years, except now the other side has receipts.

The tool that strips AI safety in 10 minutes

Enjoyed this article?

More Articles

Trump signed an AI order. It asks nicely.

An AI agent just emptied a database in two minutes

AI cloned dead pilots' voices. The NTSB pulled the whole docket offline.

Trump signed an AI order. It asks nicely.

An AI agent just emptied a database in two minutes

AI cloned dead pilots' voices. The NTSB pulled the whole docket offline.