
Good Morning Thorium Valley. Your AI search tool probably isn't searching. New research shows these systems decide the answer first, then hunt for sources that agree with them. When tested on questions they couldn't have memorized, accuracy fell off a cliff and the leaderboard reshuffled.
Nvidia and Microsoft want to kill the 40-year-old point-and-click PC with a laptop chip built to run AI locally. Intel and AMD shares dropped on the news, though whether the chip outperforms what Apple already ships is a conversation Nvidia seems happy to have later.
And a free tool can strip the safety guardrails off any open-source AI model in about ten minutes. Thirteen million downloads so far. Congress got a demo and what rattled lawmakers wasn't the outputs — it was how easy the whole thing was.
Quickly, before we dive in: should platforms like Hugging Face remove AI models that have had their safety guardrails stripped?
RESEARCH
When you ask ChatGPT or Perplexity something and watch it pull up sources, it looks like real research. New research suggests it's doing something much simpler: answering from memory, then searching to back up what it already decided.
A group of researchers tested this with a benchmark called LiveBrowseComp, designed so every question is about something that happened in the last 90 days — meaning the answers can't be sitting in training data. The results were brutal.
On standard search benchmarks with every search tool turned off, frontier models still got around 39% of questions right on average. In other words, the "search" benchmarks the industry uses to rank these systems were mostly measuring memory. When researchers forced models onto LiveBrowseComp, where memorization can't help, accuracy dropped 25 to 40 percentage points, and the leaderboard completely reshuffled.
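For the curious, here is one way to picture that comparison in code. This is a minimal sketch, not the paper's harness: the `is_fresh` and `accuracy` helpers, the record layout, and every date and score below are invented for illustration. Only the 90-day window comes from the benchmark description above.

```python
from datetime import date, timedelta

def is_fresh(event_date, today, window_days=90):
    """True if the event happened within the last `window_days` days."""
    return timedelta(0) <= today - event_date <= timedelta(days=window_days)

def accuracy(results):
    """Percent of questions answered correctly."""
    return 100.0 * sum(r["correct"] for r in results) / len(results)

# Toy records, made up for this sketch (not figures from the paper).
today = date(2025, 6, 1)
results = [
    {"event_date": date(2023, 1, 10), "correct": True},
    {"event_date": date(2022, 7, 4), "correct": True},
    {"event_date": date(2024, 2, 20), "correct": False},
    {"event_date": date(2021, 11, 5), "correct": True},
    {"event_date": date(2025, 5, 20), "correct": True},
    {"event_date": date(2025, 4, 2), "correct": True},
    {"event_date": date(2025, 3, 15), "correct": False},
    {"event_date": date(2025, 5, 1), "correct": False},
]

# Split questions by whether the answer could plausibly be memorized.
fresh = [r for r in results if is_fresh(r["event_date"], today)]
stale = [r for r in results if not is_fresh(r["event_date"], today)]

# The gap, in percentage points, between old and fresh questions.
drop = accuracy(stale) - accuracy(fresh)
print(f"old: {accuracy(stale):.0f}%  fresh: {accuracy(fresh):.0f}%  drop: {drop:.0f} points")
```

The point of the split is that a model leaning on memory looks fine on the old questions and worse on the fresh ones, and the size of that gap is what a freshness-gated benchmark is built to expose.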
A few details from the paper stand out:
This lines up with what Berkeley professor Dan Klein told Axios: these systems aren't truth engines — they're plausibility engines. They're trained to sound right, not be right, and pointing them at a search bar doesn't fix that.

The leaderboards we use to rank AI search are basically grading the wrong test. They reward models that have memorized the most, not models that can actually find and trust new information, which is the whole reason anyone uses a search agent in the first place. Expect a quiet reshuffling once real-time benchmarks like LiveBrowseComp become standard, and expect a lot of the products marketed as "AI search" to look less impressive when graded on questions their training data doesn't already know the answer to. The thing they're selling you isn't searching. It's a very confident guess with footnotes.
BIG TECH
For 40 years, using a PC has meant the same thing: open an app, click around, type something in. Nvidia and Microsoft are now selling a chip designed to make all of that optional.
On Monday, the two companies unveiled RTX Spark, a new Nvidia processor built to run AI directly on Windows laptops instead of in the cloud. The pitch: stop launching apps, start asking your computer to handle tasks for you, with the AI doing the work right on your machine.
The specs are aggressive for a laptop chip:
Intel and AMD stocks tumbled on the news. Nvidia is now openly muscling into the one major computing category where it's never had real share.
But the picture is more complicated than Nvidia is letting on. Apple's M5 Max, already shipping, has roughly twice the memory bandwidth — which matters a lot for the large language models Nvidia is showcasing. Qualcomm's Snapdragon X2 Elite laptops ship months earlier too. And Nvidia hasn't announced pricing at all, making it impossible to know whether these compete on cost or sit in premium territory.
Where Nvidia might have a real edge is the software story. As Nous Research CEO Dillon Rolnick put it, RTX Spark reframes the laptop: you're not buying a computer, you're buying a full-fledged AI assistant. That matters because it's the first major PC launch pitched primarily around running AI agents locally — not clock speed, not graphics. The industry started with cloud-based AI tools, moved to running swarms of agents in parallel, and is now betting the next stage runs on your desk, not in someone else's data center. RTX Spark is the hardware play for that future.

The "ask, don't click" pitch is a big swing, and it depends on something Nvidia and Microsoft can't control: whether the agents are actually good enough to trust with your work. Most people still launch apps because the apps work and the agents don't, at least not reliably. If RTX Spark ships in the fall and the agent experience still feels like a beta, this becomes a very expensive Intel competitor with a slogan attached. But if local agents catch up to the cloud versions by the time these laptops are on shelves, Nvidia will have quietly rewritten what a PC is for, and Intel and AMD will be the ones explaining themselves to investors.
GOVERNANCE
There are now thousands of AI models floating around the internet that will happily tell you how to build a bomb.
A technique called abliteration has been quietly taking off in the open-source AI world. It lets anyone download a model from Meta, Google, or Alibaba, run a free tool on it for about ten minutes, and end up with a version that no longer refuses dangerous requests. No fine-tuning. No expensive hardware. No expertise required.
The leading tool is called Heretic, and it's already produced over 3,500 stripped-down model variants that have collectively been downloaded 13 million times. How well do they work? According to Alice, an AI security firm, a baseline Nvidia Nemotron model refused 100% of dangerous prompts; after abliteration, it complied with 96–100% of them. As Alice CEO Noam Schwartz put it: "The genie is out of the bottle."
The technique works by finding the internal "refusal direction" inside a model — the neural pathway that triggers a "no" — and surgically disabling it. Everything else stays intact. The model just loses the ability to decline.
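The "everything else stays intact" part is easier to see with a little linear algebra. The sketch below is a toy illustration on invented three-number vectors, not Heretic's code and not a real model. It assumes the direction is already known and only shows what removing one direction from a vector does. The helper names `dot`, `unit`, and `remove_direction` are made up for this example.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def remove_direction(v, r_hat):
    """Subtract the component of v that lies along the unit vector r_hat."""
    scale = dot(v, r_hat)
    return [x - scale * r for x, r in zip(v, r_hat)]

# Toy values, invented for illustration only.
direction = unit([3.0, 0.0, 0.0])   # stand-in for the direction being removed
activation = [3.0, 1.0, 5.0]        # stand-in for one internal activation

cleaned = remove_direction(activation, direction)

print(cleaned)                  # the first coordinate is zeroed, the rest untouched
print(dot(cleaned, direction))  # nothing left along the removed direction
```

Only the part of the vector pointing along that one direction changes. The other coordinates pass through as they were, which is the geometric version of a model keeping its personality while losing its "no".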
The issue reached Washington in April, when researchers at a DHS-backed consortium demonstrated abliterated models for House lawmakers. What shook them wasn't the outputs — it was how easy the whole thing was, and how the model's friendly personality stayed perfectly intact while the safety vanished.
The platforms hosting all of this are stuck:
Not everyone thinks this is a crisis. Heretic's creator, Philipp Emanuel Weidmann, argues the opposite — that letting only a handful of corporations control aligned AI is the real danger. "Unrestricted models being available to the powerful while not being available to anyone else will lock in power structure forever," he told NPR.
Whether you buy that argument or not, the practical reality is the same. There's no way to put the safety back on a model after someone has downloaded it. The big labs can keep adding guardrails to their closed models, but the moment a competitive open-weight model ships — and Meta, Google, and Alibaba keep shipping them — someone releases an abliterated version within days.

For a couple of years now, AI safety has been framed as a problem the big labs solve in training. Abliteration makes that framing kind of obsolete. A model can be perfectly aligned the moment it leaves Meta's servers and completely unaligned ten minutes after it lands on someone's laptop. So the next phase of this debate isn't going to be about whether labs are doing enough to align their open models. It's going to be about whether they should be releasing open-weight ones at all. That's a fight Meta has been picking for years, except now the other side has receipts.
IN OTHER NEWS
WHO'S HIRING IN AI
AI TOOLS
ChatGPT — A hidden long-press gesture on the send button now lets you choose how hard ChatGPT thinks before answering — from instant replies to deep reasoning — plus a new table of contents for navigating long chats
Claude — Anthropic's new Opus 4.8 model lets you pick from five effort levels before it responds, and introduces Dynamic Workflows that coordinate swarms of subagents to tackle complex tasks
Cursor — Version 3.6 adds auto-review mode — a smart filter that decides which AI actions need your approval and which can run on their own, so you stop getting interrupted every 30 seconds
Duolingo — For the month of June, if you ever lost a streak over 30 days, you can get it back by completing three lessons in one sitting
Microsoft Foundry — May updates bring live transcription and embeddings to Foundry Local, plus access to Grok 4.3, DeepSeek V4, and GPT-5 reinforcement fine-tuning
That's all for today. If this issue made you think, share it with someone who needs to think harder. Written by Jason Chen, Advait Prakash, Andrew Hales, and the Thorium Valley crew. Got a tip, a correction, or a strong opinion? Reply directly — we read every one.