Every major AI model can ace a creativity test on its own. They also all ace it the same way.

Researchers at Duke University and the Technion–Israel Institute of Technology tested 22 large language models, including offerings from OpenAI, Google, Meta and Mistral, on three standardized creativity tests alongside 102 human participants.

Each model individually scored as well as or slightly better than the average human. But when the researchers compared AI outputs to each other, the picture changed. AI responses were far more similar to one another than human responses were, across every test.

The gap wasn't subtle. On one measure, the effect size hit 2.2, nearly three times what social scientists consider "large." Emily Wenger, one of the study's authors and a professor of electrical and computer engineering at Duke, said she wanted to investigate whether this kind of homogenization would occur in commercial LLMs and what the implications might be.

The reason is straightforward. Every major commercial model has been trained on essentially the same data (the internet) and optimized for the same objective. Co-author Yoed Kenett of the Technion described the result as models that appear creative on the surface while being "overly homogenized" underneath.

Switching to a different model doesn't fix it. The study found that going from a Google model to a Meta model produced barely more variety than switching between two models built by the same company.

We covered AI's productivity problems earlier this week, but this is a separate issue. The output quality might be fine on its own. The problem, as Wenger put it, is that overreliance on these tools "will smooth the world's work toward the same underlying set of words or grammar, tending to make writing all look the same."

That sameness is already showing up across industries. Fashion brands are adopting the technology at scale: 34% of fashion executives report that their teams already use generative AI for marketing copywriting. In travel, AI-built booking platforms have been popping up with near-identical language and design, making it harder for real operators to stand apart.

Even people building AI products recognize the risk. Mike Krieger, Instagram's co-founder and now Anthropic's chief product officer, told Axios that the next leap for enterprise teams isn't just smarter models, but improved reliability and AI agents that can consistently take work off users' plates.

"If you're trying to come up with an original concept or product to stand out from the crowd," Wenger said, "this work highly suggests you should bring together a diverse group of people to brainstorm rather than relying on AI."

In the Valley

Companies that lean hardest on AI for creative work are quietly erasing the thing that made them different. Every major model trains on the same internet and optimizes for the same goals, so the more teams rely on them without adding something of their own, the more everything blends together. AI was supposed to democratize creativity. It might just be standardizing it.