AI reasoning models were supposed to be the industry’s next leap, promising smarter systems able to tackle more complex problems and a path to superintelligence.

The latest releases from the major players in artificial intelligence, including OpenAI, Anthropic, Alphabet and DeepSeek, have been models with reasoning capabilities. These reasoning models can take on tougher tasks by “thinking,” or breaking problems into logical steps and showing their work.

Now, a string of recent research is calling that into question.

In June, a team of Apple researchers released a white paper titled “The Illusion of Thinking,” which found that “state-of-the-art [large reasoning models] still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments.” 

In other words, once problems get complex enough, reasoning models stop working. Even more concerning, their problem-solving doesn’t “generalize,” meaning the models may simply be memorizing patterns rather than working out genuinely new solutions.
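To see how quickly the tested problems get harder, consider Tower of Hanoi, one of the controllable puzzles the Apple researchers used: each added disk doubles the length of the optimal solution. The Python sketch below is illustrative only, not the paper’s code, and the `hanoi_moves` helper is a hypothetical name:

```python
# Illustrative only: Tower of Hanoi, one of the controllable puzzles in
# "The Illusion of Thinking." Difficulty is dialed up one disk at a time,
# and the optimal solution length grows exponentially (2**n - 1 moves),
# which is the kind of scaling that exposed the accuracy collapse.

def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move sequence for n disks (hypothetical helper)."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # move n-1 disks out of the way
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # stack n-1 disks back on top
    )

for disks in (3, 5, 10, 15):
    print(f"{disks} disks -> {len(hanoi_moves(disks))} optimal moves")  # 7, 31, 1023, 32767
```

The quoted finding in the Apple paper is exactly this pattern: accuracy holds at low complexity, then collapses to zero past a threshold.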

“We can make it do really well on benchmarks. We can make it do really well on specific tasks,” said Ali Ghodsi, the CEO of AI data analytics platform Databricks. “Some of the papers you alluded to show it doesn’t generalize. So while it’s really good at this task, it’s awful at very common sense things that you and I would do in our sleep. And that’s, I think, a fundamental limitation of reasoning models right now.” 

Researchers at Salesforce, Anthropic and other AI labs have also raised red flags about reasoning models. Salesforce calls the phenomenon “jagged intelligence,” finding a “significant gap between current [large language model] capabilities and real-world enterprise demand.”

The constraints could indicate cracks in a narrative that has sent AI infrastructure stocks such as Nvidia’s soaring.

“The amount of computation we need at this point as a result of agentic AI, as a result of reasoning, is easily a hundred times more than we thought we needed this time last year,” Nvidia CEO Jensen Huang said at the company’s GTC event in March. 

To be sure, some experts say Apple’s warnings about reasoning models may be an attempt by the iPhone maker to shift the conversation, since it is seen as playing catch-up in the AI race. The company has had a series of setbacks with its highly touted Apple Intelligence suite of AI services.

Most notably, Apple had to delay key upgrades to its Siri voice assistant until sometime in 2026, and the company made few AI announcements at its annual Worldwide Developers Conference earlier this month.

“Apple’s putting out papers right now saying LLMs and reasoning don’t really work,” said Daniel Newman, CEO of Futurum Group, on CNBC’s “The Exchange.” Having Apple’s paper come out after WWDC “sounds more like ‘Oops, look over here, we don’t know exactly what we’re doing.’”
