Reasoning AI doesn’t think at all


With just a few days remaining until WWDC 2025, Apple has released a study highlighting the limitations of reasoning AI models. The research may mark a pivotal moment in the ongoing pursuit of artificial general intelligence (AGI), as it presents compelling evidence that current AI approaches fall short of genuine reasoning.

To probe these capabilities, Apple researchers designed tests showing that well-known reasoning AI models do not genuinely reason. While these models produce impressive results on computational tasks like math problems, their success stems largely from training: they recognize patterns and reproduce learned responses to familiar questions. When faced with novel problems or unfamiliar tasks, they struggle to find solutions. Apple observed that even when given hints, many models were unable to complete the puzzles.

The Apple study employed a series of puzzles rather than traditional math problems to test the reasoning abilities of various AI models: the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. Researchers evaluated both large language models (LLMs) such as GPT-4, Claude 3.7 Sonnet, and DeepSeek V3, and large reasoning models (LRMs) including OpenAI’s o1 and Google’s Gemini.
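Apple hasn’t published its evaluation harness, but the shape of a puzzle-based test is easy to sketch. The Python below is a minimal, hypothetical illustration (the names solve_hanoi and is_valid_solution are mine, not the study’s): it generates an optimal Tower of Hanoi solution and checks whether a proposed move list is legal and actually solves the puzzle, which is how a model’s answer could be scored without relying on memorized outputs.

```python
# Hypothetical sketch of a puzzle-based evaluation, in the spirit of
# Apple's study; this is not the researchers' actual harness.

def solve_hanoi(n, src=0, aux=1, dst=2):
    """Return an optimal move list (2**n - 1 moves) for n disks."""
    if n == 0:
        return []
    return (solve_hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + solve_hanoi(n - 1, aux, src, dst))

def is_valid_solution(n, moves):
    """Check that a proposed move list is legal and solves the puzzle."""
    pegs = [list(range(n, 0, -1)), [], []]    # disk n at the bottom of peg 0
    for src, dst in moves:
        if not pegs[src]:
            return False                      # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))   # all disks on the target peg

# A model's answer can be scored by parsing its move list and validating it:
assert is_valid_solution(3, solve_hanoi(3))   # 7 moves for 3 disks
```

Because the validator only checks legality and the end state, a model is free to produce any correct solution, and difficulty can be dialed up simply by adding disks.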

What the researchers found was revealing: standard LLMs performed better than LRMs on the easiest puzzles, LRMs pulled ahead at medium difficulty, and as complexity increased further, every tested model suffered a steep decline in success rates. Remarkably, many models gave up entirely on the hardest challenges, suggesting that their reasoning capabilities collapse beyond a certain threshold.
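That threshold is easy to approach in these puzzles because difficulty ramps fast. In the Tower of Hanoi, for instance, the minimum solution for n disks is 2^n − 1 moves, so each added disk doubles the length of a correct answer; a quick check:

```python
# Optimal Tower of Hanoi solution length doubles with every added disk.
for n in range(1, 11):
    print(f"{n} disks -> {2**n - 1} moves")   # 1, 3, 7, ..., 1023
```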

This raises significant questions regarding the underpinnings of AI reasoning. The study concludes that calling these models “reasoning AI” could be misleading, as they lack authentic reasoning capabilities. The reality is that even sophisticated models do not possess the ability to think critically or adapt to new types of challenges, which is a fundamental expectation of AGI.

The timing of this publication is notable. Apple is preparing for its annual WWDC, and despite the study’s fascinating insights, the company appears to lag behind industry leaders such as OpenAI and Google in commercially viable reasoning models. The study could provide valuable lessons for researchers and developers, but it may also serve to temper expectations about Apple’s own AI prowess.

It’s important to recognize the broader implications of Apple’s findings for the future of AI. The notion that reasoning models can’t genuinely think is one many have long suspected. True AGI would need to navigate unfamiliar situations independently, much as a human can. As the industry advances, it is essential to grasp the limitations of existing models while fostering the ambition to build better ones.

Moreover, this revelation arrives at a moment when Apple’s own AI arguably needs improvement. Siri, for instance, is widely seen as less capable than its competitors, but there is room for growth: Apple has consistently shown its commitment to research and innovation, and the study suggests a promising direction for its future work on AI reasoning.

In summary, Apple’s fresh study offers critical insights into the current limitations of reasoning AI models. By exposing these shortcomings, Apple opens the door for research and development aimed at moving AI closer to AGI. The debate over the potential and current utility of reasoning models is far from over; I use o3 in ChatGPT myself and find its responses compelling, yet it, too, shows limitations that challenge any definition of genuine reasoning.

As these discussions unfold, it’s crucial for both consumers and developers to engage with the evolving landscape of artificial intelligence. The notion that AI can truly “think” may still be a far-off ideal, but studies like Apple’s remind us that progress is a journey marked by both revelations and skepticism.

Ultimately, while the road to AGI is fraught with challenges, studies like this keep the pursuit grounded in reality. The models we see today are stepping stones toward AI that may one day bridge the gap between mere data processing and true reasoning, and the quest for machines that think in a way aligned with human cognition is far from over.

