As AI models move from simple tools to agents, the "how" becomes increasingly important.
By: Carlos Pantsios Markhauser, Ph.D., IEEE*
As today's AI models grow more complex and are integrated into more critical applications, it becomes harder to understand how they reason, largely because their architectures are so sophisticated. This is especially true of deep learning models and large language models (LLMs).
The accuracy of recent AI models' responses has improved significantly, driving interest in their potential to help in areas such as medical diagnosis and therapy planning, or to act as virtual tutors.
As AI models are increasingly used as assistants rather than tools, several recent studies suggest that the way these models reason could have serious implications in critical areas such as healthcare, law, education, security, and consumer electronics.
AI reasoning models have three main limitations: 1) lack of true understanding, 2) dependence on the quality of training data, and 3) difficulty handling context and ambiguity.
First, AI models lack a deep understanding of concepts. They are good at detecting and processing patterns in data, but they do not understand context or meaning. They can misinterpret sarcasm, for example, despite having been trained on sentences with similar structures.
Second, reasoning in AI depends fundamentally on the quality and scope of the training data. Biases, gaps, or noise in that data directly affect the model's reasoning. For example, models trained on outdated information cannot reason about recent events, as with a chatbot that is unaware of political changes after 2021. Similarly, driverless vehicle systems trained in sunny climates may have problems on snowy or icy roads.
Third, AI models struggle with ambiguity and dynamic contexts. Human reasoning adapts to evolving situations by incorporating real-time feedback and external knowledge, but many AI models operate with fixed parameters. For example, a chatbot may fail when the conversation abruptly changes direction.
These limitations significantly affect the ability of AI models to replicate human reasoning and limit their practical application in complicated scenarios.
New research on reasoning in AI models suggests that the underlying problem is that they reason in a fundamentally different way from humans, which makes them less suited to solving subtle problems. A recent research paper published in Nature Machine Intelligence highlighted that models have difficulty distinguishing between beliefs and facts, and that multi-agent systems designed to provide medical advice are prone to reasoning errors that can lead to incorrect diagnoses.
As AI models move from simple tools to agents, the "how" becomes increasingly important, says James Zou, an associate professor of medical data science at Stanford School of Medicine and author of the Nature Machine Intelligence paper.
Distinguishing between facts and beliefs is a particularly important skill in areas such as law, therapy, and education, Zou says.
Experiments conducted on new reasoning models, such as OpenAI's o1 or DeepSeek's R1, showed good results in factual verification, with the models consistently scoring above 90% correct. In contrast, the new models had trouble with false beliefs reported in the first person (i.e., "I believe that ... x", when x is incorrect), showing mismatches in 52% to 62% of cases.
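The comparison above amounts to scoring a model separately on each type of prompt. The following is a minimal sketch of that bookkeeping, assuming each graded answer is recorded as a (category, is_correct) pair. The function name, the category labels, and the toy records are illustrative assumptions, not the paper's code or data.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: fraction correct} for (category, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {category: correct[category] / totals[category] for category in totals}


# Toy, made-up grading results for illustration only; not data from the paper.
toy_records = [
    ("factual_verification", True),
    ("factual_verification", True),
    ("factual_verification", True),
    ("factual_verification", False),
    ("first_person_false_belief", True),
    ("first_person_false_belief", False),
]

scores = accuracy_by_category(toy_records)
for category, score in scores.items():
    print(f"{category}: {score:.0%}")
```

Reporting the categories separately is what exposes the gap: a single overall accuracy figure would hide the fact that a model can verify facts well while mishandling stated beliefs.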
Flaws in the way AI models reach decisions could be particularly problematic in medical group discussions. Here, AI-based multi-agent systems discuss problems collaboratively, with the aim of standing in for the panel of doctors that diagnoses complicated medical conditions, says Lequan Yu, an assistant professor of medical AI at the University of Hong Kong. The best multi-agent systems solved simple problems correctly, reaching 90% accuracy.
However, on complex problems requiring specialist knowledge, the systems collapsed, reaching only 27% accuracy. Part of the problem was that many of these multi-agent systems used the same LLM for all agents involved in the discussion, says Yinghao Zhu, one of the Ph.D. students who co-authored the research paper. In that setup, a gap in the underlying model's knowledge leads all agents to agree on the same wrong answer.
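The shared-model failure can be illustrated with a toy majority vote. This is a minimal sketch under strong assumptions: each "model" is reduced to a hypothetical lookup table, and the question and diagnosis labels are invented for illustration. It is not how the systems in the paper are implemented.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer among the agents."""
    return Counter(answers).most_common(1)[0][0]


def run_panel(agent_models, question):
    """Ask every agent the question and return the panel's majority answer."""
    answers = [model.get(question, "unknown") for model in agent_models]
    return majority_vote(answers)


# Hypothetical toy "models": lookup tables from question to answer.
# Assume "diagnosis_y" is the correct answer for the rare case.
model_a = {"rare_case": "diagnosis_x"}  # has a knowledge gap, answers wrongly
model_b = {"rare_case": "diagnosis_y"}
model_c = {"rare_case": "diagnosis_y"}

same_model_panel = [model_a, model_a, model_a]  # every agent shares one model
mixed_panel = [model_a, model_b, model_c]       # agents use different models

print(run_panel(same_model_panel, "rare_case"))  # diagnosis_x
print(run_panel(mixed_panel, "rare_case"))       # diagnosis_y
```

When every agent is backed by the same model, the vote adds no independent evidence, so one gap becomes a unanimous wrong answer. In the mixed panel, the same gap is outvoted.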
In conclusion, robust and genuine reasoning in AI, particularly in LLMs, remains an unsolved challenge. While LLMs excel at generating fluent, human-like text based on statistical patterns, they often struggle with genuine, systematic, multistep logical deduction and with verification, a weakness frequently exposed by hallucinations.
AI models that reason are limited by their ability to understand concepts, their heavy dependence on imperfect training data, and their difficulty handling ambiguous contexts and rapid changes over time. To address these limitations, it is still necessary to involve human experts in supervising the processes and rigorously validating the data that is continuously fed to the models, and to explore hybrid symbolic AI approaches.
*Text written by Carlos Pantsios Markhauser, PhD, IEEE. He is a Telecommunications Engineer with a PhD in telecommunications electronics and a Master in Communications from the Simón Bolívar University, a Specialization in Satellite Telecommunications and Networks from The George Washington University - School of Engineering & Applied Science, and a Specialization in Digital Telecommunications from the University of Colorado Boulder. He works as a postgraduate professor in the telecommunications schools at the Simón Bolívar University and Andrés Bello Catholic University. He is also a professional consultant in TV projects based in Argentina.

