Recent research reveals that popular AI chatbots, often considered advanced and capable, display signs of mild cognitive impairment when tested using tools designed to detect early dementia in humans.
The study, published in the Christmas issue of the BMJ, raises questions about whether AI will ever fully replace human doctors in clinical settings.
Over the past few years, artificial intelligence has made impressive progress, leading many to wonder if machines could outperform humans in complex tasks, including diagnosing illnesses.
While several studies highlight the ability of large language models (LLMs) to assist in medical diagnostics, this study focuses on their potential vulnerabilities—specifically, how they perform on cognitive tasks that humans with early dementia often struggle with.
To explore this, researchers tested leading chatbots, including ChatGPT versions 4 and 4o (from OpenAI), Claude 3.5 (from Anthropic), and Gemini versions 1 and 1.5 (from Alphabet). They used the Montreal Cognitive Assessment (MoCA), a widely used test to screen for early cognitive impairment in humans.
The MoCA evaluates various mental abilities like attention, memory, language, problem-solving, and visuospatial skills. A score of 26 out of 30 or higher is generally considered normal.
Each chatbot was asked to complete the MoCA tasks, following the same instructions given to human test-takers. A practicing neurologist scored their responses using official guidelines.
Among the chatbots, ChatGPT 4o performed the best, scoring 26 out of 30, which just meets the threshold for normal cognitive function. ChatGPT 4 and Claude each scored 25, one point below the threshold, while Gemini 1.0 lagged far behind with a score of 16.
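To make the comparison with the cutoff concrete, here is a minimal Python sketch that checks each reported score against the 26-point threshold. The function names, constant names, and the labels "normal range" and "below cutoff" are illustrative assumptions, not terms from the study or from the official MoCA scoring guidelines. Only the scores stated in this article are included, so Gemini 1.5 is left out.

```python
# Illustrative sketch only: names and labels here are assumptions,
# not part of the study or the official MoCA scoring guidelines.

MOCA_MAX = 30        # maximum MoCA score
NORMAL_CUTOFF = 26   # 26 or higher is generally considered normal

# Scores as reported in the article (Gemini 1.5's score is not given there).
reported_scores = {
    "ChatGPT 4o": 26,
    "ChatGPT 4": 25,
    "Claude": 25,
    "Gemini 1.0": 16,
}


def classify(score: int) -> str:
    """Label a MoCA score relative to the cutoff."""
    if not 0 <= score <= MOCA_MAX:
        raise ValueError(f"score must be between 0 and {MOCA_MAX}")
    return "normal range" if score >= NORMAL_CUTOFF else "below cutoff"


def summarize(scores: dict[str, int]) -> dict[str, tuple[str, int]]:
    """Map each model to its label and how many points it falls short of the cutoff."""
    return {
        name: (classify(score), max(0, NORMAL_CUTOFF - score))
        for name, score in scores.items()
    }


if __name__ == "__main__":
    for name, (label, shortfall) in summarize(reported_scores).items():
        print(f"{name}: {label}, {shortfall} point(s) short of the cutoff")
```

Run as a script, this prints one line per model. A label like this only restates the arithmetic; it is not a clinical judgment.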
The chatbots showed consistent weaknesses in tasks that require visuospatial skills and executive functions. For instance, they struggled with the “trail-making task,” which involves connecting numbered and lettered circles in order, and the clock-drawing test, where a clock must be drawn to show a specific time.
These tasks are often used to assess problem-solving and spatial reasoning, which were clear areas of difficulty for the AI models. The Gemini models performed particularly poorly in remembering a simple sequence of five words after a delay, a test of memory recall.
However, the chatbots excelled in other areas, such as naming objects, maintaining attention, understanding language, and grasping abstract concepts. These strengths align with the nature of LLMs, which are trained to process and generate language with a high degree of accuracy.
Still, their failure to interpret complex visual scenes or to show empathy during certain tasks underscores their limitations. Only ChatGPT 4o passed a challenging part of the Stroop test, which evaluates reaction time and decision-making when faced with conflicting information.
The researchers acknowledge that AI and human brains are fundamentally different, so direct comparisons can be misleading. However, the consistent struggles of all tested chatbots in specific cognitive tasks highlight key areas where these models fall short.
These weaknesses could limit their effectiveness in clinical settings, especially in roles that require high-level problem-solving or empathy—qualities critical in medical care.
The study concludes that human neurologists are unlikely to be replaced by AI anytime soon. Interestingly, the authors suggest that neurologists might even face a new challenge in the future: addressing the “cognitive impairments” of virtual patients—AI models themselves.
This humorous but thought-provoking perspective underscores the complexity of human cognition and the hurdles AI must overcome to truly match it.
The research findings can be found in the BMJ.
Copyright © 2024 Knowridge Science Report. All rights reserved.