AI shows limitations in heart risk assessment

May 6, 2024

Recent research has raised concerns about relying on ChatGPT for some medical evaluations, specifically for deciding whether patients with chest pain need to be hospitalized.

This study, detailed in the journal PLOS ONE, involved thousands of simulated patient scenarios and revealed significant inconsistencies in ChatGPT’s assessments.

Dr. Thomas Heston, from Washington State University’s Elson S. Floyd College of Medicine, led the study. He noted that ChatGPT, despite its sophistication, returned different cardiac risk levels for the same patient data in multiple tests.

For instance, it would sometimes classify a patient as low risk, then as intermediate, and occasionally even as high risk.

This inconsistency is attributed to the randomness built into the generative AI system, which helps it mimic natural human conversation but is poorly suited to medical applications, where consistency is crucial.

The study highlights the importance of consistent and reliable assessments in emergency medical settings, particularly in evaluating chest pain—a common and potentially complex complaint in emergency rooms.

Currently, medical professionals often use established risk assessment tools like the TIMI and HEART scores, which rely on a specific set of variables such as symptoms, health history, and age.

These tools are like calculators, providing a consistent risk score that helps doctors decide on the best course of action, whether that involves hospitalization or outpatient care.
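To illustrate what "like a calculator" means, here is a minimal Python sketch of a HEART-style score. It is an illustration only, not the code used in the study: the class and function names are hypothetical, and the point values and cutoffs follow the commonly published form of the HEART score (five components scored 0 to 2; totals of 0 to 3 low, 4 to 6 intermediate, 7 to 10 high). The key property is that the same inputs always give the same answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartInputs:
    """Hypothetical container for the five HEART components.

    history, ecg, risk_factors and troponin are assumed to be already
    graded by a clinician as 0, 1 or 2 points each.
    """
    history: int
    ecg: int
    age_years: int
    risk_factors: int
    troponin: int


def age_points(age_years: int) -> int:
    """Age component: under 45 -> 0, 45 to 64 -> 1, 65 and over -> 2."""
    if age_years >= 65:
        return 2
    if age_years >= 45:
        return 1
    return 0


def heart_score(p: HeartInputs) -> int:
    """Sum the five components into a total between 0 and 10."""
    for name in ("history", "ecg", "risk_factors", "troponin"):
        value = getattr(p, name)
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {value!r}")
    return (p.history + p.ecg + age_points(p.age_years)
            + p.risk_factors + p.troponin)


def risk_category(score: int) -> str:
    """Map a total score to a risk band."""
    if score <= 3:
        return "low"
    if score <= 6:
        return "intermediate"
    return "high"


# A made-up patient: scoring it repeatedly never changes the result.
patient = HeartInputs(history=1, ecg=0, age_years=58, risk_factors=1, troponin=0)
results = {risk_category(heart_score(patient)) for _ in range(5)}
print(results)
```

Because the function has no random component, repeated runs on one patient collapse to a single category, which is the behavior the study found lacking in ChatGPT.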

In contrast, AI systems like ChatGPT can process a far broader range of variables. However, this capability does not necessarily translate into more reliable or accurate assessments.

In the study, ChatGPT’s performance varied significantly, disagreeing with fixed TIMI or HEART scores in about 45% to 48% of cases.

Moreover, when tested with a dataset containing 44 different health variables, ChatGPT often contradicted its own assessments, showing inconsistency in its judgments about the same cases nearly half of the time.
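One simple way to put a number on this kind of self-contradiction is to submit each case several times and count how many cases receive more than one distinct answer. The Python sketch below shows that idea with made-up labels. The function name and data are hypothetical, and this is not necessarily the statistic the study's authors used.

```python
def inconsistency_rate(runs_per_case: dict[str, list[str]]) -> float:
    """Fraction of cases whose repeated runs did not all agree.

    runs_per_case maps a case identifier to the list of risk labels
    returned for that same case across repeated runs.
    """
    if not runs_per_case:
        raise ValueError("need at least one case")
    inconsistent = sum(
        1 for labels in runs_per_case.values() if len(set(labels)) > 1
    )
    return inconsistent / len(runs_per_case)


# Made-up example: four simulated cases, each assessed three times.
example_runs = {
    "case_1": ["low", "low", "low"],
    "case_2": ["low", "intermediate", "high"],
    "case_3": ["high", "high", "high"],
    "case_4": ["intermediate", "low", "intermediate"],
}
rate = inconsistency_rate(example_runs)
print(f"{rate:.0%} of cases received conflicting assessments")
```

A fixed scoring tool would yield a rate of zero under this measure, since identical inputs always produce identical labels.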

Despite these challenges, Dr. Heston remains optimistic about the potential applications of generative AI in healthcare.

He suggests that AI could be particularly useful in generating differential diagnoses, where it can offer several potential explanations for a patient’s symptoms, helping doctors to broaden their perspective and consider various possibilities.

This could be particularly valuable in complex cases where the diagnosis is not immediately apparent.

Furthermore, assuming the confidentiality of medical records can be protected, AI could be used to quickly summarize the most relevant information about a patient in emergency situations, potentially speeding up the decision-making process.

The study underscores a critical point: while AI can enhance the tools available to healthcare professionals, it is not yet reliable enough to replace traditional methods in critical care scenarios, particularly in cardiac risk assessment.

More research and development are needed to improve AI’s accuracy and consistency in medical applications, ensuring that it can one day be safely relied upon in high-stakes environments.


The research findings can be found in PLOS ONE.
