Recent research has raised concerns about relying on ChatGPT for certain medical evaluations, specifically determining whether patients with chest pain need to be hospitalized.
This study, detailed in the journal PLOS ONE, involved thousands of simulated patient scenarios and revealed significant inconsistencies in ChatGPT’s assessments.
Dr. Thomas Heston, from Washington State University’s Elson S. Floyd College of Medicine, led the study. He noted that ChatGPT, despite its sophistication, returned different cardiac risk levels for the same patient data in multiple tests.
For instance, it would sometimes classify a patient as low risk, then as intermediate, and occasionally even as high risk.
This inconsistency is attributed to the randomness built into generative AI systems: the same variability that helps them mimic natural human conversation is poorly suited to medical applications, where consistency is crucial.
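To illustrate why identical inputs can yield different outputs, here is a minimal sketch of temperature-based sampling, the mechanism generative language models typically use when choosing each word. The risk labels and scores below are invented for illustration and do not come from the study.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Sample one option from model scores ("logits") using temperature scaling.

    Higher temperatures flatten the distribution, making less likely options
    more probable; temperature near zero approaches a deterministic argmax.
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores a model might assign to three risk labels for one patient.
labels = ["low", "intermediate", "high"]
logits = [2.0, 1.6, 0.4]

# Repeating the identical query can yield different answers across runs.
for _ in range(5):
    print(labels[sample_with_temperature(logits, temperature=1.0)])
```

Lowering the temperature toward zero makes the choice nearly deterministic, which is one reason stricter sampling settings are often suggested for clinical uses.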
The study highlights the importance of consistent and reliable assessments in emergency medical settings, particularly in evaluating chest pain—a common and potentially complex complaint in emergency rooms.
Currently, medical professionals often use established risk assessment tools like the TIMI and HEART scores, which rely on a specific set of variables such as symptoms, health history, and age.
These tools are like calculators, providing a consistent risk score that helps doctors decide on the best course of action, whether that involves hospitalization or outpatient care.
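To make the calculator analogy concrete, the sketch below encodes the published HEART scoring rules in simplified form. The clinical definitions of each component are more nuanced than shown here, so this is illustrative rather than a clinical tool.

```python
def heart_score(history, ecg, age, risk_factors, troponin):
    """Compute the HEART score from five components, each worth 0-2 points.

    history:      0 = slightly suspicious, 1 = moderately, 2 = highly suspicious
    ecg:          0 = normal, 1 = nonspecific changes, 2 = significant ST deviation
    age:          patient age in years
    risk_factors: count of cardiovascular risk factors (known atherosclerotic
                  disease counts the same as three or more factors)
    troponin:     0 = normal, 1 = 1-3x upper limit, 2 = >3x upper limit
    """
    age_points = 0 if age < 45 else (1 if age < 65 else 2)
    rf_points = min(risk_factors, 2)  # caps at the 2-point maximum
    total = history + ecg + age_points + rf_points + troponin
    if total <= 3:
        band = "low"
    elif total <= 6:
        band = "moderate"
    else:
        band = "high"
    return total, band

# Example: a 58-year-old with a moderately suspicious history, normal ECG,
# two risk factors, and normal troponin. Identical inputs always give the
# same score, unlike a generative model's sampled answers.
print(heart_score(history=1, ecg=0, age=58, risk_factors=2, troponin=0))  # (4, 'moderate')
```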
In contrast, AI systems like ChatGPT can process an immensely broader range of variables. However, this capability does not necessarily translate into more reliable or accurate assessments.
In the study, ChatGPT’s performance varied significantly, disagreeing with fixed TIMI or HEART scores in about 45% to 48% of cases.
Moreover, when tested with a dataset containing 44 health variables per case, ChatGPT frequently contradicted its own earlier assessments, returning inconsistent judgments about the same cases nearly half of the time.
Despite these challenges, Dr. Heston remains optimistic about the potential applications of generative AI in healthcare.
He suggests that AI could be particularly useful in generating differential diagnoses, where it can offer several potential explanations for a patient’s symptoms, helping doctors to broaden their perspective and consider various possibilities.
This could be particularly valuable in complex cases where the diagnosis is not immediately apparent.
Furthermore, provided the confidentiality of medical records can be protected, AI could be used to quickly summarize the most relevant information about a patient in an emergency, potentially speeding up decision-making.
The study underscores a critical point: while AI can enhance the tools available to healthcare professionals, it is not yet reliable enough to replace traditional methods in critical care scenarios, particularly in cardiac risk assessment.
More research and development are needed to improve AI’s accuracy and consistency in medical applications, ensuring that it can one day be safely relied upon in high-stakes environments.
The research findings can be found in PLOS ONE.
Copyright © 2024 Knowridge Science Report. All rights reserved.