
ChatGPT fails at diagnosing child medical cases. It's wrong 83 percent of the time.

A new study suggests ChatGPT needs to go back to medical school.

Chase DiBenedetto

on January 4, 2024

ChatGPT fails at specialized medical diagnoses. Don't ditch those physicians yet. Credit: Bob Al-Greene / Mashable

OpenAI's ChatGPT is no closer to replacing your family physician, as the increasingly advanced chatbot failed to accurately diagnose the vast majority of hypothetical pediatric cases.

The findings were part of a new study published in JAMA Pediatrics on Jan. 2, conducted by researchers from Cohen Children's Medical Center in New York. The researchers analyzed the bot's responses to requests for diagnoses of childhood illnesses and found that it had an 83 percent error rate across tests.


The study used what are known as pediatric case challenges: medical cases originally posted to groups of physicians as learning opportunities (or diagnostic challenges) involving unusual or limited information. Researchers sampled 100 challenges published in JAMA Pediatrics and NEJM between 2013 and 2023.


ChatGPT provided incorrect diagnoses for 72 of the 100 cases and generated another 11 answers that were deemed "clinically related" to the correct diagnosis but too broad to count as correct. Together, those 83 misses account for the 83 percent error rate.


The researchers attribute part of this failure to the generative AI's inability to recognize relationships between certain conditions and external or preexisting circumstances, often used to help diagnose patients in a clinical setting. For example, ChatGPT did not connect "neuropsychiatric conditions" (such as autism) to commonly seen cases of vitamin deficiency and other restrictive-diet-based conditions.

The study concludes that ChatGPT needs continued training, with the involvement of medical professionals, that draws not on an internet-generated well of information, which can often recycle misinformation, but on vetted medical literature and expertise.

AI-based chatbots relying on large language models (LLMs) have previously been studied for their efficacy in diagnosing medical cases and in accomplishing the daily tasks of physicians. Last year, researchers tested generative AI's ability to pass the three-part United States Medical Licensing Exam, and it passed.

But while the technology is still heavily criticized for its training limits and potential to exacerbate medical bias, many medical groups, including the American Medical Association, don't view the advancement of AI in the field solely as a threat of replacement. Instead, better-trained AIs are considered ripe for administrative and communicative uses, like generating patient-facing text, explaining diagnoses in plain terms, or writing instructions. Clinical uses, like diagnostics, remain controversial and hard to research.

To that end, the new report represents the first analysis of a chatbot's diagnostic potential in a purely pediatric setting, acknowledging the specialized training that medical professionals undergo. The chatbot's current limitations show that even the most advanced chatbot on the public market can't yet compete with the full range of human expertise.

