ChatGPT fails at diagnosing child medical cases. It's wrong 83 percent of the time.

A new study suggests ChatGPT needs to go back to medical school.
By Chase DiBenedetto
ChatGPT fails at specialized medical diagnoses. Don't ditch those physicians yet. Credit: Bob Al-Greene / Mashable

OpenAI's ChatGPT is no closer to replacing your family physician, as the increasingly advanced chatbot failed to accurately diagnose the vast majority of hypothetical pediatric cases.

The findings come from a new study published in JAMA Pediatrics on Jan. 2, conducted by researchers from Cohen Children's Medical Center in New York. The researchers analyzed the bot's responses to requests to diagnose childhood illnesses and found an 83 percent error rate across the cases tested.

The study used what are known as pediatric case challenges: medical cases originally posted to groups of physicians as learning opportunities (or diagnostic challenges) involving unusual or limited information. Researchers sampled 100 challenges published in JAMA Pediatrics and NEJM between 2013 and 2023.



ChatGPT provided incorrect diagnoses for 72 of the 100 cases and generated 11 answers that were deemed "clinically related" to the correct diagnosis but too broad to be considered correct. Together, those 83 misses make up the 83 percent error rate.
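For readers who want to check the math, the short Python sketch below reproduces the headline figure from the two counts reported above. The function and variable names are illustrative, and the count of correct diagnoses is an inference that assumes every case fell into one of three buckets (incorrect, too broad, or correct).

```python
# Counts taken from the article; names here are illustrative only.
TOTAL_CASES = 100   # case challenges sampled by the researchers
INCORRECT = 72      # diagnoses judged outright incorrect
TOO_BROAD = 11      # "clinically related" but too broad to count as correct


def error_rate(incorrect: int, too_broad: int, total: int) -> float:
    """Return the share of cases that were not scored as a correct diagnosis."""
    return (incorrect + too_broad) / total


rate = error_rate(INCORRECT, TOO_BROAD, TOTAL_CASES)

# Assumption: the three categories are exhaustive, so the rest were correct.
correct = TOTAL_CASES - INCORRECT - TOO_BROAD

print(f"error rate: {rate:.0%}, correct diagnoses: {correct}")
```

Under that assumption, the remaining cases are the ones ChatGPT diagnosed correctly.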

The researchers attribute part of this failure to the generative AI's inability to recognize relationships between certain conditions and external or preexisting circumstances, often used to help diagnose patients in a clinical setting. For example, ChatGPT did not connect "neuropsychiatric conditions" (such as autism) to commonly seen cases of vitamin deficiency and other restrictive-diet-based conditions.

The study concludes that ChatGPT needs continued training with the involvement of medical professionals, so that the AI is fed not from an internet-generated well of information, which can often cycle in misinformation, but from vetted medical literature and expertise.

AI-based chatbots relying on Large Language Models (LLMs) have previously been studied for their efficacy in diagnosing medical cases and in accomplishing the daily tasks of physicians. Last year, researchers tested generative AI's ability to pass the three-part United States Medical Licensing Exam. It passed.

But while AI is still highly criticized for its training limits and potential to exacerbate medical bias, many medical groups, including the American Medical Association, don't view its advancement in the field solely as a threat of replacement. Instead, better-trained AIs are considered ripe for administrative and communicative uses, like generating patient-side text, explaining diagnoses in common terms, or generating instructions. Clinical uses, like diagnostics, remain a controversial and hard-to-research topic.

To that end, the new report represents the first analysis of a chatbot's diagnostic potential in a purely pediatric setting, one that acknowledges the specialized training undertaken by medical professionals. ChatGPT's current limitations show that even the most advanced chatbot on the public market can't yet compete with the full range of human expertise.

Chase DiBenedetto
Social Good Reporter

Chase joined Mashable's Social Good team in 2020, covering online stories about digital activism, climate justice, accessibility, and media representation. Her work also captures how these conversations manifest in politics, popular culture, and fandom. Sometimes she's very funny.
