
Jooish News

Belaaz

New Study Warns ChatGPT Health May Miss Critical Emergencies, Raising Safety Concerns

Feb 26, 2026·4 min read

ChatGPT Health frequently overlooks situations requiring urgent medical attention and often fails to identify suicidal thoughts, according to a new analysis of the AI tool, reported by The Guardian on Thursday. Specialists say the findings suggest the system could “feasibly lead to unnecessary harm and death.”

The Health feature, rolled out to select users in January, is marketed as a way for people to “securely connect medical records and wellness apps” so the chatbot can generate tailored health advice. The platform is already receiving more than 40 million health-related queries daily.

Researchers released the first independent safety assessment of the feature in the February issue of Nature Medicine, concluding that the system underestimated the severity of more than half of the medical cases it reviewed.

Lead researcher Dr Ashwin Ramaswamy said the team sought to answer a basic but critical question: “if someone is having a real medical emergency and asks ChatGPT Health what to do, will it tell them to go to the emergency department?”

Ramaswamy’s group designed 60 detailed patient case studies ranging from minor issues to life-threatening conditions. Three physicians independently reviewed each case and agreed on the appropriate level of care based on clinical standards.

Researchers then presented the same cases to ChatGPT Health, altering details such as gender, lab results, and input from friends or family, generating close to 1,000 AI responses.

The AI’s recommendations were matched against the physicians’ guidance.

Although the system performed strongly in clear-cut emergencies like strokes or severe allergic responses, it faltered in more nuanced situations. In one asthma example, the chatbot recommended waiting instead of urging immediate treatment, despite acknowledging signs pointing to respiratory failure.

Overall, in 51.6% of cases requiring an urgent hospital visit, the system advised patients to stay home or schedule a routine appointment. Researcher Alex Ruani described the findings as “unbelievably dangerous.”

“If you’re experiencing respiratory failure or diabetic ketoacidosis, you have a 50/50 chance of this AI telling you it’s not a big deal,” she said. “What worries me most is the false sense of security these systems create. If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life.”

Ruani noted one scenario in which, 84% of the time, the system recommended a delayed appointment for a woman who appeared to be suffocating, a wait she would not have survived. At the same time, 64.8% of healthy individuals were mistakenly told to seek immediate care.

The model also reduced urgency nearly twelvefold when a “friend” in the scenario dismissed the symptoms as unimportant.

“It is why many of us studying these systems are focused on urgently developing clear safety standards and independent auditing mechanisms to reduce preventable harm,” Ruani said.

An OpenAI spokesperson said the company supports external evaluation of its health-related tools but argued the study does not represent typical usage. The spokesperson added that the system is frequently updated and improved.

Despite this, Ruani said the controlled scenarios still point to a “plausible risk of harm”, which she argued is enough to justify stronger protections and oversight.

Ramaswamy, who teaches urology at the Icahn School of Medicine at Mount Sinai, expressed particular alarm about the AI’s inconsistent response to users expressing thoughts of self-harm.

“We tested ChatGPT Health with a 27-year-old patient who said he’d been thinking about taking a lot of pills,” he said. When the patient described only those symptoms, the chatbot displayed a crisis intervention banner linking to suicide support every time.

“Then we added normal lab results,” he said. “Same patient, same words, same severity. The banner vanished. Zero out of 16 attempts. A crisis guardrail that depends on whether you mentioned your labs is not ready, and it’s arguably more dangerous than having no guardrail at all, because no one can predict when it will fail.”

Digital policy scholar Paul Henman said the findings are “a really important paper”.

He warned that using ChatGPT Health at home could lead to more unnecessary medical visits for minor concerns, while also causing dangerous delays for people needing swift emergency care. This, he said, could “feasibly lead to unnecessary harm and death”.

Henman also said the study raises legal questions, noting lawsuits have already emerged involving AI chatbots and cases of self-harm or suicide.

“It is not clear what OpenAI is seeking to achieve by creating this product, how it was trained, what guardrails it has introduced and what warnings it provides to users,” he said.

“Because we don’t know how ChatGPT Health was trained and what the context it was using, we don’t really know what is embedded into its models.”
