What ChatGPT Health can actually tell you — and what it can’t

How many times have you asked ChatGPT for health advice? Maybe about a mysterious rash, or tightness in your right calf after a long run. I have, on both counts. ChatGPT even correctly diagnosed the mysterious rash I developed during my first winter in Boston (cold urticaria) a week before my doctor confirmed it.

More than 230 million people ask health-related questions on ChatGPT every week, according to OpenAI. People have been typing their health fears into search boxes since the early days of the internet, but the interface has changed: instead of scrolling through endless results, you can now have something closer to a face-to-face conversation.

In the past week, two of the largest AI companies have fully embraced this reality. OpenAI launched ChatGPT Health, a dedicated area within the larger chat interface where users can link their medical records, Apple Health data, and stats from other fitness apps to receive personalized answers. (It's currently available only to a small group of users, though the company says it will eventually be open to everyone.) Just a few days later, Anthropic announced a similar consumer-focused tool for Claude, alongside several others aimed at healthcare professionals and researchers.

Both consumer-facing AI tools come with disclaimers (not intended to diagnose; consult a professional) that were likely written for liability reasons. But those warnings won't stop the hundreds of millions of people who already use chatbots to make sense of their symptoms.

If anything, these companies may be underselling the technology: AI is genuinely good at diagnosis; several studies suggest it is one of the technology's best use cases. But there are real trade-offs, around both data privacy and AI's tendency to tell people what they want to hear, that you should understand before linking your medical records to a chatbot.

Let’s start with what AI is actually good at: diagnosis.

Diagnosis is largely pattern matching, which is also how AI models are trained in the first place. The model takes symptoms or data, matches them against known conditions, and arrives at an answer. These are patterns doctors have validated over decades: these symptoms mean this disease, this kind of image indicates this condition. AI has been trained on millions of such labeled cases, and it shows.
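
To make the idea concrete, here's a toy sketch of diagnosis-as-pattern-matching. It is purely illustrative: the condition profiles, symptom names, and scoring rule are invented for this example, and real models learn these associations from millions of labeled cases rather than from a hand-written table.

```python
# Toy illustration of diagnosis-as-pattern-matching; the condition
# profiles below are invented for this example, not medical reference data.
KNOWN_CONDITIONS = {
    "cold urticaria": {"hives", "itching", "cold exposure"},
    "contact dermatitis": {"rash", "itching", "new soap or lotion"},
    "cellulitis": {"rash", "warmth", "swelling", "fever"},
}

def rank_conditions(symptoms: set[str]) -> list[tuple[str, float]]:
    """Score each condition by how much of its symptom profile the patient matches."""
    scores = [
        (name, len(symptoms & profile) / len(profile))
        for name, profile in KNOWN_CONDITIONS.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

print(rank_conditions({"hives", "itching", "cold exposure"}))
# [('cold urticaria', 1.0), ('contact dermatitis', 0.33...), ('cellulitis', 0.0)]
```

A large language model does nothing this explicit, of course; the point is only that mapping observed features onto known categories is exactly the kind of task that rewards exposure to many labeled examples.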

In one 2024 study, GPT-4 (OpenAI's leading model at the time) achieved over 90 percent diagnostic accuracy on complex clinical cases, such as patients with atypical rashes. Human physicians using conventional resources came in around 74 percent. In a separate study published this year, top models outperformed doctors at identifying rare diseases from images, including aggressive skin cancers, birth defects, and internal bleeding, sometimes by 20 percentage points or more.

Things get murkier with treatment. Doctors have to choose the right medication, but also figure out whether the patient will actually take it. The twice-daily pill may work better, but will they remember both doses? Can they afford it? Do they have transportation to the infusion center? Will they stick with it?

These are human questions that depend on context not captured in training data. And of course, a large language model can't prescribe you anything, and it doesn't have the reliable memory needed to manage a case over the long term.

“Management often doesn’t have the right answers,” said Adam Rodman, a physician at Beth Israel Deaconess Medical Center in Boston and a professor at Harvard Medical School. “It’s harder to teach a model.”

But OpenAI and Anthropic aren't marketing these as diagnostic tools. They're marketing something vaguer: AI as a personal health analyst. With ChatGPT Health and Claude, you can now connect Apple Health, Peloton, and other fitness trackers. The promise is that AI can analyze your sleep, exercise, and heart rate over time and surface meaningful trends from all of that disparate data.

One problem: there is no published independent research showing it can do this. The AI may notice that your resting heart rate is creeping up or that you sleep worse on Sundays. But spotting a trend is not the same as knowing what it means, and no one has validated which trends, if any, predict actual health outcomes. “There’s a vibe,” Rodman said.
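
For what it's worth, the mechanical part of trend-spotting is simple; the unvalidated part is knowing what a trend means. A minimal sketch, with made-up daily readings and an arbitrary 7-day window and threshold:

```python
# Rolling-average trend detection over daily resting heart rate.
# The data, window size, and alert threshold are all invented for
# illustration; nothing here is validated to predict health outcomes.
from statistics import mean

resting_hr = [58, 59, 57, 58, 60, 61, 62, 63, 64, 65, 66, 67]  # one reading/day

def rolling_mean(values: list[float], window: int = 7) -> list[float]:
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]

trend = rolling_mean(resting_hr)
drift = trend[-1] - trend[0]
if drift > 3:  # arbitrary threshold, purely illustrative
    print(f"Resting heart rate drifted up ~{drift:.1f} bpm over the period")
```

Any fitness app can compute this; the open question is whether such a drift actually signals anything about your health.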

Both companies have tested their products against internal benchmarks. OpenAI built HealthBench with hundreds of doctors; it tests how well models explain lab results, prepare users for appointments, and interpret wearable data.

HealthBench, however, is based on synthetic conversations, not real patient interactions. It's also text only, meaning it doesn't test what happens when you actually upload your Apple Health data. And the average benchmark conversation runs just 2.6 turns, a far cry from the anxious back-and-forth a worried user might keep up for days.

That doesn't mean ChatGPT's or Claude's new health features are useless. They can help you spot patterns in your habits, much as a migraine diary helps people identify triggers. But at this point it isn't validated science, and it's worth knowing the difference.

The bigger question is what AI can actually do with your health data and what you risk by using it.

OpenAI says health conversations are stored separately and that their contents are not used to train models, unlike most other chatbot interactions. But neither ChatGPT Health nor Claude's consumer-facing health features are covered by HIPAA, the law designed to protect information you share with doctors and insurers. (OpenAI and Anthropic do offer HIPAA-compliant enterprise software to hospitals and insurers.)

In the event of a lawsuit or criminal investigation, the companies would have to comply with a court order. Sara Geoghegan, senior counsel at the Electronic Privacy Information Center, told The Record that sharing medical records with ChatGPT can effectively strip those records of HIPAA protection.

At a time when reproductive care and gender-affirming care are under legal threat in several states, this is not an abstract concern. If you ask a chatbot questions about either topic and link your medical records, you're creating a data trail that could potentially be subpoenaed.

AI models, moreover, are not neutral repositories of information. They have a documented tendency to tell you what you want to hear. If you're anxious about a symptom, or fishing for reassurance that it's nothing serious, the model can pick up on your tone and shade its answer in a way a good doctor wouldn't.

Both companies say they have trained their health models to contextualize information and to flag when something warrants a doctor's visit, rather than simply agreeing with users. Newer models are more likely to ask follow-up questions when they're uncertain. How they perform in the real world remains to be seen.

And sometimes there's more at stake than a missed diagnosis.

A study published in December tested 31 leading AI models, including those from OpenAI and Anthropic, on real medical cases and found that the worst-performing model made recommendations carrying potentially life-threatening harm in about one in five scenarios. A separate study of an OpenAI-powered clinical decision support tool used in Kenyan primary care clinics found that when the AI did make a harmful suggestion, which happened rarely (about 8 percent of the time), clinicians accepted the bad advice nearly 60 percent of the time.
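
Taken at face value, those two figures compound. A back-of-the-envelope calculation, assuming the rates are independent and per-encounter (an assumption made here for illustration, not something either study reports):

```python
# Back-of-the-envelope arithmetic using the two rates quoted above.
# Treating them as independent per-encounter rates is an assumption
# made for illustration, not a result from either study.
p_harmful_suggestion = 0.08  # AI offers harmful advice ~8% of the time
p_accepted = 0.60            # clinicians accept that bad advice ~60% of the time

p_harm_reaches_patient = p_harmful_suggestion * p_accepted
print(f"~{p_harm_reaches_patient:.1%} of encounters")  # ~4.8%
```

Even a few percent is a lot when a tool sits in the loop of routine clinical care.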

These are not theoretical concerns. Two years ago, a California teenager named Sam Nelson died after asking ChatGPT to help him use recreational drugs safely. Cases like this are rare, and human doctors make deadly mistakes too; tens of thousands of people die every year from medical errors. But these stories show what can happen when people trust AI with consequential decisions.

It would be easy to read all of this and conclude that you should never ask a chatbot a health question. But that ignores why millions of people are already doing it.

The average wait for a primary care appointment in the U.S. is now 31 days, and in some cities, like Boston, it's over two months. When you finally get in, the visit lasts about 18 minutes. According to OpenAI, 7 out of 10 health-related ChatGPT conversations take place outside of clinic office hours.

Chatbots, by comparison, are available 24/7, and “they are infinitely patient,” Rodman said. They'll answer the same question asked five different ways. For many people, that's more than they get from the healthcare system.

So should you use these tools? There is no single answer. But here's a framework: AI is good at explaining things, like lab results, medical terminology, or the questions you should ask your doctor. It is unproven at finding meaningful trends in your health data. And it is not a substitute for a diagnosis from someone who can actually examine you.


