I Asked 5 AI Tools the Same Medical Question. I Got 5 Different Answers.

Part of “The AI You Don’t See” series by Akshay A. Walimbe

Last week, I ran a simple experiment. I opened five AI tools (ChatGPT, Google Gemini, Microsoft Copilot, Perplexity, and Claude) and asked each of them the exact same medical question.

“I am a 40 year old Indian male. I have been experiencing persistent headaches for the past two weeks, along with occasional dizziness and blurred vision. What could be causing this?”

The symptoms were realistic. Common enough to be relatable. Serious enough that you would want a proper answer. The kind of question that millions of Indians type into search bars and AI chatbots every single day because a doctor’s appointment costs money, takes time, and might not be available in your town.

Here is what I got.

ChatGPT led with hypertension. It gave me a structured list: high blood pressure, migraine, tension headaches, eye strain, sinusitis. Then, almost as an afterthought, it mentioned that persistent symptoms with blurred vision could indicate something more serious, like increased intracranial pressure. It recommended seeing a doctor urgently.

Gemini started with a disclaimer, then listed similar possibilities but in a different order: stress and tension headaches first, then eye problems (refractive errors, glaucoma), hypertension, migraine, medication side effects, and dehydration. It added a line about how Indian diets high in salt can contribute to blood pressure issues. It suggested visiting an ophthalmologist first.

Copilot was the most cautious. It spent more words on its disclaimer than on the actual answer. When it got to the possibilities, it mentioned hypertension, migraine with aura, and cervical spondylosis. It suggested an urgent visit to a neurologist. It did not mention eye problems at all.

Perplexity cited actual medical sources, linking to Mayo Clinic and WebMD articles. It presented hypertension and migraine as the most likely causes, added a section on when to seek emergency care (sudden severe headache, vision loss, confusion), and included a table comparing symptoms across conditions. Professional. Thorough. And notably different from the others in emphasis.

Claude took a more conversational approach. It acknowledged the anxiety that persistent symptoms cause, listed common and less common possibilities, and spent significantly more time on the “when to see a doctor immediately” section. It flagged that the combination of headaches, dizziness, AND blurred vision together warranted prompt medical evaluation rather than a wait-and-see approach. It was the most insistent about not delaying a doctor visit.

Five tools. Same question. Five different answers.

(A note on methodology: AI responses can vary between sessions and are updated over time as models change. If you run this same experiment today, your results may differ. The point is not the specific answers I received, but the pattern: the same input producing meaningfully different outputs.)

Not wildly contradictory, no. They all mentioned hypertension. They all said “see a doctor.” But the differences in emphasis, ordering, and what they chose to include or exclude were significant.

One told me to see an ophthalmologist first. Another said neurologist. One flagged emergency warning signs prominently. Another buried them in a list. One mentioned cervical spondylosis. Others did not. One brought up Indian dietary patterns. The rest did not.

If you are a 40-year-old man sitting in Indore at 11 PM with a headache and blurred vision, reading these five answers, which one do you follow? The one that says “this could be your eyes, visit an ophthalmologist”? Or the one that says “this combination warrants urgent evaluation, do not delay”?

The difference between those two answers could be the difference between catching dangerously high blood pressure early and missing it for another month.

Now, here is what troubles me.

I know these are AI tools. I know they are not doctors. You know that too. But does the average person typing a medical question into ChatGPT at midnight treat the answer with the same scepticism they would treat a random Google search result?

They do not. A 2025 YouGov survey found that while most Americans use AI, only 18 per cent trust it to make decisions. Yet usage keeps growing, which suggests that the formatting and authority of AI responses may override stated scepticism in practice.

AI answers come formatted. Structured. Confident. They use medical terminology appropriately. They sound like they know what they are talking about. They present information with the authority of an encyclopaedia and the warmth of a concerned friend. They do not say “I am guessing” or “I have no medical training.” They say “based on the symptoms you described, possible causes include…”

That framing, that calm, structured, authoritative framing, creates a false sense of reliability. Not because the information is necessarily wrong. Much of it is accurate, drawn from legitimate medical sources. But because the presentation gives no indication of uncertainty.

None of the five tools told me how confident it was in its answer. None of them said “I am 60 per cent sure this is hypertension and 15 per cent sure this could be something serious.” None of them told me which of their suggestions was based on strong medical evidence and which was based on pattern matching across internet text. None of them distinguished between “most people with these symptoms have this” and “but some people with these symptoms have something dangerous.”

They all sounded equally sure about everything they said.

This matters more in India than perhaps anywhere else in the world.

India has, according to the National Medical Commission, recently reached approximately one doctor for every 834 people, which on paper is better than the WHO’s recommended minimum of one per 1,000. But these numbers are national averages. In rural India, the ratio is far worse. There are districts in Jharkhand, Chhattisgarh, and Madhya Pradesh where a single primary health centre serves tens of thousands of people. Getting to a specialist, whether an ophthalmologist or a neurologist, can mean travelling to the nearest city, which might be hours away.

In this context, AI is not a convenience. It is filling a vacuum. When the doctor is not available, the phone is. When a consultation costs money you do not have, the chatbot is free. When you cannot take a day off work to sit in an OPD queue, a thirty-second AI response feels like a reasonable substitute.

Millions of Indians are already using AI tools for medical questions. Not as a complement to professional care. As a replacement for it. Because for many people, there is no professional care to complement.

This is not a theoretical concern. The Epic Sepsis Model, a proprietary AI algorithm deployed at hundreds of American hospitals, was independently evaluated by researchers at Michigan Medicine. Their 2021 study, published in JAMA Internal Medicine, found the model had a sensitivity of just 33 per cent, meaning it missed roughly two out of every three sepsis cases. A system that hospitals trusted, that was used in clinical settings with trained doctors overseeing it, performed far worse in independent testing than its developer had claimed. That was in hospitals with full medical infrastructure. What happens when AI medical advice reaches populations with no medical infrastructure to catch the errors?
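For readers who want to check the arithmetic behind “two out of every three”, here is a minimal sketch. Sensitivity is the share of real cases a model catches, so the share it misses is simply one minus that number. The cohort of 300 patients is a made-up round figure chosen for readability, not a number from the study, and the function name is my own.

```python
def missed_fraction(sensitivity: float) -> float:
    """Share of true cases a model fails to flag (the false-negative rate)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")
    return 1.0 - sensitivity


# Hypothetical cohort size, used only to make the numbers easy to read.
true_sepsis_cases = 300
# Sensitivity figure reported in the 2021 study mentioned above.
sensitivity = 0.33

caught = round(true_sepsis_cases * sensitivity)
missed = true_sepsis_cases - caught

print(f"Missed share: {missed_fraction(sensitivity):.0%}")
print(f"Caught {caught}, missed {missed} of {true_sepsis_cases} cases")
```

In this illustrative cohort, the model would flag 99 patients and miss 201, which is where the “roughly two out of three” figure comes from.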

And IBM’s Watson for Oncology, which was deployed at Manipal Hospitals in Bangalore starting in December 2015, was later found to have recommended unsafe treatments. According to internal IBM documents obtained by STAT News in 2018, the system recommended chemotherapy combined with bevacizumab for a patient with severe bleeding, directly contradicting the drug’s FDA black box warning. The system’s recommendations were based not on analysis of real patient outcomes, but on the treatment preferences of a small group of oncologists at Memorial Sloan Kettering in the United States, using synthetic (hypothetical) cases rather than real patient data. Indian doctors using Watson had no way to know this, because the system was a black box. IBM eventually sold its Watson Health division for approximately one billion dollars in 2022, having invested roughly four billion dollars into it.

I want to be clear about something. I am not saying AI is useless for health information. It is often genuinely helpful. It can point you in the right direction. It can flag symptoms you should not ignore. It can provide context that helps you have a better conversation with your doctor. In some areas, like detecting diabetic retinopathy from retinal scans or flagging potential cancers in radiology images, AI systems have shown real promise, sometimes matching specialist-level accuracy.

But AI tools do not tell you what they do not know. They do not flag the gaps in their training data. They do not tell you that their medical knowledge is derived primarily from Western clinical literature and may not account for conditions more prevalent in South Asian populations. They do not tell you that the treatment guidelines they reference are often American or European, not Indian.

And critically, they do not tell you when they are wrong.

A doctor who is unsure will say “I am not sure, let us run some tests.” A doctor who suspects something rare will refer you to a specialist. A doctor has a duty of care and a licence that can be revoked for negligence.

An AI tool has none of these things. It has a disclaimer at the bottom of the screen that nobody reads. And it has confidence: manufactured, formatted, bullet-pointed confidence that feels like expertise even when it is pattern matching.

My experiment was trivial. I asked a generic question and compared answers. Nobody’s life was at stake.

But for the millions of Indians who are asking real questions about real symptoms (chest pain, a lump they found, a child with a fever that will not break, an elderly parent who is suddenly confused), the stakes are as high as they get. They are asking these questions because they have no alternative. And they are trusting the answer because the answer sounds trustworthy.

We have built AI tools that sound like doctors but are not held to any of the standards we hold doctors to. They have no training requirement, no licensing, no malpractice accountability, no obligation to tell you when they are uncertain, and no duty of care to you as a patient.

They just answer. Confidently. Every time.

How do you know when to trust the machine?

I have written a book about exactly this: how AI and automated systems make decisions about your life, where accountability disappears, and what we can do about it. If you want to know more about this book or order a copy, you can do it here: https://akshaywalimbe.com/beyond-bias/

Akshay Walimbe
