What is the story about?
Technology companies have been steadily expanding into healthcare, pitching AI as a tool that can simplify access to medical information.
From OpenAI’s ChatGPT Health to Google’s MedGamma 1.5 and Anthropic’s Claude for Healthcare, a growing number of platforms now promise users quicker, more personalised health guidance. While many users report positive experiences, new research suggests the reality may be far more concerning.
Study finds widespread inaccuracies in AI health responses
A recent study published in BMJ Open has raised fresh concerns about the reliability of AI-generated medical advice. The research, first reported by Bloomberg, examined five widely used AI chatbots (ChatGPT, Gemini, Meta AI, Grok and DeepSeek) to assess how accurately they respond to health-related queries.
Researchers tested the systems using 10 questions spanning five medical categories. The findings were stark: around 50 per cent of all responses contained problematic or incorrect information. More alarmingly, nearly one in five answers was deemed highly problematic, posing potential risks if followed without professional guidance.
The study also revealed that these AI models tend to perform better when dealing with straightforward, closed-ended questions on well-established topics such as cancer or vaccines. However, their reliability drops significantly when faced with open-ended or complex issues, including nutrition and emerging fields like stem cell therapy.
Another key issue highlighted in the report is the lack of transparency and supporting evidence. In many cases, the chatbots failed to provide complete or verifiable medical references, raising doubts about the credibility of their outputs.
Confidence without clinical judgement raises red flags
Perhaps the most troubling aspect identified by researchers is the tone of certainty adopted by these systems. Despite lacking medical training, licensing or clinical judgement, the chatbots frequently presented their answers with authority and confidence.
This mismatch between tone and accuracy could mislead users into trusting flawed guidance. Across all responses analysed in the study, there were only two instances where a chatbot refused to answer a question, both involving Meta AI, suggesting that most systems prioritise responsiveness over caution.
The researchers warned that deploying such tools at scale without adequate safeguards could amplify the spread of medical misinformation. “These systems can generate authoritative-sounding but potentially flawed responses,” the authors noted, adding that the findings underscore the need to rethink how AI is used in public-facing health communication.
The timing of the study is significant. AI firms are increasingly positioning their platforms as healthcare companions, capable of offering insights based on user-provided data.
OpenAI’s ChatGPT Health, for instance, allows individuals to share personal health information to receive more tailored responses, a move that presents both opportunities and risks.
As AI continues to integrate into everyday decision-making, the study serves as a reminder that convenience does not always guarantee accuracy. In the case of healthcare, the stakes are particularly high, making reliability and oversight critical.