AI's Health Claims
Contrary to the hype surrounding artificial intelligence, current large language models (LLMs) are not yet equipped to give better health advice than a conventional internet search. A recent investigation found that people who relied on these chatbots for medical questions correctly identified their health problem only about one-third of the time, and only about 45 percent settled on the appropriate course of action. The result is disconcerting given how handily these same models ace rigorous medical licensing examinations. The gap between benchmark performance and practical use points to a serious limit on their real-world utility for health guidance, one that could cause real harm if the tools are not approached with caution.
Study Design & Findings
To assess how well AI chatbots perform as diagnostic aids in practice, a team of British researchers designed a study. They presented close to 1,300 UK participants with ten distinct health scenarios, ranging from common ailments such as a headache after drinking alcohol to more specific conditions such as the symptoms of gallstones or a new mother's extreme fatigue. Participants were then randomly assigned to consult one of three leading AI chatbots: OpenAI's GPT-4o, Meta's Llama 3, or Cohere's Command R+. A control group used traditional internet search engines instead. The results were stark: participants who used the chatbots were no better than the control group at identifying their health problem or deciding on the right course of action. That parity, despite the sophistication of the models, underscores how unreliable current chatbot technology remains as a source of health advice.
Communication Breakdown
Why do chatbots that score so highly on medical benchmarks perform so poorly with real users? The discrepancy comes down to a two-way communication breakdown. Unlike the controlled, simulated patient-doctor scenarios often used for AI testing, real users frequently fail to give the chatbot all the relevant information about their symptoms and history. In the other direction, users struggle to interpret the often complex or ambiguous advice a chatbot returns, misreading it or disregarding it outright. Even a model with vast medical knowledge, in other words, may never receive the full picture, and the advice it does give may not be understood or acted upon, with potentially dangerous results.
Risks & Recommendations
An estimated one in six US adults already asks AI chatbots for health information at least once a month, which makes public awareness of the risks urgent. Experts warn that asking LLMs about symptoms can be perilous, producing incorrect diagnoses and masking situations that require immediate medical attention. Bioethicist David Shaw of Maastricht University stresses that chatbots pose real medical risks and advises the public to rely only on trusted, verified sources of health information, such as national health services. The takeaway: however much AI may improve, it is not yet a substitute for professional medical consultation, and anyone turning to these tools for health-related queries should do so with extreme caution and critical judgment.