Mass General Brigham Study Reveals AI Chatbots' Limitations in Medical Diagnostics
A study by Mass General Brigham highlights the limitations of generative AI models in medical diagnostics, particularly in generating differential diagnoses. The researchers evaluated 21 large language models (LLMs) on standardized clinical cases and found that while models such as GPT-5 and Gemini 3.0 Flash can achieve high accuracy on final diagnoses, they struggle with the initial stages of clinical reasoning. The study emphasizes that, despite improvements, these models are not yet ready for unsupervised clinical deployment. The findings suggest that AI can augment but not replace physician reasoning, and they point to a gap in the models' ability to handle uncertainty in medical contexts.