Artificial intelligence is evolving every day, and its use in medical sciences is not new. Several researchers and tech giants have already initiated projects in which AI models help doctors analyse patient symptoms and even recommend treatments. In a similar light, a new study has found that a large language model (LLM) was able to diagnose patients more accurately than human doctors in a real-life emergency room.
Here’s What The Study Found
As per the study, conducted at Harvard Medical School and Beth Israel Deaconess Medical Center and published in the journal Science, researchers examined the performance of LLMs in medical contexts, including a real-life emergency room. They found that at least one LLM could diagnose patients more accurately than real doctors: it gave an exact or very close diagnosis in 67 per cent of cases, compared with an accuracy rate of 50-55 per cent for the physicians.

Based on trials that analysed the responses of hundreds of doctors against LLMs, the findings highlight that AI-powered systems are inching closer to supporting real-life doctors with decision-making. AI adoption in healthcare is gaining traction; as per the American Medical Association (AMA), nearly one in five physicians in the United States uses AI tools to assist diagnosis.

Arjun (Raj) Manrai, co-author of the study and assistant professor of biomedical informatics in the Blavatnik Institute at HMS, said in a release, “We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines.”

In one experiment, the study used 76 emergency room cases in which decisions ranged from prioritising care to ICU admissions, comparing diagnoses made by physicians with those generated by OpenAI’s o1 and 4o models. Notably, when two other doctors, unaware of which diagnoses came from the AI, evaluated the results, they concluded that at each stage of the emergency room diagnosis process, the o1 model performed nominally better than or on par with the human physicians and the 4o model.

The study’s other co-author, Peter Brodeur, HMS clinical fellow in medicine at Beth Israel Deaconess, stated, “Models are increasingly capable. We used to evaluate models with multiple-choice tests; now they are consistently scoring close to 100 percent, and we can’t track progress anymore because we’re already at the ceiling.”
What Lies Ahead
It is noteworthy that the researchers believe the study does not suggest AI can replace real-life human doctors. It only shows how AI’s role in medicine can be studied, as new developments take place, through rigorous clinical trials and carefully controlled medical scenarios.