Harvard Study: AI Outperforms Doctors in ER Diagnoses

Emergency rooms are among the most high-pressure environments in medicine. Less information, less time, and higher stakes than almost any other clinical setting. A new Harvard study just found that AI outperforms human doctors at exactly that moment.
The Emergency Room Results
The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two internal medicine attending physicians to those generated by OpenAI's o1 and GPT-4o models.
The results were clear. Working from triage information alone, the o1 model offered "the exact or very close diagnosis" in 67% of cases, compared with 55% for one physician and 50% for the other.
As more patient information became available, the AI's advantage grew even larger. With fuller records, its diagnostic accuracy rose to 82%, compared with 70% to 79% for the doctors.
Across the broader study, the o1 reasoning model excelled at both diagnosing patients and making decisions about managing their care, matching and often outperforming the physicians as well as the earlier GPT-4o model.
One case illustrated the stakes perfectly. A patient came into the emergency department with a pulmonary embolism. The condition initially improved with treatment, then worsened. Doctors suspected the medication was failing. The AI model, using the same electronic health records available at the time, flagged a possible history of lupus, an autoimmune disease that can lead to heart inflammation. That turned out to be the correct explanation.
The data handling was deliberately challenging. In fact, in Harvard Medical School’s press release, the researchers emphasised that they did not “pre-process the data at all.” Instead, the AI models were presented with the same information available in the electronic medical records at the time of each diagnosis.
What the Study Does and Does Not Claim
The researchers were careful about the limits of their findings. The study did not claim that AI is ready to make real life-or-death decisions in the emergency room. Instead, the results show "an urgent need for future trials to evaluate these technologies in real-world patient care settings."
"I think it does mean that we're witnessing a really profound change in technology that will reshape medicine," said study author Manrai. What comes next is harder: testing these systems in live clinical settings.
Dr Adam Rodman, a clinical researcher at Beth Israel, was equally measured but clear. “This is the big conclusion for me. It works with the messy real-world data of the emergency department. It works for making diagnoses in the real world.”
In that sense, the Harvard findings are not an argument for replacement. They are an argument for readiness. The technology is ready to help. The question now is whether health systems are ready to let it.
