AI diagnosis outperforms emergency room doctors

A landmark study out of Harvard Medical School has found that artificial intelligence can outperform human physicians in diagnosing patients, and the results are hard to ignore.

Researchers from Harvard Medical School and Beth Israel Deaconess Medical Center published the study this week in the journal Science, putting OpenAI’s models head-to-head with real emergency room doctors on real patient cases. The findings? AI diagnosis outperforms emergency room doctors at the most critical moment: initial triage, when the least information is available and the stakes are highest.

The study examined 76 actual patients who walked into the Beth Israel emergency room. Two attending physicians made their diagnoses, and so did OpenAI’s o1 and 4o models. A separate pair of physicians then evaluated all the diagnoses without knowing which came from humans and which came from a machine.

OpenAI’s o1 model nailed the exact or near-exact diagnosis in 67% of triage cases. The two human doctors? One hit the mark 55% of the time, the other just 50%. The AI didn’t just edge ahead; it held its ground or pulled further ahead at every stage of the diagnostic process.

“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” said Arjun Manrai, who leads an AI lab at Harvard Medical School and was one of the study’s lead authors, according to Harvard’s official press release.

Crucially, the AI was given no special treatment. It received exactly the same information from electronic medical records that the attending physicians had: no pre-processing, no shortcuts.

That said, researchers were careful not to declare AI ready to take over emergency rooms. The study calls for urgent real-world trials before any such move, and notes that the models were only tested on text-based data, a significant limitation in complex clinical environments. There are also unresolved questions around accountability. Beth Israel doctor and co-lead author Adam Rodman told The Guardian that there is “no formal framework right now for accountability” around AI diagnoses, and that patients still want human guidance through life-or-death decisions.

Still, the fact that AI diagnosis outperforms emergency room doctors, even if only in a controlled research setting, is a milestone that the medical world will not be able to brush aside easily. The conversation about AI’s role at the hospital bedside just got a lot more serious.