AI Medical Scribes Are Hallucinating in Clinics
Doctors are under pressure to move faster, document more, and spend less time typing during visits. That pressure explains the rise of AI medical scribes, software that listens to appointments and turns them into clinical notes. The pitch is easy to like. Less screen time for doctors. More face time for patients. But an Ontario audit, reported by Ars Technica, found a problem that should stop anyone from treating these tools as harmless back-office software. Some systems appear to be making things up in medical records.
That matters now because a bad note is not just a clerical error. It can shape diagnosis, billing, treatment decisions, and follow-up care. If an AI system inserts symptoms that were never mentioned or leaves out details that were, the damage can spread quietly through the chart. And once bad data lands in a record, it tends to stick.
What stands out
- AI medical scribes can introduce false or missing details into patient charts.
- An Ontario audit suggests these tools are not reliable enough to trust without close review.
- Errors in clinical notes can affect treatment, referrals, billing, and legal records.
- Hospitals and clinics need strict human review, not blind faith in automation.
Why the AI medical scribes problem is bigger than it looks
People hear “note-taking tool” and assume low stakes. That is the wrong frame. In healthcare, the note often becomes the official version of what happened. Other clinicians read it later. Insurers may review it. Lawyers might too.
Look, if a chatbot writes a clumsy email, you shrug and fix it. If a medical documentation system invents chest pain, records a family-history denial that was never made, or drops a drug side effect, the consequences are far more severe. The record becomes a map for everyone who follows.
Clinical documentation is not a rough draft. It is part of patient care.
This is where AI hype runs into clinical reality. Vendors often sell ambient documentation tools as if they were low-risk digital assistants. But healthcare notes are dense, contextual, and full of edge cases. People speak vaguely. Doctors interrupt. Patients jump between timelines. Background noise matters. So does phrasing.
And large language models are still prone to hallucination, even when the task seems narrow.
What the Ontario audit tells us about AI medical scribes
According to the Ars Technica report, an Ontario audit found that AI note-taking systems used in medical settings could generate inaccurate information. That includes details that did not come from the patient visit. For a sector already grappling with AI reliability, this is a flashing red light.
The core issue is not surprising to anyone who has covered AI tools closely over the past few years. Language models do not “know” facts in the way many buyers assume. They predict plausible text. Usually that looks impressive. Sometimes it goes off the rails. In medicine, “plausible” is not good enough.
Here’s the thing. Clinical notes are a terrible place to tolerate confident guessing.
What can go wrong in a patient note?
- Invented symptoms: A system may add complaints or qualifiers that sound medically tidy but were never said.
- Missing context: It may drop uncertainty, timing, or negation. “No chest pain” turning into “chest pain” is the nightmare scenario, but subtler errors can matter too (a rough check for this failure mode is sketched after this list).
- Medication mistakes: A bad summary can mix up dose, frequency, or side effects discussed during the visit.
- False precision: The note may present guesses as settled facts, which can mislead the next clinician.
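To make the negation risk concrete, here is a minimal sketch of the kind of screen a review tool could run before a note reaches the chart. It is illustrative only: the NEGATION_CUES pattern, the negation_flips helper, and the term list are assumptions, not any audited product's logic, and real clinical NLP needs far more than regular expressions.

```python
import re

# Cue words that often signal a negated finding in dictated speech.
# Illustrative list only; real systems use trained negation detectors.
NEGATION_CUES = r"\b(no|denies|denied|without|negative for)\b"

def negated(text: str, term: str) -> bool:
    """True if `term` appears within a few words of a negation cue."""
    pattern = NEGATION_CUES + r"\W+(?:\w+\W+){0,3}?" + re.escape(term)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def affirmed(text: str, term: str) -> bool:
    """True if `term` appears and is not negated nearby."""
    return (re.search(re.escape(term), text, flags=re.IGNORECASE) is not None
            and not negated(text, term))

def negation_flips(transcript: str, note: str, terms: list[str]) -> list[str]:
    """Terms the transcript negates but the generated note asserts."""
    return [t for t in terms if negated(transcript, t) and affirmed(note, t)]

transcript = "Patient denies chest pain. Reports mild cough for two days."
note = "Presents with chest pain and a mild cough."
print(negation_flips(transcript, note, ["chest pain", "cough"]))
# ['chest pain'] -> this note should go to a human, not the chart
```

Even a crude screen like this makes the point: a transcript-to-note mismatch should trigger review, not a signature.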
Why these systems slip: transcription is only half the job
Many people think an AI scribe just transcribes speech. It does not. It interprets, organizes, compresses, and rewrites. That is a much harder task. The software has to decide what matters, what belongs in the history of present illness, what should be left out, and how to phrase uncertainty.
That is like asking a line cook to plate a tasting menu from overheard kitchen chatter. Some dishes will come out fine. Some will have the wrong ingredients.
A raw transcript can contain noise, overlaps, slang, and fragmented speech. The model then turns that mess into polished medical prose. But polished prose can hide shaky reasoning. Honestly, that is part of the danger. The note may look cleaner than what a rushed human would write, which can make the error harder to spot.
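One way to surface that shaky reasoning is to ask, sentence by sentence, whether the polished note is grounded in the raw transcript at all. The sketch below is a deliberately crude, hypothetical version of that idea: unsupported_sentences, its stopword list, and the 0.5 overlap threshold are assumptions for illustration, and serious provenance tooling would use alignment or entailment models rather than word overlap.

```python
import re

# Words too common to count as evidence of grounding. Illustrative only.
STOPWORDS = {"the", "a", "an", "and", "of", "with", "for", "to",
             "is", "was", "patient", "reports", "denies", "presents"}

def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS}

def unsupported_sentences(transcript: str, note: str,
                          min_overlap: float = 0.5) -> list[str]:
    """Flag note sentences whose content words mostly never
    appear in the transcript. The threshold is arbitrary."""
    source = content_words(transcript)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        words = content_words(sentence)
        if words and len(words & source) / len(words) < min_overlap:
            flagged.append(sentence)
    return flagged

transcript = "I've had a mild cough for two days. No fever."
note = ("Mild cough for two days. Denies fever. "
        "Reports intermittent chest tightness.")
for s in unsupported_sentences(transcript, note):
    print("REVIEW:", s)
# REVIEW: Reports intermittent chest tightness.
```

A flagged sentence is not proof of fabrication, but it is exactly the kind of line a rushed reader would otherwise skim past.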
What doctors and clinics should do next
If your organization is using AI documentation tools, the standard should be simple: no note gets signed without careful human review. Not skimmed. Reviewed. That slows things down, yes, but the alternative is letting the software write parts of the medical truth unchecked.
Clinics should also push vendors on specifics instead of marketing gloss.
Questions every buyer should ask
- What independent audits has the product undergone?
- How often does the system add unsupported content?
- Can clinicians see the source transcript next to the generated note?
- How does the tool handle negation, medication names, and multiple speakers?
- What happens to audio and note data after the visit?
And there is a policy angle here. Health systems should treat these products less like convenience software and more like high-risk infrastructure. That means logging edits, tracking error rates, and giving clinicians a clear way to flag failures. If a product repeatedly invents facts, it should not stay in the workflow just because it saves time.
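What would “logging edits and tracking error rates” look like in practice? A minimal sketch follows. The NoteEdit fields and the “fabrication” category are hypothetical choices, not any vendor's real schema, but even this much bookkeeping would show which products keep needing correction.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class NoteEdit:
    """One clinician correction to an AI-drafted note."""
    note_id: str
    clinician_id: str
    category: str   # e.g. "fabrication", "omission", "wording"
    before: str
    after: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

class ScribeAuditLog:
    """Collects edits so error rates can be trended per product."""
    def __init__(self) -> None:
        self.edits: list[NoteEdit] = []

    def record(self, edit: NoteEdit) -> None:
        self.edits.append(edit)

    def error_rates(self) -> Counter:
        return Counter(e.category for e in self.edits)

log = ScribeAuditLog()
log.record(NoteEdit("note-001", "dr-lee", "fabrication",
                    before="Reports chest pain.",
                    after="Denies chest pain."))
print(log.error_rates())  # Counter({'fabrication': 1})
```

If the “fabrication” count climbs for one product month after month, that is the data procurement teams need to pull it from the workflow.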
What patients should know about AI medical scribes
Patients are often told these systems help doctors focus on the conversation. Fair enough. But you should also know that any generated note can contain mistakes. Asking whether AI is being used in the room is a reasonable question. So is asking how the doctor checks the final chart.
If your health system offers visit summaries or portal access, read them.
That one step can catch glaring errors before they spread to referrals, prescriptions, or future appointments. Patients have long been expected to correct paperwork mistakes after the fact. AI could make that burden heavier unless hospitals build stronger review processes from the start.
The real lesson for healthcare AI
This story is not an argument against AI in medicine. It is an argument against lazy deployment. There are real uses for speech recognition, summarization, and workflow support. But healthcare buyers keep making the same mistake seen in other industries. They confuse smooth demos with dependable systems.
A veteran clinician can often hear when a patient is unsure, scared, evasive, or contradicting an earlier statement. That texture matters. A model may flatten it into neat paragraphs and move on. The result can look solid while quietly stripping out the nuance that good care depends on.
The more polished the AI output looks, the more discipline humans need when checking it.
Where this heads next
Expect more audits, more procurement scrutiny, and more pressure on vendors to show real-world error data. They should have to. Medical records are too central to care to become a playground for probabilistic text generation dressed up as certainty.
If AI medical scribes are going to stay in exam rooms, they need tighter oversight, better disclosure, and a lot less hype. Otherwise, the biggest time-saver in the clinic may turn into one more source of hidden risk. And if that is where the industry is headed, what exactly are we automating for?