Clinical AI Does Not Need to Be Perfect. It Needs to Make Care Safer.

The baseline is not perfection. The baseline is the current error rate.

Jun 16, 2026

The most dangerous assumption in clinical AI governance is not that AI will make mistakes. It is that the current system does not.

Much of the healthcare AI conversation starts from an understandable place. If artificial intelligence is used in a clinical setting, it must be held to an extremely high standard because mistakes can harm patients. In the most serious cases, mistakes can cost lives. That concern is valid. But the framing is incomplete.

Too often, clinical AI is being compared against an imaginary version of healthcare: one where diagnoses are consistently timely, handoffs are complete, documentation is accurate, medication lists are clean, abnormal results never fall through the cracks, and every clinician has enough time, context, and cognitive bandwidth to make the right decision every time.

That is not the system patients experience today.

The real comparator for clinical AI is not perfection. The real comparator is the current clinical operating environment: fragmented records, overloaded clinicians, missed follow-ups, delayed diagnoses, medication errors, handoff failures, documentation burden, alert fatigue, staffing pressure, and enormous variation in care delivery.

So the right question is not: Can AI ever make a mistake?

The better question is: Can a governed AI-enabled workflow reduce the mistakes, omissions, delays, and preventable harms we already tolerate?

The status quo is not a zero-harm baseline

Healthcare leaders do not need to be reminded that clinical care is complex. But the AI governance conversation often treats “not using AI” as the safe default.

It is not.

The World Health Organization estimates that roughly one in ten patients is harmed in healthcare, with more than three million deaths annually linked to unsafe care. The WHO also estimates that more than half of patient harm is preventable.

In the United States, researchers from Johns Hopkins and collaborators estimated that diagnostic error contributes to roughly 795,000 deaths or permanent disabilities each year across clinical settings. The National Academies has warned that most people are likely to experience at least one diagnostic error in their lifetime.

These are not abstract numbers. They represent missed cancers. Delayed sepsis treatment. Stroke symptoms mistaken for something less urgent. Abnormal findings that never receive follow-up. Patients who bounce between sites of care while critical context disappears between systems, shifts, and specialties.

This is the uncomfortable truth: healthcare is already dangerous in ways we have normalized.

That does not mean clinicians are careless. Quite the opposite. Much of modern healthcare depends on extraordinary clinicians compensating for systems that are too fragmented, too complex, and too overloaded. But heroism is not a safety model. And asking human beings to continuously overcome broken workflows is not governance.

The wrong standard will produce the wrong decisions

If the standard for clinical AI is “never wrong,” then almost no technology will qualify.

But that is not how healthcare evaluates other interventions.

We do not require medications to have no side effects. We ask whether the benefits outweigh the risks for the right patient, at the right dose, under the right monitoring conditions.

We do not require surgery to be risk-free. We ask whether the procedure is clinically justified, whether the patient is properly selected, whether the team is trained, whether the risks are disclosed, and whether outcomes are monitored.

We do not require new clinical workflows to be flawless. We evaluate whether they improve care compared with the prior workflow.

Clinical AI should be held to a serious standard. But it should be the right serious standard. Not perfection. Net patient harm reduction.

The question is not whether an AI system introduces risk. It does. Every meaningful intervention in healthcare introduces risk. The question is whether the governed workflow, including the AI model, the clinician, the EHR, the escalation path, the monitoring process, the accountability model, and the patient population, reduces total risk compared with the workflow it replaces.
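That comparison can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative only: the `WorkflowRisk` type, the two harm categories, and every number in it are hypothetical placeholders, not figures from any study cited in this post. It shows just the shape of the calculation, where harm the workflow introduces is counted alongside harm it fails to prevent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowRisk:
    """Expected harm events per 1,000 patients for one workflow (hypothetical)."""
    missed_or_delayed: float  # harms the workflow fails to prevent
    introduced: float         # harms the workflow itself adds

    @property
    def total(self) -> float:
        return self.missed_or_delayed + self.introduced


def net_harm_change(baseline: WorkflowRisk, governed_ai: WorkflowRisk) -> float:
    """Negative result means the AI-enabled workflow lowers total expected harm."""
    return governed_ai.total - baseline.total


# Placeholder numbers for illustration only; real values must come from measurement.
baseline = WorkflowRisk(missed_or_delayed=12.0, introduced=0.0)
governed_ai = WorkflowRisk(missed_or_delayed=7.0, introduced=2.0)

change = net_harm_change(baseline, governed_ai)
print(f"Net change per 1,000 patients: {change:+.1f}")
```

With these made-up inputs the AI-enabled workflow adds two harm events and removes five, so the net change is negative. The same arithmetic can just as easily come out positive, which is the case for measuring rather than assuming.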

That distinction matters because an AI model can be imperfect and still help make care safer. Conversely, an AI model can score well in a technical evaluation and still make care worse if it is poorly integrated into clinical work.

The model is not the intervention

One of the biggest mistakes in healthcare AI governance is treating the model as the entire object of risk.

The model matters. Accuracy matters. Bias matters. Explainability matters. Security matters. Data quality matters. Drift matters. But in clinical care, the model is rarely the whole intervention.

The intervention consists of the model and the workflow.

It is the model plus the clinician reviewing the output. The model plus the EHR screen where the recommendation appears. The model plus the alarm threshold. The model plus the handoff process. The model plus the escalation pathway. The model plus the governance committee deciding what gets monitored after deployment.

A technically impressive AI tool can fail if it adds noise, creates false confidence, disrupts clinical judgment, or inserts itself at the wrong moment in the workflow.

A more modest AI tool can create significant value if it reduces omissions, closes loops, prioritizes attention, or helps clinicians recover context that the system otherwise hides.

That is why clinical AI governance must move beyond asking, “Is this model accurate?”

It must ask, “Does this AI-enabled system of care perform better than the current system of care?”

The hardest objection is real

A cautious CMO, CMIO, or CISO could fairly respond: human error and AI error are not the same kind of risk.

A tired clinician may make a mistake one patient at a time. A flawed model update, a drifted threshold, a broken data pipeline, a hidden integration error, or a biased training pattern can produce the same wrong output across thousands of patients before anyone notices.

That is the correlated-failure problem. And it is one of the strongest arguments for serious AI governance.

The answer is not to pretend AI risk is just ordinary clinical risk with new branding. It is not. AI can concentrate and scale failure in ways traditional workflows often do not. A lower average error rate is not enough if the system carries a much fatter tail: a small chance of a very large failure. So governance cannot only measure average performance. It has to bound systemic failure.
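A small calculation shows why the average alone can mislead. The sketch below compares two hypothetical workflows over 10,000 cases. In one, errors occur independently at a 2% rate. In the other, the normal error rate is only 1%, but there is a 1% chance that a shared fault (a bad update, a broken pipeline) pushes the error rate to 50% for every case at once. All rates and function names are assumptions chosen for illustration, not measurements.

```python
import math


def independent_errors(n: int, p: float) -> tuple[float, float]:
    """Mean and standard deviation of the error count when cases fail independently."""
    return n * p, math.sqrt(n * p * (1 - p))


def correlated_errors(
    n: int, p_normal: float, p_broken: float, q_broken: float
) -> tuple[float, float]:
    """Mean and standard deviation when a shared fault, occurring with
    probability q_broken, raises the error rate for all n cases at once."""
    mean = n * ((1 - q_broken) * p_normal + q_broken * p_broken)
    # Law of total variance: spread within each state plus spread between states.
    within = (1 - q_broken) * n * p_normal * (1 - p_normal) \
        + q_broken * n * p_broken * (1 - p_broken)
    between = q_broken * (1 - q_broken) * (n * p_broken - n * p_normal) ** 2
    return mean, math.sqrt(within + between)


# Hypothetical rates for illustration only.
human_mean, human_sd = independent_errors(n=10_000, p=0.02)
ai_mean, ai_sd = correlated_errors(
    n=10_000, p_normal=0.01, p_broken=0.50, q_broken=0.01
)

print(f"Independent errors: mean={human_mean:.0f}, sd={human_sd:.0f}")
print(f"Correlated errors:  mean={ai_mean:.0f}, sd={ai_sd:.0f}")
```

Under these assumptions the correlated workflow has the lower expected error count but a far wider spread, because its rare bad state produces thousands of errors at once. That spread is the quantity governance has to bound.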

That means monitoring drift. Testing model behavior across populations. Validating data pipelines. Controlling model updates. Watching for automation complacency. Creating back-pressure when output quality declines. Requiring human escalation paths. Auditing performance after deployment. Designing kill switches and rollback procedures before they are needed. This does not weaken the net-harm argument. It completes it.
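As one sketch of what a pre-built kill switch could look like, the class below tracks a rolling window of clinician-reviewed outputs and disables the model when the observed error rate crosses a bound. The class name, window size, and thresholds are hypothetical. A real deployment would need clinically justified thresholds, per-population breakdowns, and a defined fallback workflow, none of which are modeled here.

```python
from collections import deque


class RollingSafetyMonitor:
    """Trips when the error rate over recent reviewed outputs exceeds a bound."""

    def __init__(self, window: int, max_error_rate: float, min_samples: int) -> None:
        self.outcomes = deque(maxlen=window)  # True means the output was an error
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        self.tripped = False

    def record(self, was_error: bool) -> bool:
        """Record one reviewed output; return True if the model may stay enabled."""
        self.outcomes.append(was_error)
        if len(self.outcomes) >= self.min_samples:
            rate = sum(self.outcomes) / len(self.outcomes)
            if rate > self.max_error_rate:
                self.tripped = True  # stays tripped until a human resets it
        return not self.tripped

    def reset(self) -> None:
        """Re-enable only after human review; clears the window."""
        self.outcomes.clear()
        self.tripped = False


# Hypothetical thresholds for illustration only.
monitor = RollingSafetyMonitor(window=50, max_error_rate=0.10, min_samples=20)
for _ in range(30):
    monitor.record(False)  # thirty reviewed outputs with no error

still_enabled = [monitor.record(True) for _ in range(4)]  # four errors in a row
print(still_enabled)
```

The design choice that matters is that the monitor latches. Once tripped, it stays off until a person resets it, so a degraded model cannot quietly re-enable itself when the window happens to look better.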

The goal is not to ignore AI’s unique risks. The goal is to govern those risks while also refusing to ignore the harm already embedded in current clinical workflows.

What safer-than-the-status-quo can look like

We do not need to start with science fiction examples of autonomous AI physicians. In fact, that framing may be part of the problem.

Some of the most practical opportunities for clinical AI may be less dramatic and more useful: reducing documentation burden, improving handoffs, closing follow-up gaps, surfacing missed information, supporting medication reconciliation, prioritizing worklists, and helping clinicians see the signal inside fragmented data.

Microsoft’s Dragon Copilot, which now includes DAX Copilot, is one example. The point is not that ambient documentation solves clinical safety. It does not. But documentation burden and clinician burnout are not merely workforce issues; they are reliability issues. A Providence study reported that clinicians using DAX Copilot experienced reduced documentation burden, reductions in burnout and documentation frustration, and 2.5 fewer hours per week of off-hours documentation.

Oracle Health offers a similar workflow example. Its Clinical AI Agent is aimed at documentation, coding, scheduling, and workflow coordination across more than 30 specialties, with Oracle reporting a nearly 30% reduction in physician documentation time. That figure should be treated as vendor-reported rather than independent clinical proof. But the use case is relevant: reducing manual burden and workflow friction can create more room for clinical attention.

Google Cloud’s work with HCA Healthcare on an AI-assisted Nurse Handoff app is another example of the right category of problem. Handoffs are among healthcare’s most fragile moments. Context gets lost. Tasks get buried. Risk transfers silently from one shift to the next. HCA and Google Cloud describe the tool as a way to support safer exchange of patient information during nurse handoffs, with nurses involved in development and review.

None of these examples proves measured patient-harm reduction by itself. That matters. Time saved, burden reduced, and handoffs improved are not the same as fewer adverse events.

But these examples do illustrate the kind of workflow target that leadership should care about: known sources of fragility where errors, omissions, and missed context can accumulate into harm.

That is the governance frame healthcare needs: not AI as magic, but AI as a measurable intervention against known failure modes.

Doing nothing is also a decision

Healthcare leaders are right to be cautious. Poorly governed AI can absolutely amplify harm. It can introduce bias. It can create automation complacency. It can generate false confidence. It can widen disparities. It can create new cybersecurity and privacy risks. It can be deployed by people who do not understand the clinical reality into which it is being inserted.

So yes, healthcare needs serious AI governance. But we should be honest about the alternative.

Saying “not yet” can feel conservative. Sometimes it is the right answer. But if the current workflow is producing measurable preventable harm, delay is not neutral. Doing nothing preserves the current error rate.

If abnormal results are being missed, doing nothing preserves that failure mode. If handoff communication is inconsistent, doing nothing preserves that failure mode. If clinicians are drowning in documentation, doing nothing preserves that failure mode. If diagnostic delays are harming patients, doing nothing preserves that failure mode.

The ethical question is not only whether AI might cause harm. It is also whether refusing to improve broken workflows allows existing harm to continue.

A leadership test for clinical AI

This is where CMOs, CMIOs, CIOs, CISOs, Chief Quality Officers, and boards have a more important role than approving or rejecting tools.

They need to change the question.

Not: “Is this AI safe?”

That question is too vague.

Instead, leadership should ask five more useful questions:

1. What existing harm, delay, error, omission, or burden are we trying to reduce?
2. What is the current baseline failure rate in the workflow we are trying to improve?
3. How will we measure whether the AI-enabled workflow reduces net patient harm, including both average performance and the risk of systemic failure?
4. What happens when the AI is wrong, uncertain, biased, incomplete, unavailable, or wrong at scale?
5. Who is accountable for monitoring performance, safety, equity, security, drift, and correlated failure after deployment?

These questions shift AI governance from a product approval exercise to a clinical safety discipline.

They also force healthcare organizations to confront something uncomfortable: many cannot answer these questions because they lack a clear measurement system for the workflow before AI is introduced.

That is a leadership problem, not a technology problem.

If we do not know the current miss rate, delay rate, follow-up failure rate, documentation burden, escalation failure rate, or handoff defect rate, then we are not governing against reality. We are governing against intuition.
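Measuring a baseline does not have to be elaborate. As a minimal sketch, the function below takes an audit count (for example, abnormal results with no documented follow-up out of all abnormal results reviewed) and returns a Wilson score interval, one common way to express uncertainty around a proportion. The choice of Wilson interval and the audit numbers are assumptions made for illustration, not data from any source cited here.

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a failure rate (z=1.96)."""
    if total <= 0 or not 0 <= failures <= total:
        raise ValueError("need total > 0 and 0 <= failures <= total")
    p = failures / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Hypothetical audit: 18 missed follow-ups among 400 reviewed abnormal results.
failures, total = 18, 400
low, high = wilson_interval(failures, total)
print(f"Observed rate: {failures / total:.1%}")
print(f"Plausible range: {low:.1%} to {high:.1%}")
```

The interval matters as much as the point estimate. If the post-deployment rate falls inside the baseline's plausible range, the organization has not yet shown that anything changed.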

Governance should be continuous, not ceremonial

This is also where regulatory and risk frameworks are moving. The FDA’s draft guidance for AI-enabled device software emphasizes documentation and information that support evaluation of safety and effectiveness, within a lifecycle approach. The NIST AI Risk Management Framework similarly treats AI risk as something organizations must govern, map, measure, and manage over time.

That is the right direction.

Clinical AI governance cannot be a one-time committee approval followed by passive trust. It has to be continuous. It has to include real-world performance monitoring, bias and equity review, incident reporting, clinician feedback, cybersecurity review, patient impact assessment, rollback procedures, and clear escalation paths.

Most importantly, it has to remain connected to the outcome that matters: whether care is safer, more reliable, and more effective with the AI-enabled workflow than without it.

The leadership challenge

The future of clinical AI will not be decided by slogans.

“AI will transform healthcare” is not a strategy.

“AI is too risky for clinical care” is not a strategy either.

The leadership challenge is to build governance mature enough to hold two truths at once.

First: AI can cause harm if it is poorly designed, poorly validated, poorly secured, poorly monitored, or poorly integrated into clinical work.

Second: the current system already causes harm through errors, omissions, delays, fragmentation, overload, and missed context.

Both are true. That is why the goal cannot be perfect AI. The goal must be safer care.

For CMOs and CMIOs, this means insisting that clinical AI be tied to specific quality and safety outcomes. For CIOs and CISOs, it means ensuring the technology is secure, resilient, integrated, auditable, and governable. For Chief Quality Officers, it means connecting AI oversight to existing patient safety infrastructure. For boards, it means asking whether the organization has the discipline to measure risk before and after deployment.

Healthcare leaders should not be asked to choose between blind adoption and blanket resistance. They should be asked to lead. And leadership starts with a better baseline. The baseline is not perfection. The baseline is the current error rate.

Patients do not need perfect AI. They need safer care.


World Health Organization. Patient safety. Fact sheet.
Johns Hopkins Medicine. A Better Measure of Medical Error. October 4, 2023. (Newman-Toker et al., BMJ Quality & Safety.)
National Academies of Sciences, Engineering, and Medicine. Improving Diagnosis in Health Care. 2015.
Microsoft. Microsoft Dragon Copilot. Product page.
Providence. Providence study finds AI clinical assistant reduces provider burnout. July 2025.
Oracle. Oracle Health Clinical AI Agent Reduces Physician Documentation Time by 30%. March 4, 2025.
Google. Can AI save nurses millions of hours of paperwork? The Keyword, July 29, 2025.
U.S. Food and Drug Administration. Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations. Draft guidance, January 2025.
National Institute of Standards and Technology. AI Risk Management Framework.

Paul J. Swider is CEO and Chief AI Officer at RealActivity, a Microsoft Partner specializing in mission-critical AI for healthcare systems. He has 30+ years in healthcare technology, has trained over 3,000 engineers across GE, IDX, and Microsoft, and is the founder of BOSHUG, the Boston Healthcare Cloud & AI Community spanning 50+ countries.
