CommonSpirit Health is saving over $100 million a year from AI. Nebraska Medicine used it to create the equivalent of 37 inpatient beds without building anything. UNC Health pulled $14 million in combined savings from three separate deployments in a single year.
And 80% of healthcare AI projects fail to scale beyond the pilot phase.
Both of those things are true at the same time. The returns are real in specific places, applied in specific ways. The failure rate is just as real everywhere else. The question isn't whether AI works in healthcare. It's whether your organization is doing the thing that works or the thing that doesn't.
Where the ROI Is Real
The clearest pattern in the 2024 and 2025 data: administrative and operational AI pays off. Clinical AI is harder, slower, and riskier. That's not a permanent state, but it's the current one.
Revenue cycle management is where most health systems are seeing the fastest returns. AI audits clinical documentation in real time before a claim goes out, catching coding errors and missed diagnoses that would otherwise get denied. UNC Health's automated prior authorization tool alone contributed $3 to $4 million in direct savings and accelerated revenue. Their ambient scribing deployment added another $6 million. Their AI-assisted infusion scheduling added $5 million on top of that.
Capacity management is producing numbers that would be hard to believe without the source data. Nebraska Medicine deployed Palantir-based predictive algorithms to manage patient flow, length of stay, and post-acute placement. The result: $40 million in annual savings and the functional equivalent of 37 new inpatient beds, without a single dollar of construction. That capacity let the hospital take more inter-facility transfers and elective surgical cases that would otherwise have been turned away.
Ambient clinical documentation is the use case with the broadest adoption for a reason. Seattle Children's ran a 90-day pilot with 58 users on the Abridge platform. Documentation effort dropped 79%. Time spent on notes per visit dropped 15.5%. At Ohio State University Wexner Medical Center, DAX Copilot cut what physicians call "pajama time" — the hours spent documenting at home after a full shift. That matters financially because replacing a specialized physician who burns out costs a hospital up to $500,000 in recruitment fees and lost clinical revenue alone.
Sepsis and deterioration monitoring is where clinical AI has the clearest financial case. Tampa General Hospital used Palantir for end-to-end care coordination and continuous sepsis detection. The system is credited with saving more than 700 lives and measurably improving throughput. Sepsis is one of the most expensive conditions to treat in inpatient care. Catching it hours earlier changes both the patient outcome and the cost trajectory of the admission.
Mount Sinai Health System built an internal tool specifically for malnutrition detection. It scanned inpatient records, flagged at-risk patients, and routed them to clinical nutrition for early intervention. The result was roughly $20 million in revenue impact from shortened lengths of stay and improved recovery outcomes.
The health systems generating real returns from AI are not doing anything exotic. They picked high-volume, high-cost problems with measurable baselines and deployed AI directly against those baselines.
Where the ROI Isn't
The 80% failure rate isn't random. It concentrates around specific, predictable failure modes. Understanding them is more useful than reading another case study about what worked.
Vendor estimates are structurally wrong. A standard vendor pitch for a 500-bed hospital's document processing looks compelling on paper. The hospital processes roughly 2.3 million documents annually at $3.40 per document in labor costs, totaling $7.82 million per year. The vendor's AI processes those same documents at $0.08 each. Theoretical savings: $7.64 million. Year-one connection costs for a facility that size average $4.2 million. That covers custom API development, EHR customization, compliance validation, and staff retraining. None of it is optional.
Vendors quote a six-month payback period. The architectural reality of healthcare data systems dictates a minimum 18-month payback. That gap is where most projects die. The organization expected savings by month six, saw none, and pulled the plug.
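The gap between the paper math and the vendor's payback claim is easy to reproduce from the article's own figures. A minimal sketch, using only the numbers quoted above (the 18-month real-world payback is not derivable from this arithmetic, which is exactly the point):

```python
# Vendor-pitch arithmetic for the hypothetical 500-bed hospital,
# using the figures quoted in the text.
docs_per_year = 2_300_000
labor_cost_per_doc = 3.40      # current manual processing cost
ai_cost_per_doc = 0.08         # vendor's quoted per-document price

annual_labor = docs_per_year * labor_cost_per_doc     # $7.82M
annual_ai = docs_per_year * ai_cost_per_doc           # $0.184M
theoretical_savings = annual_labor - annual_ai        # ~$7.64M

integration_year_one = 4_200_000  # year-one connection costs

# Naive payback against year-one integration: about 6.6 months,
# which is where "six-month payback" pitches come from.
naive_payback_months = integration_year_one / (theoretical_savings / 12)

print(f"theoretical savings: ${theoretical_savings:,.0f}")
print(f"naive payback: {naive_payback_months:.1f} months")
```

The 18-month floor the article cites never appears in this spreadsheet math; it comes from savings ramping up far more slowly than the per-document price implies, which is the gap that kills projects at month six.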
Healthcare documents are chaotic in ways AI systems aren't built for. A standard hospital admission packet can run anywhere from 12 to 347 pages. A single facility processes over 340 distinct document variations, handwritten physician notes, fax artifacts, and multiple conflicting date formats. An AI system with 99.5% character recognition accuracy can still fail completely in production because it can't reconcile that "Dr. Smith," "J. Smith, MD," and "John Smith, Medical Doctor" refer to the same person. A coffee stain on a faxed document might cause the system to misread a patient's date of birth by 60 years. In a clinical setting, that kind of error destroys staff trust in the system fast.
EHR systems have hidden behaviors nobody maps. An AI agent correctly extracts a patient allergy and writes it to the database. That single action triggers a cascade of secondary validation rules inside the EHR. The allergy entry automatically populates a consent form requiring an attending physician's signature, which triggers an automated page to a resident at 3am. The gap between what the data flow diagram showed during procurement and what actually happens in the live system can exceed 340%. Hospitals that discover this in production, rather than before go-live, end up with disrupted operations and frustrated clinical staff who refuse to use the system.
Speed without accuracy is worthless. If an AI system processes a document in 200 milliseconds but requires a human administrator to spend 20 minutes manually verifying the output for clinical safety, the financial case is gone. The healthcare industry average for "time-to-trust" — the number of days from go-live until staff accept AI outputs without parallel manual checking — is 90 days. During those 90 days, the organization pays both the AI software license and the full cost of the human labor it was supposed to replace.
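The time-to-trust penalty can be made concrete. A rough sketch of the effective per-document cost during the parallel-checking window; the $30/hour administrator rate is an illustrative assumption, not a figure from the text:

```python
# Effective per-document cost while a human still verifies every
# AI output ("time-to-trust" window).
ai_cost_per_doc = 0.08     # AI processing cost per document
verify_minutes = 20        # human verification time per document
admin_hourly_rate = 30.0   # ASSUMED rate, for illustration only

verification_cost = admin_hourly_rate * (verify_minutes / 60)
effective_cost = ai_cost_per_doc + verification_cost

manual_baseline = 3.40     # pre-AI labor cost per document
print(f"effective AI cost: ${effective_cost:.2f} per document")
print(f"manual baseline:   ${manual_baseline:.2f} per document")
```

Under these assumptions the "automated" pipeline costs roughly three times the manual baseline for as long as staff keep double-checking it, regardless of how fast the model runs.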
The Costs Nobody Budgeted For
Beyond the workflow connection problem, enterprise-wide AI deployment in healthcare carries a specific set of costs that appear in no vendor proposal.
Data preparation. Gartner estimates that 85% of AI models fail due to poor data quality. In healthcare, patient data is split across laboratory systems, imaging archives, and unstructured clinical notes in ways that make clean aggregation genuinely hard. Deploying a population health predictive platform costs $150,000 to $600,000 before the system produces a single actionable output. That covers predictive modeling, multi-source data pipelines, and the infrastructure to make disparate data clinically usable.
Dedicated project management. Running an AI deployment as a side project for existing IT staff is one of the more reliable ways to ensure it fails. These projects need dedicated project managers with AI deployment experience, at salaries between $80,000 and $120,000. They're needed for vendor negotiation, milestone tracking, and managing the gap between what was promised and what's been delivered.
Staff training that actually works. Up to 70% of AI-related change initiatives fail because of employee pushback or inadequate management support. Training has to reach clinical users, technical administrators, and leadership with different content for each group. CommonSpirit's AI Learning Academy registered over 5,000 employees and formally upskilled more than 1,400 staff members. That's not a one-day onboarding session. It's a sustained program that costs real money and takes real time.
The Commercial LLM Problem
Health systems evaluating AI for clinical documentation or coding face a specific architectural question with a significant financial answer: do you use commercial large language models like GPT-4, or do you run smaller, locally deployed models trained on clinical data?
The cost difference is not marginal. A large academic health system that relies on commercial LLMs for free-text classification across its annual clinical notes would spend between $115,000 and $4.6 million per year depending on volume and model selection. ICD coding alone was estimated to cost $4.15 million annually using GPT-4's token pricing at enterprise scale.
A locally trained clinical model called Clinical-BigBird was tested directly against GPT-4 on the same dataset. For chronic kidney disease classification, Clinical-BigBird hit 95.1% accuracy versus GPT-4's 89.0%. For heart failure classification, it was 94.7% versus 75.4%. Processing time for the full dataset: 2 minutes for the local model versus 4 to 6 hours for GPT-4.
The privacy dimension compounds the cost argument. Routing protected health information through external commercial APIs creates HIPAA exposure that the local model architecture eliminates entirely. Local models run on existing hospital server infrastructure. Patient data never leaves the institution's network.
For high-volume clinical applications, the choice between commercial LLMs and specialized local models is not really a debate about capability. The local model is faster, cheaper, more accurate on clinical tasks, and less legally exposed. The main barrier is the upfront cost of training it, which runs $10,000 to $500,000 depending on the model size and dataset scope.
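The training-cost barrier can be framed as a break-even question against the API spend it avoids. A rough sketch using the ranges quoted above; it deliberately ignores local hosting, inference, and maintenance costs, so it represents a best case for the local option:

```python
# Rough break-even for training a local clinical model instead of
# paying commercial-LLM API fees, using the ranges in the text.
def breakeven_months(training_cost: float, annual_api_cost: float) -> float:
    """Months of avoided API spend needed to recoup one-time training."""
    return training_cost / (annual_api_cost / 12)

# Worst case for local: most expensive training, lightest API usage.
slow = breakeven_months(500_000, 115_000)    # roughly 52 months
# Best case: cheap training against the ICD-coding bill alone.
fast = breakeven_months(10_000, 4_150_000)   # well under a month

print(f"slow case: {slow:.0f} months, fast case: {fast:.2f} months")
```

Even the worst case pays back within the useful life of a deployed model; at enterprise coding volumes, the training cost is recouped almost immediately.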
The Malpractice Risk in the ROI Calculation
There's a cost that almost never appears in AI business cases: the liability exposure from clinical AI errors.
For most of its history, the electronic health record was a passive system of record. It documented what happened. Attorneys used it to establish what a clinician knew and when. But EHR platforms are now embedding generative AI directly into the clinical workflow. The system summarizes patient histories, filters data, and suggests differential diagnoses to the attending physician. When it gets something wrong, it's no longer a neutral observer. It's an active participant in the care decision.
In 2024, malpractice claims involving AI tools increased 14%, concentrated in radiology, oncology, and cardiology where AI allegedly missed cancer presentations. The legal exposure is distributed across three parties: the physician who accepted or overrode the recommendation, the institution that deployed the tool without adequate vetting, and increasingly the software developer whose algorithm produced the flawed output.
Insurance carriers are responding. Some are raising premiums across AI-using facilities. Others are adding AI-specific exclusions to malpractice policies or requiring AI literacy training as a coverage condition.
A single malpractice verdict in the $10 to $20 million range can wipe out years of savings from an AI program. That's not a reason to avoid clinical AI. It's a reason to treat governance as a financial requirement, not an administrative one.
What the Health Systems Getting Returns Actually Do
CommonSpirit attributes its $100 million annual AI impact directly to its Ethics, Data, Algorithm, and Governance committee, called EDAG. The committee includes physicians, nurses, data scientists, legal counsel, ethicists, and privacy officers. It meets every two weeks to review every tool before deployment. It maintains a live dashboard of deployed tools, tracks their risk profiles, and actively kills projects that don't meet clinical safety standards.
That level of formal oversight is not bureaucracy. It's what separates the organizations generating eight-figure returns from the ones in the 80% failure statistic.
The practical pattern across successful deployments is consistent:
- Start with administrative, not clinical. Revenue cycle, prior authorization, ambient scribing, and capacity management have faster payback cycles, lower risk profiles, and clearer success metrics than clinical AI. They also build internal credibility that funds the harder work later.
- Measure time-to-trust explicitly. Set a target (21 days is achievable with proper onboarding). Track it. If staff are still running parallel manual checks at day 60, the system isn't working yet regardless of what the accuracy metrics say.
- Budget for connection costs at 3x the vendor estimate. Not because vendors are dishonest, but because they price against a clean environment and yours isn't one. Build the real number into the business case before approval.
- Evaluate local models for high-volume clinical tasks. The cost and accuracy data favor local specialized models over commercial LLMs for anything involving clinical coding, documentation classification, or PHI at volume.
- Treat governance as a financial control. A formal review process that kills 30% of proposed AI projects is saving money, not slowing things down. The projects it kills would have consumed budget without producing returns.
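Two of the budgeting rules above reduce to a few lines of pre-approval arithmetic. A hypothetical gate check, assuming the 3x multiplier and the 18-month payback floor from the text; the function name and the $1.4M vendor quote are illustrative, not from the source:

```python
# Hypothetical pre-approval gate applying two rules from the text:
# budget integration at 3x the vendor estimate, and reject any
# business case that only clears with a payback under 18 months.
def business_case_passes(annual_savings: float,
                         vendor_integration_estimate: float,
                         max_payback_months: float = 18.0) -> bool:
    real_integration = 3 * vendor_integration_estimate  # the 3x rule
    payback_months = real_integration / (annual_savings / 12)
    return payback_months <= max_payback_months

# The 500-bed document-processing example: $7.64M annual savings
# against an ASSUMED vendor integration quote of $1.4M.
print(business_case_passes(7_640_000, 1_400_000))
```

The point of the gate is not precision; it is forcing the tripled integration number into the business case before approval rather than discovering it in production.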
The 20% that scale aren't smarter or better-funded. They budgeted for the real costs, measured the right things, and didn't go live without a governance process in place.
Where This Is Headed
The market data points in one direction: healthcare AI adoption is accelerating whether individual organizations are ready or not. Eighty-six percent of health systems now report using AI across some part of their operations. Venture capital put $18 billion into healthcare AI in 2025 alone, representing 46% of all healthcare VC investment that year.
The next wave is agentic AI: systems that don't wait for a human prompt but autonomously sense their environment, set objectives, and execute multi-step processes. Analysts project that by 2028, 30% of large health systems will rely on AI agents to run core administrative processes including patient scheduling, supply chain, and end-to-end revenue cycle negotiation with payers.
That's a significant claim and the timeline may slip. But the direction is clear. Health systems that are still navigating pilot projects in 2026 are building the operational foundation that will determine whether they can participate in the next wave or watch it happen to someone else.
The macroeconomic pressure is real too. The industry faces a projected 350,000 unfilled registered nursing positions in 2026. Rural healthcare workforce shortages are projected to grow 10% in the same period. For many health systems, AI deployment isn't a strategic option. It's the only available mechanism to expand capacity without hiring staff who don't exist in sufficient numbers.
Find out where healthcare AI makes sense for your organization.
The Healthcare AI Assessment maps your current revenue cycle gaps, documentation burden, and capacity constraints against proven deployment patterns with specific ROI projections before you commit budget.
Sources
- Health Technology Digital — "The AI Implementation Gap: Why 80% of Healthcare AI Projects Fail to Scale Beyond Pilot Phase" (2025)
- Becker's Hospital Review — "700 lives, $100M saved: Healthcare AI ROI in '25" (2025)
- AWP Life / CapTech — Healthcare AI failure and ROI analysis (2025)
- KLAS Research — "Healthcare AI Update 2025: What Use Cases Are Adopted the Most?" (2025)
- Bain & Company — "Healthcare IT Investment: AI Moves from Pilot to Production" (2025)
- Premier Inc. — "Redefining AI ROI in Healthcare: The New Framework that Puts Clinical Use Cases First" (2025)
- Orion Health — "Why AI projects fail in healthcare and what to do about it" (2025)
- Murphi.ai — "Cost of Implementing AI in Healthcare 2025: Crucial Factors" (2025)
- PMC / NCBI — "Generative AI costs in large healthcare systems: an example in revenue cycle" (2025)
- Invisible Technologies — "Small language models vs. large language models" (2025)
- ClickIT — "Small vs Large Language Models: The LLM Showdown for 2026" (2025)
- Bell Law Firm — "AI in Healthcare Is Accelerating — But Who Pays When It Fails?" (2025)
- The Doctors Company — "AI on Trial: The Rising Liability Risks of Artificial Intelligence in Healthcare" (2025)
- Stanford Health Policy — "Legal Risks and Rewards of Artificial Intelligence in Health Care" (2025)
- Gartner — "Hype Cycle for Artificial Intelligence, 2025"
- OrbDoc — "Hype Cycle for Healthcare Provider Applications, 2025: The Rise of the Autonomous Healthcare Organization"
- Forrester — "Predictions 2026: The Year AI Tests The Heart Of Healthcare"
- Bessemer Venture Partners — "State of Health AI 2026"
- Deloitte — "2026 US Health Care Executive Outlook"
- McKinsey — "The coming evolution of healthcare AI toward a modular architecture" (2025)
- AHA — "Building and Implementing an Artificial Intelligence Action Plan for Health Care" (2025)
- Rivanna Medical — "AI is Revolutionizing Healthcare Quality: Tackling the 138,000 Nurse Shortage in 2025"