Why most AI deployments fail: The four levels of responsibility every leader must master

Most AI deployments fail not because the technology is flawed, but because leadership responsibility is. Baba Prasad, Professor of the Practice of Leadership, outlines four levels every leader must master to deploy AI wisely, sustainably and competitively.

Here's a number that should make every CEO uncomfortable: companies invested $47 billion in AI initiatives during the first half of 2025. What share delivered meaningful returns? Just 11%.

That is not a typo. 89% of AI investments produced minimal or no returns, according to CMSWire's October 2025 research. While the industry chases superintelligence and the tech elite pursue the holy grail of artificial general intelligence, most organizations can't even get basic AI deployments right.

The problem isn't technical capability. It is a responsibility problem.

The stack nobody talks about

When leaders think about responsible AI, they typically focus on bias testing, ethical guidelines and compliance frameworks. These matter, but they're incomplete. Responsible AI isn't a checklist you complete before deployment. It's a continuous management challenge across four distinct levels, each requiring different skills and addressing different types of risk.

Your deployment becomes vulnerable if you miss any one level. Miss multiple levels, and you're headed for a public failure that will cost far more than the original investment.

Level 1: Technical responsibility

This is what most people think of when they hear "responsible AI." It's about the gap between what an AI system does and what you think it does. In the lab, teams work with controlled data sets, one-time validation and optimal conditions. In production, they face real-world messiness, continuous drift and edge cases nobody anticipated.

Take the Workday hiring AI case from May 2025. The system passed initial fairness audits, and hundreds of employers used it to hire employees. Then an individual sued claiming the system discriminated against older people and people with disabilities. A federal judge found grounds to certify a nationwide collective action that could cover millions of applicants over age 40. Settlement amounts are expected to reach tens of millions of dollars.

What happened? The AI encountered data it wasn't trained on. Performance degraded in ways nobody was monitoring, and no one caught it until lawyers did.

Technical responsibility demands you answer three questions continuously:

  1. What happens when our AI encounters data it wasn't trained on?
  2. How do you know when performance is degrading?
  3. Who is monitoring systems in real time across different demographic groups?

If you can't answer these questions for every production deployment, you have a technical responsibility problem.
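What might continuous monitoring look like in practice? The sketch below is one minimal, illustrative approach rather than a prescription from the framework: it compares each demographic group's false-negative rate on recent production traffic against a baseline captured at deployment time and flags any group that drifts past a tolerance. The group names, the choice of metric and the threshold values are all assumptions made for the example.

```python
# Minimal sketch: per-group performance monitoring for a production model.
# Group names, the tolerance and the metric (false-negative rate) are illustrative.

from collections import defaultdict

def false_negative_rate(records):
    """records: list of (y_true, y_pred) pairs, with 1 = positive class."""
    positives = [(t, p) for t, p in records if t == 1]
    if not positives:
        return None  # no positives observed for this group yet
    misses = sum(1 for t, p in positives if p == 0)
    return misses / len(positives)

def check_group_drift(outcomes, baseline_fnr, tolerance=0.05):
    """outcomes: list of (group, y_true, y_pred) from recent production traffic.
    Flags any demographic group whose false-negative rate exceeds the
    baseline measured at deployment time by more than `tolerance`."""
    by_group = defaultdict(list)
    for group, y_true, y_pred in outcomes:
        by_group[group].append((y_true, y_pred))

    alerts = []
    for group, records in by_group.items():
        fnr = false_negative_rate(records)
        if fnr is not None and fnr - baseline_fnr.get(group, 0.0) > tolerance:
            alerts.append((group, round(fnr, 3)))
    return alerts

# Example: baseline measured during validation, outcomes from the last review window.
baseline = {"group_a": 0.08, "group_b": 0.09}
recent = [("group_a", 1, 1), ("group_a", 1, 0), ("group_b", 1, 0), ("group_b", 1, 0)]
print(check_group_drift(recent, baseline))  # [('group_a', 0.5), ('group_b', 1.0)]
```

The point is not the specific code but the discipline: the same check runs continuously, per group, against a baseline that was recorded before launch, so degradation is caught by the organization rather than by opposing counsel.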

Level 2: Operational responsibility

This is where organizations experience de-skilling. When AI handles tasks, humans lose the ability to recognize when AI is wrong, to step in when systems fail and to maintain judgment in complex situations. The system works perfectly until it doesn't, and by then nobody remembers how to do the job without it.

A Mount Sinai study from April 2025 exposed this perfectly. Researchers analyzed 1.7 million AI-generated medical recommendations from nine different large language models. Same symptoms, only demographic details changed. The result: a 31% higher misdiagnosis rate for minority patients in critical care and a 23% higher false-negative rate for pneumonia detection in rural populations.

The troubling part? The systems could "explain" their recommendations. The explanations just didn't reveal the demographic bias. Explainability without accuracy is dangerous because it creates false confidence.

Operational responsibility requires you to map decisions clearly. Is AI making decisions, informing decisions or supporting decisions? Each level carries different risks and requires different safeguards. High-risk decisions need the strongest guardrails, but those guardrails only work if humans maintain real capability to intervene meaningfully.

Level 3: Stakeholder responsibility

Who wins? Who pays? AI optimizes for what you choose to measure. If you measure only shareholder value, guess who bears the cost? Usually customers or employees.

The Air Canada chatbot case from February 2024 illustrates the risk. A chatbot told a customer he could retroactively apply for bereavement fares. The company's website said the opposite. Air Canada argued the chatbot was a "separate legal entity" responsible for its own actions.

The tribunal wasn't buying it: "A chatbot is still just a part of Air Canada's website. It should be obvious that Air Canada is responsible for all information on its website. Companies cannot dissociate themselves from the actions of their AI tools."

In evaluations cited by The New York Times, hallucination rates in controlled chatbot environments run between 3% and 27%. Meanwhile, 19% of consumers who used AI customer service reported seeing zero benefits. The company optimized for cost reduction. The customer paid the price in degraded service.

Stakeholder responsibility demands that you map who captures benefits and who bears costs. If one group gets all the upside while another gets all the downside, you have a responsibility problem that will eventually become a legal problem.

Level 4: Systemic responsibility

What happens when everyone does what you're doing? One company automating customer service gains competitive advantage. An entire industry automating customer service creates collective harm and a race to the bottom. Customer service automation has already contributed to $3.7 trillion in lost revenue globally, according to Qualtrics.

Systemic responsibility forces you to think about second-order effects. If your entire industry adopted this approach, what happens to the labor market? What skills disappear from the workforce? What vulnerabilities do you create? What can no longer be done without AI?

These questions matter because industry-wide adoption changes the game for everyone. Your clever optimization becomes everyone's collective problem.

The action framework

Understanding the four levels is necessary. Acting on them is what separates leaders from laggards. Here are three interventions you can implement immediately:

  1. The pre-mortem

Before deploying any AI system, run this exercise with your team: "It's 18 months from now. Our AI deployment has failed spectacularly and embarrassed us publicly. What happened?"

Force your team to imagine specific failures across all four levels:

  • What technical problems did nobody catch in testing?
  • Which groups bore the costs?
  • What organizational dynamics prevented people from speaking up?
  • What became obvious too late?

Then build safeguards for each scenario before deployment. The pre-mortem exercise surfaces risks that optimistic planning sessions miss.

  2. The red team

Assign someone credible to argue why your AI deployment is irresponsible. Give them access to all technical documentation, permission to interview stakeholders, protection from organizational pressure and a formal mechanism to present findings.

This isn't about creating internal opposition. It's about stress-testing your thinking before reality does it for you.

  3. Quarterly responsibility review

Every quarter, score your AI deployments across all four levels.

  • For technical responsibility, ask: Are we monitoring continuously?
  • For operational responsibility: Can our humans intervene effectively?
  • For stakeholder responsibility: Are benefits distributed fairly?
  • For systemic responsibility: Would we be comfortable if everyone did this?

Score each level out of ten. Total score out of 40. Anything below 30 needs immediate attention.
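As a worked illustration of that arithmetic, here is a small sketch. The four level names and the out-of-ten, below-30 cutoff come from the review described above; the function name, return fields and example scores are hypothetical choices for the illustration.

```python
# Minimal sketch of the quarterly responsibility review scoring described above.
# Each level is scored out of 10; totals below 30 are flagged for attention.

LEVELS = ("technical", "operational", "stakeholder", "systemic")

def review(scores, attention_threshold=30):
    """scores: dict mapping each responsibility level to a 0-10 score."""
    missing = [level for level in LEVELS if level not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    total = sum(scores[level] for level in LEVELS)  # out of 40
    return {
        "total": total,
        "needs_attention": total < attention_threshold,
        "weakest_level": min(LEVELS, key=lambda level: scores[level]),
    }

# Example quarterly scores for one deployment.
print(review({"technical": 8, "operational": 6, "stakeholder": 7, "systemic": 5}))
# {'total': 26, 'needs_attention': True, 'weakest_level': 'systemic'}
```

Surfacing the weakest level alongside the total keeps the review from hiding a single failing dimension behind three strong ones.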

When to pull the plug

Some red flags demand immediate action. If performance is degrading for specific demographic groups, you have a technical problem that could become a legal problem fast. If humans are rubber-stamping AI recommendations without real evaluation, your operational safeguards have failed. If one group captures all benefits while another bears all costs, your stakeholder distribution is unsustainable.

These aren't hypothetical risks. They're patterns visible in every major AI deployment failure of the past two years.

The competitive reality

Here's what most organizations miss: Responsible AI is a competitive advantage. Organizations that master this will move faster because they won't suffer embarrassing public failures. They'll deploy more sustainably because they won't need costly rollbacks. They'll build more trust with customers, employees and regulators. And they'll attract better talent.

The organizations winning with AI in 2026 aren't deploying faster. They're deploying more thoughtfully and responsibly.
