What AI Agents Actually Are
The term has been diluted to the point of uselessness. Here's what agents actually are, why most production deployments fail, and what the infrastructure gap looks like from the inside.
Founder, Majhi Group & Majhi OS

Most people using the term "AI agent" are describing a chatbot with extra steps. I know this because I spent months building what I thought were agents before I understood what actually separates an agent from a very capable function.
The distinction matters more than the vocabulary suggests — and the gap between hype and production reality is larger than most organisations realise when they start.
The market is moving fast. Execution is not.
The global AI agents market was valued at approximately $7.6 billion in 2025 and is projected to reach $10.9 billion in 2026, growing at a compound annual rate above 40% through the decade. Gartner projects that 40% of enterprise applications will embed task-specific agents by the end of 2026, up from under 5% a year earlier.
40% of enterprise applications will embed AI agents by end-2026. Under 5% had them a year ago.
That adoption curve is real. What's also real: execution is failing to match the ambition. Fiddler AI's analysis found that AI agents fail between 70% and 95% of the time in production environments, depending on task complexity. A separate analysis found that 88% of AI agent projects never reach production at all.
88% of AI agent projects fail before reaching production. Fewer than 1 in 8 agent initiatives successfully make it to live operation.
Gartner's 2025 research goes further: over 40% of agentic AI projects are forecast to be cancelled by 2027, driven by unclear ROI and inadequate risk controls. IBM's 2025 CEO study found only 25% of AI initiatives delivered expected ROI.
Understanding why requires understanding what agents actually are — and where most implementations break.
The loop is the thing
An AI agent perceives its environment, makes a decision, takes an action, and observes what happened — then uses that observation to decide what to do next. That four-part loop is what makes it an agent. Without the feedback loop, you have a tool. A very impressive tool, but a tool.
When I started building Majhi OS — autonomous hiring operations infrastructure — the first version was a tool pretending to be an agent. It could generate a recruiting sequence, analyse a mandate's health score, identify where a pipeline was degrading. But it couldn't observe what happened after it acted, and it couldn't adjust. Every output was a terminal state. I had built an excellent report generator and called it an autonomous system.
The moment it became an agent was when it could observe that the outreach it triggered hadn't moved response rates, form a hypothesis about why, modify the approach, and try again — without me intervening between each step. That feedback loop changed the nature of the system entirely. Not the model. Not the prompting. The loop.
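The loop described above can be sketched in a few lines. This is a minimal illustration, not the Majhi OS implementation: `run_agent_loop`, `decide`, the outreach variant names, and the response-rate figures are all hypothetical stand-ins chosen to show observations feeding back into the next decision.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (action, observation) pairs


@dataclass
class LoopResult:
    succeeded: bool
    history: History = field(default_factory=list)


def run_agent_loop(
    decide: Callable[[History], str],
    act: Callable[[str], float],
    goal_met: Callable[[float], bool],
    max_steps: int = 5,
) -> LoopResult:
    """Decide, act, observe, then feed the observation into the next decision."""
    history: History = []
    for _ in range(max_steps):
        action = decide(history)        # decision uses everything observed so far
        observation = act(action)       # take the action and observe the outcome
        history.append((action, observation))
        if goal_met(observation):
            return LoopResult(True, history)
    return LoopResult(False, history)   # step budget exhausted without success


# Hypothetical toy environment: response rate for each outreach variant.
RESPONSE_RATES = {
    "generic_email": 0.02,
    "personalised_email": 0.05,
    "referral_intro": 0.15,
}


def decide(history: History) -> str:
    """Pick the first variant not yet tried (a real agent would reason here)."""
    tried = {action for action, _ in history}
    for variant in RESPONSE_RATES:
        if variant not in tried:
            return variant
    return "generic_email"


result = run_agent_loop(
    decide,
    act=lambda action: RESPONSE_RATES[action],
    goal_met=lambda rate: rate >= 0.10,
)
print(result.succeeded, [action for action, _ in result.history])
```

Remove the `history` argument from `decide` and the same code becomes the "tool" case: every call produces the same output regardless of what happened last time.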
Why multi-step workflows collapse
The mathematics of multi-step agents are unforgiving. If an agent has an 85% success rate at each individual step — which sounds reasonable — and must complete eight sequential steps to finish a task, the probability of completing the full workflow correctly is approximately 27%.
At 85% per-step accuracy across 8 steps: 27% overall task completion. That's the reliability math most agent roadmaps don't account for.
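The arithmetic is easy to check. The sketch below assumes each step succeeds or fails independently, which is a simplification (real failures are often correlated); the function names are illustrative.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step ** steps


def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed to reach a target end-to-end rate."""
    return target ** (1.0 / steps)


overall = workflow_success(0.85, 8)   # the 8-step, 85% case from the text
needed = required_per_step(0.90, 8)   # per-step rate for 90% end to end

print(f"8 steps at 85% each: {overall:.1%} end to end")
print(f"per-step rate needed for 90% end to end: {needed:.1%}")
```

Run in reverse, the same formula says an eight-step workflow needs roughly 98.7% per-step reliability to finish correctly 90% of the time.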
This is not a model capability problem. In the original WebArena benchmark results, the best GPT-4-based agent achieved a 14.41% end-to-end task success rate, against human performance of 78.24%. The gap isn't intelligence. It's error compounding across steps, and the lack of infrastructure to catch and recover from failures mid-workflow.
In hiring systems, this problem has real consequences. When I'm running a VP-level search through Majhi Group, a wrong intermediate action — the wrong message to a candidate, the wrong status update to a client — can damage a relationship that took weeks to build. That's not a model problem. It's a system design problem, and it's why the agentic layer in Majhi OS is built around failure recovery first, autonomy second.
What I got wrong when I started
I assumed the intelligence was the hard part. It isn't. Language models are, by now, capable enough for most business tasks. The hard part is the infrastructure around the model: state management, error handling, deciding when to proceed versus escalate, and — most underestimated — knowing when the agent is wrong.
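One way to picture the "proceed versus escalate" decision is a wrapper that validates each step's output, retries a bounded number of times, and hands off to a human rather than guessing. This is a sketch under assumptions: `run_step`, `Outcome`, and the validators are hypothetical names, and a real validator would be task-specific.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    PROCEED = "proceed"
    ESCALATE = "escalate"


@dataclass
class StepResult:
    outcome: Outcome
    output: Optional[str]
    attempts: int


def run_step(
    step: Callable[[], str],
    validate: Callable[[str], bool],
    max_attempts: int = 3,
) -> StepResult:
    """Retry a step until its output validates; escalate instead of guessing."""
    for attempt in range(1, max_attempts + 1):
        try:
            output = step()
        except Exception:
            continue  # an exception counts as a failed attempt
        if validate(output):
            return StepResult(Outcome.PROCEED, output, attempt)
    return StepResult(Outcome.ESCALATE, None, max_attempts)


# Hypothetical step that times out twice before producing a draft.
calls = {"n": 0}


def flaky_draft() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("upstream timeout")
    return "Dear candidate, ..."


ok = run_step(flaky_draft, validate=lambda text: text.startswith("Dear"))
bad = run_step(lambda: "", validate=lambda text: len(text) > 0)
print(ok.outcome, ok.attempts)
print(bad.outcome, bad.attempts)
```

The validator is where "knowing when the agent is wrong" lives: a step that returns without error but fails validation is treated the same as a crash.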
Enterprise AI spending reached $37 billion in 2025, more than triple the 2024 figure of $11.5 billion. That acceleration is largely going into model access and tooling. The unsexy infrastructure work — observability, state persistence, failure recovery — remains the execution gap that explains the 70%+ failure rates.
I see the same pattern in the hiring technology companies I work with. They invest in the AI layer. They underinvest in the operational scaffolding that makes the AI layer reliable. The result is a system that performs well in demos and fails in production — exactly what the failure statistics describe.
The three patterns most people conflate
When someone tells you they're "using AI agents," they're usually describing one of three things:
Single model with tool access. One model that can call APIs, read files, run searches. It decides when to use each tool and synthesises results. This is useful. It is not autonomous. Most of what is currently being sold as "agentic AI" is this pattern.
Multi-agent with orchestration. Multiple specialised models coordinated by an orchestrating layer. Each handles a narrower task. The orchestrator routes work, handles exceptions, and aggregates. This is where genuine operational value starts to appear, and where most serious enterprise deployments are heading.
Long-horizon autonomous systems. Systems that maintain state across many steps and operate over hours or days. These exist in research and specialised deployments. They are not yet routine infrastructure — despite being the pattern most vendors are pitching.
Knowing which pattern you're actually buying — or building — changes every decision that follows.
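To make the second pattern concrete, here is a minimal orchestrator sketch. It assumes each specialist is a plain function keyed by task kind; in a real deployment each handler would wrap a model call, and `orchestrate` and the task names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], str]


def orchestrate(
    tasks: List[Tuple[str, str]],
    specialists: Dict[str, Handler],
) -> Dict[str, List[str]]:
    """Route each (kind, payload) task to a specialist and aggregate results."""
    report: Dict[str, List[str]] = {"done": [], "exceptions": []}
    for kind, payload in tasks:
        handler = specialists.get(kind)
        if handler is None:
            # No specialist registered: record it rather than improvising.
            report["exceptions"].append(f"no specialist for {kind!r}")
            continue
        try:
            report["done"].append(handler(payload))
        except Exception as exc:
            report["exceptions"].append(f"{kind} failed: {exc}")
    return report


# Hypothetical specialists standing in for narrower model-backed agents.
specialists = {
    "screen": lambda payload: f"screened {payload}",
    "schedule": lambda payload: f"scheduled {payload}",
}
tasks = [("screen", "cv-1"), ("schedule", "cv-1"), ("negotiate", "cv-1")]

report = orchestrate(tasks, specialists)
print(report)
```

The first pattern is the degenerate case with one handler; the third adds persistent state and a much longer time horizon on top of this routing.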
The infrastructure gap from the inside
Building Majhi OS, I spend most of my time not on the model but on the scaffolding. Making sure the system knows what it's already tried. Making sure a failed outreach sequence doesn't orphan a mandate's state. Making sure I can tell the difference between an agent that's working correctly and an agent that's confidently wrong.
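Two of those scaffolding concerns, remembering what has been tried and not leaving state orphaned after a failure, can be sketched as follows. `MandateState` and `attempt_once` are hypothetical names for illustration, and the in-memory state stands in for whatever persistent store a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class MandateState:
    status: str = "idle"
    tried: Set[str] = field(default_factory=set)


def attempt_once(
    state: MandateState,
    approach: str,
    run: Callable[[str], None],
) -> bool:
    """Run an approach at most once, restoring the prior status if it fails."""
    if approach in state.tried:
        return False                   # already tried; do not repeat it
    previous = state.status
    state.status = f"running:{approach}"
    state.tried.add(approach)          # record the attempt even if it fails
    try:
        run(approach)
    except Exception:
        state.status = previous        # never leave a stale "running" status
        return False
    state.status = f"completed:{approach}"
    return True


def failing_send(approach: str) -> None:
    raise RuntimeError("send failed")


state = MandateState()
first = attempt_once(state, "sequence_a", failing_send)
status_after_failure = state.status
second = attempt_once(state, "sequence_a", lambda approach: None)
third = attempt_once(state, "sequence_b", lambda approach: None)
print(first, second, third, state.status)
```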
That last problem — detecting confident wrongness — is harder than it sounds, and it's the one that causes most production failures. Scope creep and data quality issues account for 61% of all AI agent failures in enterprise deployments. Both are scaffolding problems, not model problems.
This is visible at the regional level too. In India — where I've placed C-suite leaders across Odisha, Bengaluru, and Mumbai — the organisations building serious AI capability are not the ones with the most sophisticated models. They're the ones that have invested in clean operational data and the infrastructure to act on it reliably. The model is almost incidental. The scaffolding is everything.
The practical implication
If you're evaluating whether to deploy agents in your operations: the test isn't whether the model is smart enough. It almost certainly is. The test is whether you've designed the feedback loop, whether you understand your failure modes, and whether you can tell when the system is wrong before the consequences become expensive.
Start with a task where wrong intermediate actions are recoverable. Build the loop. Measure output quality at each step. Then expand. The organisations getting value from agents in 2026 are not the ones with the most ambitious autonomous roadmaps. They're the ones with the clearest definition of success for each individual step, and the discipline to not expand until the previous step is reliable.
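A rough sketch of what "measure each step, then expand" could look like in code. The threshold and minimum sample count are illustrative assumptions, not recommendations, and `StepScorecard` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


class StepScorecard:
    """Track pass/fail per step and gate expansion on measured reliability."""

    def __init__(self, threshold: float = 0.95, min_samples: int = 20) -> None:
        self.threshold = threshold
        self.min_samples = min_samples
        self.results: Dict[str, List[bool]] = defaultdict(list)

    def record(self, step: str, passed: bool) -> None:
        self.results[step].append(passed)

    def success_rate(self, step: str) -> float:
        outcomes = self.results.get(step, [])
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def ready_to_expand(self) -> bool:
        """True only if every tracked step has enough samples above threshold."""
        if not self.results:
            return False
        return all(
            len(outcomes) >= self.min_samples
            and sum(outcomes) / len(outcomes) >= self.threshold
            for outcomes in self.results.values()
        )


card = StepScorecard(threshold=0.9, min_samples=10)
for i in range(10):
    card.record("draft", True)       # 10 of 10 pass
    card.record("send", i >= 2)      # 8 of 10 pass

print(card.success_rate("draft"), card.success_rate("send"))
print(card.ready_to_expand())
```

Here the "send" step blocks expansion even though "draft" is perfect, which is the point: the gate is set by the weakest step, not the average.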
That's a slower approach. It's also the one that compounds.
Sources: Fiddler AI — AI Agent Failure Rate · Digital Applied — 88% Failure Before Production · Gartner / Joget — AI Agent Adoption 2026 · Paul Okhrem — Enterprise AI Statistics · WebArena Benchmark — arxiv · Digital Applied — 120+ Enterprise Data Points