Agentic AI in Software Project Management

Artificial Intelligence · Software Transformation · Project Management

March 9, 2026
Dr. Arnaud Fietzke

Principal Software & AI Engineer

Dr. Markus Pizka

Managing Director & IT Strategy Consultant

The Governance Gap No One Is Talking About

Agentic AI is arriving in enterprise software delivery on its own schedule – faster, in most cases, than the teams responsible for delivery are ready for. Gartner estimates that 40% of enterprise software will feature task-specific AI agents by the end of 2026. By 2028, autonomous agents are projected to account for up to 15% of day-to-day enterprise decisions. The tooling is moving quickly. The governance is not.

That gap is where the real risk lives.

Most of the discourse around agentic AI in software projects focuses on productivity: faster code generation, automated ticket triage, agents that can plan sprints, flag dependencies, and draft architecture proposals in minutes. Those benefits are genuine. But they come bundled with a set of failure modes that are qualitatively different from anything earlier automation has produced – and that most delivery teams are not yet equipped to handle.

McKinsey research puts the scale of the problem plainly: 80% of organizations have already encountered risky behavior from AI agents. At the same time, 62% of enterprises say they are experimenting with agents – but two-thirds of those have not begun any meaningful rollout. The organizations running the most experiments are also the ones accumulating the most ungoverned risk.

This article is not an argument against agentic AI in software delivery. It is an argument for building the governance layer before the problems compound.

Not Just a Faster Developer

The first mistake most project leads make is treating agentic AI as a productivity multiplier for existing workflows – a developer that codes faster, a PM that writes status reports in seconds. That framing misses what makes agentic AI genuinely different.

Traditional automation is rule-bound. A deployment script runs a defined sequence of steps. A workflow tool routes a ticket according to fixed logic. Both are deterministic: given the same input, they produce the same output. You can reason about them, test them, and trust that what passed QA yesterday will pass QA tomorrow.

Agentic AI operates on intent. Given a high-level goal – “resolve this issue,” “refactor this module,” “identify blockers in this sprint” – an agent plans its own sequence of actions, selects tools, makes intermediate decisions, and adapts based on what it encounters. It does not execute a script. It reasons toward an objective.

That is what makes it powerful. It is also what makes it ungovernable under the assumptions inherited from traditional project management.

As McKinsey Partner Rich Isenberg has put it: “Agency isn’t a feature – it’s a transfer of decision rights. The question shifts from ‘Is the model accurate?’ to ‘Who’s accountable when the system acts?’” That reframing has direct consequences for how software projects are planned, tracked, and controlled.

Three Failure Modes That Break Standard PM Frameworks

1. Non-Determinism at Scale

When an agent plans its own workflow, the same instruction can produce different outcomes on different runs. On a small, bounded task this is manageable. On a large enterprise codebase with cross-cutting concerns and multiple in-flight features, it becomes a structural problem.

Real-world implementations have found that without a deterministic workflow engine enforcing phase transitions, agents routinely skip steps, create circular dependencies, or get stuck in analysis loops. Agents are capable within a bounded problem scope; they struggle with the meta-level decisions about sequencing, dependencies, and organizational context that experienced project leads carry implicitly.

This non-determinism is not a defect that will be patched away. It is an inherent property of systems that reason toward goals rather than execute fixed logic. Standard PM tools – Gantt charts, sprint velocity tracking, fixed definition-of-done criteria – assume a level of output predictability that agentic systems do not provide.

2. Vanishing Audit Trails

In regulated industries – banking, insurance, pharma, public sector – the ability to reconstruct every decision is not optional. Audit trails underpin compliance, root-cause analysis, and contractual accountability. They are also the first thing that breaks when agents start making intermediate decisions autonomously.

AI agents often function as black boxes: their decision-making logic is neither transparent nor fully comprehensible, even to their developers. When a human developer makes a significant architectural decision, there is typically a commit message, a PR review, a Jira comment, or at minimum a Slack thread. When an agent makes the same decision as part of a longer autonomous workflow, that rationale may simply not exist.

The EU’s approach to this is hardening fast. The new EU Product Liability Directive, to be implemented by December 2026, explicitly includes software and AI as products – opening enterprises to strict liability if an AI system is found to be defective. Singapore’s IMDA published a draft governance framework for agentic AI in January 2026, following a similar framework from the World Economic Forum in November 2025. Both explicitly acknowledge that existing AI governance frameworks do not adequately cover the risks that agentic systems introduce.

For project leads in regulated sectors, the absence of a structured logging and review layer for agent decisions is not a technical debt item to address later. It is an exposure that exists today.

3. Accountability Blur

When a human engineer makes a mistake on a project, accountability is clear. When an agent makes a mistake – and agents do make mistakes at a rate that scales with the number of agents deployed – the question of who owns that failure becomes genuinely complex.

Is it the engineer who prompted the agent? The PM who approved the agent’s output without reviewing the intermediate steps? The vendor whose model produced the erroneous output? The enterprise that deployed the system without adequate guardrails?

A flaw in one agent can propagate downstream and massively amplify the impact. Agent risk is not just wrong answers – it is wrong answers at scale and at speed, often without a human in the loop at the moment of failure. Vendor supply chain risk compounds this: many agentic deployments rely on layered ecosystems of model providers, tool vendors, and integration partners. An upstream model update can materially change agent behavior downstream, without any direct modification by the enterprise.

Standard RACI matrices, escalation paths, and contractual liability clauses in project governance frameworks were not designed for this. They need to be.

What Good Governance Actually Looks Like

The answer is not to avoid agentic AI. It is to build an oversight layer that matches the autonomy level of the agents being deployed. Several principles are emerging from early enterprise implementations and regulatory guidance.

Bound autonomy before deployment. Before any agent is deployed into a delivery pipeline, its action-space should be explicitly scoped: which tools it can access, which decisions it can make unilaterally, and which of its actions are irreversible. Singapore’s governance framework calls this out directly – assess and bound risks upfront, narrowing the agent’s access to systems based on what it actually needs. An agent that can read the codebase should not also have write access to production infrastructure.
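A minimal sketch of what an explicitly scoped action-space could look like in code. All names here – `AgentScope`, `is_allowed`, the tool identifiers – are illustrative assumptions, not part of any specific framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentScope:
    """Explicit action-space for one agent: the tools it may read from,
    the tools it may write to, and the actions that always need review."""
    readable_tools: frozenset = frozenset()
    writable_tools: frozenset = frozenset()
    irreversible_actions: frozenset = frozenset()  # never unilateral

    def is_allowed(self, tool: str, write: bool = False) -> bool:
        # Deny by default: anything not explicitly scoped is blocked.
        allowed = self.writable_tools if write \
            else (self.readable_tools | self.writable_tools)
        return tool in allowed

# An agent that may read the codebase and draft PRs, but whose scope
# simply does not contain production infrastructure.
review_agent = AgentScope(
    readable_tools=frozenset({"git_read", "issue_tracker"}),
    writable_tools=frozenset({"draft_pr"}),
    irreversible_actions=frozenset({"deploy_prod", "schema_migration"}),
)
```

The point of the frozen dataclass is that the scope is fixed at deployment time – the agent cannot widen its own permissions mid-run.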

Make governance a circuit breaker, not a checkpoint. The traditional PM model treats governance as a phase-gate – a review at the end of a sprint or milestone. With agentic systems operating continuously and asynchronously, that model breaks down. Governance needs to be embedded in the pipeline: enforced phase transitions, artifact state machines, escalation triggers when agents encounter uncertainty or edge cases. As one industry leader has framed it: “Governance isn’t a checkpoint anymore; it’s a circuit breaker built into the pipeline.”
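One way to make the circuit-breaker idea concrete: a minimal phase state machine that rejects out-of-order transitions and trips after repeated agent uncertainty. The phase names, class, and threshold are hypothetical illustrations, not a reference design:

```python
# Allowed phase transitions: the agent cannot skip or reorder phases.
ALLOWED = {
    "analysis": {"design"},
    "design": {"implementation"},
    "implementation": {"review"},
    "review": {"done"},
}

class CircuitBreakerPipeline:
    def __init__(self, uncertainty_limit: int = 3):
        self.phase = "analysis"
        self.uncertainty_limit = uncertainty_limit
        self._uncertain = 0
        self.tripped = False

    def advance(self, next_phase: str) -> None:
        if self.tripped:
            raise RuntimeError("circuit open: human review required")
        if next_phase not in ALLOWED.get(self.phase, set()):
            # The agent tried to skip a phase: block it, don't just warn.
            raise ValueError(f"illegal transition {self.phase} -> {next_phase}")
        self.phase = next_phase

    def report_uncertainty(self) -> None:
        # Each time the agent hits an edge case it cannot resolve,
        # count it; past the limit, trip the breaker and escalate.
        self._uncertain += 1
        if self._uncertain >= self.uncertainty_limit:
            self.tripped = True
```

The breaker runs continuously alongside the agent rather than waiting for a sprint-end review – that is the difference from a phase-gate.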

Log everything the agent assumes. One of the most practical patterns from real-world agentic implementations is a dedicated knowledge agent that tracks every question an autonomous agent cannot answer from context – and logs the assumption it makes in order to continue. Because those interactions happen through structured tool calls, every assumption appears as structured data in the agent’s output. That log becomes the audit trail. It is not a perfect solution, but it is a tractable one.
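A sketch of what such an assumption log might look like, assuming each unanswerable question is routed through a structured recording call. The class and field names are illustrative:

```python
import datetime
import json

class AssumptionLog:
    """Records every assumption an agent makes when it cannot answer a
    question from context. Each entry is structured data: the audit trail."""

    def __init__(self):
        self.entries = []

    def record_assumption(self, agent_id: str, question: str,
                          assumption: str) -> dict:
        entry = {
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "agent": agent_id,
            "question": question,
            "assumption": assumption,
        }
        self.entries.append(entry)
        return entry

    def export(self) -> str:
        # JSON Lines: one reviewable record per assumption.
        return "\n".join(json.dumps(e) for e in self.entries)

log = AssumptionLog()
log.record_assumption(
    "refactor-agent-7",
    "Is the legacy payments module still in use?",
    "Assumed yes; left its public interface unchanged.",
)
```

Because the agent's uncertainty flows through one structured call, reviewers can audit what the agent assumed rather than reconstruct it after an incident.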

Tier your approvals by reversibility. Not all agent decisions carry the same risk. Generating a draft requirement document is low-stakes and easily revised. Merging a branch into main, modifying a database schema, or triggering a downstream integration is not. A tiered approval model – where the threshold for human review scales with the reversibility and blast radius of the action – allows teams to capture agentic efficiency on low-stakes tasks while maintaining meaningful oversight where it matters.
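The tiering logic above can be sketched in a few lines. The tier names, the example actions, and the fail-closed default are assumptions for illustration, not a standard classification:

```python
from enum import IntEnum

class Tier(IntEnum):
    DRAFT = 0         # easily revised: draft docs, suggestions
    REVERSIBLE = 1    # undoable with effort: feature-branch commits
    IRREVERSIBLE = 2  # schema migrations, prod deploys, merges to main

ACTION_TIERS = {
    "draft_requirements": Tier.DRAFT,
    "commit_feature_branch": Tier.REVERSIBLE,
    "merge_to_main": Tier.IRREVERSIBLE,
    "migrate_schema": Tier.IRREVERSIBLE,
}

def needs_human_review(action: str,
                       threshold: Tier = Tier.REVERSIBLE) -> bool:
    # Unknown actions default to the highest tier: fail closed, not open.
    tier = ACTION_TIERS.get(action, Tier.IRREVERSIBLE)
    return tier >= threshold
```

The single `threshold` parameter is the governance dial: lowering it tightens oversight across the board, raising it grants more unsupervised autonomy on low-stakes work.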

Inventory what is running. As McKinsey has noted, with agentic AI you cannot govern what you cannot see. If agents are not inventoried and identity-bound, enterprises are not scaling agents – they are scaling unknown risk. An agent registry, with documented ownership, defined action-space, and monitoring hooks, is a prerequisite for any meaningful governance program.
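A minimal sketch of such a registry, under the assumptions stated in the text (every agent has an accountable owner, a declared action-space, and a monitoring hook). The record fields and method names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    agent_id: str
    owner: str               # accountable human or team
    action_space: list       # tools/decisions the agent is scoped to
    monitoring_endpoint: str # where its decisions are logged

class AgentRegistry:
    def __init__(self):
        self._agents = {}

    def register(self, record: AgentRecord) -> None:
        if record.agent_id in self._agents:
            raise ValueError(f"duplicate agent id: {record.agent_id}")
        self._agents[record.agent_id] = record

    def unregistered(self, observed_agent_ids) -> set:
        # Agents seen in live traffic but never inventoried:
        # exactly the "unknown risk" the registry exists to surface.
        return set(observed_agent_ids) - self._agents.keys()
```

Comparing the registry against identities observed at runtime turns "you cannot govern what you cannot see" into a reportable metric: the set of ungoverned agents.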

Where to Start

The governance gap between agentic AI adoption and agentic AI readiness is real, but it is not insurmountable. The organizations that will benefit most from this technology are not the ones moving fastest – they are the ones moving with structure.

A practical entry point is an AI Readiness Health Check applied to your current delivery pipeline: mapping where agents are already operating or being piloted, assessing the audit trail coverage, reviewing accountability structures, and identifying the highest-exposure gaps before they produce an incident.

At itestra, we have been working at the intersection of enterprise software delivery and AI adoption for over 20 years and with 90+ enterprise clients. That experience tells us that the failure modes of agentic AI are not hypothetical – they are already showing up in live projects, in the form of unexplained regressions, missed compliance requirements, and accountability disputes that standard PM processes are not equipped to resolve.

The productivity case for agentic AI in software delivery is compelling. So is the governance case for getting the oversight layer right before the agents proliferate. Those two cases are not in conflict – but only if you treat governance as a design problem, not an afterthought.