Why Spec-Driven Development Is the Operating Model Infrastructure Automation Has Been Missing

Ankit Bhansali

Principal Architect – AI Solutions & Strategy

Key Points

Infrastructure automation teams need a consistent network automation framework for turning intent into a delivered outcome, and most don't have one. Spec-Driven Development (SDD) closes that gap by applying structured lifecycle discipline to every automation build, keeping AI agents inside a governed delivery model rather than operating around one. The result is faster delivery, tighter governance, and automation knowledge that survives beyond the engineer who built it.

Most infrastructure automation teams aren’t short on tools. They have platforms, skilled engineers, and increasingly, AI. What they often don’t have is a consistent way to turn customer intent into a delivered automation outcome. That gap matters more now than ever – because AI makes it worse before it makes it better.

AI makes output cheaper and faster. It can generate workflows, configs, and documentation at a pace that would have seemed impossible two years ago. But faster output doesn’t fix a broken delivery model. It exposes it. When delivery is unmanaged, acceleration just means you arrive at confusion sooner.

The Real Problem Is the Operating Model

Infrastructure teams have invested heavily in the platforms, adapters, and orchestration tooling needed to deliver automation at scale. The capability is there. What’s often missing is a delivery motion to go with it.

A use case lands as three lines in a ticket, or a conversation in a meeting that half the team missed. Someone starts building – directly in the platform, no requirements artifact, no design review. Three weeks in, the adapter doesn’t behave the way anyone expected. Constraints that should have been discovered in hour one get discovered in week three, after the implementation is already shaped around assumptions that turned out to be wrong. Design decisions get made inside the build, invisibly, by whoever is holding the keyboard.

The automation ships. It works. But nobody wrote down why it was built that way – what was considered, what was ruled out, what the edge cases are. When the engineer who built it moves on, that knowledge goes with them. The next person to touch it is reading YAML like it’s archaeology.

That’s not a tooling problem. It’s a delivery problem. And it’s been hiding inside infrastructure teams for years because the pace of building has always been slow enough to absorb it. AI removes that buffer.

The Debt Nobody Talks About

When people talk about technical debt in software, they usually mean code that was written quickly, without tests, without abstraction, and now costs twice as much to change as it did to write. That’s real. But infrastructure has a different kind of debt – and it’s harder to see.

Infrastructure automation debt isn’t usually bad code. It’s coupling. It’s the moment a team made a practical decision – we use this IPAM, this ticketing system, this cloud provider – and encoded that decision directly into every automation they built. Not because they were careless. Because at the time, it was the right call. You build for what you have.

The problem surfaces years later. A vendor gets acquired. A contract ends. A platform reaches end-of-life. Suddenly the practical decision from three years ago is load-bearing in a hundred places, and the cost of changing it isn’t one migration – it’s a hundred. Most teams don’t migrate. They absorb the constraint, build around it, and quietly stop automating the things that touch that system. The backlog grows. The ad hoc work grows with it.

This is why infrastructure automation coverage stays low even at mature organizations. Teams make a rational choice: the cost of maintaining automation against a shifting environment is high enough that it's often cheaper to keep doing certain things manually. So they prioritize. They automate their top use cases and leave the rest. The result is often an estate where something like 20% of processes are governed and the other 80% run on tribal knowledge and heroics.

That tradeoff made sense when automation required a human to write and maintain every line. It stops making sense when the cost of producing governed, tested, documented automation approaches zero.

A Structural Shift Worth Understanding

The infrastructure industry spent twenty years working in one direction: take human knowledge, encode it into machine-executable syntax, and run it. Infrastructure-as-code. Playbooks. Pipelines. The artifact that mattered – the thing you versioned, reviewed, and promoted – was the syntax.

The problem with that model is that syntax ages badly in infrastructure environments. Networks change. Platforms evolve. Vendors get swapped. The intent behind the automation – provision this service, validate this change, enforce this policy – often stays constant for years. But the implementation has to be rewritten every time the environment shifts.

So the syntax becomes a liability. Teams spend more time maintaining existing automation than building new capability. The people who understood why something was built a certain way are gone, and what’s left is a pile of YAML that nobody wants to touch.

What changes with a design-first model is which artifact is durable. The design captures intent – what the automation is supposed to accomplish, what systems are involved, what decisions need to be made along the way.

The syntax is generated from it. When the environment changes, you update the design and regenerate. The implementation is a build artifact, not institutional knowledge. It can be discarded and rebuilt without losing anything that actually matters. That’s the structural shift in the operating model. And it has implications that go well beyond cleaner delivery.
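To make "update the design and regenerate" concrete, here is a minimal Python sketch, assuming a hypothetical VLAN provisioning design expressed as plain data. `VlanDesign`, the renderer functions, and `build` are illustrative names, not part of any Itential API; the rendered lines follow standard Cisco IOS and Junos VLAN syntax. The design stays fixed while the renderer, and therefore the syntax, is swappable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class VlanDesign:
    """Hypothetical durable artifact: the intent, with no vendor syntax in it."""
    vlan_id: int
    name: str


def render_ios(design: VlanDesign) -> List[str]:
    # Cisco IOS-style VLAN definition.
    return [f"vlan {design.vlan_id}", f" name {design.name}"]


def render_junos(design: VlanDesign) -> List[str]:
    # Junos set-style VLAN definition.
    return [f"set vlans {design.name} vlan-id {design.vlan_id}"]


# Swapping vendors means choosing a different renderer, not rewriting the intent.
RENDERERS: Dict[str, Callable[[VlanDesign], List[str]]] = {
    "ios": render_ios,
    "junos": render_junos,
}


def build(design: VlanDesign, platform: str) -> List[str]:
    """Regenerate the implementation (a disposable build artifact) from the design."""
    return RENDERERS[platform](design)


if __name__ == "__main__":
    design = VlanDesign(vlan_id=120, name="USERS")
    for platform in RENDERERS:
        print(platform, build(design, platform))
```

Real designs carry far more than two fields, but the shape is the same: the syntax is regenerated on demand, so discarding it loses nothing.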

Spec-Driven Development for Infrastructure

Spec-Driven Development (SDD) applies structured lifecycle discipline to infrastructure automation delivery. Each stage produces a named artifact and requires explicit human approval before the next begins. The result is a delivery system where AI agents operate inside a governed model, not around one. The model has five stages:

Requirements → Feasibility → Design → Build → As-Built

Each stage has a purpose. Each stage has an artifact. Each stage has an approval point. Each stage can be driven by an agent, but not hidden by one.
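The gating rule can be expressed as a small state machine. The following Python sketch is a hypothetical model, not the builder-skills implementation: stages must be submitted in order, each artifact records who produced it (human or agent), and no stage can start until the previous stage's artifact carries an explicit human approval. `Lifecycle`, `Artifact`, and the method names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    REQUIREMENTS = 1
    FEASIBILITY = 2
    DESIGN = 3
    BUILD = 4
    AS_BUILT = 5


@dataclass
class Artifact:
    stage: Stage
    content: str
    author: str                        # human or agent that produced it, never hidden
    approved_by: Optional[str] = None  # set only by an explicit human approval


@dataclass
class Lifecycle:
    """Hypothetical model of the SDD gates; not the builder-skills code."""
    artifacts: Dict[Stage, Artifact] = field(default_factory=dict)

    def next_stage(self) -> Optional[Stage]:
        # The first stage with no artifact yet, or None once As-Built exists.
        for stage in Stage:
            if stage not in self.artifacts:
                return stage
        return None

    def submit(self, stage: Stage, content: str, author: str) -> Artifact:
        expected = self.next_stage()
        if stage is not expected:
            raise ValueError(f"expected {expected}, got {stage}")
        stages = list(Stage)
        position = stages.index(stage)
        if position > 0:
            previous = self.artifacts[stages[position - 1]]
            if previous.approved_by is None:
                raise PermissionError(f"{previous.stage.name} has not been approved")
        artifact = Artifact(stage, content, author)
        self.artifacts[stage] = artifact
        return artifact

    def approve(self, stage: Stage, approver: str) -> None:
        self.artifacts[stage].approved_by = approver
```

In this sketch, submitting Feasibility right after Requirements raises `PermissionError` until someone calls `approve(Stage.REQUIREMENTS, ...)`, and jumping ahead to Build raises `ValueError`. An agent can do the work of a stage, but it cannot skip the gate or the order.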

Requirements locks the use case. The Spec Agent helps refine the request into an approved statement of what’s being built, why it matters, what constraints apply, and how success will be measured. The requirements artifact is owned by the person closest to the business need – the network architect who knows exactly what a VLAN provisioning flow should do, the security engineer who understands the firewall policy, the platform owner who knows how service activation is supposed to work. Not a developer translating requirements secondhand. The people who actually know what needs to happen. Most downstream failure in infrastructure delivery isn’t caused by poor implementation. It’s caused by requirements that were never explicit – captured informally in a Slack thread or a meeting that half the team missed. This stage forces that clarity before anything else happens.
Feasibility answers a different question: can the platform actually support this? Before design begins, the Solution Architecture Agent checks live platform reality: what adapters are available, what integrations exist, where the dependencies are, what’s been built before that could be reused. The outcome is a real decision: feasible, feasible with constraints, or not feasible. The agent asks the human when it doesn’t have enough to go on. It surfaces constraints early, when changing direction costs almost nothing, instead of letting teams discover them during the build, when the cost is highest.
Design is where intent becomes a concrete implementation plan: what gets built versus reused, which components are required, how adapters are wired, what the dependency order is, how the solution will be tested, and how acceptance criteria map to validation. The design is specific to the customer's environment. Two organizations running the same use case get two different designs, because each design targets that organization's actual platform, actual integrations, and actual constraints. There is another human approval gate here, because design is the last moment where changing direction is cheap.
Build is where AI becomes genuinely powerful, but only because earlier stages removed the ambiguity. The Builder Agent implements an approved design. It isn’t inventing the delivery model on the fly. The build happens in a controlled sequence: child workflows first, validation before dependency layering, parent orchestrator last, acceptance criteria verified against what was actually delivered. The output is a complete, tested set of automation assets – not a prototype that needs cleanup.
As-Built is the most neglected stage in infrastructure delivery, and the one that determines whether knowledge survives. Most teams treat delivery as done when the automation works. But the as-built record is what makes the next iteration possible. It captures what was actually delivered, where it deviated from the design and why, and what changed from the original requirements. When the environment shifts and the automation needs to change, the team isn’t reverse-engineering what was built and why. They’re starting from a document that already answers those questions.
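The build order and the as-built record can be sketched together. This minimal Python sketch assumes the approved design has been reduced to a hypothetical dependency map (each component mapped to the components it relies on) and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) so child workflows come before the parent orchestrator. The `build_and_validate` callback and the component names are illustrative stand-ins, not Builder Agent internals.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


@dataclass
class AsBuilt:
    """Hypothetical as-built record: what shipped and where it deviated."""
    delivered: List[str] = field(default_factory=list)
    deviations: Dict[str, str] = field(default_factory=dict)  # component -> reason


def build_in_order(
    design: Dict[str, Set[str]],
    build_and_validate: Callable[[str], bool],
) -> AsBuilt:
    """Build components in dependency order and record the outcome.

    `design` maps each component to the components it depends on, so child
    workflows are emitted by the sort before the parent orchestrator that
    calls them. `build_and_validate` is a hypothetical hook that builds one
    component, validates it, and returns True only if validation passed.
    """
    record = AsBuilt()
    failed: Set[str] = set()
    for component in TopologicalSorter(design).static_order():
        blocked = design.get(component, set()) & failed
        if blocked:
            # Never layer a component on top of a dependency that failed.
            failed.add(component)
            record.deviations[component] = f"not built: depends on {sorted(blocked)}"
            continue
        if build_and_validate(component):
            record.delivered.append(component)
        else:
            failed.add(component)
            record.deviations[component] = "failed validation"
    return record


if __name__ == "__main__":
    design = {
        "provision_service": {"allocate_ip", "configure_vlan"},  # parent orchestrator
        "allocate_ip": {"validate_request"},
        "configure_vlan": {"validate_request"},
    }
    record = build_in_order(design, lambda component: component != "allocate_ip")
    print(record.delivered)
    print(record.deviations)
```

In that example run, `allocate_ip` fails validation, so the parent orchestrator is recorded as a deviation instead of being built on top of a broken dependency. That is the kind of fact an as-built record should preserve for whoever touches the automation next.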

One thing worth being honest about: speed depends on the humans, not the agents. With clean intent and stakeholders who can make decisions, moving from requirements through a working build in a single day is realistic. In most enterprises it will take weeks – requirements live with one team, feasibility with another, design review needs a third. That’s fine.

What changes isn’t the calendar. It’s the quality of every handoff, the tightness of every feedback loop, and the fact that when a stage needs to iterate, the reasoning is visible and the artifact is right there.

Three Agents. Five Stages. Full Traceability.

Itential builder-skills turns this model into something runnable: a set of agent skills that implement the SDD lifecycle on the Itential Platform.

The system is intentionally small:

Spec Agent – owns requirements
Solution Architecture Agent – owns feasibility and design
Builder Agent – owns build and as-built

Three agents because the goal isn’t agent proliferation. The goal is lifecycle clarity. Each agent has a defined responsibility. Each stage produces a named artifact. Each artifact becomes context for the next stage.

That gives you full traceability across the path from customer intent to delivered automation: what was agreed, what the platform supported, what design was approved, what was actually built, and what changed along the way. That traceability matters most when something breaks in production six months later, or when the engineer who built it is gone, or when a stakeholder asks why the automation works the way it does. The answer exists. It’s in the artifacts.
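What "the answer is in the artifacts" means mechanically can be sketched as a linked chain. In this hypothetical Python model (not the builder-skills artifact format), each approved artifact records its owning agent, its human approver, and the artifact it was derived from, so any question about the delivered automation can be answered by walking back to requirements. All names, approvers, and summaries below are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TracedArtifact:
    """Hypothetical trace record for one stage's approved artifact."""
    stage: str
    owner: str                                 # agent that produced the artifact
    approved_by: str                           # human who approved it
    summary: str
    parent: Optional["TracedArtifact"] = None  # artifact this one was derived from


def explain(artifact: TracedArtifact) -> List[str]:
    """Walk from any artifact back to requirements; return the chain oldest first."""
    chain: List[str] = []
    node: Optional[TracedArtifact] = artifact
    while node is not None:
        chain.append(f"{node.stage} [{node.owner}, approved by {node.approved_by}]: {node.summary}")
        node = node.parent
    return list(reversed(chain))


if __name__ == "__main__":
    req = TracedArtifact("Requirements", "Spec Agent", "network architect",
                         "Provision branch VLANs from a service request")
    feas = TracedArtifact("Feasibility", "Solution Architecture Agent", "platform owner",
                          "Feasible with constraints: IPAM adapter is read-only", req)
    design = TracedArtifact("Design", "Solution Architecture Agent", "network architect",
                            "Reuse existing IPAM lookup workflow", feas)
    build = TracedArtifact("Build", "Builder Agent", "network architect",
                           "Child workflows and parent orchestrator delivered", design)
    as_built = TracedArtifact("As-Built", "Builder Agent", "network architect",
                              "No deviations from design", build)
    for line in explain(as_built):
        print(line)
```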

That is the difference between assisted generation and a delivery operating model. The skills are published in the Anthropic Claude marketplace – installable in Claude Code with a single command, no custom tooling required.

Why This Matters Now

The infrastructure industry is at a familiar inflection point. Automation tooling is maturing faster than delivery practice. Teams are adding AI into the stack while still running delivery through undocumented, person-dependent, late-discovery workflows.

There’s also a governance question that gets louder as AI takes on more of the implementation work. When an experienced engineer builds an automation, there’s implicit trust in the outcome – people understand who built it, what they knew, and where to ask questions when something breaks. When an agent builds it, that implicit trust disappears. Suddenly everyone wants to know: what did it build, how was it tested, who approved it, and where’s the audit trail? Those are good questions. They deserve real answers, not reassurances.

A governed delivery lifecycle is part of the answer. Not because it slows AI down – but because it gives every stakeholder, from the engineer to the auditor, a clear record of every decision, every approval, and every artifact from requirements through delivery. The agent operates inside a system people can inspect, not as a black box they have to trust.

The deeper opportunity is what changes when the cost of building governed automation stops being the constraint. Most infrastructure organizations make a rational tradeoff: automate the high-frequency, high-value processes and accept that everything else will be handled manually. Remove that cost and the tradeoff disappears. The question stops being which processes the team can afford to automate and becomes which decisions actually require a human. That’s a much more interesting question – and it leads to a much better operating model.

The design is the intent. The syntax is the operational model you choose. When those two things are separate – when the design is the durable artifact and the implementation is generated from it – infrastructure teams can change the shape of a solution without changing its purpose. Swap platforms. Change vendors. Evolve from heavy human review toward more autonomous execution as confidence grows. The business logic stays intact through all of it. That’s not something the infrastructure industry could do at scale before.

The repo is public. There are a few ways to start depending on where you are: run the full lifecycle from scratch with new work, use FlowAgent to Spec to move exploratory agent output into a real delivery path, or use the spec extraction mode to bring existing undocumented automation into a governed state. If you work in infrastructure automation, network automation, or platform engineering, try it, challenge it, and build on it.

Because when specs are real, feasibility happens before build, design is a genuine artifact, and as-built documentation is part of the baseline – infrastructure delivery stops being something that happens to your team and starts being something your team controls.

Ankit Bhansali

Ankit Bhansali is a Principal Architect – AI Solutions & Strategy at Itential. Drawing on a strong research background in software and networking, he designs innovative solutions to address the industry's most complex challenges. His strategic approach empowers businesses to achieve transformative growth through robust automation and end-to-end orchestration.
