Spec-Driven Development - Network & Infra Automation

TL;DR

What’s In This Guide

The SDD Framework

What Spec-Driven Development is, how the five phases work, what each produces, and how the two approval gates govern phase transitions. Includes the core principles and the execution model that keeps builds governed.

Why It’s Faster & Where It Compounds

How SDD affects time to value on initial build, and why the real payoff is Day 2 – every extension, debug session, and handoff starting from an as-built record instead of archaeology.

SDD Applied to Network & Infrastructure Automation

How the operating model maps to real network automation work, what it changes organizationally, how AI-driven and agentic operations fit within it, and why a governed execution platform is what makes SDD enforceable at scale.

Who This Guide is For

Network & Infrastructure Engineers

See SDD applied to the work you actually do – provisioning, compliance, patching, change management – and understand what each phase requires from you.

Automation Managers & Leads

See what consistent SDD application produces across an automation estate – measurable delivery, transferable knowledge, and a compounding asset base rather than recurring discovery work.

VPs, Directors & CTOs

The case for a governed execution platform – why SDD at scale requires system enforcement, not discipline, and what that means for compliance, AI-readiness, and operational trustworthiness.

The Operating Model Gap

Why Network & Infrastructure Automation Needs a Governed Operating Model

Most network and infrastructure teams have automation. They have orchestration platforms, cloud APIs, integration libraries, AI-assisted tooling, and years of accumulated automation scripts. What most don’t have is a governing operating model – a defined system for how a user request becomes an approved requirement, how that requirement becomes a tested design, how that design becomes working automation, and how every engagement produces a record the next team can actually start from.

Without that operating model, automation accumulates. It doesn’t compound. Each new workflow requires the same discovery the last one required. Each Day 2 engagement starts by reverse-engineering what was built and why. Each team rotation loses the institutional knowledge that left with the previous engineer. The automation estate grows in size but not in trustworthiness.

Network and infrastructure automation are the domains where this gap is most costly – because both operate against production systems where the blast radius of informal practice is operational risk, not a UI bug. A misunderstood requirement in network automation produces a routing change that takes a service down. In infrastructure automation, it produces a server fleet patched to the wrong standard, a cloud environment provisioned outside approved IAM boundaries, or a compliance automation enforcing an interpretation that was never formally agreed. These aren’t documentation problems. They’re incidents, audit findings, and FinOps surprises.

Informal Practice

Ticket → Someone figures it out

✗ Requirements captured informally – tickets, Slack threads

✗ Design happens during build, never formally reviewed

✗ Scope changes absorbed silently into the automation

✗ Knowledge lives with the engineer who built it

✗ Every Day 2 engagement starts from archaeology

Spec-Driven Development

Request → Governed operating model

✓ Requirements captured in a written spec, approved at Gate 1

✓ Design reviewed and approved before build begins

✓ Scope changes visible at gates – not discovered at deployment

✓ As-built record produced after every engagement

✓ Every Day 2 engagement starts from reconciled truth

Spec-Driven Development fixes this structurally – not by adding better documentation practices or more rigorous ticket templates, but by defining the operating model that governs how every engagement moves through five phases: requirements, feasibility, design, build, and as-built reconciliation. Two approval gates enforce the phase boundaries. Two artifacts – the approved spec and the approved solution design – govern what gets built and how. The as-built record closes the loop, making every engagement a better starting point for the next.
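The five phases and two gates described above can be read as a small state machine. The sketch below is illustrative only: the `Engagement` class, the `approve_gate` and `advance` methods, and the gate names are hypothetical and do not represent any particular platform's API. It assumes Gate 1 sits between requirements and feasibility, and Gate 2 between design and build.

```python
from enum import Enum


class Phase(Enum):
    REQUIREMENTS = 1
    FEASIBILITY = 2
    DESIGN = 3
    BUILD = 4
    AS_BUILT = 5


# Assumption for illustration: the gate that must be approved before ENTERING a phase.
GATE_BEFORE = {Phase.FEASIBILITY: "gate_1", Phase.BUILD: "gate_2"}


class GateNotApproved(Exception):
    """Raised when a phase transition is attempted without its gate approval."""


class Engagement:
    def __init__(self):
        self.phase = Phase.REQUIREMENTS
        self.approved_gates = {}  # gate name -> approver

    def approve_gate(self, gate, approver):
        # Record the approval; a real system of record would also persist it.
        self.approved_gates[gate] = approver

    def advance(self):
        if self.phase is Phase.AS_BUILT:
            raise ValueError("engagement is already in its final phase")
        next_phase = Phase(self.phase.value + 1)
        gate = GATE_BEFORE.get(next_phase)
        if gate is not None and gate not in self.approved_gates:
            raise GateNotApproved(f"{gate} must be approved before {next_phase.name}")
        self.phase = next_phase
        return self.phase
```

The point of the design is that the transition itself checks the approval record, so skipping a gate is an error rather than a judgment call.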

This guide covers SDD applied to network and infrastructure automation – the use cases your team actually runs, why scale makes manual compliance structurally unreliable, and what a governed execution platform enforces that discipline alone cannot. If you’re looking for the foundational framework, start with Guide 1.

4 Use Cases

SDD Applied: Network & Infrastructure Automation

Four use cases across both domains – two network, two infrastructure – showing what SDD produces when it’s applied, what each gate protects against, what the as-built record enables on Day 2, and what makes the operating model hold in practice.

Network Automation

Use Case 01 – Provisioning

Multi-Domain Network Provisioning

The use case where scope drift is most expensive

Figure: Multi-Domain Provisioning – SDD Phase Flow. All integration surfaces locked at Gate 1 – before any environment access.

Requirements (IPAM · CMDB · NMS · Devices · Ticketing) → Gate 1: spec locked → Feasibility (API inventory, reuse candidates) → Design (component map, integration sequence) → Gate 2: design locked → Build (leaf first, test each layer) → As-Built (deviations documented, reuse catalogued)

Day 2 Reuse: Next provisioning engagement starts from the as-built record – not a blank feasibility run

Without SDD

Integration scope is discovered during build. One system’s constraints silently reshape what gets built for another. What gets provisioned reflects what the environment supports, not what was requested. Nobody agreed on the scope before environment access began.

Result: the delivered automation works for today’s environment – but the requirement the consumer asked for was never formally agreed. The next provisioning request starts with the same discovery process from scratch.

With SDD

Requirements: All integration dependencies – IPAM, CMDB, NMS, device scope, ticketing – captured before any platform access. Acceptance criteria locked in writing.

Gate 1: Spec approved. All integration surfaces locked before any environment work begins.

Feasibility: API availability inventoried, data model compatibility confirmed, reuse candidates identified – all scoped by the approved spec.

Design: Component inventory, integration sequence, reuse decisions locked before a single workflow component is built. Approved at Gate 2.

As-built: Platform-specific adaptations and reuse vs. rebuild decisions documented. Authoritative baseline for the next similar provisioning engagement.

What enforces this

Gate 1 must be a system boundary, not a verbal sign-off. In environments with multiple integration surfaces across multiple teams, “spec approved” in a document someone emailed is not enforcement. A governed execution platform records Gate 1 as the condition for environment access – no authentication, no API calls, no discovery until the spec is formally approved in the system.
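One way to picture Gate 1 as a system boundary is a wrapper that refuses any environment call until an approval is on record. This is a minimal sketch under stated assumptions: `SpecRegistry`, `requires_gate_1`, and `query_ipam` are hypothetical names, and the IPAM call is a canned stand-in rather than a real integration.

```python
import functools


class SpecNotApproved(PermissionError):
    """Raised when environment access is attempted without a Gate 1 record."""


class SpecRegistry:
    """Hypothetical in-memory system of record for Gate 1 approvals."""

    def __init__(self):
        self._approved = {}  # spec id -> approver

    def approve(self, spec_id, approver):
        self._approved[spec_id] = approver

    def is_approved(self, spec_id):
        return spec_id in self._approved


def requires_gate_1(registry):
    """Decorator: block the wrapped call unless the spec has a Gate 1 approval."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(spec_id, *args, **kwargs):
            if not registry.is_approved(spec_id):
                raise SpecNotApproved(f"spec {spec_id!r} has no Gate 1 approval")
            return func(spec_id, *args, **kwargs)
        return wrapper
    return decorator


registry = SpecRegistry()


@requires_gate_1(registry)
def query_ipam(spec_id, prefix):
    # Stand-in for a real IPAM lookup; returns a canned answer for illustration.
    return {"spec": spec_id, "prefix": prefix, "status": "available"}
```

Calling `query_ipam("SPEC-1", "10.0.0.0/24")` raises `SpecNotApproved` until `registry.approve("SPEC-1", ...)` has been called, which is the difference between an emailed sign-off and an enforced precondition.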

Who this matters to:

Network Engineers
NOC / Service Delivery
Automation Leads
Service Architects

Use Case 02 – Compliance

Configuration Compliance & Drift Detection

The use case where an undocumented interpretation becomes a compliance liability

Configuration Compliance – What “Compliant” Means Without vs. With SDD

Without SDD

“Compliant” = engineer’s interpretation

✗ No formal policy definition reviewed before build

✗ Compliance logic encodes one engineer’s reading

✗ Audit cannot prove the definition was approved

✗ Policy updates silently change comparison logic

With SDD

“Compliant” = Gate 1 approved definition

✓ Compliance definition in spec, approved before build

✓ Build implements the approved definition exactly

✓ Audit has Gate 1 record + as-built as evidence

✓ Policy update triggers Gate 1 amendment, not silent change

Without SDD

“Compliant” is defined by the engineer who built the comparison logic, based on their interpretation of the policy. Nobody reviewed that interpretation before it was encoded in the automation.

When audited, the team cannot demonstrate the compliance definition was formally approved. When policy changes, the engineer updates the comparison logic without review or record.

With SDD

Requirements: Compliance definition captured – which policy, which devices, what constitutes drift, what remediation is in scope, what the audit artifact must contain.

Gate 1: Compliance definition approved in writing before any device platform access.

Gate 2: Remediation scope & logic approved before build begins. The build executes the approved definition – it doesn’t interpret it.

As-built: What the automation actually enforces documented. Policy updates require a new Gate 1 amendment – not a silent code change.

What enforces this

The compliance definition must be an artifact the platform gates build against – not a document in a folder. When Gate 2 is enforced as a system boundary, the compliance logic the builder implements answers to the approved design. Without platform enforcement, the build interprets the design. With it, the build executes it.
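To make "the build executes the definition" concrete, the compliance definition can be held as data that the drift check reads, rather than as logic an engineer writes from memory. The sketch below assumes a very simple line-based model of device configuration; `ComplianceDefinition`, `find_drift`, the policy id, and the sample config lines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplianceDefinition:
    """Hypothetical approved artifact: what 'compliant' means for one policy."""
    policy_id: str
    approved_by: str            # who approved the definition at Gate 1
    required_lines: frozenset   # config lines that must be present
    forbidden_lines: frozenset  # config lines that must be absent


def find_drift(definition, running_config):
    """Compare a running config (text) against the approved definition."""
    lines = {line.strip() for line in running_config.splitlines() if line.strip()}
    return {
        "policy_id": definition.policy_id,
        "missing": sorted(definition.required_lines - lines),
        "forbidden_present": sorted(definition.forbidden_lines & lines),
    }


definition = ComplianceDefinition(
    policy_id="NET-SEC-001",  # illustrative id, not a real policy
    approved_by="security-ops",
    required_lines=frozenset({"service password-encryption", "no ip http server"}),
    forbidden_lines=frozenset({"ip http server"}),
)

sample_config = """
hostname edge-1
ip http server
service password-encryption
"""

report = find_drift(definition, sample_config)
```

Because the definition is frozen and carries its approver, changing what counts as drift means producing a new approved definition, not editing the comparison function.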

Who this matters to:

Network Engineers
Security Ops
Compliance & Audit Teams
VPs & CTOs

Infrastructure Automation

Infrastructure automation operates against a different surface – cloud APIs, compute fleets, IaC toolchains, and server estates – but it carries its own category of blast radius. A misconfigured IAM policy provisioned at scale grants cloud access nobody approved. A CIS benchmark applied at the wrong level enforces the wrong security standard across hundreds of servers. A drift remediation workflow that runs against an unapproved baseline corrupts the configuration estate it was supposed to protect. The operating model problems are identical to those in network automation. So is the structural cost of running the operating model without enforcement.

Use Case 03 – Cloud & Compute

Cloud Resource Provisioning & Lifecycle

The use case where IAM scope, cost allocation, and compliance evidence are all decided during build – or decided in the spec

Cloud Resource Provisioning – Where Scope Drift Enters Without SDD

Without SDD

✗ Tagging policy decided during build

✗ IAM scope not formally agreed

✗ Budget limits applied inconsistently

✗ CMDB entries incomplete or missing

With SDD

✓ Tagging standards in spec, approved at Gate 1

✓ IAM scope defined before any cloud access

✓ Budget guardrails in solution design (Gate 2)

✓ CMDB update scope locked before build

Without SDD

The request says “provision a dev environment.” What that actually means – which cloud, which instance types, which IAM roles, what tagging policy, which CMDB fields, what budget guardrail – gets decided by the engineer doing the work. Those decisions are never formally agreed, never reviewed, and never documented.

Result: inconsistent environments across teams, FinOps unable to attribute cost, CMDB out of date on day one, and IAM permissions that expand beyond what was actually needed because nobody defined the boundary up front.

With SDD

Requirements: Cloud provider, resource types, instance sizing, IAM scope, tagging standards, CMDB fields, budget guardrails, and lifecycle policy all captured before any cloud API is called.

Gate 1: Spec approved. Cloud scope, IAM boundaries, and tagging policy locked before environment access begins.

Feasibility: Available cloud APIs, existing IaC modules for reuse, CMDB integration compatibility, and cost allocation constraints assessed against the approved spec.

Design: IaC module selection, resource dependency order, tag enforcement logic, CMDB update sequence, and lifecycle automation all specified before build begins.

As-built: Actual resource configuration, IAM grants made, CMDB records created, and any deviations from the approved design documented. Authoritative for teardown and future environment requests of this type.

What enforces this

Cloud provisioning automation touches IAM, billing, CMDB, and production-adjacent environments simultaneously. Gate 1 as a system boundary – not a verbal agreement – ensures that tagging policy, IAM scope, and cost allocation are agreed before any cloud API is called. The as-built record is what FinOps, security, and the next infrastructure team member all need – and it only exists reliably when the platform produces it as a required output.
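As a rough illustration of checking a provisioning request against what was approved, the sketch below validates tags, IAM roles, instance type, and estimated cost before anything is provisioned. Everything here is an assumption for illustration: `ApprovedCloudSpec`, `validate_request`, the request fields, and the role and tag names are hypothetical and not tied to any cloud provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedCloudSpec:
    """Hypothetical approved scope for one type of environment request."""
    required_tags: frozenset
    allowed_iam_roles: frozenset
    allowed_instance_types: frozenset
    monthly_budget_usd: float


def validate_request(spec, request):
    """Return a list of violations; an empty list means the request is in scope."""
    violations = []
    missing_tags = spec.required_tags - set(request.get("tags", {}))
    if missing_tags:
        violations.append(f"missing tags: {sorted(missing_tags)}")
    extra_roles = set(request.get("iam_roles", [])) - spec.allowed_iam_roles
    if extra_roles:
        violations.append(f"IAM roles outside approved scope: {sorted(extra_roles)}")
    if request.get("instance_type") not in spec.allowed_instance_types:
        violations.append(f"instance type not approved: {request.get('instance_type')}")
    if request.get("estimated_monthly_usd", 0.0) > spec.monthly_budget_usd:
        violations.append("estimated cost exceeds budget guardrail")
    return violations


spec = ApprovedCloudSpec(
    required_tags=frozenset({"owner", "cost-center"}),
    allowed_iam_roles=frozenset({"dev-readonly"}),
    allowed_instance_types=frozenset({"small"}),
    monthly_budget_usd=500.0,
)
```

A request that omits the `cost-center` tag or asks for a role outside `allowed_iam_roles` is reported before any resource exists, which is where the source text places those decisions.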

Who this matters to:

Cloud Ops Engineers
Platform Engineers
FinOps Teams
Security Ops

Use Case 04 – Server Compliance

Server Configuration Compliance & Hardening

The use case where “hardened” means whatever the automation engineer thought it meant

Server Hardening – Who Approved the Standard Being Enforced?

Without SDD

“Hardened” = engineer’s CIS interpretation

✗ No formal hardening standard reviewed before build

✗ Exemptions decided during implementation

✗ Audit can’t prove the standard was formally approved

✗ Standard drift: different servers, different interpretations

With SDD

“Hardened” = Gate 1 approved standard

✓ CIS level, exemptions, and scope approved before build

✓ Remediation logic approved at Gate 2 before deployment

✓ Audit has Gate 1 record + as-built as proof

✓ Standard updates require Gate 1 amendment – not silent change

Without SDD

“Harden these servers” is a work order with no formal definition of what hardened means. The engineer applies CIS benchmarks at whatever level they judge appropriate, grants exemptions where application dependencies require them, and ships automation that enforces their interpretation – without any of those decisions being formally reviewed.

When audited, the team cannot demonstrate that the hardening standard was formally approved, who authorized the exemptions, or whether the same standard is being applied consistently across server fleets.

With SDD

Requirements: CIS benchmark level, applicable server scope, known application dependencies requiring exemptions, audit artifact format, and remediation vs. detect-only scope captured before any server access.

Gate 1: Hardening standard and exemption list formally approved before any server configuration is assessed.

Gate 2: Remediation logic, rollback behavior, and audit output format approved before implementation. The automation enforces the approved standard – it doesn’t interpret it.

As-built: Actual controls enforced, exemptions applied, and any platform-specific adaptations documented. Standard updates require a Gate 1 amendment – not a silent policy change buried in a script.

What enforces this

Server hardening automation touches security policy, application availability, and audit evidence simultaneously. The approved spec is the formal record that “hardened” means a specific, agreed thing – not whatever the build produced. Without that artifact and the gate that approved it, every compliance audit is an assertion rather than evidence.
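A hardening standard with explicit exemptions might be represented along these lines, so that every exemption carries the name of whoever approved it. This is a sketch only: `HardeningStandard`, `Exemption`, and the control ids such as `CTRL-A` are hypothetical placeholders, not real CIS benchmark identifiers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Exemption:
    control_id: str
    server: str
    approved_by: str  # recorded so an audit can see who authorized it
    reason: str


@dataclass
class HardeningStandard:
    """Hypothetical approved standard: a benchmark level plus named exemptions."""
    level: int
    controls: dict  # control id -> minimum level at which the control applies
    exemptions: list = field(default_factory=list)

    def controls_for(self, server):
        """Controls the automation should enforce on one server."""
        in_scope = {cid for cid, lvl in self.controls.items() if lvl <= self.level}
        exempt = {e.control_id for e in self.exemptions if e.server == server}
        return sorted(in_scope - exempt)


standard = HardeningStandard(
    level=1,
    controls={"CTRL-A": 1, "CTRL-B": 1, "CTRL-C": 2},  # placeholder control ids
    exemptions=[Exemption("CTRL-B", "app-01", "ciso", "legacy agent dependency")],
)
```

With this shape, `standard.controls_for("app-01")` and `standard.controls_for("db-01")` differ only by the approved exemption, so "different servers, different interpretations" becomes a visible, attributable difference.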

Who this matters to:

Infrastructure Engineers
Security Ops
Compliance & Audit
CISOs & VPs

See SDD in Action

Watch Itential execute the full SDD motion – from approved spec to deployed automation – or talk to our team about applying the operating model in your environment.


Reconciliation

The As-Built Discipline & What It Produces

The as-built phase is the most common place where SDD discipline breaks down in practice. The automation works. The ticket is resolved. Writing the as-built record feels like overhead after the win – and without a system requiring it, it gets deferred, abbreviated, or skipped entirely.

What that produces over time is a growing gap between approved artifacts and operational reality. Debugging gets harder because the design document no longer reflects what runs. Reuse gets less reliable because the as-built record doesn’t exist or doesn’t capture platform-specific adaptations. Compliance claims become unverifiable because the chain of authority ends at delivery.

When reconciliation holds – because a platform produces the as-built record as a required phase output rather than a document an engineer writes from memory – the automation estate compounds in value. Each engagement produces a reconciled baseline the next one starts from. Reuse is identified rather than rediscovered. Day 2 work starts from truth, not from the workflow code.

Without reconciliation, the operating model produces successful outcomes that don’t compound. Each future engagement on the same use case must rediscover what the previous one resolved. With reconciliation, every engagement becomes a better starting point – and the automation estate becomes progressively easier, not harder, to operate.
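Reconciliation, at its simplest, is a comparison between what the approved design said and what is observed in production. The sketch below assumes both can be flattened into key/value pairs; `reconcile` and the sample keys are hypothetical and stand in for whatever a real platform would collect.

```python
def reconcile(approved_design, observed_state):
    """Compare two flat dicts and report where production differs from the design."""
    deviations = []
    for key in sorted(set(approved_design) | set(observed_state)):
        designed = approved_design.get(key)  # None means "not in the design"
        actual = observed_state.get(key)     # None means "not found in production"
        if designed != actual:
            deviations.append({"item": key, "designed": designed, "actual": actual})
    return {"matches_design": not deviations, "deviations": deviations}


design = {"vlan": 120, "mtu": 9000, "bgp_asn": 65001}
observed = {"vlan": 120, "mtu": 1500, "bgp_asn": 65001, "description": "temp"}
as_built = reconcile(design, observed)
```

Here `as_built["deviations"]` lists the changed `mtu` and the undocumented `description`, which is the kind of record a Day 2 engineer would otherwise have to rediscover from the workflow code.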

By Role

What SDD Changes & Why It Matters By Role

When SDD is enforced as a system rather than practiced as a discipline, the automation estate compounds in value with each engagement. That compounding effect lands differently depending on where you sit – and what you’re accountable for.

Network & Infrastructure Engineers

→ Every engagement starts from an approved spec and an approved design – not from a Slack thread or a stale ticket interpretation

→ Deviations are documented, not absorbed into the workflow code. The as-built record protects the next engineer from archaeology

→ Day 2 work – extensions, debug sessions, modifications – starts from the reconciled baseline, not from reverse-engineering what was built

→ Reuse is identified during design, before build, when the full asset inventory can actually be assessed

→ The compliance audit trail is a byproduct of the delivery process – not a documentation sprint after the fact

Automation Managers & Leads

→ Delivery is measurable: time to approved spec, deviation rate, reuse rate, reconciliation completeness across the estate

→ Scope changes are visible at approval gates – not discovered at deployment when they’re expensive to address

→ Onboarding is faster because knowledge is in artifacts, not in the heads of engineers who may have left

→ The operating model that makes AI-assisted and agentic automation governable is already in place – you don’t need to retrofit governance when agents are ready

→ Consistent delivery regardless of which engineer handles the request – the model scales without scaling individual expertise
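The delivery measures named above for automation leads can be computed from simple engagement records. The definitions in this sketch are assumptions, since the guide does not define them: reuse rate is taken as reused components over total components, deviation rate as deviations per engagement, and reconciliation completeness as the share of engagements with an as-built record. The function and field names are hypothetical.

```python
from datetime import date


def delivery_metrics(engagements):
    """Summarize a non-empty list of engagement records (plain dicts)."""
    if not engagements:
        raise ValueError("at least one engagement is required")
    count = len(engagements)
    days_to_spec = sum((e["spec_approved"] - e["requested"]).days for e in engagements)
    components = sum(e["components"] for e in engagements)
    reused = sum(e["reused_components"] for e in engagements)
    deviations = sum(e["deviations"] for e in engagements)
    reconciled = sum(1 for e in engagements if e["as_built_recorded"])
    return {
        "avg_days_to_approved_spec": days_to_spec / count,
        "deviations_per_engagement": deviations / count,
        "reuse_rate": reused / components if components else 0.0,
        "reconciliation_completeness": reconciled / count,
    }


# Made-up sample records, for illustration only.
sample = [
    {"requested": date(2024, 1, 1), "spec_approved": date(2024, 1, 5),
     "components": 10, "reused_components": 4, "deviations": 1,
     "as_built_recorded": True},
    {"requested": date(2024, 2, 1), "spec_approved": date(2024, 2, 3),
     "components": 10, "reused_components": 6, "deviations": 3,
     "as_built_recorded": False},
]
metrics = delivery_metrics(sample)
```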

VPs, Directors & CTOs

→ The automation estate compounds in value with each engagement rather than accumulating technical debt that compounds in cost

→ Compliance is demonstrable through artifact chains – Gate 1 records, Gate 2 decisions, deviation logs, as-built reconciliation – not asserted through memory

→ The case for AI acceleration is credible because the governance layer is already in place. Agents run inside a governed model – they don’t require a new one

→ Infrastructure and network teams shift from cost centers that absorb tickets to strategic partners delivering trusted, auditable automation at scale

Why Enforcement Matters

Why Scale Demands More Than Discipline

SDD is the right operating model. The question at enterprise scale isn’t whether the framework is correct – it is whether discipline alone can hold it consistently across the operating conditions that define network and infrastructure automation. It can’t. Not because engineers fail, but because the environment works against it structurally. These four conditions are why enforcement matters.

Multi-Vendor, Multi-Domain Complexity

Every integration surface is a potential source of undocumented scope discovery during build. Network provisioning touches IPAM, CMDB, NMS, and multiple device vendors. Cloud provisioning touches AWS/Azure/GCP APIs, IAM, FinOps tagging, and CMDB simultaneously. Each surface has its own API behavior and compatibility constraints. Gate enforcement across all of it requires a system of record – not a document someone emailed.

High Blast Radius – in Both Domains

In application development, an undocumented design decision produces a UI that needs to be redesigned. In network automation, it produces a routing change that takes a service down. In infrastructure automation, it produces a server fleet patched to the wrong standard or a cloud environment with IAM permissions that exceed what anyone approved. The discipline cost of SDD is the price of not having an incident.

Volume & Team Scale

A team running five automation engagements per quarter can manage SDD through shared discipline. A team running fifty – across network engineers, infrastructure engineers, cloud ops, and multiple project managers – cannot. Gate approvals that rely on verbal sign-offs fail when the approver changes. As-built records that rely on individual initiative fail when team composition changes. Both happen constantly at scale.

Delivery Pressure & AI Acceleration

Tight timelines are the operating condition, not the exception – and delivery pressure is the most reliable predictor of gate bypass. AI amplifies every one of these pressures across both domains: it compresses timelines, makes it trivial to generate spec-shaped artifacts nobody enforces, and turns “is the as-built accurate?” into the entire governance question.

The operating model is right. The environment demands a system to hold it.

These aren’t failures of SDD – they are the operating conditions that make a governed execution platform necessary. The framework defines what should happen at every phase. The platform is what makes it happen consistently, regardless of who’s on the project, how much pressure the deadline carries, or how many engagements are running in parallel. That is the difference between SDD as a practice and SDD as a system.

Where Manual Compliance Degrades

What Happens Without Enforcement: Five Specific Failure Modes

With the operating conditions established – complexity, blast radius, volume, delivery pressure – here is exactly how manual SDD compliance degrades in practice, and what that produces.

🚧

Gate enforcement degrades under pressure

Without a system that enforces the gate as a precondition for the next phase – that refuses to authorize build until the design is formally approved – the gate is optional under pressure. When the project is running late, Gate 2 becomes a conversation instead of a decision. The gate exists in the process document. It doesn’t exist in practice. And it will be bypassed, consistently, across team changes and project cycles.

📄

As-built records don’t get written

Without a platform that produces the as-built record as a required phase output, it gets deferred, abbreviated, or skipped. The artifact exists in theory. What runs in production diverges from it quietly, engagement by engagement. Over time, the gap between what was approved and what actually runs becomes the source of every Day 2 archaeology problem – forcing every extension to start from the workflow code instead of a reconciled baseline.

🔀

Deviation tracking disappears

During build, engineers encounter conditions the approved design didn’t anticipate. Without a platform requiring deviation documentation as a condition of build completion, those conditions get resolved through engineering judgment – absorbed into the implementation without a record. The workaround works. Nobody records why. The next engineer inherits it with no context. Deviations compound into future archaeology.
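Requiring deviation documentation as a condition of build completion could look something like the sketch below, where a build cannot be closed while any executed step falls outside the approved design without a recorded reason. The `Build` class, its methods, and the step names are hypothetical.

```python
class BuildIncomplete(Exception):
    """Raised when a build contains undocumented deviations."""


class Build:
    def __init__(self, approved_steps):
        self.approved_steps = set(approved_steps)
        self.executed_steps = []
        self.deviations = {}  # step -> recorded reason

    def execute(self, step, deviation_reason=None):
        # Steps outside the design are allowed, but only count as
        # documented if a reason is supplied.
        self.executed_steps.append(step)
        if deviation_reason:
            self.deviations[step] = deviation_reason

    def complete(self):
        unexplained = [
            s for s in self.executed_steps
            if s not in self.approved_steps and s not in self.deviations
        ]
        if unexplained:
            raise BuildIncomplete(f"undocumented deviations: {unexplained}")
        return {"executed": list(self.executed_steps),
                "deviations": dict(self.deviations)}
```

The workaround is still permitted; what changes is that the reason travels with it into the as-built record instead of staying in one engineer's head.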

♻️

Reuse identification fails at volume

At scale – hundreds of automation assets across multiple teams and domains – manual reuse assessment breaks down. Engineers build what they know rather than discover what exists. The automation estate accumulates redundant assets that fragment rather than compound. Each engagement starts closer to scratch than it should. The estate grows in size but not in trustworthiness.
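Reuse assessment becomes mechanical once assets are catalogued with the capabilities they provide. The following is a deliberately small sketch; `find_reuse_candidates`, the catalog structure, and the asset names are hypothetical, and a real inventory would carry far richer metadata.

```python
def find_reuse_candidates(required_capabilities, catalog):
    """Map each required capability to catalogued assets that already provide it.

    `catalog` maps an asset name to the set of capabilities it provides.
    An empty list for a capability means nothing reusable was found.
    """
    candidates = {}
    for capability in required_capabilities:
        candidates[capability] = sorted(
            name for name, provides in catalog.items() if capability in provides
        )
    return candidates


# Illustrative catalog entries only.
catalog = {
    "ipam-allocate-prefix": {"ipam"},
    "ipam-release-prefix": {"ipam"},
    "cmdb-upsert-ci": {"cmdb"},
}
candidates = find_reuse_candidates(["ipam", "nms"], catalog)
```

In this example the `ipam` requirement has two existing candidates and `nms` has none, so only the NMS integration would need to be built from scratch.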

🔍

Compliance claims become unverifiable

“We followed SDD” is an assertion. “Here is the Gate 1 approval record, the Gate 2 decision, the deviation log, and the reconciled as-built artifact” is evidence. Without a platform producing that artifact chain automatically – as a byproduct of enforcing the operating model – compliance claims rest on individual memory and document management discipline. Both degrade over time and across team changes.
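One way an artifact chain can be made tamper-evident is to hash each record together with the hash of the record before it. Hash chaining is an assumption introduced here for illustration, not something the guide attributes to any platform; `append_artifact`, `verify_chain`, and the record fields are hypothetical.

```python
import hashlib
import json


def _digest(kind, payload, prev_hash):
    # Canonical JSON (sorted keys) so the same content always hashes the same.
    body = json.dumps({"kind": kind, "payload": payload, "prev": prev_hash},
                      sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_artifact(chain, kind, payload):
    """Append a record whose hash covers its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else ""
    entry = {"kind": kind, "payload": payload, "prev": prev_hash,
             "hash": _digest(kind, payload, prev_hash)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Return True only if every record still matches its stored hash and link."""
    prev_hash = ""
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["kind"], entry["payload"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain built from Gate 1, Gate 2, deviation, and as-built records verifies as long as none has been edited afterward, which turns "we followed SDD" into something that can be checked.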

The Platform Argument

What a Governed Execution Platform Enforces & What DIY Cannot

A functional description of what enforcement produces at each phase – what a governed, deterministic execution platform does that manual discipline cannot reliably do at scale across either domain.

SDD Phase | DIY SDD – What Degrades | Platform-Enforced – What Holds

Requirements (Gate 1) | Gate 1 is a document sign-off or verbal approval. Feasibility begins before the spec is stable under time pressure. | Gate 1 is a system boundary. No environment access, no API calls, no discovery until Gate 1 is formally recorded in the system.

Feasibility | Reuse identification is manual and incomplete at volume. Feasibility findings aren’t connected to the spec that authorized them. | Platform-assisted asset inventory surfaces reuse candidates. Feasibility is scoped by the approved spec as a system constraint.

Design (Gate 2) | Gate 2 is a design review meeting. Build begins when the project manager says proceed. Design and build are not formally connected. | Gate 2 is a system boundary. The approved design becomes the execution contract. Build cannot begin until Gate 2 is recorded.

Build | Deviations are resolved through engineering judgment without documentation. The approved design and the implementation quietly diverge. | Deviations are surfaced as documented conditions. Build executes against the approved design as the platform’s authoritative reference.

As-Built | As-built documentation is written after delivery when time allows – abbreviated, delayed, or skipped. Reflects memory, not reality. | Required phase output – what was built, what deviated, what the production state is – generated as the platform closes the engagement.

AI & Agentic | AI artifacts look governed but aren’t enforced by anything. Agents define their own scope. As-built records disconnected from execution. | Agents operate within platform-enforced boundaries. The spec is the agent’s authorization limit. Every engagement leaves an auditable artifact chain.

The table above is the difference between SDD as a practice and SDD as a governed system. Guide 3 covers how Itential’s platform enforces every row in this table – for human execution, AI-assisted execution, and agentic operations at scale.

The Platform Requirement

SDD Requires a Platform to Hold at Scale

Every section of this guide makes the same structural argument: SDD works when it’s enforced. Requirements approval, design review, as-built reconciliation – each one produces the outcomes described above when it happens consistently. None of them happen consistently without a platform that makes them preconditions rather than conventions.

The platform requirement is not a product argument. It is a logical conclusion. The five failure modes in the previous section – gate bypass under pressure, as-built records skipped at delivery, deviations absorbed silently, reuse missed at volume, compliance claims that can’t be verified – are all structural consequences of treating the operating model as a team discipline rather than a system boundary. Discipline degrades. System boundaries don’t.

The Itential Platform

Itential is the agentic operations platform for network and infrastructure automation. It is what makes SDD executable at scale – not as a practice teams try to maintain, but as a governed system that holds regardless of team size, delivery pressure, or how much AI acceleration you bring to the process. The platform has three capabilities that make this possible.

Every Integration Surface as a Governed API Skill

Every system your automation touches – Cisco, Juniper, AWS, Azure, ServiceNow, Ansible, Terraform, IPAM, CMDB, and hundreds more – is exposed as a governed API skill. Not open API access engineers manage themselves. Skills that execute against an approved spec, within defined boundaries, with every action traceable.

Deterministic Execution Against the Approved Design

Workflows, lifecycle models, and compliance automation execute against the approved design as a locked execution contract. Gate 1 is a system boundary – no environment access until the spec is approved. Gate 2 is the contract build answers to. The as-built record is a required output, not something someone writes afterward.

Governed Execution for Human & Agentic Operations

The same governed stack that holds for human-executed automation holds for AI agents running at machine speed. The spec is the agent’s authorization boundary. The approved design is the execution contract. Every engagement – human or agentic – produces an auditable artifact chain. Guide 3 covers how agents operate inside this model.

The operating model is the same whether a human or an agent executes it. The gates are system boundaries either way. The approved design is the execution contract either way. The as-built record is a required output either way. Guide 3 covers how Itential’s agents execute inside that same governed model – and how the trust progression from AI-assisted to autonomous operations works in practice.
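The idea that the spec is the agent's authorization boundary can be sketched as a check that runs before every agent action and logs the decision either way. This is an illustration under stated assumptions, not a description of how any product implements it: `AgentScope`, `run_agent_step`, and the action and target names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical authorization limit derived from an approved spec."""
    spec_id: str
    allowed_actions: frozenset
    allowed_targets: frozenset


def authorize(scope, action, target):
    return action in scope.allowed_actions and target in scope.allowed_targets


def run_agent_step(scope, action, target, audit_log):
    """Log the attempt, then either perform the step or refuse it."""
    allowed = authorize(scope, action, target)
    audit_log.append({"spec": scope.spec_id, "action": action,
                      "target": target, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{action} on {target} is outside spec {scope.spec_id}")
    # Stand-in for real execution; returns a description of what would run.
    return f"{action} on {target}"


scope = AgentScope(
    spec_id="SPEC-42",  # illustrative id
    allowed_actions=frozenset({"read_config", "push_vlan"}),
    allowed_targets=frozenset({"leaf-1", "leaf-2"}),
)
```

Denied attempts are logged as well as permitted ones, so the audit trail shows what an agent tried to do, not only what it was allowed to do.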

Continue the Series

Guide 01: What Is Spec-Driven Development?

Guide 02: SDD for Network Automation (you are here)

Guide 03: Agentic SDD & Autonomous Network Operations with Itential
