Spec-Driven Development for Product Managers: Beyond Vibe Coding

Why Vibe Coding Has a Ceiling

Andrej Karpathy coined "vibe coding" in February 2025: accept all LLM suggestions, move fast, iterate by feel. For a solo developer building a weekend project or a PM prototyping a concept in v0, it was a genuine unlock. But by early 2026, Karpathy himself acknowledged that the vibe coding era was ending for serious engineering work.

The failure mode is predictable. AI coding agents that work from natural language prompts without a formal specification tend to produce code that works in isolation but drifts from architectural intent. As the codebase grows, the agent misses edge cases, hallucinates APIs, and generates code that conflicts with existing patterns it cannot fully hold in context. The result is a codebase that looks right in individual files and breaks at integration.

Works well with vibe coding

• Early-stage prototypes and demos
• Single-file scripts and utilities
• Solo developer projects under ~500 lines
• Throwaway proof-of-concept code

Breaks down with vibe coding

• Multi-file features with shared state
• APIs consumed by other services
• Production features with SLA requirements
• Code touched by more than one developer or agent

GitHub reports that teams using spec-driven development (SDD) on internal projects ship features with roughly an order of magnitude fewer "regenerate from scratch" cycles than teams using ad hoc prompting. Early adopters at AWS report 3 to 10 times higher first-pass success rates on non-trivial tasks. The methodology works because it gives agents the context they need to stay aligned across a multi-file, multi-session implementation.

What Spec-Driven Development Actually Is

Spec-driven development is a software methodology where the executable specification, not the code, is the source of truth. The team writes a detailed spec describing what the system should do. An AI coding agent (or a human engineer) then derives an implementation plan, breaks it into atomic tasks, and generates code against that plan. The spec persists between sessions and anchors the agent whenever work resumes.

Write the spec

A structured document that describes the feature: purpose, user stories, data model, edge cases, acceptance criteria, security constraints. Typically 1 to 3 pages for a well-scoped feature.

Agent generates implementation plan

The AI coding agent reads the spec and proposes a breakdown of files to create or modify, dependencies to install, and the order of implementation. PM and engineer review the plan for architectural alignment.

Agent implements task by task

The agent works through the implementation plan atomically. Each task is a small, verifiable unit of work. The spec is referenced at each step, not just at the start.

Spec-anchored review

Code review validates against the spec, not just against the code itself. Reviewers check that the implementation satisfies the acceptance criteria written in the spec before the feature is merged.

The key insight is that the spec file persists between sessions. An AI coding agent that picks up a feature mid-implementation can read the spec to recover context that would otherwise require re-prompting or produce drift. This is especially important on features that take more than a single session to build.

Anatomy of an Effective Spec

A spec that guides an AI coding agent is different from a traditional product requirements document (PRD). PRDs are written for humans, who can tolerate ambiguity, fill in gaps from context, and ask questions. AI agents interpret the spec literally and fill gaps with their own assumptions, which may be wrong. An effective spec for AI implementation is more precise than a PRD but narrower in scope.

Purpose and context (1 paragraph)

What problem does this feature solve? What is the user-facing behavior? One paragraph. This is what the agent reads first to orient itself.

Example:

Add a retry mechanism to the AI summarization endpoint so that transient provider errors do not surface to the user. On first failure, retry once after 500ms with the same request. On second failure, fall back to a cached summary if one exists within 24 hours. If no cache is available, return an error with a user-visible message.
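The behavior in that paragraph is precise enough to sketch directly. The following is a minimal illustration, not a reference implementation: the names `summarize_with_fallback`, `TransientProviderError`, `SummaryUnavailableError`, and `CachedSummary` are hypothetical, and the provider call and cache lookup are passed in as plain callables so the logic can be read without any real service.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

RETRY_DELAY_SECONDS = 0.5           # "retry once after 500ms"
CACHE_MAX_AGE_SECONDS = 24 * 3600   # "cached summary ... within 24 hours"


class TransientProviderError(Exception):
    """Hypothetical error raised for transient provider failures."""


class SummaryUnavailableError(Exception):
    """Hypothetical error carrying the user-visible message."""


@dataclass
class CachedSummary:
    text: str
    created_at: float  # Unix timestamp, in seconds


def summarize_with_fallback(
    call_provider: Callable[[], str],
    get_cached: Callable[[], Optional[CachedSummary]],
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.time,
) -> str:
    # First attempt.
    try:
        return call_provider()
    except TransientProviderError:
        sleep(RETRY_DELAY_SECONDS)

    # Exactly one retry with the same request.
    try:
        return call_provider()
    except TransientProviderError:
        pass

    # Second failure: fall back to a cached summary if it is fresh enough.
    cached = get_cached()
    if cached is not None and now() - cached.created_at <= CACHE_MAX_AGE_SECONDS:
        return cached.text

    # No usable cache: surface an error with a user-visible message.
    raise SummaryUnavailableError(
        "The summary is temporarily unavailable. Please try again later."
    )
```

Injecting `sleep` and `now` is a design choice for this sketch: it lets a reviewer check the 500ms delay and the 24-hour window without waiting on a real clock.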

User stories with explicit acceptance criteria

Each user story should have a 'given / when / then' structure. Acceptance criteria should be verifiable, not aspirational. 'The response should feel fast' is not a criterion. 'The P95 response time on the /summarize endpoint should be under 800ms' is.

Example:

Given: a user submits a document for summarization. When: the AI provider returns a 429 or 503 error. Then: the system retries exactly once after 500ms. The user sees no error during the retry window.
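A verifiable criterion can usually be turned into an automated check. As a minimal sketch, the P95 latency criterion quoted above could be evaluated against recorded response times like this. The function names and the nearest-rank percentile method are assumptions for illustration; the spec itself would need to say how P95 is measured.

```python
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """Return the 95th percentile using the nearest-rank method (an assumption)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, giving a 1-based rank without float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_criterion(
    latencies_ms: Sequence[float], threshold_ms: float = 800.0
) -> bool:
    """True when P95 is strictly under the threshold, as the criterion states."""
    return p95(latencies_ms) < threshold_ms
```

By contrast, "the response should feel fast" offers nothing a function like this could evaluate, which is why it fails as a criterion.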

Data model and interfaces

Define any new data structures, database schema changes, or API interface changes. Agents generate inconsistent interfaces when these are left implicit. Be explicit about field types, optional vs. required, and nullability.

Example:

Add a summary_cache table with columns: id (uuid), document_hash (sha256), summary_text (text), created_at (timestamp). Index on (document_hash, created_at). TTL enforced by the application layer, not the database.
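To show how little room an explicit data model leaves for interpretation, here is a sketch of that table using Python's built-in `sqlite3` module. SQLite is an assumption made only so the example runs anywhere; the spec does not name a database. Storing the uuid and timestamp as text, the `NOT NULL` constraints, the index name, and the helper names are likewise illustrative choices.

```python
import hashlib
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE summary_cache (
    id            TEXT PRIMARY KEY,  -- uuid, stored as text in SQLite
    document_hash TEXT NOT NULL,     -- sha256 hex digest (64 characters)
    summary_text  TEXT NOT NULL,
    created_at    TEXT NOT NULL      -- ISO 8601 UTC timestamp
);
CREATE INDEX idx_summary_cache_hash_created
    ON summary_cache (document_hash, created_at);
"""


def create_cache(conn: sqlite3.Connection) -> None:
    """Create the table and the (document_hash, created_at) index."""
    conn.executescript(SCHEMA)


def insert_summary(conn: sqlite3.Connection, document: str, summary: str) -> str:
    """Insert one cached summary and return the document hash used as the key."""
    doc_hash = hashlib.sha256(document.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO summary_cache (id, document_hash, summary_text, created_at) "
        "VALUES (?, ?, ?, ?)",
        (str(uuid.uuid4()), doc_hash, summary, datetime.now(timezone.utc).isoformat()),
    )
    return doc_hash
```

Note that nothing in the schema expires rows, which matches the spec's statement that the TTL is enforced by the application layer.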

Edge cases and failure modes

Most PRDs skip this section, and its absence causes more agent-generated bugs than any other gap. Enumerate the cases that are not the happy path: what happens if the input is empty? If the model returns a malformed response? If the cache is cold?

Example:

If the document exceeds 100,000 tokens, return an error before calling the AI provider. Do not attempt chunking. If the cache lookup takes more than 50ms (e.g., database contention), skip the cache and return the error directly.
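Both rules in that example translate into small guard functions. The sketch below is one possible reading, with hypothetical names throughout (`check_document_size`, `lookup_cache_with_deadline`, `DocumentTooLargeError`); it assumes the token count is computed elsewhere and that the cache lookup is a blocking call that can be run on a worker thread.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional

MAX_DOCUMENT_TOKENS = 100_000
CACHE_LOOKUP_TIMEOUT_SECONDS = 0.05  # 50ms


class DocumentTooLargeError(Exception):
    """Hypothetical error returned before the AI provider is ever called."""


def check_document_size(token_count: int) -> None:
    # "Exceeds 100,000 tokens" is read as strictly greater than the limit.
    if token_count > MAX_DOCUMENT_TOKENS:
        raise DocumentTooLargeError(
            f"Document has {token_count} tokens; the limit is {MAX_DOCUMENT_TOKENS}."
        )


def lookup_cache_with_deadline(lookup: Callable[[], Optional[str]]) -> Optional[str]:
    """Return the cached summary, or None on a miss or a lookup slower than 50ms."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(lookup)
    try:
        return future.result(timeout=CACHE_LOOKUP_TIMEOUT_SECONDS)
    except FutureTimeout:
        return None  # Skip the cache; the caller returns the error directly.
    finally:
        # Do not block on a slow lookup; let the worker finish in the background.
        executor.shutdown(wait=False)
```

Treating a timeout the same as a cache miss keeps the caller simple, since the spec asks for the same outcome in both cases.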

Out of scope (explicit)

This is one of the highest-ROI sections of a spec for AI agents. Agents will implement related functionality unless explicitly told not to. Listing out-of-scope items prevents scope creep in the implementation.

Example:

Out of scope: streaming responses, multi-document summarization, user-facing cache controls, rate limiting (handled by API gateway layer).

The PM's Role in a Spec-Driven Team

In a traditional engineering team, the PM writes a PRD and engineers translate it into technical requirements. In a spec-driven team, the PM's leverage point shifts: the spec is closer to executable than a PRD, and the quality of the spec directly determines the quality of what the agent builds.

PMs who write precise specs find that they reduce the back-and-forth with engineers by 50 to 70%. The ambiguity that used to live in "requirements refinement meetings" gets resolved at spec-writing time instead. This is a significant shift in where PM time goes.

Write the feature spec (PM-owned)

The PM is the primary author of the spec. Engineers may add implementation notes or flag architectural constraints, but the user stories, acceptance criteria, and out-of-scope declarations belong to the PM. This is where PM judgment is most valuable and least replaceable.

Review the implementation plan (PM-involved)

After the agent generates an implementation plan, the PM should review it for alignment with product intent. Look for tasks that implement functionality not in the spec, or that skip acceptance criteria. Catch these before a line of code is written.
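Part of this review can be made mechanical if each planned task cites the acceptance criteria it serves. The sketch below assumes that convention, along with hypothetical criterion IDs such as "AC-1"; neither is prescribed by any particular tool.

```python
from typing import Dict, List, Set, Tuple


def review_plan(
    criteria_ids: Set[str], plan: Dict[str, List[str]]
) -> Tuple[Set[str], List[str]]:
    """Compare a plan against the spec's acceptance criteria.

    `plan` maps each task name to the criterion IDs it claims to satisfy.
    Returns (criteria that no task covers, tasks that cite no known criterion).
    """
    covered = {
        cid for cited in plan.values() for cid in cited if cid in criteria_ids
    }
    uncovered = criteria_ids - covered
    unanchored = [
        task
        for task, cited in plan.items()
        if not any(cid in criteria_ids for cid in cited)
    ]
    return uncovered, unanchored


uncovered, unanchored = review_plan(
    {"AC-1", "AC-2"},
    {"Add retry wrapper": ["AC-1"], "Add streaming responses": []},
)
# uncovered holds "AC-2" (a skipped criterion);
# unanchored holds "Add streaming responses" (work that is not in the spec).
```

A check like this only flags candidates. Deciding whether an unanchored task is scope creep or legitimate groundwork still takes PM and engineering judgment.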

Spec-anchored acceptance testing (PM-owned)

When the implementation is complete, PM acceptance is validated against the spec, not against the code. Does the feature match every acceptance criterion? Does it respect the out-of-scope constraints? This is a more rigorous and faster QA loop than traditional manual testing.

Spec maintenance as product changes (PM-owned)

When the feature needs to change, the PM updates the spec first. The agent implements against the updated spec. Skipping this step and prompting the agent to make changes directly is how spec drift introduces bugs.

SDD Tools in 2026

By mid-2026, every major AI coding tool has shipped a spec-driven workflow. The implementations vary in how tightly they integrate the spec into the agent context, but the core pattern is consistent: write a spec, agent reads it, agent works task by task.

AWS Kiro

May 2026

Spec-first IDE built on Code OSS (VS Code base). Kiro generates a spec from a high-level requirement, gets PM approval, then implements task by task. A drug discovery team built a production-ready agent in three weeks with three developers, with Kiro generating over 95% of the business logic.

GitHub Spec Kit

Early 2026

Integrates spec workflows directly into GitHub pull requests. The spec lives in the repo as a markdown file. Copilot reads the spec file as authoritative context when implementing or reviewing changes. Spec Kit teams at GitHub report an order of magnitude fewer 'regenerate from scratch' cycles.

Claude Code

2025 (SDD support 2026)

Supports CLAUDE.md and project spec files as persistent context. The spec file is referenced across sessions. Works with any IDE and any repo structure. Particularly strong at following a multi-file spec across long implementation sessions without drift.

Cursor with specs

2024 (SDD support 2026)

Cursor rules plus spec files function as a light SDD layer. Teams typically keep spec files in a dedicated directory (for example, .cursor/spec) that the agent reads during code generation. Less formalized than Kiro or Spec Kit, but effective for teams already on Cursor.

When SDD Is Worth the Overhead

Spec-driven development has overhead: writing a good spec takes one to three hours for a well-scoped feature. Not every change justifies that investment. Here is the decision framework for when SDD pays for itself versus when it is engineering theater.

Use SDD when:

• The feature touches more than two files
• The implementation will span more than one coding session
• The feature has explicit acceptance criteria that can be verified
• Multiple engineers or agents will work on the feature
• The feature has security, compliance, or data model implications

Skip SDD when:

• You are building a throwaway prototype to validate an idea
• The change is a single-file fix with no cross-cutting concerns
• You are in early discovery and the requirements will change substantially before implementation
• You are a solo developer building a weekend project
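The two lists above can be read as a simple decision function. The sketch below is one interpretation, not a rule from any tool: it assumes the skip conditions take precedence and that any single "use" signal is enough, which the lists themselves do not specify. All field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Change:
    files_touched: int
    coding_sessions: int
    contributors: int  # engineers or agents working on the feature
    has_verifiable_acceptance_criteria: bool
    has_security_compliance_or_data_model_impact: bool
    is_throwaway_prototype: bool = False
    requirements_still_in_discovery: bool = False


def should_use_sdd(change: Change) -> bool:
    # Assumption: skip conditions win over everything else.
    if change.is_throwaway_prototype or change.requirements_still_in_discovery:
        return False
    # Assumption: any one signal from the "use SDD" list justifies a spec.
    return any(
        [
            change.files_touched > 2,
            change.coding_sessions > 1,
            change.contributors > 1,
            change.has_verifiable_acceptance_criteria,
            change.has_security_compliance_or_data_model_impact,
        ]
    )
```

A single-file fix by one developer in one session returns False here, while a three-file feature with a schema change returns True.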

The underlying principle

The cost of writing a spec is fixed. The cost of implementing from an ambiguous prompt compounds as the codebase grows. On any feature you expect to maintain, extend, or hand off, the spec pays for itself within the first revision cycle. Teams not using SDD in 2026 are taking on technical debt faster than they realize, because agent-generated code that looked right at implementation time carries a growing number of undocumented assumptions.
