AI Architecture Decision Record (ADR) Template for Product Teams

Why ADRs Matter for AI Teams

Architecture Decision Records originated in software engineering as a way to document significant technical decisions — not the code itself, but the reasoning behind the code. For AI teams, the concept is even more critical. AI architecture decisions involve more uncertainty, more trade-offs, and more future regret than typical software decisions.

Consider a few decisions that haunt AI PMs who didn't document them:

“We chose RAG over fine-tuning”

Six months later, nobody remembers why. A new engineer proposes fine-tuning. The team debates the original decision from scratch — and can't reconstruct whether latency, cost, or data availability drove the original call.

“We chose GPT-4o over Claude for the core feature”

The team switches when a Claude promotion is offered. Two features break because they depended on specific GPT-4o output formatting quirks. The original model selection rationale was in a Slack thread that's now gone.

“We hardcoded the system prompt at v1”

A new PM wants to use prompt versioning but doesn't understand the original constraints. The ADR would have captured 'we chose simplicity over versioning because our prompt changes are infrequent' — making the trade-off explicit and reversible.

“We chose not to use structured outputs”

The codebase has regex parsing scattered everywhere. Nobody knows if this was intentional. An ADR would have documented whether this was a technical decision or just expedient first-version behavior.

An ADR doesn't prevent mistakes. It prevents the same decision from being relitigated without new information, and it captures the constraints that were true at the time — so future teams can judge whether those constraints still apply.

The AI ADR Template

Copy this template into your wiki, Notion, Confluence, or GitHub docs. One file per decision. Keep them short — a good ADR is 200–500 words, not a white paper. The value is in having them, not in their comprehensiveness.

# ADR-[number]: [Short title — what was decided, not the problem]

## Status

[ ] Proposed [ ] Accepted [ ] Deprecated [ ] Superseded by ADR-[N]

## Date

YYYY-MM-DD

## Context

What is the situation that forced this decision? Include constraints that were true at the time: data availability, budget, team expertise, latency requirements, timeline. Keep it factual — this is not the place for advocacy.

## Decision

What was decided? State it clearly in one or two sentences. Be specific — not "we chose to use RAG" but "we chose retrieval-augmented generation using OpenAI text-embedding-3-large with a Pinecone vector store, with a 5-chunk retrieval window and 0.75 similarity threshold."

## Alternatives Considered

List each alternative evaluated and why it was rejected. This is the highest-value section — it prevents future engineers from re-proposing rejected options without understanding why they were rejected.

## Consequences

What becomes easier because of this decision? What becomes harder? What new risks does this introduce? What follow-on decisions does this create? Be honest about both sides — an ADR that only lists benefits is not trustworthy.

## Review Trigger

What condition should prompt revisiting this decision? (e.g., "If inference cost exceeds $X/month" or "When Claude adds native structured output support" or "At 6 months post-launch")

Field-by-Field Breakdown

Each field in the ADR serves a specific purpose. Skipping fields reduces the template's value disproportionately — especially the ones that feel obvious in the moment.

Status

Tells readers whether this decision is still in force. 'Proposed' means it's under discussion. 'Accepted' means it's been ratified. 'Deprecated' means it was intentionally retired. 'Superseded' means a newer ADR replaced it — always link to the superseding ADR.

Tip: Move ADRs to Deprecated when the architecture changes. Never delete ADRs — the history of what you tried and why you changed course is more valuable than the current state alone.

Context

The constraints that existed when the decision was made. This is the section that becomes invaluable two years later. 'We didn't have labeled data for fine-tuning' or 'our latency budget was 200ms at the time' explains decisions that look strange in hindsight.

Tip: Include the date of the decision in the context — AI model capabilities change fast. 'Claude 3 didn't support structured outputs natively' is a context that expires, and future readers need to know it was true when the decision was made.

Decision

The specific choice made, with enough specificity to be reconstructed. Not 'we use RAG' but the specific embedding model, vector store, retrieval window, threshold, and prompt structure. This specificity is what allows the decision to be accurately evaluated later.

Tip: If you find it hard to be specific, the decision isn't made yet. Specificity forces clarity.

Alternatives Considered

The highest-ROI section. Every alternative you evaluated, with the specific reason it was rejected. This prevents future teams from re-evaluating alternatives without new information — they can read why it was rejected and assess whether that rejection reason still holds.

Tip: Write at least three alternatives for any significant decision. If you only evaluated one alternative, you probably haven't explored the space sufficiently.

Consequences

The honest ledger of what this decision costs and what it enables. The costs section is usually harder to write — write it anyway. An ADR that only lists benefits signals that the author didn't think carefully about trade-offs.

Tip: Include both immediate consequences (what's harder to build now) and deferred consequences (what might become a problem at scale, or when models change).

Review Trigger

The condition that should prompt revisiting the decision. This is unique to AI ADRs — the landscape changes fast enough that decisions made in good faith in 2025 may be wrong by 2026. Explicitly stating the review condition prevents decisions from becoming immortal by default.

Tip: Common triggers: cost threshold exceeded, model provider deprecation, new capability available, quality metric falling below threshold, team size change that changes the expertise constraint.

Ship AI Features With More Confidence

The AI PM Masterclass covers the documentation, evaluation, and decision frameworks that let AI teams move fast without creating future messes — taught live by a Salesforce Sr. Director PM.

Worked Example: RAG vs. Fine-Tuning Decision

Here's a complete ADR for one of the most common AI architecture decisions — whether to use retrieval-augmented generation or fine-tuning for a document Q&A feature.

ADR-004: Use RAG for Customer Documentation Q&A Feature

Status: Accepted · Date: 2026-03-15

Context

We need to launch a Q&A feature over our 2,000-page customer documentation in Q2 2026. The documentation is updated monthly. We have no labeled question-answer pairs. Our latency target is under 3 seconds. We have one ML engineer available for 2 weeks. Budget for model fine-tuning is not approved for this quarter.

Decision

Use retrieval-augmented generation: text-embedding-3-large for embeddings, Pinecone serverless for vector storage, 5-chunk retrieval window, 0.70 cosine similarity threshold, GPT-4o for generation with a structured prompt that references retrieved chunks. Embeddings re-indexed nightly via a scheduled job.

Alternatives Considered

Fine-tuning Claude Haiku on documentation: Rejected. Requires labeled Q&A pairs we don't have. Documentation updates would require retraining — ongoing maintenance cost not budgeted. Revisit if we accumulate 500+ labeled Q&A pairs from production logs.
Full-context approach (stuffing entire docs into context): Rejected. 2,000 pages exceeds even 1M context windows economically. Input token cost at query volume exceeds budget by 20x.
Traditional keyword search with LLM answer synthesis: Rejected. Documentation uses inconsistent terminology; keyword matching quality was poor in testing (precision 0.41 vs. 0.78 for embedding retrieval on 50-question internal test set).

Consequences

Easier: Fast to ship (2-week build). Documentation updates propagate automatically via nightly re-indexing. No labeled training data required.
Harder: Answer quality depends on retrieval quality — bad retrieval produces bad answers. Multi-hop questions (requiring synthesizing across 5+ sections) will fail. Adding a new documentation source requires re-indexing strategy.
New risk: Retrieval failures are silent — the model generates an answer even when retrieved chunks are irrelevant. Must monitor low-confidence retrievals and add "I couldn't find this in our documentation" fallback.

Review Trigger

Revisit at 3 months post-launch if: (a) answer quality score falls below 4.0/5.0 in user surveys, (b) inference cost exceeds $2,000/month, or (c) we accumulate 500+ labeled Q&A pairs sufficient for fine-tuning evaluation.

When to Write an ADR (and When Not To)

ADRs add overhead. The goal is to write them for decisions where the context is hard to reconstruct and the cost of getting it wrong — or relitigating it — is high. Not every decision needs one.

Write an ADR when...

›Choosing between two viable AI architecture approaches
›Selecting a model provider or specific model for production
›Deciding on fine-tuning vs. prompting vs. RAG for a feature
›Setting evaluation thresholds that determine production readiness
›Choosing a data pipeline architecture for training or retrieval
›Making a trade-off between quality and cost that affects product positioning

Skip the ADR when...

›The decision is reversible in under a day with no user impact
›You're prototyping — write the ADR when you're ready to commit
›The decision is implementation-level (e.g., which library to use for a utility)
›The team is aligned and the decision is obvious given the constraints
›You're moving fast in a pre-launch sprint — batch ADRs weekly, not per-commit
›The decision has no meaningful alternatives (only one viable option exists)

A good heuristic: if you would want to know this decision existed when joining this team fresh, write the ADR. If you'd figure it out immediately from the codebase or a 5-minute conversation, skip it.

ADR Governance: Making It a Team Habit

The most common failure mode for ADRs is writing two, losing momentum, and abandoning the practice. The governance model needs to be lightweight enough that writing an ADR isn't a significant overhead — because it isn't one. A well-scoped ADR takes 20 minutes to write and saves hours of future debate.

Store ADRs in the repo, not a separate wiki

ADRs in the codebase repo are versioned with the code, searchable, and won't get lost in wiki sprawl. Use a top-level /decisions or /adrs directory. One Markdown file per decision, named ADR-000-short-title.md.

Number ADRs sequentially, never reuse numbers

ADR-004 is always ADR-004, even if it's deprecated. Future ADRs reference past ones by number. Reusing numbers destroys the reference graph.

Review ADRs in sprint planning, not as a separate process

Add 10 minutes to sprint planning to review any ADRs whose review trigger has been met. This keeps the process alive without creating a separate meeting.

PM owns the ADR log, ML engineer co-authors

The PM is responsible for ensuring ADRs exist for significant decisions and that they capture the business context correctly. The ML engineer co-authors the technical alternatives section. Neither owns it alone.

Link ADRs from PRD sections and Jira tickets

When a PRD section describes an architecture approach, link to the relevant ADR. When a Jira ticket implements something covered by an ADR, link back. Cross-referencing is what makes the ADR log discoverable in practice.

When the landscape changes, update the status — don't edit the body

If Anthropic releases a capability that makes your 2025 decision wrong, mark the ADR as Deprecated and write ADR-N+1 that supersedes it. Don't edit the original — the historical record of why the old decision was made is valuable.