AI PM TEMPLATES

AI Model Card Template: Document Your AI Models for Stakeholders and Compliance

By Institute of AI PM · 11 min read · Apr 18, 2026

TL;DR

Model cards document what an AI model does, how it performs, where it fails, and which uses it is appropriate for. They are increasingly required by enterprise customers, regulated industries, and AI governance frameworks — and are a best practice for any team that deploys AI at scale. This template gives AI PMs a complete model card structure they can adapt for any AI feature.

Model Overview Section

The model overview answers the basic questions any stakeholder needs before reading further. Fill in every field — blank fields signal gaps in documentation or governance.

1. Model name and version

Unique identifier including version number. Example: 'Customer Support Classifier v2.3'. Version numbers should increment on every material change to the model or prompt. If you don't know your model version, you don't have versioning — fix that first.

2. Model owner and review date

Who is responsible for this model's performance and governance? Who must be notified when it changes or fails? Include team name, primary contact, and the date this model card was last reviewed. Model cards that are never reviewed become stale documentation that creates false confidence.

3. Model type and architecture

Foundation model used (Claude 3.5 Sonnet, GPT-4o, Gemini 1.5 Pro), deployment type (API, fine-tuned, RAG-augmented), and any custom components. If you are using a proprietary third-party model, document which version and your data handling agreement with the provider.

4. Intended use cases

Explicit list of the use cases this model is designed for and validated on. Being specific here is important — 'customer support' is too broad. 'Classifying inbound support tickets by product area and urgency for routing to the correct queue' is appropriately specific.
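
The overview is easier to keep current if it lives as structured data next to the model config rather than in a document no one versions. A minimal sketch in Python (field names and values are illustrative, not a standard schema):

from dataclasses import dataclass, field

@dataclass
class ModelOverview:
    name: str                 # unique identifier
    version: str              # increments on every material change to model or prompt
    owner_team: str           # team accountable for performance and governance
    owner_contact: str        # who to notify when the model changes or fails
    last_reviewed: str        # ISO date the card was last reviewed
    foundation_model: str     # e.g. "Claude 3.5 Sonnet"
    deployment_type: str      # "API", "fine-tuned", or "RAG-augmented"
    intended_uses: list[str] = field(default_factory=list)

overview = ModelOverview(
    name="Customer Support Classifier",
    version="2.3",
    owner_team="Support AI",
    owner_contact="support-ai-oncall@example.com",
    last_reviewed="2026-04-18",
    foundation_model="Claude 3.5 Sonnet",
    deployment_type="API",
    intended_uses=[
        "Classify inbound support tickets by product area and urgency for queue routing",
    ],
)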

Performance and Evaluation Section

Primary performance metrics

The metrics used to evaluate model quality, with current values. Include the evaluation set size and composition. Example: 'Accuracy on golden test set (n=500): 91.3%. F1 on positive class: 88.7%. Human evaluation agreement rate: 93.1%.' Numbers without context (accuracy = 91%) are not actionable; numbers with benchmark context are.
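
When evaluation runs in code, these numbers can be generated by the eval job instead of transcribed by hand. A minimal sketch using scikit-learn, with a toy golden set standing in for a real one (a real set should be far larger, as in the n=500 example above; names are illustrative):

from sklearn.metrics import accuracy_score, f1_score

# Illustrative golden set: (ticket text, expected label).
golden = [
    ("Cannot log in after password reset", "urgent"),
    ("Feature request: dark mode", "routine"),
    ("Payment failed twice, account locked", "urgent"),
]

def predict(text: str) -> str:
    """Placeholder for the real model call (e.g. an API request)."""
    return "urgent" if "locked" in text or "log in" in text else "routine"

y_true = [label for _, label in golden]
y_pred = [predict(text) for text, _ in golden]

print(f"Accuracy on golden test set (n={len(golden)}): {accuracy_score(y_true, y_pred):.1%}")
print(f"F1 on positive ('urgent') class: {f1_score(y_true, y_pred, pos_label='urgent'):.1%}")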

Evaluation methodology

How was the model evaluated? Who created the test set? When was it last updated? Is evaluation automated, human, or hybrid? If your evaluation set was created by the same team that built the model, note this — it is a bias risk. Independent evaluation sets are higher quality evidence.

Performance by segment

Does the model perform equally well across all relevant user segments? Evaluate and document performance by language, user type, query complexity, and any other dimension where differential performance is a risk. Models that perform well on average but poorly on specific segments create equity and legal risk.
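
Segment-level numbers fall out of the same evaluation run if every example carries segment tags. A minimal pandas sketch (the columns and the 0.85 floor are illustrative):

import pandas as pd

# One row per evaluated example; 'correct' records whether the prediction matched the label.
df = pd.DataFrame({
    "language": ["en", "en", "en", "de", "de", "es"],
    "correct":  [True, True, False, True, False, False],
})

by_segment = df.groupby("language")["correct"].agg(accuracy="mean", n="count")
print(by_segment)
print(by_segment[by_segment["accuracy"] < 0.85])  # segments below the documented floor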

Known failure modes

Where does the model fail? Be specific: 'Accuracy drops below 75% for queries involving technical product names not in training data.' 'High override rate observed for edge cases involving date/time expressions in non-US formats.' A vague entry like 'may underperform on multilingual inputs' is too broad to act on. Known failure modes are features of good documentation, not admissions of weakness.

Safety, Bias, and Limitations Section

Bias evaluation results

Document bias testing that was conducted: demographic parity tests, equalized odds evaluation, and any protected attribute analysis. Document the methodology, the findings, and the mitigations implemented. If bias testing was not conducted, this must be stated explicitly — blank is not the same as tested and clean.
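
To make one of these tests concrete: demographic parity compares the rate of a given model decision across groups. A minimal sketch, assuming each evaluated example is logged with its group attribute and the model's decision (the data and group names are illustrative):

from collections import defaultdict

# (group, model_flagged) pairs from an evaluation run; values are illustrative.
results = [("group_a", True), ("group_a", False), ("group_a", False),
           ("group_b", True), ("group_b", True), ("group_b", False)]

decisions = defaultdict(list)
for group, flagged in results:
    decisions[group].append(flagged)

positive_rate = {g: sum(v) / len(v) for g, v in decisions.items()}
parity_gap = max(positive_rate.values()) - min(positive_rate.values())
print(positive_rate)
print(f"Demographic parity gap: {parity_gap:.2f}")  # record the gap and your accepted threshold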

Out-of-scope uses (prohibited use cases)

Explicit list of uses the model should not be used for. This is as important as the intended use list. 'This model should not be used for: making employment decisions, determining creditworthiness, medical diagnosis, or any high-stakes decision-making without human review.' Clear prohibitions reduce misuse.

Data and privacy considerations

What data was used to train or fine-tune this model? Does inference processing involve PII? What data retention policies apply? For models that process user data, document the data handling chain from user input to model output to storage — and the regulatory framework (GDPR, CCPA, HIPAA) that applies.

Regulatory and compliance notes

Any regulatory requirements or compliance certifications relevant to this model's use. EU AI Act risk classification. HIPAA applicability. Industry-specific regulations. This section is particularly important for regulated industries where legal may need to sign off on AI deployments.

Deployment and Operational Requirements

Human oversight requirements

What human review is required for this model's outputs? 'All outputs must be reviewed by a licensed professional before action.' 'Sampling review of 5% of outputs weekly.' 'No human review required for outputs with confidence score ≥0.95; human review required for all others.' Document the oversight tier explicitly.
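
The third example above translates directly into routing logic. A minimal sketch of a confidence-threshold gate (the 0.95 cutoff comes from the example; function and label names are illustrative):

REVIEW_THRESHOLD = 0.95  # documented in the model card before deployment

def route_output(output: str, confidence: float) -> str:
    """Send low-confidence outputs to human review; auto-approve the rest."""
    return "auto_approve" if confidence >= REVIEW_THRESHOLD else "human_review"

print(route_output("Billing / Urgent", 0.97))  # auto_approve
print(route_output("Unknown / Low", 0.62))     # human_review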

Monitoring and alerting thresholds

What metrics are monitored, and at what thresholds are alerts triggered? 'Override rate alert if ≥15% in a 24-hour period.' 'Latency p95 alert if ≥3s.' 'Error rate alert if ≥2%.' These thresholds should be set and documented before deployment, so monitoring is configured to specification rather than bolted on after an incident.
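
Writing the thresholds down as configuration makes "configured to specification" checkable in review. A minimal sketch using the threshold values from the examples above (the structure is illustrative):

# Alert thresholds documented in the model card and loaded by the monitoring job.
THRESHOLDS = {
    "override_rate_24h": 0.15,   # alert if >= 15% in a 24-hour period
    "latency_p95_seconds": 3.0,  # alert if p95 latency >= 3s
    "error_rate": 0.02,          # alert if >= 2%
}

def breached(metrics: dict[str, float]) -> list[str]:
    """Return the metrics at or above their documented alert threshold."""
    return [name for name, limit in THRESHOLDS.items() if metrics.get(name, 0.0) >= limit]

print(breached({"override_rate_24h": 0.18, "latency_p95_seconds": 1.2, "error_rate": 0.01}))
# -> ['override_rate_24h']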

Incident response contact

Who is notified if this model has a quality incident? Who has authority to disable it? What is the escalation path? This section should be as specific as a runbook: 'Level 1 (quality alert): notify model owner. Level 2 (quality incident): notify PM and engineering lead. Level 3 (safety incident): notify VP Product and legal.'
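
The escalation path can live beside the monitoring config so on-call tooling reads both from one place. A minimal sketch mirroring the three levels above (contacts are placeholders):

# Incident level -> who gets notified; addresses are placeholders.
ESCALATION = {
    1: ["model-owner@example.com"],                      # quality alert
    2: ["pm@example.com", "eng-lead@example.com"],       # quality incident
    3: ["vp-product@example.com", "legal@example.com"],  # safety incident
}

def notify_list(level: int) -> list[str]:
    """Look up the contacts for a given incident level."""
    return ESCALATION[level]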

Deprecation and update plan

When will this model be updated? What triggers an unplanned update? What is the deprecation process if this model is retired? Models without a stated update plan tend to stagnate until they cause an incident. A quarterly review commitment is a minimum — with explicit criteria for out-of-cycle updates.

Model Card Review and Approval Process

1. Complete the template before deployment

A model card should be completed and reviewed before any AI feature reaches production. It is a pre-deployment gate, not a post-deployment documentation task. If you can't fill in the known failure modes section, the model isn't ready to deploy — you need more evaluation.
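
This gate can be automated: fail the deployment pipeline when required card fields are blank. A minimal sketch (the field list is illustrative):

REQUIRED_FIELDS = [
    "model_name", "version", "owner", "intended_uses",
    "known_failure_modes", "bias_evaluation", "oversight_tier",
]

def missing_fields(card: dict) -> list[str]:
    """Return required model card fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not card.get(f)]

gaps = missing_fields({"model_name": "Customer Support Classifier", "version": "2.3"})
if gaps:
    raise SystemExit(f"Model card incomplete, blocking deployment: {gaps}")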

2. Multi-stakeholder review

Engineering reviews performance metrics and technical limitations. Legal reviews regulatory implications and prohibited uses. Product reviews intended use alignment with product strategy. Relevant specialists review bias evaluation (data science) and safety (security/trust). Model cards that only an engineering team reviews miss important stakeholder perspectives.

3. Version the model card with the model

When the model changes, the model card must be updated and re-reviewed. Store model cards in version control alongside model configurations. A model card that describes a previous version of the model is misleading. Set up a process that requires model card updates as part of model update approval.
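
One lightweight way to enforce this is a CI check that the card's version matches the model configuration's, so a model change cannot merge with a stale card. A minimal sketch (file names and layout are illustrative):

import json

def card_matches_model(card_path: str, config_path: str) -> bool:
    """True when the model card and the model config declare the same version."""
    with open(card_path) as card, open(config_path) as config:
        return json.load(card)["version"] == json.load(config)["version"]

if not card_matches_model("model_card.json", "model_config.json"):
    raise SystemExit("Model card version lags the model config; update and re-review the card.")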

4. Make model cards accessible

Model cards provide value only if relevant stakeholders can find and read them. Store in a central, searchable location (Notion, Confluence, a dedicated model registry). Reference them in PRDs, release notes, and compliance documentation. Inaccessible documentation is the same as no documentation.

Build Responsible AI Products in the AI PM Masterclass

AI governance, model documentation, compliance, and responsible AI are core to the AI PM Masterclass. Taught by a Salesforce Sr. Director PM.