AI Data Access Request Template for Product Managers
TL;DR
The single biggest reason AI features stall isn't the model — it's data access. Data engineering is fielding 20 vague Slack pings a week and your request will sit in the queue until you make it easy to approve. This template gives you the six fields data engineering needs, with example answers. Ship the form filled out — not 'hey can we chat about getting some data.'
Section 1: Purpose & Use Case
Data engineering needs to know what the data will power. 'Training data' is too vague — they cannot evaluate fit-for-purpose without specifics.
Feature name & one-line description
Example: 'Smart Inbox: auto-categorize incoming customer emails into 6 intents.' Avoid generic names — 'AI feature' tells data eng nothing.
Use case category
Example: Pick one: training data / eval data / RAG context / real-time inference / analytics. Each has different access patterns and approval paths.
Business outcome the data unlocks
Example: 'Cuts agent triage time from 90s → 15s per email at 5,000 emails/day.' Quantified. The reviewer needs to weigh access cost against this.
Stage of work
Example: Discovery / prototype / pilot / production. Discovery requests get a sample dataset; production requests get a managed pipeline.
Alternatives considered
Example: Why this dataset specifically? Did you check existing datasets? Synthetic data? Open data? Show your homework.
Section 2: Specific Fields Needed
List the exact tables and columns. 'All customer data' is rejected on sight. Specificity is what gets approved.
Source system
Example: Snowflake / BigQuery / Postgres replica / S3 bucket. Name the system, not 'the data warehouse.'
Table or dataset name
Example: 'production.support.tickets' — fully qualified. If you don't know the table name, work with data eng to find it before submitting.
Columns / fields required
Example: List exactly: ticket_id, created_at, customer_id (hashed), subject, body_text, category. Justify each one.
Filters
Example: 'created_at >= 2024-01-01 AND status = closed.' Tighter filters = faster approval and smaller PII surface.
Volume estimate
Example: Rows per pull and per day. '~50K rows in the historical pull, ~2K rows/day after.' This drives infrastructure decisions.
Joins required
Example: If the dataset spans tables, name the join keys and the relationships. Data eng will catch fan-out issues you missed.
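Filled in, the Section 2 fields compose directly into a query data eng can review at a glance. A minimal Python sketch, using the hypothetical table, columns, and filters from the examples above:

```python
# Sketch: render a scoped data request into a reviewable SQL query.
# The table name, columns, and filter values are the hypothetical
# examples from this section, not a real schema.
request = {
    "source": "Snowflake",
    "table": "production.support.tickets",
    "columns": ["ticket_id", "created_at", "customer_id", "subject",
                "body_text", "category"],
    "filters": ["created_at >= '2024-01-01'", "status = 'closed'"],
}

def to_sql(req: dict) -> str:
    """Build the exact SELECT that the request asks for."""
    cols = ", ".join(req["columns"])
    where = " AND ".join(req["filters"])
    return f"SELECT {cols} FROM {req['table']} WHERE {where}"

print(to_sql(request))
```

Attaching the rendered query to the ticket lets the reviewer sanity-check scope in seconds instead of reverse-engineering it from prose.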
Section 3: Refresh Cadence & Access Pattern
One-time export
For prototypes or eval set creation. Simplest to approve. Specify: target date, format (CSV/Parquet), and delivery location (S3 bucket, secure transfer).
Daily / weekly batch refresh
Scheduled pull into a model training pipeline or a feature store. Specify: cadence, expected window time, downstream owner, and on-call contact.
Real-time / streaming
Required for online inference. Specify: latency budget (e.g., <500ms), throughput (events/sec), and failover behavior. Most expensive to provision — justify the need.
Read-only query access
For RAG, analytics, or internal tools. Specify: concurrency limit, query timeout, and audit log requirements. Often the right answer instead of a full export.
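Each pattern above boils down to a different set of required specifics, which you can pre-check before the request hits the queue. A sketch with hypothetical field names drawn from the bullets in this section:

```python
# Sketch: required specifics per access pattern, for pre-checking a
# request before submission. Field names are illustrative assumptions,
# not a real intake schema.
REQUIRED = {
    "one_time_export": {"target_date", "format", "delivery_location"},
    "batch_refresh": {"cadence", "window", "downstream_owner", "on_call"},
    "streaming": {"latency_budget_ms", "throughput_eps", "failover"},
    "read_only_query": {"concurrency_limit", "query_timeout_s", "audit_log"},
}

def missing_fields(pattern: str, request: dict) -> set[str]:
    """Return the specifics the request still needs to name."""
    return REQUIRED[pattern] - request.keys()

req = {"cadence": "daily", "window": "02:00-04:00 UTC", "on_call": "ds-oncall"}
print(missing_fields("batch_refresh", req))  # the owner is still unnamed
```

A request that passes this check is one data eng can approve without a clarifying round trip.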
Section 4: PII & Sensitive Data Handling
If you cannot answer this section, the request will not be approved. Bring privacy and legal in before submitting.
Does the data contain PII?
Yes / No / Possibly. If yes, list the fields: email, name, phone, address, IP, location, account ID. Even hashed IDs are often considered PII under GDPR.
PII handling strategy
Pick one: redact at source (best), hash at source, tokenize, or pull with restricted access. Document which fields are handled by which method.
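The two most common strategies above, hashing and redaction, can be sketched in a few lines. This is illustrative only: the salt, field names, and email regex are assumptions, and in practice this runs at the source system, not in a PM's notebook.

```python
import hashlib
import re

# Hypothetical salt; in a real pipeline this lives in a secrets store
# and is never committed alongside the data.
SALT = "rotate-me-quarterly"

def hash_field(value: str) -> str:
    """Hash-at-source: stable pseudonym, still joinable across tables.
    Note: still PII under GDPR, since it is re-identifiable with the salt."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(text: str) -> str:
    """Redact-at-source: removes the value entirely (the strongest option)."""
    return EMAIL_RE.sub("[EMAIL_REDACTED]", text)
```

Note the asymmetry: hashing preserves joins but keeps the data in PII territory; redaction destroys the value but takes the field out of scope.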
Will the data be sent to a third party (model vendor, eval tool)?
If yes, name the vendor and confirm the DPA covers the data category. Anthropic, OpenAI, and most major vendors have enterprise DPAs — check whether you are on the enterprise plan.
Cross-border data flow
Will EU data leave the EU? Will customer data leave the country it was collected in? If yes, the request needs Schrems II / SCC review before approval.
Sensitive categories under GDPR Art. 9 / CCPA
Health, biometric, sexual orientation, and political views fall under GDPR Art. 9; financial account data is sensitive under CCPA/CPRA. If any are present, escalate to legal: these categories are usually disqualifying for AI training without explicit consent.
Section 5: Retention & Deletion
Data eng cares as much about how long you keep the data as how you got it. Define the lifecycle up front.
Retention period
Example: '90 days for the eval set, then deleted.' / '12 months rolling for the training pipeline.' Indefinite retention is rarely approved without a documented compliance basis.
Deletion mechanism
Example: Automated cron job / TTL on storage / quarterly review with deletion script. Specify how, not just that.
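'Specify how' can be as small as a scheduled sweep. A hedged sketch that assumes local CSV eval sets and the 90-day window from the example above; a production pipeline would use S3 lifecycle rules or warehouse-native TTLs instead:

```python
# Sketch: TTL sweep over a local eval-set directory. The path layout
# and 90-day window are hypothetical examples.
import time
from pathlib import Path

RETENTION_DAYS = 90

def expired(path: Path) -> bool:
    """True if the file is older than the retention window."""
    age_days = (time.time() - path.stat().st_mtime) / 86400
    return age_days > RETENTION_DAYS

def sweep(root: Path) -> list[Path]:
    """Delete expired CSVs and return what was removed, for the audit log."""
    deleted = []
    for f in list(root.glob("**/*.csv")):
        if expired(f):
            f.unlink()
            deleted.append(f)
    return deleted
```

Returning the deleted paths matters: the same run that enforces retention should feed the audit log described below.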
Right-to-be-forgotten propagation
Example: If a customer requests deletion, how does that propagate into your eval set, prompt logs, and fine-tuning data? Document the propagation path.
Backup & replica handling
Example: Backups extend retention. If your eval set is backed up by IT, your real retention is the backup retention. Coordinate with IT so the backup window matches your stated retention period.
Audit log of access
Example: Who accessed the data, when, and for what purpose. Required for SOC 2 and most regulated industries. Specify the log destination — usually the SIEM.
Section 6: Success Criteria & Sign-Off
The last section closes the loop. Define what 'done' looks like so data eng can deliver and walk away.
Definition of done
Example: 'Eval set CSV with 5,000 rows, columns: ticket_id, body_text, true_label, customer_segment. Delivered to s3://ai-prod-evals/inbox/v1/.'
Acceptance test
Example: '10-row sample reviewed by PM and DS lead. No PII in body_text. Categories distributed across all 6 labels.' Approve only after the sample passes.
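An acceptance test like the one above is easy to make executable, so the PM and DS lead review a pass/fail report instead of eyeballing rows. A sketch assuming the example columns and a hypothetical 6-label set:

```python
# Sketch: executable acceptance checks for a delivered sample.
# The label set, column names, and email regex are hypothetical.
import re

EXPECTED_LABELS = {"billing", "bug", "refund", "shipping", "account", "other"}
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def accept(sample: list[dict]) -> list[str]:
    """Return a list of failures; an empty list means the sample passes."""
    failures = []
    if any(EMAIL_RE.search(row["body_text"]) for row in sample):
        failures.append("PII: email address found in body_text")
    missing = EXPECTED_LABELS - {row["true_label"] for row in sample}
    if missing:
        failures.append(f"labels never seen: {sorted(missing)}")
    return failures
```

Approve the full delivery only when `accept(sample)` comes back empty on the 10-row review set.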
Owners
Example: Requesting PM, data eng owner, privacy reviewer, business stakeholder. Each named, each accountable for their step.
Timeline
Example: 'Sample by Day 5. Full delivery by Day 12.' Specific dates. Vague timelines slip indefinitely.
Renewal / re-approval
Example: When does this access need to be re-approved? Default: every 6 months for ongoing pipelines, or on team transition.
Sign-off
Example: PM signature, data eng signature, privacy signature. Track in your ticket system. This is your audit trail.