
AI Intellectual Property Strategy: Training Data, Output Ownership, and Risk in 2026

By Institute of AI PM · Jun 5, 2026

TL;DR

Three IP questions now block enterprise AI deals: Who owns the outputs your product generates? What is your training data liability? Does your vendor indemnify you if a lawsuit lands? The answers differ by vendor contract, jurisdiction, and use case. Under US law, AI-generated content without meaningful human creative input cannot be copyrighted — which has major implications for what your customers can do with your product's outputs. The EU AI Act, US fair use doctrine, Japan's permissive framework, and the UK's post-Getty position are all materially different. This guide gives you the framework to navigate all of it without becoming a lawyer.

Why AI IP Is Now a Product Strategy Issue

For the first two years of the generative AI wave, IP questions were largely theoretical: lawyers were filing briefs while enterprises moved fast. By 2026, that gap had closed. Court rulings on training data, updated vendor terms that granted or withheld output ownership by tier, and enterprise procurement teams that started requiring IP indemnification letters have made IP a first-class product concern, not a legal department footnote.

Three triggers accelerated this shift:

Getty Images v. Stability AI and its ripple effects

The litigation put a pointed question before the courts: can training on copyrighted images without a license constitute infringement, particularly when outputs are substantially similar to the training data? Enterprise buyers did not wait for a final answer. They started adding IP indemnification as a procurement requirement, not just for image generation products, but for any AI product trained on external data.

US Copyright Office guidance (February 2026)

The USCO issued updated guidance affirming that the threshold question is how much meaningful human creative decision-making went into the final work. A user who crafts a detailed prompt and then iteratively edits, selects, and arranges the output may have a stronger copyright claim than a user who accepts the first generation. Because that line is drawn case by case, the ambiguity directly affects what your customers can and cannot protect.

EU AI Act training data transparency requirements

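Providers of general-purpose AI models must publish a summary of the content used for training and maintain a policy for complying with EU copyright law, including rights holders' opt-outs. High-risk AI systems carry separate data governance obligations covering where their data came from and how it was collected. Even non-EU companies selling into the EU are now being asked to provide training data summaries as part of enterprise procurement. This is no longer a future requirement; it is active in 2026.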

Training Data Rights: What You Know and What You Don't

If your product is built on a foundation model from a major provider (OpenAI, Anthropic, Google, Meta), you are downstream of their training data decisions. Your product inherits both the capability and the liability exposure. Understanding your position requires knowing what each jurisdiction says about training on third-party data.

United States

Fair use (Section 107): relatively permissive for AI training

Training on publicly available works for research and transformative purposes is likely fair use under most interpretations. However, the "substantially similar output" test means that models able to reproduce specific copyrighted training works verbatim face higher exposure. Courts are still settling this; there was no definitive ruling as of June 2026.

European Union

EU AI Act + Text and Data Mining exception: requires documentation

The EU's TDM exception allows training on lawfully accessed data unless rights holders have opted out. The AI Act adds a layer: general-purpose model providers must publish a training content summary and honor those opt-outs, and high-risk systems carry their own data documentation duties. The practical requirement is a training data card that shows data provenance and confirms that content from rights holders who opted out was excluded.
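As a rough illustration of what a training data card involves, the sketch below filters candidate sources against an opt-out list and records what was kept and why anything was dropped. The `SourceRecord` schema, the `build_data_card` function, and the assumption that opt-outs arrive as a ready-made set of rights-holder names are all hypothetical. In practice opt-outs are often expressed through machine-readable signals you would need to collect and resolve first, and a real card would follow whatever template your vendor or regulator specifies.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SourceRecord:
    """One candidate training source (hypothetical schema)."""
    source_id: str
    origin_url: str
    license_id: str
    lawfully_accessed: bool
    rights_holder: str


@dataclass
class TrainingDataCard:
    """Provenance summary: what was included, what was excluded, and why."""
    generated_on: str
    included: list = field(default_factory=list)
    excluded: list = field(default_factory=list)  # (source_id, reason) pairs


def build_data_card(sources, opted_out_holders):
    """Keep only lawfully accessed sources whose rights holder has not opted out."""
    card = TrainingDataCard(generated_on=date.today().isoformat())
    for src in sources:
        if not src.lawfully_accessed:
            card.excluded.append((src.source_id, "not lawfully accessed"))
        elif src.rights_holder in opted_out_holders:
            card.excluded.append((src.source_id, "rights holder opted out"))
        else:
            card.included.append(
                {"source_id": src.source_id, "origin": src.origin_url, "license": src.license_id}
            )
    return card


# Example with made-up sources.
sources = [
    SourceRecord("s1", "https://example.com/a", "CC-BY-4.0", True, "Alice"),
    SourceRecord("s2", "https://example.com/b", "proprietary", True, "BigPublisher"),
    SourceRecord("s3", "https://example.com/c", "unknown", False, "Bob"),
]
card = build_data_card(sources, opted_out_holders={"BigPublisher"})
print(card.included)
print(card.excluded)
```

Recording a reason for every exclusion, not just the final included list, is what lets the card "confirm" that opted-out content was removed rather than merely asserting it.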

Japan

Japan explicitly allows training on copyrighted works without permission or compensation for non-expressive uses (i.e., extracting statistical patterns), provided the use does not unreasonably prejudice the copyright owner's interests. This permissiveness is often cited as one reason frontier model labs have set up operations in Japan. Commercial use of the trained model is allowed; reproducing the original work verbatim is not.

United Kingdom

Evolving post-Getty: cautious

The UK's TDM exception covers only mining for non-commercial research, and proposals to extend it to commercial use with a rights-holder opt-out have proved contentious. The Getty case and subsequent parliamentary review have added to the uncertainty. Rights holders' opt-out systems are gaining recognition. The safe path: document your data sources and have a clear opt-out mechanism.

Practical implication for PMs

If your product is built on a third-party foundation model, request your vendor's training data documentation and IP indemnification terms before signing. The question isn't whether the vendor is right; it's whether you're contractually protected if they're wrong. IP indemnification means the vendor covers your legal costs and damages, subject to the contract's conditions, if a training data claim lands on your product. Without it, you bear the exposure.

Output Ownership: What the Contracts Actually Say

The question of who owns AI-generated outputs — your product, your customer, or neither — is determined by two things: the underlying model provider's terms of service, and the degree of human creative input in the final output. Both vary significantly.

OpenAI (GPT-4o, DALL-E 3)

OpenAI's terms assign users its rights, if any, in outputs and permit their use for any purpose, including commercial use, making it one of the most commercially permissive major providers. API users retain full ownership; the caveat is that OpenAI may use non-API (ChatGPT) conversations for training unless the user opts out.

Anthropic (Claude)

Claude API users own their outputs. Anthropic's API terms explicitly disclaim any ownership of outputs generated by customers. Usage data is not used to train models by default for API customers — an important distinction from consumer products.

Google (Gemini API)

Google grants customers ownership of their outputs via the API. Google does not claim any intellectual property rights over customer content or outputs. Cloud customers have additional contractual protections under Google Cloud DPA.

Midjourney

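Commercial rights depend on subscription status. Users without a paid plan receive only a non-commercial Creative Commons license to their outputs. Paid subscribers own their outputs, but employees of companies with more than $1M in annual gross revenue must be on the Pro or Mega plan to own what they create for their employer. Confirm in writing whether any enterprise agreement adds IP indemnification. This tier-based model is increasingly common for image generation.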

The copyright angle adds a second layer. Even when your vendor grants you ownership of outputs, US copyright law may not extend protection to purely AI-generated content. The USCO's current position: outputs that reflect “the expressive choices of a human author” may be copyrightable; outputs generated with a simple prompt and accepted without modification probably aren't. The more human curation, selection, and arrangement involved, the stronger the copyright claim.

The practical implication: if your product generates content that your customers plan to protect (marketing copy, product designs, proprietary reports), you need a workflow that includes meaningful human review and editorial decision-making — both for legal protection and to create a defensible record of human authorship.
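One way to build that defensible record is to log each human editorial decision alongside the generated draft. The sketch below uses a hypothetical schema (`AuthorshipRecord`, `EditorialDecision`, and the action labels are illustrative, not a standard). Its `has_substantive_human_input` check is only a crude internal heuristic that flags drafts where nobody did more than approve; it is not a legal test of authorship.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class EditorialDecision:
    editor: str     # the human who made the call
    action: str     # illustrative labels: "rewrote", "selected", "rearranged", "approved"
    rationale: str  # why the change was made, in the editor's own words
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


@dataclass
class AuthorshipRecord:
    document_id: str
    model_name: str
    decisions: list = field(default_factory=list)

    def log(self, editor, action, rationale):
        """Append one human decision to the audit trail."""
        self.decisions.append(EditorialDecision(editor, action, rationale))

    def has_substantive_human_input(self, min_decisions=1):
        """Crude heuristic: count decisions that go beyond a plain sign-off."""
        substantive = [d for d in self.decisions if d.action != "approved"]
        return len(substantive) >= min_decisions

    def to_json(self):
        """Serialize the full record for storage alongside the final asset."""
        return json.dumps(asdict(self), indent=2)


record = AuthorshipRecord("q3-campaign-headline", "vendor-model")
record.log("j.doe", "rewrote", "Replaced generated headline with original wording")
record.log("j.doe", "approved", "Final sign-off before publication")
print(record.has_substantive_human_input())
print(record.to_json())
```

Storing the editor's rationale in their own words matters more than the timestamps: it is the part of the record that shows creative choices rather than mere review.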

Building Your AI IP Risk Framework

AI IP risk isn't binary — it sits on a spectrum from low-exposure generic content to high-exposure regulated-industry outputs. Building a risk framework means mapping your product's use cases against the exposure matrix and making deliberate decisions about where to add safeguards.

Low exposure

Internal productivity tools, summarization of the customer's own documents, code generation for internal use

Approach: Standard vendor terms are likely sufficient. Focus on ensuring your vendor's API terms grant output ownership. Document that generated content is for internal use only.

Medium exposure

Marketing copy, blog content, customer-facing reports, product descriptions

Approach: Require human review and editorial decisions before publication. Document the review process. Use vendors with explicit commercial output rights. Brief customers on the copyright ambiguity for minimal-prompt outputs.

High exposure

Legal documents, financial analysis, medical content, creative works intended for external IP protection

Approach: Require IP indemnification from your AI vendor. Implement substantial human authorship in the workflow — the AI is an assistant, not the author. Consider jurisdiction-specific legal review. Document human decisions at each generation step.

Training data-adjacent

Fine-tuning on customer data, RAG systems ingesting third-party content, synthesis of competitor information

Approach: Audit data sources for licensing terms before ingestion. Establish a data rights review process for any content you feed into your model or RAG pipeline. Understand that ingesting copyrighted content into a retrieval system may carry different exposure than training on it: a retrieval system stores copies and can surface near-verbatim passages at inference time.
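The framework above can be encoded as a simple lookup so that every new use case is tagged with an exposure tier before it ships. The sketch below is hypothetical: the use-case tags are shorthand for the examples listed in each tier, the safeguard strings paraphrase the approaches above, and defaulting unclassified use cases to high exposure is a conservative design choice of this sketch, not part of the framework.

```python
from enum import Enum


class Exposure(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    TRAINING_DATA_ADJACENT = "training-data-adjacent"


# Safeguards paraphrased from each tier's "Approach".
SAFEGUARDS = {
    Exposure.LOW: [
        "Confirm vendor API terms grant output ownership",
        "Document that generated content is for internal use only",
    ],
    Exposure.MEDIUM: [
        "Require human review and editorial decisions before publication",
        "Use vendors with explicit commercial output rights",
        "Brief customers on copyright ambiguity for minimal-prompt outputs",
    ],
    Exposure.HIGH: [
        "Require IP indemnification from the AI vendor",
        "Document human decisions at each generation step",
        "Consider jurisdiction-specific legal review",
    ],
    Exposure.TRAINING_DATA_ADJACENT: [
        "Audit data sources for licensing terms before ingestion",
        "Run a data rights review for model and RAG inputs",
    ],
}

# Hypothetical tags for the example use cases in each tier.
USE_CASE_TIERS = {
    "internal_productivity": Exposure.LOW,
    "summarize_customer_documents": Exposure.LOW,
    "internal_code_generation": Exposure.LOW,
    "marketing_copy": Exposure.MEDIUM,
    "customer_facing_report": Exposure.MEDIUM,
    "legal_document": Exposure.HIGH,
    "financial_analysis": Exposure.HIGH,
    "medical_content": Exposure.HIGH,
    "fine_tune_on_customer_data": Exposure.TRAINING_DATA_ADJACENT,
    "rag_third_party_content": Exposure.TRAINING_DATA_ADJACENT,
}


def assess(use_cases):
    """Return the set of tiers a product touches and the safeguards they require."""
    # Unclassified use cases default to HIGH until someone reviews them.
    tiers = {USE_CASE_TIERS.get(uc, Exposure.HIGH) for uc in use_cases}
    safeguards = []
    for tier in Exposure:  # enum definition order keeps the output stable
        if tier in tiers:
            safeguards.extend(SAFEGUARDS[tier])
    return tiers, safeguards


tiers, todo = assess(["marketing_copy", "rag_third_party_content"])
for item in todo:
    print("-", item)
```

A product usually spans several tiers at once, which is why the function returns the union of safeguards rather than only those of the single riskiest tier.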

IP Due Diligence Checklist for AI PM Teams

Use this checklist during vendor evaluation, product design, and enterprise procurement conversations. Each item corresponds to a question your customers' legal teams or procurement teams will eventually ask you.

Vendor output ownership terms

Confirm your AI vendor's API terms explicitly grant you (and by extension your customers) ownership of generated outputs for commercial use. Get this in writing, not just in public terms that can change.

IP indemnification coverage

Confirm whether your vendor's IP indemnification covers both training data claims and output similarity claims. Major providers (OpenAI, Google, Microsoft, Anthropic) now offer IP indemnification to enterprise or paid commercial customers; check whether your contract tier includes it.

Training data documentation

Request a training data card or data provenance summary for any model you build on top of. Procurement teams buying for EU deployments increasingly require this under the EU AI Act. If your vendor can't provide it, document that gap.

Fine-tuning data rights

If you fine-tune on customer data, ensure your customer contract, not only your data processing agreement (DPA), explicitly addresses ownership of the resulting model weights. Who owns the fine-tuned model: you, the customer, or the base model vendor?

Human authorship workflow

For high-stakes outputs (legal, financial, creative), document that a human made meaningful editorial decisions in the production process. This is both a copyright strategy and a way to evidence the human oversight the EU AI Act requires for high-risk systems.

Third-party content in RAG

Audit the rights status of any content in your retrieval pipeline. Scraping publicly accessible content into a RAG system may violate the source site's terms of service even when reading that content was permitted. Use licensed data sources for enterprise RAG deployments.

Competitive output risk

Test whether your AI product can be prompted to produce outputs that are substantially similar to known copyrighted works. This is both a product quality check and a liability check. Implement output filters for high-risk content types.
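A first-pass version of this similarity test can be automated for text by measuring how many of an output's word n-grams also appear in a reference set of known works. The sketch below assumes you already hold such a reference set; the 8-word n-gram size and 0.2 blocking threshold are arbitrary starting points to tune, and `should_block` is a hypothetical name. Verbatim overlap is only a proxy: it catches copied passages but not close paraphrase, it does not apply to images, and it is not the legal test for substantial similarity.

```python
import re


def word_ngrams(text, n=8):
    """Set of lowercase word n-grams; empty if the text has fewer than n words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def max_overlap(output, reference_works, n=8):
    """Find the reference work sharing the largest fraction of the output's n-grams."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return None, 0.0
    best_id, best_frac = None, 0.0
    for work_id, text in reference_works.items():
        shared = out_grams & word_ngrams(text, n)
        frac = len(shared) / len(out_grams)
        if frac > best_frac:
            best_id, best_frac = work_id, frac
    return best_id, best_frac


def should_block(output, reference_works, threshold=0.2, n=8):
    """Flag an output whose verbatim overlap with any reference work meets the threshold."""
    work_id, frac = max_overlap(output, reference_works, n)
    return frac >= threshold, work_id, frac


# Made-up reference set for illustration.
references = {
    "sample-poem": "the quick brown fox jumps over the lazy dog near the quiet river bank today",
}
print(should_block("The quick brown fox jumps over the lazy dog near the quiet river bank today!", references))
print(should_block("Quarterly revenue grew in the enterprise segment this year, driven by renewals.", references))
```

Routing flagged outputs to human review, rather than silently discarding them, can also produce a record to show a customer's legal team.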
