30 Most Important AI Concepts Every Aspiring AI Product Manager Must Know
TL;DR
You don't need to know everything in AI to be an AI product manager — but there are 30 concepts that come up so often, in interviews and on the job, that not knowing them disqualifies you. This guide groups them into five clusters: foundations, retrieval & memory, evaluation, deployment, and safety. Each entry includes a one-line definition and the product implication that actually matters.
Cluster 1 — Foundations (Concepts 1-7)
1. Tokens
The unit of text an LLM reads and bills for. Roughly 3/4 of a word. Token cost shapes pricing, latency, and context limits.
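A back-of-envelope sketch of how token counts turn into cost. The ~4-characters-per-token heuristic and the per-1K prices below are illustrative assumptions, not any provider's real rates; production code should use the provider's actual tokenizer and price sheet:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Real products should count with the provider's tokenizer.
    return max(1, len(text) // 4)

def estimate_cost_usd(prompt: str, completion_tokens: int,
                      price_in_per_1k: float = 0.0005,    # hypothetical rate
                      price_out_per_1k: float = 0.0015) -> float:
    # Input and output tokens are usually billed at different rates.
    tokens_in = estimate_tokens(prompt)
    return (tokens_in / 1000) * price_in_per_1k \
         + (completion_tokens / 1000) * price_out_per_1k
```

Even this crude model is enough to sanity-check a feature's unit economics before building it.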
2. Embeddings
High-dimensional vectors that represent meaning. The core primitive behind semantic search, recommendations, and RAG.
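The standard way to compare two embeddings is cosine similarity; a minimal pure-Python version:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 = same direction (same meaning), 0.0 = unrelated,
    # -1.0 = opposite. Semantic search ranks documents by this score.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```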
3. Attention
The mechanism that lets a model decide which earlier tokens matter for the current prediction. Quadratic in context length — which is why long contexts cost more.
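A quick way to feel the quadratic cost: count the attention scores a causal model computes for a context of n tokens (a simplified sketch that ignores heads and hidden dimensions):

```python
def attention_score_count(context_len: int) -> int:
    # With a causal mask, token i attends to tokens 1..i,
    # so the score matrix has n*(n+1)/2 entries — O(n^2) overall.
    return context_len * (context_len + 1) // 2
```

Doubling the context roughly quadruples the attention work, which is why long-context requests are slower and pricier.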
4. Pre-training vs. fine-tuning
Pre-training builds general capability; fine-tuning specializes it. Fine-tuning is rarely the right first move for a PM.
5. Context window
How much text a model can see at once. Larger windows enable document analysis but degrade in the middle (the "lost in the middle" effect).
6. Temperature
A sampling knob that controls output randomness. Low = deterministic, high = creative. Production AI mostly runs at low temperature.
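Temperature is just a divisor on the logits before softmax; this sketch shows how lowering it sharpens the output distribution toward the top token:

```python
import math

def softmax_with_temperature(logits: list[float],
                             temperature: float = 1.0) -> list[float]:
    # Low temperature sharpens the distribution toward the top logit
    # (more deterministic); high temperature flattens it (more random).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```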
7. System prompt
The instruction layer that sets model behavior. Often the highest-leverage change in an AI feature.
Cluster 2 — Retrieval & Memory (Concepts 8-13)
8. RAG (Retrieval-Augmented Generation)
Inject external documents into the prompt at runtime. The default architecture for grounding AI in your data.
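A minimal sketch of the pattern, assuming the retriever has already scored candidate chunks; the prompt wording is illustrative:

```python
def build_rag_prompt(question: str,
                     scored_docs: list[tuple[float, str]],
                     top_k: int = 3) -> str:
    # Keep only the top-k chunks by retrieval score, then inject them
    # into the prompt ahead of the user's question.
    top = [doc for _, doc in sorted(scored_docs, reverse=True)[:top_k]]
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(top))
    return (f"Answer using only the sources below. Cite by number.\n\n"
            f"{context}\n\nQuestion: {question}")
```

The model never sees documents that don't survive retrieval, which is why retrieval quality caps answer quality.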
9. Vector database
Stores embeddings and finds nearest neighbors fast. Pinecone, Weaviate, and pgvector are common choices.
10. Chunking
How you split documents before embedding. Bad chunking is the #1 cause of bad RAG quality.
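A minimal fixed-size chunker with overlap, character-based for simplicity (production systems usually split on sentence or heading boundaries instead):

```python
def chunk_text(text: str, chunk_size: int = 500,
               overlap: int = 50) -> list[str]:
    # Overlapping windows so a sentence split at a chunk boundary
    # still appears whole in at least one chunk.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```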
11. Reranking
A second-pass model that reorders retrieved chunks. Often a bigger quality lift than upgrading the LLM.
12. Hybrid search
Combine keyword (BM25) and vector search. Each catches what the other misses.
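One common way to merge the keyword and vector result lists is reciprocal rank fusion (RRF); a minimal sketch:

```python
def reciprocal_rank_fusion(rankings: list[list[str]],
                           k: int = 60) -> list[str]:
    # Each document scores 1/(k + rank) in every list that contains it;
    # documents ranked well by both retrievers float to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs only ranks, not comparable scores, which is why it works across very different retrievers.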
13. Memory systems
Long-term context management for agents and chatbots. Includes summary memory, episodic memory, and entity memory.
Cluster 3 — Evaluation (Concepts 14-19)
14. Eval set (golden set)
A curated set of inputs with known good outputs. Without one, you can't measure regressions.
15. LLM-as-judge
Use a model to score model outputs. Cheap, fast, and noisy — best paired with a small human-graded set.
16. Pass@k
The probability that at least one of k samples is correct. Useful for code, planning, and any task with multiple valid answers.
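The standard unbiased estimator, popularized by code-generation benchmarks: given n samples of which c are correct, the chance a random draw of k contains at least one correct answer is 1 − C(n−c, k)/C(n, k):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Probability that a random sample of k of the n attempts
    # includes at least one of the c correct ones.
    if n - c < k:
        return 1.0  # not enough failures to fill a sample of size k
    return 1.0 - comb(n - c, k) / comb(n, k)
```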
17. Hallucination
Confidently generated false content. Mitigated by RAG, citations, and refusal training — never eliminated.
18. Drift
Quality degrading over time due to model updates, prompt changes, or distribution shift. Continuous eval catches it.
19. Red teaming
Adversarial testing for failure modes: jailbreaks, prompt injection, harmful outputs. Required for any production AI.
Cluster 4 — Deployment (Concepts 20-25)
20. Inference
Running the model to produce outputs. Distinct from training. Most of an AI product's ongoing cost lives here.
21. Latency vs. throughput
Latency = single-request speed. Throughput = total requests per second. They often trade off.
22. Streaming
Send tokens to the user as they generate. The single biggest perceived-latency win in chat UIs.
23. Caching
Reuse responses for repeated inputs. Prompt caching and semantic caching cut cost dramatically.
24. Quantization
Compress model weights to lower precision. Cuts cost and latency with modest quality loss.
25. Distillation
Train a smaller model to mimic a larger one. The classic way to ship a cheap model with most of the quality.
Cluster 5 — Safety & Trust (Concepts 26-30)
26. Guardrails
Programmatic filters on inputs and outputs. Distinct from model safety training — guardrails enforce rules at runtime.
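A toy runtime guardrail as a pattern blocklist; the patterns below are illustrative placeholders, and real systems layer ML classifiers on top of rules like these:

```python
import re

# Hypothetical policy rules — real deployments maintain these per product.
BLOCKED_PATTERNS = [
    re.compile(r"(?i)ignore (all|previous) instructions"),  # injection tell
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                   # SSN-like PII
]

def passes_guardrails(text: str) -> bool:
    # Returns False if any blocked pattern appears in the text.
    return not any(p.search(text) for p in BLOCKED_PATTERNS)
```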
27. Prompt injection
User input that hijacks model behavior. Ranked #1 in the OWASP Top 10 for LLM Applications.
28. Content filters
Classifiers that block harmful, off-topic, or policy-violating outputs. Always layered, never the only defense.
29. Provenance / citations
Show the user where the answer came from. The single highest-leverage trust intervention in RAG products.
30. Human in the loop (HITL)
Route low-confidence or high-risk outputs to humans. The right design for high-stakes AI features.
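A sketch of that routing decision, with a hypothetical confidence threshold:

```python
def route(confidence: float, risk: str,
          confidence_threshold: float = 0.8) -> str:
    # High-risk outputs always go to a human; low-confidence ones too.
    # Everything else ships automatically.
    if risk == "high" or confidence < confidence_threshold:
        return "human_review"
    return "auto"
```

The threshold is a product decision, not a technical one: it trades review cost against error tolerance.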