Technical Deep Dive

How to Design AI Agent Systems: Architecture Patterns for Product Managers

By Institute of AI PM · 15 min read · Mar 22, 2026

TL;DR

AI agents are systems that can reason, plan, and take actions autonomously — not just generate text. Designing agent products requires understanding the core architecture: reasoning loops, tool use, memory systems, and orchestration patterns. This guide covers the architectural patterns PMs need to know to spec, evaluate, and ship agent-powered features in 2026.

What Makes an Agent Different from a Chatbot

A chatbot takes a user message and generates a response. An agent takes a user goal and figures out how to accomplish it — potentially across multiple steps, using multiple tools, with decisions made along the way.

When a user tells a chatbot "What's the status of ticket #1234?", the chatbot generates a plausible answer based on its training. When a user tells an agent the same thing, the agent queries the ticket system, retrieves the actual status, checks if there are related tickets, and returns the real answer with relevant context.

Chatbot

  • Takes a message, generates a reply
  • Single-turn interaction
  • No external system access
  • Output: text

Agent

  • Takes a goal, plans how to reach it
  • Multi-step autonomous execution
  • Calls tools, APIs, databases
  • Output: actions + results

Build a working AI agent yourself. The AI PM Masterclass has you design the architecture, define the tools, implement safety guardrails, and ship a functional agent product — live, with a Salesforce Sr. Director PM.

The Core Agent Loop

Every AI agent follows the same fundamental pattern, regardless of the specific framework or implementation.

1. Observe

The agent receives input — a user request, a trigger event, or new data. It also has access to context: conversation history, system state, available tools, and any stored memory.

2. Reason

The agent uses an LLM to analyse the situation and decide what to do next. This is where the model's intelligence matters most — understanding the goal, assessing what information it has, and planning next steps.

3. Act

The agent executes an action — calling a tool, querying a database, sending a message, or generating a response. The action produces a result that feeds back into the observation step.

4. Repeat

The agent evaluates the result of its action and decides whether the goal is accomplished or whether more steps are needed. The loop continues until the task is complete or the agent determines it can't proceed.
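The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pattern: `reason()` is a stub standing in for a real LLM call, and `get_ticket_status` is a hypothetical tool wired to the ticket example from earlier.

```python
def get_ticket_status(ticket_id: str) -> str:
    """Hypothetical tool: look up a ticket in a (stubbed) ticket system."""
    return f"Ticket {ticket_id}: resolved"

TOOLS = {"get_ticket_status": get_ticket_status}

def reason(goal: str, history: list) -> dict:
    """Stub for the LLM call: decide the next action from goal + history."""
    if not history:  # nothing done yet -> call the tool
        return {"action": "get_ticket_status", "args": {"ticket_id": "1234"}}
    return {"action": "finish", "answer": history[-1]}  # result in hand -> answer

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = []                          # short-term memory: results so far
    for _ in range(max_steps):            # loop until done or budget exhausted
        decision = reason(goal, history)  # Reason
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["action"]]             # Act: run the chosen tool
        history.append(tool(**decision["args"]))     # Observe: feed result back
    return "Gave up: step budget exhausted"

print(run_agent("What's the status of ticket #1234?"))
```

Note the `max_steps` budget: even a toy loop needs a stopping condition, or a confused agent will loop forever.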

Tool Use: How Agents Interact with Systems

Tools are how agents do things beyond generating text. A tool is a function the agent can call — it might query an API, read a database, send an email, create a document, or perform a calculation. The design of tools is one of the most important PM decisions in agent development.

Clear name and description

The agent reads tool descriptions to decide which tool to use. If the description is ambiguous, the agent will use the wrong tool. Writing good tool descriptions is as much a PM skill as writing good user stories.

Well-defined inputs and outputs

The agent needs to know what parameters to provide and what to expect back. Vague inputs lead to errors. Overly complex inputs lead to the agent getting confused.

Appropriate scope

A tool should do one thing well. A tool called 'manage_everything' will confuse the agent. A tool called 'get_customer_by_email' is clear and specific.

Error handling

Tools fail — APIs time out, databases return empty results, permissions are denied. The agent needs to handle these failures gracefully, either retrying, using an alternative approach, or informing the user.
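Putting the first three principles together, here is what a well-scoped tool definition might look like in the JSON-schema style most agent frameworks use. The tool name and fields are illustrative, not tied to any specific framework's API:

```python
# Illustrative tool definition: clear name, a description written for the
# model (when to use it, what it returns), and tightly specified inputs.
get_customer_by_email = {
    "name": "get_customer_by_email",
    "description": (
        "Look up a single customer record by their exact email address. "
        "Use this when the user identifies a customer by email. "
        "Returns the customer's name, plan, and account status."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "Exact email address, e.g. jane@example.com",
            }
        },
        "required": ["email"],
    },
}
```

Notice the description says when to use the tool, not just what it does: that is the text the model reasons over when choosing between tools.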

MCP: the emerging standard

MCP (Model Context Protocol) is becoming the standard for how agents discover and use tools. Rather than building custom integrations for each tool, MCP provides a universal protocol — like USB-C for agent-tool connections.

Memory Systems: Short-term and Long-term

Agents need memory to function effectively over time. There are two types:

Short-term memory

Conversation context

The current interaction — what the user said, what the agent has done so far, what results it's gotten. Lives in the LLM's context window. Every agent has this by default.

Long-term memory

Persistent storage

Information the agent retains across conversations — user preferences, past interactions, learned patterns. Requires explicit engineering: storing in a database and retrieving when relevant.
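A minimal sketch of that explicit engineering, assuming a simple per-user key-value design backed by SQLite. Real systems layer on retrieval ranking, retention limits, and consent checks; the class and column names here are hypothetical:

```python
import sqlite3

class MemoryStore:
    """Long-term memory: persists facts about a user across conversations."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories "
            "(user_id TEXT, key TEXT, value TEXT, PRIMARY KEY (user_id, key))"
        )

    def remember(self, user_id: str, key: str, value: str) -> None:
        # Upsert so newer facts overwrite stale ones
        self.db.execute(
            "INSERT OR REPLACE INTO memories VALUES (?, ?, ?)",
            (user_id, key, value),
        )

    def recall(self, user_id: str, key: str):
        row = self.db.execute(
            "SELECT value FROM memories WHERE user_id = ? AND key = ?",
            (user_id, key),
        ).fetchone()
        return row[0] if row else None

store = MemoryStore()
store.remember("u1", "preferred_channel", "email")
print(store.recall("u1", "preferred_channel"))  # email
```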

The PM's memory decisions

What should the agent remember? How long should it retain information? What are the privacy implications? An agent that remembers preferences feels intelligent. An agent that forgets everything each conversation feels frustrating. An agent that remembers too much feels creepy.

Orchestration Patterns

Complex agent tasks require coordinating multiple steps, tools, and sometimes multiple agents. Several patterns have emerged:

Sequential chain

Simplest

The agent completes one step, then the next, then the next. Good for well-defined workflows. Example: read email → extract action items → create tasks → send summary.
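The email workflow above can be sketched as a literal chain of functions, each consuming the previous step's output. Every step here is a hypothetical stub standing in for a real tool or LLM call:

```python
def read_email(email_id: str) -> str:
    """Stub: fetch the email body from a mail system."""
    return "Please update the roadmap and book a review meeting."

def extract_action_items(body: str) -> list:
    """Stub for an LLM extraction step."""
    return [s.strip() for s in body.rstrip(".").split(" and ")]

def create_tasks(items: list) -> list:
    """Stub: create one task per action item in a task tracker."""
    return [f"TASK: {item}" for item in items]

def send_summary(tasks: list) -> str:
    """Stub: report what was created back to the user."""
    return f"Created {len(tasks)} tasks: " + "; ".join(tasks)

# The chain: each step's output is the next step's input.
print(send_summary(create_tasks(extract_action_items(read_email("e1")))))
```

The strength and weakness of this pattern are the same thing: the order is fixed, so behaviour is predictable, but the chain cannot adapt when a step's output doesn't fit the next step's expectations.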

Router

Versatile

The agent classifies the request and routes it to a specialised sub-agent or workflow. Good for products that handle diverse request types — billing questions to a billing agent, technical questions to a support agent.
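A router is essentially classify-then-dispatch. In this sketch the classifier is a keyword stub standing in for an LLM classification call, and the sub-agents are hypothetical placeholders:

```python
def billing_agent(request: str) -> str:
    return f"[billing] handling: {request}"

def support_agent(request: str) -> str:
    return f"[support] handling: {request}"

ROUTES = {"billing": billing_agent, "support": support_agent}

def classify(request: str) -> str:
    """Stub for an LLM classifier: pick a route label for the request."""
    return "billing" if "invoice" in request.lower() else "support"

def route(request: str) -> str:
    # Dispatch to the specialised sub-agent for this request type
    return ROUTES[classify(request)](request)

print(route("Why was my invoice higher this month?"))
```

The PM-relevant design question is the route taxonomy itself: categories that are ambiguous to a human will be ambiguous to the classifier too.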

Parallel execution

Fast

The agent kicks off multiple actions simultaneously and aggregates the results. Good for tasks that require gathering information from multiple sources in parallel.
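A sketch of the fan-out/aggregate shape using a thread pool. The two fetchers are hypothetical stand-ins for independent data sources (a CRM, a ticket system):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_crm(customer_id: str) -> str:
    """Stub: look up the customer in a CRM."""
    return f"CRM record for {customer_id}"

def fetch_tickets(customer_id: str) -> str:
    """Stub: count open tickets for the customer."""
    return f"2 open tickets for {customer_id}"

def gather_context(customer_id: str) -> list:
    # Fan out: both lookups run concurrently, then results are aggregated
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, customer_id) for f in (fetch_crm, fetch_tickets)]
        return [f.result() for f in futures]

print(gather_context("c42"))
```

The pattern only pays off when the actions are truly independent; if one lookup needs another's output, you are back to a sequential chain.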

Human-in-the-loop

Safe

The agent executes autonomously until it reaches a decision point requiring human approval, then pauses. Good for high-stakes actions — the agent drafts routine emails but pauses for external-facing sends.
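The core of human-in-the-loop is an approval gate in front of high-impact actions. In this sketch, `approve` is a stub for whatever surface (Slack message, dashboard, inbox) a real product would use, and it default-denies until a human responds:

```python
def is_high_impact(action: dict) -> bool:
    """Policy decision, owned by the PM: which actions need a human?"""
    return action["type"] in {"send_external_email", "delete_record"}

def approve(action: dict) -> bool:
    """Stub: in production this would surface the action to a human."""
    print(f"Approval requested for: {action['type']}")
    return False  # default-deny until a human explicitly says yes

def execute(action: dict) -> str:
    if is_high_impact(action) and not approve(action):
        return "paused: awaiting human approval"
    return f"executed: {action['type']}"

print(execute({"type": "draft_email"}))          # low-stakes: runs autonomously
print(execute({"type": "send_external_email"}))  # high-stakes: pauses
```

Deciding what counts as "high impact" is a product call, not an engineering one, which is why the policy function is deliberately separated from the execution path.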

Designing for Agent Failure

Agents fail in ways that chatbots don't. A chatbot that gives a bad answer is annoying. An agent that takes a wrong action can be destructive — sending the wrong email, deleting the wrong file, making the wrong API call. PMs must design safety nets:

Required safety patterns for production agents

  • Action confirmation — require user approval before irreversible or high-impact actions
  • Scope limiting — restrict tools to read-only first, expand write access as trust is established
  • Rollback capability — design actions to be reversible where possible
  • Graceful degradation — when stuck, explain what was tried and suggest alternatives
  • Monitoring and audit trails — log every action for debugging, trust, and compliance
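Scope limiting, in particular, can be as simple as gating which tools the agent is allowed to see. A minimal sketch with illustrative tool names:

```python
# Read-only tools are available from day one; write tools are exposed
# only once trust in the agent has been established.
READ_TOOLS = {"get_ticket_status", "get_customer_by_email"}
WRITE_TOOLS = {"update_ticket", "send_email"}

def allowed_tools(write_access_granted: bool) -> set:
    """Return the tool set the agent may use at its current trust level."""
    return READ_TOOLS | (WRITE_TOOLS if write_access_granted else set())

print(sorted(allowed_tools(False)))  # launch configuration: read-only
print(sorted(allowed_tools(True)))   # expanded once trust is established
```

Because the agent chooses actions only from the tools it is given, shrinking the tool set is one of the most reliable safety levers available: the agent cannot take an action it cannot see.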

Evaluation: How to Measure Agent Quality

Agent evaluation is harder than chatbot evaluation because you're measuring multi-step workflows, not single responses.

Task completion rate

What percentage of user requests does the agent successfully complete? Segment by task type and complexity — headline rate alone is misleading.
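Segmenting the completion rate is a small amount of code. In this sketch, `results` is hypothetical evaluation output; a real harness would log one record per evaluated task:

```python
from collections import defaultdict

# Hypothetical eval output: one record per task attempt
results = [
    {"task_type": "lookup", "completed": True},
    {"task_type": "lookup", "completed": True},
    {"task_type": "multi_step", "completed": False},
    {"task_type": "multi_step", "completed": True},
]

def completion_rates(results: list) -> dict:
    """Completion rate per task type, not just the headline average."""
    totals, wins = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["task_type"]] += 1
        wins[r["task_type"]] += r["completed"]
    return {t: wins[t] / totals[t] for t in totals}

print(completion_rates(results))
```

In this toy data the headline rate is 75%, but the segmented view shows lookups at 100% and multi-step tasks at 50% — exactly the gap a single number hides.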

Action accuracy

When the agent takes an action, is it the right action? A high task completion rate with low action accuracy means the agent is completing tasks but doing them wrong.

Efficiency

How many steps does the agent take to complete a task? Fewer steps generally means better reasoning. An agent that takes 15 steps to book a meeting is poorly designed.

Failure recovery

When the agent encounters an error, does it recover gracefully? Does it find alternative paths? Or does it get stuck in a loop?

User satisfaction

Do users trust and value the agent? This captures all the above metrics plus response speed, communication clarity, and appropriate autonomy.

The PM's Role in Agent Development

Building agent products requires PMs to think differently about several aspects of their work:

Specification

You can't write a traditional spec for an agent because you can't predict every path it will take. Instead, define goals, available tools, constraints, and evaluation criteria — then test extensively.

Testing

Agent testing requires scenario-based evaluation with diverse, realistic tasks. Test adversarial inputs, edge cases, tool failures, and multi-step workflows where early mistakes compound.

User trust

Agent adoption depends on trust, built incrementally. Start with low-stakes tasks where failure is cheap, demonstrate competence, then expand to higher-stakes tasks. Don't launch with an agent that can do everything.

Build a Working AI Agent in the AI PM Masterclass

You'll design the architecture, define the tools, implement safety guardrails, and ship a functional agent product — live, with a Salesforce Sr. Director PM.