Blake Linde
Agent Systems · 8 min read

When (and How) to Build an Agent

The first question to answer before building an agent is whether you actually need one. Most features don't. The ones that do follow predictable patterns, and the ones that break do so in predictable ways. Getting both right saves significant time and money.


Do You Actually Need an Agent?

Agents add tool definitions, iteration loops, and token cost — typically 3-10x the tokens of a single-shot call. That cost has to buy a real capability gap over a well-designed single prompt.

The design question to answer before writing any code: 'What specific task will this agent handle better than a single prompt? How will we measure the difference?' If you can't answer both concretely, a single LLM call is almost certainly the right choice.

Single-shot is fine for extraction, classification, generation, and summarization. Agents are for multi-step decisions where the AI needs to reason about what to call and when — scheduling across calendar APIs, researching across multiple sources, debugging by reading logs and running tests.

The Components

Model: the LLM making the decisions. Use a cheaper model during development — you're iterating on prompts and tool descriptions, not on the final user experience.

Tools: functions with three pieces of metadata — name, description, and input schema (Zod or equivalent). The schema doubles as input validation and as documentation the model reads. Describe every schema field; vague fields produce vague tool calls.
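
As a sketch, here is what those three pieces can look like with Zod. The tool shape and field names are illustrative, not a specific framework's API, and the weather result is stubbed:

```typescript
import { z } from "zod";

// Illustrative tool shape: name, description, and a schema whose field
// descriptions double as documentation the model reads.
const getWeatherSchema = z.object({
  city: z.string().describe("City name, e.g. 'Berlin', not a country or region"),
  unit: z.enum(["celsius", "fahrenheit"]).describe("Temperature unit to return"),
});

const getWeatherTool = {
  name: "get_weather",
  description:
    "Get the current weather for a specific city. Use when the user asks about " +
    "conditions right now, not for historical data or forecasts.",
  schema: getWeatherSchema,
  run: async (input: z.infer<typeof getWeatherSchema>) => {
    // The schema also acts as input validation before any real work happens.
    const { city, unit } = getWeatherSchema.parse(input);
    return `22 degrees ${unit} and clear in ${city}`; // stubbed result
  },
};
```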

Loop: Think, Act, Observe, repeat; stop when the model returns text instead of a tool call. Always set a recursion limit (5 to 10 for most cases). A missing limit is how a stuck loop becomes an expensive bill.
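
A minimal version of that loop, with the limit built in. The `model`, `toolsByName`, and message shapes here are assumptions for the sketch, not any particular framework's API:

```typescript
// Minimal Think -> Act -> Observe loop with a hard recursion limit.
type ToolCall = { id: string; name: string; args: unknown };
type Message =
  | { role: "user" | "assistant"; content: string }
  | { role: "assistant"; toolCalls: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };

declare const model: {
  invoke(messages: Message[]): Promise<{ text: string; toolCalls: ToolCall[] }>;
};
declare const toolsByName: Record<string, { run(args: unknown): Promise<string> }>;

const MAX_ITERATIONS = 8; // always set this: a stuck loop is an expensive bill

async function runAgent(messages: Message[]): Promise<string> {
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const response = await model.invoke(messages);             // Think
    if (response.toolCalls.length === 0) return response.text; // plain text: done
    messages.push({ role: "assistant", toolCalls: response.toolCalls });
    for (const call of response.toolCalls) {
      const result = await toolsByName[call.name].run(call.args);            // Act
      messages.push({ role: "tool", toolCallId: call.id, content: result }); // Observe
    }
  }
  return "Stopped: recursion limit reached.";
}
```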

Memory: conversation history you pass on every invoke. The agent does not remember on its own. Push the user turn, call invoke, push the assistant turn, repeat.
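
Reusing the `runAgent` sketch above, the calling code owns that history:

```typescript
// The caller owns memory: push the user turn, invoke, push the assistant turn.
const history: Message[] = [];

async function chat(userInput: string): Promise<string> {
  history.push({ role: "user", content: userInput });
  const reply = await runAgent(history);
  history.push({ role: "assistant", content: reply });
  // In production, truncate or summarize old turns here so history doesn't grow unbounded.
  return reply;
}
```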

| Component | What it is | Common failure |
| --- | --- | --- |
| Model | The LLM making decisions | Using expensive model during iteration |
| Tool | Function + name + description + schema | Vague description → wrong tool selected |
| Loop | Think, Act, Observe, repeat | No recursion limit → runaway cost |
| RAG | Embeddings + vector store + retrieval | Wrong chunks returned; read actual results |
| Memory | Message history you pass every invoke | Grows unbounded; truncate in production |

Tool Description Is the Interface

The tool description tells the model when to use the tool. Write it like onboarding a coworker: describe the situation that triggers reaching for this tool versus figuring it out another way.

'Searches the web' gives the model nothing. 'Search the web for current information not available in training data, such as recent events, current prices, or real-time data' tells the model exactly when to call it. If the agent keeps selecting the wrong tool, rewrite the descriptions before anything else.
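
In code, that difference is a single field on a hypothetical web-search tool (names illustrative):

```typescript
import { z } from "zod";

// "Searches the web" gives the model nothing to route on.
// This version names the trigger condition, so the model knows when to reach for it.
const webSearchTool = {
  name: "web_search",
  description:
    "Search the web for current information not available in training data, " +
    "such as recent events, current prices, or real-time data.",
  schema: z.object({
    query: z.string().describe("A focused search query, e.g. 'EUR to USD rate today'"),
  }),
};
```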

Always catch errors inside tools and return them as strings. Never throw. A thrown exception crashes the agent loop. A returned error message is an observation the model can read and adapt to.
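
A sketch of that pattern, hitting a placeholder search endpoint; whatever goes wrong comes back as text the model can read:

```typescript
// Never let an exception escape the tool: return it as an observation instead.
async function runWebSearch(query: string): Promise<string> {
  try {
    const response = await fetch(
      `https://api.example.com/search?q=${encodeURIComponent(query)}` // placeholder endpoint
    );
    if (!response.ok) {
      return `Search failed with status ${response.status}. Try a different query.`;
    }
    return await response.text();
  } catch (error) {
    // The model sees this text and can retry, rephrase, or pick another tool.
    return `Search error: ${error instanceof Error ? error.message : String(error)}`;
  }
}
```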

Where Agents Break

Too many tools: an agent with 12 tools spends most of its reasoning budget deciding which to use, gets it wrong more often, and runs slower. Start with 3-5. Cut tools that haven't been called in real traffic.

Missing recursion limits: a stuck loop without a limit becomes a bill. Every agent in production needs this limit, and every agent in development should have it too.

Missing async/await: web tools and RAG tools must be declared async and properly awaited. Forgetting await is the most common silent failure — the tool appears to 'do nothing.'
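
The failure in miniature, reusing the `ToolCall` and `toolsByName` names from the loop sketch above:

```typescript
// Forgetting await is a silent failure: the observation becomes a pending
// Promise (stringified as "[object Promise]"), so the tool looks like it did nothing.
async function act(call: ToolCall): Promise<string> {
  // Wrong: const result = toolsByName[call.name].run(call.args);  (never awaited)
  const result = await toolsByName[call.name].run(call.args); // right: await the async tool
  return result;
}
```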

Stream the agent to debug. The stream shows each Think, Act, Observe step as it happens — which tool was chosen, what arguments were passed, what the tool returned. This is the primary debugging tool before you rewrite anything.
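
Most frameworks expose this as an async iterator of steps. A hedged sketch of reading it; the `agent.stream` call and step shape here are illustrative, not a specific library's API:

```typescript
// Log each Think / Act / Observe step as it happens.
type AgentStep =
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "tool_result"; content: string }
  | { type: "text"; content: string };
declare const agent: { stream(input: { input: string }): AsyncIterable<AgentStep> };

for await (const step of agent.stream({ input: "Find the cheapest flight to Lisbon" })) {
  if (step.type === "tool_call") {
    console.log("Act:", step.name, step.args);      // which tool, with what arguments
  } else if (step.type === "tool_result") {
    console.log("Observe:", step.content);          // what the tool returned
  } else {
    console.log("Think/Answer:", step.content);     // model reasoning or final answer
  }
}
```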

RAG: When Memory Needs to Scale

RAG (Retrieval Augmented Generation) lets agents query a knowledge base rather than keeping everything in context. Embeddings convert text to vectors; similarity search retrieves the most semantically relevant chunks.
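
Stripped of any particular vector store, the mechanism is embed, compare, take the top k. In this sketch, `embed` stands in for whatever embedding model you actually use:

```typescript
// Minimal similarity search: embed the query, score every chunk by cosine
// similarity, return the top k.
declare function embed(text: string): Promise<number[]>; // stand-in for a real embedding model

type Chunk = { text: string; source: string; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function retrieve(query: string, chunks: Chunk[], k = 4): Promise<Chunk[]> {
  const queryVector = await embed(query);
  return chunks
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ chunk }) => chunk);
}
```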

The core trap: embeddings search by meaning, not keywords. If your chunks and your query don't share meaning, similarity search will return something plausible and wrong. Read the actual chunks your vector store returned before assuming the retrieval mechanism is broken.

Return source attribution from the RAG tool, not just the text. Every retrieved chunk should come back labeled with its source. The agent uses the source to format its answer; you use the source to debug when the chunk is wrong.
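
Building on the `retrieve` and `Chunk` sketch above, the RAG tool can return labeled chunks rather than bare text:

```typescript
// Label every chunk with its source so the agent can cite it and you can see
// exactly what was retrieved when an answer looks wrong.
async function ragTool(query: string, chunks: Chunk[]): Promise<string> {
  const results = await retrieve(query, chunks);
  if (results.length === 0) return "No relevant documents found.";
  return results
    .map((chunk) => `[source: ${chunk.source}]\n${chunk.text}`)
    .join("\n\n");
}
```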

Agents are a while loop and a few functions. Every framework is a wrapper around that. Learn the loop and the frameworks get easy.
