Chapter 2 The Agent

Configuring the atomic unit of every workflow

Before you build a multi-agent workflow, you need to master the building block: a single agent. A poorly configured agent in a multi-agent system is like a weak link in a chain — it degrades the whole system. This chapter teaches you how to configure an agent properly.

Every agent has exactly four configuration surfaces. Getting these right is the highest-leverage thing you can do.

Config Surface	Question It Answers	Examples
Model	What LLM powers this agent?	GPT-4o, Claude Sonnet, Llama 3, a local model
Instructions	What does this agent know and focus on?	System prompt defining role, domain, constraints, output format
Tools	What can this agent do in the world?	API calls, code execution, file access, database queries
Reasoning Strategy	How does this agent think?	Direct response, ReAct loop, plan-then-execute

A note on terminology: You’ll see terms like “role-based specialization,” “domain-based specialization,” and “capability-based specialization” in the literature. These are not separate architectural patterns. They’re different knobs on the same agent configuration. Role = change the instructions. Domain = change the instructions + scope the input. Capability = change the tool list. A practitioner doesn’t think “am I doing role-based or capability-based?” — they think “what prompt and tools does this agent need?”
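To make the four surfaces and the "different knobs" point concrete, here is a minimal sketch in Python. The `AgentConfig` class, its field names, and the model name are illustrative assumptions, not the API of any particular framework.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the four configuration surfaces."""
    model: str                    # which LLM powers the agent
    instructions: str             # system prompt: role, domain, constraints, format
    tools: tuple[str, ...] = ()   # names of the tools the agent may call
    strategy: str = "direct"      # "direct", "react", or "plan_then_execute"


base = AgentConfig(
    model="some-non-reasoning-model",  # placeholder, not a real model ID
    instructions="You are a security analyst.",
)

# Role-based specialization: only the instructions change.
triage = replace(base, instructions="You are a SOC alert triage specialist.")

# Capability-based specialization: only the tool list (and with it the strategy) changes.
investigator = replace(base, tools=("check_ip_reputation",), strategy="react")
```

Both specialized agents are the same structure with one or two fields changed, which is the sense in which the specialization types are knobs rather than separate architectures.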

2.1 — Model

The model is the LLM that powers the agent’s reasoning. This is the “brain” of the agent. Different models have fundamentally different reasoning capabilities, and choosing the right one matters.

Reasoning vs. Non-Reasoning Models

The most important distinction when selecting a model is whether it supports extended reasoning (also called “thinking” or “chain-of-thought”).

Category	Examples	How It Works	Best For
Reasoning models	OpenAI o1, o3; Claude with extended thinking; DeepSeek-R1	The model performs an internal chain-of-thought before answering. This “thinking” step is separate from the visible output and can be much longer.	Multi-step logic, math, code generation, exploitation, any task that benefits from deliberate planning
Non-reasoning models	GPT-4o, GPT-4o-mini, Claude Sonnet, Claude Haiku, Gemini Flash	The model generates a response directly. It can still reason in its output, but does not have a hidden thinking step.	Classification, routing, summarization, extraction, formatting, simple tool use

Key insight: Reasoning models are not always better. They are slower and more expensive. For tasks where the answer is obvious (classification, routing, simple extraction), a non-reasoning model is faster, cheaper, and usually just as accurate. Reserve reasoning models for tasks that genuinely require multi-step deduction.

Mixing Models Within a Workflow

One of the most powerful design decisions: different agents in the same workflow can use different models. You don’t have to pick one model for the whole system.

Example: Mixed-Model Security Pipeline

Agent 1: Alert Classifier — fast, non-reasoning (just classification). Model: GPT-4o-mini (non-reasoning)
Agent 2: Incident Investigator — complex reasoning needed. Model: o3 or Claude with extended thinking (reasoning)
Agent 3: Report Writer — moderate capability. Model: GPT-4o-mini (non-reasoning)

The reasoning model is only used where it matters. The other agents use fast non-reasoning models.

Practical tip: Start with a capable non-reasoning model for all agents. Get the workflow working correctly first. Then selectively upgrade agents that need deeper reasoning, or downgrade agents that don’t need capability. Don’t optimize before you have correctness.
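One way to apply this tip is to keep a single default model and a small table of per-agent overrides. The sketch below assumes hypothetical agent names and placeholder model names; substitute whatever identifiers your provider and framework use.

```python
DEFAULT_MODEL = "capable-non-reasoning-model"  # placeholder, not a real model ID

# Per-agent overrides, added only after the workflow is correct end to end.
MODEL_OVERRIDES = {
    "incident_investigator": "reasoning-model",       # upgrade: multi-step deduction
    "alert_classifier": "small-non-reasoning-model",  # downgrade: simple classification
}


def model_for(agent_name: str) -> str:
    """Return the override for this agent, or the shared default."""
    return MODEL_OVERRIDES.get(agent_name, DEFAULT_MODEL)


pipeline = ["alert_classifier", "incident_investigator", "report_writer"]
assignments = {name: model_for(name) for name in pipeline}
```

Starting with an empty `MODEL_OVERRIDES` gives the "one capable model everywhere" baseline; each later entry is a deliberate, reviewable optimization.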

2.2 — Instructions (System Prompt)

The system prompt is the single most important configuration of an agent. It’s the difference between an agent that works and one that doesn’t. The system prompt tells the LLM who it is, what to focus on, what to avoid, and how to format its output.

The Four Components of a Good System Prompt

Component	What It Does	Example
Identity	Establishes the agent’s role and expertise level.	“You are a senior malware reverse engineer with 10 years of experience.”
Focus	Defines what the agent should pay attention to and analyze.	“Your task is to analyze binary executables for obfuscation techniques, hidden functions, and vulnerability patterns.”
Constraints	Sets boundaries on what the agent should not do.	“Never execute untrusted code directly. Never guess if you can use a tool to verify. If uncertain, say so explicitly.”
Output format	Specifies the structure of the agent’s response.	“Respond with: 1) Malware family or classification, 2) Obfuscation techniques identified, 3) Hidden functions or suspicious behavior found, 4) Key findings (IOCs, patterns), 5) Recommended next steps.”

Good vs. Bad Prompts

❌ Bad Prompt — Vague

"Classify security alerts."

Problems:

No identity: the LLM doesn’t know what level of expertise to assume
No focus: what aspects of the alert should it look at?
No constraints: can it make assumptions? Should it guess?
No output format: will it return JSON? Prose? A single word?

✅ Good Prompt — Specific and Structured

"You are a security alert triage specialist at a SOC (Security
Operations Center). You have deep expertise in network security,
malware analysis, and incident classification.

For each alert you receive, analyze it and produce:

1. **Severity**: critical / high / medium / low / informational
2. **Category**: malware, intrusion, data-exfiltration,
   misconfiguration, false-positive
3. **Confidence**: A number from 0.0 to 1.0 indicating how
   confident you are in your classification
4. **Reasoning**: 2-3 sentences explaining your classification

Guidelines:
- Consider the source IP, destination, port, and protocol
- Cross-reference with known threat patterns
- If you are uncertain, classify as 'high' severity and
  explain why in your reasoning
- Never classify as 'informational' unless you are very
  confident it is benign"

The good prompt is longer, but every sentence serves a purpose. It eliminates ambiguity. The LLM knows exactly what to produce and how to handle uncertainty.
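The four components can also be assembled programmatically, which makes it harder to ship a prompt with one of them missing. This is a minimal sketch; `build_system_prompt` and its layout are assumptions for illustration, not a framework function.

```python
def build_system_prompt(identity: str, focus: str,
                        constraints: list[str], output_format: list[str]) -> str:
    """Assemble a system prompt from the four components; reject empty ones."""
    components = [("identity", identity), ("focus", focus),
                  ("constraints", constraints), ("output_format", output_format)]
    missing = [name for name, value in components if not value]
    if missing:
        raise ValueError(f"system prompt is missing: {', '.join(missing)}")

    numbered = "\n".join(f"{i}. {item}" for i, item in enumerate(output_format, 1))
    bullets = "\n".join(f"- {rule}" for rule in constraints)
    return (f"{identity}\n\n{focus}\n\n"
            f"Respond with:\n{numbered}\n\n"
            f"Guidelines:\n{bullets}")


prompt = build_system_prompt(
    identity="You are a security alert triage specialist at a SOC.",
    focus="Analyze each alert you receive and classify it.",
    constraints=["If you are uncertain, classify as 'high' severity and explain why."],
    output_format=["Severity: critical / high / medium / low / informational",
                   "Category: malware, intrusion, data-exfiltration, "
                   "misconfiguration, false-positive"],
)
```

Calling it with an empty identity, focus, constraint list, or format list raises a `ValueError`, which catches the "Classify security alerts." style of prompt before it reaches a model.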

System Prompt vs. Human Prompt

Most agentic frameworks distinguish between two prompt fields:

Field	Purpose	When It’s Used
System Prompt	Defines the agent’s identity, role, and behavior. Stays constant across all inputs.	Added once at the beginning of every conversation as a system-level message.
Human Prompt	The specific task or input for this execution. Usually contains template variables to inject dynamic data from the shared state.	Added as a user-level message after the conversation history.

Think of it this way: The System Prompt is the agent’s training manual — it reads it once and internalizes it. The Human Prompt is the specific task assignment — it changes every time.

Template Variables

Prompts can include template variables that are replaced with actual data from the shared state at runtime. Variables typically use curly-brace syntax: {variable_name}.

Template Variables in Action

System Prompt:
"You are a security analyst. Analyze the alert and classify it."

Human Prompt:
"Alert details: {alert_content}

Previous analysis from upstream agent: {previous_analysis}

Classify this alert."

At runtime, {alert_content} and {previous_analysis} are replaced with actual values from the shared state. For example, if alert_content = "SSH brute-force from 203.0.113.42", the LLM receives: “Alert details: SSH brute-force from 203.0.113.42”

This is how data flows into an agent. The template variables pull specific fields from the shared state (covered in detail in Chapter 5).

Template variables in prompts: system_prompt and human_prompt with {alert_content} and {previous_analysis} placeholders
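The substitution itself can be sketched with Python's built-in string formatting. The `render_prompt` helper and the shape of the shared state (a plain dict) are assumptions here; real frameworks may use their own template engines.

```python
import string


def render_prompt(template: str, state: dict) -> str:
    """Fill {variable} placeholders from shared state, failing on missing keys."""
    # Collect the placeholder names that actually appear in the template.
    needed = {name for _, name, _, _ in string.Formatter().parse(template) if name}
    missing = needed - set(state)
    if missing:
        raise KeyError(f"shared state is missing: {sorted(missing)}")
    # Pass only the fields the template asks for; extra state is ignored.
    return template.format(**{name: state[name] for name in needed})


HUMAN_PROMPT = (
    "Alert details: {alert_content}\n\n"
    "Previous analysis from upstream agent: {previous_analysis}\n\n"
    "Classify this alert."
)

state = {
    "alert_content": "SSH brute-force from 203.0.113.42",
    "previous_analysis": "none yet",
    "unrelated_field": 123,
}
rendered = render_prompt(HUMAN_PROMPT, state)
```

Failing loudly on a missing variable is a design choice: a prompt that silently contains a literal `{previous_analysis}` is much harder to debug than an exception. With this approach, literal braces in a template (for example a JSON sample) must be doubled as `{{` and `}}`.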

2.3 — Tools

Without tools, an agent can only generate text. It can reason, classify, summarize, and explain — but it can’t do anything in the real world. Tools give agents the ability to act.

What Is a Tool?

A tool is a function that the LLM can choose to call. You define the tool (its name, description, and parameters), and the LLM decides at runtime whether to use it, when to use it, and with what arguments.

How Tool Calling Works

# You give the LLM a description of a tool:
Tool: check_ip_reputation
Description: "Look up an IP address in threat intelligence databases.
              Returns reputation score and known associations."
Parameters: ip_address (string, required)

# The LLM receives a task:
"Investigate alert: suspicious connection to 203.0.113.42"

# The LLM DECIDES to call the tool (this is the key insight):
LLM output: {
  "tool_call": "check_ip_reputation",
  "args": {"ip_address": "203.0.113.42"}
}

# The system executes the tool and returns the result:
Tool result: {
  "reputation": "malicious",
  "score": 0.95,
  "associations": ["ransomware C2", "APT-29"]
}

# The LLM reads the result and continues reasoning...

The critical thing to understand: you don’t write code that says “call this tool at step 3.” You give the LLM a list of available tools, and it decides autonomously when and how to use them. This is what makes an agent agentic — it has autonomy over its actions.
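On the runtime side, "the system executes the tool" usually amounts to a lookup in a registry followed by a function call. The sketch below uses the same JSON shape as the walkthrough above; the stub lookup data, the `TOOLS` registry, and `execute_tool_call` are hypothetical, and real providers each define their own tool-call message format.

```python
import json


def check_ip_reputation(ip_address: str) -> dict:
    """Stub standing in for a real threat-intelligence lookup."""
    known_bad = {"203.0.113.42": {"reputation": "malicious", "score": 0.95}}
    return known_bad.get(ip_address, {"reputation": "unknown", "score": 0.0})


# The registry is what the runtime exposes to the LLM: tool name -> callable.
TOOLS = {"check_ip_reputation": check_ip_reputation}


def execute_tool_call(llm_output: str) -> dict:
    """Run the tool the model asked for and return its result."""
    request = json.loads(llm_output)
    name, args = request["tool_call"], request.get("args", {})
    if name not in TOOLS:            # the model hallucinated a tool name
        return {"error": f"unknown tool: {name}"}
    try:
        return TOOLS[name](**args)
    except TypeError as exc:         # wrong or missing arguments
        return {"error": str(exc)}


llm_output = '{"tool_call": "check_ip_reputation", "args": {"ip_address": "203.0.113.42"}}'
result = execute_tool_call(llm_output)
```

Errors are returned as data rather than raised, so they can be fed back to the model as an observation and give it a chance to correct the call.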

Tool Categories

Category	What It Does	Examples
Information retrieval	Fetches data from external sources. Read-only.	Query VirusTotal, search CVE database, read log files, list functions in a binary
Analysis / transformation	Processes data and returns derived information.	Decompile a function, run static analysis, parse a network packet, calculate a hash
Action	Modifies the environment. Has side effects.	Block an IP address, quarantine a file, submit a flag, send an alert
Communication	Sends data to other systems or humans.	Post to Slack, create a Jira ticket, send an email, call another API

How Many Tools Per Agent?

Less is more. LLM accuracy on tool selection tends to degrade as the number of available tools grows. With 2–5 tools, selection is usually reliable. With 10 or more, models become more likely to pick the wrong tool or hallucinate tool names. This is one of the strongest arguments for multi-agent workflows: instead of one agent with 20 tools, use four agents with 5 tools each.

Example: Tool Assignment in Security Triage

# ❌ Bad: One agent, too many tools
Alert Analyst: [check_ip, check_domain, check_hash, decompile,
                list_functions, run_sandbox, block_ip, quarantine,
                search_logs, create_ticket, send_slack, submit_flag]
# 12 tools! The LLM will get confused.

# ✅ Good: Specialized agents, focused tools
Classifier:     [check_ip, check_domain]          # 2 tools
Malware Analyst: [decompile, list_functions, check_hash]  # 3 tools
Responder:      [block_ip, quarantine, create_ticket]    # 3 tools

Tool assignment across agents: Classifier, Malware Analyst, and Responder, each with a small, focused tool set
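A guideline like this is easy to check automatically when a workflow is defined. The following sketch flags agents whose tool lists exceed a limit; the function name and the choice of 5 as the limit (the upper end of the 2–5 range above) are assumptions.

```python
MAX_TOOLS_PER_AGENT = 5  # upper end of the 2-5 range suggested above


def oversized_agents(assignments: dict[str, list[str]],
                     limit: int = MAX_TOOLS_PER_AGENT) -> dict[str, int]:
    """Return {agent: tool_count} for every agent that exceeds the limit."""
    return {agent: len(tools)
            for agent, tools in assignments.items()
            if len(tools) > limit}


single_agent = {
    "Alert Analyst": ["check_ip", "check_domain", "check_hash", "decompile",
                      "list_functions", "run_sandbox", "block_ip", "quarantine",
                      "search_logs", "create_ticket", "send_slack", "submit_flag"],
}
specialized = {
    "Classifier": ["check_ip", "check_domain"],
    "Malware Analyst": ["decompile", "list_functions", "check_hash"],
    "Responder": ["block_ip", "quarantine", "create_ticket"],
}

flagged_single = oversized_agents(single_agent)
flagged_specialized = oversized_agents(specialized)
```

Here `flagged_single` reports the 12-tool Alert Analyst, while `flagged_specialized` is empty.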

2.4 — Reasoning Strategy

How does the agent go from input to output? There are three primary strategies, each suited to different kinds of tasks.

Strategy 1: Direct (Prompt → Response)

The simplest strategy. The agent receives input, the LLM thinks once, and produces output. No tool calls, no iteration. One LLM invocation.

# How it works:
Input → LLM thinks once → Output

# Example:
Input:  "Alert: SSH brute-force from 203.0.113.42, 500 attempts in 10 minutes"
  ↓
LLM thinks once
  ↓
Output: {"severity": "high", "category": "intrusion", "confidence": 0.92}

Pros	Cons	Best For
Fast (one LLM call), cheap, predictable	Can’t gather external information, limited to what’s in the prompt	Classification, summarization, extraction, formatting, routing decisions

An agent with no tools assigned uses the direct strategy.
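In code, the direct strategy is a single function call. The sketch below stands in a canned `fake_llm` for a real model client, since the actual call depends on your provider; the `(system_prompt, human_prompt) -> str` signature is an assumption.

```python
import json
from typing import Callable

# Assumed shape of a model call: (system_prompt, human_prompt) -> raw text.
LLM = Callable[[str, str], str]


def run_direct(llm: LLM, system_prompt: str, human_prompt: str) -> dict:
    """Direct strategy: exactly one model call, then parse the reply."""
    raw = llm(system_prompt, human_prompt)
    return json.loads(raw)


def fake_llm(system_prompt: str, human_prompt: str) -> str:
    """Canned reply standing in for a real model call."""
    return '{"severity": "high", "category": "intrusion", "confidence": 0.92}'


result = run_direct(
    fake_llm,
    "You are a security alert triage specialist. Reply with JSON only.",
    "Alert: SSH brute-force from 203.0.113.42, 500 attempts in 10 minutes",
)
```

There is no loop and no tool registry, which is why this strategy is cheap and predictable. In practice you would also handle the case where the reply is not valid JSON.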

Strategy 2: ReAct (Reason → Act → Observe → Repeat)

The most common and most powerful strategy. The name “ReAct” comes from “Reason + Act.” The agent enters a loop:

Reason — The LLM thinks about what information it needs or what action to take.
Act — It calls a tool (an action in the real world).
Observe — It reads the tool’s response.
Repeat — Based on the observation, it decides whether to call another tool or produce a final answer.

The ReAct Loop — Reason, Act, Observe cycle with feedback loop and Final Answer branch

Example: ReAct Loop in a Binary Analysis Agent

Reason:  "I need to see what functions are in this binary."
Act:    call list_functions()
Observe: ["main", "check_password", "decrypt_flag", "obfuscate"]

Reason:  "'decrypt_flag' looks interesting. Let me see the code."
Act:    call decompile_function("decrypt_flag")
Observe: "void decrypt_flag() { char key[] = \"s3cr3t\"; ... }"

Reason:  "I can see the decryption key. Let me check the strings too."
Act:    call check_strings()
Observe: ["Usage: ./binary <password>", "Correct!", "Wrong!"]

Reason:  "I have enough info. The key 's3cr3t' is hardcoded in decrypt_flag."
Final:  "decrypt_flag contains a hardcoded key 's3cr3t', which it uses
         to decrypt the flag. check_password was not decompiled, so
         whether it compares against the same value is unconfirmed."

Notice: the agent decided on its own which tools to call and in what order. You didn’t write “Step 1: call list_functions, Step 2: call decompile_function.” The LLM reasoned about what information it needed and planned its own investigation.

Pros	Cons	Best For
Flexible, can gather information dynamically, adapts to what it finds	Multiple LLM calls (expensive, slower), can get stuck in loops, unpredictable number of steps	Investigation tasks, analysis, any task where the agent needs to gather information before answering

An agent with tools assigned automatically uses the ReAct strategy. The max_tool_iterations setting controls how many loops are allowed (see Chapter 6).
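The loop and its iteration cap can be sketched in a few lines. Everything here is illustrative: `run_react`, the step dictionaries, and the scripted stand-in for the model are assumptions, and only the `max_tool_iterations` name is taken from the text above.

```python
from typing import Callable


def run_react(llm: Callable[[list], dict], tools: dict, task: str,
              max_tool_iterations: int = 5) -> str:
    """Reason -> Act -> Observe until a final answer or the iteration cap."""
    transcript = [("task", task)]
    for _ in range(max_tool_iterations):
        step = llm(transcript)                      # Reason: model picks the next step
        if "final" in step:
            return step["final"]
        name, args = step["tool"], step.get("args", {})
        if name in tools:
            observation = tools[name](**args)       # Act: run the chosen tool
        else:
            observation = f"error: unknown tool {name}"
        transcript.append((name, observation))      # Observe: feed the result back
    return "Stopped: reached max_tool_iterations without a final answer."


def scripted_llm(transcript: list) -> dict:
    """Stand-in for a model: chooses the next step from what it has seen so far."""
    seen = [name for name, _ in transcript]
    if "list_functions" not in seen:
        return {"tool": "list_functions"}
    if "decompile_function" not in seen:
        return {"tool": "decompile_function", "args": {"name": "decrypt_flag"}}
    return {"final": "decrypt_flag contains the hardcoded key 's3cr3t'."}


tools = {
    "list_functions": lambda: ["main", "check_password", "decrypt_flag"],
    "decompile_function": lambda name: 'char key[] = "s3cr3t";',
}
answer = run_react(scripted_llm, tools, "Analyze the binary")
```

The cap is what prevents the "stuck in loops" failure mode: with `max_tool_iterations=1` the same run stops after the first tool call and returns the fallback message instead of an answer.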

Strategy 3: Plan-then-Execute

A variation where the agent first creates an explicit plan (a numbered list of steps), then executes each step. This gives the agent a “roadmap” and reduces wandering.

# Phase 1: Planning
Agent: "To analyze this binary, I will:
  1. List all functions to get an overview
  2. Identify security-relevant functions (password checks, crypto)
  3. Decompile the most promising function
  4. Extract the key/password
  5. Formulate my findings"

# Phase 2: Execution (follows the plan)
Step 1: call list_functions() → [main, check_password, ...]
Step 2: identifies check_password and decrypt_flag
Step 3: call decompile_function("decrypt_flag") → code
Step 4: extracts key "s3cr3t" from code
Step 5: writes final report

This is implemented through prompting: you tell the agent in its system prompt to first create a plan, then execute it step by step. It’s still using the ReAct loop under the hood, but with a planning phase at the beginning.
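Because the strategy lives in the prompt, the code involved is small: append a planning instruction to the system prompt and, optionally, parse the numbered plan so it can be logged or checked. The wording of `PLANNING_PREAMBLE` and both helper names are assumptions for illustration.

```python
import re

PLANNING_PREAMBLE = (
    "Before calling any tool, write a numbered plan of the steps you will take. "
    "Then execute the plan one step at a time, stating which step you are on."
)


def with_planning(system_prompt: str) -> str:
    """Turn a ReAct agent into plan-then-execute by extending its system prompt."""
    return f"{system_prompt}\n\n{PLANNING_PREAMBLE}"


def parse_plan(text: str) -> list[str]:
    """Extract '1. step' style lines from the model's planning output."""
    pattern = re.compile(r"^[ \t]*\d+\.[ \t]+(.*)$", re.MULTILINE)
    return [match.group(1).strip() for match in pattern.finditer(text)]


planning_output = """To analyze this binary, I will:
  1. List all functions to get an overview
  2. Decompile the most promising function
  3. Formulate my findings"""

steps = parse_plan(planning_output)
```

Parsing the plan is optional, but it gives you something to compare against the tool calls that follow, for example to notice when the agent abandons its own roadmap.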

Running Example: Configuring Our First Agent

Let’s apply what we’ve learned to configure the first agent in our Security Alert Triage System: the Alert Classifier.

Alert Classifier — Full Configuration

Model:        Non-reasoning model  (fast — classification is simple)

System Prompt:
"You are a security alert triage specialist at a SOC. You have
deep expertise in network security, malware analysis, and incident
classification.

For each alert, produce:
1. Severity: critical / high / medium / low / informational
2. Category: malware, intrusion, data-exfiltration,
   misconfiguration, false-positive
3. Confidence: 0.0 to 1.0

If confidence < 0.7, classify severity as 'high' and explain why.
Never classify as 'informational' unless confidence > 0.9."

Human Prompt:
"Classify this alert: {alert_content}"

Tools:          None  (direct strategy — classification only)

Strategy:      Direct  (one LLM call, no tools needed)

This agent is simple, focused, and cheap to run. In the next chapter, we’ll connect it to other specialized agents to build a complete triage system.
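Since downstream agents will consume this classifier's output, it is worth checking each result against the prompt's own rules before passing it on. The sketch below assumes the model's reply has already been parsed into a dict with `severity`, `category`, and `confidence` keys; that shape and the `validate_classification` helper are assumptions.

```python
SEVERITIES = {"critical", "high", "medium", "low", "informational"}
CATEGORIES = {"malware", "intrusion", "data-exfiltration",
              "misconfiguration", "false-positive"}


def validate_classification(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the output is acceptable."""
    problems = []
    if result.get("severity") not in SEVERITIES:
        problems.append(f"invalid severity: {result.get('severity')!r}")
    if result.get("category") not in CATEGORIES:
        problems.append(f"invalid category: {result.get('category')!r}")

    confidence = result.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number from 0.0 to 1.0")
    elif result.get("severity") == "informational" and confidence <= 0.9:
        # Mirrors the prompt rule: never 'informational' unless confidence > 0.9.
        problems.append("'informational' requires confidence > 0.9")
    return problems


ok = validate_classification(
    {"severity": "high", "category": "intrusion", "confidence": 0.92})
too_lenient = validate_classification(
    {"severity": "informational", "category": "false-positive", "confidence": 0.8})
```

The prompt asks the model to follow the rules; a check like this confirms that it did, and a non-empty result is a natural trigger for a retry or a human review.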

Chapter Summary

Key Takeaways:

Every agent has four configuration surfaces: model, instructions, tools, and reasoning strategy.
Choose between reasoning models (deep thinking, multi-step logic) and non-reasoning models (fast, direct) based on task complexity.
The system prompt is the highest-leverage configuration. It needs identity, focus, constraints, and output format.
Tools give agents the ability to act. The LLM decides when and how to call them.
Keep tools per agent to 2–5. More tools = more confusion.
Three reasoning strategies: Direct (one shot), ReAct (tool loop), Plan-then-Execute (plan first, then ReAct).
