Anatomy of Subagents

March 1, 2026 • 7 min read

Marionette puppets on stage - subagents controlled by a main agent

> Open Table of contents

The Agent Loop
What Is a Subagent
The Flow
Making It Work
Alternative: Just Hardcode It
Sync vs Async
Handling Failures
Try It Yourself
The Real Insight

Prerequisites: Python, basic idea of agents, tools, memory. Rest will be taken care of :)

The Agent Loop

An agent, at its core, is a simple loop.

At the center is an LLM. Around it, we add tools (to interact with systems), messages/history (to carry context), and some control logic (retries, validation, stopping conditions). Together, these pieces form what we call an agent.

A simplified run-state looks like this:

{
  "messages": [
    {"role": "user", "content": "what is 2+3"}],
  "tools": ["add", "search", "calendar"],
  "memory": {
    "conversation_history": []
  },
  "done": false
}

Each iteration updates this state:

Model reads messages + tools
Model emits either:
- final output, or
- a tool call request
Tool result is appended back to messages
Loop continues until done = true

So fundamentally:

agent = function + loop + memory + tool calls

This becomes a bit problematic when we look at the fact that each agent run needs to account for its memory history, tool calls, its own history, and the subsequent answer.

This piles up pretty quickly and becomes bloated with each step we take.

In turn clogging up the context window and slowing down the agent to do complex tasks for long runs.

This is where we introduce subagents.

What Is a Subagent

Subagents (if we put it simply) are just nested agent runs.

A main agent starts another agent (on the side) for narrower tasks and gets back an output from the side run. It then continues with a more refined, specialised output.

Like a parent branching out its children to do individual special tasks - the final output is a culmination of all these small task outputs.

Main Agent delegating to Subagent A, Subagent B, and producing Final Output

This helps save the agent a lot of context, reduces bias, and decouples a lot of load from the main agent stream.

To keep behavior predictable, pass explicit inputs instead of dumping full context blindly - a task description, an optional context slice, constraints, and the expected output format. This keeps subagent runs tighter and easier to reason about.

The Flow

When we run a subagent, the flow is usually:

Main agent gets input and decides next action (tool call / end / delegate)
Parent runs child: child.run(...)
Child gets its own isolated run loop/state
Child returns output (or error/timeout)
Parent receives that output (not necessarily full child history)
Parent continues or ends

Sequence diagram showing User to Main Agent to Subagent delegation flow

Making It Work

Here's the thing - to use a subagent, you literally just wrap it as a tool:

( Tools being simple functions with a return value that you give to an agent to interact with the real world )

# create a simple subagent
math_agent = Agent(
    model="openai:gpt-4",
    system_prompt="You are a math expert. Solve problems and return just the answer."
)

# Wrap as a tool
def ask_math(task: str) -> str:
    result = math_agent.run_sync(task)
    return result.output

# Main agent uses it
main_agent = Agent(
    model="openai:gpt-4",
    tools=[ask_math],
    system_prompt="Use ask_math for any mathematical calculations."
)

That's it. The main agent sees ask_math as a tool. When it needs math, it calls it. The subagent runs in its own clean context, returns the result, and the main agent continues.

Alternative: Just Hardcode It

Sometimes you don't want the LLM deciding when to delegate. You want to control it in code:

def handle_request(user_input: str):
    if "math" in user_input.lower():
        return math_agent.run_sync(user_input)
    elif "write" in user_input.lower():
        return writer_agent.run_sync(user_input)
    else:
        return general_agent.run_sync(user_input)

Your code decides. Not the agent. Use this when routing logic is simple or you need strict control.

Sync vs Async

Here's where it gets interesting. You can call a subagent in two ways:

Sync - you wait for it:

The main agent pauses. The subagent works. When it's done, the main agent gets the result and moves on.

Use this when you need the answer right now to continue. Like asking for a calculation before you can formulate your response.

Async - you keep going:

The main agent fires off the subagent and immediately continues. The subagent works in the background. Later, you check if it's done.

Use this for independent tasks where the user doesn't need to wait. Like "analyze this document" while you keep chatting about something else.

The core idea is the same in both. Only the timing of when the subagent returns the output changes.

Handling Failures

Subagents fail like any other tool. The child can return:

Success: Output you can use
Error: Something went wrong
Timeout: Took too long

The parent then decides: retry, fallback, escalate, or continue with partial results.

I like to imagine a subagent as a function with params that goes like -

subagent(
    task="solve this",
    context={"user": "admin"},
    timeout=30,
    error="raise"
)

This is actually a feature. With subagents, failure is contained. If the math subagent fails, your main agent can try a different approach or ask for clarification. The error doesn't crash the whole system.

Try It Yourself

Here's a complete runnable example:

from pydantic_ai import Agent

# Create a simple subagent with a system prompt
math_agent = Agent(
    model="openai:gpt-5.2",
    system_prompt="You are a math expert. Solve the problem and return just the numerical answer.",
)


# wrap it as a tool ( return value is the run output of the subagent run )
def ask_math(task: str) -> str:
    """Ask the math expert to solve a problem."""
    result = math_agent.run_sync(task)
    return result.output


# load the main agent with the subagent as a tool
main_agent = Agent(
    model="openai:gpt-5.2",
    tools=[ask_math],
    system_prompt="You are a helpful assistant. Use the ask_math tool for any mathematical calculations.",
)

# run it
if __name__ == "__main__":
    # this triggers the main agent to call the math subagent
    result = main_agent.run_sync("What is 15 multiplied by 23?")
    print(f"Answer: {result.output}")

    # some logging helpers
    print(f"\nUsage: {result.usage()}")
    print(f"Total requests: {result.usage().requests}")

Here we basically -

Import the packages
Setup the subagent with a system prompt
Wrap that subagent into a tool ( ie a function that returns the run output of the subagent run )
Create a new Main agent with the tools loaded (subagent as tool)
Run it

Save it, set your OPENAI_API_KEY, and run it.

The Real Insight

At the end of the day, subagents are not some mystical new primitive.

They still are controlled nested loops for delegation.

Once you see that, most agent frameworks start looking like the same core pattern with different APIs. The complexity isn't in the concept - it's in deciding:

When to delegate
What context to pass
How to handle the result

Start simple. One subagent as a tool. Add complexity when you feel the pain, not before.

That's it. That's the pattern.

Major inspirations and examples for this blog came from the langchain and pydantic docs

I would also suggest you to checkout the official docs for both here - Langchain Subagents and Pydantic AI Multi-Agent

Connect with me

Twitter / Github / LinkedIn / Email