Subagent: Let Your Model Delegate the Busywork
Kenny Rogers ·
On this page
Add openrouter:subagent to your tools array and your model can delegate self-contained tasks to a smaller, cheaper, faster worker model mid-generation. Summarize a document, extract structured data, draft boilerplate, reformat text: the worker handles it and passes the result back. Your frontier model keeps orchestrating without burning expensive tokens on routine work.
Try it in the chatroom, read the docs, or follow the cookbook recipe to wire it into your app.
{
"model": "anthropic/claude-opus-4.8",
"messages": [{ "role": "user", "content": "Audit this release: summarize the changelog, list breaking changes, and draft the announcement." }],
"tools": [
{
"type": "openrouter:subagent",
"parameters": { "model": "z-ai/glm-5.2" }
}
]
}
The model decides when to delegate. It only invokes the subagent for tasks that don’t need its full capability.
Find subagent opportunities in your codebase
Paste this prompt into your coding agent to have it scan your project for places where subagent delegation would cut costs:
Read through this codebase and identify places where an OpenRouter API call
could benefit from the openrouter:subagent server tool. Look for patterns where
a frontier model is doing mechanical sub-tasks inline: summarization, data
extraction, reformatting, boilerplate generation, or schema conversion.
For each candidate, explain:
1. Which file and function
2. What the sub-task is
3. Why it's a good fit for delegation (self-contained, predictable output, doesn't need the full conversation context)
4. A code snippet showing how to add the subagent tool to that call
Reference docs: https://openrouter.ai/docs/guides/features/server-tools/subagent
Cookbook recipe: https://openrouter.ai/docs/cookbook/building-agents/subagent-server-tool
Frontier brain, budget hands
Claude Opus 4.8 costs $5 per million input tokens. GPT-5.5 costs $5. GLM 5.2 costs $1.40. That’s a 3.6x spread on input between frontier and worker, 5.7x on output. (Claude Fable 5 was $10/$50 per M tokens before it got yanked, RIP.)
A frontier model doing a code review doesn’t need to spend its own tokens summarizing a 2,000-line changelog or reformatting a JSON blob. Those are mechanical tasks with clear instructions and predictable output. The subagent handles them at GLM prices while the orchestrator focuses on the parts that actually require reasoning.
In a complex agentic workflow with 20 tool calls, maybe 5-8 are subagent delegations: summarization, data extraction, template filling, format conversion. The frontier model orchestrates and judges. You’ve cut your per-request cost without touching the quality ceiling on the hard parts.
How it works under the hood
The worker model sees only what the delegating model explicitly passes in the task_description. No parent conversation, no prior context, no memory between tasks. Each delegation is a clean, isolated unit of work.
-
Any model can be the worker. Pin it with
parameters.model(anything in the model catalog works). Open-source models likez-ai/glm-5.2work well for mechanical tasks. If you don’t specify a model, it falls back to the outer request model. -
Workers get their own tools. Give the worker
openrouter:web_searchand it can ground its output in fresh sources before responding. The worker runs its own tool loop internally; only the final text comes back to your model. -
Recursion is blocked. The subagent can’t call itself. A depth header and self-reference check prevent unbounded nesting, and delegations are capped at 10 per request.
{
"tools": [
{
"type": "openrouter:subagent",
"parameters": {
"model": "z-ai/glm-5.2",
"instructions": "You are a fast, focused worker. Complete the task exactly as described.",
"tools": [{ "type": "openrouter:web_search" }]
}
}
]
}
Subagent vs. advisor
These two tools point in opposite directions. The advisor escalates hard decisions to a stronger model. The subagent delegates routine work to a cheaper one.
| Advisor | Subagent | |
|---|---|---|
| Direction | Up (consult a stronger model) | Down (delegate to a cheaper model) |
| Worker choice | Model picks per call | Fixed by tool definition |
| Use case | ”Help me think through this" | "Do this mechanical task for me” |
| Memory | Cross-request transcript replay | None (each task is isolated) |
Use both in the same request. Your frontier model consults the advisor on architectural decisions and delegates summarization to the subagent. Different tools for different kinds of work.
Billing
Subagent tokens bill at the worker model’s rates, separate from the orchestrator. If your orchestrator is Claude Opus 4.8 ($5/$25 per M tokens) and the worker is GLM 5.2 ($1.40/$4.40 per M tokens), each model’s tokens bill at their own price. Both show up on your activity page.
Get started
One line in your tools array:
{ "type": "openrouter:subagent", "parameters": { "model": "z-ai/glm-5.2" } }
The model decides when to use it. Read the full docs for all parameters, worker tools, and recursion details, or follow the cookbook recipe for a working integration.