Step 8: Parallel work with Agent.parallel¶
Agent.chain (Step 7) runs stages in sequence. That's right when each
stage depends on the previous one — research, then write, then edit.
But many real workflows have a different shape: several agents could run at the same time, because they don't depend on each other. Asking three different model families the same question. Fanning out a query across three search backends. Extracting entities, sentiment, and topics from the same document. None of these need to be sequential.
Agent.parallel is the primitive for that case.
The simplest parallel fan-out¶
from lazybridge import Agent, LLMEngine
researcher_a = Agent(engine=LLMEngine("claude-haiku-4-5"), name="claude_view")
researcher_b = Agent(engine=LLMEngine("gpt-5.4-mini"), name="gpt_view")
researcher_c = Agent(engine=LLMEngine("gemini-3-flash-preview"), name="gemini_view")
panel = Agent.parallel(researcher_a, researcher_b, researcher_c)
env = panel("What are the trade-offs of agent frameworks in 2026?")
print(env.text())
Agent.parallel(a, b, c) returns a single agent (a ParallelAgent) that
takes one task, runs all branches concurrently on that task, and bundles
the results into one envelope.
Three things to notice:
- Same task, three branches. Each agent receives the exact same input.
- Independent execution. No branch sees any other branch's output. They're isolated.
- One envelope back. From the caller's point of view it's still just an
agent(...).text()call.
What the returned envelope looks like¶
The result envelope is your hand-off to the rest of the program. Two things are guaranteed:
.text() — a labelled join¶
[claude_view]
Agent frameworks fall into three camps: graph-based ...
[gpt_view]
The trade-offs centre on flexibility versus determinism ...
[gemini_view]
You can think of agent frameworks along two axes: ...
Each branch's output is prefixed with its agent name in square brackets. If you just want a quick "show me everything", this is enough.
.payload — list of per-branch envelopes¶
When you need typed access (per-branch cost, individual .error, per-branch
metadata), the payload is a list of Envelopes, one per branch, in input
order:
env = panel("...")
for branch in env.payload:
print(branch.metadata.run_id, branch.metadata.cost_usd, branch.text()[:60])
Output:
run_01HX... 0.00009 Agent frameworks fall into three camps: graph-based ...
run_01HY... 0.00012 The trade-offs centre on flexibility versus determinism ...
run_01HZ... 0.00008 You can think of agent frameworks along two axes ...
The same envelope shape you've seen from a single agent — just N of them.
Async-typed access
If you're already in async code and want the per-branch envelopes as
objects (not as a payload list), await panel.run_branches(task) returns
the list directly. The synchronous panel(task) is the canonical entry.
Before the next example — Agent.parallel vs parallel tool calls¶
This is the single most common source of confusion in this step. Read it now, before the bigger example below.
The LLM can already emit parallel tool calls. You saw it in Step 4 —
the weather agent fetched Rome and Tokyo concurrently in a single turn.
So when you pass three sub-agents as tools=[a, b, c], the model may
choose to call all three at once, or just one, or none. The model
decides.
Agent.parallel(a, b, c) is the opposite: you decide, at
composition time, that all three run on every call. No model in the
middle, no skipping.
Agent.parallel(a, b, c) |
tools=[a, b, c] on a regular Agent |
|
|---|---|---|
| Who decides to fan out? | You — always runs every branch | The LLM — picks at runtime, may skip branches |
| Cost predictability | Predictable: N branches always | Variable: depends on the model's plan |
| Same input to each branch? | Yes (identical) | LLM crafts each tool call's arguments |
| Use when | "I know I want all of these" | "Let the model figure out what's needed" |
Rule of thumb: if you want all N branches every time, use
Agent.parallel. If the right subset depends on the query, expose them
as tools= on an orchestrating agent and let the model decide.
The example below uses Agent.parallel because we want every search
backend hit on every query — we're trying to get coverage, not be smart
about which one to ask.
A concrete use case — multi-source search¶
A common production shape: fan a query out across several search backends and have the LLM synthesise the results.
def search_anthropic_docs(query: str) -> str:
"""Search Anthropic's docs portal."""
return f"[anthropic docs result for {query!r}]"
def search_openai_docs(query: str) -> str:
"""Search OpenAI's platform docs."""
return f"[openai docs result for {query!r}]"
def search_google_docs(query: str) -> str:
"""Search Google AI docs."""
return f"[google ai docs result for {query!r}]"
anthropic_search = Agent(
engine=LLMEngine("claude-haiku-4-5", system="Look up Anthropic-specific facts."),
tools=[search_anthropic_docs],
name="anthropic_search",
)
openai_search = Agent(
engine=LLMEngine("claude-haiku-4-5", system="Look up OpenAI-specific facts."),
tools=[search_openai_docs],
name="openai_search",
)
google_search = Agent(
engine=LLMEngine("claude-haiku-4-5", system="Look up Google/Gemini-specific facts."),
tools=[search_google_docs],
name="google_search",
)
# Step 1 — fan out the same query to three search agents in parallel
fanout = Agent.parallel(anthropic_search, openai_search, google_search)
# Step 2 — pass the combined result to a synthesiser
synthesiser = Agent(
engine=LLMEngine(
"claude-opus-4-7",
system="You synthesise multi-source research into one tight paragraph. "
"Cite each source by its label.",
),
name="synthesiser",
)
pipeline = Agent.chain(fanout, synthesiser)
print(pipeline("How does each provider implement structured output?").text())
This is parallel inside a chain: the fan-out produces a single envelope
whose text() is the labelled join, which becomes the input to the
synthesiser. Two LazyBridge primitives, one expression each, the same
familiar shape.
The wall-clock saving is real: three searches that would take 3×t in series take ≈ t in parallel (modulo network jitter). Cost still adds up — you pay for every branch.
Bounding concurrency and timeouts¶
Two control knobs you should set in production:
panel = Agent.parallel(
researcher_a,
researcher_b,
researcher_c,
concurrency_limit=2, # never run more than 2 branches at once
step_timeout=30.0, # any branch exceeding 30s is cancelled
)
concurrency_limitcaps the simultaneous runs. Useful when:- Each branch hits the same rate-limited API
- You have many branches and want to be a polite neighbour
- The default (
None= unbounded) would saturate your provider quota
step_timeoutcaps the wall clock of each branch individually. The envelope of a timed-out branch arrives withbranch.errorset andbranch.ok == False; the other branches still complete normally.
Both are optional. For small fan-outs (2–5 branches) you usually leave them unset.
Cost and latency — the trade-off¶
Parallel saves time but not cost. With four branches:
| Sequential chain | Agent.parallel |
|
|---|---|---|
| Cost | sum of all stage costs | sum of all branch costs (same) |
| Latency | sum of all stage latencies | max of all branch latencies |
| Order guarantee | strict — N+1 sees N's output | none — branches are isolated |
| Composition | next stage gets stage N's text | next stage gets the labelled join |
Parallel is the right choice when you'd otherwise wait on the same wall clock
N times. It's the wrong choice when stage N+1 needs stage N's output —
that's chain.
Tracing — see all branches at once¶
verbose=True shows every branch's run interleaved:
panel = Agent.parallel(researcher_a, researcher_b, researcher_c, verbose=True)
panel("What are the trade-offs of agent frameworks in 2026?")
Output (abbreviated):
[parallel ▶ 3 branches]
[branch claude_view model=claude-haiku-4-5] started
[branch gpt_view model=gpt-5.4-mini] started
[branch gemini_view model=gemini-3-flash-preview] started
[branch claude_view] assistant: Agent frameworks fall into ...
[branch claude_view] done cost=$0.00009
[branch gpt_view] assistant: The trade-offs centre on ...
[branch gpt_view] done cost=$0.00012
[branch gemini_view] assistant: You can think of agent ...
[branch gemini_view] done cost=$0.00008
[done] branches=3 total_cost=$0.00029 max_latency=812ms
Branch lines are interleaved (because they run concurrently) but each one is prefixed with its name, so you can follow each branch's trace by filtering on the prefix. The final summary shows aggregate cost and max latency (the chain shows sum — that's the visible difference between the two).
How other frameworks express parallel fan-out¶
CrewAI (hierarchical process)
CrewAI doesn't have a "parallel" process directly — its sequential and
hierarchical modes are about delegation order, not concurrent execution.
To run agents truly in parallel you'd reach for Python's concurrent.futures
or asyncio.gather:
import asyncio
from crewai import Agent, Task
async def run_one(agent, task):
return await asyncio.to_thread(agent.execute_task, task)
async def fanout(task_text):
tasks = [Task(description=task_text, agent=a) for a in (a1, a2, a3)]
return await asyncio.gather(*(run_one(a, t) for a, t in zip([a1, a2, a3], tasks)))
results = asyncio.run(fanout("How does each provider..."))
The orchestration is your problem; you've left CrewAI's abstractions and you're back to plain Python concurrency.
LangGraph (parallel nodes via Send)
from typing import Annotated, TypedDict
from operator import add
from langgraph.graph import StateGraph, START, END
from langgraph.types import Send
from langchain_anthropic import ChatAnthropic
claude_view = ChatAnthropic(model="claude-haiku-4-5")
gpt_view = ChatAnthropic(model="gpt-5.4-mini")
gemini_view = ChatAnthropic(model="gemini-3-flash-preview")
class State(TypedDict):
query: str
results: Annotated[list[str], add] # reducer aggregates branch outputs
def fanout(state: State):
return [Send("branch", {"query": state["query"], "agent_id": i})
for i in range(3)]
def branch(state):
models = [claude_view, gpt_view, gemini_view]
reply = models[state["agent_id"]].invoke(state["query"])
return {"results": [reply.content]}
builder = StateGraph(State)
builder.add_node("fanout", lambda s: s)
builder.add_node("branch", branch)
builder.add_conditional_edges("fanout", fanout, ["branch"])
builder.add_edge("branch", END)
builder.add_edge(START, "fanout")
graph = builder.compile()
out = graph.invoke({"query": "How does each provider...", "results": []})
for r in out["results"]:
print(r)
Powerful — Send is one of LangGraph's strongest primitives. Cost: a
typed State, a reducer to merge branch outputs, a conditional-edge
function, three node definitions. For the simple "run these N agents
on the same task" case, Agent.parallel(a, b, c) is one line.
When NOT to use Agent.parallel¶
| Symptom | Use instead |
|---|---|
| Stage N+1 needs the output of stage N | Agent.chain — Step 7 |
| You want the LLM to pick which subset to run | tools=[a, b, c] — Step 4/5 |
| Each branch needs different input | Plan with named steps — Step 9 |
| Branches need to be conditional | Plan with routes= — Step 10 |
Agent.parallel is for the case where the only dimension of variation is
"which agent" — the input is the same for all of them. Anything more
structured belongs in Plan.
Summary¶
| Concept | Syntax | What it does |
|---|---|---|
| Fan-out | Agent.parallel(a, b, c) |
Runs all branches concurrently on the same task |
| Joined text | env.text() |
Labelled join across branches ([name] prefixed) |
| Per-branch envelopes | env.payload |
list[Envelope] in input order |
| Async typed access | await panel.run_branches(task) |
Same list, returned directly |
| Concurrency cap | concurrency_limit=N |
Never more than N branches at once |
| Per-branch timeout | step_timeout=seconds |
Cancel any branch over the limit |
| Cost rollup | env.metadata.cost_usd |
Sum across all branches |
| Latency rollup | env.metadata.latency_ms |
Max across branches (not sum) |
You've got the three deterministic composition primitives now — chain,
parallel, and sub-agent-as-tool. The next step covers the most powerful
one: the explicit DAG via Plan + sentinels, for workflows that don't
fit the three so far.