verify=¶
A judge-and-retry loop wrapped around an agent's output (or around a
specific tool call). Each output is graded; rejection feeds the
judge's reason back into the next attempt as feedback, capped at
max_verify attempts.
The hard sibling of verify= is Guards: a guard blocks
on policy violation; verify= accepts feedback and retries. Pick
the right tool for the policy.
Signature¶
# Three placements, same judge contract.

# 1. Agent-level: final output gate
Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    verify=judge_agent,  # Agent or Callable[[str], Any]
    max_verify=3,        # max attempts; default 3, must be >= 1
)

# 2. Tool-level: every call through the wrapped agent is gated
agent.as_tool(
    name="...",
    description="...",
    verify=judge_agent,
    max_verify=3,
)

# 3. Plan-level: wrap the step's agent with its own verify=
Plan(
    Step(
        target=Agent(
            engine=LLMEngine("claude-haiku-4-5"),
            verify=judge_agent,
            max_verify=3,
            name="summarise",
        ),
        name="summarise",
    ),
)
Judge contract¶
- The judge receives the agent's output text (and the original task
  for context) and must respond with a string starting with
  "approved" (case-insensitive) to accept.
- Anything else is treated as rejection; the verdict text is
  injected as feedback on the next retry:
  f"{original_task}\n\nFeedback: {judge_verdict}".
- Judges may be Agent instances or plain callables
  (Callable[[str], Any]).
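The contract above can be sketched in plain Python. This is a minimal illustration, not LazyBridge's implementation: `run_with_verify`, `is_approved`, `fake_agent`, and `fake_judge` are hypothetical names, and the handling of `bool` verdicts (accept on `True`, retry with the unchanged task on `False`) is an assumption based on the behaviour this page describes for callable judges.

```python
from typing import Any, Callable


def is_approved(verdict: Any) -> bool:
    """Accept on True, or on text that starts with "approved" (any case)."""
    if isinstance(verdict, bool):
        return verdict  # assumption: a bool verdict is taken at face value
    return str(verdict).lower().startswith("approved")


def run_with_verify(
    produce: Callable[[str], str],
    judge: Callable[[str], Any],
    task: str,
    max_verify: int = 3,
) -> str:
    """Hypothetical stand-in for the judge-and-retry loop."""
    if max_verify < 1:
        raise ValueError("max_verify must be >= 1")
    prompt = task
    output = ""
    for _ in range(max_verify):
        output = produce(prompt)
        verdict = judge(output)
        if is_approved(verdict):
            return output
        if isinstance(verdict, str):
            # Rejection text is appended to the original task as feedback.
            prompt = f"{task}\n\nFeedback: {verdict}"
        else:
            prompt = task  # bool verdicts carry no feedback
    return output  # cap reached: last attempt is returned even if rejected


# Demo with stand-in callables instead of real agents.
prompts: list[str] = []


def fake_agent(prompt: str) -> str:
    prompts.append(prompt)
    return "short" if len(prompts) == 1 else "a longer second draft"


def fake_judge(output: str) -> str:
    return "Approved" if len(output) > 10 else "rejected: too short"


result = run_with_verify(fake_agent, fake_judge, "write a haiku", max_verify=3)
print(result)
print(prompts[1])
```

In this sketch the second prompt carries the judge's reason, and a judge that never approves costs `max_verify` producer calls plus `max_verify` judge calls before the last output is returned.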
Synopsis¶
verify= is the soft, retryable counterpart to Guards. Where a
guard is a hard yes / no gate that ends the run on failure, a
verifier re-asks the agent with the judge's feedback baked into
the next prompt. After max_verify attempts the last result is
returned as-is, even if still rejected — there's no infinite loop.
There are three placements, all with the same contract:
- Agent-level (Agent(engine=…, verify=judge)) gates the agent's final
  output. Whatever tool chain the engine chose internally, the judge
  sees the eventual text and may force a retry.
- Tool-level (agent.as_tool(verify=judge)) gates every invocation of
  one specific wrapped agent, which is useful when one sub-agent is
  the risky one and the rest is fine. The as-tool page mentions this
  placement in passing; this page is the dedicated reference for the
  loop semantics.
- Plan-level is just a special case of agent-level: wrap the step's
  agent with its own verify=. No special primitive.
When to use it¶
- High-stakes outputs that should not ship "first try". Drafts,
  summaries of regulated content, customer-facing replies.
- Quality control with feedback. Unlike a guard, verify= gives the
  agent a chance to fix what the judge complained about. Use it when
  "wrong answer" is recoverable, not just blockable.
- Per-tool gating. When one sub-agent in a hierarchy is noisier than
  the rest, gate just that one with agent.as_tool(verify=judge)
  instead of taxing the whole pipeline.
- Targeted Plan steps. When one step in a Plan is the
  quality-critical one (the summary, the final draft), wrap just its
  agent with verify= and leave the rest unchecked.
When NOT to use it¶
- Hard policy violations. PII leakage, content-safety failures,
  schema violations: those need Guards. verify= retries; a guard
  refuses and ends the run.
- CI / batch grading. When you want to test "does this agent
  generally produce good output?" offline, use Evals instead.
  verify= runs every single time the agent is invoked.
- As a structured-output validator. That's what output= does (with
  max_output_retries=). The framework already re-prompts on Pydantic
  validation errors; you don't need verify= on top.
- Multi-criteria judges. A judge that grades fluency, accuracy, and
  tone simultaneously produces vague feedback. Either run two
  separate verify= loops (expensive) or pick one criterion and accept
  the rest will need a different mechanism.
Example¶
from lazybridge import Agent, LLMEngine, Plan, Step

judge = Agent(
    engine=LLMEngine(
        "claude-haiku-4-5-20251001",  # cheap-tier judge
        system='Respond "approved" or "rejected: <short reason>".',
    ),
    name="judge",
)

# 1) Agent-level: final output gated, up to 2 attempts.
writer = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    verify=judge,
    max_verify=2,
    name="writer",
)
result = writer("write a haiku about bees")
print(result.text())

# 2) Tool-level (Option B): every call of synthesizer is gated;
#    the orchestrator's other tools run unchecked.
synthesizer = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    name="synthesizer",
)
orchestrator = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    tools=[
        synthesizer.as_tool(
            name="synth",
            verify=judge,
            max_verify=2,
        ),
    ],
)

# 3) Plan-level: only the summarise step is gated.
fetcher = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    name="fetch",
)
publisher = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    name="publish",
)
summariser = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    verify=judge,
    max_verify=2,
    name="summarise",
)
plan = Agent(
    engine=Plan(
        Step(target=fetcher, name=fetcher.name),
        Step(target=summariser, name=summariser.name),
        Step(target=publisher, name=publisher.name),
    ),
)

# 4) Callable judge: boolean verdict, no feedback loop.
def at_least_three_lines(output: str) -> bool:
    return output.count("\n") >= 2

short_writer = Agent(
    engine=LLMEngine("claude-haiku-4-5"),
    verify=at_least_three_lines,
    max_verify=2,
)
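The boolean judge in placement 4 gives the agent nothing to act on when it retries. A string-returning variant of the same check might look like the sketch below; `at_least_three_lines_verdict` is a hypothetical name, and the verdict wording simply follows the "approved" / "rejected: <reason>" convention used by the judge agent above.

```python
def at_least_three_lines_verdict(output: str) -> str:
    """Same line-count check as the bool judge, but with a reason on rejection."""
    line_count = output.count("\n") + 1
    if line_count >= 3:
        return "approved"
    return f"rejected: expected at least 3 lines, got {line_count}"


print(at_least_three_lines_verdict("one\ntwo\nthree"))
print(at_least_three_lines_verdict("one\ntwo"))
```

Passed as `verify=` in place of the bool version, the rejection text would then be what gets injected as feedback on the next attempt.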
Pitfalls¶
- A strict judge + small max_verify silently returns poor output.
  After the cap is hit, the last attempt is returned even if still
  rejected. Log the retry feedback during development so you know
  when you're hitting the cap; consider raising max_verify or
  relaxing the policy.
- Callable judges returning booleans don't produce feedback. Retries
  reuse the same task. Return a string verdict ("approved" or
  "rejected: <reason>") if you want the feedback loop. A
  bool-returning callable is acceptable when the failure mode is
  binary and "try again" alone is enough.
- Nested verify is allowed but expensive. Agent-level + tool-level +
  Plan-level on the same path stacks the loops. Pick one per agent
  unless you're intentionally building defence in depth.
- Keep judges cheap and specific. Use a smaller / faster model. One
  criterion per judge: multi-criteria judges conflate failure modes
  and produce vague feedback. Two single-criterion judges chained at
  different levels often produce better results than one
  multi-criterion judge at one level.
- verify= operates on Envelope.text(). When the agent has structured
  output (output=Model), the judge sees the JSON serialisation, not
  the model instance. Phrase the judge's policy accordingly, or write
  a callable judge that does json.loads(...) first.
- max_verify=1 disables the retry but keeps the gate. The agent runs
  once, the judge grades, and the result is returned whatever the
  verdict. Effectively a non-blocking quality warning. Use a real
  max_verify=2 or 3 if you actually want the retry semantics.
- Cost rollup. Each retry is a full agent run plus a judge run. With
  a Session=, the cost rollup includes every attempt; check
  session.usage_summary() for the true cost of a verify-heavy
  workflow.
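For the structured-output pitfall above, a callable judge that parses the JSON text before grading could be sketched as follows. The function name `summary_field_judge` and the `summary` field are hypothetical, chosen only for illustration; the sketch assumes the judge receives the JSON serialisation as a plain string, as described for Envelope.text().

```python
import json


def summary_field_judge(output: str) -> str:
    """Parse the JSON text, then grade one criterion: a non-empty 'summary'."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return f"rejected: output is not valid JSON ({exc.msg})"
    if not isinstance(data, dict):
        return "rejected: expected a JSON object"
    summary = data.get("summary")  # hypothetical field name
    if not isinstance(summary, str) or not summary.strip():
        return "rejected: 'summary' must be a non-empty string"
    return "approved"


print(summary_field_judge('{"summary": "Bees pollinate crops."}'))
print(summary_field_judge('{"summary": ""}'))
```

Each rejection branch returns a specific reason, so the feedback injected on retry names the one thing to fix.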
See also¶
- Guards: the hard counterpart, which blocks on policy failure with
  no retry. Pair them: hard guards for "must never pass", verify=
  for "should usually be good".
- As tool: the surface for tool-level verify= placement.
- Evals: offline / CI grading; verify= is the runtime sibling.
- Agent: verify= is a first-class Agent kwarg alongside guard=,
  output=, and tools=.