Orchestrator

Decision-plane component that runs the bounded Planner / Executor / Critic triad.

Reference Design
At a glance
Decision plane · Bounded execution loop

One Run Context's worth of execution: read CompiledContext, run the triad, emit a DecisionRecord.

Inputs
  • CompiledContext from the Compiler
  • Run Context with RunBudget (atomic accumulators)
  • DecisionSpec registry from the Decision Catalog
  • Tool registry with declared approval modes
Outputs
  • Typed Plan and step transcripts
  • DecisionRecord with evidence_refs, approvals, controls_active
  • Memory write proposals
  • OTEL trace bundle stitched across subagent lanes
  • Session checkpoints (in long_running mode)
Lifecycle
  1. plan
  2. verify
  3. execute
  4. score
  5. consolidate
Canonical types
  • Plan
  • DecisionRecord
  • ToolCall
  • ToolResult
  • BackgroundSession

Reference Architecture

The Orchestrator is the bounded execution loop of the Decision plane. It runs the Planner / Executor / Critic triad, manages subagent lanes, and persists checkpoints for durable background sessions.

Definition

A coordinator component that owns one Run Context’s worth of execution. It reads the CompiledContext, runs the triad until the Critic emits a terminal verdict, and produces a DecisionRecord. See Orchestration for the spec narrative.

Why it exists

A single agentic loop conflates planning, execution, and judgment, making failures uninterpretable. Splitting them gives the runtime three independently auditable artifacts (plan, transcript, verdict), enables replay, and lets each role enforce its own budget.

Inputs

  • CompiledContext from the Compiler
  • Run Context with RunBudget (atomic accumulators)
  • DecisionSpec registry from the Decision Catalog
  • Tool registry with declared approval modes
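
The "atomic accumulators" on RunBudget can be pictured with a minimal Python sketch. Only RunBudget and max_replan_attempts come from this page; the max_tokens field, the method names, and the lock-based implementation are assumptions made for illustration, not the runtime's actual API.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RunBudget:
    """Hypothetical budget shape; only max_replan_attempts is named in the source."""
    max_replan_attempts: int = 2
    max_tokens: int = 10_000  # assumed field, for illustration only
    _replans_used: int = field(default=0, init=False)
    _tokens_used: int = field(default=0, init=False)
    _lock: threading.Lock = field(default_factory=threading.Lock, init=False, repr=False)

    def try_spend_tokens(self, n: int) -> bool:
        """Atomically add n tokens; refuse and change nothing if the cap would be exceeded."""
        with self._lock:
            if self._tokens_used + n > self.max_tokens:
                return False
            self._tokens_used += n
            return True

    def try_replan(self) -> bool:
        """Atomically claim one re-plan attempt; refuse once the cap is reached."""
        with self._lock:
            if self._replans_used >= self.max_replan_attempts:
                return False
            self._replans_used += 1
            return True
```

Doing the check and the increment under one lock means concurrent subagent lanes cannot jointly overshoot a cap, which is presumably the point of making the accumulators atomic.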

Outputs

  • Typed Plan and step transcripts
  • DecisionRecord with evidence_refs, approvals, controls_active
  • Memory write proposals
  • OTEL trace bundle stitched across subagent lanes
  • Session checkpoints (in long_running mode)

How it works

  1. Plan — Planner reads CompiledContext; emits typed Plan with steps, tool intents, decision checkpoints.
  2. Verify — Critic checks plan against tool allow-lists, evidence requirements, approval-mode declarations.
  3. Execute — Executor runs verified steps via the Tool Manager. For network / delegated / destructive tool calls, it splits the step into propose → approve → execute against a frozen evidence snapshot.
  4. Score — Critic scores each completed step on the evaluators; renders accept / retry / replan / escalate.
  5. Consolidate — extracts effects + evidence into memory write proposals.
  6. Loop or terminate — re-plan attempts capped by RunBudget.max_replan_attempts.
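
The six steps above can be sketched as a single bounded loop. This is a simplified illustration under stated assumptions, not the reference implementation: the callables plan_fn, verify_fn, execute_fn and score_fn stand in for the Planner, Critic and Executor; the DecisionRecord here is reduced to a few illustrative fields; max_step_retries and the outcome strings are hypothetical; and treating exhausted retries as a re-plan is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Verdict(Enum):
    ACCEPT = "accept"
    RETRY = "retry"
    REPLAN = "replan"
    ESCALATE = "escalate"


@dataclass
class DecisionRecord:
    # Reduced, hypothetical shape; the real record also carries
    # evidence_refs, approvals and controls_active.
    outcome: str
    transcripts: List[str] = field(default_factory=list)
    memory_write_proposals: List[str] = field(default_factory=list)
    replans_used: int = 0


def run_triad(context, plan_fn, verify_fn, execute_fn, score_fn,
              max_replan_attempts=2, max_step_retries=1):
    record = DecisionRecord(outcome="pending")
    while True:
        plan = plan_fn(context)                          # 1. Plan
        plan_ok = verify_fn(plan)                        # 2. Verify
        if plan_ok:
            for step in plan:
                verdict = Verdict.RETRY
                attempts = 0
                while verdict is Verdict.RETRY and attempts <= max_step_retries:
                    result = execute_fn(step)            # 3. Execute
                    verdict = score_fn(step, result)     # 4. Score
                    record.transcripts.append(f"{step}:{verdict.value}")
                    attempts += 1
                if verdict is Verdict.ESCALATE:
                    record.outcome = "escalated"
                    return record
                if verdict is not Verdict.ACCEPT:        # replan, or retries exhausted
                    plan_ok = False
                    break
                # 5. Consolidate: turn the accepted effect into a write proposal.
                record.memory_write_proposals.append(f"effect:{result}")
            if plan_ok:
                record.outcome = "completed"
                return record
        # 6. Loop or terminate: re-plans are capped by the budget.
        if record.replans_used >= max_replan_attempts:
            record.outcome = "replan_budget_exhausted"
            return record
        record.replans_used += 1
```

Every exit path returns a record with an explicit outcome, so a run that ends because the re-plan cap was hit is distinguishable from one that completed or escalated.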

Subagent lanes

  • Lanes are spawned with their own Run Context, token budget, tool surface, trace span.
  • Lane outputs return as typed envelopes; lane traces stitched into the parent.
  • Lanes cannot mutate parent effects[]; they propose results the parent’s Critic accepts or rejects.
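
One way to picture the lane rules above is a minimal Python sketch, assuming hypothetical names throughout (LaneEnvelope, ParentRun, run_lane, critic_accepts). The lane receives a read-only view of the parent's effects and can only propose changes through a typed envelope.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class LaneEnvelope:
    """Hypothetical typed envelope a lane returns to its parent."""
    lane_id: str
    proposed_effects: Tuple[str, ...]
    trace_span_id: str


class ParentRun:
    def __init__(self) -> None:
        self._effects: List[str] = []
        self.trace: List[str] = []

    @property
    def effects(self) -> Tuple[str, ...]:
        # Lanes only ever see an immutable copy of the parent's effects.
        return tuple(self._effects)

    def run_lane(self, lane_id: str,
                 lane_fn: Callable[[str, Tuple[str, ...]], LaneEnvelope],
                 critic_accepts: Callable[[LaneEnvelope], bool]) -> bool:
        envelope = lane_fn(lane_id, self.effects)
        self.trace.append(envelope.trace_span_id)   # stitch the lane trace into the parent
        accepted = critic_accepts(envelope)          # parent's Critic decides
        if accepted:
            self._effects.extend(envelope.proposed_effects)
        return accepted
```

In this sketch a lane that tries to append to the tuple it was given fails with an AttributeError; a real runtime would presumably catch the violation and terminate the lane.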

Background sessions

  • In long_running mode, a checkpoint is persisted after every Critic verdict.
  • Resumable by session_id against the same pinned pack and snapshot.
  • Operator interruptions enqueue a checkpoint resume.
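
Checkpointing and resume might look roughly like the following sketch. The in-memory store, the Checkpoint fields, and the snapshot_mismatch reason are assumptions for illustration; only session_id, the pinned pack and snapshot, and the pack_version_mismatch refusal are named on this page.

```python
from dataclasses import dataclass
from typing import Dict


class ResumeRefused(Exception):
    """Raised when a session cannot be resumed; reason is a structured code."""
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class Checkpoint:
    # Hypothetical checkpoint shape.
    session_id: str
    pack_version: str
    snapshot_id: str
    last_verdict: str
    step_index: int


class CheckpointStore:
    """In-memory stand-in for durable checkpoint storage."""
    def __init__(self) -> None:
        self._latest: Dict[str, Checkpoint] = {}

    def save(self, checkpoint: Checkpoint) -> None:
        # Called after every Critic verdict; the latest checkpoint wins.
        self._latest[checkpoint.session_id] = checkpoint

    def resume(self, session_id: str, pack_version: str, snapshot_id: str) -> Checkpoint:
        checkpoint = self._latest[session_id]
        if checkpoint.pack_version != pack_version:
            raise ResumeRefused("pack_version_mismatch")
        if checkpoint.snapshot_id != snapshot_id:
            raise ResumeRefused("snapshot_mismatch")  # hypothetical reason code
        return checkpoint
```

Refusing at resume time, rather than silently continuing, keeps a resumed session replayable against the same inputs it started with.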

Failure modes

  • Plan contains disallowed tools or missing approvals — caught by Critic at verify.
  • Tool retries cause duplicate side effects — mitigated by idempotency keys at the Tool Manager.
  • Subagent lane modifies parent effects — invariant violation; lane terminated.
  • Background session resumes against a different pack version — refuse with pack_version_mismatch.
  • Loop guard trips silently without a structured reason — bug; the loop guard must always set a loop_detected reason.
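
Two of the mitigations above, idempotency keys at the Tool Manager and a loop guard that always reports a structured reason, can be sketched as follows. The key derivation, class shapes, and the max_repeats threshold are assumptions for illustration; only the idempotency-key idea and the loop_detected reason come from this page.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional


def idempotency_key(run_id: str, step_id: str, args: Dict[str, Any]) -> str:
    """Derive a stable key from the run, the step, and the call arguments (assumed scheme)."""
    payload = json.dumps({"run": run_id, "step": step_id, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToolManager:
    """Replays the stored result when a key is seen again instead of re-running the tool."""
    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def call(self, key: str, tool: Callable[..., Any], args: Dict[str, Any]) -> Any:
        if key in self._results:
            return self._results[key]      # retry: no second side effect
        result = tool(**args)
        self._results[key] = result
        return result


class LoopGuard:
    """Trips when the same plan fingerprint repeats too often; never trips silently."""
    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen: Dict[str, int] = {}

    def check(self, plan_fingerprint: str) -> Optional[Dict[str, Any]]:
        count = self._seen.get(plan_fingerprint, 0) + 1
        self._seen[plan_fingerprint] = count
        if count > self.max_repeats:
            return {"reason": "loop_detected",
                    "fingerprint": plan_fingerprint,
                    "repeats": count}
        return None
```

Because check returns either None or a dictionary that always carries a reason, a caller cannot observe a tripped guard without also seeing why it tripped.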

Operational concerns

  • Re-plan budgets per workflow; default 2, raised only with rationale.
  • Verification cost and strictness scale with risk_class.
  • Workflow timeouts and SLA enforcement at the Run Context boundary.
  • Subagent lane fan-out limits to bound cost and latency.
  • Background session checkpoint storage and TTL by tenant.

Evaluation metrics

  • Plan-verification pass rate.
  • Step completion rate.
  • Retry and recovery rate.
  • Escalation rate by risk tier.
  • Subagent lane success rate vs. parent re-plan rate.
  • Mean time to safe completion.
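
As an illustration of how two of these metrics could be derived, the sketch below computes the plan-verification pass rate and the escalation rate by risk tier from per-run summaries. The RunSummary shape and its field names are hypothetical; the page does not define how the metrics are collected.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RunSummary:
    # Hypothetical per-run summary, for illustration only.
    risk_tier: str
    plans_submitted: int
    plans_verified: int
    escalated: bool


def plan_verification_pass_rate(runs: List[RunSummary]) -> float:
    """Verified plans divided by submitted plans, across all runs."""
    submitted = sum(r.plans_submitted for r in runs)
    verified = sum(r.plans_verified for r in runs)
    return verified / submitted if submitted else 0.0


def escalation_rate_by_tier(runs: List[RunSummary]) -> Dict[str, float]:
    """Fraction of runs in each risk tier that ended in escalation."""
    totals: Dict[str, int] = defaultdict(int)
    escalations: Dict[str, int] = defaultdict(int)
    for r in runs:
        totals[r.risk_tier] += 1
        escalations[r.risk_tier] += int(r.escalated)
    return {tier: escalations[tier] / totals[tier] for tier in totals}
```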