Orchestration

Bounded planner / executor / critic, subagent lanes, and durable background sessions.

Foundational Spec. Last reviewed: 2026-05-12

At a glance

Decision plane: bounded execution loop

Planner / Executor / Critic with deterministic budgets, subagent lanes, and durable background sessions.

Inputs

CompiledContext from the Cognitive Core
Run Context with budgets and approval-mode authority
Decision Spec from the Decision Catalog
Tool registry with declared approval modes

Outputs

Typed Plan and step transcripts
DecisionRecord with evidence_refs, approvals, controls_active
Memory write proposals
Trace bundle

Lifecycle

plan
verify
execute
score
consolidate

Canonical types

Plan
DecisionRecord
ToolCall
ToolResult
BackgroundSession

Orchestration is the Decision plane that turns a CompiledContext into a typed DecisionRecord. It is a bounded triad — Planner, Executor, Critic — running under one Run Context with explicit budgets, isolated subagent lanes, and durable background sessions.

Definition

A control loop with three roles that advances only on an explicit Critic verdict. The Planner produces a typed plan; the Executor runs only verified plan steps through the Tool Gateway; the Critic verifies the plan, scores each completed step, and decides accept / retry / replan / escalate.

Why it exists

A single agentic loop conflates planning, execution, and judgment, making failures uninterpretable and replays unreliable. Splitting them gives the runtime three independently auditable artifacts (plan, transcript, verdict), enables replay without re-executing tools, and lets each role enforce its own budget.

How it works

Plan: the Planner reads the CompiledContext and produces a typed Plan with steps, tool intents, decision checkpoints, and dependencies. It cannot execute side effects.
Verify: the Critic checks the plan against tool allow-lists, evidence requirements, approval-mode declarations, and decision-binding rules.
Execute: the Executor runs only verified steps via the Tool Gateway with retries, idempotency keys, and approval-mode-aware routing.
Score: the Critic scores each completed step and the run as a whole against evaluators.
Decide: the Critic emits one of accept / retry / replan / escalate. Replans return control to the Planner under a re-plan budget.
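
The five steps above can be sketched as a small bounded loop. This is a minimal illustration, not the runtime's implementation: the function names (run_loop, plan_fn, verify_fn, execute_fn, score_fn) and the LoopResult type are hypothetical, the per-step retry verdict is left out for brevity, and escalating when the re-plan budget is exhausted is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    verdict: str                      # "accept" or "escalate" in this sketch
    replans: int                      # re-plan attempts consumed
    transcript: list = field(default_factory=list)


def run_loop(
    plan_fn: Callable[[Optional[str]], list],
    verify_fn: Callable[[list], bool],
    execute_fn: Callable[[object], object],
    score_fn: Callable[[list, list], str],
    max_replans: int = 2,
) -> LoopResult:
    """Plan -> verify -> execute -> score -> decide, bounded by a re-plan budget."""
    replans = 0
    reason = None
    while True:
        plan = plan_fn(reason)                 # Planner: produces steps, no side effects
        if verify_fn(plan):                    # Critic: verify before anything runs
            transcript = [execute_fn(step) for step in plan]   # Executor: verified steps only
            verdict = score_fn(plan, transcript)               # Critic: score and decide
        else:
            transcript, verdict = [], "replan"  # unverified plans never reach the Executor
        if verdict != "replan":
            return LoopResult(verdict, replans, transcript)
        if replans >= max_replans:
            # Assumption: an exhausted re-plan budget escalates instead of looping.
            return LoopResult("escalate", replans, transcript)
        replans += 1
        reason = "critic_requested_replan"      # structured reason handed back to the Planner
```

The point of the sketch is the ordering: nothing executes before verification, and the loop cannot run more than max_replans + 1 planning passes.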

Bounded planner / executor / critic

The triad is the heart of the Decision plane:

Planner — typed plan only; no side effects. Sees the full CompiledContext. Cap: planner timeout (5–30s by task class).
Executor — runs verified plan steps only. Cannot extend the plan; if a step fails or returns unexpected evidence, control returns to the Planner with a structured reason. Cap: per-step timeout (adapter-specific).
Critic — verifies plans, scores steps, and renders the final verdict. Operates on plan + transcripts; can re-derive its verdict offline during replay.

This separation makes the runtime debuggable by role and enables replay: a Critic verdict is reproducible from the saved plan, tool transcripts, policy versions, evaluator versions, and model-call lineage without re-executing tools.

Plan structure

{
  "plan_id": "plan_refund_01",
  "intent": "support.refund",
  "steps": [
    { "id": "s1", "tool": "adp_orders.lookup", "params": { "order_id": "ord_881" } },
    { "id": "s2", "tool": "adp_policy.eval", "depends_on": ["s1"] },
    { "id": "s3", "tool": "adp_payments.issue_refund",
      "depends_on": ["s2"],
      "approval_mode": "destructive",
      "requires": ["GATE_FINANCE_APPROVAL"] }
  ],
  "decision_checkpoints": [
    { "decision_id": "support.refund.execute", "after_step": "s2" }
  ]
}
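
One way to check a plan of this shape before execution is to resolve depends_on into an execution order. The sketch below uses Python's standard graphlib module; the function name execution_order and the specific error messages are hypothetical, and the real verifier presumably checks much more (allow-lists, approval modes, evidence).

```python
from graphlib import TopologicalSorter


def execution_order(plan: dict) -> list[str]:
    """Return step ids in an order that respects depends_on.

    Raises ValueError for unknown dependencies or checkpoints; a dependency
    cycle raises graphlib.CycleError, which is a subclass of ValueError.
    """
    steps = {step["id"]: step for step in plan["steps"]}
    graph = {}
    for step_id, step in steps.items():
        deps = step.get("depends_on", [])
        unknown = [d for d in deps if d not in steps]
        if unknown:
            raise ValueError(f"step {step_id} depends on unknown steps: {unknown}")
        graph[step_id] = set(deps)          # node -> set of predecessors
    for checkpoint in plan.get("decision_checkpoints", []):
        if checkpoint["after_step"] not in steps:
            raise ValueError(f"checkpoint refers to unknown step: {checkpoint['after_step']}")
    return list(TopologicalSorter(graph).static_order())
```

For the refund plan above, the only valid order is s1, s2, s3, so the decision checkpoint after s2 always precedes the destructive step.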

Approval-mode-aware routing

The Orchestrator reads each tool’s declared approval_mode (see Governance — Approval-mode tiers) and routes accordingly:

read_only and local_write may execute inline within a step.
network, delegated, and destructive are split into a propose → approve → execute sub-sequence so the approval surface is consistent across workflows.
A destructive step cannot bypass its gate even if the model suggests an alternate path.
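
The routing rule can be expressed as a lookup from declared approval mode to a step sub-sequence. This is a sketch under assumptions: the route function, the registry shape (tool name mapped to approval mode), and the fail-closed behaviour for undeclared tools are illustrative choices, not taken from the spec.

```python
INLINE_MODES = {"read_only", "local_write"}
GATED_MODES = {"network", "delegated", "destructive"}


def route(tool: str, registry: dict[str, str]) -> list[str]:
    """Map a tool's declared approval mode to the sub-sequence the step expands into."""
    mode = registry.get(tool)
    if mode in INLINE_MODES:
        return ["execute"]                          # runs inline within the step
    if mode in GATED_MODES:
        return ["propose", "approve", "execute"]    # consistent approval surface
    # Assumption: unknown or undeclared modes fail closed and are never executed.
    raise ValueError(f"tool {tool!r} has no recognised approval mode: {mode!r}")
```

Because the mode comes from the registry and not from the model's output, a suggested alternate path cannot change which sub-sequence a destructive tool receives.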

Subagent lanes

Long-horizon or parallelizable work runs in isolated lanes:

Each lane is spawned with its own Run Context, token budget, tool surface, and trace span.
The parent Run Context observes lane completion as a typed envelope; lane traces are stitched into the parent trace.
Lanes cannot mutate the parent’s effects[]; they propose results that the parent’s Critic accepts or rejects.
Lane-level loop guards trip independently from the parent.
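
The propose-then-accept contract can be illustrated with immutable values. In this sketch the LaneEnvelope fields and the merge_lane function are hypothetical; the only property carried over from the text is that a lane never mutates the parent's effects and that the parent's Critic decides whether a proposal is merged.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LaneEnvelope:
    lane_id: str
    status: str                  # e.g. "completed" or "failed" (assumed values)
    proposed_effects: tuple      # proposals only; never applied by the lane itself
    tokens_used: int


def merge_lane(
    parent_effects: tuple,
    envelope: LaneEnvelope,
    critic_accepts: Callable[[LaneEnvelope], bool],
) -> tuple:
    """Return a new effects tuple; the parent's tuple is never modified in place."""
    if envelope.status == "completed" and critic_accepts(envelope):
        return parent_effects + envelope.proposed_effects
    return parent_effects        # rejected or failed lanes leave the parent untouched
```

Using tuples and a frozen dataclass makes accidental mutation of the parent's state a type-level error in this toy model.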

Durable background sessions

Hour-scale workflows run as durable sessions that persist across process restarts.

Session contract

Every mode: "long_running" request materializes a Session:

{
  "session_id": "sess_inv_22",
  "tenant_id": "tenant_acme_prod",
  "pack_pin": "ctxpack.finops@3.1.0",
  "snapshot_pin": "kg_2026_05_03_T0930",
  "intent": "finance.invoice.credit_adjust",
  "started_at": "2026-05-04T10:14:00Z",
  "heartbeat": { "interval_ms": 30000, "ttl_ms": 14400000, "last_seen": "2026-05-04T10:25:00Z" },
  "status": "in_progress",
  "checkpoints": [
    { "checkpoint_id": "ck_inv_22_after_s2", "after_step": "s2", "critic_verdict": "ok", "saved_at": "2026-05-04T10:14:42Z" },
    { "checkpoint_id": "ck_inv_22_after_s3", "after_step": "s3", "critic_verdict": "accept", "saved_at": "2026-05-04T10:25:11Z" }
  ],
  "next_step": "s4",
  "owner_actor": "agt_finops"
}

Lifecycle states

State	Meaning	Transitions
`in_progress`	actively executing	→ `awaiting_gate`, `paused`, `completed`, `failed`, `expired` (session TTL)
`awaiting_gate`	blocked on an approval gate	→ `in_progress` (on approve), `rejected` (on deny), `expired` (gate TTL)
`paused`	operator-initiated pause, or watchdog pause after heartbeat loss	→ `in_progress`, `cancelled`
`completed`	terminal success	(terminal)
`failed`	terminal failure	(terminal)
`expired`	gate or session TTL exceeded	(terminal)
`rejected` / `cancelled`	terminal denials	(terminal)
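
The lifecycle can be encoded as a transition map that rejects anything not listed. This is a sketch: the transition helper is hypothetical, and the in_progress to expired edge is included on the assumption that session TTL expiry (described under Heartbeat and TTL) can end an actively running session.

```python
TRANSITIONS = {
    "in_progress": {"awaiting_gate", "paused", "completed", "failed", "expired"},
    "awaiting_gate": {"in_progress", "rejected", "expired"},
    "paused": {"in_progress", "cancelled"},
}
TERMINAL = {"completed", "failed", "expired", "rejected", "cancelled"}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not an allowed transition."""
    if current in TERMINAL:
        raise ValueError(f"{current} is terminal; no transitions allowed")
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```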

Spawn

Triggered by invokeAgent with mode: "long_running" and runtime.max_session_duration_ms ≥ 3600000 (1 hour).
The Conversation Manager validates the envelope, materializes the Run Context, and persists the initial session row before any plane runs.
The Planner picks up the session and emits the first Plan against the pinned pack + snapshot.

Checkpoint

A checkpoint is persisted after every Critic verdict (ok / accept / replan).
Checkpoint payload includes: plan_id, accumulated tool_transcripts[] IDs, accumulated evidence_refs[], RunBudget accumulators (atomic snapshot), Critic step scores, the next-step pointer.
Checkpoint storage is append-only and content-addressed; checkpoints are immutable.
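
An append-only, content-addressed store can be sketched with a hash of the canonical JSON payload as the address. The class name CheckpointStore, the choice of SHA-256, and the in-memory dictionaries are assumptions for illustration; the spec does not say which hash or storage backend is used.

```python
import hashlib
import json


class CheckpointStore:
    """In-memory stand-in for an append-only, content-addressed checkpoint log."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._log: list[str] = []          # addresses in write order

    def put(self, payload: dict) -> str:
        # Canonical encoding: sorted keys and fixed separators, so equal
        # payloads always hash to the same address.
        blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
        address = hashlib.sha256(blob).hexdigest()
        if address not in self._blobs:     # existing entries are never overwritten
            self._blobs[address] = blob
            self._log.append(address)
        return address

    def get(self, address: str) -> dict:
        # Decoding on every read returns a fresh object, so callers cannot
        # mutate the stored checkpoint.
        return json.loads(self._blobs[address])

    def addresses(self) -> list[str]:
        return list(self._log)
```

Because the address is derived from the content, any change to a checkpoint produces a different address, which is what makes immutability checkable.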

Resume

A resumable session is reconstituted by session_id against the recorded pack_pin + snapshot_pin (refusing if either has advanced beyond the recorded version — pack_version_mismatch / snapshot_version_mismatch).
Resume restores the Run Context’s atomic budgets to their checkpointed values; remaining budget = original − used.
Resume executes from next_step against the most recent checkpoint; it never re-executes already-recorded tool calls (idempotency keys absorb any duplicate calls that escape).
Operator interruptions enqueue a resume; no work is lost.
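
The resume rules above can be sketched as a pin check followed by budget arithmetic. Names such as resume, ResumeRefused, and the budget_used checkpoint field are hypothetical. The sketch also simplifies the version check to plain inequality, whereas the text refuses only when a pin has advanced beyond the recorded version.

```python
class ResumeRefused(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def resume(session: dict, checkpoint: dict, current_pack: str,
           current_snapshot: str, original_budget: dict) -> dict:
    """Rebuild the resumable state from the latest checkpoint, or refuse."""
    if current_pack != session["pack_pin"]:
        raise ResumeRefused("pack_version_mismatch")
    if current_snapshot != session["snapshot_pin"]:
        raise ResumeRefused("snapshot_version_mismatch")
    used = checkpoint["budget_used"]       # assumed field: accumulated budget usage
    # remaining budget = original - used, per budget dimension
    remaining = {key: original_budget[key] - used.get(key, 0) for key in original_budget}
    return {"next_step": checkpoint["next_step"], "budget_remaining": remaining}
```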

Heartbeat and TTL

The session heartbeat is renewed by the Critic on every verdict; loss of heartbeat for 2 × interval_ms flips the session to paused.
Session TTL (max_session_duration_ms) is hard; on expiry the runtime emits Session expired and persists a terminal checkpoint.
Approval gates have their own TTL; gate expiry sets the session to expired independently of the session TTL.
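
A watchdog for the heartbeat and session TTL rules might look like the following. The function name watchdog_status is hypothetical, the session TTL is passed in explicitly instead of being read from a particular field, and treating exactly 2 × interval_ms of silence as a loss is an assumption about the boundary.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    # Accept the trailing "Z" used in the session examples.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def watchdog_status(session: dict, now: datetime, max_session_duration_ms: int) -> str:
    """Return the status the session should have at time `now`."""
    started = parse_ts(session["started_at"])
    if now - started >= timedelta(milliseconds=max_session_duration_ms):
        return "expired"                       # session TTL is hard
    heartbeat = session["heartbeat"]
    silence = now - parse_ts(heartbeat["last_seen"])
    if silence >= timedelta(milliseconds=2 * heartbeat["interval_ms"]):
        return "paused"                        # heartbeat lost for 2 x interval_ms
    return session["status"]
```

Gate TTLs are tracked separately in the text and are not modelled here.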

Progress envelopes

Sessions emit periodic typed progress envelopes consumable by external orchestrators:

{
  "session_id": "sess_inv_22",
  "progress_id": "prog_inv_22_0014",
  "emitted_at": "2026-05-04T10:25:11Z",
  "current_step": "s4",
  "completed_step_count": 3,
  "total_step_count": 4,
  "budget_remaining": { "tokens": 7280, "tool_calls": 8, "cost_cents": 24.0, "wall_clock_ms": 12830 },
  "awaiting_gate": null
}

Failure modes specific to background sessions

pack_version_mismatch on resume — pack advanced after suspend; resume refused; operator decides to migrate or terminate.
snapshot_version_mismatch on resume — knowledge-graph snapshot advanced; resume refused; same operator decision.
Heartbeat loss without operator pause — runtime watchdog flips session to paused; auto-resume permitted only if operator approves.
Checkpoint write fails — runtime aborts the Critic verdict and emits failed; no partial commit.
Long-running session over budget — atomic budget exhaustion mid-step produces escalate; operator may extend Run Budget on resume.

Replay

Any past run can be replayed against:

the recorded CompiledContext,
the recorded Plan,
the recorded tool transcripts,
the policy, evaluator, and model-profile versions active during the run,
the pinned Knowledge Graph, memory, and registry snapshots,

to re-derive the Critic verdict and the DecisionRecord without paying live tool cost. This is the substrate for evaluation, regression testing, and post-incident investigation.
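
Replay without live tool cost can be illustrated by a gateway that serves results from recorded transcripts. The ReplayGateway class and the transcript shape (tool, params, result) are hypothetical; the actual transcript schema is not shown in this document.

```python
import json


class ReplayGateway:
    """Serves recorded tool results; never calls a live tool."""

    def __init__(self, transcripts: list[dict]) -> None:
        self._recorded = {
            self._key(t["tool"], t["params"]): t["result"] for t in transcripts
        }

    @staticmethod
    def _key(tool: str, params: dict) -> tuple:
        # Canonical JSON so parameter key order does not affect the lookup.
        return (tool, json.dumps(params, sort_keys=True))

    def call(self, tool: str, params: dict):
        key = self._key(tool, params)
        if key not in self._recorded:
            # A miss means the replayed plan diverged from the recorded run.
            raise LookupError(f"no recorded transcript for {tool} with {params}")
        return self._recorded[key]
```

A lookup miss is useful in its own right: it signals that the replayed plan no longer matches the recorded run, which is the kind of regression that replay is meant to surface.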

Verification and guardrails

Tool allow-lists — only permitted tools and arguments may appear in a plan.
Evidence checks — decisions require resolved evidence_refs before execution.
Approval gates — steps are blocked until approved, with frozen evidence snapshot.
Conflict detection — plans that violate policy invariants are rejected before execution.
Loop guard — repeated identical tool calls or no-progress reflection cycles short-circuit with a structured reason.
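
As an illustration of the loop guard, the sketch below counts identical tool calls and returns a structured reason once a threshold is exceeded. The LoopGuard class, the default threshold of 3, and the reason fields are assumptions; the no-progress reflection check mentioned above is not modelled.

```python
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    def __init__(self, max_identical_calls: int = 3) -> None:
        self.max_identical_calls = max_identical_calls
        self._seen: Counter = Counter()

    def check(self, tool: str, params: dict) -> Optional[dict]:
        """Record a call; return a structured trip reason if the limit is exceeded."""
        key = (tool, json.dumps(params, sort_keys=True))
        self._seen[key] += 1
        count = self._seen[key]
        if count > self.max_identical_calls:
            return {
                "tripped": True,
                "reason": "repeated_identical_tool_call",
                "tool": tool,
                "count": count,
            }
        return None                     # no trip; the call may proceed
```

Returning a typed reason, as opposed to silently dropping the call, matches the requirement that trips are never silent.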

Autotune surfaces

Decision-plane autotune is higher risk than Context-plane tuning because it changes how work is sequenced. It is still useful when bounded to explicit orchestration surfaces and scored by replay.

Surface	Candidate examples	Guardrail
Planner templates	change step ordering, add a verification step before an action, prefer a known Skill	Plan verification pass rate and utility must improve without adding risky tool calls.
Re-plan budgets	lower or raise max re-plan attempts for one intent class	Loop-guard trips and escalation quality cannot regress.
Critic rubrics	tighten evidence sufficiency, adjust retry vs. escalate thresholds	Policy and safety are floor constraints; replay must reproduce typed verdicts.
Subagent lanes	fan-out limit, lane timeout, lane eligibility by risk class	Parent effects remain immutable until the parent Critic accepts lane output.
Background sessions	checkpoint cadence, progress envelope frequency, resume policy	Resume must use the pinned pack and snapshot from session start.

The optimizer cannot propose direct tool execution, approval bypasses, or untyped free-form plans. A candidate that changes the loop must emit a replayable Plan, step transcript expectations, and a DecisionRecord diff for review.

Execution model

Idempotency keys prevent duplicate side effects on retries.
Retry budgets are bounded per step and per workflow.
Timeouts and fallbacks degrade to human-in-the-loop (HITL) handling when tools fail.
Compensation steps are optional rollbacks for reversible tools.
Parallel steps within a plan use the Executor’s lane abstraction.
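
The interaction between idempotency keys and bounded retries can be sketched as follows. IdempotentExecutor is a hypothetical name, and the in-memory result cache stands in for deduplication that would normally happen in the downstream tool, which receives the key with each call.

```python
from typing import Callable


class IdempotentExecutor:
    def __init__(self, max_retries: int = 2) -> None:
        self.max_retries = max_retries
        self._results: dict[str, object] = {}   # idempotency key -> recorded result

    def run(self, idempotency_key: str, action: Callable[[], object]) -> object:
        """Run `action` at most once successfully per key, within the retry budget."""
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # duplicate request: no new side effect
        last_error = None
        for _attempt in range(self.max_retries + 1):
            try:
                result = action()
            except Exception as exc:                 # sketch only: retry on any failure
                last_error = exc
                continue
            self._results[idempotency_key] = result
            return result
        raise RuntimeError("retry budget exhausted") from last_error
```

A key such as the run ID combined with the step ID would make a resumed session reuse the recorded result instead of issuing the call again; that key format is an assumption.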

Error handling and recovery

Step failure isolates the failing step and returns control to the Planner under the re-plan budget.
Partial execution records progress in the Run Context and prevents duplicate actions on resume.
Human handoff routes to approval or operator with the full Plan, transcript, and verdict so far.

Implementation mapping

Orchestration is implemented primarily by:

Orchestrator (planning, verification, execution control, subagent lanes)
Decision Catalog (typed decision validation)
Policy Engine (approval gates and constraints)

Interfaces

Inputs

CompiledContext from the Cognitive Core
Run Context (with budgets and approval-mode authority)
Decision Spec from the Decision Catalog
Tool registry with declared approval modes

Outputs

Typed Plan and step transcripts
DecisionRecord with evidence_refs, approvals, controls_active
Memory write proposals
Trace bundle

Failure modes

Plan contains disallowed tools or missing approvals (caught by the Critic at verify).
Tool retries cause duplicate side effects (mitigated by idempotency keys).
Missing evidence for claims (rejected at the decision checkpoint).
Subagent lane modifies parent effects (must remain isolated).
Background session resumes against a different Context Pack version than it was suspended on.
Loop guard trips silently without a structured reason.

Operational concerns

Planner latency vs. template coverage tradeoff per intent.
Re-plan budgets per workflow; default 2, raised only with rationale.
Verification cost and strictness scale with risk tier.
Workflow timeouts and SLA enforcement at the Run Context boundary.
Subagent lane fan-out limits to bound cost and latency.
Background session checkpoint storage and TTL by tenant.

Evaluation metrics

Plan-verification pass rate.
Step completion rate.
Retry and recovery rate.
Escalation rate by risk tier.
Subagent lane success rate vs. parent re-plan rate.
Mean time to safe completion.

Example

A condensed Critic verdict envelope:

{
  "run_id": "run_a1b2c3d4e5f60718",
  "plan_id": "plan_refund_01",
  "verdict": "accept",
  "step_scores": [
    { "step_id": "s1", "scores": { "policy": 1.0, "utility": 1.0 } },
    { "step_id": "s2", "scores": { "policy": 1.0, "utility": 1.0 } },
    { "step_id": "s3", "scores": { "policy": 1.0, "utility": 0.95 }, "approval_gate": "GATE_FINANCE_APPROVAL", "approver": "user_finance_lead" }
  ],
  "loop_guard_trips": 0,
  "replan_attempts": 0
}

Common misconceptions

Orchestration is not a workflow engine alone. It includes verification and scoring gates that traditional engines lack.
Guardrails are not optional. They are part of the runtime loop, not a release-gate add-on.
Subagent lanes are not free parallelism. They consume budget from the parent Run Context’s accumulators.
Replay is not optional. It is the property that makes evaluation honest.