Orchestration
Bounded planner / executor / critic, subagent lanes, and durable background sessions.
Planner / Executor / Critic with deterministic budgets, subagent lanes, and durable background sessions.
- plan
- verify
- execute
- score
- consolidate
- Plan
- DecisionRecord
- ToolCall
- ToolResult
- BackgroundSession
Orchestration is the Decision plane that turns a CompiledContext into a typed DecisionRecord. It is a bounded triad — Planner, Executor, Critic — running under one Run Context with explicit budgets, isolated subagent lanes, and durable background sessions.
Definition
A control loop with three roles that move forward only when all three agree on a verdict. The Planner produces a typed plan; the Executor runs only verified plan steps through the Tool Gateway; the Critic verifies the plan, scores each completed step, and decides accept / retry / replan / escalate.
Why it exists
A single agentic loop conflates planning, execution, and judgment, making failures uninterpretable and replays unreliable. Splitting them gives the runtime three independently auditable artifacts (plan, transcript, verdict), enables replay without re-executing tools, and lets each role enforce its own budget.
How it works
- Plan: the Planner reads the CompiledContext and produces a typed Plan with steps, tool intents, decision checkpoints, and dependencies. It cannot execute side effects.
- Verify: the Critic checks the plan against tool allow-lists, evidence requirements, approval-mode declarations, and decision-binding rules.
- Execute: the Executor runs only verified steps via the Tool Gateway with retries, idempotency keys, and approval-mode-aware routing.
- Score: the Critic scores each completed step and the run as a whole against evaluators.
- Decide: the Critic emits one of accept / retry / replan / escalate. Replans return control to the Planner under a re-plan budget.
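The five steps above can be sketched as a small driver loop. This is a minimal illustration, not the runtime's implementation: `run_triad` and its four callables are hypothetical names, the default of two re-plans is taken from the operational default stated later in this page, and step-level retry is assumed to happen inside `execute_fn`.

```python
from typing import Callable, List, Optional, Tuple

def run_triad(
    plan_fn: Callable[[Optional[str]], List[dict]],       # Planner: reason -> plan steps
    verify_fn: Callable[[List[dict]], bool],              # Critic: plan verification
    execute_fn: Callable[[dict], dict],                   # Executor: step -> transcript
    score_fn: Callable[[List[dict], List[dict]], Tuple[str, Optional[str]]],
    max_replans: int = 2,                                 # assumed re-plan budget
) -> Tuple[str, int]:
    """Drive plan -> verify -> execute -> score -> decide under a re-plan budget."""
    replans = 0
    reason: Optional[str] = None
    while True:
        steps = plan_fn(reason)                  # Planner has no side effects
        if verify_fn(steps):
            # Only verified steps ever reach the Executor.
            transcripts = [execute_fn(step) for step in steps]
            verdict, reason = score_fn(steps, transcripts)
        else:
            verdict, reason = "replan", "plan failed verification"
        if verdict != "replan":
            return verdict, replans              # accept / retry / escalate
        if replans >= max_replans:
            return "escalate", replans           # budget exhausted
        replans += 1
```

A plan that never passes verification therefore gets one initial attempt plus `max_replans` further attempts before the loop escalates.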
Bounded planner / executor / critic
The triad is the heart of the Decision plane:
- Planner: typed plan only; no side effects. Sees the full CompiledContext. Cap: planner timeout (5–30s by task class).
- Executor: runs verified plan steps only. Cannot extend the plan; if a step fails or returns unexpected evidence, control returns to the Planner with a structured reason. Cap: per-step timeout (adapter-specific).
- Critic: verifies plans, scores steps, and renders the final verdict. Operates on plan + transcripts; can re-derive its verdict offline during replay.
This separation makes the runtime debuggable by role and enables replay: a Critic verdict is reproducible from the saved plan, tool transcripts, policy versions, evaluator versions, and model-call lineage without re-executing tools.
Plan structure
{
  "plan_id": "plan_refund_01",
  "intent": "support.refund",
  "steps": [
    { "id": "s1", "tool": "adp_orders.lookup", "params": { "order_id": "ord_881" } },
    { "id": "s2", "tool": "adp_policy.eval", "depends_on": ["s1"] },
    { "id": "s3", "tool": "adp_payments.issue_refund",
      "depends_on": ["s2"],
      "approval_mode": "destructive",
      "requires": ["GATE_FINANCE_APPROVAL"] }
  ],
  "decision_checkpoints": [
    { "decision_id": "support.refund.execute", "after_step": "s2" }
  ]
}
Approval-mode-aware routing
The Orchestrator reads each tool’s declared approval_mode (see Governance — Approval-mode tiers) and routes accordingly:
- read_only and local_write may execute inline within a step.
- network, delegated, and destructive are split into a propose → approve → execute sub-sequence so the approval surface is consistent across workflows.
- A destructive step cannot bypass its gate even if the model suggests an alternate path.
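The routing rule can be illustrated with a small expansion function. This is a sketch under assumptions: `route_step` and the `step:phase` labels are hypothetical, and failing closed on an undeclared mode is a choice made here, not something this page specifies.

```python
from typing import List

INLINE_MODES = {"read_only", "local_write"}
GATED_MODES = {"network", "delegated", "destructive"}

def route_step(step_id: str, approval_mode: str) -> List[str]:
    """Expand one plan step into the phases the Orchestrator would schedule."""
    if approval_mode in INLINE_MODES:
        # Inline modes run directly inside the step.
        return [f"{step_id}:execute"]
    if approval_mode in GATED_MODES:
        # Gated modes always pass through the same three-phase sub-sequence.
        return [f"{step_id}:propose", f"{step_id}:approve", f"{step_id}:execute"]
    # Assumption: an unknown mode is rejected rather than treated as inline.
    raise ValueError(f"undeclared approval_mode: {approval_mode!r}")
```

Because the expansion depends only on the declared mode, a model-suggested alternate path cannot remove the approve phase from a destructive step.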
Subagent lanes
Long-horizon or parallelizable work runs in isolated lanes:
- Each lane is spawned with its own Run Context, token budget, tool surface, and trace span.
- The parent Run Context observes lane completion as a typed envelope; lane traces are stitched into the parent trace.
- Lanes cannot mutate the parent’s effects[]; they propose results that the parent’s Critic accepts or rejects.
- Lane-level loop guards trip independently from the parent.
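The propose-then-accept contract between a lane and its parent might look like the following. The names `LaneEnvelope`, `ParentRun`, and `merge` are illustrative assumptions; only the rule that a lane's output reaches effects[] through the parent's Critic comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class LaneEnvelope:
    """Typed completion envelope a lane hands back to its parent (hypothetical shape)."""
    lane_id: str
    proposed_effects: Tuple[str, ...]

@dataclass
class ParentRun:
    effects: List[str] = field(default_factory=list)

    def merge(self, envelope: LaneEnvelope,
              critic_accepts: Callable[[LaneEnvelope], bool]) -> bool:
        # The lane never touches self.effects; only an accepted envelope is applied.
        if not critic_accepts(envelope):
            return False
        self.effects.extend(envelope.proposed_effects)
        return True
```

The envelope is frozen so that a lane cannot alter its proposal after the parent has started judging it.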
Durable background sessions
Hour-scale workflows run as durable sessions that persist across process restarts.
Session contract
Every mode: "long_running" request materializes a Session:
{
  "session_id": "sess_inv_22",
  "tenant_id": "tenant_acme_prod",
  "pack_pin": "ctxpack.finops@3.1.0",
  "snapshot_pin": "kg_2026_05_03_T0930",
  "intent": "finance.invoice.credit_adjust",
  "started_at": "2026-05-04T10:14:00Z",
  "heartbeat": { "interval_ms": 30000, "ttl_ms": 14400000, "last_seen": "2026-05-04T10:25:00Z" },
  "status": "in_progress",
  "checkpoints": [
    { "checkpoint_id": "ck_inv_22_after_s2", "after_step": "s2", "critic_verdict": "ok", "saved_at": "2026-05-04T10:14:42Z" },
    { "checkpoint_id": "ck_inv_22_after_s3", "after_step": "s3", "critic_verdict": "accept", "saved_at": "2026-05-04T10:25:11Z" }
  ],
  "next_step": "s4",
  "owner_actor": "agt_finops"
}
Lifecycle states
| State | Meaning | Transitions |
|---|---|---|
| in_progress | actively executing | → awaiting_gate, paused, completed, failed |
| awaiting_gate | blocked on an approval gate | → in_progress (on approve), rejected (on deny), expired (gate TTL) |
| paused | operator-initiated pause | → in_progress, cancelled |
| completed | terminal success | (terminal) |
| failed | terminal failure | (terminal) |
| expired | gate or session TTL exceeded | (terminal) |
| rejected / cancelled | terminal denials | (terminal) |
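The table translates directly into a transition map that rejects illegal moves. This sketch encodes the Transitions column exactly as written; the table's meaning for expired also mentions session TTL, but no in_progress → expired edge is listed, so none is assumed here. `TRANSITIONS` and `transition` are hypothetical names.

```python
TRANSITIONS = {
    "in_progress": {"awaiting_gate", "paused", "completed", "failed"},
    "awaiting_gate": {"in_progress", "rejected", "expired"},
    "paused": {"in_progress", "cancelled"},
    # Terminal states have no outgoing transitions.
    "completed": set(),
    "failed": set(),
    "expired": set(),
    "rejected": set(),
    "cancelled": set(),
}

def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the table does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

def is_terminal(state: str) -> bool:
    return not TRANSITIONS[state]
```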
Spawn
- Triggered by invokeAgent with mode: "long_running" and runtime.max_session_duration_ms ≥ 1 hour.
- Conversation Manager validates the envelope, materializes the Run Context, and persists the initial session row before any plane runs.
- Planner picks up the session and emits the first Plan against the pinned pack + snapshot.
Checkpoint
- A checkpoint is persisted after every Critic verdict (ok / accept / replan).
- Checkpoint payload includes: plan_id, accumulated tool_transcripts[] IDs, accumulated evidence_refs[], RunBudget accumulators (atomic snapshot), Critic step scores, and the next-step pointer.
- Checkpoint storage is append-only and content-addressed; checkpoints are immutable.
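One way to get append-only, content-addressed, immutable checkpoints is to key each payload by a hash of its canonical serialization. The sketch below assumes SHA-256 over sorted-key JSON and an in-memory store; `CheckpointStore` and the payload field names are illustrative, not the runtime's schema.

```python
import hashlib
import json
from typing import Dict, List

class CheckpointStore:
    """In-memory stand-in for an append-only, content-addressed checkpoint log."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, bytes] = {}
        self._order: List[str] = []

    def append(self, payload: dict) -> str:
        # Canonical form: sorted keys and fixed separators, so equal payloads hash equally.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
        digest = hashlib.sha256(canonical).hexdigest()
        if digest not in self._by_digest:
            # Existing entries are never overwritten, which keeps checkpoints immutable.
            self._by_digest[digest] = canonical
            self._order.append(digest)
        return digest

    def latest(self) -> dict:
        return json.loads(self._by_digest[self._order[-1]])
```

The address is derived from content, so writing the same checkpoint twice is a no-op, and any change to a payload produces a new entry instead of mutating an old one.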
Resume
- A resumable session is reconstituted by session_id against the recorded pack_pin + snapshot_pin. Resume is refused if either has advanced beyond the recorded version (pack_version_mismatch / snapshot_version_mismatch).
- Resume restores the Run Context’s atomic budgets to their checkpointed values; remaining budget = original − used.
- Resume executes from next_step against the most recent checkpoint and never re-executes already-recorded tool calls (idempotency keys absorb any duplicate calls that escape).
- Operator interruptions enqueue a resume; no work is lost.
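The resume checks can be sketched as a single function. Assumptions are labeled: `resume`, `ResumeRefused`, and the budget dictionaries are hypothetical, and this sketch refuses on any pin difference, where the text only requires refusal when a pin has advanced.

```python
from typing import Dict

class ResumeRefused(Exception):
    """Raised with the mismatch code when a session cannot be resumed."""

def resume(session: Dict, current_pack: str, current_snapshot: str,
           original_budget: Dict[str, float], used_budget: Dict[str, float]) -> Dict:
    # Pin checks come first: no work is scheduled against a drifted pack or snapshot.
    if current_pack != session["pack_pin"]:
        raise ResumeRefused("pack_version_mismatch")
    if current_snapshot != session["snapshot_pin"]:
        raise ResumeRefused("snapshot_version_mismatch")
    # remaining budget = original - used, per accumulator.
    remaining = {name: original_budget[name] - used_budget.get(name, 0)
                 for name in original_budget}
    return {"next_step": session["next_step"], "budget_remaining": remaining}
```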
Heartbeat and TTL
- The session heartbeat is renewed by the Critic on every verdict; loss of heartbeat for 2 × interval_ms flips the session to paused.
- Session TTL (max_session_duration_ms) is hard; on expiry the runtime emits Session expired and persists a terminal checkpoint.
- Approval gates have their own TTL; gate expiry sets the session to expired independently of the session TTL.
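A watchdog applying the first two rules could be as small as the function below. It is a sketch with assumed names and millisecond clock values passed in by the caller; whether the thresholds are inclusive is not stated in this page, so `>=` is a choice made here. Gate TTL is left out because it is tracked separately.

```python
def watchdog_status(now_ms: int, started_ms: int, last_seen_ms: int,
                    interval_ms: int, max_session_duration_ms: int,
                    status: str = "in_progress") -> str:
    """Return the status the session should have at now_ms."""
    # The session TTL is hard, so it is checked before anything else.
    if now_ms - started_ms >= max_session_duration_ms:
        return "expired"
    # Two missed heartbeat intervals pause an otherwise running session.
    if status == "in_progress" and now_ms - last_seen_ms >= 2 * interval_ms:
        return "paused"
    return status
```

With the 30000 ms interval from the session example, a session is paused once 60000 ms pass without a Critic verdict renewing the heartbeat.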
Progress envelopes
Sessions emit periodic typed progress envelopes consumable by external orchestrators:
{
  "session_id": "sess_inv_22",
  "progress_id": "prog_inv_22_0014",
  "emitted_at": "2026-05-04T10:25:11Z",
  "current_step": "s4",
  "completed_step_count": 3,
  "total_step_count": 4,
  "budget_remaining": { "tokens": 7280, "tool_calls": 8, "cost_cents": 24.0, "wall_clock_ms": 12830 },
  "awaiting_gate": null
}
Failure modes specific to background sessions
- pack_version_mismatch on resume: pack advanced after suspend; resume refused; operator decides to migrate or terminate.
- snapshot_version_mismatch on resume: knowledge-graph snapshot advanced; resume refused; same operator decision.
- Heartbeat loss without operator pause: runtime watchdog flips session to paused; auto-resume permitted only if operator approves.
- Checkpoint write fails: runtime aborts the Critic verdict and emits failed; no partial commit.
- Long-running session over budget: atomic budget exhaustion mid-step produces escalate; operator may extend Run Budget on resume.
Replay
Any past run can be replayed against:
- the recorded CompiledContext,
- the recorded Plan,
- the recorded tool transcripts,
- the policy, evaluator, and model-profile versions active during the run,
- the pinned Knowledge Graph, memory, and registry snapshots,
to re-derive the Critic verdict and the DecisionRecord without paying live tool cost. This is the substrate for evaluation, regression testing, and post-incident investigation.
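As an illustration of replay, the sketch below feeds recorded transcripts to a Critic function and compares the result with the recorded verdict, without calling any tool. The record layout (`plan`, `tool_transcripts`, `verdict`) and the `replay` helper are assumptions made for this example; pinning of policy, evaluator, and snapshot versions is left out.

```python
from typing import Callable, Dict, List

def replay(record: Dict, critic: Callable[[Dict, List[Dict]], str]) -> Dict:
    """Re-derive a verdict from a recorded run using only stored artifacts."""
    plan = record["plan"]
    by_step = {t["step_id"]: t for t in record["tool_transcripts"]}
    missing = [s["id"] for s in plan["steps"] if s["id"] not in by_step]
    if missing:
        # Replay never falls back to live tools when a transcript is absent.
        raise KeyError(f"no recorded transcript for steps: {missing}")
    ordered = [by_step[s["id"]] for s in plan["steps"]]
    verdict = critic(plan, ordered)
    return {"verdict": verdict, "matches_recorded": verdict == record["verdict"]}
```

A regression test can then assert that `matches_recorded` stays true after a policy or evaluator change that is not supposed to alter verdicts.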
Verification and guardrails
- Tool allow-lists: only permitted tools and arguments may appear in a plan.
- Evidence checks: decisions require resolved evidence_refs before execution.
- Approval gates: steps are blocked until approved, with a frozen evidence snapshot.
- Conflict detection: plans that violate policy invariants are rejected before execution.
- Loop guard: repeated identical tool calls or no-progress reflection cycles short-circuit with a structured reason.
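The repeated-call half of the loop guard can be sketched with a counter keyed on the tool name and its canonicalized arguments. `LoopGuard`, the threshold of three, and the shape of the structured reason are assumptions; no-progress reflection cycles are not modeled.

```python
import json
from collections import Counter
from typing import Optional

class LoopGuard:
    def __init__(self, max_identical_calls: int = 3) -> None:
        self.max_identical_calls = max_identical_calls
        self._seen: Counter = Counter()

    def check(self, tool: str, params: dict) -> Optional[dict]:
        """Record a call; return a structured reason once the guard trips, else None."""
        # Sorted-key JSON makes argument order irrelevant to the identity of a call.
        key = (tool, json.dumps(params, sort_keys=True))
        self._seen[key] += 1
        if self._seen[key] > self.max_identical_calls:
            return {
                "tripped": True,
                "reason": "repeated_identical_tool_call",
                "tool": tool,
                "count": self._seen[key],
            }
        return None
```

Returning a typed reason, not just a boolean, is what keeps a trip from being silent.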
Autotune surfaces
Decision-plane autotune is higher risk than Context-plane tuning because it changes how work is sequenced. It is still useful when bounded to explicit orchestration surfaces and scored by replay.
| Surface | Candidate examples | Guardrail |
|---|---|---|
| Planner templates | change step ordering, add a verification step before an action, prefer a known Skill | Plan verification pass rate and utility must improve without adding risky tool calls. |
| Re-plan budgets | lower or raise max re-plan attempts for one intent class | Loop-guard trips and escalation quality cannot regress. |
| Critic rubrics | tighten evidence sufficiency, adjust retry vs. escalate thresholds | Policy and safety are floor constraints; replay must reproduce typed verdicts. |
| Subagent lanes | fan-out limit, lane timeout, lane eligibility by risk class | Parent effects remain immutable until the parent Critic accepts lane output. |
| Background sessions | checkpoint cadence, progress envelope frequency, resume policy | Resume must use the pinned pack and snapshot from session start. |
The optimizer cannot propose direct tool execution, approval bypasses, or untyped free-form plans. A candidate that changes the loop must emit a replayable Plan, step transcript expectations, and a DecisionRecord diff for review.
Execution model
- Idempotency keys prevent duplicate side effects on retries.
- Retry budgets are bounded per step and per workflow.
- Timeouts and fallbacks degrade to human-in-the-loop (HITL) handling when tools fail.
- Compensation steps are optional rollbacks for reversible tools.
- Parallel steps within a plan use the Executor’s lane abstraction.
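The first two points, idempotency keys and bounded retries, combine as in the sketch below. `Gateway` is a hypothetical in-memory stand-in for the Tool Gateway, and the key derivation (SHA-256 over run, step, and arguments) is an assumption; a real gateway would need the deduplication to be durable and enforced at the tool boundary.

```python
import hashlib
import json
from typing import Any, Callable, Dict

def idempotency_key(run_id: str, step_id: str, params: dict) -> str:
    raw = json.dumps([run_id, step_id, params], sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

class Gateway:
    """Hypothetical gateway that returns the stored result for a repeated key."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def call(self, key: str, fn: Callable[[], Any]) -> Any:
        if key not in self._results:
            self._results[key] = fn()       # the side effect happens at most once per key
        return self._results[key]

def execute_with_retries(gateway: Gateway, key: str, fn: Callable[[], Any],
                         max_retries: int = 2) -> Any:
    last_error = None
    for _ in range(max_retries + 1):        # one attempt plus a bounded number of retries
        try:
            return gateway.call(key, fn)
        except Exception as exc:            # broad catch is acceptable only in a sketch
            last_error = exc
    # Budget exhausted: surface the failure so the step can degrade to a human handoff.
    raise RuntimeError("retry budget exhausted") from last_error
```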
Error handling and recovery
- Step failure isolates the failing step and returns control to the Planner under the re-plan budget.
- Partial execution records progress in the Run Context and prevents duplicate actions on resume.
- Human handoff routes to approval or operator with the full Plan, transcript, and verdict so far.
Implementation mapping
Orchestration is implemented primarily by:
- Orchestrator (planning, verification, execution control, subagent lanes)
- Decision Catalog (typed decision validation)
- Policy Engine (approval gates and constraints)
Interfaces
Inputs
- CompiledContext from the Cognitive Core
- Run Context (with budgets and approval-mode authority)
- Decision Spec from the Decision Catalog
- Tool registry with declared approval modes
Outputs
- Typed Plan and step transcripts
- DecisionRecord with evidence_refs, approvals, controls_active
- Memory write proposals
- Trace bundle
Failure modes
- Plan contains disallowed tools or missing approvals (caught by the Critic at verify).
- Tool retries cause duplicate side effects (mitigated by idempotency keys).
- Missing evidence for claims (rejected at the decision checkpoint).
- Subagent lane modifies parent effects (must remain isolated).
- Background session resumes against a different Context Pack version than it was suspended on.
- Loop guard trips silently without a structured reason.
Operational concerns
- Planner latency vs. template coverage tradeoff per intent.
- Re-plan budgets per workflow; default 2, raised only with rationale.
- Verification cost and strictness scale with risk tier.
- Workflow timeouts and SLA enforcement at the Run Context boundary.
- Subagent lane fan-out limits to bound cost and latency.
- Background session checkpoint storage and TTL by tenant.
Evaluation metrics
- Plan-verification pass rate.
- Step completion rate.
- Retry and recovery rate.
- Escalation rate by risk tier.
- Subagent lane success rate vs. parent re-plan rate.
- Mean time to safe completion.
Example
A condensed Critic verdict envelope:
{
  "run_id": "run_a1b2c3d4e5f60718",
  "plan_id": "plan_refund_01",
  "verdict": "accept",
  "step_scores": [
    { "step_id": "s1", "scores": { "policy": 1.0, "utility": 1.0 } },
    { "step_id": "s2", "scores": { "policy": 1.0, "utility": 1.0 } },
    { "step_id": "s3", "scores": { "policy": 1.0, "utility": 0.95 }, "approval_gate": "GATE_FINANCE_APPROVAL", "approver": "user_finance_lead" }
  ],
  "loop_guard_trips": 0,
  "replan_attempts": 0
}
Common misconceptions
- Orchestration is not a workflow engine alone. It includes verification and scoring gates that traditional engines lack.
- Guardrails are not optional. They are part of the runtime loop, not a release-gate add-on.
- Subagent lanes are not free parallelism. They consume budget from the parent Run Context’s accumulators.
- Replay is not optional. It is the property that makes evaluation honest.