A friend at a healthcare company sent me a screenshot last month. Their support agent had quoted a refund window of 90 days to a customer. The actual policy is 14. The customer screenshotted the chat and posted it on X. The team spent two days reconstructing what happened.
The story they pieced together had nothing to do with the model. The wiki said 90. The internal policy KB said 14. The wiki had been migrated last quarter and nobody updated the agent’s retrieval source. The model behaved exactly as instructed; the instructions were wrong.
This is the kind of incident that does not show up in benchmark scores. It comes from a missing seam in the system — a place where one team’s responsibility was supposed to end and another’s begin, and the absence of that line let stale data travel into production.
The decomposition I find most useful for catching these seams uses five planes. None of them is novel; what helps is committing to all five, with named contracts at every boundary.
2026 update: the planes are operating boundaries
The five-plane model works only when each plane owns artifacts the others can inspect. Intelligence owns promoted facts and identity. Context owns compiled manifests. Decision owns the plan and verdict. Action owns tool envelopes. Trust owns policy decisions, approvals, evaluator gates, and audit controls.
That artifact discipline is what turns an architecture diagram into an operating model. If a production incident cannot be assigned to one plane and one contract, the planes are decorative. If it can, the fix usually becomes a pack change, policy rule, tool manifest update, reviewer rule, or StrategyRule instead of another prompt patch.
The five, and why two of them get folded together
The slowest-moving layer is Intelligence. It holds your ontology, your knowledge graph, the identity model, and the memory that survives sessions. New facts land here. New schema lands here. It changes under change-control.
The Decision plane is the bounded loop everyone has seen by now: a planner proposes, a critic verifies, an executor runs steps, the critic scores them, and the loop either commits or replans. The output of this plane, in the architecture I’d recommend, is not a chat reply — it is a typed DecisionRecord that names the verdict, the evidence it relied on, the policies it satisfied, and the approvals it collected.
Between those two there is a third plane that most teams fold into one of its neighbors, which is where the trouble starts. I call it the Context plane. It is the compiler that takes the slow substrate (Intelligence) and the per-request reality (intent, claims, budgets, the policy bundle in effect right now) and produces a typed CompiledContext for the Decision plane to consume. This is the bucket of work that includes “which retrieval source wins?”, “which policy version was in effect?”, “which tools should the model even see?”. Folded into Intelligence, it disappears into retrieval pipelines and you lose the per-request reasoning. Folded into Decision, it disappears into prompt templates and you lose versioning. Either way, the seam in my friend’s incident never gets named.
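To make the seam concrete, here is a minimal sketch of the conflict-resolution step a Context compiler owes you. The names `Fact`, `compile_context`, and `source_priority` are illustrative, not an API from the text; the point is only that "which retrieval source wins?" becomes a deterministic, versionable rule instead of an accident of retrieval order.

```python
from dataclasses import dataclass

# Hypothetical fact record: each retrieval source tags the claims it emits.
@dataclass(frozen=True)
class Fact:
    source: str
    key: str
    value: str

def compile_context(facts: list[Fact], source_priority: list[str]) -> dict[str, Fact]:
    """Resolve conflicting facts deterministically: the highest-priority
    source wins, regardless of the order facts arrived in."""
    rank = {src: i for i, src in enumerate(source_priority)}
    compiled: dict[str, Fact] = {}
    for fact in sorted(facts, key=lambda f: rank.get(f.source, len(rank))):
        compiled.setdefault(fact.key, fact)  # first seen = highest priority
    return compiled

facts = [
    Fact("wiki", "refund_window_days", "90"),       # stale migration artifact
    Fact("policy_kb", "refund_window_days", "14"),  # declared source of truth
]
winner = compile_context(facts, source_priority=["policy_kb", "wiki"])
# winner["refund_window_days"].value == "14", in any input order
```

With a rule like this in the pack, the wiki's 90-day claim loses at compile time, before the model ever sees it.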
The fourth plane is Action: the only path through which the Decision plane can produce side effects in the world. The pattern that makes this work is a single Tool Gateway — there is exactly one place that mediates identity, validates schemas, exchanges credentials, enforces approval modes, and emits the trace. Anything that bypasses the Gateway is, by construction, ungoverned. ContextOS pushes adapters (MCP, OpenAPI, A2A peers, internal functions) behind this single boundary; see the Adapter Mesh.
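A minimal sketch of that single choke point, assuming a registry shape and handler names of my own invention (real adapters, credential exchange, and policy engines are elided):

```python
from dataclasses import dataclass
from enum import Enum

class ApprovalMode(Enum):  # the five canonical tiers named later in the text
    READ_ONLY = "read_only"
    LOCAL_WRITE = "local_write"
    NETWORK = "network"
    DELEGATED = "delegated"
    DESTRUCTIVE = "destructive"

@dataclass
class ToolCall:
    tool: str
    args: dict

class ToolGateway:
    """Illustrative single boundary: every side effect passes here or not at all."""
    def __init__(self, registry):  # tool name -> (required_mode, handler)
        self.registry = registry
        self.trace: list[str] = []

    def execute(self, call: ToolCall, granted: set[ApprovalMode]):
        if call.tool not in self.registry:
            raise PermissionError(f"unsurfaced tool: {call.tool}")
        required, handler = self.registry[call.tool]
        if required not in granted:
            raise PermissionError(f"{call.tool} requires {required.value}")
        self.trace.append(f"call {call.tool}")  # the trace is emitted here, not by the model
        return handler(**call.args)

def lookup_order(order_id):
    return {"order_id": order_id, "status": "shipped"}

def issue_refund(order_id, amount):
    return {"refunded": amount}

gateway = ToolGateway({
    "lookup_order": (ApprovalMode.READ_ONLY, lookup_order),
    "issue_refund": (ApprovalMode.DESTRUCTIVE, issue_refund),
})
```

With only `read_only` granted, a proposed refund fails at the gateway no matter how confidently the model proposed it.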
The fifth, Trust, is not a peer of the others. It sits over them. Policy lives outside agent code; the runtime evaluates it deterministically at every plane boundary. The model is not the security boundary. This sounds obvious until you find a team whose “approval logic” is a sentence in the system prompt.
Why this is more than a Venn diagram
The reason this decomposition pays off is operational, not aesthetic. When you have five planes with named contracts, regressions localize.
A drop in Utility that follows a pack release isolates to Context. A spike in denied tool results isolates to Trust. A loop-guard breach with a normal pack and a stable model isolates to Decision. A new family of injection attacks succeeds against you only if Trust failed to surface its constraint into the Compiler — which is, again, a seam violation.
Without the seams, every regression looks like “the AI got worse.” Hard to fix that.
The shared primitives that travel across the planes are worth committing to memory:
- `RunContext` — the state container with `tenant_id`, the user delegation, the agent workload identity, the intent, the budget, and the trace id. Every envelope carries it or a reference to it.
- `ContextPack` → `CompiledContext` — the input and output of the Context plane.
- `DecisionRecord` — the output of the Decision plane.
- `ApprovalMode` — five canonical tiers (`read_only`, `local_write`, `network`, `delegated`, `destructive`) bound at wire-time on every adapter capability.
- `ToolEnvelope` — `toolCall` and `toolResult`, the only shape side effects take.
If a primitive is named, it is a contract. If a contract is broken, the runtime owes an explanation in the Decision Record — not in a comment in the code.
Why the Context plane deserves its own name
Most architecture diagrams show retrieval as part of the data layer and prompt construction as part of the agent layer. That worked for chatbots. It does not work for agent systems where the same intent runs across a dozen tenants, with different policies in effect, against different evidence snapshots.
When the Context plane is its own plane, four things become possible.
The pack is versionable. `ctxpack.support@5.2.0` is a real artifact you can sign, pin, and reference. The pack your agent ran against last Tuesday is reconstructible because its version is recorded in lineage, not because someone kept a screenshot.
The pack is testable. Golden sets run against a pinned pack; a release candidate’s scorecard delta tells you whether the new version regressed before it ships. This is the loop my friend’s team did not have.
The pack is replayable. The Compiler is deterministic given the same inputs, which is the entire reason replay-as-IR works. (Replay is its own topic — see Replay Is the Real Audit Log — but the substrate for it lives here.)
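The determinism claim is checkable with nothing fancier than a content hash. A toy sketch (the compiler here is a stand-in; only the fingerprinting idea is the point):

```python
import hashlib
import json

def compile_pack(pack: dict, request: dict) -> dict:
    """Toy compiler: output is a pure function of its inputs."""
    return {
        "pack_version": pack["version"],
        "intent": request["intent"],
        "facts": sorted(pack["facts"]),  # canonical ordering, no set/dict nondeterminism
    }

def fingerprint(compiled: dict) -> str:
    # Canonical JSON (sorted keys) so equal content always hashes equally.
    blob = json.dumps(compiled, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

Record the fingerprint in lineage at run time, recompute it at replay time, and an inequality tells you immediately that the inputs (or the compiler) changed.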
The pack is enforceable. Tool surfaces, redaction rules, and approval gates get baked into the manifest at compile time. The model sees the surfaced tool set; it cannot see, and therefore cannot call, anything else. That is a structural defense against indirect prompt injection, and it lives in the Context plane.
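Tool surfacing as a structural defense can be sketched in a few lines. The catalog shape and tool names below are invented for illustration:

```python
def surface_tools(catalog: dict, intent: str, policy_allow: set) -> dict:
    """Compile-time tool surfacing: the manifest the model sees contains only
    tools matched to the intent AND permitted by policy. A tool that was never
    surfaced cannot be named into use by injected text."""
    return {
        name: spec
        for name, spec in catalog.items()
        if intent in spec["intents"] and name in policy_allow
    }

catalog = {
    "lookup_order":   {"intents": {"support"}},
    "issue_refund":   {"intents": {"support"}},
    "delete_account": {"intents": {"admin"}},
}
surfaced = surface_tools(catalog, "support", {"lookup_order"})
# "issue_refund" is intent-matched but not policy-allowed; "delete_account"
# is not intent-matched. Only "lookup_order" survives into the manifest.
```

An injected instruction saying "now call delete_account" fails not because the model resisted it, but because the capability does not exist in its world.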
Why Trust sits over, not next to
A common diagram puts Decision, Trust, and Observability on the same row, with arrows in both directions. It survives a whiteboard. It does not survive an audit.
In production, Trust appears at every boundary. The Compiler embeds Trust-plane controls into manifests (runtime_controls.must_refuse, redaction_rules_active, approval_gates_active). The Critic re-verifies obligations before any plan executes. The Tool Gateway re-evaluates policy at execute time, regardless of what the model proposed. The promotion gate on memory writes is itself a Trust-plane check. The audit is hash-chained, signed, and bound to the same trace_id carried by every other plane.
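The manifest-level check can be as dumb as set difference, which is exactly why it belongs in the runtime rather than the prompt. A sketch, assuming the control keys named above and a `runtime_controls` mapping of my own shape:

```python
REQUIRED_CONTROLS = {"must_refuse", "redaction_rules_active", "approval_gates_active"}

def verify_controls(manifest: dict) -> None:
    """Deterministic Trust-plane gate: reject any compiled manifest missing a
    required control, before the plan ever reaches the Tool Gateway."""
    present = set(manifest.get("runtime_controls", {}))
    missing = REQUIRED_CONTROLS - present
    if missing:
        raise RuntimeError(f"trust controls missing: {sorted(missing)}")
```

The check runs the same way on every request, regardless of what the model generated — which is the property "deterministic at every plane boundary" is asking for.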
If you draw Trust as a peer, you eventually find someone in the Decision plane “interpreting” a policy. The right diagram has Trust as a regime — present at every arrow.
What each plane owns, in plain terms
| Plane | Owns | Does NOT own |
|---|---|---|
| Intelligence | facts, embeddings, memory promotion, identity | the prompt; the policy |
| Context | the pack, the buckets, redaction-at-compile, tool surfacing | the verdict; long-term storage |
| Decision | the verdict, the plan, the loop guard, the budget | the side effect; the policy decision |
| Action | the call, the credential exchange, the trace, idempotency | what to call; whether the call should happen |
| Trust | what is allowed, by whom, with what evidence | how the model thinks |
The “does not own” column is the more useful one. If your Decision plane is making policy decisions, that’s a seam violation. If your Action plane is choosing which tool to call, the seam between Decision and Action has eroded. Most production bugs in agent systems live at exactly these violations.
Plane health checklist
| Plane | Health signal |
|---|---|
| Intelligence | Facts have owners, classifications, promotion state, contradiction status, and retirement policy. |
| Context | Packs compile deterministically with bounded buckets, source priority, and stage-level diagnostics. |
| Decision | Plans are verified, loop-guarded, scored, and emitted as typed Decision Records. |
| Action | Every effect passes through the Tool Gateway with schema validation, identity, idempotency, and trace context. |
| Trust | Policy, approvals, reviewer verdicts, redaction, replay, and scorecards are enforced outside model text. |
The canonical loop, in one place
It helps to see the loop with the planes annotated:
```
invokeAgent(request_envelope, run_context)
  → compile(packs, request, run_context) → CompiledContext        // Context
  → loop {
      planner(CompiledContext) → Plan                             // Decision
      critic.verify(Plan) → ok | replan | reject
      executor(Plan, ToolGateway) → step_results                  // Action
      critic.score(step_results) → accept | retry | replan | escalate
      consolidate(effects, evidence) → memory_proposals           // Intelligence
    }
  → DecisionRecord(evidence_refs, approvals, controls_active, trace_id)
```

Trust is not in the diagram because it is the diagram. Every arrow is a deterministic check it owns.
Back to the wiki
If my friend’s team had run this architecture, two things would have changed. The pack for the support intent would have declared a single source of truth for refund policy, with a priority rule resolving any conflict, and the wiki would not have been in scope. And the pack version they ran against would be recorded on the Decision Record for that conversation, so the recovery would have started with a trace_id and ended with a one-line git log on the policy bundle, instead of two days of forensics.
That is the kind of bug the five-plane model is designed to prevent. Not the impressive ones. The boring ones. The ones that compound.
If your architecture diagram has fewer planes than this, you are probably folding two of them together. The seam will show up — usually as your hardest production bug.