Reference Architecture
The canonical five-plane ContextOS architecture: runtime topology, contracts, plane boundaries, trust invariants, execution flow, standards alignment, and implementation checklist.
Executive summary
ContextOS is the governed decision runtime for production AI agents. It does not try to make a model deterministic. It makes the system around the model deterministic where production systems need determinism: context selection, model invocation, provider routing, tool exposure, policy enforcement, approval routing, trace propagation, evidence capture, decision records, replay, and improvement promotion.
The architecture decomposes the runtime into five planes:
| Plane | Responsibility | Output artifact |
|---|---|---|
| Intelligence | Shared meaning: ontology, identity, graph, memory, evidence | evidence refs, promoted memory, pinned snapshots |
| Context | Per-request compilation of bounded state | CompiledContext |
| Decision | Bounded Planner / Executor / Critic loop | Plan, verdicts, DecisionRecord |
| Action | Governed external effects through adapters | ToolEnvelope, tool transcripts |
| Trust | Policy, identity, approvals, evaluation, observability, replay | controls, approvals, scorecards, replay packets |
The model sits inside this architecture. It is not the architecture. Model calls cross the AI Gateway / LLM Router; external effects still cross the Tool Gateway.
The same plane boundaries also define the improvement surface. Autotune and reviewer agents may search over harness variants, but only within the fields each plane declares tunable:
| Plane | Tunable in architecture | Release invariant |
|---|---|---|
| Intelligence | source mappings, retrieval constraints, ontology additions, memory promotion candidates | pinned snapshots, provenance, CEID stability, classification rules |
| Context | retrieval settings, bucket budgets, compression, prompt fragments | complete manifests, required evidence, redaction, tool eligibility |
| Decision | planner templates, Critic rubrics, re-plan budgets, lane limits | typed plans, approval gates, loop guards, replayable verdicts |
| Action | adapter retries, circuit breakers, cached read-only aliases, compatible adapter routing | schemas, approval-mode maxima, credentials, idempotency |
| Trust | evaluator thresholds, sampling, replay sets, rollout gates | safety/policy floors, human approval, append-only audit |
This is why improvement is a control-plane concern, not a model-side feature. A candidate can be generated automatically; promotion is still a governed release.
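As an illustration of how a promotion gate might enforce this, the sketch below flags a candidate that touches any field its plane does not declare tunable. The keys mirror two rows of the table above; the dictionary layout and the function name untunable_changes are assumptions made for this example, not ContextOS APIs.

```python
# Hypothetical sketch: keys mirror two rows of the tunable-fields table;
# the data layout and function name are assumptions, not a ContextOS API.
TUNABLE_FIELDS = {
    "context": frozenset(
        {"retrieval_settings", "bucket_budgets", "compression", "prompt_fragments"}
    ),
    "decision": frozenset(
        {"planner_templates", "critic_rubrics", "replan_budgets", "lane_limits"}
    ),
}


def untunable_changes(plane: str, changed_fields: set) -> set:
    """Return the changed fields that `plane` does not declare tunable."""
    return set(changed_fields) - TUNABLE_FIELDS.get(plane, frozenset())


# A candidate that edits a bucket budget (tunable) and a redaction rule
# (a release invariant in the table above).
candidate = {"bucket_budgets", "redaction_rules"}
violations = untunable_changes("context", candidate)
# A non-empty result means the candidate must not be promoted as-is.
```

An unknown plane yields every changed field as a violation, so the check fails closed.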
What this architecture is for
This reference is written for platform teams building or evaluating a ContextOS runtime. It answers six questions:
| Question | Architecture answer |
|---|---|
| What owns business meaning? | Intelligence plane. Ontology, CEIDs, knowledge snapshots, and promoted memory. |
| What decides what the model sees? | Context plane. Context Pack Compiler, manifests, runtime controls, and budget report. |
| What decides what happens next? | Decision plane. Planner proposes, Critic verifies, Executor runs approved steps. |
| What calls model providers? | AI Gateway / LLM Router. Provider-neutral calls, governed route selection, fallback, and token/cost telemetry. |
| What touches external systems? | Action plane. Every effect crosses the Tool Gateway. |
| What makes this governable? | Trust plane. Policy, identity, approval gates, scorecards, traces, replay, and release gates. |
This page is not a tutorial. For a step-by-step MVP, start with Quickstart. For one request end-to-end, read How It Works.
Architectural thesis
Most failed agent architectures collapse four concerns into one prompt: context, reasoning, tools, and governance. That works for demos and fails in production because the organization cannot answer:
- Which facts did the agent rely on?
- Which policies were evaluated?
- Which tools were exposed, and why?
- Which human or system identity authorized the action?
- What exactly would happen if we replay the run?
- Which change caused a regression?
ContextOS separates those concerns into typed planes and turns every important boundary into a contract.
High-level architecture
TRUST PLANE
Policy Engine | Identity | Approvals | Evaluators | OTEL | Replay | Improvement
--------------------------------------------------------------------------------
ACTION PLANE
Tool Gateway | MCP | A2A | OpenAPI | custom adapters | idempotency
--------------------------------------------------------------------------------
DECISION PLANE
Planner | Executor | Critic | AI Gateway | LLM Router | sessions | Decision Catalog
--------------------------------------------------------------------------------
CONTEXT PLANE
Context Pack | Compiler | token budgets | manifests | runtime controls
--------------------------------------------------------------------------------
INTELLIGENCE PLANE
Ontology | Identity Layer | Knowledge Graph | GraphRAG | Memory Fabric

The planes are stacked by dependency direction. Higher planes may constrain lower-plane behavior; lower planes must not bypass higher-plane controls. For example, an adapter cannot self-authorize a destructive action, and a Planner cannot introduce a tool that the Compiler did not surface.
The three ledgers
A production agent run should leave three ledgers. If any ledger is missing, the run is not auditable.
| Ledger | What it records | Primary owner |
|---|---|---|
| Context ledger | Pack version, graph snapshot, retrieved evidence, memory recall, policy and tool manifests, budget truncations | Context plane |
| Effect ledger | Tool calls, tool results, credentials used, approval gates, idempotency keys, side-effect status | Action + Trust planes |
| Decision ledger | DecisionSpec binding, outcome, evidence refs, policy decisions, approvals, scorecard, replay pointer | Decision + Trust planes |
The DecisionRecord indexes all three. It is the durable audit artifact, not a decorative log line.
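A minimal sketch of the auditability rule, assuming a DecisionRecord carries one reference per ledger. The class LedgerRefs and its field names are hypothetical; the source only states that the record indexes all three ledgers.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class LedgerRefs:
    """Pointers a DecisionRecord might hold; field names are hypothetical."""
    context_ledger: Optional[str] = None
    effect_ledger: Optional[str] = None
    decision_ledger: Optional[str] = None


def missing_ledgers(refs: LedgerRefs) -> List[str]:
    """List the ledgers that have no reference, in declaration order."""
    return [f.name for f in fields(refs) if not getattr(refs, f.name)]


def is_auditable(refs: LedgerRefs) -> bool:
    """A run is auditable only when all three ledgers are present."""
    return not missing_ledgers(refs)
```

For example, a record with context and decision references but no effect reference reports effect_ledger as missing and is not auditable.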
Core invariants
These invariants are more important than any individual component implementation.
| Invariant | Production implication |
|---|---|
| Context is compiled, not hand-assembled. | Prompt text is an output of the Compiler, not the runtime contract. |
| Model calls cross the AI Gateway. | Provider choice, fallback, redaction, residency, token/cost telemetry, and route audit stay outside Planner and Critic code. |
| Tools are surfaced, not discovered ad hoc by the model. | The Tool Gateway only accepts tools present in the tool_manifest. |
| Policy is evaluated outside agent code. | The model may propose; the boundary decides. |
| Evidence precedes governed action. | required_evidence must resolve before network, delegated, or destructive effects. |
| Identity is dual. | Human delegation and agent workload identity travel together. |
| Budgets are enforced by the runtime. | Token, cost, wall-clock, tool-call, and replan limits are typed controls. |
| Memory is promoted before reuse. | Captured observations cannot silently become future context. |
| Replay is designed in. | Pack, request, policy, snapshot, model profile, route decisions, tool transcripts, and evaluator set are pinned. |
| Improvement is gated. | Corrections become proposals that pass replay and review before promotion. |
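Two of these invariants (tools are surfaced, and evidence precedes governed action) can be sketched as a single admission check. This is an illustration only: the function admit_tool_call, its return shape, and the ordering of approval modes by ascending risk are assumptions, and the sketch does not model approvals or policy evaluation.

```python
from enum import IntEnum


class ApprovalMode(IntEnum):
    # Values come from the ApprovalMode contract; ordering them by
    # ascending risk is an assumption of this sketch.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


def admit_tool_call(capability_id, mode, tool_manifest,
                    required_evidence, resolved_evidence):
    """Return (admitted, reason) for a proposed tool call."""
    # Invariant: only tools present in the tool_manifest are accepted.
    if capability_id not in tool_manifest:
        return False, "tool_not_in_manifest"
    # Invariant: required evidence must resolve before network,
    # delegated, or destructive effects.
    if mode >= ApprovalMode.NETWORK:
        missing = set(required_evidence) - set(resolved_evidence)
        if missing:
            return False, "missing_required_evidence"
    return True, "ok"
```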
Cross-cutting contracts
RunContext
  run_id
  trace_id
  session_id
  tenant_id
  user.delegation
  agent.workload_identity
  intent
  locale
  safety_mode
  run_budget
RunBudget
  total_tokens
  bucket_tokens{business,policy,tool,evidence,memory,session}
  max_tool_calls
  max_replan_attempts
  wall_clock_ms
  max_cost_cents
  atomic_usage{tokens,tool_calls,latency_ms,cost_cents}
ApprovalMode
  read_only | local_write | network | delegated | destructive
ContextPack
  contract_meta
  pack_meta
  intelligence_refs
  business_context
  policy_layer
  tooling_layer
  decision_layer
  memory_layer
  evaluation_layer
  tone_and_comms
CompiledContext
  compiled_prompt
  manifests{policy,tool,evidence}
  runtime_controls{must_refuse,must_escalate,approval_gates_active,redaction_rules_active}
  budget_report
  context_ledger
ToolEnvelope
  ToolCallEnvelope{tool_call_id,run_id,capability_id,args,trace_id,idempotency_key,evidence_refs}
  ToolResultEnvelope{tool_call_id,capability_id,status,output,error,citations,mutations,policy_decision_id,latency_ms}
DecisionRecord
  record_id
  decision_key
  decision_version
  status
  actor
  subject_ids
  outputs
  evidence_refs
  policy_decisions
  approvals
  controls_active
  budget_usage
  trace_id
  replay_id

Contracts move across planes. Components do not reach into each other’s private state.
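To make one of these contracts concrete, here is a hedged sketch of RunBudget as a runtime-enforced control. The field names follow the listing above; the charge method, the BudgetExhausted error, and the all-or-nothing update are assumptions of this example. It omits bucket_tokens and max_replan_attempts and is not synchronized across threads.

```python
from dataclasses import dataclass, field


class BudgetExhausted(Exception):
    """Hypothetical typed error naming the limit that would be exceeded."""


@dataclass
class RunBudget:
    total_tokens: int
    max_tool_calls: int
    wall_clock_ms: int
    max_cost_cents: int
    usage: dict = field(default_factory=lambda: {
        "tokens": 0, "tool_calls": 0, "latency_ms": 0, "cost_cents": 0})

    def charge(self, **deltas: int) -> None:
        """Apply all deltas or none, so usage never lands in a partial state."""
        limits = {"tokens": self.total_tokens,
                  "tool_calls": self.max_tool_calls,
                  "latency_ms": self.wall_clock_ms,
                  "cost_cents": self.max_cost_cents}
        # First pass: validate every delta before mutating anything.
        for key, delta in deltas.items():
            if self.usage[key] + delta > limits[key]:
                raise BudgetExhausted(key)
        # Second pass: commit.
        for key, delta in deltas.items():
            self.usage[key] += delta
```

A guard like this lets the runtime turn exhaustion into a typed terminal verdict instead of relying on the model to stop.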
Canonical execution contract
invokeAgent(request_envelope, run_context)
  -> resolve pack refs, tenant, identity, intent, safety mode
  -> compile(packs, request, run_context) -> CompiledContext
  -> bind DecisionSpec
  -> route model judgment through AI Gateway / LLM Router
  -> loop {
       planner(CompiledContext) -> Plan
       critic.verify(Plan) -> ok | replan | reject | escalate
       executor(Plan, ToolGateway) -> step_results + evidence_refs
       critic.score(step_results) -> accept | retry | replan | escalate
       consolidate(effects, evidence) -> memory_proposals
     }
  -> emit DecisionRecord
  -> emit replay packet and scorecard

Every component is either a participant in this loop, a registry consulted by a participant, or an operator surface over the artifacts the loop emits.
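The loop portion of this contract can be sketched as follows. This is a simplified illustration, not the Orchestrator's implementation: the four callables stand in for the Planner, Critic, and Executor, verdicts are plain strings, and treating retry the same as replan (both consume the re-plan budget) is an assumption of the sketch.

```python
def run_bounded_loop(plan_fn, verify_fn, execute_fn, score_fn,
                     max_replan_attempts: int) -> str:
    """Run plan -> verify -> execute -> score until a terminal verdict."""
    replans = 0
    while True:
        plan = plan_fn()
        verdict = verify_fn(plan)          # ok | replan | reject | escalate
        if verdict in ("reject", "escalate"):
            return verdict
        if verdict == "ok":
            results = execute_fn(plan)
            verdict = score_fn(results)    # accept | retry | replan | escalate
            if verdict in ("accept", "escalate"):
                return verdict
        # Anything else ("replan" or "retry") costs one re-plan attempt.
        replans += 1
        if replans > max_replan_attempts:
            return "budget_exhausted"
```

The loop guard is what makes the loop bounded: with max_replan_attempts set to 2, a Critic that always answers replan ends the run after three planning passes.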
Runtime topology
ContextOS separates authoring and control concerns from hot-path execution.
| Layer | Components | Writes | Reads |
|---|---|---|---|
| Control plane | Pack registry, policy bundle registry, decision catalog, adapter registry, evaluator registry, approval configuration | signed versions, rollout state, kill switches | runtime config, release gates |
| Runtime plane | Conversation manager, Compiler, Orchestrator, Planner, Executor, Critic, AI Gateway, LLM Router, Tool Gateway | run state, route decisions, tool transcripts, DecisionRecords | active pack refs, policies, model profiles, routing rules, tools, graph snapshots |
| Intelligence substrate | ontology service, entity resolver, graph store, retrieval service, memory fabric | snapshots, evidence refs, promoted memories | compile-time evidence and recall |
| Trust and ops plane | policy engine, approval queue, trace collector, scorecard service, replay harness, improvement queue | approvals, scorecards, incidents, proposals | traces, DecisionRecords, run artifacts |
Deployment rule
Do not deploy the Compiler, AI Gateway, Tool Gateway, and Policy Engine as optional libraries inside agent code. They are platform services or platform-controlled modules because they enforce the boundary. Agent code may call them; it must not replace them.
Tool Gateway names the Action-plane pattern. Tool Manager is the concrete component implementation.
Plane responsibilities
Intelligence plane
The Intelligence plane owns durable meaning. It turns enterprise data into stable identities, typed relationships, evidence refs, and memory that can be safely reused.
| Primitive | Contract | Source |
|---|---|---|
| Ontology | versioned entity and relationship schema | Ontology |
| Identity Layer | CEIDs for audit, SIDs for ML features, workload identity for agents | Identity Layer |
| Knowledge Graph | evidence-bound graph, snapshot pinning, GraphRAG retrieval | Knowledge Graph |
| Memory Fabric | capture -> candidate -> review -> promoted memory | Memory, Memory Fabric |
Owns:
- canonical entity identity,
- evidence provenance,
- knowledge snapshots,
- memory promotion state.
Does not own:
- deciding which facts enter a specific prompt,
- authorizing external actions,
- final decision outcomes.
Context plane
The Context plane owns per-request compilation. It converts a pinned Context Pack plus RunContext into a CompiledContext envelope.
| Primitive | Contract | Source |
|---|---|---|
| Context Pack | versioned declarative contract for a workflow | Context Pack |
| ContextPackCompiler | deterministic compile pipeline | Cognitive Core |
| Token Budget Allocator | budget allocation and truncation report | Cognitive Core |
| Runtime Controls | active refusals, escalations, approval gates, redaction rules | API Contracts |
Owns:
- policy/tool/evidence manifests,
- prompt assembly,
- context budget accounting,
- truncation visibility.
Does not own:
- live tool execution,
- approval decisions,
- memory promotion.
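Budget accounting and truncation visibility can be illustrated with a small allocator. This is a sketch under assumptions: the function allocate_buckets and the shape of its report are hypothetical, and a real allocator would likely rank and compress content rather than cut by count alone.

```python
def allocate_buckets(bucket_limits: dict, requested: dict) -> dict:
    """Grant each bucket up to its limit and report every truncation."""
    granted, truncated = {}, {}
    for bucket, want in requested.items():
        limit = bucket_limits.get(bucket, 0)  # undeclared buckets get nothing
        granted[bucket] = min(want, limit)
        if want > limit:
            # Record how many tokens were dropped so the cut is never silent.
            truncated[bucket] = want - limit
    return {"granted": granted, "truncated": truncated}


report = allocate_buckets(
    {"policy": 500, "evidence": 1000},
    {"policy": 300, "evidence": 1400, "memory": 50},
)
```

Here the evidence bucket is cut by 400 tokens and the undeclared memory bucket by 50, and both cuts appear in the report rather than disappearing into the prompt.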
Decision plane
The Decision plane owns the bounded loop. It turns CompiledContext into a plan, verifies it, executes approved steps, scores the result, and emits a typed decision.
| Primitive | Contract | Source |
|---|---|---|
| Planner / Executor / Critic | plan, verify, execute, score | Orchestration |
| Subagent lanes | isolated sub-runs with independent budgets | Orchestration |
| Background sessions | resumable durable execution | Orchestration |
| AI Gateway / LLM Router | provider-neutral model invocation, route selection, fallback, route audit | AI Gateway and LLM Router |
| Decision Catalog | DecisionSpec registry and decision binding | Decision Catalog |
| Decision Record Store | replayable records, evidence refs, approvals, controls, lineage, trace ids | Decision Record |
| Intent-Task Catalog | intent taxonomy and task templates | Intent-Task Catalog |
Owns:
- plan structure,
- Critic verdicts,
- loop control,
- terminal DecisionRecord emission.
Does not own:
- direct API calls,
- policy truth,
- graph mutation.
Action plane
The Action plane owns external effects. It converts tool intents into validated, authorized, traced calls.
| Primitive | Contract | Source |
|---|---|---|
| Tool Gateway pattern | policy-bound tool execution boundary | Adapter Mesh |
| Tool Manager | concrete Tool Gateway implementation | Tool Manager |
| Adapter Registry | capabilities, schemas, auth mode, approval mode | Adapter Mesh |
| MCP / A2A / OpenAPI / custom adapters | protocol adapters behind one envelope | Adapter Mesh |
| Idempotency | write-class calls carry stable idempotency keys | Adapter Mesh |
| Approval gates | propose -> approve -> execute | Governance |
Owns:
- schema validation for tool args and results,
- credential exchange,
- tool transcript capture,
- side-effect idempotency.
Does not own:
- deciding that a risky action is allowed,
- inventing capabilities outside the registry,
- storing final business decisions.
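One way to obtain stable idempotency keys for write-class calls is to hash a canonical form of the call. The derivation below is an assumption for illustration; the source does not specify how keys are built, and a real scheme may also need a plan-step identifier to distinguish two intentional calls with identical arguments.

```python
import hashlib
import json


def idempotency_key(run_id: str, capability_id: str, args: dict) -> str:
    """Derive a key that is stable across retries of the same call."""
    canonical = json.dumps(
        {"run_id": run_id, "capability_id": capability_id, "args": args},
        sort_keys=True,             # argument order must not change the key
        separators=(",", ":"),      # no whitespace variance
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the JSON is serialized with sorted keys, a retried call with the same arguments in a different order produces the same key, which lets the adapter detect and drop the duplicate side effect.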
Trust plane
The Trust plane owns control over the other four planes. It makes the runtime governable.
| Primitive | Contract | Source |
|---|---|---|
| Policy Engine | deterministic policy decisions outside model code | Governance |
| Approval modes | five-tier action-risk taxonomy | Governance |
| Evaluators | policy, utility, latency, safety, economics | Evaluation and Observability |
| Trace propagation | W3C trace context and OTEL spans | Evaluation and Observability |
| Replay Harness | re-derive verdicts from pinned artifacts | Evaluation and Observability |
| Improvement Loop | insights, strategy proposals, feedback, autotune | Improvement Loop |
Owns:
- policy decisions,
- approval state,
- scorecards and release gates,
- trace and replay requirements,
- promotion of improvement proposals.
Does not own:
- arbitrary business logic hidden in prompts,
- unreviewed automatic self-modification.
Plane dependency rules
| Rule | Rationale |
|---|---|
| Context may read Intelligence, but may not mutate it during compilation. | Compile stays deterministic and replayable. |
| Decision may read Context, but cannot add tools not present in tool_manifest. | Planning remains bounded by compiled state. |
| Decision may call model providers only through AI Gateway / LLM Router. | Provider drift, fallback, cost, residency, and route audit stay governed. |
| Action may execute tools only through Tool Gateway. | External effects stay governed and traced. |
| Trust may constrain all planes. | Policy, identity, approvals, and evaluation are cross-cutting controls. |
| Intelligence writes happen through promotion workflows. | Memory and graph state cannot be poisoned by a single run. |
Reference flow: support refund
The refund workflow is the reference example because it exercises all production boundaries:
- User request enters with a RunContext.
- Context Pack ctxpack.support@x.y.z and graph snapshot are pinned.
- Compiler emits CompiledContext with policy, tool, and evidence manifests.
- Planner proposes lookup, policy eval, and refund steps.
- Critic verifies tool availability, args, approval mode, and required evidence.
- Tool Gateway executes read tools and freezes evidence for the risky write.
- Approval gate authorizes or denies the destructive refund.
- Executor calls the payment adapter with idempotency key and trace context.
- Critic scores the completed run.
- Runtime emits DecisionRecord, replay packet, and memory proposals.
For the complete transcript, see Workflow Examples and How It Works.
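The propose, approve, execute sequence around the destructive refund can be sketched as a small state machine. The class ApprovalGate, its states, and its method names are hypothetical; the sketch shows only that execution is impossible unless an approver has decided in favor, and that an executed gate cannot be reused.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class GateState(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    DENIED = "denied"
    EXECUTED = "executed"


@dataclass
class ApprovalGate:
    capability_id: str
    frozen_evidence: Tuple[str, ...]  # immutable snapshot taken at proposal time
    state: GateState = GateState.PROPOSED

    def decide(self, approved: bool) -> None:
        """Record the approver's decision; only a proposed gate can be decided."""
        if self.state is not GateState.PROPOSED:
            raise ValueError(f"cannot decide from state {self.state.value}")
        self.state = GateState.APPROVED if approved else GateState.DENIED

    def execute(self, call: Callable[[], object]) -> object:
        """Run the effect once, and only after approval."""
        if self.state is not GateState.APPROVED:
            raise PermissionError(f"{self.capability_id} is {self.state.value}")
        result = call()
        self.state = GateState.EXECUTED
        return result
```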
Trust architecture
Tenant boundary
  storage, graph, memory, pack registry, traces, and tool credentials are tenant scoped
Identity boundary
  user delegation and agent workload identity are both present on every governed call
Policy boundary
  policy decisions are evaluated before compile exposure and before tool execution
Approval boundary
  network, delegated, and destructive actions can freeze evidence and wait for an approver
Audit boundary
  policy decisions, approvals, tool transcripts, scorecards, and traces bind to one trace_id
Replay boundary
  request, pack, policy, graph snapshot, model profile, route decision, tool transcripts, evaluator set, and model config are pinned

See Security and Compliance for the detailed control map.
Failure semantics
Failures must be typed. Silent fallback is a production bug.
| Failure | Boundary that catches it | Required outcome |
|---|---|---|
| Unknown intent | Intent / Risk Classifier | reject or operator clarification |
| Missing required evidence | Critic verify | replan or escalate |
| Tool not in manifest | Critic verify / Tool Gateway | reject |
| Tool arg schema mismatch | Tool Gateway | protocol error and no side effect |
| Approval timeout | Approval Queue | escalate or denied verdict |
| Policy denial | Policy Engine | refuse or escalate |
| Budget exhaustion | RunBudget guard | terminal budget verdict |
| No eligible model route | AI Gateway / LLM Router | fail closed or escalate |
| Unsafe tool output | Critic score / output validation | retry, replan, or fail closed |
| Replay mismatch | Replay Harness | block promotion and open incident |
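A sketch of what typed failures could look like in code, using four rows of the table above. The enum, the outcome strings, and the resolve function are assumptions for illustration; the point is that an outcome the table does not allow raises an error instead of being accepted silently.

```python
from enum import Enum


class Failure(Enum):
    MISSING_REQUIRED_EVIDENCE = "missing_required_evidence"
    TOOL_NOT_IN_MANIFEST = "tool_not_in_manifest"
    POLICY_DENIAL = "policy_denial"
    NO_ELIGIBLE_MODEL_ROUTE = "no_eligible_model_route"


# Required outcomes, transcribed from the failure table (subset).
ALLOWED_OUTCOMES = {
    Failure.MISSING_REQUIRED_EVIDENCE: {"replan", "escalate"},
    Failure.TOOL_NOT_IN_MANIFEST: {"reject"},
    Failure.POLICY_DENIAL: {"refuse", "escalate"},
    Failure.NO_ELIGIBLE_MODEL_ROUTE: {"fail_closed", "escalate"},
}


def resolve(failure: Failure, proposed_outcome: str) -> str:
    """Accept an outcome only if the table allows it for this failure."""
    allowed = ALLOWED_OUTCOMES[failure]
    if proposed_outcome not in allowed:
        raise ValueError(
            f"{proposed_outcome!r} is not a valid outcome for {failure.value}"
        )
    return proposed_outcome
```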
Observability and AgentOps
Every production run should be observable at four levels.
| Level | Required signals |
|---|---|
| Trace | W3C traceparent, span hierarchy, plane and component names, parent-child tool spans |
| Logs | structured run events, policy decisions, approval lifecycle, errors, redactions |
| Metrics | latency, model token/cost use, route decisions, tool calls, approval wait time, evaluator scores, replay pass rate |
| Artifacts | CompiledContext, Plan, RoutingDecision, tool transcripts, DecisionRecord, replay packet, scorecard |
Recommended span attributes:
contextos.run_id
contextos.session_id
contextos.tenant_id
contextos.intent
contextos.context_pack_ref
contextos.policy_bundle_ids
contextos.approval_mode_required
contextos.approval_mode_effective
contextos.decision_key
contextos.decision_record_id
contextos.replay_id

Tail sampling should force retention for runs that cross approval gates, fail evaluator thresholds, produce incidents, or affect durable business state.
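The retention rule can be expressed as a predicate evaluated once a run has finished. The flag names and the run_summary dictionary are hypothetical; in practice this logic would live in whatever tail-sampling policy the trace collector supports.

```python
# Hypothetical flag names, one per retention condition listed above.
RETENTION_FLAGS = (
    "crossed_approval_gate",
    "failed_evaluator_threshold",
    "produced_incident",
    "affected_durable_state",
)


def must_retain(run_summary: dict) -> bool:
    """True when any forced-retention condition holds for the run."""
    return any(run_summary.get(flag, False) for flag in RETENTION_FLAGS)


def keep_trace(run_summary: dict, sampled_by_default: bool) -> bool:
    """Forced retention overrides the probabilistic sampling decision."""
    return must_retain(run_summary) or sampled_by_default
```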
Standards alignment
ContextOS uses existing standards where they fit and adds only the agent-runtime contracts those standards do not define.
| Concern | External standard or guidance | ContextOS use |
|---|---|---|
| Distributed trace identity | W3C Trace Context | trace_id, parent spans, tool spans, replay correlation |
| Telemetry model | OpenTelemetry and semantic conventions | spans, metrics, logs, resource and attribute conventions |
| Workload identity | SPIFFE | agent workload identity format and trust-domain separation |
| Delegation and token exchange | OAuth 2.0 Token Exchange, RFC 8693 | user delegation, actor/subject distinction, scoped credentials |
| HTTP API description | OpenAPI Specification | adapter schemas, operation metadata, security schemes |
| AI tool protocol | Model Context Protocol | one adapter class behind the Tool Gateway, with ContextOS adding policy, approval, and audit envelopes |
| GenAI risk taxonomy | OWASP GenAI Security Project | prompt injection, excessive agency, insecure plugin/tool design, data exposure, output handling |
| AI risk governance | NIST AI RMF Core | govern, map, measure, manage reflected through policy, evaluator, release, and improvement loops |
Alignment does not mean delegation. MCP or OpenAPI can describe a tool. ContextOS still decides whether that tool is exposed, whether it can execute, which identity it uses, which evidence it must cite, and how the action is replayed.
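For the trace-identity row, the sketch below parses a W3C traceparent header so that its trace-id can be bound to ContextOS artifacts. It handles only the version 00 layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) and rejects the all-zero IDs that the specification treats as invalid. The function name and return shape are assumptions.

```python
import re
from typing import Optional

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> Optional[dict]:
    """Parse a version-00 traceparent header; return None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if version != "00":  # this sketch does not handle future versions
        return None
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),  # lowest bit is the sampled flag
    }
```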
Control-plane lifecycle
| Lifecycle step | Required artifact | Gate |
|---|---|---|
| Author | Context Pack, policy bundle, DecisionSpec, adapter capability, model profile, routing rule | schema lint |
| Review | architecture, security, data, evaluation review | reviewer verdicts |
| Publish | signed immutable version | registry signature |
| Roll out | tenant and environment pin | release gate |
| Execute | run artifacts and traces | runtime guards |
| Evaluate | scorecard and replay packet | evaluator thresholds |
| Improve | proposal from feedback, incident, or autotune | replay and review before promotion |
| Roll back | prior pack/policy/model/tool version pin | replay determinism check |
Multi-tenant isolation
Tenant isolation is not only a database filter. It applies to every artifact:
| Artifact | Isolation requirement |
|---|---|
| Context Pack | tenant or environment pin; signed publisher; immutable version |
| Graph snapshot | tenant-scoped snapshot ref; no cross-tenant traversal without explicit policy |
| Memory | tenant, subject, consent, classification, and retention gates |
| Model profile / RoutingDecision | tenant policy, residency, capability, and retention gates |
| Tool credential | tenant-scoped credential exchange with short-lived tokens |
| Trace | tenant-scoped trace access; redaction before export |
| DecisionRecord | subject IDs and evidence refs must not leak across tenant boundaries |
| Replay packet | pinned to tenant-owned or explicitly shared artifacts |
Reference contracts
| Contract | Source of truth |
|---|---|
| invokeAgent, ToolCallEnvelope, ToolResultEnvelope, DecisionRecord | API Contracts |
| ContextPack schema | Context Pack |
| DecisionSpec | Decision Catalog |
| DecisionRecord | Decision Record |
| model invocation and routing decisions | AI Gateway and LLM Router |
| Intent and TaskTemplate | Intent-Task Catalog |
| memory write proposals and review queue | Memory Fabric |
| component-level reference pages | Component Inventory |
Implementation checklist
For a new tenant or workflow, do not call the runtime production-ready until every row is true.
| Area | Check |
|---|---|
| Ontology | entity types, relationship types, CEID format, and evidence refs are declared |
| Context Pack | pack is signed, versioned, immutable, and pinned by environment |
| Policy | bundle is outside agent code and evaluated before risky action |
| Models | model profiles and routing rules are signed, pinned, residency-aware, and replay-gated |
| Tools | every capability has schema, auth mode, approval mode, and idempotency behavior |
| Decision | every governed action binds to a DecisionSpec with required evidence |
| Identity | user delegation and agent workload identity are both present on tool calls |
| Budget | token, cost, tool-call, wall-clock, and replan limits are enforced |
| Memory | capture, candidate, review, promotion, consent, and contradiction checks exist |
| Observability | traces, logs, metrics, scorecards, and artifacts are joined by trace ID |
| Replay | request, pack, policy, snapshot, tools, evaluator set, and model config are pinned |
| Release | golden replay and evaluator thresholds gate promotion |
| Rollback | prior versions can be re-pinned without schema migration drama |
Anti-patterns
| Anti-pattern | Why it fails |
|---|---|
| Direct provider calls from planners or evaluators | bypasses routing policy, residency checks, fallback controls, token/cost telemetry, and route audit |
| Direct adapter calls from the model | bypasses policy, identity, approval, schema validation, and trace capture |
| Hand-built prompts as the source of truth | hides context selection, truncation, and runtime controls |
| Free-form final answers for governed actions | loses DecisionSpec binding, evidence refs, and replay |
| Tool descriptions trusted as policy | tool metadata can describe behavior; it cannot authorize behavior |
| Memory writes on every run | turns temporary observations and injected content into future context |
| One global agent identity | destroys attribution between user delegation and agent workload identity |
| Evaluation only on model quality | misses policy, safety, cost, latency, evidence, and tool-risk regressions |
| Prompt edits after incidents | creates unreviewed behavior drift instead of replayable proposals |
| Cross-tenant shared traces by default | leaks business context and evidence refs |
Roadmap notes
- Plane primitives are stable contracts; individual services may evolve.
- New patterns should be validated against working systems before promotion into this reference.
- Major changes follow the same change-control process as Context Packs, policy bundles, DecisionSpecs, and evaluator sets.
Appendix A: Component inventory
| Component | Plane | Owner doc |
|---|---|---|
| Conversation Manager | Decision | components/conversation-manager |
| Intent / Risk Classifier | Decision | components/intent-risk-classifier |
| Intent-Task Catalog | Decision | implementation/intent-task-catalog |
| Context Pack Compiler | Context | components/context-pack-compiler |
| Policy Engine | Trust | components/policy-engine |
| Orchestrator | Decision | components/orchestrator |
| AI Gateway / LLM Router | Decision | reference/ai-gateway-llm-router |
| Tool Gateway pattern | Action | Adapter Mesh |
| Tool Manager | Action | components/tool-manager |
| Decision Catalog | Decision | implementation/decision-catalog |
| Knowledge Substrate | Intelligence | foundations/knowledge-graph |
| Memory Fabric | Intelligence | implementation/memory-fabric |
| Identity Layer | Intelligence + Trust | foundations/identity-layer |
| Evaluation Engine | Trust | components/evaluation-engine |
| Observability | Trust | components/observability |
| Admin Console | Trust | components/admin-console |
Appendix B: Naming conventions
- Planes: Intelligence, Context, Decision, Action, Trust.
- Primitives: PascalCase (RunContext, ContextPack, CompiledContext, DecisionRecord, ToolEnvelope).
- Enum values: snake_case (read_only, local_write, network, delegated, destructive).
- Identifiers: <scope>:<type>:<id> when human-readable (order:ord_881, customer:cus_77).
- Trace attributes: contextos.<plane>.<attribute> when plane-specific; contextos.run_id and contextos.decision_record_id when global.
- Version refs: <artifact_id>@<semver> for Context Packs, policy bundles, evaluator sets, and DecisionSpecs.
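A version ref in this form can be checked mechanically. The sketch below accepts only plain MAJOR.MINOR.PATCH versions, a simplification of full semantic versioning (no pre-release or build metadata), and the set of characters allowed in artifact_id is an assumption.

```python
import re

_VERSION_REF = re.compile(
    r"^(?P<artifact_id>[A-Za-z0-9_.\-]+)@"
    r"(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$"
)


def parse_version_ref(ref: str):
    """Split '<artifact_id>@<semver>' into (artifact_id, (major, minor, patch))."""
    match = _VERSION_REF.match(ref)
    if match is None:
        raise ValueError(f"not a pinned version ref: {ref!r}")
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return match["artifact_id"], version
```

A placeholder such as ctxpack.support@x.y.z fails the check, which is the desired behavior for a pin that must resolve to one immutable version.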