Foundations
The operating model for ContextOS: five planes, cross-cutting primitives, and the contracts that make agent runs governable.
ContextOS foundations are the engineering boundaries around production agents. They define what the runtime knows, what it gives the model, how decisions are made, which actions are allowed, and how every run becomes auditable evidence.
The foundation rule
An agent run is acceptable only when it can answer five questions:
| Question | Foundation owner | Runtime artifact |
|---|---|---|
| What did the system know? | Intelligence plane | ontology version, CEIDs, graph evidence, promoted memory |
| What did the model actually see? | Context plane | ContextPack, CompiledContext, source manifests, budget report |
| Why did it choose that path? | Decision plane | plan, critic verdicts, decision spec binding |
| What external effects happened? | Action plane | ToolEnvelope, approval mode, idempotency key, tool result |
| Who approved and how can we replay it? | Trust plane | policy decision, evaluator result, trace id, DecisionRecord, replay handle |
If one answer is missing, the run is not yet production-grade. It may still be a useful prototype, but it is not a ContextOS-governed run.
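The rule can be read as a completeness check over a run's artifacts. The following is a minimal sketch, assuming the artifacts arrive as a flat dictionary; the key names and the `missing_answers` and `is_governed_run` helpers are hypothetical, chosen to mirror the table above, and are not a ContextOS API.

```python
# Hypothetical artifact keys per plane, loosely mirroring the table above.
REQUIRED_ARTIFACTS = {
    "intelligence": ["ontology_version", "ceids"],
    "context": ["compiled_context", "budget_report"],
    "decision": ["plan", "critic_verdicts"],
    "action": ["tool_envelopes"],
    "trust": ["policy_decision", "trace_id", "decision_record"],
}


def missing_answers(run_artifacts: dict) -> dict:
    """Return, per plane, the required artifacts that are absent or None."""
    gaps = {}
    for plane, keys in REQUIRED_ARTIFACTS.items():
        absent = [key for key in keys if run_artifacts.get(key) is None]
        if absent:
            gaps[plane] = absent
    return gaps


def is_governed_run(run_artifacts: dict) -> bool:
    """A run qualifies only when no plane has a missing answer."""
    return not missing_answers(run_artifacts)
```

An empty list of tool envelopes still counts as an answer in this sketch: a run with no external effects can say so explicitly, whereas a missing key cannot.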
The five planes
Intelligence
The substrate of meaning: canonical schema, identity, evidence, memory, and retrieval.
Context
The compiler that turns request state, evidence, policy, tools, and memory into bounded model input.
Decision
The bounded loop that plans, critiques, executes, consolidates, and emits typed decision state.
Action
The governed integration boundary for tools, MCP, A2A, OpenAPI, internal functions, and custom adapters.
Trust
The control plane over the other four: policy, approvals, evaluation, tracing, replay, and improvement.
The planes compose in one direction: Intelligence feeds Context, Context feeds Decision, Decision drives Action, and Trust wraps every boundary.
How improvement crosses the planes
The latest ContextOS improvement-loop rule is: the harness may be searched, but it must not silently mutate. Autotune, reviewer agents, and human operators can propose changes to any plane, but each proposal must name its target metric, replay set, guardrails, owner, and rollback target.
| Plane | What can improve | What must stay invariant |
|---|---|---|
| Intelligence | ontology additions, source-priority hints, graph retrieval constraints, memory promotion proposals | CEID stability, source provenance, data classification, snapshot pinning |
| Context | bucket budgets, retrieval top_k, source priority, compression, prompt fragments | required evidence coverage, redaction, policy manifest, tool manifest |
| Decision | planner templates, tool ordering, re-plan budgets, Critic scoring rubrics, subagent lane limits | DecisionSpec binding, approval gates, loop guards, replayable plan and verdict |
| Action | adapter retries, circuit breakers, cached read-only aliases, version routing for compatible adapters | schema validation, approval-mode maximum, credential exchange, idempotency keys |
| Trust | evaluator thresholds, sampling strategy, replay-set composition, rollout gates, proposal ranking | safety and policy floors, human approval for promotion, append-only audit |
Every improvement candidate is an artifact, not an edit in place. It enters the same lifecycle as packs, policies, tools, and evaluator suites: proposed -> reviewed -> approved -> released, with rejected recorded when the proposal does not survive review and superseded recorded when a later proposal replaces it.
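The lifecycle above can be sketched as a small state machine. This is an illustration under stated assumptions, not the ContextOS implementation: the `Proposal` class, its field names, and the exact transition table (in particular, which states may move to rejected or superseded) are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed transition table; only the main path proposed -> reviewed ->
# approved -> released is taken directly from the text.
TRANSITIONS = {
    "proposed": {"reviewed", "rejected", "superseded"},
    "reviewed": {"approved", "rejected", "superseded"},
    "approved": {"released", "superseded"},
    "released": {"superseded"},
    "rejected": set(),
    "superseded": set(),
}

# Every proposal must name these before it enters the lifecycle.
REQUIRED_FIELDS = ("target_metric", "replay_set", "guardrails", "owner", "rollback_target")


@dataclass
class Proposal:
    plane: str
    target_metric: str
    replay_set: str
    guardrails: list
    owner: str
    rollback_target: str
    state: str = "proposed"
    history: list = field(default_factory=lambda: ["proposed"])

    def __post_init__(self):
        missing = [name for name in REQUIRED_FIELDS if not getattr(self, name)]
        if missing:
            raise ValueError(f"proposal is missing: {', '.join(missing)}")

    def advance(self, new_state: str) -> None:
        """Move to new_state, recording it; refuse transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

Keeping the history on the artifact, rather than overwriting the state, is what makes the proposal reviewable after the fact.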
One run through the foundations
| Step | What happens | Contract produced | Primary docs |
|---|---|---|---|
| 1. Capture | The request is wrapped with tenant, actor, agent, session, budget, safety mode, and trace identity. | RunContext | Governance, Identity Layer |
| 2. Ground | The runtime resolves entities, retrieves evidence, and selects eligible memory. | CEIDs, graph evidence, memory candidates | Ontology, Knowledge Graph, Memory Model |
| 3. Compile | The context compiler selects, ranks, redacts, budgets, and assembles the model input. | CompiledContext | Cognitive Core, Agentic Context Engineering |
| 4. Decide | Planner, executor, and critic move inside bounded plan and verdict contracts. | plan, critic verdicts, decision binding | Orchestration, Decision Catalog |
| 5. Act | Every external effect goes through the Tool Gateway with policy, approval, identity, and idempotency. | ToolEnvelope | Adapter Mesh, Governance |
| 6. Record | The final answer, effects, evidence, approvals, controls, trace, and replay handle are persisted. | DecisionRecord | Evaluation and Observability, API Contracts |
| 7. Improve | Failures and corrections become proposals that must pass replay and approval gates before promotion. | scorecard, strategy proposal, pack version | Improvement Loop, Harness Engineering |
Cross-cutting primitives
These primitives are deliberately boring: they appear everywhere so every subsystem can be audited the same way.
| Primitive | What it carries | Why it matters |
|---|---|---|
| RunContext | run_id, trace_id, session_id, tenant_id, user delegation, agent workload identity, safety mode, run budget | Establishes who is acting, under which authority, with which limits. |
| ApprovalMode | read_only, local_write, network, delegated, destructive | Makes risk explicit before a tool can be planned or executed. |
| ContextPack | Versioned, signed input contract with evidence, policy, tools, memory, and decision layers | Stops prompt stuffing from becoming an undocumented runtime dependency. |
| CompiledContext | Compiled prompt, manifests, omitted context, runtime controls, and budget report | Shows exactly what the model saw and what the compiler excluded. |
| ToolEnvelope | Tool request, tool result, policy decision, approval mode, audit metadata, idempotency, trace context | Turns side effects into governed operations. |
| DecisionRecord | Outcome, evidence refs, approvals, controls active, policy decisions, confidence, trace id, replay handle | Makes a run comparable, searchable, reviewable, and replayable. |
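As an illustration of how two of these primitives might interact, the sketch below models ApprovalMode as an ordered enum and checks a tool against a per-run ceiling. Two things here are assumptions, not documented behavior: that the modes are listed in ascending order of risk, and that the ceiling lives on RunContext as a `max_approval_mode` field. The `may_plan_tool` helper is likewise hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class ApprovalMode(IntEnum):
    # Assumed ordering: the order in the table, treated as ascending risk.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


@dataclass(frozen=True)
class RunContext:
    run_id: str
    trace_id: str
    session_id: str
    tenant_id: str
    safety_mode: str
    max_approval_mode: ApprovalMode  # hypothetical field for this sketch


def may_plan_tool(ctx: RunContext, tool_mode: ApprovalMode) -> bool:
    """A tool is eligible only if its mode does not exceed the run's ceiling."""
    return tool_mode <= ctx.max_approval_mode
```

Freezing the dataclass reflects the idea that run identity and limits are fixed at capture time and should not be changed mid-run.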
The end-to-end contract
invokeAgent(request_envelope, RunContext)
-> Context plane: compile packs, evidence, tools, policy, memory
-> Decision plane: plan, verify, execute, score, consolidate
-> Action plane: route every effect through the Tool Gateway
-> Trust plane: enforce policy, approvals, evaluation, trace, replay
-> DecisionRecord(evidence_refs, approvals, controls_active, trace_id, replay_id)

The Intelligence plane feeds the compile step. Consolidation writes memory proposals back through governed promotion, not direct durable writes.
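The flow above can be sketched end to end with stubbed planes. This is a toy under stated assumptions: the `invoke_agent` signature, the dictionary shapes, the `effects` field, and the allowlist gate standing in for the Tool Gateway and Trust plane are all hypothetical, not the published contract.

```python
import uuid
from dataclasses import dataclass


@dataclass
class DecisionRecord:
    outcome: str
    evidence_refs: list
    approvals: list
    controls_active: list
    effects: list  # hypothetical field: what actually ran
    trace_id: str
    replay_id: str


def invoke_agent(request: dict, run_context: dict, evidence: list) -> DecisionRecord:
    # Context plane (stub): bound the model input to the evidence budget.
    compiled = {"request": request, "evidence": evidence[: run_context["evidence_budget"]]}

    # Decision plane (stub): one plan step per requested tool.
    plan = [{"tool": name} for name in request.get("tools", [])]

    # Action and Trust planes (stub): every effect passes a gate before it runs.
    approvals, effects = [], []
    for step in plan:
        allowed = step["tool"] in run_context["allowed_tools"]
        approvals.append({"tool": step["tool"], "allowed": allowed})
        if allowed:
            effects.append(f"ran {step['tool']}")

    return DecisionRecord(
        outcome="completed" if all(a["allowed"] for a in approvals) else "blocked",
        evidence_refs=[item["id"] for item in compiled["evidence"]],
        approvals=approvals,
        controls_active=["tool_allowlist", "evidence_budget"],
        effects=effects,
        trace_id=run_context["trace_id"],
        replay_id=str(uuid.uuid4()),
    )
```

The point of the sketch is the shape of the return value: even a blocked run yields a record with its approvals, controls, and trace identity intact.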
What to read first
If you are evaluating the platform
- Invest Early - the business case and late-retrofit failure modes.
- Harness Engineering - the discipline behind the product.
- Reference Architecture - the full five-plane blueprint.
If you are building the runtime
- Cognitive Core - compiler plus bounded execution loop.
- Adapter Mesh - governed tool execution.
- Governance - policy, approval modes, audit contract.
If you are grounding an agent
- Ontology - canonical types and relationship rules.
- Identity Layer - CEIDs, SIDs, actor identity.
- Knowledge Graph and Memory Model - evidence plus promotion-aware recall.
If you own safety, audit, or release gates
- Evaluation and Observability - scorecards, traces, replay.
- Improvement Loop - governed change from failures and corrections.
- Security and Compliance - sandboxing, identity propagation, compliance map.
Foundation docs by plane
| Plane | Foundation docs | Use them when you need to decide |
|---|---|---|
| Intelligence | Ontology, Identity Layer, Knowledge Graph, Memory Model | What the system knows, how entities resolve, what evidence is trusted, and what can be remembered. |
| Context | Cognitive Core, Agentic Context Engineering | What should enter the model input, under which budget, provenance, and runtime controls. |
| Decision | Orchestration, Cognitive Core | How plans are proposed, verified, executed, scored, retried, or escalated. |
| Action | Adapter Mesh | Which external capabilities are discoverable, callable, idempotent, and approval-bound. |
| Trust | Governance, Evaluation and Observability, Improvement Loop, Harness Engineering, Security and Compliance | Which policies apply, how approvals work, what gets evaluated, how replay works, and how the harness improves. |
Adoption checklist
Use this before calling a workflow production-ready.
| Check | Minimum acceptable answer | Source |
|---|---|---|
| Entity model | The workflow’s core entities have ontology types, stable CEIDs, and relationship rules. | Ontology, Identity Layer |
| Context boundary | The workflow has a versioned ContextPack; the compiler emits manifests, omissions, controls, and budgets. | Agentic Context Engineering, Context Pack |
| Tool boundary | Every external capability is behind the Tool Gateway with schemas, approval mode, idempotency, and trace propagation. | Adapter Mesh |
| Decision boundary | Outputs bind to a decision spec and produce a typed DecisionRecord. | Orchestration, Decision Record, Decision Catalog |
| Policy boundary | Policy lives outside agent code; approval modes map to the canonical tier taxonomy. | Governance |
| Evidence boundary | The record contains evidence refs for material claims and tool results. | Knowledge Graph, Evaluation and Observability |
| Replay boundary | The run can be replayed against pinned context, policy, tools, and evaluator versions. | Evaluation and Observability, Improvement Loop |
| Improvement boundary | Failures and corrections become proposals, not silent prompt edits. | Harness Engineering, Improvement Loop |
| Autotune boundary | Any optimizer run declares target metric, guardrails, tunable surfaces, disjoint search/test sets, and rollback target before producing a proposal. | Improvement Loop, Evaluation and Observability |
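The autotune check lends itself to a mechanical pre-flight validation. The sketch below assumes a declaration is a plain dictionary with hypothetical key names and that replay cases are identified by strings; `validate_autotune_declaration` is an illustrative helper, not part of ContextOS.

```python
# Hypothetical declaration keys, one per item the checklist row requires.
REQUIRED_KEYS = (
    "target_metric",
    "guardrails",
    "tunable_surfaces",
    "search_set",
    "test_set",
    "rollback_target",
)


def validate_autotune_declaration(decl: dict) -> list:
    """Return a list of problems; an empty list means the run may start."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not decl.get(key)]

    # Search and test sets must be disjoint so the optimizer is never
    # scored on cases it was allowed to tune against.
    overlap = set(decl.get("search_set") or []) & set(decl.get("test_set") or [])
    if overlap:
        problems.append(f"search and test sets overlap: {sorted(overlap)}")
    return problems
```

Returning every problem at once, rather than failing on the first, gives the proposal owner a complete list to fix before resubmitting.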
Where the concrete contracts live
The foundations define the operating model. The implementation section defines the concrete contracts:
- API Contracts - invokeAgent, ToolCallEnvelope, ToolResultEnvelope, DecisionRecord
- Context Pack - pack schema, layers, lifecycle, caching
- Decision Catalog - DecisionSpec, DecisionRecord, decision binding
- Intent-Task Catalog - intent taxonomy, task templates, risk classification
- Memory Fabric - concrete memory storage, promotion, consent, contradiction handling
- Workflow Examples - neutral end-to-end runs through the canonical contract
- High-Risk Workflow - multi-approver, irreversible, cross-tenant workflow