Foundations

The operating model for ContextOS: five planes, cross-cutting primitives, and the contracts that make agent runs governable.

Living Document
At a glance

ContextOS foundations are the engineering boundaries around production agents. They define what the runtime knows, what it gives the model, how decisions are made, which actions are allowed, and how every run becomes auditable evidence.

The foundation rule

An agent run is acceptable only when it can answer five questions:

| Question | Foundation owner | Runtime artifact |
| --- | --- | --- |
| What did the system know? | Intelligence plane | ontology version, CEIDs, graph evidence, promoted memory |
| What did the model actually see? | Context plane | ContextPack, CompiledContext, source manifests, budget report |
| Why did it choose that path? | Decision plane | plan, critic verdicts, decision spec binding |
| What external effects happened? | Action plane | ToolEnvelope, approval mode, idempotency key, tool result |
| Who approved and how can we replay it? | Trust plane | policy decision, evaluator result, trace id, DecisionRecord, replay handle |

If one answer is missing, the run is not yet production-grade. It may still be a useful prototype, but it is not a ContextOS-governed run.
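
As a minimal sketch of the rule, the check below reports which of the five questions a run cannot answer. The `artifacts` mapping (plane name to a dict of recorded artifacts), the function names, and the sample identifiers are all hypothetical; treating an absent or empty entry as a missing answer is an assumption of this sketch, not ContextOS behavior.

```python
# The five questions, keyed by the plane that owns each answer.
QUESTIONS = {
    "intelligence": "What did the system know?",
    "context": "What did the model actually see?",
    "decision": "Why did it choose that path?",
    "action": "What external effects happened?",
    "trust": "Who approved and how can we replay it?",
}


def unanswered_questions(artifacts: dict) -> list:
    """Return the questions whose owning plane recorded no artifacts.

    Assumption: an absent or empty entry counts as a missing answer.
    """
    return [q for plane, q in QUESTIONS.items() if not artifacts.get(plane)]


def is_governed_run(artifacts: dict) -> bool:
    """A run qualifies only when all five questions have an answer."""
    return not unanswered_questions(artifacts)


# Hypothetical run that never recorded anything for the Action plane.
run = {
    "intelligence": {"ontology_version": "v3", "ceids": ["ceid:example-1"]},
    "context": {"compiled_context": "cc-001", "budget_report": "br-001"},
    "decision": {"plan": "plan-001", "critic_verdicts": ["pass"]},
    "trust": {"trace_id": "trace-001", "replay_handle": "replay-001"},
}

print(unanswered_questions(run))
print(is_governed_run(run))
```

Here the run would be flagged because nothing was recorded for the Action plane, which matches the "useful prototype, not a governed run" distinction above.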

The five planes

The planes compose in one direction: Intelligence feeds Context, Context feeds Decision, Decision drives Action, and Trust wraps every boundary.

How improvement crosses the planes

The latest ContextOS improvement-loop rule is: the harness may be searched, but it must not silently mutate. Autotune, reviewer agents, and human operators can propose changes to any plane, but each proposal must name its target metric, replay set, guardrails, owner, and rollback target.

| Plane | What can improve | What must stay invariant |
| --- | --- | --- |
| Intelligence | ontology additions, source-priority hints, graph retrieval constraints, memory promotion proposals | CEID stability, source provenance, data classification, snapshot pinning |
| Context | bucket budgets, retrieval top_k, source priority, compression, prompt fragments | required evidence coverage, redaction, policy manifest, tool manifest |
| Decision | planner templates, tool ordering, re-plan budgets, Critic scoring rubrics, subagent lane limits | DecisionSpec binding, approval gates, loop guards, replayable plan and verdict |
| Action | adapter retries, circuit breakers, cached read-only aliases, version routing for compatible adapters | schema validation, approval-mode maximum, credential exchange, idempotency keys |
| Trust | evaluator thresholds, sampling strategy, replay-set composition, rollout gates, proposal ranking | safety and policy floors, human approval for promotion, append-only audit |

Every improvement candidate is an artifact, not an edit in place. It enters the same lifecycle as packs, policies, tools, and evaluator suites: proposed -> reviewed -> approved -> released, with rejected and superseded recorded when the proposal does not survive review.
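
The lifecycle above can be sketched as a small state machine over immutable proposal records. The `Proposal` class, its field names, and the sample values are hypothetical. Which states may move to `rejected` or `superseded` is also an assumption: the text only says those outcomes are recorded when a proposal does not survive review.

```python
from dataclasses import dataclass, replace

# Allowed transitions. The rejected/superseded exits are an assumption.
TRANSITIONS = {
    "proposed": {"reviewed", "rejected", "superseded"},
    "reviewed": {"approved", "rejected", "superseded"},
    "approved": {"released"},
    "released": set(),
    "rejected": set(),
    "superseded": set(),
}

# Every proposal must name these before it enters the lifecycle.
REQUIRED = ("target_metric", "replay_set", "guardrails", "owner", "rollback_target")


@dataclass(frozen=True)
class Proposal:
    plane: str
    target_metric: str
    replay_set: str
    guardrails: tuple
    owner: str
    rollback_target: str
    state: str = "proposed"
    history: tuple = ("proposed",)

    def __post_init__(self):
        missing = [name for name in REQUIRED if not getattr(self, name)]
        if missing:
            raise ValueError("proposal must name: " + ", ".join(missing))

    def advance(self, new_state: str) -> "Proposal":
        """Return a new record in `new_state`; the original is never edited."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        return replace(self, state=new_state, history=self.history + (new_state,))


candidate = Proposal(
    plane="context",
    target_metric="evidence_coverage",
    replay_set="replay-set-01",
    guardrails=("redaction",),
    owner="context-team",
    rollback_target="pack-v12",
)
released = candidate.advance("reviewed").advance("approved").advance("released")
print(released.history)
```

Freezing the dataclass and returning a new record from `advance` mirrors the "artifact, not an edit in place" rule: each step leaves the earlier record untouched, and skipping review raises an error.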

One run through the foundations

| Step | What happens | Contract produced | Primary docs |
| --- | --- | --- | --- |
| 1. Capture | The request is wrapped with tenant, actor, agent, session, budget, safety mode, and trace identity. | RunContext | Governance, Identity Layer |
| 2. Ground | The runtime resolves entities, retrieves evidence, and selects eligible memory. | CEIDs, graph evidence, memory candidates | Ontology, Knowledge Graph, Memory Model |
| 3. Compile | The context compiler selects, ranks, redacts, budgets, and assembles the model input. | CompiledContext | Cognitive Core, Agentic Context Engineering |
| 4. Decide | Planner, executor, and critic move inside bounded plan and verdict contracts. | plan, critic verdicts, decision binding | Orchestration, Decision Catalog |
| 5. Act | Every external effect goes through the Tool Gateway with policy, approval, identity, and idempotency. | ToolEnvelope | Adapter Mesh, Governance |
| 6. Record | The final answer, effects, evidence, approvals, controls, trace, and replay handle are persisted. | DecisionRecord | Evaluation and Observability, API Contracts |
| 7. Improve | Failures and corrections become proposals that must pass replay and approval gates before promotion. | scorecard, strategy proposal, pack version | Improvement Loop, Harness Engineering |

Cross-cutting primitives

These primitives are deliberately boring: they appear everywhere so every subsystem can be audited the same way.

| Primitive | What it carries | Why it matters |
| --- | --- | --- |
| RunContext | run_id, trace_id, session_id, tenant_id, user delegation, agent workload identity, safety mode, run budget | Establishes who is acting, under which authority, with which limits. |
| ApprovalMode | read_only, local_write, network, delegated, destructive | Makes risk explicit before a tool can be planned or executed. |
| ContextPack | Versioned, signed input contract with evidence, policy, tools, memory, and decision layers | Stops prompt stuffing from becoming an undocumented runtime dependency. |
| CompiledContext | Compiled prompt, manifests, omitted context, runtime controls, and budget report | Shows exactly what the model saw and what the compiler excluded. |
| ToolEnvelope | Tool request, tool result, policy decision, approval mode, audit metadata, idempotency, trace context | Turns side effects into governed operations. |
| DecisionRecord | Outcome, evidence refs, approvals, controls active, policy decisions, confidence, trace id, replay handle | Makes a run comparable, searchable, reviewable, and replayable. |
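
To illustrate how ApprovalMode can make risk explicit before a tool is planned, here is a minimal sketch that compares the mode a tool call needs against a maximum for the run. The five mode names come from the table; ordering them by increasing risk is an assumption, and `ToolRequest`, its fields, and `within_maximum` are hypothetical names rather than the ContextOS schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class ApprovalMode(IntEnum):
    # Assumption: modes are ordered from lowest to highest risk.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


@dataclass(frozen=True)
class ToolRequest:
    """Hypothetical request half of a ToolEnvelope."""
    tool: str
    approval_mode: ApprovalMode  # risk tier this call needs
    idempotency_key: str
    trace_id: str


def within_maximum(request: ToolRequest, maximum: ApprovalMode) -> bool:
    """True when the call's required mode does not exceed the allowed maximum."""
    return request.approval_mode <= maximum


read = ToolRequest("search_tickets", ApprovalMode.READ_ONLY, "idem-001", "trace-001")
post = ToolRequest("post_webhook", ApprovalMode.NETWORK, "idem-002", "trace-001")

print(within_maximum(read, ApprovalMode.LOCAL_WRITE))
print(within_maximum(post, ApprovalMode.LOCAL_WRITE))
```

Using an ordered enum keeps the check to a single comparison, but it only holds if the modes really form a strict risk ladder; a deployment where `delegated` and `network` are independent permissions would need a set-based check instead.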

The end-to-end contract

invokeAgent(request_envelope, RunContext)
  -> Context plane: compile packs, evidence, tools, policy, memory
  -> Decision plane: plan, verify, execute, score, consolidate
  -> Action plane: route every effect through the Tool Gateway
  -> Trust plane: enforce policy, approvals, evaluation, trace, replay
  -> DecisionRecord(evidence_refs, approvals, controls_active, trace_id, replay_id)

The Intelligence plane feeds the compile step. Consolidation writes memory proposals back through governed promotion, not direct durable writes.
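
The flow above can be sketched as a short pipeline in which a policy check runs at every plane boundary before the stage executes. Everything here is a stand-in: the stage bodies, the `policy_check` callback, the dict-shaped record, and the way `replay_id` is derived are illustrative assumptions, not the ContextOS API.

```python
def compile_context(state: dict) -> dict:
    # Context plane stand-in: assemble the model input.
    return {**state, "compiled_context": f"prompt for: {state['request']}"}


def decide(state: dict) -> dict:
    # Decision plane stand-in: produce a one-step plan.
    return {**state, "plan": ["lookup"]}


def act(state: dict) -> dict:
    # Action plane stand-in: pretend each planned step ran through a gateway.
    return {**state, "tool_results": [f"{step}:ok" for step in state["plan"]]}


STAGES = (("context", compile_context), ("decision", decide), ("action", act))


def invoke_agent(request: str, run_context: dict, policy_check) -> dict:
    """Run the planes in order; `policy_check` raises to stop the run."""
    state = {"request": request}
    controls_active = []
    for plane, stage in STAGES:
        policy_check(plane, run_context)  # Trust plane wraps each boundary
        state = stage(state)
        controls_active.append(f"policy:{plane}")
    # DecisionRecord-shaped result (field names follow the contract above).
    return {
        "evidence_refs": state["tool_results"],
        "approvals": [],
        "controls_active": controls_active,
        "trace_id": run_context["trace_id"],
        "replay_id": f"replay-{run_context['run_id']}",
    }


def allow_all(plane: str, run_context: dict) -> None:
    return None


record = invoke_agent("summarize ticket", {"run_id": "r1", "trace_id": "t1"}, allow_all)
print(record["controls_active"])
```

The point of the shape is that no stage can run without passing the trust check first, and the record is built from what the stages actually produced rather than from what the caller claims.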

What to read first

If you are evaluating the platform

  1. Invest Early - the business case and late-retrofit failure modes.
  2. Harness Engineering - the discipline behind the product.
  3. Reference Architecture - the full five-plane blueprint.

If you are building the runtime

  1. Cognitive Core - compiler plus bounded execution loop.
  2. Adapter Mesh - governed tool execution.
  3. Governance - policy, approval modes, audit contract.

If you are grounding an agent

  1. Ontology - canonical types and relationship rules.
  2. Identity Layer - CEIDs, SIDs, actor identity.
  3. Knowledge Graph and Memory Model - evidence plus promotion-aware recall.

If you own safety, audit, or release gates

  1. Evaluation and Observability - scorecards, traces, replay.
  2. Improvement Loop - governed change from failures and corrections.
  3. Security and Compliance - sandboxing, identity propagation, compliance map.

Foundation docs by plane

| Plane | Foundation docs | Use them when you need to decide |
| --- | --- | --- |
| Intelligence | Ontology, Identity Layer, Knowledge Graph, Memory Model | What the system knows, how entities resolve, what evidence is trusted, and what can be remembered. |
| Context | Cognitive Core, Agentic Context Engineering | What should enter the model input, under which budget, provenance, and runtime controls. |
| Decision | Orchestration, Cognitive Core | How plans are proposed, verified, executed, scored, retried, or escalated. |
| Action | Adapter Mesh | Which external capabilities are discoverable, callable, idempotent, and approval-bound. |
| Trust | Governance, Evaluation and Observability, Improvement Loop, Harness Engineering, Security and Compliance | Which policies apply, how approvals work, what gets evaluated, how replay works, and how the harness improves. |

Adoption checklist

Use this before calling a workflow production-ready.

| Check | Minimum acceptable answer | Source |
| --- | --- | --- |
| Entity model | The workflow’s core entities have ontology types, stable CEIDs, and relationship rules. | Ontology, Identity Layer |
| Context boundary | The workflow has a versioned ContextPack; the compiler emits manifests, omissions, controls, and budgets. | Agentic Context Engineering, Context Pack |
| Tool boundary | Every external capability is behind the Tool Gateway with schemas, approval mode, idempotency, and trace propagation. | Adapter Mesh |
| Decision boundary | Outputs bind to a decision spec and produce a typed DecisionRecord. | Orchestration, Decision Record, Decision Catalog |
| Policy boundary | Policy lives outside agent code; approval modes map to the canonical tier taxonomy. | Governance |
| Evidence boundary | The record contains evidence refs for material claims and tool results. | Knowledge Graph, Evaluation and Observability |
| Replay boundary | The run can be replayed against pinned context, policy, tools, and evaluator versions. | Evaluation and Observability, Improvement Loop |
| Improvement boundary | Failures and corrections become proposals, not silent prompt edits. | Harness Engineering, Improvement Loop |
| Autotune boundary | Any optimizer run declares target metric, guardrails, tunable surfaces, disjoint search/test sets, and rollback target before producing a proposal. | Improvement Loop, Evaluation and Observability |
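
The autotune check lends itself to a mechanical pre-flight validation. The sketch below assumes a hypothetical `AutotuneDeclaration` record whose fields mirror the items in that row; the field names, the use of replay case ids as set members, and the error messages are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutotuneDeclaration:
    target_metric: str
    guardrails: frozenset
    tunable_surfaces: frozenset
    search_set: frozenset  # replay case ids the optimizer may search over
    test_set: frozenset    # held-out replay case ids used for the final score
    rollback_target: str


def declaration_errors(decl: AutotuneDeclaration) -> list:
    """Return every reason the optimizer run must not start."""
    errors = []
    if not decl.target_metric:
        errors.append("missing target metric")
    if not decl.guardrails:
        errors.append("missing guardrails")
    if not decl.tunable_surfaces:
        errors.append("missing tunable surfaces")
    if not decl.search_set or not decl.test_set:
        errors.append("search and test sets must both be non-empty")
    if not decl.rollback_target:
        errors.append("missing rollback target")
    overlap = decl.search_set & decl.test_set
    if overlap:
        errors.append(f"search/test sets overlap: {sorted(overlap)}")
    return errors


decl = AutotuneDeclaration(
    target_metric="evidence_coverage",
    guardrails=frozenset({"redaction"}),
    tunable_surfaces=frozenset({"retrieval_top_k"}),
    search_set=frozenset({"case-1", "case-2", "case-3"}),
    test_set=frozenset({"case-3", "case-4"}),
    rollback_target="pack-v12",
)
print(declaration_errors(decl))
```

Rejecting overlap up front matters because a score measured on cases the optimizer already searched over says little about how the change behaves on unseen runs.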

Where the concrete contracts live

The foundations define the operating model. The implementation section defines the concrete contracts: