A product architecture for agents you can operate, audit, and improve.
ContextOS turns agent behavior into harnessed engineering: compiled context, bounded decisions, governed tools, durable evidence, and replayable improvement loops. It keeps the core promise simple: every production agent run should explain what it knew, what it did, why it did it, who approved it, and how to make the next run better.

Intelligence, Context, Decision, Action, and Trust stay separate, but one run carries the same typed primitives through every plane.
- 01Intelligence
What the system can know and remember.
- 02Context
What this request is allowed to use.
- 03Decision
How the bounded loop turns context into work.
- 04Action
How tools and integrations are safely reached.
- 05Trust
How policy, evaluation, replay, and audit govern the rest.
Customer escalation run
Assemble verified policy, customer, product, and incident context.
Plan remediation steps with critic checks before any external effect.
Call approved tools through envelopes with evidence and trace context.
Publish a decision record for audit, replay, and improvement.
The platform is easiest to understand as a single audited execution.
A ContextOS run is not just a prompt call. It is a sequence of contracts that start before inference and continue after the answer ships.
Capture the request
The runtime assigns run identity, tenant identity, session scope, actor delegation, budgets, safety mode, and trace context before the model sees anything.
Compile usable context
Context packs are selected, ranked, budgeted, and compiled into a prompt plus manifests, provenance, controls, and context debt warnings.
Plan inside bounds
Planner, executor, and critic lanes work inside the approved execution envelope instead of improvising hidden steps or invisible assumptions.
Gate every effect
Each tool call passes through capability discovery, approval mode, policy checks, idempotency keys, and tool-result evidence capture.
Emit the record
The final decision ships with evidence refs, approvals, controls active, policy decisions, trace ids, and replay handles.
Replay and improve
Failed, slow, risky, and high-value runs become replay cases for prompt packs, policy rules, tool contracts, and evaluation suites.
Separate concerns, then make the handoffs explicit.
Each plane owns a different failure surface. The product value comes from forcing clean boundaries between knowledge, context, decisions, actions, and trust.
Intelligence plane
Ontology, identity layer, knowledge graph, GraphRAG, memory proposals, and review queues that make facts addressable and promotable.
Context plane
Context pack schemas, compiler stages, token-budget allocation, provenance, runtime controls, and context-debt reporting.
Decision plane
Planner, executor, critic, subagent lanes, durable sessions, and decision catalogs that turn work into inspectable state.
Action plane
Tool Gateway, MCP, A2A, OpenAPI and custom adapters, approval tiers, idempotency, and normalized tool evidence.
Trust plane
Policy outside agent code, identity propagation, evaluators, trace propagation, replay, and continuous improvement loops.
Operators get records, not vibes.
The product surface is built around the evidence teams need when a run succeeds, fails, escalates, or causes an external effect.
RunContext
The canonical identity envelope for a run: actor, tenant, workload identity, trace, budget, safety mode, and delegated authority.
CompiledContext
The model-ready payload plus manifests, sources, ranked snippets, omitted context, and runtime controls used to create it.
ToolEnvelope
A typed record for every external effect: request, result, policy decision, approval mode, evidence, idempotency, and trace context.
DecisionRecord
The durable outcome: answer, actions, citations, approvals, policy decisions, controls active, confidence, trace id, and replay id.
Replay packet
The reproducible package that lets teams rerun a decision after changing prompts, packs, tools, policies, or evaluator thresholds.
Evaluation Harness
Scenario packs, assertions, grader outputs, regression gates, and scorecards that turn production misses into engineering work.
The same primitives travel through every plane.
ContextOS is not a collection of disconnected features. The runtime keeps one execution vocabulary from request intake through replay.
run_id, trace_id, session_id, tenant_id, user delegation, workload identity, safety mode, and run budget
read_only, local_write, network, delegated, and destructive modes bound to every capability and decision
Versioned, signed, immutable input contract with ten layers and clear owners
Compiled prompt, source manifests, runtime controls, omitted context, and budget report
Typed outcome with evidence refs, approvals, active controls, policy decisions, trace id, and replay handle
Tool call and tool result record with policy id, evidence refs, audit metadata, and W3C trace context
A team can start narrow and still end up with the full platform shape.
The best first implementation is not a giant agent rewrite. It is one valuable workflow wrapped in the contracts that make production behavior inspectable.
Put RunContext, tool envelopes, and decision records around one high-value workflow.
Move prompt stuffing into owned context packs with manifests, budgets, and provenance.
Route actions through approval modes, policy checks, identity propagation, and idempotency.
Convert failures, escalations, and risky decisions into replay suites and scorecards.
Every invocation follows one execution contract.
Local agents, delegated agents, background sessions, and subagent lanes all pass through the same core shape.
invokeAgent(request_envelope, run_context)
-> compile(packs, request, run_context) -> CompiledContext
-> loop {
planner(CompiledContext) -> Plan
critic.verify(Plan) -> ok | replan | reject
executor(Plan, ToolGateway) -> step_results, evidence
critic.score(step_results) -> accept | retry | replan | escalate
consolidate(effects, evidence) -> memory_proposals
}
-> DecisionRecord(evidence_refs, approvals, controls_active, trace_id, replay_id)