Production agent runtime

A product architecture for agents you can operate, audit, and improve.

ContextOS turns agent behavior into harnessed engineering: compiled context, bounded decisions, governed tools, durable evidence, and replayable improvement loops. It keeps the core promise simple: every production agent run should explain what it knew, what it did, why it did it, who approved it, and how to make the next run better.


Figure: Five-plane product model
Intelligence, Context, Decision, Action, and Trust stay separate, but one run carries the same typed primitives through every plane.
Diagram labels: Intelligence, Context, Decision, Action, Trust, RunContext
What each plane owns
01 Intelligence
What the system can know and remember.
02 Context
What this request is allowed to use.
03 Decision
How the bounded loop turns context into work.
04 Action
How tools and integrations are safely reached.
05 Trust
How policy, evaluation, replay, and audit govern the rest.

Operator view

Customer escalation run (replayable)

Compile

Assemble verified policy, customer, product, and incident context.

Decide

Plan remediation steps with critic checks before any external effect.

Act

Call approved tools through envelopes with evidence and trace context.

Record

Publish a decision record for audit, replay, and improvement.

DecisionRecord

run_id: run_74b2
approval_mode: network
context_pack: support_escalation:v18
tool_policy: allowlist + approval gate
decision_record: dec_9138
replay_status: ready
controls active: budget, policy, approval, trace, eval

Follow one run

The platform is easiest to understand as a single audited execution.

A ContextOS run is not just a prompt call. It is a sequence of contracts that start before inference and continue after the answer ships.

RunContext

Capture the request

The runtime assigns run identity, tenant identity, session scope, actor delegation, budgets, safety mode, and trace context before the model sees anything.

request_envelope
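
As a minimal sketch of this intake step, assuming a Python runtime: the field names follow the list above, while `new_run_context`, the `Budget` type, and its default limits are hypothetical illustrations, not a published ContextOS API.

```python
from dataclasses import dataclass
from typing import Optional
import secrets
import uuid


@dataclass(frozen=True)
class Budget:
    # Hypothetical limits; real values would come from tenant policy.
    max_tokens: int = 8000
    max_tool_calls: int = 10


@dataclass(frozen=True)
class RunContext:
    run_id: str
    trace_id: str                 # 32 hex chars, the size W3C trace context uses
    tenant_id: str
    session_id: str
    actor: str                    # user or service that made the request
    delegated_by: Optional[str]   # who granted authority, if anyone
    safety_mode: str
    budget: Budget


def new_run_context(tenant_id, session_id, actor, delegated_by=None,
                    safety_mode="read_only", budget=None):
    """Assign identity and limits before any model call is made."""
    return RunContext(
        run_id=f"run_{uuid.uuid4().hex[:8]}",
        trace_id=secrets.token_hex(16),
        tenant_id=tenant_id,
        session_id=session_id,
        actor=actor,
        delegated_by=delegated_by,
        safety_mode=safety_mode,
        budget=budget or Budget(),
    )
```

Freezing the dataclass is a design choice in this sketch: once a run starts, no later stage can quietly change its tenant, budget, or safety mode.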

Context plane

Compile usable context

Context packs are selected, ranked, budgeted, and compiled into a prompt plus manifests, provenance, controls, and context debt warnings.

CompiledContext
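
The select, rank, and budget stages can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Snippet` shape, the greedy inclusion rule, and the word-count `estimate_tokens` stand-in for a real tokenizer.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    source: str    # provenance: where the text came from
    text: str
    score: float   # relevance score from an upstream ranker (assumed)


@dataclass
class CompiledContext:
    prompt: str = ""
    manifest: list = field(default_factory=list)   # sources included
    omitted: list = field(default_factory=list)    # sources dropped for budget
    tokens_used: int = 0


def estimate_tokens(text):
    # Crude stand-in for a tokenizer: one token per whitespace-separated word.
    return len(text.split())


def compile_context(snippets, token_budget):
    """Include the highest-scoring snippets that fit; report the rest as omitted."""
    compiled = CompiledContext()
    parts = []
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = estimate_tokens(s.text)   # only the snippet text is counted
        if compiled.tokens_used + cost <= token_budget:
            parts.append(f"[{s.source}] {s.text}")
            compiled.manifest.append(s.source)
            compiled.tokens_used += cost
        else:
            compiled.omitted.append(s.source)
    compiled.prompt = "\n".join(parts)
    return compiled
```

The `omitted` list is the part worth noticing: it records what the model did not see, which is the raw material for a context debt warning.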

Decision plane

Plan inside bounds

Planner, executor, and critic lanes work inside the approved execution envelope instead of improvising hidden steps or relying on invisible assumptions.

Plan + critique

Action plane

Gate every effect

Each tool call passes through capability discovery, approval mode, policy checks, idempotency keys, and tool-result evidence capture.

ToolEnvelope
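
A gate of this kind might look like the sketch below. The mode names come from the ApprovalMode primitive; their least-to-most-privileged ordering, the `ToolGateway` class shape, and the SHA-256 idempotency key are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
import hashlib
import json


class ApprovalMode(IntEnum):
    # Ordering from least to most privileged is an assumption of this sketch.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


@dataclass
class ToolEnvelope:
    tool: str
    args: dict
    idempotency_key: str
    policy_decision: str     # "allow" or "deny"
    result: object = None


class ToolGateway:
    def __init__(self, tools):
        self.tools = tools   # allowlist: name -> (required ApprovalMode, callable)
        self._done = {}      # idempotency key -> envelope of a completed call

    def call(self, run_id, tool, args, granted):
        # The same run, tool, and arguments always hash to the same key.
        payload = json.dumps([run_id, tool, args], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key in self._done:
            return self._done[key]       # repeated call: no second effect
        entry = self.tools.get(tool)
        allowed = entry is not None and granted >= entry[0]
        env = ToolEnvelope(tool, args, key, "allow" if allowed else "deny")
        if allowed:
            env.result = entry[1](**args)
            self._done[key] = env        # only completed calls are cached
        return env
```

Denied calls still produce an envelope, so the refusal itself becomes evidence rather than a silent no-op.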

Trust plane

Emit the record

The final decision ships with evidence refs, approvals, controls active, policy decisions, trace ids, and replay handles.

DecisionRecord

Harness loop

Replay and improve

Failed, slow, risky, and high-value runs become replay cases for prompt packs, policy rules, tool contracts, and evaluation suites.

Replay packet
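
One way to picture the selection step is a filter over finished runs. The run fields (`status`, `latency_ms`, `approval_mode`, `high_value`) and the thresholds below are hypothetical; a real deployment would set its own criteria.

```python
def replay_reasons(run, slow_ms=5000, risky_modes=("delegated", "destructive")):
    """Return the reasons a finished run should become a replay case."""
    reasons = []
    if run.get("status") == "failed":
        reasons.append("failed")
    if run.get("latency_ms", 0) > slow_ms:
        reasons.append("slow")
    if run.get("approval_mode") in risky_modes:
        reasons.append("risky")
    if run.get("high_value", False):
        reasons.append("high_value")
    return reasons


def select_replay_cases(runs, **thresholds):
    """Keep only runs with at least one reason, tagged with why they were kept."""
    cases = []
    for run in runs:
        reasons = replay_reasons(run, **thresholds)
        if reasons:
            cases.append({"run_id": run["run_id"], "reasons": reasons})
    return cases
```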

Five planes

Separate concerns, then make the handoffs explicit.

Each plane owns a different failure surface. The product value comes from forcing clean boundaries between knowledge, context, decisions, actions, and trust.

Substrate of meaning

Intelligence plane

Ontology, identity layer, knowledge graph, GraphRAG, memory proposals, and review queues that make facts addressable and promotable.

Per-request compilation

Context plane

Context pack schemas, compiler stages, token-budget allocation, provenance, runtime controls, and context-debt reporting.

Bounded execution loop

Decision plane

Planner, executor, critic, subagent lanes, durable sessions, and decision catalogs that turn work into inspectable state.

Governed external effects

Action plane

Tool Gateway, MCP, A2A, OpenAPI and custom adapters, approval tiers, idempotency, and normalized tool evidence.

Control plane over the other four

Trust plane

Policy outside agent code, identity propagation, evaluators, trace propagation, replay, and continuous improvement loops.

Evidence model

Operators get records, not vibes.

The product surface is built around the evidence teams need when a run succeeds, fails, escalates, or causes an external effect.

RunContext

The canonical identity envelope for a run: actor, tenant, workload identity, trace, budget, safety mode, and delegated authority.

CompiledContext

The model-ready payload plus manifests, sources, ranked snippets, omitted context, and runtime controls used to create it.

ToolEnvelope

A typed record for every external effect: request, result, policy decision, approval mode, evidence, idempotency, and trace context.

DecisionRecord

The durable outcome: answer, actions, citations, approvals, policy decisions, controls active, confidence, trace id, and replay id.

Replay packet

The reproducible package that lets teams rerun a decision after changing prompts, packs, tools, policies, or evaluator thresholds.

Evaluation Harness

Scenario packs, assertions, grader outputs, regression gates, and scorecards that turn production misses into engineering work.
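
To make the harness idea concrete, here is a minimal sketch of a scenario pack with a regression gate. The `Scenario` shape, the `agent` callable signature, and the 0.9 pass-rate threshold are assumptions, not ContextOS definitions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    request: str
    assertion: Callable[[str], bool]   # grader: True if the answer is acceptable


def run_suite(agent, scenarios):
    """Run every scenario and return a scorecard of name -> pass/fail."""
    return {s.name: bool(s.assertion(agent(s.request))) for s in scenarios}


def regression_gate(baseline, candidate, min_pass_rate=0.9):
    """Block a change that breaks a previously passing scenario or scores too low."""
    regressions = sorted(
        name for name, ok in baseline.items()
        if ok and not candidate.get(name, False)
    )
    pass_rate = sum(candidate.values()) / len(candidate) if candidate else 0.0
    return {
        "passed": not regressions and pass_rate >= min_pass_rate,
        "regressions": regressions,
        "pass_rate": pass_rate,
    }
```

A production miss becomes engineering work by adding one more `Scenario` whose assertion encodes what the run should have done.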

Cross-cutting primitives

The same primitives travel through every plane.

ContextOS is not a collection of disconnected features. The runtime keeps one execution vocabulary from request intake through replay.

RunContext

run_id, trace_id, session_id, tenant_id, user delegation, workload identity, safety mode, and run budget

ApprovalMode

read_only, local_write, network, delegated, and destructive modes bound to every capability and decision

Context Pack

Versioned, signed, immutable input contract with ten layers and clear owners

CompiledContext

Compiled prompt, source manifests, runtime controls, omitted context, and budget report

DecisionRecord

Typed outcome with evidence refs, approvals, active controls, policy decisions, trace id, and replay handle

ToolEnvelope

Tool call and tool result record with policy id, evidence refs, audit metadata, and W3C trace context
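
The W3C trace context mentioned for ToolEnvelope travels as a `traceparent` header: a version, a 32-hex trace id, a 16-hex parent id, and 2-hex flags, joined by hyphens. The helpers below are a sketch of building and checking that header for version 00 only; the function names are hypothetical, and handling of future versions is left out.

```python
import re
import secrets

_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def make_traceparent(trace_id, sampled=True):
    """Build a version-00 traceparent header with a fresh parent (span) id."""
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, span_id, sampled), or None if the header is invalid."""
    m = _TRACEPARENT.match(header)
    if not m:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero ids are invalid under the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 1)
```

Reusing the run's trace id in every outgoing tool call is what lets a downstream service's spans join the same trace as the DecisionRecord.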

Adoption path

A team can start narrow and still end up with the full platform shape.

The best first implementation is not a giant agent rewrite. It is one valuable workflow wrapped in the contracts that make production behavior inspectable.

1. Instrument

Put RunContext, tool envelopes, and decision records around one high-value workflow.

Every run is traceable before the agent is trusted with more authority.

2. Compile

Move prompt stuffing into owned context packs with manifests, budgets, and provenance.

Teams can see exactly what the model saw and what was intentionally omitted.

3. Gate

Route actions through approval modes, policy checks, identity propagation, and idempotency.

External effects become reviewable operations instead of opaque model behavior.

4. Replay

Convert failures, escalations, and risky decisions into replay suites and scorecards.

Improvements are measured against the work the business actually cares about.

Canonical contract

Every invocation follows one execution contract.

Local agents, delegated agents, background sessions, and subagent lanes all pass through the same core shape.

Contract sketch

invokeAgent(request_envelope, run_context)
  -> compile(packs, request, run_context) -> CompiledContext
  -> loop {
       planner(CompiledContext)         -> Plan
       critic.verify(Plan)              -> ok | replan | reject
       executor(Plan, ToolGateway)      -> step_results, evidence
       critic.score(step_results)       -> accept | retry | replan | escalate
       consolidate(effects, evidence)   -> memory_proposals
     }
  -> DecisionRecord(evidence_refs, approvals, controls_active, trace_id, replay_id)
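
For readers who prefer something executable, the loop above can be approximated in Python. This is a sketch under stated assumptions: the planner, critics, and executor are passed in as plain callables, `max_rounds` and the "escalate when rounds run out" rule are hypothetical, and the `consolidate` step that produces memory proposals is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    run_id: str
    answer: str
    evidence_refs: list = field(default_factory=list)
    status: str = "accepted"    # or "rejected" / "escalated"


def invoke_agent(run_id, compiled_context, planner, critic_verify,
                 executor, critic_score, max_rounds=3):
    """Bounded plan -> verify -> execute -> score loop that always ends in a record."""
    evidence = []
    for _ in range(max_rounds):
        plan = planner(compiled_context)
        verdict = critic_verify(plan)          # "ok" | "replan" | "reject"
        if verdict == "reject":
            return DecisionRecord(run_id, "", evidence, status="rejected")
        if verdict == "replan":
            continue                           # plan again before any effect
        results, new_evidence = executor(plan)
        evidence.extend(new_evidence)
        score = critic_score(results)          # "accept" | "retry" | "replan" | "escalate"
        if score == "accept":
            return DecisionRecord(run_id, "; ".join(results), evidence)
        if score == "escalate":
            break
        # "retry" and "replan" both start another round in this sketch.
    return DecisionRecord(run_id, "", evidence, status="escalated")
```

The property the sketch tries to preserve is that every exit path, including rejection and exhaustion, returns a DecisionRecord carrying whatever evidence was gathered.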
