Production agent runtime

A product architecture for agents you can operate, audit, and improve.

ContextOS turns agent behavior into harnessed engineering: compiled context, bounded decisions, governed tools, durable evidence, and replayable improvement loops. It keeps the core promise simple: every production agent run should explain what it knew, what it did, why it did it, who approved it, and how to make the next run better.

[Figure: abstract five-plane ContextOS product visual, with coordinated layers around one controlled agent execution path.]
Five-plane product model

Intelligence, Context, Decision, Action, and Trust stay separate, but one run carries the same typed primitives through every plane.

Intelligence · Context · Decision · Action · Trust · RunContext
What each plane owns
  1. Intelligence

    What the system can know and remember.

  2. Context

    What this request is allowed to use.

  3. Decision

    How the bounded loop turns context into work.

  4. Action

    How tools and integrations are safely reached.

  5. Trust

    How policy, evaluation, replay, and audit govern the rest.

Operator view

Customer escalation run (replayable)

01 Compile

Assemble verified policy, customer, product, and incident context.

02 Decide

Plan remediation steps with critic checks before any external effect.

03 Act

Call approved tools through envelopes with evidence and trace context.

04 Record

Publish a decision record for audit, replay, and improvement.

DecisionRecord
run_id: run_74b2
approval_mode: network
context_pack: support_escalation:v18
tool_policy: allowlist + approval gate
decision_record: dec_9138
replay_status: ready
controls active: budget, policy, approval, trace, eval
Follow one run

The platform is easiest to understand as a single audited execution.

A ContextOS run is not just a prompt call. It is a sequence of contracts that start before inference and continue after the answer ships.

RunContext

Capture the request

The runtime assigns run identity, tenant identity, session scope, actor delegation, budgets, safety mode, and trace context before the model sees anything.

request_envelope
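
As a rough illustration of the capture step, the identity envelope could be modelled as an immutable record that is built before any model call. This is a minimal sketch, not the ContextOS API: the class names, field names, default budget values, and the `capture_request` helper are all assumptions chosen to mirror the fields listed above.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class RunBudget:
    # Hypothetical budget fields; the source only says a run carries "budgets".
    max_tokens: int
    max_tool_calls: int


@dataclass(frozen=True)
class RunContext:
    # Field names are assumptions modelled on the prose above.
    run_id: str
    trace_id: str
    tenant_id: str
    session_id: str
    actor: str          # who the agent acts on behalf of (delegation)
    safety_mode: str
    budget: RunBudget


def capture_request(
    tenant_id: str,
    session_id: str,
    actor: str,
    safety_mode: str = "standard",
    budget: RunBudget = RunBudget(max_tokens=8000, max_tool_calls=10),
) -> RunContext:
    """Assign run and trace identity before the model sees anything."""
    return RunContext(
        run_id=f"run_{uuid.uuid4().hex[:8]}",
        trace_id=uuid.uuid4().hex,  # 32 hex characters, the width of a W3C trace-id
        tenant_id=tenant_id,
        session_id=session_id,
        actor=actor,
        safety_mode=safety_mode,
        budget=budget,
    )


ctx = capture_request("acme", "sess-1", "user:alice")
```

Freezing the record means later stages can read the envelope but cannot quietly rewrite tenant or budget mid-run.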
Context plane

Compile usable context

Context packs are selected, ranked, budgeted, and compiled into a prompt plus manifests, provenance, controls, and context debt warnings.

CompiledContext
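
One way to picture the compile step is a greedy pass that ranks candidate snippets, admits them until a token budget is spent, and records what was left out. The sketch below is illustrative only: `Snippet`, `compile_context`, the precomputed token counts, and the greedy strategy are assumptions, not the actual compiler pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Snippet:
    source: str   # provenance: where the text came from
    text: str
    score: float  # relevance rank, higher is better
    tokens: int   # precomputed token cost (assumed to be known)


@dataclass
class CompiledContext:
    prompt: str
    manifest: list[str] = field(default_factory=list)  # sources included
    omitted: list[str] = field(default_factory=list)   # sources dropped for budget
    tokens_used: int = 0


def compile_context(snippets: list[Snippet], token_budget: int) -> CompiledContext:
    """Greedy selection: best-ranked snippets first, skip any that overflow."""
    included, omitted, used = [], [], 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        if used + s.tokens <= token_budget:
            included.append(s)
            used += s.tokens
        else:
            omitted.append(s.source)
    return CompiledContext(
        prompt="\n\n".join(s.text for s in included),
        manifest=[s.source for s in included],
        omitted=omitted,
        tokens_used=used,
    )


compiled = compile_context(
    [
        Snippet("policy", "Refund policy text", 0.9, 40),
        Snippet("incident", "Incident timeline", 0.8, 70),
        Snippet("faq", "FAQ entry", 0.5, 30),
    ],
    token_budget=80,
)
# compiled.manifest == ["policy", "faq"]; compiled.omitted == ["incident"]
```

Returning the omitted sources alongside the prompt is what makes "what the model did not see" inspectable after the fact.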
Decision plane

Plan inside bounds

Planner, executor, and critic lanes work inside the approved execution envelope instead of improvising hidden steps or invisible assumptions.

Plan + critique
Action plane

Gate every effect

Each tool call passes through capability discovery, approval mode, policy checks, idempotency keys, and tool-result evidence capture.

ToolEnvelope
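
A gated tool call might look roughly like the following. It is a simplified sketch under stated assumptions: the `ToolGateway` class, its methods, and the envelope fields are hypothetical; policy is reduced to an allowlist; approval modes and capability discovery are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolEnvelope:
    tool: str
    args: dict
    idempotency_key: str
    policy_decision: str   # "allow" or "deny"
    result: object = None  # tool-result evidence, if the call ran


class ToolGateway:
    """Hypothetical gateway: allowlist check plus idempotent execution."""

    def __init__(self, allowlist: set[str]):
        self.allowlist = allowlist
        self.tools: dict[str, Callable[..., object]] = {}
        self.envelopes: dict[str, ToolEnvelope] = {}  # keyed by idempotency key

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def call(self, tool: str, args: dict, idempotency_key: str) -> ToolEnvelope:
        # A repeated key returns the recorded envelope instead of re-running the effect.
        if idempotency_key in self.envelopes:
            return self.envelopes[idempotency_key]
        allowed = tool in self.tools and tool in self.allowlist
        envelope = ToolEnvelope(tool, args, idempotency_key, "allow" if allowed else "deny")
        if allowed:
            envelope.result = self.tools[tool](**args)
        self.envelopes[idempotency_key] = envelope  # denied calls are recorded too
        return envelope


gateway = ToolGateway(allowlist={"refund"})
gateway.register("refund", lambda amount: f"refunded {amount}")
first = gateway.call("refund", {"amount": 20}, idempotency_key="k1")
again = gateway.call("refund", {"amount": 20}, idempotency_key="k1")  # same envelope, no second refund
```

Recording denied calls as envelopes, not just allowed ones, keeps the audit trail complete for calls that never produced an effect.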
Trust plane

Emit the record

The final decision ships with evidence refs, approvals, controls active, policy decisions, trace ids, and replay handles.

DecisionRecord
Harness loop

Replay and improve

Failed, slow, risky, and high-value runs become replay cases for prompt packs, policy rules, tool contracts, and evaluation suites.

Replay packet
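
To make the replay idea concrete, a replay case can be thought of as the frozen inputs of a past run plus its original outcome, rerun against a changed decision function. The names below (`ReplayPacket`, `replay`, the sample policy) are assumptions for illustration, not the platform's real packet format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ReplayPacket:
    replay_id: str
    request: str
    context: tuple[str, ...]  # frozen inputs the original run saw
    original_outcome: str


def replay(packet: ReplayPacket, decide: Callable[[str, tuple], str]) -> dict:
    """Rerun a recorded decision with a (possibly changed) decision function."""
    new_outcome = decide(packet.request, packet.context)
    return {
        "replay_id": packet.replay_id,
        "original": packet.original_outcome,
        "replayed": new_outcome,
        "changed": new_outcome != packet.original_outcome,
    }


packet = ReplayPacket(
    replay_id="rp_1",
    request="customer reports outage",
    context=("sev1", "enterprise_tier"),
    original_outcome="auto_reply",
)


def policy_v2(request: str, context: tuple) -> str:
    # Hypothetical revised rule: severity-1 context always escalates.
    return "escalate" if "sev1" in context else "auto_reply"


report = replay(packet, policy_v2)
# report["changed"] is True: the revised policy would have escalated this run.
```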
Five planes

Separate concerns, then make the handoffs explicit.

Each plane owns a different failure surface. The product value comes from forcing clean boundaries between knowledge, context, decisions, actions, and trust.

Trust plane: Policy outside agent code · evaluators · OTEL traces · replay
Intelligence (substrate of meaning): Ontology + Identity · Knowledge Graph · Promotion-aware memory
Context (per-request compile): Context Pack · Compiler pipeline · Token-budget allocator
Decision (bounded loop): Planner / Executor / Critic · Subagent lanes · Decision Catalog
Action (governed effects): Tool Gateway · MCP / A2A / OpenAPI · Approval-mode tiers
Cross-cutting: RunContext · ApprovalMode · ContextPack · DecisionRecord · ToolEnvelope

Request envelope enters at Intelligence · DecisionRecord exits at Action · every step is signed, traced, and replayable.
Evidence model

Operators get records, not vibes.

The product surface is built around the evidence teams need when a run succeeds, fails, escalates, or causes an external effect.

RunContext

The canonical identity envelope for a run: actor, tenant, workload identity, trace, budget, safety mode, and delegated authority.

CompiledContext

The model-ready payload plus manifests, sources, ranked snippets, omitted context, and runtime controls used to create it.

ToolEnvelope

A typed record for every external effect: request, result, policy decision, approval mode, evidence, idempotency, and trace context.

DecisionRecord

The durable outcome: answer, actions, citations, approvals, policy decisions, controls active, confidence, trace id, and replay id.

Replay packet

The reproducible package that lets teams rerun a decision after changing prompts, packs, tools, policies, or evaluator thresholds.

Evaluation Harness

Scenario packs, assertions, grader outputs, regression gates, and scorecards that turn production misses into engineering work.
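
A regression gate over a scenario pack can be sketched in a few lines. Everything here is an assumption made for illustration: the `Scenario` shape, the `run_suite` and `regression_gate` helpers, and the pass-rate metric are not taken from the product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    request: str
    check: Callable[[str], bool]  # assertion on the agent's answer


def run_suite(agent: Callable[[str], str], scenarios: list[Scenario]) -> dict:
    """Run every scenario and build a simple scorecard."""
    if not scenarios:
        raise ValueError("scenario pack is empty")
    results = {s.name: bool(s.check(agent(s.request))) for s in scenarios}
    return {"results": results, "pass_rate": sum(results.values()) / len(scenarios)}


def regression_gate(scorecard: dict, baseline_pass_rate: float) -> bool:
    """Block the change if the suite scores below the recorded baseline."""
    return scorecard["pass_rate"] >= baseline_pass_rate


def toy_agent(request: str) -> str:
    # Stand-in for a real agent call.
    return "escalate to human" if "outage" in request else "see the docs"


scorecard = run_suite(
    toy_agent,
    [
        Scenario("outage_escalates", "site outage", lambda a: "escalate" in a),
        Scenario("billing_escalates", "billing dispute", lambda a: "escalate" in a),
    ],
)
# scorecard["pass_rate"] == 0.5, so a baseline of 0.9 would fail the gate.
```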

Cross-cutting primitives

The same primitives travel through every plane.

ContextOS is not a collection of disconnected features. The runtime keeps one execution vocabulary from request intake through replay.

Planes: Intelligence · Context · Decision · Action · Trust
RunContext: run_id · trace_id · session · tenant · user · agent · safety_mode · run_budget
ApprovalMode: read_only · local_write · network · delegated · destructive (bound to every capability)
Context Pack: versioned · signed · immutable · ten layers · compiled per request
DecisionRecord: evidence_refs · approvals · controls_active · policy_decisions · trace_id
ToolEnvelope: toolCall / toolResult · policy_decision_id · evidence_refs · audit · W3C trace context

RunContext

run_id, trace_id, session_id, tenant_id, user delegation, workload identity, safety mode, and run budget

ApprovalMode

read_only, local_write, network, delegated, and destructive modes bound to every capability and decision

Context Pack

Versioned, signed, immutable input contract with ten layers and clear owners

CompiledContext

Compiled prompt, source manifests, runtime controls, omitted context, and budget report

DecisionRecord

Typed outcome with evidence refs, approvals, active controls, policy decisions, trace id, and replay handle

ToolEnvelope

Tool call and tool result record with policy id, evidence refs, audit metadata, and W3C trace context
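
The five approval modes listed above could be represented as an ordered enumeration. Note the strong assumption: the source names the modes and calls them tiers but does not say that each one strictly includes the ones before it, so the ordering and the `is_permitted` helper below are hypothetical.

```python
from enum import IntEnum


class ApprovalMode(IntEnum):
    # Ordering is an assumption: it follows the order the modes are listed in.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


def is_permitted(required: ApprovalMode, granted: ApprovalMode) -> bool:
    """A capability runs only if the run was granted at least its required tier."""
    return required <= granted


# A run granted NETWORK can read and call out, but cannot perform destructive actions.
can_read = is_permitted(ApprovalMode.READ_ONLY, ApprovalMode.NETWORK)
can_destroy = is_permitted(ApprovalMode.DESTRUCTIVE, ApprovalMode.NETWORK)
```

If the real tiers are not strictly nested, a set of granted modes per capability would be a safer model than a single ordered value.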

Adoption path

A team can start narrow and still end up with the full platform shape.

The best first implementation is not a giant agent rewrite. It is one valuable workflow wrapped in the contracts that make production behavior inspectable.

1. Instrument

Put RunContext, tool envelopes, and decision records around one high-value workflow.

Every run is traceable before the agent is trusted with more authority.
2. Compile

Move prompt stuffing into owned context packs with manifests, budgets, and provenance.

Teams can see exactly what the model saw and what was intentionally omitted.
3. Gate

Route actions through approval modes, policy checks, identity propagation, and idempotency.

External effects become reviewable operations instead of opaque model behavior.
4. Replay

Convert failures, escalations, and risky decisions into replay suites and scorecards.

Improvements are measured against the work the business actually cares about.
Canonical contract

Every invocation follows one execution contract.

Local agents, delegated agents, background sessions, and subagent lanes all pass through the same core shape.

invokeAgent(request_envelope, run_context)
compile(): packs · request · run_context → CompiledContext
Bounded loop · Planner / Executor / Critic
  Plan → Plan
  Verify: ok | replan | reject
  Execute: ToolGateway
  Score: accept | retry | replan
  Consolidate: memory proposals
DecisionRecord: evidence · trace_id
Trust plane spans every step: policy decisions, approvals, OTEL spans, replay attestations

Contract sketch
invokeAgent(request_envelope, run_context)
  -> compile(packs, request, run_context) -> CompiledContext
  -> loop {
       planner(CompiledContext)         -> Plan
       critic.verify(Plan)              -> ok | replan | reject
       executor(Plan, ToolGateway)      -> step_results, evidence
       critic.score(step_results)       -> accept | retry | replan | escalate
       consolidate(effects, evidence)   -> memory_proposals
     }
  -> DecisionRecord(evidence_refs, approvals, controls_active, trace_id, replay_id)
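
The contract sketch above can be approximated as a small runnable loop. This is a simplified illustration, not the runtime itself: the function signatures, the `max_iterations` bound, and the status strings are assumptions; the compile and consolidate steps are omitted; and "retry" is treated the same as "replan".

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    answer: str
    evidence_refs: list[str] = field(default_factory=list)
    status: str = "accepted"


def invoke_agent(request, planner, verify, executor, score, max_iterations=3):
    """Bounded plan / verify / execute / score loop with stubbed lanes."""
    evidence: list[str] = []
    for attempt in range(1, max_iterations + 1):
        plan = planner(request, attempt)
        verdict = verify(plan)            # ok | replan | reject
        if verdict == "reject":
            return DecisionRecord("", evidence, status="rejected")
        if verdict == "replan":
            continue                      # nothing executes until the critic says ok
        result, refs = executor(plan)
        evidence.extend(refs)
        outcome = score(result)           # accept | retry | replan | escalate
        if outcome == "accept":
            return DecisionRecord(result, evidence)
        if outcome == "escalate":
            return DecisionRecord(result, evidence, status="escalated")
        # "retry" and "replan" both fall through to the next bounded iteration.
    return DecisionRecord("", evidence, status="budget_exhausted")


record = invoke_agent(
    "refund order 17",
    planner=lambda req, attempt: f"plan-{attempt}",
    verify=lambda plan: "replan" if plan == "plan-1" else "ok",
    executor=lambda plan: (f"done:{plan}", [f"ev:{plan}"]),
    score=lambda result: "accept",
)
# The first plan is sent back by the critic; the second is executed and accepted.
```

The iteration cap is what makes the loop bounded: when it runs out, the run ends with an explicit status instead of looping indefinitely.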