Decision Record

The replayable audit receipt emitted by every governed ContextOS run: evidence, approvals, controls, policy decisions, lineage, trace, and scorecard.

Implementation Guide. Last reviewed: 2026-05-09

At a glance

Definition

A DecisionRecord is an append-only, schema-validated artifact emitted by the Decision plane whenever a governed decision reaches a checkpoint or terminal state.

RunContext + CompiledContext
  -> Planner / Critic / Executor loop
  -> ToolEnvelope + policy decisions + approvals + scorecard
  -> DecisionRecord
       evidence_refs
       policy_decisions
       approvals
       controls_active
       outputs
       lineage
       trace_id
       record_hash

The record is the audit contract for the run. The trace explains where execution moved. The DecisionRecord explains what the system accepted as the governed outcome.

Published validator: /schemas/decision-record.v1.schema.json. Replay inputs use /schemas/replay-packet.v1.schema.json.

Why it exists

Production agent systems fail audit when they can only answer with prose: “the model said this because it saw that.” Prose is interpretation. Audit needs typed evidence.

Without DecisionRecord	With DecisionRecord
Logs describe what might have happened	A typed record names the accepted outcome
Evidence is scattered across stores	`evidence_refs[]` point to the exact artifacts
Approvals are status fields	`approvals[]` include gate, approver, mode, and frozen evidence hash
Policy is prompt text	`policy_decisions[]` name rule ids and bundle versions
Replay is best-effort	Replay compares byte-stable record fields against pinned inputs
Improvement is anecdotal	Feedback, scorecards, and StrategyRules link to a durable decision

Relationship to nearby contracts

Contract	Role
Context Pack	Declares which context, tools, policies, decisions, memory, and evaluators can participate.
Decision Catalog	Registers `DecisionSpec` entries: allowed outcomes, required evidence, schemas, and approval mode.
API Contracts	Defines the compact runtime envelopes, including the canonical DecisionRecord JSON shape.
Evaluation and Observability	Emits traces, scorecards, and replay bundles that attach to the record.

One DecisionSpec produces many DecisionRecords over time. One run may emit more than one DecisionRecord when it crosses multiple governed checkpoints, but it must emit one terminal record for the primary decision.

Anatomy

Field group	Purpose	Examples
Identity	Stable record and decision keys	`record_id`, `decision_key`, `decision_version`, `timestamp`, `status`
Actor	Who or what made the decision	agent workload identity, delegated user, approver
Subject	What the decision is about	customer CEID, order CEID, account CEID
Inputs	Request/session references, not raw payload dumps	`request_id`, `session_id`, attachment hashes
Outputs	Schema-validated accepted outcome	refund amount, denial reason, escalation target
Evidence	Pointers to source facts	KG snapshot refs, tool results, policy evals, files
Policy	Deterministic rules applied	`policy_decision_id`, bundle id, rule ids, verdict
Approvals	Human or delegated gates	`gate_id`, approver, effective mode, evidence hash
Controls	Runtime controls active at decision time	redaction rules, must-refuse, must-escalate, approval gates
Tool lineage	Side-effect transcript refs	tool call ids, result ids, idempotency keys, reversal tokens
Scores	Evaluator outputs	Policy, Utility, Latency, Safety, Cost
Budget	Resource use	tokens, tool calls, wall-clock, cost
Lineage	Pinned runtime substrate	pack, policy, model profile, KG snapshot, evaluator suite
Trace	Correlation spine	W3C `trace_id`, span refs, replay id
Seal	Tamper evidence	prior hash, record hash, signer key id

Minimum viable record

Your MVP record can be small, but it cannot be vague.

Required	Why
`record_id` and `decision_key`	Lets every decision be addressed and queried.
`decision_version`	Binds the record to the DecisionSpec that allowed it.
`status`	Makes the decision's state explicit, whether terminal or still open.
`actor`	Separates model proposal, agent identity, and delegated human authority.
`subject_ids[]`	Binds the record to stable business entities.
`outputs`	Stores the accepted typed result, not a chat transcript.
`evidence_refs[]`	Proves which facts supported the result.
`policy_decisions[]`	Shows which deterministic policy rules fired.
`approvals[]`	Records high-risk human/delegated gates.
`controls_active`	Shows which runtime controls were in force.
`lineage`	Pins pack, policy, graph, model, and evaluator versions.
`trace_id`	Connects the record to spans and replay artifacts.
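
The required-field table above can be turned into a first-pass presence check. The sketch below is illustrative only: `REQUIRED_FIELDS` and `missing_required_fields` are hypothetical names, and it assumes that an empty list (for example `approvals: []` on a low-risk decision) counts as present. It does not replace validation against the published schema.

```python
# Hypothetical helper; the field list mirrors the "Required" table above.
REQUIRED_FIELDS = (
    "record_id", "decision_key", "decision_version", "status", "actor",
    "subject_ids", "outputs", "evidence_refs", "policy_decisions",
    "approvals", "controls_active", "lineage", "trace_id",
)


def missing_required_fields(record: dict) -> list[str]:
    """Return required field names that are absent or null, in table order.

    Assumption: empty lists and empty objects count as present.
    """
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


if __name__ == "__main__":
    draft = {"record_id": "dr_1", "decision_key": "support.refund.execute"}
    # Prints every required field except the two supplied above.
    print(missing_required_fields(draft))
```

A presence check like this is cheap enough to run before the heavier schema and evidence checks, so vague records are rejected early.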

Example

{
  "record_id": "dr_2026_05_09_a17",
  "decision_key": "support.refund.execute",
  "decision_version": "1.0.0",
  "timestamp": "2026-05-09T09:31:42Z",
  "status": "DECIDED",
  "actor": {
    "type": "AGENT",
    "id": "agt_support",
    "workload_identity": "spiffe://contextos/agents/support",
    "delegated_user_id": "usr_771"
  },
  "intent_ref": "support.refund",
  "subject_ids": ["customer:cus_77", "order:ord_881"],
  "inputs_refs": {
    "request": "req_9f3a12",
    "session": "sess_42f1",
    "attachments": ["artifact:invoice_881#sha256:2f9cc0"]
  },
  "outputs": {
    "outcome": "approved",
    "refund_amount_inr": 4200,
    "currency": "INR",
    "transaction_id": "txn_q9"
  },
  "evidence_refs": [
    "kg:order:ord_881#snapshot_kg_2026_05_09_T0930",
    "tool:adp_orders.lookup:tc_117",
    "tool:adp_policy.eval:tc_119",
    "tool:adp_payments.issue_refund:tc_121"
  ],
  "policy_decisions": [
    {
      "policy_decision_id": "pol_9900",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_REFUND_REQUIRES_IDV"],
      "verdict": "allow"
    },
    {
      "policy_decision_id": "pol_9901",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_HIGH_VALUE_REQUIRES_APPROVAL"],
      "verdict": "require_approval"
    }
  ],
  "approvals": [
    {
      "gate_id": "GATE_FINANCE_APPROVAL",
      "approver": "user_finance_lead_77",
      "approval_mode_effective": "destructive",
      "evidence_snapshot_hash": "sha256:b2a1",
      "decided_at": "2026-05-09T09:31:30Z"
    }
  ],
  "controls_active": {
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"],
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "redaction_rules_active": ["pan", "credit_card"]
  },
  "tool_lineage": [
    {
      "tool_call_id": "tc_121",
      "tool_result_id": "tr_121",
      "approval_mode_effective": "destructive",
      "idempotency_key": "ik_2x9k4j7m1q8w0p3z",
      "reversal_token": "rv_refund_txn_q9"
    }
  ],
  "scorecard": {
    "policy": 1,
    "utility": 0.94,
    "latency": 0.86,
    "safety": 1,
    "cost": 0.91
  },
  "budget_usage": {
    "tokens": 4720,
    "tool_calls": 4,
    "cost_usd_cents": 0.91,
    "wall_clock_ms": 1840
  },
  "lineage": {
    "pack_version": "ctxpack.support@1.0.0",
    "policy_versions": ["POLICY_RETURNS_V1@1.0.0"],
    "decision_spec": "support.refund.execute@1.0.0",
    "kg_snapshot": "kg_2026_05_09_T0930",
    "model_profile": "model_profile.support.safe@1.0.0",
    "evaluator_suite": "eval.support.refund@1.0.0"
  },
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "replay": {
    "replay_packet_id": "rp_2026_05_09_a17",
    "compiled_context_hash": "sha256:9f0b",
    "tool_transcript_chain_hash": "sha256:7c4a"
  },
  "audit": {
    "prev_hash": "sha256:6ad0",
    "record_hash": "sha256:9ea1",
    "signed_by": "kid_runtime_2026Q2"
  }
}

Lifecycle

1. Select spec: the Orchestrator resolves `decision_key` from the Intent-Task Catalog and Decision Catalog.
2. Compile context: the Compiler emits manifests and runtime controls from the pinned Context Pack.
3. Verify plan: the Critic checks required evidence, tool surface, policy obligations, and budget before execution.
4. Execute tools: every side effect goes through the Tool Gateway and emits ToolCallEnvelope / ToolResultEnvelope receipts.
5. Collect approvals: high-risk modes bind approver, gate, effective mode, and frozen evidence hash.
6. Score result: evaluators produce Policy, Utility, Latency, Safety, and Cost scores.
7. Emit record: the runtime writes the DecisionRecord with all refs, not copied raw data.
8. Seal record: the store computes the canonical hash, links the prior hash, and signs the record.
9. Replay: replay re-runs the canonical loop against pinned inputs and recorded transcripts, then compares the resulting record.

Status values

Status	Meaning
`DECIDED`	The runtime accepted an outcome from `allowed_outcomes`.
`DEFERRED`	Required evidence or approval was missing; run can resume.
`REJECTED`	Critic or policy refused the proposed outcome.
`ESCALATED`	Runtime routed to a human or higher-trust workflow.
`IN_FLIGHT`	Long-running decision checkpoint has not reached terminal state.
`CLOSED`	A previously in-flight or deferred decision was finalized or retired.
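
One way to carry these values in code is an enumeration plus a set of statuses that are still open. This is a sketch under assumptions: the names `Status`, `OPEN_STATUSES`, and `is_open` are hypothetical, and treating only `DEFERRED` and `IN_FLIGHT` as open is inferred from the table (they are the two states that `CLOSED` can later finalize).

```python
from enum import Enum


class Status(str, Enum):
    """Status values from the table above."""
    DECIDED = "DECIDED"
    DEFERRED = "DEFERRED"
    REJECTED = "REJECTED"
    ESCALATED = "ESCALATED"
    IN_FLIGHT = "IN_FLIGHT"
    CLOSED = "CLOSED"


# Assumption: these are the only statuses a later CLOSED record can finalize.
OPEN_STATUSES = frozenset({Status.DEFERRED, Status.IN_FLIGHT})


def is_open(status: Status) -> bool:
    """True when the decision can still resume or be closed by a later record."""
    return status in OPEN_STATUSES


if __name__ == "__main__":
    print(is_open(Status("IN_FLIGHT")))
    print(is_open(Status.DECIDED))
```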

Validation rules

Rule	Block if
Schema	`outputs` does not match the active DecisionSpec output schema.
Evidence	Required evidence refs are absent, stale, unresolvable, or wrong-tenant.
Policy	A governed action lacks a policy decision or uses an inactive policy version.
Approval	Effective approval mode requires a gate but no approval event is attached.
Controls	Redaction, refusal, escalation, or approval controls active at compile time are missing from the record.
Lineage	Pack, policy, graph snapshot, model profile, or evaluator suite is unpinned.
Trace	`trace_id` is absent or does not match the runtime trace bundle.
Seal	Canonical hash does not verify or the signing key was not valid at emit time.
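
Three of the rules above (Approval, Lineage, Trace) can be checked from the record alone. The sketch below is a hedged illustration, not the runtime's validator: `blocking_rules`, `GATED_MODES`, and `LINEAGE_KEYS` are hypothetical names, the set of modes that require a gate is an assumption (the source only shows `destructive`), and "pinned" is approximated as "present and non-empty". Schema, Evidence, Policy, Controls, and Seal need registry or store lookups and are left out.

```python
# Assumption: only "destructive" is known from the example record to need a gate.
GATED_MODES = {"destructive"}

# Lineage keys as they appear in the example record.
LINEAGE_KEYS = (
    "pack_version", "policy_versions", "kg_snapshot",
    "model_profile", "evaluator_suite",
)


def blocking_rules(record: dict, runtime_trace_id: str) -> list[str]:
    """Return the names of validation rules that would block this record."""
    blocked = []

    # Approval: a gated effective mode with no approval event attached.
    modes = {t.get("approval_mode_effective") for t in record.get("tool_lineage", [])}
    if modes & GATED_MODES and not record.get("approvals"):
        blocked.append("Approval")

    # Lineage: any missing or empty pin counts as unpinned in this sketch.
    lineage = record.get("lineage", {})
    if any(not lineage.get(key) for key in LINEAGE_KEYS):
        blocked.append("Lineage")

    # Trace: absent, or different from the runtime trace bundle's id.
    if not record.get("trace_id") or record["trace_id"] != runtime_trace_id:
        blocked.append("Trace")

    return blocked
```

Returning rule names rather than a boolean keeps the result usable as a typed rejection reason.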

Replay contract

Replay is the difference between evidence and reconstruction. Given a `trace_id`, the harness must fetch:

Replay input	Source
Request envelope and RunContext	API envelope store
Context Pack and overlays	Pack registry, pinned by version and hash
Policy bundles and DecisionSpec	Control-plane registry
Knowledge Graph snapshot	snapshot store named in lineage
Tool transcripts	Tool Gateway transcript store
Model profile and routing decision	AI Gateway / LLM Router
Evaluator suite	Evaluation Engine
Persisted DecisionRecord	Decision record store

Replay does not re-execute side-effecting tools. It replays the canonical loop against recorded transcripts. A match returns `replay_equal`. A mismatch returns a typed diff: changed evidence, changed policy, changed tool transcript, changed compiled context, changed scorecard, or tamper detected.
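
The typed diff can be sketched as a comparison of selected record fields. The labels come from the paragraph above; the mapping from each label to a record field is an assumption based on the example record, and `CHECKS` and `replay_diff` are hypothetical names. Tamper detection is omitted because it depends on verifying the seal rather than comparing fields.

```python
from typing import Any, Callable

# Each entry pairs a diff label with a function that picks the compared value.
# Assumption: these record fields are the ones replay treats as byte-stable.
CHECKS: list[tuple[str, Callable[[dict], Any]]] = [
    ("changed evidence", lambda r: r.get("evidence_refs")),
    ("changed policy", lambda r: r.get("policy_decisions")),
    ("changed tool transcript",
     lambda r: r.get("replay", {}).get("tool_transcript_chain_hash")),
    ("changed compiled context",
     lambda r: r.get("replay", {}).get("compiled_context_hash")),
    ("changed scorecard", lambda r: r.get("scorecard")),
]


def replay_diff(persisted: dict, replayed: dict) -> list[str]:
    """Return typed diff labels; an empty list means the replay matched."""
    return [label for label, pick in CHECKS if pick(persisted) != pick(replayed)]


if __name__ == "__main__":
    persisted = {"scorecard": {"policy": 1}, "evidence_refs": ["tool:x:tc_1"]}
    replayed = {"scorecard": {"policy": 0}, "evidence_refs": ["tool:x:tc_1"]}
    diffs = replay_diff(persisted, replayed)
    print(diffs if diffs else "replay_equal")
```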

Storage model

Store DecisionRecords append-only. Retractions and corrections create new records that supersede prior records; they do not edit history.

Storage concern	Requirement
Partition key	`tenant_id` plus time window; never cross-tenant by default.
Query keys	`decision_key`, `subject_ids[]`, `status`, `approval_mode_effective`, `policy_decision_id`, `trace_id`.
Hashing	Canonical JSON serialization excluding transport-only fields; include `prev_hash`.
Signing	Runtime key id with effective window; revoked keys remain queryable for historical replay.
Retention	Destructive, denied, escalated, failed-scorecard, and incident-linked records are retained regardless of sampling.
Privacy	Store refs and hashes by default; keep raw payloads in evidence stores with classification controls.
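
The Hashing row can be illustrated with a small hash chain. This is a minimal sketch under stated assumptions: `TRANSPORT_ONLY_FIELDS`, `canonical_hash`, and `verify_chain` are hypothetical names, the `delivery_attempt` field and the genesis value are invented for illustration, the `audit` block is excluded from the hashed body because it holds the hash itself, and sorted-key `json.dumps` stands in for whatever canonical serialization the store actually uses. Signing is not shown.

```python
import hashlib
import json

# Assumption: which fields are transport-only is deployment-specific.
TRANSPORT_ONLY_FIELDS = frozenset({"delivery_attempt"})  # hypothetical field


def canonical_hash(record: dict, prev_hash: str) -> str:
    """Hash the record body plus prev_hash using a sorted-key JSON encoding."""
    body = {
        key: value for key, value in record.items()
        if key not in TRANSPORT_ONLY_FIELDS and key != "audit"
    }
    body["prev_hash"] = prev_hash
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(records: list[dict], genesis: str = "sha256:genesis") -> bool:
    """Check that each record links to the previous hash and hashes correctly."""
    prev = genesis
    for record in records:
        audit = record.get("audit", {})
        if audit.get("prev_hash") != prev:
            return False
        if audit.get("record_hash") != canonical_hash(record, prev):
            return False
        prev = audit["record_hash"]
    return True
```

Because each hash covers the previous one, editing any stored record invalidates every later link, which is what makes append-only corrections (new superseding records) the only safe way to change history.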

Query patterns

-- High-value refunds with approver and evidence in a quarter.
SELECT
  record_id,
  trace_id,
  outputs->>'refund_amount_inr' AS amount_inr,
  approvals,
  evidence_refs,
  lineage
FROM decision_records
WHERE decision_key = 'support.refund.execute'
  AND (outputs->>'refund_amount_inr')::numeric > 10000
  AND timestamp >= '2026-01-01'
  AND timestamp < '2026-04-01';

-- Policy rules producing the most denials this week.
-- Unnests each policy decision, then each rule id, so counts are per rule.
SELECT
  rule_id,
  count(*) AS records
FROM decision_records
CROSS JOIN LATERAL jsonb_array_elements(policy_decisions) AS pd
CROSS JOIN LATERAL jsonb_array_elements_text(pd->'rule_ids') AS rule_id
WHERE status = 'REJECTED'
  AND timestamp >= now() - interval '7 days'
GROUP BY rule_id
ORDER BY records DESC;

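-- Replay health by pack version.
-- Assumes the latest replay result is readable as replay.last_status (not shown in the example record).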
SELECT
  lineage->>'pack_version' AS pack_version,
  count(*) AS sampled_records,
  avg(CASE WHEN replay->>'last_status' = 'replay_equal' THEN 1 ELSE 0 END) AS replay_equal_rate
FROM decision_records
WHERE timestamp >= now() - interval '30 days'
GROUP BY 1;

Standards alignment

DecisionRecord is ContextOS-specific, but it should interoperate with operational standards:

Standard	How it maps
W3C Trace Context	`trace_id` and span correlation use `traceparent` / `tracestate` semantics so services can join the same trace.
OpenTelemetry	Spans, metrics, logs, and events carry the same trace and record identifiers; telemetry is the observation layer, not the decision receipt.
NIST AI RMF Core	The record supports continuous governance, measurement, incident response, recovery, and documented improvement across the AI lifecycle.

Readiness checklist

Check	Production-ready answer
Can a human explain the decision from the record alone?	The record names outcome, evidence, policy, approvals, controls, and lineage.
Can replay reproduce it?	Pinned inputs and tool transcripts are enough to rebuild a byte-equivalent record.
Can audit query it?	Decision key, subjects, policy ids, approval mode, approver, and trace are indexed.
Can privacy controls hold?	Raw payloads stay in classified evidence stores; the record carries refs and hashes.
Can incidents route from it?	Denials, failed scorecards, escalations, and destructive actions retain full trace bundles.
Can improvement build on it?	Feedback entries, scorecards, and StrategyRules link back to `record_id`.

Common misconceptions

A DecisionRecord is not the trace. The trace is the path; the record is the accepted governed outcome.
A DecisionRecord is not model chain-of-thought. Store typed rationale, evidence, and policy decisions. Do not store hidden reasoning.
A DecisionRecord is not a warehouse row. It must be append-only, hashable, replayable, and bound to runtime lineage.
A DecisionRecord is not optional for read-only decisions. Read-only decisions still need evidence and provenance when they affect a user, a downstream workflow, or future memory.
A DecisionRecord is not authored after the incident. It is emitted by the runtime while the decision is made.