Decision Record
The replayable audit receipt emitted by every governed ContextOS run: evidence, approvals, controls, policy decisions, lineage, trace, and scorecard.
Definition
A DecisionRecord is an append-only, schema-validated artifact emitted by the Decision plane whenever a governed decision reaches a checkpoint or terminal state.
```text
RunContext + CompiledContext
  -> Planner / Critic / Executor loop
  -> ToolEnvelope + policy decisions + approvals + scorecard
  -> DecisionRecord
       evidence_refs
       policy_decisions
       approvals
       controls_active
       outputs
       lineage
       trace_id
       record_hash
```

The record is the audit contract for the run. The trace explains where execution moved. The DecisionRecord explains what the system accepted as the governed outcome.
Why it exists
Production agent systems fail audit when they can only answer with prose: “the model said this because it saw that.” Prose is interpretation. Audit needs typed evidence.
| Without DecisionRecord | With DecisionRecord |
|---|---|
| Logs describe what might have happened | A typed record names the accepted outcome |
| Evidence is scattered across stores | evidence_refs[] point to the exact artifacts |
| Approvals are status fields | approvals[] include gate, approver, mode, and frozen evidence hash |
| Policy is prompt text | policy_decisions[] name rule ids and bundle versions |
| Replay is best-effort | Replay rebuilds the record from pinned inputs and compares its byte-stable fields with the persisted record |
| Improvement is anecdotal | Feedback, scorecards, and StrategyRules link to a durable decision |
Relationship to nearby contracts
| Contract | Role |
|---|---|
| Context Pack | Declares which context, tools, policies, decisions, memory, and evaluators can participate. |
| Decision Catalog | Registers DecisionSpec entries: allowed outcomes, required evidence, schemas, and approval mode. |
| API Contracts | Defines the compact runtime envelopes, including the canonical DecisionRecord JSON shape. |
| Evaluation and Observability | Emits traces, scorecards, and replay bundles that attach to the record. |
One DecisionSpec produces many DecisionRecords over time. One run may emit more than one DecisionRecord when it crosses multiple governed checkpoints, but it must emit one terminal record for the primary decision.
Anatomy
| Field group | Purpose | Examples |
|---|---|---|
| Identity | Stable record and decision keys | record_id, decision_key, decision_version, timestamp, status |
| Actor | Who or what made the decision | agent workload identity, delegated user, approver |
| Subject | What the decision is about | customer CEID, order CEID, account CEID |
| Inputs | Request/session references, not raw payload dumps | request_id, session_id, attachment hashes |
| Outputs | Schema-validated accepted outcome | refund amount, denial reason, escalation target |
| Evidence | Pointers to source facts | KG snapshot refs, tool results, policy evals, files |
| Policy | Deterministic rules applied | policy_decision_id, bundle id, rule ids, verdict |
| Approvals | Human or delegated gates | gate_id, approver, effective mode, evidence hash |
| Controls | Runtime controls active at decision time | redaction rules, must-refuse, must-escalate, approval gates |
| Tool lineage | Side-effect transcript refs | tool call ids, result ids, idempotency keys, reversal tokens |
| Scores | Evaluator outputs | Policy, Utility, Latency, Safety, Cost |
| Budget | Resource use | tokens, tool calls, wall-clock, cost |
| Lineage | Pinned runtime substrate | pack, policy, model profile, KG snapshot, evaluator suite |
| Trace | Correlation spine | W3C trace_id, span refs, replay id |
| Seal | Tamper evidence | prior hash, record hash, signer key id |
Minimum viable record
Your MVP record can be small, but it cannot be vague.
| Required | Why |
|---|---|
| record_id and decision_key | Lets every decision be addressed and queried. |
| decision_version | Binds the record to the DecisionSpec that allowed it. |
| status | Makes terminal state machine outcomes explicit. |
| actor | Separates model proposal, agent identity, and delegated human authority. |
| subject_ids[] | Binds the record to stable business entities. |
| outputs | Stores the accepted typed result, not a chat transcript. |
| evidence_refs[] | Proves which facts supported the result. |
| policy_decisions[] | Shows which deterministic policy rules fired. |
| approvals[] | Records high-risk human/delegated gates. |
| controls_active | Shows which runtime controls were in force. |
| lineage | Pins pack, policy, graph, model, and evaluator versions. |
| trace_id | Connects the record to spans and replay artifacts. |
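The table can be enforced by a gate at emit time. The following is a minimal sketch that assumes records arrive as parsed JSON dicts shaped like the example below. The names `MVP_FIELDS`, `NON_EMPTY_FIELDS` and `mvp_gaps` are hypothetical. The sketch lets approvals[] and policy_decisions[] be empty, on the reading that a decision with no gate or no governed action can legitimately have none. Identifiers and evidence must always be non-empty.

```python
from typing import Any

# Field names taken from the "Minimum viable record" table.
MVP_FIELDS = (
    "record_id", "decision_key", "decision_version", "status", "actor",
    "subject_ids", "outputs", "evidence_refs", "policy_decisions",
    "approvals", "controls_active", "lineage", "trace_id",
)

# Assumption: "cannot be vague" means these must be present AND non-empty.
NON_EMPTY_FIELDS = ("record_id", "decision_key", "decision_version",
                    "status", "subject_ids", "evidence_refs", "trace_id")


def mvp_gaps(record: dict[str, Any]) -> list[str]:
    """List missing or empty MVP fields; an empty list means the record passes."""
    gaps = [f"missing:{name}" for name in MVP_FIELDS if name not in record]
    gaps += [
        f"empty:{name}"
        for name in NON_EMPTY_FIELDS
        if name in record and not record[name]
    ]
    return gaps
```

The example record below returns an empty list. A record without trace_id and with an empty evidence_refs[] returns `["missing:trace_id", "empty:evidence_refs"]`.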
Example
```json
{
  "record_id": "dr_2026_05_09_a17",
  "decision_key": "support.refund.execute",
  "decision_version": "1.0.0",
  "timestamp": "2026-05-09T09:31:42Z",
  "status": "DECIDED",
  "actor": {
    "type": "AGENT",
    "id": "agt_support",
    "workload_identity": "spiffe://contextos/agents/support",
    "delegated_user_id": "usr_771"
  },
  "intent_ref": "support.refund",
  "subject_ids": ["customer:cus_77", "order:ord_881"],
  "inputs_refs": {
    "request": "req_9f3a12",
    "session": "sess_42f1",
    "attachments": ["artifact:invoice_881#sha256:2f9cc0"]
  },
  "outputs": {
    "outcome": "approved",
    "refund_amount_inr": 4200,
    "currency": "INR",
    "transaction_id": "txn_q9"
  },
  "evidence_refs": [
    "kg:order:ord_881#snapshot_kg_2026_05_09_T0930",
    "tool:adp_orders.lookup:tc_117",
    "tool:adp_policy.eval:tc_119",
    "tool:adp_payments.issue_refund:tc_121"
  ],
  "policy_decisions": [
    {
      "policy_decision_id": "pol_9900",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_REFUND_REQUIRES_IDV"],
      "verdict": "allow"
    },
    {
      "policy_decision_id": "pol_9901",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_HIGH_VALUE_REQUIRES_APPROVAL"],
      "verdict": "require_approval"
    }
  ],
  "approvals": [
    {
      "gate_id": "GATE_FINANCE_APPROVAL",
      "approver": "user_finance_lead_77",
      "approval_mode_effective": "destructive",
      "evidence_snapshot_hash": "sha256:b2a1",
      "decided_at": "2026-05-09T09:31:30Z"
    }
  ],
  "controls_active": {
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"],
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "redaction_rules_active": ["pan", "credit_card"]
  },
  "tool_lineage": [
    {
      "tool_call_id": "tc_121",
      "tool_result_id": "tr_121",
      "approval_mode_effective": "destructive",
      "idempotency_key": "ik_2x9k4j7m1q8w0p3z",
      "reversal_token": "rv_refund_txn_q9"
    }
  ],
  "scorecard": {
    "policy": 1,
    "utility": 0.94,
    "latency": 0.86,
    "safety": 1,
    "cost": 0.91
  },
  "budget_usage": {
    "tokens": 4720,
    "tool_calls": 4,
    "cost_usd_cents": 0.91,
    "wall_clock_ms": 1840
  },
  "lineage": {
    "pack_version": "ctxpack.support@1.0.0",
    "policy_versions": ["POLICY_RETURNS_V1@1.0.0"],
    "decision_spec": "support.refund.execute@1.0.0",
    "kg_snapshot": "kg_2026_05_09_T0930",
    "model_profile": "model_profile.support.safe@1.0.0",
    "evaluator_suite": "eval.support.refund@1.0.0"
  },
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "replay": {
    "replay_packet_id": "rp_2026_05_09_a17",
    "compiled_context_hash": "sha256:9f0b",
    "tool_transcript_chain_hash": "sha256:7c4a"
  },
  "audit": {
    "prev_hash": "sha256:6ad0",
    "record_hash": "sha256:9ea1",
    "signed_by": "kid_runtime_2026Q2"
  }
}
```

Lifecycle
- Select spec: the Orchestrator resolves decision_key from the Intent-Task Catalog and Decision Catalog.
- Compile context: the Compiler emits manifests and runtime controls from the pinned Context Pack.
- Verify plan: the Critic checks required evidence, tool surface, policy obligations, and budget before execution.
- Execute tools: every side effect goes through the Tool Gateway and emits toolCall/toolResult envelopes.
- Collect approvals: high-risk modes bind approver, gate, effective mode, and frozen evidence hash.
- Score result: evaluators produce Policy, Utility, Latency, Safety, and Cost scores.
- Emit record: the runtime writes the DecisionRecord with refs to every input, not copies of raw data.
- Seal record: the store computes the canonical hash, links the prior hash, and signs the record.
- Replay: the harness re-runs the canonical loop against pinned inputs and recorded transcripts, then compares the resulting record with the persisted one.
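In the "Collect approvals" step, each approval is bound to a frozen evidence hash. The method for computing that hash is an assumption in this sketch, not part of ContextOS: SHA-256 over the sorted, de-duplicated evidence refs shown to the approver. The function names are hypothetical. The abbreviated hashes in the example record, such as sha256:b2a1, are placeholders, so this code will not reproduce them.

The check uses the refs the approver actually saw, not the record's final evidence_refs. Tool calls made after the gate legitimately add evidence. In the example, tc_121 is the refund issued after approval.

```python
import hashlib
import json


def freeze_evidence(evidence_refs: list[str]) -> str:
    """Hash the evidence an approver saw; insensitive to order and duplicates."""
    canonical = json.dumps(sorted(set(evidence_refs)), separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_matches_evidence(approval: dict, refs_shown_to_approver: list[str]) -> bool:
    """True if the approval's frozen hash matches the snapshot it was granted on."""
    return approval.get("evidence_snapshot_hash") == freeze_evidence(refs_shown_to_approver)
```

When the evidence behind an approval is retrieved again at audit time, a mismatch means it has changed since the approver signed off.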
Status values
| Status | Meaning |
|---|---|
| DECIDED | The runtime accepted an outcome from allowed_outcomes. |
| DEFERRED | Required evidence or approval was missing; the run can resume. |
| REJECTED | Critic or policy refused the proposed outcome. |
| ESCALATED | Runtime routed to a human or higher-trust workflow. |
| IN_FLIGHT | Long-running decision checkpoint has not reached terminal state. |
| CLOSED | A previously in-flight or deferred decision was finalized or retired. |
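The table names the states but does not list transitions. The sketch below adopts one reading as an assumption: only IN_FLIGHT and DEFERRED can be followed by another record for the same decision, and CLOSED never opens a history because it finalizes an earlier record. Corrections arrive as new superseding records, not as status changes. `Status`, `RESUMABLE`, `is_terminal` and `valid_history` are hypothetical names.

```python
from enum import Enum


class Status(str, Enum):
    DECIDED = "DECIDED"
    DEFERRED = "DEFERRED"
    REJECTED = "REJECTED"
    ESCALATED = "ESCALATED"
    IN_FLIGHT = "IN_FLIGHT"
    CLOSED = "CLOSED"


# Assumption: only these states may be followed by another record for the decision.
RESUMABLE = frozenset({Status.IN_FLIGHT, Status.DEFERRED})


def is_terminal(status: str) -> bool:
    """A non-resumable status ends the decision's history."""
    return Status(status) not in RESUMABLE


def valid_history(statuses: list[str]) -> bool:
    """Check a decision's statuses, oldest record first; unknown values raise ValueError."""
    steps = [Status(s) for s in statuses]
    if not steps or steps[0] is Status.CLOSED:
        return False  # CLOSED needs an earlier in-flight or deferred record
    # Every record except the last must be one the run can move on from.
    return all(step in RESUMABLE for step in steps[:-1])
```

Under this reading, `["IN_FLIGHT", "DEFERRED", "IN_FLIGHT", "DECIDED"]` is valid. `["DECIDED", "IN_FLIGHT"]` is not.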
Validation rules
| Rule | Block if |
|---|---|
| Schema | outputs does not match the active DecisionSpec output schema. |
| Evidence | Required evidence refs are absent, stale, unresolvable, or wrong-tenant. |
| Policy | A governed action lacks a policy decision or uses an inactive policy version. |
| Approval | Effective approval mode requires a gate but no approval event is attached. |
| Controls | Redaction, refusal, escalation, or approval controls active at compile time are missing from the record. |
| Lineage | Pack, policy, graph snapshot, model profile, or evaluator suite is unpinned. |
| Trace | trace_id is absent or does not match the runtime trace bundle. |
| Seal | Canonical hash does not verify or the signing key was not valid at emit time. |
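Four of these rules can be checked from the record plus two runtime facts: the controls the Compiler activated, and the trace id the runtime produced. The sketch below covers Approval, Controls, Lineage, and Trace. Schema, Evidence, Policy, and Seal are omitted because they need the DecisionSpec, evidence store, policy registry, or key registry.

The following are all assumptions:
- the names `validate` and `PINNED_LINEAGE`;
- the shape of `compiled_controls`, which is keyed like controls_active in the example record;
- reading "requires a gate" as "a gate is listed in approval_gates_active".

```python
from typing import Any

# Lineage keys copied from the example record; "present and non-empty"
# standing in for "pinned" is a simplifying assumption.
PINNED_LINEAGE = ("pack_version", "policy_versions", "kg_snapshot",
                  "model_profile", "evaluator_suite")


def validate(record: dict[str, Any],
             compiled_controls: dict[str, list[str]],
             runtime_trace_id: str) -> list[str]:
    """Return the names of violated rules (a subset of the validation table)."""
    violations: list[str] = []
    controls = record.get("controls_active", {})

    # Approval: every gate active at decision time needs an approval event for it.
    approved = {a.get("gate_id") for a in record.get("approvals", [])}
    if any(gate not in approved for gate in controls.get("approval_gates_active", [])):
        violations.append("approval")

    # Controls: everything the compiler activated must appear in the record.
    for kind, expected in compiled_controls.items():
        if not set(expected) <= set(controls.get(kind, [])):
            violations.append("controls")
            break

    # Lineage: every substrate component must be pinned.
    lineage = record.get("lineage", {})
    if any(not lineage.get(key) for key in PINNED_LINEAGE):
        violations.append("lineage")

    # Trace: the record must name the trace the runtime actually produced.
    if not record.get("trace_id") or record["trace_id"] != runtime_trace_id:
        violations.append("trace")

    return violations
```

Given the example record, its compile-time controls, and the trace id 4bf92f3577b34da6a3ce929d0e0e4736, this returns an empty list.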
Replay contract
Replay is the difference between evidence and reconstruction. Given a trace_id, the harness must fetch:
| Replay input | Source |
|---|---|
| Request envelope and RunContext | API envelope store |
| Context Pack and overlays | Pack registry, pinned by version and hash |
| Policy bundles and DecisionSpec | Control-plane registry |
| Knowledge Graph snapshot | Snapshot store named in lineage |
| Tool transcripts | Tool Gateway transcript store |
| Model profile and routing decision | AI Gateway / LLM Router |
| Evaluator suite | Evaluation Engine |
| Persisted DecisionRecord | Decision record store |
Replay does not re-execute side-effecting tools. It replays the canonical loop against recorded transcripts. A match returns replay_equal. A mismatch returns a typed diff: changed evidence, changed policy, changed tool transcript, changed compiled context, changed scorecard, or tamper detected.
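The typed diff can be sketched as a field-by-field comparison of the persisted record with the record the harness rebuilt. The following are assumptions for illustration:
- the mapping from diff categories to record fields;
- the category spellings, such as `changed_evidence`;
- the `seal_verifies` flag, computed by whatever checks the hash chain and signature.

The comparison uses exact equality, in line with the byte-stable intent.

```python
from typing import Any

# Assumed mapping from typed diff categories to the record fields they watch.
DIFF_FIELDS = {
    "changed_evidence": ("evidence_refs",),
    "changed_policy": ("policy_decisions",),
    "changed_tool_transcript": ("tool_lineage", "replay.tool_transcript_chain_hash"),
    "changed_compiled_context": ("replay.compiled_context_hash",),
    "changed_scorecard": ("scorecard",),
}


def _get(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'replay.compiled_context_hash'; None if absent."""
    node: Any = record
    for part in path.split("."):
        if not isinstance(node, dict):
            return None
        node = node.get(part)
    return node


def replay_diff(persisted: dict[str, Any], replayed: dict[str, Any],
                seal_verifies: bool) -> list[str]:
    """Return ["replay_equal"] or the typed differences, in DIFF_FIELDS order."""
    if not seal_verifies:
        # A baseline whose seal fails cannot be trusted, so report tampering only.
        return ["tamper_detected"]
    diffs = [
        kind for kind, paths in DIFF_FIELDS.items()
        if any(_get(persisted, p) != _get(replayed, p) for p in paths)
    ]
    return diffs or ["replay_equal"]
```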
Storage model
Store DecisionRecords append-only. Retractions and corrections create new records that supersede prior records; they do not edit history.
| Storage concern | Requirement |
|---|---|
| Partition key | tenant_id plus time window; never cross-tenant by default. |
| Query keys | decision_key, subject_ids[], status, approval_mode_effective, policy_decision_id, trace_id. |
| Hashing | Canonical JSON serialization excluding transport-only fields; include prev_hash. |
| Signing | Runtime key id with effective window; revoked keys remain queryable for historical replay. |
| Retention | Destructive, denied, escalated, failed-scorecard, and incident-linked records are retained regardless of sampling. |
| Privacy | Store refs and hashes by default; keep raw payloads in evidence stores with classification controls. |
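The Hashing row can be made concrete as a hash chain. This sketch treats Python's `json.dumps` with sorted keys and compact separators as the canonical form. That holds within one Python runtime, but verifiers written in other languages would need a formal scheme such as RFC 8785 (JSON Canonicalization Scheme), which pins number and string formatting.

`TRANSPORT_ONLY`, the `sha256:genesis` sentinel, and the function names are hypothetical. Signing is left out. A real store would then sign record_hash with the key named in audit.signed_by, setting that key id before sealing so the hash covers it.

```python
import copy
import hashlib
import json
from typing import Any

# Hypothetical transport-only keys; the real list belongs to the API contract.
TRANSPORT_ONLY = {"_received_at", "_delivery_attempt"}


def canonical_bytes(record: dict[str, Any]) -> bytes:
    """Sorted-key, whitespace-free JSON minus transport fields and the record's own hash."""
    body = {k: v for k, v in record.items() if k not in TRANSPORT_ONLY}
    audit = dict(body.get("audit", {}))
    audit.pop("record_hash", None)  # a hash cannot cover itself; prev_hash stays in
    body["audit"] = audit
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def seal(record: dict[str, Any], prev_hash: str) -> dict[str, Any]:
    """Return a sealed copy linked to the previous record in the same partition."""
    sealed = copy.deepcopy(record)
    sealed.setdefault("audit", {})["prev_hash"] = prev_hash
    digest = hashlib.sha256(canonical_bytes(sealed)).hexdigest()
    sealed["audit"]["record_hash"] = "sha256:" + digest
    return sealed


def verify_chain(records: list[dict[str, Any]], genesis: str = "sha256:genesis") -> bool:
    """Check every record hash and every prev_hash link, oldest record first."""
    expected_prev = genesis
    for rec in records:
        audit = rec.get("audit", {})
        recomputed = "sha256:" + hashlib.sha256(canonical_bytes(rec)).hexdigest()
        if audit.get("prev_hash") != expected_prev or audit.get("record_hash") != recomputed:
            return False
        expected_prev = audit["record_hash"]
    return True
```

Editing any hashed field after sealing breaks verify_chain for that record and every record after it. This is why corrections are appended as superseding records rather than applied in place.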
Query patterns
```sql
-- High-value refunds with approver and evidence in a quarter.
SELECT
  record_id,
  trace_id,
  outputs->>'refund_amount_inr' AS amount_inr,
  approvals,
  evidence_refs,
  lineage
FROM decision_records
WHERE decision_key = 'support.refund.execute'
  AND (outputs->>'refund_amount_inr')::numeric > 10000
  AND timestamp >= '2026-01-01'
  AND timestamp < '2026-04-01';
```

```sql
-- Policy rules that appear most often on rejected records this week.
-- Groups by rule id: policy_decision_id is unique per evaluation, so grouping by it counts 1 each.
SELECT
  r.rule_id,
  count(DISTINCT d.record_id) AS records
FROM decision_records AS d
  CROSS JOIN LATERAL jsonb_array_elements(d.policy_decisions) AS pd(decision)
  CROSS JOIN LATERAL jsonb_array_elements_text(pd.decision->'rule_ids') AS r(rule_id)
WHERE d.status = 'REJECTED'
  AND d.timestamp >= now() - interval '7 days'
GROUP BY r.rule_id
ORDER BY records DESC;
```

```sql
-- Replay health by pack version.
-- Assumes the replay harness records its result under replay.last_status.
SELECT
  lineage->>'pack_version' AS pack_version,
  count(*) AS sampled_records,
  avg(CASE WHEN replay->>'last_status' = 'replay_equal' THEN 1 ELSE 0 END) AS replay_equal_rate
FROM decision_records
WHERE timestamp >= now() - interval '30 days'
  AND replay ? 'last_status'  -- count only records that have actually been replayed
GROUP BY 1;
```

Standards alignment
DecisionRecord is ContextOS-specific, but it should interoperate with operational standards:
| Standard | How it maps |
|---|---|
| W3C Trace Context | trace_id and span correlation use traceparent / tracestate semantics so services can join the same trace. |
| OpenTelemetry | Spans, metrics, logs, and events carry the same trace and record identifiers; telemetry is the observation layer, not the decision receipt. |
| NIST AI RMF Core | The record supports continuous governance, measurement, incident response, recovery, and documented improvement across the AI lifecycle. |
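To join a record to an incoming request's trace, read the trace-id out of the W3C traceparent header and compare it with the record's trace_id. The sketch below handles only version 00 of the header, which has the form version-traceid-parentid-flags in lowercase hex. It rejects later versions that may carry extra fields. The function names are hypothetical, and the parent-id in the usage note is made up.

```python
import re
from typing import Optional

# traceparent version 00: version "-" trace-id "-" parent-id "-" trace-flags,
# all lowercase hex (W3C Trace Context).
_TRACEPARENT_V00 = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Return the trace-id of a valid version-00 traceparent header, else None."""
    match = _TRACEPARENT_V00.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # The spec treats all-zero trace-id and parent-id values as invalid.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def record_joins_trace(record: dict, traceparent: str) -> bool:
    """True if the record's trace_id names the same trace as the header."""
    trace_id = trace_id_from_traceparent(traceparent)
    return trace_id is not None and record.get("trace_id") == trace_id
```

For the example record, the header `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` joins the same trace.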
Readiness checklist
| Check | Production-ready answer |
|---|---|
| Can a human explain the decision from the record alone? | The record names outcome, evidence, policy, approvals, controls, and lineage. |
| Can replay reproduce it? | Pinned inputs and tool transcripts are enough to rebuild a byte-equivalent record. |
| Can audit query it? | Decision key, subjects, policy ids, approval mode, approver, and trace are indexed. |
| Can privacy controls hold? | Raw payloads stay in classified evidence stores; the record carries refs and hashes. |
| Can incidents route from it? | Denials, failed scorecards, escalations, and destructive actions retain full trace bundles. |
| Can improvement build on it? | Feedback entries, scorecards, and StrategyRules link back to record_id. |
Common misconceptions
- A DecisionRecord is not the trace. The trace is the path; the record is the accepted governed outcome.
- A DecisionRecord is not model chain-of-thought. Store typed rationale, evidence, and policy decisions. Do not store hidden reasoning.
- A DecisionRecord is not a warehouse row. It must be append-only, hashable, replayable, and bound to runtime lineage.
- A DecisionRecord is not optional for read-only decisions. Read-only decisions still need evidence and provenance when they affect a user, a downstream workflow, or future memory.
- A DecisionRecord is not authored after the incident. It is emitted by the runtime while the decision is made.