Decision Record

The replayable audit receipt emitted by every governed ContextOS run: evidence, approvals, controls, policy decisions, lineage, trace, and scorecard.

Implementation Guide
At a glance

Definition

A DecisionRecord is an append-only, schema-validated artifact emitted by the Decision plane whenever a governed decision reaches a checkpoint or terminal state.

RunContext + CompiledContext
  -> Planner / Critic / Executor loop
  -> ToolEnvelope + policy decisions + approvals + scorecard
  -> DecisionRecord
       evidence_refs
       policy_decisions
       approvals
       controls_active
       outputs
       lineage
       trace_id
       record_hash

The record is the audit contract for the run. The trace explains where execution moved. The DecisionRecord explains what the system accepted as the governed outcome.

Why it exists

Production agent systems fail audit when they can only answer with prose: “the model said this because it saw that.” Prose is interpretation. Audit needs typed evidence.

Without DecisionRecord | With DecisionRecord
Logs describe what might have happened | A typed record names the accepted outcome
Evidence is scattered across stores | evidence_refs[] point to the exact artifacts
Approvals are status fields | approvals[] include gate, approver, mode, and frozen evidence hash
Policy is prompt text | policy_decisions[] name rule ids and bundle versions
Replay is best-effort | Replay compares byte-stable record fields against pinned inputs
Improvement is anecdotal | Feedback, scorecards, and StrategyRules link to a durable decision

Relationship to nearby contracts

Contract | Role
Context Pack | Declares which context, tools, policies, decisions, memory, and evaluators can participate.
Decision Catalog | Registers DecisionSpec entries: allowed outcomes, required evidence, schemas, and approval mode.
API Contracts | Defines the compact runtime envelopes, including the canonical DecisionRecord JSON shape.
Evaluation and Observability | Emits traces, scorecards, and replay bundles that attach to the record.

One DecisionSpec produces many DecisionRecords over time. One run may emit more than one DecisionRecord when it crosses multiple governed checkpoints, but it must emit one terminal record for the primary decision.

Anatomy

Field group | Purpose | Examples
Identity | Stable record and decision keys | record_id, decision_key, decision_version, timestamp, status
Actor | Who or what made the decision | agent workload identity, delegated user, approver
Subject | What the decision is about | customer CEID, order CEID, account CEID
Inputs | Request/session references, not raw payload dumps | request_id, session_id, attachment hashes
Outputs | Schema-validated accepted outcome | refund amount, denial reason, escalation target
Evidence | Pointers to source facts | KG snapshot refs, tool results, policy evals, files
Policy | Deterministic rules applied | policy_decision_id, bundle id, rule ids, verdict
Approvals | Human or delegated gates | gate_id, approver, effective mode, evidence hash
Controls | Runtime controls active at decision time | redaction rules, must-refuse, must-escalate, approval gates
Tool lineage | Side-effect transcript refs | tool call ids, result ids, idempotency keys, reversal tokens
Scores | Evaluator outputs | Policy, Utility, Latency, Safety, Cost
Budget | Resource use | tokens, tool calls, wall-clock, cost
Lineage | Pinned runtime substrate | pack, policy, model profile, KG snapshot, evaluator suite
Trace | Correlation spine | W3C trace_id, span refs, replay id
Seal | Tamper evidence | prior hash, record hash, signer key id

Minimum viable record

Your MVP record can be small, but it cannot be vague.

Required | Why
record_id and decision_key | Lets every decision be addressed and queried.
decision_version | Binds the record to the DecisionSpec that allowed it.
status | Makes terminal state machine outcomes explicit.
actor | Separates model proposal, agent identity, and delegated human authority.
subject_ids[] | Binds the record to stable business entities.
outputs | Stores the accepted typed result, not a chat transcript.
evidence_refs[] | Proves which facts supported the result.
policy_decisions[] | Shows which deterministic policy rules fired.
approvals[] | Records high-risk human/delegated gates.
controls_active | Shows which runtime controls were in force.
lineage | Pins pack, policy, graph, model, and evaluator versions.
trace_id | Connects the record to spans and replay artifacts.
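
The required-field list above can be checked mechanically before a record is accepted. The following is a minimal sketch, not the ContextOS validator: the names REQUIRED_FIELDS and missing_required_fields are hypothetical, and the check only tests key presence, so an empty approvals list still counts as present.

```python
# Field names are taken from the "Required" table; the constant and the
# function name are illustrative, not part of any published ContextOS API.
REQUIRED_FIELDS = (
    "record_id", "decision_key", "decision_version", "status", "actor",
    "subject_ids", "outputs", "evidence_refs", "policy_decisions",
    "approvals", "controls_active", "lineage", "trace_id",
)


def missing_required_fields(record: dict) -> list:
    """Return the required keys that are absent from the record, in table order."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# A record carrying only identity keys is flagged for everything else.
partial = {"record_id": "dr_2026_05_09_a17", "decision_key": "support.refund.execute"}
print(missing_required_fields(partial))
```

A presence check like this is only the first gate; type and schema checks against the active DecisionSpec would still have to follow.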

Example

{
  "record_id": "dr_2026_05_09_a17",
  "decision_key": "support.refund.execute",
  "decision_version": "1.0.0",
  "timestamp": "2026-05-09T09:31:42Z",
  "status": "DECIDED",
  "actor": {
    "type": "AGENT",
    "id": "agt_support",
    "workload_identity": "spiffe://contextos/agents/support",
    "delegated_user_id": "usr_771"
  },
  "intent_ref": "support.refund",
  "subject_ids": ["customer:cus_77", "order:ord_881"],
  "inputs_refs": {
    "request": "req_9f3a12",
    "session": "sess_42f1",
    "attachments": ["artifact:invoice_881#sha256:2f9cc0"]
  },
  "outputs": {
    "outcome": "approved",
    "refund_amount_inr": 4200,
    "currency": "INR",
    "transaction_id": "txn_q9"
  },
  "evidence_refs": [
    "kg:order:ord_881#snapshot_kg_2026_05_09_T0930",
    "tool:adp_orders.lookup:tc_117",
    "tool:adp_policy.eval:tc_119",
    "tool:adp_payments.issue_refund:tc_121"
  ],
  "policy_decisions": [
    {
      "policy_decision_id": "pol_9900",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_REFUND_REQUIRES_IDV"],
      "verdict": "allow"
    },
    {
      "policy_decision_id": "pol_9901",
      "bundle_id": "POLICY_RETURNS_V1",
      "rule_ids": ["R_HIGH_VALUE_REQUIRES_APPROVAL"],
      "verdict": "require_approval"
    }
  ],
  "approvals": [
    {
      "gate_id": "GATE_FINANCE_APPROVAL",
      "approver": "user_finance_lead_77",
      "approval_mode_effective": "destructive",
      "evidence_snapshot_hash": "sha256:b2a1",
      "decided_at": "2026-05-09T09:31:30Z"
    }
  ],
  "controls_active": {
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"],
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "redaction_rules_active": ["pan", "credit_card"]
  },
  "tool_lineage": [
    {
      "tool_call_id": "tc_121",
      "tool_result_id": "tr_121",
      "approval_mode_effective": "destructive",
      "idempotency_key": "ik_2x9k4j7m1q8w0p3z",
      "reversal_token": "rv_refund_txn_q9"
    }
  ],
  "scorecard": {
    "policy": 1,
    "utility": 0.94,
    "latency": 0.86,
    "safety": 1,
    "cost": 0.91
  },
  "budget_usage": {
    "tokens": 4720,
    "tool_calls": 4,
    "cost_usd_cents": 0.91,
    "wall_clock_ms": 1840
  },
  "lineage": {
    "pack_version": "ctxpack.support@1.0.0",
    "policy_versions": ["POLICY_RETURNS_V1@1.0.0"],
    "decision_spec": "support.refund.execute@1.0.0",
    "kg_snapshot": "kg_2026_05_09_T0930",
    "model_profile": "model_profile.support.safe@1.0.0",
    "evaluator_suite": "eval.support.refund@1.0.0"
  },
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "replay": {
    "replay_packet_id": "rp_2026_05_09_a17",
    "compiled_context_hash": "sha256:9f0b",
    "tool_transcript_chain_hash": "sha256:7c4a"
  },
  "audit": {
    "prev_hash": "sha256:6ad0",
    "record_hash": "sha256:9ea1",
    "signed_by": "kid_runtime_2026Q2"
  }
}

Lifecycle

  1. Select spec — the Orchestrator resolves decision_key from the Intent-Task Catalog and Decision Catalog.
  2. Compile context — the Compiler emits manifests and runtime controls from the pinned Context Pack.
  3. Verify plan — the Critic checks required evidence, tool surface, policy obligations, and budget before execution.
  4. Execute tools — every side effect goes through the Tool Gateway and emits toolCall / toolResult envelopes.
  5. Collect approvals — high-risk modes bind approver, gate, effective mode, and frozen evidence hash.
  6. Score result — evaluators produce Policy, Utility, Latency, Safety, and Cost scores.
  7. Emit record — the runtime writes the DecisionRecord with all refs, not copied raw data.
  8. Seal record — the store computes the canonical hash, links the prior hash, and signs the record.
  9. Replay — replay re-runs the canonical loop against pinned inputs and recorded transcripts, then compares the resulting record.

Status values

Status | Meaning
DECIDED | The runtime accepted an outcome from allowed_outcomes.
DEFERRED | Required evidence or approval was missing; run can resume.
REJECTED | Critic or policy refused the proposed outcome.
ESCALATED | Runtime routed to a human or higher-trust workflow.
IN_FLIGHT | Long-running decision checkpoint has not reached terminal state.
CLOSED | A previously in-flight or deferred decision was finalized or retired.
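
The status values can be modelled as an enumeration so that a run can be checked for its terminal record. This is a sketch under one loud assumption: the table does not say which statuses are terminal, so the TERMINAL set below (DECIDED, REJECTED, ESCALATED, CLOSED) is an interpretation, and the helper names are hypothetical.

```python
from enum import Enum


class Status(str, Enum):
    DECIDED = "DECIDED"
    DEFERRED = "DEFERRED"
    REJECTED = "REJECTED"
    ESCALATED = "ESCALATED"
    IN_FLIGHT = "IN_FLIGHT"
    CLOSED = "CLOSED"


# Assumption: DEFERRED ("run can resume") and IN_FLIGHT ("has not reached
# terminal state") are treated as non-terminal; the rest as terminal.
TERMINAL = frozenset(
    {Status.DECIDED, Status.REJECTED, Status.ESCALATED, Status.CLOSED}
)


def is_terminal(status: str) -> bool:
    """True if the status string names a terminal state; unknown values raise ValueError."""
    return Status(status) in TERMINAL


def terminal_records(records: list) -> list:
    """Return the records of one run whose status is terminal."""
    return [r for r in records if is_terminal(r["status"])]


run = [
    {"record_id": "dr_a", "status": "IN_FLIGHT"},
    {"record_id": "dr_b", "status": "DECIDED"},
]
print([r["record_id"] for r in terminal_records(run)])
```

A run whose terminal_records list is empty has not yet emitted the terminal record that the primary decision requires.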

Validation rules

Rule | Block if
Schema | outputs does not match the active DecisionSpec output schema.
Evidence | Required evidence refs are absent, stale, unresolvable, or wrong-tenant.
Policy | A governed action lacks a policy decision or uses an inactive policy version.
Approval | Effective approval mode requires a gate but no approval event is attached.
Controls | Redaction, refusal, escalation, or approval controls active at compile time are missing from the record.
Lineage | Pack, policy, graph snapshot, model profile, or evaluator suite is unpinned.
Trace | trace_id is absent or does not match the runtime trace bundle.
Seal | Canonical hash does not verify or the signing key was not valid at emit time.
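
Several of these rules can be evaluated from the record alone. The sketch below covers a subset (Evidence, Policy, Approval, Lineage, Trace) and simplifies each one: it only checks presence, not staleness, tenancy, or policy activation. The function name blocking_rules, the runtime_trace_id parameter, and the reading of "requires a gate" as "every entry in approval_gates_active has a matching approval" are assumptions made for illustration.

```python
# Lineage keys as they appear in the example record.
LINEAGE_KEYS = (
    "pack_version", "policy_versions", "kg_snapshot",
    "model_profile", "evaluator_suite",
)


def blocking_rules(record: dict, runtime_trace_id: str) -> list:
    """Return the names of the validation rules that would block this record."""
    blocked = []
    if not record.get("evidence_refs"):
        blocked.append("Evidence")
    if not record.get("policy_decisions"):
        blocked.append("Policy")
    # Assumption: each active approval gate needs an approval with the same gate_id.
    gates = record.get("controls_active", {}).get("approval_gates_active", [])
    approved = {a.get("gate_id") for a in record.get("approvals", [])}
    if any(gate not in approved for gate in gates):
        blocked.append("Approval")
    lineage = record.get("lineage", {})
    if any(not lineage.get(key) for key in LINEAGE_KEYS):
        blocked.append("Lineage")
    # A missing trace_id yields None, which never equals the runtime id.
    if record.get("trace_id") != runtime_trace_id:
        blocked.append("Trace")
    return blocked
```

An empty result means none of the sketched rules object; Schema, Controls, and Seal still need their own checks against the DecisionSpec, the compiled controls, and the signing key.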

Replay contract

Replay is the difference between evidence and reconstruction. Given a trace_id, the harness must fetch:

Replay input | Source
Request envelope and RunContext | API envelope store
Context Pack and overlays | Pack registry, pinned by version and hash
Policy bundles and DecisionSpec | Control-plane registry
Knowledge Graph snapshot | Snapshot store named in lineage
Tool transcripts | Tool Gateway transcript store
Model profile and routing decision | AI Gateway / LLM Router
Evaluator suite | Evaluation Engine
Persisted DecisionRecord | Decision record store

Replay does not re-execute side-effecting tools. It replays the canonical loop against recorded transcripts. A match returns replay_equal. A mismatch returns a typed diff: changed evidence, changed policy, changed tool transcript, changed compiled context, changed scorecard, or tamper detected.
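
The comparison step can be pictured as a field-by-field diff between the persisted record and the record rebuilt by replay. This is a rough sketch, not the harness: the diff labels, the "replay_diff" status string, and the mapping from labels to record fields are assumptions based on the example record, and tamper detection is left to seal verification rather than handled here.

```python
def _get(record: dict, path: tuple):
    """Walk a nested dict along path, returning None if any step is missing."""
    value = record
    for key in path:
        value = value.get(key) if isinstance(value, dict) else None
    return value


# Hypothetical mapping from a typed diff label to the field it inspects.
DIFF_CHECKS = (
    ("changed_evidence", ("evidence_refs",)),
    ("changed_policy", ("policy_decisions",)),
    ("changed_tool_transcript", ("replay", "tool_transcript_chain_hash")),
    ("changed_compiled_context", ("replay", "compiled_context_hash")),
    ("changed_scorecard", ("scorecard",)),
)


def replay_diff(persisted: dict, replayed: dict) -> dict:
    """Compare two records and report replay_equal or the list of typed diffs."""
    diffs = [
        label
        for label, path in DIFF_CHECKS
        if _get(persisted, path) != _get(replayed, path)
    ]
    status = "replay_equal" if not diffs else "replay_diff"
    return {"status": status, "diffs": diffs}
```

Returning labels rather than a boolean keeps the result actionable: a changed scorecard points at the evaluator suite, while a changed transcript hash points at the Tool Gateway store.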

Storage model

Store DecisionRecords append-only. Retractions and corrections create new records that supersede prior records; they do not edit history.
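
As an illustration of that rule, the in-memory sketch below only ever appends: a correction is a new record that points at the one it replaces. The class name AppendOnlyStore, the supersedes field, and the effective lookup are hypothetical and stand in for whatever the real record store provides.

```python
class AppendOnlyStore:
    """Toy append-only store: records are never edited or removed."""

    def __init__(self):
        self._records = []        # insertion order is the history
        self._superseded_by = {}  # older record_id -> newer record_id

    def append(self, record: dict, supersedes: str = None) -> None:
        record_id = record["record_id"]
        known = {r["record_id"] for r in self._records}
        if record_id in known:
            raise ValueError(f"record {record_id} already exists")
        if supersedes is not None:
            if supersedes not in known:
                raise KeyError(f"cannot supersede unknown record {supersedes}")
            self._superseded_by[supersedes] = record_id
            # Hypothetical field linking the correction to the prior record.
            record = {**record, "supersedes": supersedes}
        self._records.append(record)

    def effective(self, record_id: str) -> dict:
        """Follow supersession links to the record currently in force."""
        while record_id in self._superseded_by:
            record_id = self._superseded_by[record_id]
        return next(r for r in self._records if r["record_id"] == record_id)

    def history(self) -> list:
        """Return every record ever appended, oldest first."""
        return list(self._records)
```

Both the original and the correction stay queryable, which is what lets a later replay or audit see the decision as it stood at the time.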

Storage concern | Requirement
Partition key | tenant_id plus time window; never cross-tenant by default.
Query keys | decision_key, subject_ids[], status, approval_mode_effective, policy_decision_id, trace_id.
Hashing | Canonical JSON serialization excluding transport-only fields; include prev_hash.
Signing | Runtime key id with effective window; revoked keys remain queryable for historical replay.
Retention | Destructive, denied, escalated, failed-scorecard, and incident-linked records are retained regardless of sampling.
Privacy | Store refs and hashes by default; keep raw payloads in evidence stores with classification controls.
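
The hashing requirement can be sketched as a hash chain over canonical JSON. The code below is one possible reading, not the ContextOS seal: it approximates canonical serialization with sorted keys and fixed separators (a production system might prefer a specified scheme such as RFC 8785), it treats the audit block and a hypothetical transport field as excluded from the hash, and it omits signing entirely.

```python
import hashlib
import json
from typing import Optional

# Assumption: "audit" carries the seal itself and "transport" is a made-up
# transport-only field; neither is part of the hashed body.
EXCLUDED_FROM_HASH = frozenset({"audit", "transport"})


def canonical_bytes(record: dict, prev_hash: Optional[str]) -> bytes:
    """Serialize the record deterministically with prev_hash folded in."""
    body = {k: v for k, v in record.items() if k not in EXCLUDED_FROM_HASH}
    body["prev_hash"] = prev_hash
    text = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def seal(record: dict, prev_hash: Optional[str]) -> dict:
    """Return a copy of the record with an audit block holding both hashes."""
    digest = hashlib.sha256(canonical_bytes(record, prev_hash)).hexdigest()
    sealed = dict(record)
    sealed["audit"] = {"prev_hash": prev_hash, "record_hash": f"sha256:{digest}"}
    return sealed


def verify_chain(records: list) -> bool:
    """Check that each record links to its predecessor and its hash recomputes."""
    prev = None
    for record in records:
        audit = record["audit"]
        if audit["prev_hash"] != prev:
            return False
        if seal(record, prev)["audit"]["record_hash"] != audit["record_hash"]:
            return False
        prev = audit["record_hash"]
    return True
```

Because prev_hash is inside the hashed body, editing or dropping any earlier record changes every later record_hash, which is what makes the chain tamper-evident.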

Query patterns

-- High-value refunds with approver and evidence in a quarter.
SELECT
  record_id,
  trace_id,
  outputs->>'refund_amount_inr' AS amount_inr,
  approvals,
  evidence_refs,
  lineage
FROM decision_records
WHERE decision_key = 'support.refund.execute'
  AND (outputs->>'refund_amount_inr')::numeric > 10000
  AND timestamp >= '2026-01-01'
  AND timestamp < '2026-04-01';
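
-- Policy decisions attached to the most rejected records in the last 7 days.
-- Note: counts every policy decision on a REJECTED record, whatever its verdict.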
SELECT
  jsonb_array_elements(policy_decisions)->>'policy_decision_id' AS policy_decision_id,
  count(*) AS records
FROM decision_records
WHERE status = 'REJECTED'
  AND timestamp >= now() - interval '7 days'
GROUP BY 1
ORDER BY records DESC;

-- Replay health by pack version.
-- Assumes the store maintains replay.last_status; the example record does not show that field.
SELECT
  lineage->>'pack_version' AS pack_version,
  count(*) AS sampled_records,
  avg(CASE WHEN replay->>'last_status' = 'replay_equal' THEN 1 ELSE 0 END) AS replay_equal_rate
FROM decision_records
WHERE timestamp >= now() - interval '30 days'
GROUP BY 1;

Standards alignment

DecisionRecord is ContextOS-specific, but it should interoperate with operational standards:

Standard | How it maps
W3C Trace Context | trace_id and span correlation use traceparent / tracestate semantics so services can join the same trace.
OpenTelemetry | Spans, metrics, logs, and events carry the same trace and record identifiers; telemetry is the observation layer, not the decision receipt.
NIST AI RMF Core | The record supports continuous governance, measurement, incident response, recovery, and documented improvement across the AI lifecycle.
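
To make the W3C Trace Context mapping concrete, the sketch below pulls the trace id out of a traceparent header and compares it with a record's trace_id. It handles only the version 00 layout (version, trace-id, parent-id, flags) and rejects the all-zero ids that the specification marks invalid. The function names are hypothetical.

```python
import re
from typing import Optional

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT_V00 = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_traceparent(header: str) -> Optional[str]:
    """Return the trace id from a version 00 traceparent header, or None if invalid."""
    match = _TRACEPARENT_V00.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    # All-zero trace-id or parent-id values are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


def header_matches_record(header: str, record: dict) -> bool:
    """True when the header's trace id equals the record's trace_id."""
    trace_id = trace_id_from_traceparent(header)
    return trace_id is not None and trace_id == record.get("trace_id")


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(header_matches_record(header, {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}))
```

The same comparison is what the Trace validation rule needs when it checks a record against the runtime trace bundle.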

Readiness checklist

Check | Production-ready answer
Can a human explain the decision from the record alone? | The record names outcome, evidence, policy, approvals, controls, and lineage.
Can replay reproduce it? | Pinned inputs and tool transcripts are enough to rebuild a byte-equivalent record.
Can audit query it? | Decision key, subjects, policy ids, approval mode, approver, and trace are indexed.
Can privacy controls hold? | Raw payloads stay in classified evidence stores; the record carries refs and hashes.
Can incidents route from it? | Denials, failed scorecards, escalations, and destructive actions retain full trace bundles.
Can improvement build on it? | Feedback entries, scorecards, and StrategyRules link back to record_id.

Common misconceptions

  • A DecisionRecord is not the trace. The trace is the path; the record is the accepted governed outcome.
  • A DecisionRecord is not model chain-of-thought. Store typed rationale, evidence, and policy decisions. Do not store hidden reasoning.
  • A DecisionRecord is not a warehouse row. It must be append-only, hashable, replayable, and bound to runtime lineage.
  • A DecisionRecord is not optional for read-only decisions. Read-only decisions still need evidence and provenance when they affect a user, a downstream workflow, or future memory.
  • A DecisionRecord is not authored after the incident. It is emitted by the runtime while the decision is made.