The most useful sentence I have ever heard from a finance lead was: “We have a system of record for what happened. We do not have one for why.”
Her team could tell me, with high confidence, the discounts that had been applied to every renewal in the last three years. They could not tell me, with any confidence, why a specific 20% discount on an enterprise account in Q1 had been considered acceptable. The CRM had the discount. Slack had a thread that probably referenced it. A VP’s email approved something, but it was not clear it was the same thing. The reasoning had happened across people and systems; the artifact stored only the outcome.
This is the gap that a context graph fills. Not better dashboards on the system of record for what. A separate, structured, queryable system of record for why.
2026 update: the graph is a projection, not the source
The strongest version of this architecture treats the Decision Record as the source of truth and the context graph as a projection. That distinction keeps audit honest. Graph edges can be re-indexed, enriched, and optimized for query; the record remains the signed, replayable contract.
In practice, that means every graph edge should point back to the record, tool transcript, policy decision, approval event, or evidence snapshot that created it. If a node cannot answer “which trace_id wrote me?”, it is analysis data, not decision lineage.
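As a concrete illustration of that test, here is a minimal sketch of a lineage check on a graph edge. The field names (`provenance`, `source_ref`) are illustrative, not the ContextOS schema; the point is only that an edge without a `trace_id` and a backing record reference fails the check.

```python
def is_decision_lineage(edge: dict) -> bool:
    """An edge counts as lineage only if it can name the record that wrote it.
    Field names are illustrative, not a real schema."""
    prov = edge.get("provenance", {})
    return bool(prov.get("trace_id")) and bool(prov.get("source_ref"))

edge = {
    "type": "APPLIED_POLICY",
    "src": "renewal:acme-2026q1",
    "dst": "policy:R_RENEWAL_DISCOUNT_CAP",
    "provenance": {
        "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
        # the Decision Record that created this edge
        "source_ref": "dr_2026_04_18_a17",
    },
}

assert is_decision_lineage(edge)
# An enrichment edge with no provenance is analysis data, not lineage.
assert not is_decision_lineage({"type": "SIMILAR_TO", "src": "a", "dst": "b"})
```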
Rules are not the problem
Most enterprises codify rules — “discounts above 10% require finance approval.” The real business runs on how rules were applied in specific cases: “we used policy v3.2, granted a service-impact exception, based on a precedent set last quarter for a similar account.” The rules describe a space; the decisions describe what actually happened in that space.
What agent systems need access to is the second one. Prior exceptions and their justifications. Who approved what, under which policy version. The cross-system evidence that made an exception reasonable. The precedents that govern reality, not just the written policy. None of that lives in the system of record. Most of it does not live in any system at all.
The reason I think this is now solvable is that agent systems sit in the execution path. Unlike a warehouse, which sees an after-the-fact ETL of state, an agent runtime sees the full context at commit time: the inputs, the policy evaluation, the exceptions invoked, the approvals collected, the writes performed. The information is right there. It just needs to be persisted as a contract.
The Decision Record as a contract
ContextOS treats every consequential action as a structured decision event, recorded to a typed DecisionRecord. The record carries the inputs (evidence references), the policy bundles and rules evaluated, any conflicts and how they resolved, approvals (human or automated), the actions and their side effects, write-commit metadata, and outcome hooks for later feedback.
A condensed example, for a renewal-discount decision:
```json
{
  "record_id": "dr_2026_04_18_a17",
  "decision_key": "revops.renewal.execute",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "subject_ids": ["account:acme-co", "renewal:acme-2026q1"],
  "evidence_refs": [
    "kg:incident:sev1#snapshot_kg_2026_04_17",
    "tool:itsm.lookup:tc_201",
    "tool:crm.lookup:tc_203"
  ],
  "policy_decisions": [
    { "policy_decision_id": "pol_5510", "rule_ids": ["R_RENEWAL_DISCOUNT_CAP"] },
    { "policy_decision_id": "pol_5511", "rule_ids": ["R_SERVICE_IMPACT_EXCEPTION"] }
  ],
  "approvals": [
    {
      "gate_id": "GATE_FINANCE_APPROVAL",
      "approver": "user_finance_lead_44",
      "approval_mode_effective": "destructive",
      "evidence_snapshot_hash": "sha256:b2a1...",
      "decided_at": "2026-04-18T12:03:10Z"
    }
  ],
  "outputs": { "discount": 0.20, "currency": "USD" },
  "lineage": { "pack_version": "ctxpack.revops@4.1.0", "snapshot_version": "kg_2026_04_17" }
}
```

What is not in this record is prose. There is no “the model decided X because Y.” There is a verdict, the evidence, the policies, the approvals, and a hash chain that lets the harness reproduce the verdict from the pinned snapshot. Those are facts. Prose is interpretation.
Records are append-only and hash-chained per trace_id. Each entry’s hash includes the previous entry’s hash; tampering breaks the chain. Replay against a tampered chain returns tamper_detected rather than a recomputed verdict. (The replay machinery itself is its own topic — see Replay Is the Real Audit Log.)
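The chaining described above can be sketched in a few lines. This is a simplified scheme, assuming JSON canonicalization via sorted keys; the real runtime also signs entries, which this sketch omits.

```python
import hashlib
import json

def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with its predecessor's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(chain: list, record: dict) -> None:
    prev = chain[-1]["hash"] if chain else "genesis"
    chain.append({"record": record, "hash": record_hash(record, prev)})

def verify(chain: list) -> str:
    """Walk the chain; a retroactive edit breaks every later hash."""
    prev = "genesis"
    for entry in chain:
        if entry["hash"] != record_hash(entry["record"], prev):
            return "tamper_detected"
        prev = entry["hash"]
    return "ok"

chain = []
append(chain, {"record_id": "dr_001", "outputs": {"discount": 0.20}})
append(chain, {"record_id": "dr_002", "outputs": {"discount": 0.10}})
assert verify(chain) == "ok"

# Quietly "improve" an old discount after the fact...
chain[0]["record"]["outputs"]["discount"] = 0.50
assert verify(chain) == "tamper_detected"
```

This is why replay can refuse to recompute a verdict: verification fails before any policy logic runs.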
From records to a queryable graph
The records are the ground truth. The graph is the query surface — a projection that joins records by their entities, evidence, policies, approvals, and outcomes.
Typical nodes are the things you’d expect: Account, Renewal, Ticket, Incident, PolicyVersion, Approval, AgentRun, EvidenceArtifact, Exception, Commit. Edges carry the kind of relationship — USED_EVIDENCE, APPLIED_POLICY, INVOKED_EXCEPTION, APPROVED_BY, BASED_ON_PRECEDENT, WROTE_STATE_TO, RESULTED_IN — and the metadata you need to reason across time: event time, valid time, provenance, policy version references, redaction labels, access constraints.
Time is the load-bearing column. Most existing systems store current state and lose the picture of what the world looked like when a decision was made. A precedent without a snapshot of the world it lived in is not a precedent; it is an anecdote. The graph carries the snapshot reference on every edge so that a query like “show me decisions made under POLICY_RETURNS_V4 in February, with evidence drawn from the KG snapshot before the March migration” is a real query, not a research project.
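To make that query concrete, here is a sketch of the filter over snapshot-carrying edges. The edge shape and snapshot-naming convention (sortable `kg_YYYY_MM_DD` strings) are assumptions for illustration.

```python
from datetime import datetime

edges = [
    {"type": "APPLIED_POLICY", "policy_version": "POLICY_RETURNS_V4",
     "event_time": "2026-02-11T09:00:00Z", "snapshot_version": "kg_2026_02_01"},
    {"type": "APPLIED_POLICY", "policy_version": "POLICY_RETURNS_V4",
     "event_time": "2026-03-20T14:00:00Z", "snapshot_version": "kg_2026_03_15"},
]

def decisions_under(edges, policy_version, month, before_snapshot):
    """Decisions under a given policy version in a given month whose evidence
    came from a KG snapshot older than a cutover. Shapes are illustrative."""
    hits = []
    for e in edges:
        t = datetime.fromisoformat(e["event_time"].replace("Z", "+00:00"))
        if (e["policy_version"] == policy_version
                and t.month == month
                and e["snapshot_version"] < before_snapshot):
            hits.append(e)
    return hits

# "Decisions under POLICY_RETURNS_V4 in February, evidence from before the March migration."
hits = decisions_under(edges, "POLICY_RETURNS_V4", month=2,
                       before_snapshot="kg_2026_03_01")
assert len(hits) == 1
```

Because every edge carries its snapshot reference, the query is a filter, not a forensic reconstruction.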
Why warehouses can’t do this
Two structural reasons.
A warehouse sees the read path, not the write path. By the time records arrive via ETL, the decision context has often been lost — approvals collapsed into a status field, evidence references stripped down to ids, policy versions replaced with whichever version is current. You can reconstruct after the fact only if you saved enough; agent systems, sitting on the execution path, can save it correctly the first time.
Incumbents have current-state bias. The CRM stores “discount: 20%” and overwrites the previous value. The audit story has to be assembled from change-data-capture, status fields, and audit tables that record which user changed what — but not which evidence they relied on, which policy was in effect, or which precedent they were following. The CRM is the source of truth for what is true now; it is not, and was never designed to be, the source of truth for why it became true.
This is not a critique of warehouses or of incumbents. It is an observation about position. The system that sees the decision at commit time is the right place to capture decision lineage. That position is what makes the context graph idea practical.
Four APIs that earn their keep
A context graph is useful if it exposes the right operations.
EmitDecisionRecord runs as part of the canonical loop. Every governed workflow guarantees “no commit without record” — the runtime refuses to mark a side effect as committed unless the record is persisted. This is the single most important invariant in the model.
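The invariant can be sketched as a wrapper in which record persistence gates the side effect. The store and function names here are hypothetical stand-ins, not the ContextOS API; the ordering is the point.

```python
class MemoryRecordStore:
    """In-memory stand-in for the record store (illustrative)."""
    def __init__(self):
        self.records = {}
        self._n = 0
    def persist(self, record):
        self._n += 1
        rid = f"dr_{self._n:03d}"
        self.records[rid] = dict(record, committed=False)
        return rid
    def mark_committed(self, rid):
        self.records[rid]["committed"] = True

def commit_side_effect(action, store, write_fn):
    """No commit without record: persist first, write second.
    If persistence raises, the side effect never runs."""
    rid = store.persist({"decision_key": action["decision_key"],
                         "outputs": action["outputs"]})
    write_fn(action)           # the actual side effect
    store.mark_committed(rid)  # only marked committed after both succeed
    return rid

store = MemoryRecordStore()
action = {"decision_key": "revops.renewal.execute", "outputs": {"discount": 0.20}}
rid = commit_side_effect(action, store, lambda a: None)
assert store.records[rid]["committed"]

class DownStore(MemoryRecordStore):
    def persist(self, record):
        raise RuntimeError("record store unavailable")

writes = []
try:
    commit_side_effect(action, DownStore(), writes.append)
except RuntimeError:
    pass
assert writes == []  # no record, no commit
```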
QueryPrecedent is what makes the graph valuable to agents themselves. Given a candidate decision and constraints, the API returns similar prior cases scoped by tenant and intent — including the approval patterns, the evidence types relied on, and the eventual outcomes. Agents can use this to anchor their reasoning on the actual history of the business, not on a hallucinated reading of the written policy.
ReplayDecision lets a trace_id be reconstructed from pinned inputs: the evidence pack as-of decision time, the policy versions in effect, the approvals and exception path, the exact write set committed. The verdict must byte-match the persisted record. This is the contract that makes audit honest.
AuditDecisionLineage answers the questions the compliance org actually has. Who approved deviations from policy X last quarter? Which exceptions correlate with negative outcomes downstream? Which workflows are exception-heavy and probably need policy refactor? These are projections against the record store; they used to be multi-week investigations.
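One of those compliance questions can be sketched as a projection over the record store. The record shapes and the `exceptions` field are illustrative; the query is "who approved records that deviated from a given rule."

```python
records = [
    {"record_id": "dr_001",
     "policy_decisions": [{"rule_ids": ["R_RENEWAL_DISCOUNT_CAP"]}],
     "exceptions": ["R_SERVICE_IMPACT_EXCEPTION"],
     "approvals": [{"approver": "user_finance_lead_44"}]},
    {"record_id": "dr_002",
     "policy_decisions": [{"rule_ids": ["R_RENEWAL_DISCOUNT_CAP"]}],
     "exceptions": [],
     "approvals": []},
]

def approvers_of_deviations(records, rule_id):
    """Everyone who approved a record that invoked an exception to the rule.
    Shapes are illustrative, not a real schema."""
    names = set()
    for r in records:
        touched = any(rule_id in pd["rule_ids"] for pd in r["policy_decisions"])
        if touched and r["exceptions"]:
            names.update(a["approver"] for a in r["approvals"])
    return names

assert approvers_of_deviations(records, "R_RENEWAL_DISCOUNT_CAP") \
    == {"user_finance_lead_44"}
```

Scoped by quarter and policy version, this is the kind of one-line projection that replaces a multi-week investigation.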
Where this lands first
The highest-leverage entry points are workflows that are exception-heavy (“it depends” is a real answer), headcount-heavy (many humans reconciling context manually), or cross-system glue (RevOps, DevOps, SecOps, Finance Ops). These exist precisely because no single system of record owns the full workflow. ContextOS can automate the role and persist the decisions that role exists to produce — turning exception handling into searchable precedent rather than tribal memory.
The finance lead I quoted at the top eventually moved her team’s renewal-discount workflow onto this pattern. The first quarter, the audit pack for a regulator request took fifteen minutes instead of four days. The second quarter, the team noticed that 40% of their service-impact exceptions clustered around two specific tickets-per-account thresholds — a finding that turned into a policy refactor and reduced exception volume by a third. The win was not the time saved on the audit. The win was that the data needed to refactor the policy had finally been written down.
Production requirements worth naming
A few things that have to be true for this to hold up.
Pinned policy bundle versions go on every record. Evidence provenance — source, adapter, hashes — has to be there too. Redaction labels and tenant-scoping live on traces, nodes, and edges. Tamper-evidence depends on the append-only chain and runtime signing keys. Human approvals are first-class events, not status fields.
For quality, I’d watch a small number of metrics. Trace integrity: record emission success rate, “no commit without record” enforcement rate, missing-evidence rate. Precedent utility: precedent hit rate, agent acceptance rate, policy-deviation explainability rate (every deviation should link to an exception and an approval). Context health: the rate at which conflicting evidence or policies stay unresolved. Runtime cost: record overhead p95, replay determinism rate.
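A few of those metrics are simple ratios over runtime counters. The counter names below are hypothetical; the roll-up shape is the point.

```python
def graph_health(c: dict) -> dict:
    """Roll up context-graph health metrics from runtime counters
    (counter names are illustrative)."""
    return {
        "record_emission_rate": c["records_emitted"] / c["decisions"],
        "no_commit_without_record_rate": c["commits_with_record"] / c["commits"],
        "precedent_hit_rate": c["precedent_hits"] / c["precedent_queries"],
        "replay_determinism_rate": c["replays_byte_matched"] / c["replays"],
    }

health = graph_health({
    "decisions": 1000, "records_emitted": 998,
    "commits": 950, "commits_with_record": 950,
    "precedent_queries": 400, "precedent_hits": 312,
    "replays": 120, "replays_byte_matched": 119,
})
assert health["no_commit_without_record_rate"] == 1.0
assert health["precedent_hit_rate"] == 0.78
```

Anything below 1.0 on the enforcement rate means the core invariant is being violated somewhere.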
If the dashboard does not have these on it, you do not have a context graph. You have a logging pipeline.
Graph readiness checklist
| Requirement | What to verify |
|---|---|
| Source of truth | Decision Records are append-only and graph projections can be rebuilt from them. |
| Provenance | Every node and edge links to trace_id, evidence ref, policy decision, approval, or tool transcript. |
| Time model | Event time, valid time, and snapshot version are queryable. |
| Access control | Tenant scope, redaction labels, and subject-level permissions apply to graph queries. |
| Replay | A graph answer can be traced back to a replayable run, not only a warehouse row. |
| Outcome loop | Outcomes, corrections, and later StrategyRules connect back to the decisions they changed. |
A closing thought
Agents will not become reliable through governance alone. Reliability comes from turning organizational memory — exceptions, approvals, and precedents — into durable, queryable artifacts. The runtime that sits on the execution path is the only system in a position to capture it correctly the first time, at the moment the decision is made.
The context graph is what you get when you commit to that. It is the system of record for why we did that — and over time, the asset that lets the next decision be better than the last.