Knowledge Graph

Evidence-bound retrieval substrate for grounded reasoning.

Foundational Spec. Last reviewed: 2026-05-04

At a glance

Intelligence plane: Substrate of meaning

Evidence-bound retrieval substrate with snapshot pinning, provenance on every edge, and conflict-aware traversal.

Inputs

Ontology and identity-layer references
Source documents, databases, APIs, and event streams
Retrieval parameters from the Context Pack
Run Context (tenant_id, role, budgets)

Outputs

Evidence bundles with source IDs, hashes, traversal paths
Snapshot + ontology version pin
Conflict markers
Aggregate confidence

Canonical types

EvidenceBundle
KGSnapshot
Edge
ProvenanceRef

The Knowledge Graph is the Intelligence-plane primitive that turns raw enterprise data into evidence the Decision plane can cite. Every retrieval step that influences a decision must trace back through the graph to a typed source.

Definition

A typed labeled-property graph (LPG) of canonical entities and relationships, where every edge carries an evidence_ref to the source it was derived from. Reads are policy-scoped; writes require evidence; replay is deterministic via snapshot pinning.

Why it exists

Naive RAG injects unstructured passages into the prompt and lets the model decide what to trust. That is unauditable and contradicts the Decision Catalog requirement that every decision cite typed evidence. The Knowledge Graph fixes the substrate: retrieval returns evidence bundles with traversal paths, source IDs, freshness signals, and conflict markers — not opaque chunks.

How it works

Ingest: sources (databases, documents, APIs, events) are normalized against the Ontology.
Resolve: entities are linked via the Identity Layer; ambiguous matches are gated by a confidence threshold (≥ 0.92 by default).
Index: vectors, BM25, and graph indices are kept in sync against a single snapshot version.
Retrieve: GraphRAG expands from a seed set along permitted relationships under a hop budget, returning a typed evidence bundle.
Replay: every retrieval is reproducible by pinning the snapshot version and the query parameters.
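
The threshold gate in the Resolve step could be sketched roughly as follows. Only the 0.92 default comes from the text above; `MatchCandidate`, `gate_match`, and the `"needs_review"` outcome for below-threshold matches are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

DEFAULT_MATCH_THRESHOLD = 0.92  # default stated in the Resolve step


@dataclass(frozen=True)
class MatchCandidate:
    """Hypothetical candidate link between a source record and a canonical entity."""
    source_record_id: str
    ceid: str
    confidence: float


def gate_match(candidate: MatchCandidate,
               threshold: float = DEFAULT_MATCH_THRESHOLD) -> str:
    """Return 'link' when the match clears the threshold, else 'needs_review'.

    What happens to a gated match is an assumption; the spec only says
    ambiguous matches are gated.
    """
    return "link" if candidate.confidence >= threshold else "needs_review"
```

A match at exactly the threshold is linked here because the text specifies "≥ 0.92".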

Graph data model

| Element | Carries |
| --- | --- |
| Node | `entity_type`, `ceid`, `attributes`, `provenance`, `as_of` |
| Edge | `relationship_type`, `evidence_ref`, `confidence`, `direction`, `valid_from`, `valid_to` |
| Snapshot | monotonic `snapshot_version`, `ontology_version`, content hash |
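
As a minimal sketch, the three elements might map onto typed records like the ones below. The field names come from the table; the Python types, the `"out"`/`"in"` direction values, the open-ended `valid_to=None` convention, and the `is_valid_on` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    entity_type: str
    ceid: str
    attributes: dict[str, Any]
    provenance: str
    as_of: date


@dataclass(frozen=True)
class Edge:
    relationship_type: str
    evidence_ref: str
    confidence: float
    direction: str                    # assumed values: "out" or "in"
    valid_from: date
    valid_to: Optional[date] = None   # assumption: None means still valid

    def is_valid_on(self, day: date) -> bool:
        """True when `day` falls inside the edge's validity interval (inclusive)."""
        return self.valid_from <= day and (self.valid_to is None or day <= self.valid_to)


@dataclass(frozen=True)
class Snapshot:
    snapshot_version: str
    ontology_version: str
    content_hash: str
```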

Invariants enforced at write time

No edge created without evidence_ref.
Entity merges require explicit evidence and an audit record.
Relationship types must exist in the active ontology version.
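
One way to picture these checks is a pair of validators that run before a write is accepted. This is a sketch under assumptions: the dict-shaped payloads, the `audit_record_id` field, and both function names are hypothetical, and a real implementation would likely enforce the same rules in the storage layer.

```python
def check_edge_write(edge: dict, active_relationship_types: set) -> list:
    """Return invariant violations for an edge write; an empty list means it may proceed."""
    violations = []
    if not edge.get("evidence_ref"):
        violations.append("edge has no evidence_ref")
    if edge.get("relationship_type") not in active_relationship_types:
        violations.append("relationship_type is not in the active ontology version")
    return violations


def check_entity_merge(merge: dict) -> list:
    """Return invariant violations for an entity merge request."""
    violations = []
    if not merge.get("evidence_ref"):
        violations.append("merge has no explicit evidence")
    if not merge.get("audit_record_id"):  # hypothetical field name
        violations.append("merge has no audit record")
    return violations
```

Returning the full list of violations, rather than stopping at the first, lets a rejected write report everything that needs fixing in one pass.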

GraphRAG retrieval

Retrieval is parameterized in the Context Pack and bound by the Run Context’s budgets:

```json
{
  "retrieval": {
    "mode": "graphrag",
    "seed_strategy": "intent+entity_extraction",
    "max_hops": 2,
    "top_k": 8,
    "freshness_window": "30d",
    "evidence_required": true
  }
}
```

The compiler enforces hop budgets per Context Pack bucket. Each returned hop carries its own evidence_ref, so the Critic can score each hop individually and the Decision Record can list hops by source.
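
A hop-budgeted expansion can be sketched as a breadth-first traversal that stops at `max_hops` and reports when the budget cut it short. The adjacency-list shape, the `expand` function, and the `from` field on each hop are assumptions for illustration; `top_k` and `freshness_window` are left out to keep the sketch short.

```python
from collections import deque


def expand(adjacency: dict, seeds: list, max_hops: int, permitted: set) -> dict:
    """Breadth-first expansion from seeds along permitted relationship types.

    adjacency maps a ceid to a list of (relationship_type, target_ceid, evidence_ref).
    """
    hops = []
    truncated = False
    visited = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        ceid, depth = queue.popleft()
        for rel, target, evidence_ref in adjacency.get(ceid, []):
            if rel not in permitted or target in visited:
                continue
            if depth == max_hops:
                # The budget stopped us from following a permitted edge.
                truncated = True
                continue
            visited.add(target)
            hops.append({"hop": depth + 1, "edge": rel, "from": ceid,
                         "to": target, "evidence_ref": evidence_ref})
            queue.append((target, depth + 1))
    return {"hops": hops, "truncated": truncated}
```

Setting `truncated` only when a permitted, unvisited edge was skipped keeps the flag meaningful: a traversal that simply ran out of graph is not reported as truncated.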

Access control model

Reads are scoped by tenant_id, data_classification, and the agent’s role from the Run Context.
Cross-tenant traversal is denied at the storage layer, not just the API.
Sensitive attributes are redacted before they enter the compiled context (see Memory for write-side redaction).
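
The scoping predicate can be illustrated with an in-memory filter. This sketch only shows the logic: the spec places cross-tenant denial in the storage layer, which an application-level filter like this does not replace. The role-to-classification mapping, the `"[REDACTED]"` placeholder, and all function and field names beyond `tenant_id`, `data_classification`, and `role` are assumptions.

```python
def scope_read(nodes: list, run_context: dict,
               role_classifications: dict, sensitive_attributes: set) -> list:
    """Keep nodes the caller may read and redact sensitive attributes on them."""
    allowed_classes = role_classifications.get(run_context["role"], set())
    visible = []
    for node in nodes:
        if node["tenant_id"] != run_context["tenant_id"]:
            continue  # cross-tenant reads are never returned
        if node["data_classification"] not in allowed_classes:
            continue  # the role is not cleared for this classification
        redacted = {
            key: ("[REDACTED]" if key in sensitive_attributes else value)
            for key, value in node["attributes"].items()
        }
        visible.append({**node, "attributes": redacted})
    return visible
```

An unknown role maps to an empty set of classifications, so the sketch fails closed.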

Conflict detection

When two edges assert contradictory facts on the same (subject, predicate), the retrieval bundle returns both with their evidence and a conflict: true marker. The Decision Catalog treats unresolved conflicts on required_evidence as a hard fail — the planner must request reconciliation before proceeding.
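
A minimal sketch of the marking step, assuming "contradictory" means two edges with the same (subject, predicate) but different objects. Multi-valued predicates would need an ontology-aware rule instead. The `subject`/`predicate`/`object` field names and both functions are hypothetical.

```python
from collections import defaultdict


def mark_conflicts(edges: list) -> list:
    """Return copies of the edges, each with a boolean `conflict` marker."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[(edge["subject"], edge["predicate"])].append(edge)
    marked = []
    for group in grouped.values():
        # Assumption: differing objects on one (subject, predicate) contradict.
        conflict = len({edge["object"] for edge in group}) > 1
        marked.extend({**edge, "conflict": conflict} for edge in group)
    return marked


def has_blocking_conflict(marked_edges: list, required_predicates: set) -> bool:
    """True when a conflict touches a predicate listed as required evidence."""
    return any(edge["conflict"] and edge["predicate"] in required_predicates
               for edge in marked_edges)
```

Both conflicting edges stay in the result with their evidence, matching the rule that the bundle returns both sides rather than picking a winner.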

Deterministic replay

Every retrieval response embeds snapshot_version and ontology_version. To replay a run later (for evaluation or post-incident analysis), the same query against the same versions must return the same bundle. New edges added after the snapshot do not leak into the replay.
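
The replay property can be checked by hashing a canonical form of the bundle. The sketch below assumes each edge records the snapshot sequence number it was added in (`added_in`) and that snapshots are ordered integers; both are simplifications, as are the function names and the edge fields.

```python
import hashlib
import json


def retrieve_at(edges: list, snapshot_seq: int, subject: str) -> list:
    """Return the subject's edges visible at the pinned snapshot, in a stable order."""
    visible = [e for e in edges
               if e["added_in"] <= snapshot_seq and e["subject"] == subject]
    return sorted(visible, key=lambda e: (e["predicate"], e["object"], e["evidence_ref"]))


def bundle_hash(bundle: list) -> str:
    """SHA-256 over a canonical JSON encoding, so equal bundles hash equally."""
    canonical = json.dumps(bundle, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing `bundle_hash` of the original retrieval with that of the replay gives a cheap determinism check: edges written after the pinned snapshot are filtered out, so the hash should not change.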

Interfaces

Inputs

Ontology and identity-layer references
Source documents, databases, APIs, and event streams
Retrieval parameters from the Context Pack
Run Context (tenant_id, role, budgets)

Outputs

Evidence bundles with source IDs, hashes, traversal paths
Snapshot + ontology version pin
Conflict markers
Aggregate confidence

Failure modes

Edges written without evidence (caught at write time; rejected).
Retrieval expansion exceeds hop budget — bundle silently truncated, hiding relevant evidence. Mitigation: emit a truncated: true flag the planner must handle.
Stale snapshot used in production reads after writes have advanced — mitigated by snapshot version reconciliation on every read.
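Entity-resolution false positives merge two real-world entities and corrupt downstream traversals. Mitigation: the resolution confidence threshold, plus the explicit evidence and audit record required for every merge.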
Cross-tenant leakage through a shared relationship type — mitigated by tenant-scoped traversal at the storage layer.

Operational concerns

Snapshot pinning per environment; promotion is a deliberate step.
Re-index cost when the ontology version advances.
Vector index drift vs. structured edges — alerting on disagreement.
Backfill jobs must replay against pinned ontology versions, not against whatever is current at run time.
Eviction of edges past valid_to requires audit trail.

Evaluation metrics

Evidence coverage (fraction of decisions whose evidence_refs resolve to live edges).
Retrieval precision/recall on golden sets per intent.
Conflict rate by (entity_type, relationship_type).
Cross-tenant denial rate (should be zero in steady state).
Replay determinism (identical bundle for identical inputs under the same snapshot pin).
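
Two of these metrics are simple ratios and can be sketched directly. The reading of "resolve" as "every listed ref is live", the treatment of decisions with no refs as uncovered, and the dict shapes are assumptions; the function names are hypothetical.

```python
from collections import Counter


def evidence_coverage(decisions: list, live_evidence_refs: set) -> float:
    """Fraction of decisions whose evidence_refs all resolve to live edges."""
    if not decisions:
        return 0.0
    covered = sum(
        1 for d in decisions
        if d["evidence_refs"] and all(ref in live_evidence_refs for ref in d["evidence_refs"])
    )
    return covered / len(decisions)


def conflict_rate(edges: list) -> dict:
    """Share of conflicting edges per (entity_type, relationship_type)."""
    totals, conflicts = Counter(), Counter()
    for edge in edges:
        key = (edge["entity_type"], edge["relationship_type"])
        totals[key] += 1
        if edge.get("conflict"):
            conflicts[key] += 1
    return {key: conflicts[key] / totals[key] for key in totals}
```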

Example

GraphRAG path returned for a refund-eligibility query, condensed:

```json
{
  "snapshot_version": "kg_2026_05_03_T0930",
  "ontology_version": "ont.support@4.2.0",
  "seed": { "ceid": "order:ord_881", "entity_type": "Order" },
  "hops": [
    {
      "hop": 1,
      "edge": "order_belongs_to_customer",
      "evidence_ref": "oms:db:orders/881#row_v3",
      "to": { "ceid": "customer:cus_77", "entity_type": "Customer" },
      "confidence": 0.99
    },
    {
      "hop": 2,
      "edge": "customer_has_segment",
      "evidence_ref": "crm:exports/segments_2026_05_01.parquet#offset_4112",
      "to": { "value": "vip" },
      "confidence": 0.96,
      "as_of": "2026-05-01"
    }
  ],
  "conflicts": [],
  "truncated": false
}
```
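
A consumer of a bundle shaped like this one might list its sources per hop and verify that no hop lacks evidence. The helpers below are hypothetical. In particular, the spec does not define how aggregate confidence is computed, so `weakest_link_confidence` (the minimum hop confidence) is only one conservative possibility, not the spec's formula.

```python
from typing import Optional


def sources_by_hop(bundle: dict) -> list:
    """Return (hop, edge, evidence_ref) triples in traversal order."""
    return [(hop["hop"], hop["edge"], hop["evidence_ref"]) for hop in bundle["hops"]]


def hops_missing_evidence(bundle: dict) -> list:
    """Return the hop numbers that carry no evidence_ref."""
    return [hop["hop"] for hop in bundle["hops"] if not hop.get("evidence_ref")]


def weakest_link_confidence(bundle: dict) -> Optional[float]:
    """Assumed aggregate: the lowest hop confidence, or None for an empty path."""
    return min((hop["confidence"] for hop in bundle["hops"]), default=None)
```

Applied to the bundle above, `hops_missing_evidence` returns an empty list and the weakest link is the second hop at 0.96.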

Common misconceptions

It is not the embedding store. Vectors are an index strategy; the graph is the source of truth.
It is not optional for high-risk decisions. A decision without graph-resolved evidence cannot satisfy the Decision Catalog’s required_evidence field.
It is not where memory lives. Memory lives in the Memory plane; the graph is the substrate that memory promotion writes against.