
Knowledge Graph

Evidence-bound retrieval substrate for grounded reasoning.

Foundational Spec
Last reviewed:
At a glance
Intelligence plane · Substrate of meaning

Evidence-bound retrieval substrate with snapshot pinning, provenance on every edge, and conflict-aware traversal.

Inputs
  • Ontology and identity-layer references
  • Source documents, databases, APIs, and event streams
  • Retrieval parameters from the Context Pack
  • Run Context (tenant_id, role, budgets)
Outputs
  • Evidence bundles with source IDs, hashes, traversal paths
  • Snapshot + ontology version pin
  • Conflict markers
  • Aggregate confidence
Canonical types
  • EvidenceBundle
  • KGSnapshot
  • Edge
  • ProvenanceRef

The Knowledge Graph is the Intelligence-plane primitive that turns raw enterprise data into evidence the Decision plane can cite. Every retrieval step that influences a decision must trace back through the graph to a typed source.

Definition

A typed labeled-property graph (LPG) of canonical entities and relationships, where every edge carries an evidence_ref to the source it was derived from. Reads are policy-scoped; writes require evidence; replay is deterministic via snapshot pinning.

Why it exists

Naive RAG injects unstructured passages into the prompt and lets the model decide what to trust. That is unauditable and contradicts the Decision Catalog requirement that every decision cite typed evidence. The Knowledge Graph fixes the substrate: retrieval returns evidence bundles with traversal paths, source IDs, freshness signals, and conflict markers — not opaque chunks.

How it works

  1. Ingest: sources (databases, documents, APIs, events) are normalized against the Ontology.
  2. Resolve: entities are linked via the Identity Layer; ambiguous matches are gated by a confidence threshold (≥ 0.92 by default).
  3. Index: vectors, BM25, and graph indices are kept in sync against a single snapshot version.
  4. Retrieve: GraphRAG expands from a seed set along permitted relationships under a hop budget, returning a typed evidence bundle.
  5. Replay: every retrieval is reproducible by pinning the snapshot version and the query parameters.
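The Resolve step's confidence gate can be sketched in a few lines. This is a minimal Python illustration, not the Identity Layer's implementation: only the 0.92 default comes from this spec, while the hypothetical `Candidate` record, the `resolve` function, and the rule that "ambiguous" means zero or several candidates clearing the threshold (routed to review rather than linked) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

DEFAULT_MATCH_THRESHOLD = 0.92  # default confidence threshold stated in this spec


@dataclass(frozen=True)
class Candidate:
    """Hypothetical identity-layer match candidate for an incoming record."""

    ceid: str
    score: float  # match confidence in [0, 1]


def resolve(candidates: list[Candidate], threshold: float = DEFAULT_MATCH_THRESHOLD) -> tuple[str, str | None]:
    """Link only when exactly one candidate clears the threshold.

    Assumption: zero or several qualifying candidates count as ambiguous and are
    routed to review instead of being linked automatically.
    """
    qualifying = [c for c in candidates if c.score >= threshold]
    if len(qualifying) == 1:
        return ("link", qualifying[0].ceid)
    return ("review", None)
```

Requiring a unique qualifying candidate keeps two near-identical high scores from silently picking one entity, which is the false-positive merge risk listed under Failure modes.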

Graph data model

Element  | Carries
Node     | entity_type, ceid, attributes, provenance, as_of
Edge     | relationship_type, evidence_ref, confidence, direction, valid_from, valid_to
Snapshot | monotonic snapshot_version, ontology_version, content hash
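The three elements in the table map naturally onto immutable records. The sketch below is one possible Python rendering under stated assumptions: field names follow the table, but the endpoint fields `source` and `target` (which the table does not name), the string types, and the choice of SHA-256 over canonical JSON for the snapshot content hash are illustrative, not part of the spec.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Node:
    entity_type: str
    ceid: str  # canonical entity ID, e.g. "order:ord_881"
    attributes: dict = field(default_factory=dict)
    provenance: str = ""
    as_of: str = ""  # ISO date string (assumed representation)


@dataclass(frozen=True)
class Edge:
    source: str  # assumed endpoint fields; not named in the table
    target: str
    relationship_type: str
    evidence_ref: str
    confidence: float
    direction: str = "out"
    valid_from: str | None = None
    valid_to: str | None = None


@dataclass(frozen=True)
class Snapshot:
    snapshot_version: str  # monotonic, e.g. "kg_2026_05_03_T0930"
    ontology_version: str
    content_hash: str


def content_hash(nodes: list[Node], edges: list[Edge]) -> str:
    """SHA-256 over canonical JSON, so insertion order does not change the hash."""
    payload = {
        "nodes": sorted((asdict(n) for n in nodes), key=lambda d: d["ceid"]),
        "edges": sorted(
            (asdict(e) for e in edges),
            key=lambda d: (d["source"], d["relationship_type"], d["target"], d["evidence_ref"]),
        ),
    }
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Sorting records and keys before hashing is what lets two stores holding the same content agree on the snapshot hash regardless of write order.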

Invariants enforced at write time

  • No edge created without evidence_ref.
  • Entity merges require explicit evidence and an audit record.
  • Relationship types must exist in the active ontology version.
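The three write-time invariants can be expressed as pure checks that return the list of broken rules, with the writer rejecting any request whose list is non-empty. This is a hedged sketch: `EdgeWrite`, `MergeRequest`, `edge_violations`, and `merge_violations` are hypothetical names, and the active ontology is assumed to be available as a set of relationship-type names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeWrite:
    """Hypothetical write request for a single edge."""

    subject: str
    relationship_type: str
    target: str
    evidence_ref: str | None


@dataclass(frozen=True)
class MergeRequest:
    """Hypothetical request to merge two entities into one canonical ID."""

    surviving_ceid: str
    merged_ceid: str
    evidence_refs: tuple[str, ...]
    audit_record_id: str | None


def edge_violations(write: EdgeWrite, ontology_relationships: frozenset[str]) -> list[str]:
    """Return the invariants this edge write breaks; an empty list means it may be committed."""
    problems = []
    if not write.evidence_ref:
        problems.append("edge has no evidence_ref")
    if write.relationship_type not in ontology_relationships:
        problems.append(f"relationship type {write.relationship_type!r} not in active ontology")
    return problems


def merge_violations(request: MergeRequest) -> list[str]:
    """Entity merges need both explicit evidence and an audit record."""
    problems = []
    if not request.evidence_refs:
        problems.append("merge has no supporting evidence")
    if not request.audit_record_id:
        problems.append("merge has no audit record")
    return problems
```

Returning every violation, rather than failing on the first, gives the rejection message enough detail to fix a bad ingest batch in one pass.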

GraphRAG retrieval

Retrieval is parameterized in the Context Pack and bound by the Run Context’s budgets:

{
  "retrieval": {
    "mode": "graphrag",
    "seed_strategy": "intent+entity_extraction",
    "max_hops": 2,
    "top_k": 8,
    "freshness_window": "30d",
    "evidence_required": true
  }
}

The compiler enforces hop budgets per Context Pack bucket. Each returned hop carries its own evidence_ref, so the Critic can score hops individually and the Decision Record can list them by source.
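A hop-bounded expansion that records per-hop evidence and raises the `truncated` flag (see Failure modes) might look like the breadth-first sketch below. It runs over an in-memory adjacency map; the `expand` function, the tuple edge layout, and the rule that `top_k` caps the total number of hops returned are assumptions for illustration, not the compiler's actual algorithm.

```python
from __future__ import annotations

from collections import deque

# Assumed edge layout: node ceid -> [(relationship_type, target_ceid, evidence_ref, confidence)]
Graph = dict[str, list[tuple[str, str, str, float]]]


def expand(graph: Graph, seeds: list[str], allowed: set[str], max_hops: int, top_k: int) -> dict:
    """Breadth-first expansion along permitted relationships under a hop budget."""
    hops: list[dict] = []
    truncated = False
    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        for rel, target, evidence_ref, confidence in graph.get(node, []):
            if rel not in allowed:
                continue  # only relationships permitted by the Context Pack are traversed
            if depth + 1 > max_hops or len(hops) >= top_k:
                truncated = True  # budget hit: report it instead of dropping evidence silently
                continue
            hops.append({"hop": depth + 1, "from": node, "edge": rel, "to": target,
                         "evidence_ref": evidence_ref, "confidence": confidence})
            if target not in visited:  # each node is expanded at most once, so cycles terminate
                visited.add(target)
                frontier.append((target, depth + 1))
    return {"hops": hops, "truncated": truncated}
```

With the retrieval block above (`max_hops: 2`, `top_k: 8`), a three-edge chain from `order:ord_881` returns two hops and `truncated: true`, which the planner must then handle.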

Access control model

  • Reads are scoped by tenant_id, data_classification, and the agent’s role from the Run Context.
  • Cross-tenant traversal is denied at the storage layer, not just the API.
  • Sensitive attributes are redacted before they enter the compiled context (see Memory for write-side redaction).
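The read-scoping rules can be illustrated as a predicate plus a redaction pass. In production the spec places enforcement at the storage layer; this Python sketch only shows the decision logic, and the `ROLE_CLEARANCE` mapping, the `SENSITIVE_ATTRIBUTES` set, the record layout, and the `scoped_read` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical role -> readable data classifications mapping.
ROLE_CLEARANCE = {
    "support_agent": {"public", "internal"},
    "risk_reviewer": {"public", "internal", "confidential"},
}
# Hypothetical attribute names treated as sensitive.
SENSITIVE_ATTRIBUTES = {"email", "phone", "card_last4"}


@dataclass(frozen=True)
class RunContext:
    tenant_id: str
    role: str


def readable(record: dict, ctx: RunContext) -> bool:
    """Tenant match first (no cross-tenant reads), then classification clearance for the role."""
    if record["tenant_id"] != ctx.tenant_id:
        return False
    return record["data_classification"] in ROLE_CLEARANCE.get(ctx.role, set())


def redact(attributes: dict) -> dict:
    """Mask sensitive attributes before they can reach the compiled context."""
    return {k: ("[REDACTED]" if k in SENSITIVE_ATTRIBUTES else v) for k, v in attributes.items()}


def scoped_read(records: list[dict], ctx: RunContext) -> list[dict]:
    return [{**r, "attributes": redact(r.get("attributes", {}))} for r in records if readable(r, ctx)]
```

An unknown role maps to an empty clearance set, so the sketch fails closed rather than open.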

Conflict detection

When two edges assert contradictory facts on the same (subject, predicate), the retrieval bundle returns both with their evidence and a conflict: true marker. The Decision Catalog treats unresolved conflicts on required_evidence as a hard fail — the planner must request reconciliation before proceeding.
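Conflict detection and the Decision Catalog gate can be sketched as a group-by on (subject, predicate). The sketch assumes every predicate is single-valued, so two distinct objects for the same key count as a contradiction; real multi-valued predicates would need cardinality from the ontology. `detect_conflicts`, `gate`, and the representation of required_evidence as a set of predicates are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def detect_conflicts(edges: list[dict]) -> list[dict]:
    """Flag (subject, predicate) groups that assert more than one distinct object.

    Every competing edge is kept with its evidence_ref so the bundle shows both sides.
    """
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for e in edges:
        groups[(e["subject"], e["predicate"])].append(e)
    conflicts = []
    for (subject, predicate), group in groups.items():
        if len({e["object"] for e in group}) > 1:
            conflicts.append({"subject": subject, "predicate": predicate, "conflict": True, "edges": group})
    return conflicts


def gate(conflicts: list[dict], required_predicates: set[str]) -> str:
    """Hard fail when an unresolved conflict touches required evidence."""
    if any(c["predicate"] in required_predicates for c in conflicts):
        return "request_reconciliation"
    return "proceed"
```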

Deterministic replay

Every retrieval response embeds snapshot_version and ontology_version. To replay a run later (for evaluation or post-incident analysis), the same query against the same versions must return the same bundle. New edges added after the snapshot do not leak into the replay.
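Replay determinism rests on two properties: edges written after the pinned snapshot are invisible, and the bundle is serialized canonically so identical content hashes identically. The Python sketch below assumes each edge records the integer snapshot it was written in (`written_in_snapshot`, a hypothetical field; the spec's own versions are strings such as `kg_2026_05_03_T0930`), and `visible_at`, `retrieve`, and `bundle_digest` are illustrative names.

```python
from __future__ import annotations

import hashlib
import json


def visible_at(edges: list[dict], snapshot_version: int) -> list[dict]:
    """Edges written after the pinned snapshot never leak into a replay."""
    return [e for e in edges if e["written_in_snapshot"] <= snapshot_version]


def retrieve(edges: list[dict], snapshot_version: int, ontology_version: str, subject: str) -> dict:
    """Toy one-hop lookup; results are sorted so storage order cannot change the bundle."""
    hits = sorted(
        (e for e in visible_at(edges, snapshot_version) if e["subject"] == subject),
        key=lambda e: (e["predicate"], e["evidence_ref"]),
    )
    return {"snapshot_version": snapshot_version, "ontology_version": ontology_version, "hops": hits}


def bundle_digest(bundle: dict) -> str:
    """Canonical JSON (sorted keys, fixed separators) so identical bundles hash identically."""
    blob = json.dumps(bundle, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Comparing `bundle_digest` of the original run and the replay gives a cheap check for the replay-determinism metric listed under Evaluation metrics.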

Interfaces

Inputs

  • Ontology and identity-layer references
  • Source documents, databases, APIs, and event streams
  • Retrieval parameters from the Context Pack
  • Run Context (tenant_id, role, budgets)

Outputs

  • Evidence bundles with source IDs, hashes, traversal paths
  • Snapshot + ontology version pin
  • Conflict markers
  • Aggregate confidence

Failure modes

  • Edge writes without an evidence_ref (caught and rejected at write time).
  • Retrieval expansion exceeds hop budget — bundle silently truncated, hiding relevant evidence. Mitigation: emit a truncated: true flag the planner must handle.
  • Stale snapshot used in production reads after writes have advanced — mitigated by snapshot version reconciliation on every read.
  • Entity-resolution false positives merge two real-world entities and corrupt downstream traversals.
  • Cross-tenant leakage through a shared relationship type — mitigated by tenant-scoped traversal at the storage layer.

Operational concerns

  • Snapshot pinning per environment; promotion is a deliberate step.
  • Re-index cost when the ontology version advances.
  • Vector index drift vs. structured edges — alerting on disagreement.
  • Backfill jobs must replay against the ontology version that was active for the data being backfilled, not against the current one.
  • Eviction of edges past valid_to requires audit trail.
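Audited eviction can be sketched as a split into kept edges and audit records, so no edge disappears without a trace. This is an assumption-laden Python illustration: `evict_expired`, the `edge_id` field, the audit-record shape, and the reading of `valid_to` as inclusive (an edge is evicted only once `valid_to` is strictly before today) are not defined by the spec.

```python
from __future__ import annotations

from datetime import date


def evict_expired(edges: list[dict], today: date, actor: str) -> tuple[list[dict], list[dict]]:
    """Split edges into (kept, audit_records); every evicted edge leaves an audit record."""
    kept: list[dict] = []
    audit: list[dict] = []
    for e in edges:
        valid_to = e.get("valid_to")
        # Assumption: valid_to is an inclusive ISO date; None means open-ended.
        if valid_to is not None and date.fromisoformat(valid_to) < today:
            audit.append({
                "action": "evict",
                "edge_id": e["edge_id"],
                "evidence_ref": e["evidence_ref"],  # keep provenance of what was removed
                "valid_to": valid_to,
                "actor": actor,
                "at": today.isoformat(),
            })
        else:
            kept.append(e)
    return kept, audit
```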

Evaluation metrics

  • Evidence coverage (fraction of decisions whose evidence_refs resolve to live edges).
  • Retrieval precision/recall on golden sets per intent.
  • Conflict rate by (entity_type, relationship_type).
  • Cross-tenant denial rate (should be zero in steady state).
  • Replay determinism (identical bundle for identical inputs across snapshot pin).
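Two of these metrics reduce to simple ratios. The Python sketch below assumes a decision counts as covered only if it cites at least one evidence_ref and every cited ref resolves to a live edge, and that conflict rate is measured per returned (entity_type, relationship_type) group; the function names and input shapes are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def evidence_coverage(decisions: list[list[str]], live_refs: set[str]) -> float:
    """Fraction of decisions whose evidence_refs all resolve to live edges."""
    if not decisions:
        return 0.0  # assumption: no decisions means no measured coverage
    covered = sum(1 for refs in decisions if refs and all(r in live_refs for r in refs))
    return covered / len(decisions)


def conflict_rate(groups: list[tuple[str, str, bool]]) -> dict[tuple[str, str], float]:
    """groups: one (entity_type, relationship_type, conflict flag) entry per returned edge group."""
    totals: Counter = Counter()
    conflicts: Counter = Counter()
    for entity_type, relationship_type, conflict in groups:
        key = (entity_type, relationship_type)
        totals[key] += 1
        conflicts[key] += int(conflict)
    return {key: conflicts[key] / totals[key] for key in totals}
```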

Example

GraphRAG path returned for a refund-eligibility query, condensed:

{
  "snapshot_version": "kg_2026_05_03_T0930",
  "ontology_version": "ont.support@4.2.0",
  "seed": { "ceid": "order:ord_881", "entity_type": "Order" },
  "hops": [
    {
      "hop": 1,
      "edge": "order_belongs_to_customer",
      "evidence_ref": "oms:db:orders/881#row_v3",
      "to": { "ceid": "customer:cus_77", "entity_type": "Customer" },
      "confidence": 0.99
    },
    {
      "hop": 2,
      "edge": "customer_has_segment",
      "evidence_ref": "crm:exports/segments_2026_05_01.parquet#offset_4112",
      "to": { "value": "vip" },
      "confidence": 0.96,
      "as_of": "2026-05-01"
    }
  ],
  "conflicts": [],
  "truncated": false
}
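A consumer can sanity-check a bundle like the one above before citing it. In this hedged Python sketch, `check_bundle` and `aggregate_confidence` are hypothetical helpers, and multiplying hop confidences is an assumed aggregation rule (a path treated as no more certain than all of its hops together); the spec lists aggregate confidence as an output but does not define its formula.

```python
from __future__ import annotations

import math


def check_bundle(bundle: dict) -> list[str]:
    """Return reasons a bundle is not yet citable; an empty list means it passes."""
    problems = []
    for key in ("snapshot_version", "ontology_version"):
        if not bundle.get(key):
            problems.append(f"missing {key}")  # replay needs both version pins
    for hop in bundle.get("hops", []):
        if not hop.get("evidence_ref"):
            problems.append(f"hop {hop.get('hop')} has no evidence_ref")
    if bundle.get("conflicts"):
        problems.append("unresolved conflicts")
    if bundle.get("truncated"):
        problems.append("truncated: planner must handle")
    return problems


def aggregate_confidence(bundle: dict) -> float:
    """Assumed rule: multiply per-hop confidences along the path."""
    return math.prod(hop["confidence"] for hop in bundle.get("hops", []))
```

Applied to the refund-eligibility example, `check_bundle` finds no problems and `aggregate_confidence` comes out at about 0.95.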

Common misconceptions

  • It is not the embedding store. Vectors are an index strategy. The graph is the truth.
  • It is not optional for high-risk decisions. A decision without graph-resolved evidence cannot satisfy the Decision Catalog’s required_evidence field.
  • It is not where memory lives. Memory belongs to the Memory plane; the graph is the substrate that memory promotion writes against.