Observability

Trust-plane component owning OTEL-first tracing, audit persistence, and tail-based sampling.

Reference Design
At a glance
Trust plane: control over the other four

OTEL-first tracing, audit persistence, and tail-based sampling. Replay and post-incident review depend on this component.

Inputs
  • OTEL spans from every plane component
  • Tool transcripts from the Tool Manager
  • Approval-gate decisions from the Policy Engine
  • Decision Records from the Decision Record store
  • Scorecards from the Evaluation Engine
Outputs
  • Trace bundles with full W3C context
  • Audit records (one per gate, one per policy decision, one per memory write proposal)
  • Indexed trace store consumable by replay harness
  • Security events (cross-tenant denials, credential rotations, sandbox violations)
Canonical types
  • TraceBundle
  • AuditRecord
  • SecurityEvent
  • OTELSpan

Reference Architecture

The Observability component owns OTEL spans, audit records, and the trace-store substrate that makes replay possible.

Definition

A trace and audit infrastructure that propagates W3C Trace Context end-to-end (compiler → planner → critic → gateway → adapter), tags spans with namespaced ContextOS attributes, and persists trace bundles, tool transcripts, and decision records as the substrate for replay and audit.
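
As an illustration of the end-to-end propagation described above, here is a minimal Python sketch that parses an inbound traceparent header and derives the header a component would send to the next hop. The field layout (version, 32-hex trace ID, 16-hex parent ID, 2-hex flags) follows the W3C Trace Context format for version 00. The names parse_traceparent and child_traceparent are hypothetical and are not ContextOS APIs.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version-00 traceparent: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    parsed = TraceParent(*match.groups())
    # All-zero IDs are invalid under the W3C format.
    if parsed.trace_id == "0" * 32 or parsed.parent_id == "0" * 16:
        return None
    return parsed


def child_traceparent(inbound: str) -> str:
    """Keep the inbound trace_id and flags; use a fresh span ID as parent-id."""
    parsed = parse_traceparent(inbound)
    if parsed is None:
        raise ValueError("malformed traceparent; refusing to propagate")
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{parsed.version}-{parsed.trace_id}-{new_span_id}-{parsed.flags}"


# Example: a planner receiving a header from the compiler and calling the critic.
inbound = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
outbound = child_traceparent(inbound)
```

Every hop keeps the same trace_id, which is what lets the spans from all five stages be stitched into one trace bundle.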

Why it exists

Logs are unstructured and untyped. Audit needs structured, signed, replayable artifacts tied to a trace_id and a Decision Record. This component is what turns every plane’s behavior into reproducible evidence.

OTEL contract

  • Trace context is propagated in the W3C traceparent and tracestate headers, alongside the W3C baggage header.
  • Span attributes namespaced under contextos.*:
    • contextos.run.run_id
    • contextos.context.pack_version
    • contextos.action.approval_mode_effective
    • contextos.trust.policy_decision_id
    • contextos.intelligence.snapshot_version
    • contextos.budget.tokens_used
  • Tail-based sampling always retains any run that crosses an approval gate, fails scorecard thresholds, or hits the loop guard.
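
The contract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the component's implementation: the attribute keys come from the list above, but RunOutcome, span_attributes, keep_trace, the example values, and the idea of a baseline sampling rate for ordinary runs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

AttributeValue = Union[str, int, float, bool]


@dataclass(frozen=True)
class RunOutcome:
    """Facts known only once the run has finished (hence tail-based)."""
    crossed_approval_gate: bool
    failed_scorecard: bool
    hit_loop_guard: bool


def span_attributes(run_id: str, pack_version: str, approval_mode: str,
                    policy_decision_id: str, snapshot_version: str,
                    tokens_used: int) -> Dict[str, AttributeValue]:
    """Build the namespaced attribute set listed in the OTEL contract."""
    return {
        "contextos.run.run_id": run_id,
        "contextos.context.pack_version": pack_version,
        "contextos.action.approval_mode_effective": approval_mode,
        "contextos.trust.policy_decision_id": policy_decision_id,
        "contextos.intelligence.snapshot_version": snapshot_version,
        "contextos.budget.tokens_used": tokens_used,
    }


def keep_trace(outcome: RunOutcome, baseline_rate: float, draw: float) -> bool:
    """Decide after the run completes whether to retain its trace.

    `draw` is a uniform random number in [0, 1) supplied by the caller,
    which keeps the decision deterministic under test. `baseline_rate`
    is an assumed probability for runs that trigger no forced condition.
    """
    forced = (outcome.crossed_approval_gate
              or outcome.failed_scorecard
              or outcome.hit_loop_guard)
    return forced or draw < baseline_rate


# Hypothetical values, for illustration only.
attrs = span_attributes("run-1", "pack-v3", "manual", "pd-42", "snap-7", 1234)
gated = RunOutcome(crossed_approval_gate=True, failed_scorecard=False, hit_loop_guard=False)
quiet = RunOutcome(crossed_approval_gate=False, failed_scorecard=False, hit_loop_guard=False)
```

With a baseline rate of zero, only the forced conditions retain a trace, which makes the forced set easy to test in isolation.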

How it works

  1. Every plane component emits spans against the inbound trace_id.
  2. The Tool Manager re-emits traceparent on every outbound call to adapters.
  3. Audit records are written append-only and indexed by (trace_id, decision_key, time_window).
  4. Replay queries fetch the trace bundle, recorded transcripts, pack version, and snapshot version, then re-derive the Critic verdict and Decision Record without re-executing tools.
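
Steps 3 and 4 can be sketched as follows. This is a minimal in-memory illustration, not the real store: AuditRecord here is a simplified stand-in for the canonical type, the one-hour window size is an assumption, and AuditLog and replay_verdict are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

WINDOW_SECONDS = 3600  # assumed width of a time_window bucket


@dataclass(frozen=True)
class AuditRecord:
    trace_id: str
    decision_key: str
    timestamp: float  # seconds since epoch
    payload: str


class AuditLog:
    """Append-only log with a (trace_id, decision_key, time_window) index."""

    def __init__(self) -> None:
        self._records: List[AuditRecord] = []
        self._index: Dict[Tuple[str, str, int], List[int]] = defaultdict(list)

    def append(self, record: AuditRecord) -> None:
        # Records are only ever appended; there is no update or delete method.
        window = int(record.timestamp // WINDOW_SECONDS)
        self._records.append(record)
        key = (record.trace_id, record.decision_key, window)
        self._index[key].append(len(self._records) - 1)

    def lookup(self, trace_id: str, decision_key: str, timestamp: float) -> List[AuditRecord]:
        window = int(timestamp // WINDOW_SECONDS)
        positions = self._index.get((trace_id, decision_key, window), [])
        return [self._records[i] for i in positions]


def replay_verdict(recorded_transcripts: List[str],
                   critic: Callable[[List[str]], str]) -> str:
    """Re-derive a verdict from recorded transcripts; no tool is invoked."""
    return critic(recorded_transcripts)


log = AuditLog()
log.append(AuditRecord("t1", "gate:deploy", 10.0, "approved"))
log.append(AuditRecord("t1", "gate:deploy", 4000.0, "approved-again"))
```

Because the critic is handed recorded transcripts rather than live tool handles, replaying the same bundle with the same critic should give the same verdict, provided the critic itself is deterministic.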

Failure modes

  • Custom adapter omits W3C headers: this leaves a trace gap, which is caught by a coverage assertion in CI.
  • Tail-based sampler back-pressure drops a high-risk run: raise an alert; the sampler must be sized for peak load.
  • Audit store unavailable: the store sits on the critical path of every governed run, so it must be replicated.
  • Cardinality explosion in tag values: mitigated by the namespaced attribute schema and lint-on-merge.
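
Two of the mitigations above, the CI coverage assertion and the attribute lint, could look roughly like the Python sketch below. It rests on several assumptions: a span is reduced to a (span_id, parent_id) pair, the allowed attribute set contains only the six keys listed in the OTEL contract (the real schema may be larger), and find_trace_gaps and lint_attributes are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed schema: only the keys named in the OTEL contract section.
ALLOWED_CONTEXTOS_KEYS = frozenset({
    "contextos.run.run_id",
    "contextos.context.pack_version",
    "contextos.action.approval_mode_effective",
    "contextos.trust.policy_decision_id",
    "contextos.intelligence.snapshot_version",
    "contextos.budget.tokens_used",
})


def find_trace_gaps(spans: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Return IDs of spans whose parent is missing from the trace.

    A missing parent is what a trace looks like when an adapter drops
    the W3C headers: downstream spans point at a span nobody recorded.
    Root spans have parent_id None and are never reported.
    """
    span_list = list(spans)
    known_ids = {span_id for span_id, _ in span_list}
    return [span_id for span_id, parent_id in span_list
            if parent_id is not None and parent_id not in known_ids]


def lint_attributes(attrs: Dict[str, object]) -> List[str]:
    """Return contextos.* keys that are not in the allowed schema."""
    return sorted(key for key in attrs
                  if key.startswith("contextos.") and key not in ALLOWED_CONTEXTOS_KEYS)


complete = [("a", None), ("b", "a"), ("c", "b")]
broken = [("a", None), ("c", "b")]  # span "b" was never emitted
```

A CI check can then fail the build when find_trace_gaps returns a non-empty list for a recorded test run, or when lint_attributes reports an unregistered key.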

Operational concerns

  • Trace retention bands by data_classification (PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED).
  • Per-tenant cost budgets for trace storage.
  • PII redaction applied at trace emission, not at query time.
  • Quarterly incident-response drill exercising replay end-to-end.
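
The first and third concerns above can be illustrated with a short Python sketch. The four classification names come from the list above, but everything else is an assumption: the retention values are placeholders rather than recommended policy, redaction is reduced to masking email addresses with a simple pattern, and emit_span and its list-based sink are hypothetical.

```python
import re
from enum import Enum
from typing import Dict, List


class DataClassification(Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"
    RESTRICTED = "RESTRICTED"


# Placeholder retention bands in days; real values are a policy decision.
RETENTION_DAYS: Dict[DataClassification, int] = {
    DataClassification.PUBLIC: 30,
    DataClassification.INTERNAL: 90,
    DataClassification.CONFIDENTIAL: 365,
    DataClassification.RESTRICTED: 730,
}

# Deliberately simple email pattern; production redaction needs more than this.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_attributes(attrs: Dict[str, object]) -> Dict[str, object]:
    """Mask email addresses in string values; leave other values untouched."""
    return {key: _EMAIL_RE.sub("[REDACTED]", value) if isinstance(value, str) else value
            for key, value in attrs.items()}


def emit_span(name: str, attrs: Dict[str, object],
              classification: DataClassification, sink: List[dict]) -> None:
    """Redact before the span leaves the process, then tag its retention band."""
    sink.append({
        "name": name,
        "attributes": redact_attributes(attrs),
        "retention_days": RETENTION_DAYS[classification],
    })


store: List[dict] = []
emit_span("tool.call", {"note": "contact alice@example.com now", "attempt": 3},
          DataClassification.CONFIDENTIAL, store)
```

Redacting inside emit_span means the raw value never reaches the trace store, so no later query path can expose it.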

Evaluation metrics

  • Trace coverage (fraction of runs with full plane span chain).
  • Audit completeness (fraction of governed actions with a matching audit record).
  • Replay determinism on a pinned bundle.
  • Mean time to fetch a replay bundle.
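
The first two metrics above reduce to simple ratios. The Python sketch below shows one way to compute them. It assumes that a "full plane span chain" means the five stages named in the Definition section, and that governed actions and audit records can be matched by a shared ID. The function names are hypothetical.

```python
from typing import Dict, Set

# Assumed meaning of "full plane span chain": the stages from the Definition.
EXPECTED_COMPONENTS = frozenset({"compiler", "planner", "critic", "gateway", "adapter"})


def trace_coverage(components_by_run: Dict[str, Set[str]]) -> float:
    """Fraction of runs whose spans cover every expected component."""
    if not components_by_run:
        return 1.0  # no runs: treated as vacuously complete (a choice, not a rule)
    complete = sum(1 for seen in components_by_run.values()
                   if EXPECTED_COMPONENTS <= seen)
    return complete / len(components_by_run)


def audit_completeness(governed_action_ids: Set[str],
                       audited_action_ids: Set[str]) -> float:
    """Fraction of governed actions that have a matching audit record."""
    if not governed_action_ids:
        return 1.0  # same vacuous-case choice as above
    matched = governed_action_ids & audited_action_ids
    return len(matched) / len(governed_action_ids)


runs = {
    "r1": {"compiler", "planner", "critic", "gateway", "adapter"},
    "r2": {"compiler", "planner"},  # gap after the planner
}
coverage = trace_coverage(runs)
completeness = audit_completeness({"a", "b", "c", "d"}, {"a", "b", "c", "x"})
```

Audit records with no matching governed action (such as "x" above) do not raise the completeness score; they may be worth tracking as a separate signal.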