Observability

Trust-plane component owning OTEL-first tracing, audit persistence, and tail-based sampling.

Reference Design. Last reviewed: 2026-05-04

At a glance

Trust plane: control over the other four

OTEL-first tracing, audit persistence, and tail-based sampling. Replay and post-incident review depend on this component.

Inputs

OTEL spans from every plane component
Tool transcripts from the Tool Manager
Approval-gate decisions from the Policy Engine
Decision Records from the Decision Record store
Scorecards from the Evaluation Engine

Outputs

Trace bundles with full W3C context
Audit records (one per gate, one per policy decision, one per memory write proposal)
Indexed trace store consumable by replay harness
Security events (cross-tenant denials, credential rotations, sandbox violations)

Canonical types

TraceBundle
AuditRecord
SecurityEvent
OTELSpan

Reference Architecture

The Observability component owns OTEL spans, audit records, and the trace-store substrate that makes replay possible.

Definition

A trace and audit infrastructure that propagates W3C Trace Context end-to-end (compiler → planner → critic → gateway → adapter), tags spans with namespaced ContextOS attributes, and persists trace bundles, tool transcripts, and Decision Records as the substrate for replay and audit.

Why it exists

Logs are unstructured and untyped. Audit needs structured, signed, replayable artifacts tied to a trace_id and a Decision Record. This component is what turns every plane’s behavior into reproducible evidence.

Inputs

OTEL spans from every plane component
Tool transcripts from the Tool Manager
Approval-gate decisions from the Policy Engine
Decision Records from the Decision Record store
Scorecards from the Evaluation Engine

Outputs

Trace bundles with full W3C context
Audit records (one per gate, one per policy decision, one per memory write proposal)
Indexed trace store consumable by replay harness
Security events (cross-tenant denials, credential rotations, sandbox violations)

OTEL contract

Trace context is propagated in the traceparent, tracestate, and baggage headers.
Span attributes namespaced under contextos.*:
- contextos.run.run_id
- contextos.context.pack_version
- contextos.action.approval_mode_effective
- contextos.trust.policy_decision_id
- contextos.intelligence.snapshot_version
- contextos.budget.tokens_used
Tail-based sampling is forced on for any run that crosses an approval gate, fails scorecard thresholds, or hits the loop guard.
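
The forced-sampling rule can be sketched as a keep/drop decision made once a run's spans are complete. This is a minimal sketch, not the component's implementation: the `RunSummary` type, its field names, the reason labels, and the 5% baseline rate are assumptions made for illustration. Only the three forcing conditions come from the contract.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run summary assembled after all spans have arrived."""
    trace_id: str
    crossed_approval_gate: bool
    failed_scorecard: bool
    hit_loop_guard: bool


def forced_reasons(run: RunSummary) -> list[str]:
    """Return the contract conditions that force this run to be kept."""
    reasons = []
    if run.crossed_approval_gate:
        reasons.append("approval_gate")
    if run.failed_scorecard:
        reasons.append("scorecard_threshold")
    if run.hit_loop_guard:
        reasons.append("loop_guard")
    return reasons


def should_keep(run: RunSummary, baseline_rate: float = 0.05, rng=random.random) -> bool:
    """Tail-based decision: forced runs are always kept, the rest are sampled."""
    if forced_reasons(run):
        return True
    # Assumed baseline: keep a fixed fraction of unremarkable runs.
    return rng() < baseline_rate
```

Deciding at the tail matters here because none of the three conditions is known when the first span is emitted; a head-based sampler would have to guess.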

How it works

Every plane component emits spans against the inbound trace_id.
The Tool Manager re-emits traceparent on every outbound call to adapters.
Audit records are written append-only and indexed by (trace_id, decision_key, time_window).
Replay queries fetch the trace bundle, recorded transcripts, pack version, and snapshot version, then re-derive the Critic verdict and Decision Record without re-executing tools.
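
The append-only write path and its (trace_id, decision_key, time_window) index can be sketched in memory as follows. This is an illustration under stated assumptions: the `AuditRecord` fields shown here, the hourly window granularity, and the `AppendOnlyAuditLog` class are hypothetical, and a real store would be durable and replicated.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """Illustrative shape only; the real AuditRecord schema is not given here."""
    trace_id: str
    decision_key: str
    recorded_at: datetime  # expected to be timezone-aware
    payload: str


def time_window(ts: datetime) -> str:
    """Bucket a timestamp into an hourly UTC window (assumed granularity)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


class AppendOnlyAuditLog:
    def __init__(self) -> None:
        self._records: list[AuditRecord] = []
        self._index: dict[tuple[str, str, str], list[int]] = defaultdict(list)

    def append(self, record: AuditRecord) -> int:
        """Append a record and index its position; no update or delete exists."""
        position = len(self._records)
        self._records.append(record)
        key = (record.trace_id, record.decision_key, time_window(record.recorded_at))
        self._index[key].append(position)
        return position

    def lookup(self, trace_id: str, decision_key: str, window: str) -> list[AuditRecord]:
        """Fetch records for one index key, in the order they were written."""
        positions = self._index.get((trace_id, decision_key, window), [])
        return [self._records[p] for p in positions]
```

A replay query would call `lookup` with the trace_id of the run under review and read the stored records, rather than calling any tool again.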

Failure modes

Custom adapter omits W3C headers: this leaves a trace gap, caught by a coverage assertion in CI.
Tail-based sampler back-pressure drops a high-risk run: raise an alert; the sampler must be sized for peak load.
Audit store unavailability: the store is on the critical path of every governed run, so replicate it.
Cardinality explosion in tag values: mitigated by the namespaced attribute schema and lint-on-merge.
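
A coverage assertion for the first failure mode might look like the sketch below. It assumes CI can capture the headers of each adapter's outbound call as a plain dict; the function names and the `(adapter, headers)` input shape are hypothetical. The header format checked is the version-00 traceparent layout from the W3C Trace Context specification.

```python
import re

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def has_valid_traceparent(headers: dict[str, str]) -> bool:
    """True if the headers carry a well-formed, non-zero version-00 traceparent."""
    lowered = {name.lower(): value for name, value in headers.items()}
    match = _TRACEPARENT.fullmatch(lowered.get("traceparent", "").strip())
    if match is None:
        return False
    trace_id, parent_id, _flags = match.groups()
    # An all-zero trace-id or parent-id is invalid per the specification.
    return trace_id != "0" * 32 and parent_id != "0" * 16


def missing_trace_context(calls: list[tuple[str, dict[str, str]]]) -> list[str]:
    """Return the adapters whose recorded outbound call lacks trace context."""
    return [adapter for adapter, headers in calls if not has_valid_traceparent(headers)]
```

A CI job could then fail the build whenever `missing_trace_context(...)` returns a non-empty list.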

Operational concerns

Trace retention bands by data_classification (PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED).
Per-tenant cost budgets for trace storage.
PII redaction applied at trace emission, not at query time.
Quarterly incident-response drill exercising replay end-to-end.
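
Redacting at emission rather than at query time means the trace store never holds the raw value. The sketch below illustrates that ordering only: the email pattern stands in for whatever PII detectors a deployment actually uses, and `emit_span` with a list as the exporter is a hypothetical stand-in for a real span exporter.

```python
import re

# Assumed detector: a simple email pattern, for illustration only.
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_attributes(attributes: dict[str, object]) -> dict[str, object]:
    """Return a copy with email-like substrings in string values masked."""
    redacted: dict[str, object] = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            redacted[key] = _EMAIL.sub("[REDACTED]", value)
        else:
            redacted[key] = value
    return redacted


def emit_span(name: str, attributes: dict[str, object], exporter: list) -> None:
    """Redact first, then hand off, so the store never sees raw values."""
    exporter.append({"name": name, "attributes": redact_attributes(attributes)})
```

Because the input dict is copied rather than mutated, the caller's in-process attributes stay intact while only the redacted copy leaves the process.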

Evaluation metrics

Trace coverage (fraction of runs with full plane span chain).
Audit completeness (fraction of governed actions with a matching audit record).
Replay determinism on a pinned bundle.
Mean time to fetch a replay bundle.
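
The first three metrics reduce to simple ratios and an equality check. The sketch below is one possible reading, with assumptions labeled: the input shapes are hypothetical, the required span chain is taken from the component list in the Definition, an empty set of governed actions is treated as fully audited by choice, and `replay` is any caller-supplied function that re-derives a result from a bundle.

```python
from typing import Callable

# Assumed "full plane span chain", taken from the Definition above.
REQUIRED_CHAIN = frozenset({"compiler", "planner", "critic", "gateway", "adapter"})


def trace_coverage(run_spans: dict[str, set[str]],
                   required: frozenset[str] = REQUIRED_CHAIN) -> float:
    """Fraction of runs whose spans cover every required component."""
    if not run_spans:
        return 0.0
    complete = sum(1 for components in run_spans.values() if required <= components)
    return complete / len(run_spans)


def audit_completeness(governed: set[str], audited: set[str]) -> float:
    """Fraction of governed action ids that have a matching audit record."""
    if not governed:
        return 1.0  # assumption: nothing to audit counts as complete
    return len(governed & audited) / len(governed)


def replay_is_deterministic(replay: Callable[[object], object],
                            bundle: object, attempts: int = 3) -> bool:
    """Replay the same pinned bundle several times and compare the results."""
    results = [replay(bundle) for _ in range(attempts)]
    return all(result == results[0] for result in results)
```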