Observability
Trust-plane component owning OTEL-first tracing, audit persistence, and tail-based sampling.
Replay and post-incident review depend on this component.
- TraceBundle
- AuditRecord
- SecurityEvent
- OTELSpan
The Observability component owns OTEL spans, audit records, and the trace-store substrate that makes replay possible.
Definition
A trace + audit infrastructure that propagates W3C Trace Context end-to-end (compiler → planner → critic → gateway → adapter), tags spans with namespaced ContextOS attributes, and persists trace bundles + tool transcripts + decision records as the substrate for replay and audit.
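As a rough illustration of what end-to-end propagation involves at each hop, the sketch below parses an inbound traceparent header (W3C Trace Context, version 00), mints a new span id under the same trace id, and forwards tracestate and baggage unchanged. The names TraceParent, parse_traceparent, child_of, and outbound_headers are hypothetical, and starting a new sampled trace when the inbound header is missing or invalid is an assumption rather than documented ContextOS behavior.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str

    def header(self) -> str:
        return f"00-{self.trace_id}-{self.parent_id}-{self.flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT_RE.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid per the W3C specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def child_of(parent: TraceParent) -> TraceParent:
    """New span id under the same trace id, keeping the trace flags."""
    return TraceParent(parent.trace_id, secrets.token_hex(8), parent.flags)


def outbound_headers(inbound: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: build the headers for the next hop."""
    parent = parse_traceparent(inbound.get("traceparent", ""))
    if parent is None:
        # Assumption: a missing or invalid header starts a new sampled trace.
        span = TraceParent(secrets.token_hex(16), secrets.token_hex(8), "01")
    else:
        span = child_of(parent)
    out = {"traceparent": span.header()}
    # tracestate and baggage are forwarded unchanged when present.
    for name in ("tracestate", "baggage"):
        if name in inbound:
            out[name] = inbound[name]
    return out
```

A component in the chain would call outbound_headers on its inbound request headers before calling the next hop, so the trace id stays constant from compiler to adapter while each hop gets its own span id.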
Why it exists
Logs are unstructured and untyped. Audit needs structured, signed, replayable artifacts tied to a trace_id and a Decision Record. This component is what turns every plane’s behavior into reproducible evidence.
Inputs
- OTEL spans from every plane component
- Tool transcripts from the Tool Manager
- Approval-gate decisions from the Policy Engine
- Decision Records from the Decision Record store
- Scorecards from the Evaluation Engine
Outputs
- Trace bundles with full W3C context
- Audit records (one per gate, one per policy decision, one per memory write proposal)
- Indexed trace store consumable by replay harness
- Security events (cross-tenant denials, credential rotations, sandbox violations)
OTEL contract
- Trace context propagated as traceparent, tracestate, baggage.
- Span attributes namespaced under contextos.*:
  - contextos.run.run_id
  - contextos.context.pack_version
  - contextos.action.approval_mode_effective
  - contextos.trust.policy_decision_id
  - contextos.intelligence.snapshot_version
  - contextos.budget.tokens_used
- Tail-based sampling forced on: any run crossing an approval gate, any run failing scorecard thresholds, any run hitting the loop guard.
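The forced-sampling rule can be sketched as a decision made once a run has completed. This is a minimal illustration, not the actual sampler: RunSummary, forced_reasons, and keep_trace are hypothetical names, and the hash-based baseline rate for runs that are not forced is an assumption (the source only specifies which runs must always be kept).

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-run facts available once the run has finished."""
    trace_id: str
    crossed_approval_gate: bool = False
    failed_scorecard: bool = False
    hit_loop_guard: bool = False


def forced_reasons(run: RunSummary) -> List[str]:
    """Reasons that force the trace to be kept regardless of the baseline rate."""
    reasons = []
    if run.crossed_approval_gate:
        reasons.append("approval_gate")
    if run.failed_scorecard:
        reasons.append("scorecard_failure")
    if run.hit_loop_guard:
        reasons.append("loop_guard")
    return reasons


def keep_trace(run: RunSummary, baseline_rate: float = 0.05) -> bool:
    """Keep forced runs always; otherwise sample deterministically by trace id."""
    if forced_reasons(run):
        return True
    # Assumption: unforced runs are sampled by hashing the trace id, so every
    # collector instance makes the same decision for the same trace.
    digest = hashlib.sha256(run.trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(baseline_rate * 2**64)
```

Because the decision depends on facts known only at the end of a run, spans have to be buffered until the run completes, which is why sampler sizing appears under failure modes.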
How it works
- Every plane component emits spans against the inbound trace_id.
- The Tool Manager re-emits traceparent on every outbound call to adapters.
- Audit records are written append-only and indexed by (trace_id, decision_key, time_window).
- Replay queries fetch the trace bundle + recorded transcripts + pack version + snapshot version → re-derive Critic verdict and Decision Record without re-executing tools.
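The append-only write path and the composite index can be illustrated with a small in-memory stand-in. This is a sketch under assumptions: AuditLog and its methods are hypothetical, the AuditRecord fields shown here are illustrative rather than the real schema, and time_window is interpreted as a fixed-width bucket of the record timestamp, which the source does not confirm.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

IndexKey = Tuple[str, str, int]


@dataclass(frozen=True)
class AuditRecord:
    """Illustrative fields only; the real record schema is not shown in the source."""
    trace_id: str
    decision_key: str
    timestamp: float  # epoch seconds
    payload: str


class AuditLog:
    """In-memory stand-in for an append-only audit store."""

    def __init__(self, window_seconds: int = 3600) -> None:
        # Assumption: time_window is a fixed-width bucket of the timestamp.
        self._window = window_seconds
        self._records: List[AuditRecord] = []  # only ever appended to
        self._index: Dict[IndexKey, List[int]] = defaultdict(list)

    def _key(self, trace_id: str, decision_key: str, timestamp: float) -> IndexKey:
        return (trace_id, decision_key, int(timestamp // self._window))

    def append(self, record: AuditRecord) -> int:
        """Append a record and return its offset; existing records are never mutated."""
        offset = len(self._records)
        self._records.append(record)
        key = self._key(record.trace_id, record.decision_key, record.timestamp)
        self._index[key].append(offset)
        return offset

    def lookup(self, trace_id: str, decision_key: str, timestamp: float) -> List[AuditRecord]:
        """Return records sharing the same (trace_id, decision_key, time_window)."""
        offsets = self._index.get(self._key(trace_id, decision_key, timestamp), [])
        return [self._records[i] for i in offsets]
```

A replay query would use the same key to pull the audit records for a decision alongside the trace bundle and recorded transcripts, without touching the tools themselves.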
Failure modes
- Custom adapter omits W3C headers: the trace has a gap; caught by a coverage assertion in CI.
- Tail-based sampler back-pressure drops a high-risk run: raise an alert; the sampler must be sized for peak load.
- Audit store unavailable: the store is on the critical path of every governed run, so it must be replicated.
- Cardinality explosion in tag values: mitigated by the namespaced attribute schema and lint-on-merge.
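Two of the mitigations above, the CI coverage assertion and the lint-on-merge check, could look roughly like the following. This is a hedged sketch: missing_components and lint_attributes are hypothetical names, the component chain is taken from the Definition section, and the allowlist contains only the six attributes named in the OTEL contract, which may not be exhaustive.

```python
from typing import Dict, Iterable, List

# Component chain from the Definition section.
EXPECTED_COMPONENTS = ("compiler", "planner", "critic", "gateway", "adapter")

# Attributes named in the OTEL contract; assumed illustrative, not exhaustive.
ALLOWED_ATTRIBUTES = frozenset({
    "contextos.run.run_id",
    "contextos.context.pack_version",
    "contextos.action.approval_mode_effective",
    "contextos.trust.policy_decision_id",
    "contextos.intelligence.snapshot_version",
    "contextos.budget.tokens_used",
})


def missing_components(span_components: Iterable[str]) -> List[str]:
    """Components with no span in the trace, i.e. where the trace has a gap."""
    seen = set(span_components)
    return [c for c in EXPECTED_COMPONENTS if c not in seen]


def lint_attributes(attributes: Dict[str, object]) -> List[str]:
    """contextos.* keys outside the schema; other namespaces are ignored."""
    return sorted(
        key for key in attributes
        if key.startswith("contextos.") and key not in ALLOWED_ATTRIBUTES
    )
```

A CI test could assert that missing_components returns an empty list for a trace produced by each adapter's integration test, and a merge check could fail when lint_attributes returns any key.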
Operational concerns
- Trace retention bands by data_classification (PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED).
- Per-tenant cost budgets for trace storage.
- PII redaction applied at trace emission, not at query time.
- Quarterly incident-response drill exercising replay end-to-end.
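To make "redaction at emission" and "retention bands by data_classification" concrete, here is a minimal sketch. Everything named here is hypothetical: redact_attributes, emit_span, and retention_days are illustrative helpers, the email pattern stands in for whatever PII detectors are actually used, and the retention periods are supplied by the caller because the source does not state them.

```python
import re
from typing import Dict, List, Mapping

CLASSIFICATIONS = frozenset({"PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"})

# Assumption: email addresses stand in for PII; real detectors would be broader.
_EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_attributes(attributes: Mapping[str, object]) -> Dict[str, object]:
    """Return a copy of the attributes with email addresses masked in string values."""
    redacted: Dict[str, object] = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            value = _EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        redacted[key] = value
    return redacted


def emit_span(name: str, attributes: Mapping[str, object], sink: List[dict]) -> None:
    """Redact before the span leaves the process, so the store never holds raw PII."""
    sink.append({"name": name, "attributes": redact_attributes(attributes)})


def retention_days(data_classification: str, bands: Mapping[str, int]) -> int:
    """Look up the retention band; the periods themselves come from the caller."""
    if data_classification not in CLASSIFICATIONS:
        raise ValueError(f"unknown data_classification: {data_classification!r}")
    return bands[data_classification]
```

Redacting in emit_span rather than in a query layer means that backups, replicas, and replay bundles all inherit the redaction, at the cost of losing the original value permanently.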
Evaluation metrics
- Trace coverage (fraction of runs with full plane span chain).
- Audit completeness (fraction of governed actions with a matching audit record).
- Replay determinism on a pinned bundle.
- Mean time to fetch a replay bundle.
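The first three metrics can be computed along these lines. This is a sketch under assumptions: the function names and input shapes are hypothetical, the values returned for empty inputs are arbitrary choices, and replay determinism is approximated by hashing a canonical JSON form of the replay output over repeated runs of the same pinned bundle.

```python
import hashlib
import json
from typing import Callable, Dict, Set


def trace_coverage(run_components: Dict[str, Set[str]], expected: Set[str]) -> float:
    """Fraction of runs whose spans cover every expected plane component."""
    if not run_components:
        return 0.0  # assumption: no runs means no measurable coverage
    complete = sum(1 for comps in run_components.values() if expected <= comps)
    return complete / len(run_components)


def audit_completeness(governed_actions: Set[str], audited_actions: Set[str]) -> float:
    """Fraction of governed actions that have a matching audit record."""
    if not governed_actions:
        return 1.0  # assumption: nothing to audit counts as complete
    return len(governed_actions & audited_actions) / len(governed_actions)


def replay_is_deterministic(replay: Callable[[dict], dict], bundle: dict, runs: int = 3) -> bool:
    """True if repeated replays of the same bundle yield identical output."""
    digests = set()
    for _ in range(runs):
        output = replay(bundle)
        # Canonical JSON so that key order does not affect the comparison.
        canonical = json.dumps(output, sort_keys=True, separators=(",", ":"))
        digests.add(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return len(digests) == 1
```

Here replay stands for whatever function re-derives the Critic verdict and Decision Record from a bundle; any dependence on wall-clock time or unpinned versions would show up as differing digests.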