ContextOS Metrics Glossary

Unified metric glossary across the five planes — Intelligence, Context, Decision, Action, Trust.

Reference Design. Last reviewed: 2026-05-09.

At a glance

This page is the stable metrics contract for ContextOS. It defines the names, dimensions, source artifacts, owners, and minimum scorecard that every implementation should keep consistent across the five planes.

It is not an exhaustive dashboard inventory. Teams can add local metrics, but release gates, incidents, and executive scorecards should roll up through the contract below.

Naming Conventions

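Plane-scoped metric names use one namespace and one shape (the cross-plane rollups in the minimum scorecard use the `contextos.run.*` and `contextos.budget.*` prefixes instead):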

`contextos.<plane>.<component>.<signal>`

Examples:

Metric	Meaning
`contextos.intelligence.gateway.request_duration_ms`	AI Gateway request latency.
`contextos.context.pack.evidence_coverage_rate`	Share of required evidence represented in the compiled Context Pack.
`contextos.decision.plan.validity_rate`	Share of generated plans passing structural validation.
`contextos.action.tool.success_rate`	Share of tool calls returning a completed `ToolResultEnvelope`.
`contextos.trust.trace.completeness_rate`	Share of runs with the required trace and audit artifacts.

Rules:

Use lowercase snake case for components and signals.
Use `_duration_ms` for latency and wall-clock duration.
Use `_rate` for fractions, `_count` for event counts, `_total` for monotonic counters, and `_cost_usd` or `_cost_inr` for cost.
Record latency as distributions with at least p50, p95, and p99 rollups.
Keep model names, provider names, tool IDs, and policy IDs in dimensions, not in metric names.
Do not encode raw user text, prompts, document titles, or error messages in metric names or labels.
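
The naming rules above are mechanical enough to lint. The following is a minimal sketch, not part of the contract: `check_metric_name` and `DISALLOWED_LATENCY_SUFFIXES` are hypothetical names, and the suffix list is an assumption about what a non-conforming latency name tends to look like. It checks only the four-segment plane-scoped shape.

```python
import re

# Plane names come from the contract; everything else here is illustrative.
PLANES = {"intelligence", "context", "decision", "action", "trust"}
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
# Assumed examples of latency suffixes that should be _duration_ms instead.
DISALLOWED_LATENCY_SUFFIXES = ("_latency", "_latency_ms", "_seconds")


def check_metric_name(name: str) -> list[str]:
    """Return rule violations for a metric name; an empty list means it conforms."""
    parts = name.split(".")
    if len(parts) != 4 or parts[0] != "contextos":
        return ["expected contextos.<plane>.<component>.<signal>"]
    _, plane, component, signal = parts
    problems = []
    if plane not in PLANES:
        problems.append(f"unknown plane: {plane}")
    for label, value in (("component", component), ("signal", signal)):
        if not SNAKE_CASE.match(value):
            problems.append(f"{label} is not lowercase snake case: {value}")
    if signal.endswith(DISALLOWED_LATENCY_SUFFIXES):
        problems.append("latency signals should end in _duration_ms")
    return problems
```

A check like this could run in CI against the list of registered metric names, so that a name such as `contextos.billing.gateway.request_latency` is rejected before it reaches a dashboard.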

Required Dimensions

Every metric must be joinable to the run trace. High-cardinality IDs such as `run_id` and `trace_id` belong on spans, events, exemplars, and rollup records; attach them as time-series labels only when the backend is designed for that cardinality.

Dimension	Required on	Purpose
`tenant_id`	span, event, rollup	Tenant-level slicing and data isolation.
`environment`	span, event, metric	`dev`, `staging`, `prod`, or equivalent.
`release_version`	span, event, metric	Runtime or service release attribution.
`plane`	span, event, metric	One of `intelligence`, `context`, `decision`, `action`, `trust`.
`component`	span, event, metric	Gateway, compiler, planner, tool manager, evaluator, or observability component.
`component_version`	span, event, rollup	Localizes regressions to a deployed component version.
`workflow_id`	span, event, rollup	Workflow or journey-level rollup.
`intent_id`	span, event, rollup	Intent-level quality, safety, and cost comparison.
`task_type`	span, event, rollup	Stable task taxonomy used by scorecards.
`risk_class`	span, event, rollup	Approval and safety tier.
`run_id`	span, event, exemplar	Joins artifacts for a single run.
`trace_id`	span, event, exemplar	Joins telemetry with the trace bundle.
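
The "Required on" column can be enforced at emit time. The sketch below is illustrative only: `REQUIRED_ON` transcribes the table above, while `missing_dimensions` and the idea of passing dimensions as a plain dictionary are assumptions, not a defined ContextOS API.

```python
# Transcribed from the required-dimensions table: dimension -> record kinds.
REQUIRED_ON = {
    "tenant_id": {"span", "event", "rollup"},
    "environment": {"span", "event", "metric"},
    "release_version": {"span", "event", "metric"},
    "plane": {"span", "event", "metric"},
    "component": {"span", "event", "metric"},
    "component_version": {"span", "event", "rollup"},
    "workflow_id": {"span", "event", "rollup"},
    "intent_id": {"span", "event", "rollup"},
    "task_type": {"span", "event", "rollup"},
    "risk_class": {"span", "event", "rollup"},
    "run_id": {"span", "event", "exemplar"},
    "trace_id": {"span", "event", "exemplar"},
}


def missing_dimensions(record_kind: str, dimensions: dict[str, str]) -> list[str]:
    """Return required dimensions that are absent or empty for this record kind.

    record_kind is one of "span", "event", "metric", "rollup", "exemplar".
    """
    return sorted(
        name
        for name, kinds in REQUIRED_ON.items()
        if record_kind in kinds and not dimensions.get(name)
    )
```

Under this reading of the table, a time-series metric needs only the four low-cardinality dimensions, while spans and events carry all twelve.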

Use the following dimensions when they apply:

Dimension	Applies to
`model_profile_id`, `provider`, `routing_policy_id`	AI Gateway and LLM Router calls.
`pack_version`, `context_pack_id`, `evidence_source_type`	Context Pack compilation and evidence metrics.
`memory_tier`, `knowledge_snapshot_id`	Memory and knowledge-substrate metrics.
`tool_id`, `adapter_id`, `approval_mode_declared`, `approval_mode_effective`	Tool Manager and adapter calls.
`policy_profile`, `policy_decision_id`	Policy and approval metrics.
`evaluator_id`, `golden_set_id`, `replay_dataset_id`	Evaluation and Observability metrics.
`channel`, `locale`, `user_cohort`	User-facing adoption and UX metrics.

Minimum Scorecard

Every production scorecard should expose these metrics by `tenant_id`, `intent_id`, `risk_class`, `release_version`, and `pack_version` when available.

Metric	Definition	Owner	Direction
`contextos.decision.task.verified_success_rate`	Tasks that pass the configured verifier divided by tasks started.	Decision plane	Up
`contextos.decision.task.safe_completion_rate`	Tasks completed without policy or approval violations divided by tasks started.	Trust plane	Up
`contextos.run.end_to_end_duration_ms`	Time from accepted user or system request to terminal run state.	Platform runtime	Down
`contextos.run.first_useful_response_duration_ms`	Time from request acceptance to first useful response or action proposal.	Product + runtime	Down
`contextos.budget.cost_per_verified_success`	Total run cost divided by verified successes in the rollup window.	Platform + product	Down
`contextos.context.answer.evidence_backed_rate`	Share of responses requiring evidence that include valid `evidence_refs`.	Context plane	Up
`contextos.action.tool.success_rate`	Successful tool results divided by attempted tool calls.	Action plane	Up
`contextos.trust.policy.violation_rate`	Runs with policy violations divided by completed runs.	Trust plane	Down
`contextos.trust.trace.completeness_rate`	Share of runs with the required spans, scorecard, evidence, and audit links.	Observability	Up
`contextos.trust.replay.determinism_rate`	Share of replay runs that reproduce the pinned expected verdict or record.	Evaluation	Up

Thresholds are environment- and intent-specific. The contract defines formulas and owners; each deployment defines alert thresholds, sampling policy, and release gates.
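
To make two of the scorecard formulas concrete, here is a hedged sketch of a rollup over one window. `RunSummary` and its fields are hypothetical stand-ins for whatever a deployment derives from its run artifacts; only the metric names and the numerator/denominator definitions come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical per-task summary for one rollup window."""
    verified: bool    # task passed the configured verifier
    cost_usd: float   # total run cost attributed to this task


def scorecard(runs: list[RunSummary]) -> dict[str, Optional[float]]:
    """Compute two minimum-scorecard metrics; None marks an undefined ratio."""
    started = len(runs)
    verified = sum(1 for run in runs if run.verified)
    total_cost = sum(run.cost_usd for run in runs)
    return {
        # Tasks that pass the verifier divided by tasks started.
        "contextos.decision.task.verified_success_rate":
            verified / started if started else None,
        # Total run cost divided by verified successes in the window.
        "contextos.budget.cost_per_verified_success":
            total_cost / verified if verified else None,
    }
```

Returning `None` rather than zero for an empty denominator is a design choice in this sketch: it keeps a window with no verified successes from looking like a window with free successes.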

Per-Plane Metrics

Intelligence Plane

The Intelligence plane owns model invocation, model routing, provider behavior, and model-side budgets through the AI Gateway and LLM Router.

Area	Contract metrics
Availability	`contextos.intelligence.gateway.availability_rate`, `contextos.intelligence.provider.error_rate`, `contextos.intelligence.provider.timeout_rate`
Latency	`contextos.intelligence.gateway.request_duration_ms`, `contextos.intelligence.router.decision_duration_ms`, `contextos.intelligence.provider.request_duration_ms`
Routing	`contextos.intelligence.router.fallback_rate`, `contextos.intelligence.router.model_switch_rate`, `contextos.intelligence.router.policy_rejection_count`
Quality controls	`contextos.intelligence.output.invalid_schema_rate`, `contextos.intelligence.output.refusal_rate`, `contextos.intelligence.output.repair_rate`
Budget	`contextos.intelligence.tokens.input_total`, `contextos.intelligence.tokens.output_total`, `contextos.intelligence.gateway.cost_usd`, `contextos.intelligence.gateway.cache_hit_rate`
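
The latency metrics in this table, like all `_duration_ms` signals, are meant to be recorded as distributions with p50, p95, and p99 rollups. The sketch below uses the nearest-rank percentile definition over raw samples; that choice, and the `nearest_rank` and `latency_rollup` names, are assumptions. Production backends typically compute percentiles from histograms or sketches instead.

```python
def nearest_rank(samples: list[float], p: int) -> float:
    """Return the p-th percentile (integer p in 1..100) by the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in 1..100")
    ordered = sorted(samples)
    # ceil(p * n / 100) computed in integer arithmetic to avoid float rounding.
    rank = -(-p * len(ordered) // 100)
    return ordered[rank - 1]


def latency_rollup(samples_ms: list[float]) -> dict[str, float]:
    """Summarize duration samples (milliseconds) as the three required rollups."""
    return {f"p{p}": nearest_rank(samples_ms, p) for p in (50, 95, 99)}
```

With only a handful of samples, p95 and p99 collapse onto the maximum, which is one reason to roll up over windows large enough to hold at least a few hundred requests.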

Context Plane

The Context plane owns Context Packs, retrieval, memory, evidence, conflict handling, and knowledge snapshots. See Context Pack, Memory Model, and Knowledge Graph.

Area	Contract metrics
Pack build	`contextos.context.pack.build_duration_ms`, `contextos.context.pack.token_count`, `contextos.context.pack.context_window_utilization_rate`
Evidence	`contextos.context.pack.evidence_coverage_rate`, `contextos.context.answer.evidence_backed_rate`, `contextos.context.claim.attribution_rate`
Retrieval	`contextos.context.retrieval.precision_at_k`, `contextos.context.retrieval.recall_at_k`, `contextos.context.retrieval.stale_source_rate`
Memory	`contextos.context.memory.promotion_accept_rate`, `contextos.context.memory.correction_rate`, `contextos.context.memory.stale_read_rate`
Context hazards	`contextos.context.noise.irrelevant_token_rate`, `contextos.context.conflict.detected_rate`, `contextos.context.conflict.resolved_rate`, `contextos.context.poisoning.suspected_rate`
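
The two retrieval quality metrics follow the standard precision-at-k and recall-at-k definitions. The sketch below assumes retrieved items and relevance judgments are identified by string IDs and that duplicates in the ranked list count once; the function names are illustrative.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant item."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(retrieved[:k]) & relevant)
    # Dividing by k (not by the number returned) penalizes short result lists.
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        raise ValueError("recall is undefined without relevant items")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

Both functions need a labeled set of relevant items per query, so in practice these metrics are computed against a golden set rather than on live traffic.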

Decision Plane

The Decision plane owns planning, execution choice, critique, loop controls, and the Decision Record. See also the Decision Catalog.

Area	Contract metrics
Planning	`contextos.decision.plan.validity_rate`, `contextos.decision.plan.feasibility_rate`, `contextos.decision.plan.revision_count`
Execution control	`contextos.decision.executor.step_success_rate`, `contextos.decision.executor.loop_guard_trigger_rate`, `contextos.decision.executor.escalation_rate`
Critique	`contextos.decision.critic.veto_rate`, `contextos.decision.critic.repair_success_rate`, `contextos.decision.critic.false_pass_rate`
Outcomes	`contextos.decision.task.success_rate`, `contextos.decision.task.verified_success_rate`, `contextos.decision.task.abandonment_rate`
Records	`contextos.decision.record.completeness_rate`, `contextos.decision.record.evidence_ref_count`, `contextos.decision.record.policy_ref_count`

Action Plane

The Action plane owns side effects through the Tool Manager and the Adapter Mesh.

Area	Contract metrics
Tool calls	`contextos.action.tool.success_rate`, `contextos.action.tool.error_rate`, `contextos.action.tool.request_duration_ms`, `contextos.action.tool.retry_rate`
Approval binding	`contextos.action.approval.required_rate`, `contextos.action.approval.honored_rate`, `contextos.action.approval.denied_rate`
Idempotency	`contextos.action.idempotency.replay_hit_rate`, `contextos.action.idempotency.duplicate_effect_rate`
Adapter health	`contextos.action.adapter.availability_rate`, `contextos.action.adapter.schema_validation_error_rate`, `contextos.action.adapter.version_drift_count`
Evidence return	`contextos.action.tool.evidence_return_rate`, `contextos.action.tool.audit_link_rate`
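
Because tool IDs live in dimensions rather than metric names, `contextos.action.tool.success_rate` is a single metric sliced by `tool_id`. A minimal sketch of that slicing follows; the `(tool_id, status)` tuple shape, the `"completed"` status string, and the example tool IDs in the usage note are assumptions for illustration.

```python
from collections import defaultdict


def tool_success_rate_by_tool(calls: list[tuple[str, str]]) -> dict[str, float]:
    """Group attempted tool calls by tool_id and return the completed share of each.

    Each call is a (tool_id, status) pair; "completed" is the assumed success status.
    """
    attempted: dict[str, int] = defaultdict(int)
    succeeded: dict[str, int] = defaultdict(int)
    for tool_id, status in calls:
        attempted[tool_id] += 1
        if status == "completed":
            succeeded[tool_id] += 1
    # Every key in attempted has a count of at least one, so division is safe.
    return {tool_id: succeeded[tool_id] / attempted[tool_id] for tool_id in attempted}
```

For instance, two calls to a hypothetical `crm.update` tool with one failure and one successful `mail.send` call yield rates of 0.5 and 1.0 respectively.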

Trust Plane

The Trust plane owns policy, approval gates, evaluation, observability, audit, replay, and security posture. See Evaluation and Observability, Observability, and the Policy Engine.

Area	Contract metrics
Policy	`contextos.trust.policy.violation_rate`, `contextos.trust.policy.must_refuse_coverage_rate`, `contextos.trust.policy.decision_duration_ms`
Evaluation	`contextos.trust.scorecard.coverage_rate`, `contextos.trust.eval.pass_rate`, `contextos.trust.eval.regression_rate`, `contextos.trust.eval.judge_agreement_rate`
Observability	`contextos.trust.trace.completeness_rate`, `contextos.trust.audit.completeness_rate`, `contextos.trust.trace.fetch_duration_ms`
Replay	`contextos.trust.replay.determinism_rate`, `contextos.trust.replay.dataset_coverage_rate`, `contextos.trust.replay.duration_ms`
Security	`contextos.trust.security.event_count`, `contextos.trust.security.cross_tenant_denial_count`, `contextos.trust.security.redaction_failure_rate`
Adoption	`contextos.trust.user_correction_rate`, `contextos.trust.operator_override_rate`, `contextos.trust.human_escalation_rate`

Emitted Artifacts

Metrics are only useful when their source artifacts are stable. Each run should emit the artifacts needed for its path; for example, a read-only answer may not emit an approval decision, but any governed action must.

Artifact	Required contents	Primary owner
`ContextPackManifest`	pack version, item IDs, evidence refs, token spans, source timestamps, retrieval query refs	Context plane
`RoutingDecision`	model profile, provider adapter, routing policy, rejected candidates summary, fallback index, usage, estimated cost	Intelligence plane
`PlanRecord`	plan ID, steps, feasibility checks, revisions, loop-guard state	Decision plane
`DecisionRecord`	final decision, verifier result, evidence refs, policy refs, scorecard ref	Decision plane
`ToolCall` / `ToolResult`	tool ID, adapter ID, schema refs, approval mode, idempotency key, result status, evidence refs	Action plane
`PolicyDecision`	policy profile, decision ID, rule refs, gate status, approver ref when applicable	Trust plane
`ConflictLedger`	conflicting sources, severity, chosen resolution, rule or reviewer reference	Context + Trust planes
`MemoryWriteProposal`	proposed fact, provenance, promotion decision, reviewer or policy result	Context plane
`Scorecard`	evaluator IDs, dimension scores, thresholds, release-gate verdict	Evaluation
`TraceBundle`	W3C trace context, plane span chain, artifact refs, audit refs, sampling reason	Observability
`ReplayDataset`	pinned input envelope, pack version, snapshot refs, tool transcripts, expected verdict	Evaluation
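
The path-dependent rule above (a read-only answer may skip an approval decision, a governed action may not) can be expressed as a required-artifact check. This is a sketch under stated assumptions: the contract does not say which artifacts every run must emit, so the `BASE_ARTIFACTS` and `GOVERNED_ACTION_ARTIFACTS` sets below are illustrative guesses, and the function names are hypothetical.

```python
# Assumption: every run emits these regardless of path.
BASE_ARTIFACTS = {"ContextPackManifest", "DecisionRecord", "TraceBundle"}
# Assumption: a governed action additionally emits these.
GOVERNED_ACTION_ARTIFACTS = {"ToolCall", "ToolResult", "PolicyDecision"}


def required_artifacts(has_governed_action: bool) -> set[str]:
    """Return the artifact names a run on this path is expected to emit."""
    required = set(BASE_ARTIFACTS)
    if has_governed_action:
        required |= GOVERNED_ACTION_ARTIFACTS
    return required


def missing_artifacts(emitted: set[str], has_governed_action: bool) -> list[str]:
    """Return required artifact names that the run did not emit, sorted by name."""
    return sorted(required_artifacts(has_governed_action) - set(emitted))
```

A run with an empty result from `missing_artifacts` would count toward the numerator of a completeness metric such as `contextos.trust.trace.completeness_rate`, assuming the deployment defines completeness this way.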

Owners

Every contract metric needs a named owner before it is used in release gates or incident review.

Owner	Responsibilities
Platform runtime	Gateway latency, routing, cost, token accounting, run-level duration, service availability.
Context engineering	Context Pack quality, retrieval, memory promotion, evidence attribution, conflict and poisoning signals.
Decision engineering	Plan validity, execution control, verifier outcomes, Decision Record completeness.
Action platform	Tool Manager reliability, approval-mode binding, adapter health, idempotency, tool evidence return.
Trust, security, and SRE	Policy decisions, audit, trace completeness, replay, security events, redaction, incident scorecards.
Product or domain owner	Intent taxonomy, done criteria, golden sets, acceptable thresholds, user correction interpretation.

Owner duties:

Maintain the formula, numerator, denominator, unit, and rollup windows.
Declare the source artifact and required dimensions.
Define alert thresholds and release-gate thresholds per environment.
Review metric behavior after any schema, policy, model, tool, or Context Pack version change.
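
One way to keep the owner duties auditable is to store each metric's definition as a structured record. The sketch below is an assumption about how such a record could look; `MetricDefinition`, its field names, and `release_gate_gaps` are hypothetical, while the fields themselves mirror the duties listed above.

```python
from dataclasses import dataclass, field


@dataclass
class MetricDefinition:
    """Hypothetical registry entry covering the owner duties for one metric."""
    name: str
    owner: str
    numerator: str
    denominator: str
    unit: str
    rollup_windows: list[str]
    source_artifact: str
    required_dimensions: list[str]
    # Thresholds are keyed by environment, e.g. "staging" or "prod".
    alert_thresholds: dict[str, float] = field(default_factory=dict)
    release_gate_thresholds: dict[str, float] = field(default_factory=dict)


def release_gate_gaps(definition: MetricDefinition, environment: str) -> list[str]:
    """List what is still missing before the metric can gate a release."""
    gaps = []
    if not definition.owner.strip():
        gaps.append("owner")
    if not definition.source_artifact.strip():
        gaps.append("source_artifact")
    if not definition.rollup_windows:
        gaps.append("rollup_windows")
    if environment not in definition.release_gate_thresholds:
        gaps.append(f"release_gate_thresholds[{environment}]")
    return gaps
```

A release pipeline could refuse to evaluate any gate whose metric returns a non-empty gap list, which enforces the rule that a metric needs a named owner before it is used in release gates.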

Link Map

AI Gateway and LLM Router: model routing, provider calls, budgets, and token metrics.
Tool Manager: tool-call success, approval binding, adapter health, and idempotency metrics.
Evaluation and Observability: scorecards, replay, release gates, evaluator dimensions.
Observability: trace bundles, audit records, sampling, and replay substrate.
Policy Engine: policy decisions and approval gates.