Security and Compliance

Trust-plane controls — policy outside agent code, identity propagation, approval-mode tiers, sandboxing, attestation, and OTEL-bound audit.

Living Document. Last reviewed: 2026-05-17

At a glance

Security in ContextOS is a runtime primitive, not a perimeter add-on. It is implemented by the Trust plane, which sits over every other plane and decides what is allowed before any side effect can occur.

Definition

A coordinated set of controls: deterministic policy enforcement at every plane boundary; user-delegated and agent-workload identity propagation through every Run Context; the approval-mode tier taxonomy bound to every adapter capability; cryptographic provenance (signed artifacts, hash-chained audit, optional attestation); OTEL-tied audit; and replayable evidence. No control depends on model self-policing.

Why it exists

Agents act on production systems. Prompts leak, tools misbehave, models drift, third parties go rogue. Security has to be enforced where actions actually happen — at the Tool Gateway, at the policy boundary, at the memory promotion gate, at the sandbox kernel — and recorded in a way that survives audit and incident response.

How it works

Compile-time controls — the Context Pack Compiler emits runtime_controls (must_refuse, must_escalate, approval_gates_active, redaction_rules_active) baked into the prompt manifests.
Plan-time controls — the Critic verifies plans against tool allow-lists, evidence requirements, and approval-mode declarations.
Execute-time controls — the Tool Gateway brokers identity, validates schemas, enforces approval modes, and emits trace + audit envelopes.
Memory-write controls — the memory promotion pipeline gates writes by class, contradiction check, and consent.
Audit — every applied policy decision, gate verdict, tool transcript, and memory promotion is recorded against the trace_id of the Run Context and is hash-chained for tamper-evidence.

Threat model

ContextOS treats the agent system as multi-principal under partial trust, following the framing in Trusted AI Agents in the Cloud:

The agent provider ships agent code and policies.
The model provider ships base and tuned models.
The tool ecosystem (MCP servers, OpenAPI services, A2A peers) defines capabilities.
The cloud / runtime hosts execution.
The enterprise / user delegates authority and demands proof.

No single principal is fully trusted. Compromise of the model, of any single tool, or of any single tenant must not compromise the whole system. The architecture’s job is to make every action mediated, evidenced, and attributable.

Attack classes the architecture defends against

Class	Concrete examples	Primary defense
Prompt / context injection	indirect injection from KG documents, tool outputs, memory recall	redaction at compile; deterministic policy at execute; tool surface narrowed to capability allow-list
Excessive agency	model invokes tools outside intent	`capability_class` + intent shape verified by the Critic; `act` denied for `read_only` intents
Credential exfiltration	tool argument carries a secret to a third-party domain	egress allow-list; STS-style per-call credential exchange; redaction of secrets in compiled context
Tool argument tampering	model emits an out-of-bounds amount, recipient, or scope	`arg_constraints` on the capability registration; Critic re-verify; param-level policy rules
Cross-tenant traversal	KG read or memory recall reaches another tenant’s data	storage-level `tenant_id` scoping; cross-tenant denials emitted as security events
Repeated / amplified calls	model loops a destructive action	per-`(tenant, capability)` rate budgets; idempotency keys; `numCalls` policy constraint
Memory poisoning	adversarial input promoted to long-term memory	promotion-aware memory pipeline with consent + classification + contradiction checks
Supply chain	unsigned tool, mutable image, unpinned model	Context Pack and adapter signing; content-hash image pinning; pinned model + pack version per environment
Sandbox escape	code execution capability breaks isolation	default-deny sandbox profile; signed images; per-call attestation for `e2b` / `firecracker`
Audit tampering	post-hoc edit of decision history	append-only, hash-chained Decision Records bound to W3C `trace_id`

OWASP LLM Top 10 mapping

ID	Risk	ContextOS control
LLM01	Prompt injection	redaction-at-compile + tool allow-list + Critic re-verify (see Prompt injection defense)
LLM02	Insecure output handling	typed tool envelopes; outputs flow through promotion pipeline before reaching memory or context
LLM03	Training data poisoning	spec scope: not in runtime threat model; managed by upstream model provider controls
LLM04	Model DoS	RunBudget caps (`total_tokens`, `wall_clock_ms`, `max_tool_calls`); rate budgets per `(tenant, capability)`
LLM05	Supply chain	signed Context Packs and adapters; pinned model + pack versions; image content hashes for sandbox kernels
LLM06	Sensitive info disclosure	redaction rules in `runtime_controls`; data classification on every artifact
LLM07	Insecure plugin / tool design	typed capability schemas; `arg_constraints`; `capability_class` enforcement
LLM08	Excessive agency	approval-mode tiers; intent-shape verification; default-deny on actions
LLM09	Overreliance	Decision Record exposes `evidence_refs` and `confidence`; evaluators score safety per run
LLM10	Model theft	spec scope: out of scope at runtime; managed at the model-serving boundary

The architecture also references MITRE ATLAS tactics for AI-system threat modeling; per-intent adversarial suites are tracked via the Attack Success Rate metric described in the trusted-cloud reference.

Defense in depth

Every plane has its own deterministic check. Compromise of one layer must not compromise the whole.

Layer	Control point	What is checked
Compile	Context Pack Compiler	resolved tools = `Registry ∩ Permissions − Prohibitions`; `runtime_controls` baked into manifests
Plan	Critic	tools used ⊆ surfaced; intent shape (`read_only` excludes `act`); evidence requirements
Execute	Tool Gateway	identity, schema, approval mode, rate budget, idempotency, egress allow-list
Sandbox	Sandbox kernel	profile invariants enforced by the kernel, not by the adapter
Memory write	Promotion gate	classification, consent, contradiction
Audit	OTEL + Decision Record	hash-chained envelope linked by `trace_id`
Replay	Replay harness	DecisionRecord must be reproducible from pinned snapshot
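
The Compile and Plan rows reduce to plain set operations. The following is a minimal sketch, not the Compiler's or the Critic's actual code: the function names `resolve_tools` and `critic_check`, the plan shape, the `read` capability class, and the example capability names are assumptions made for illustration.

```python
def resolve_tools(registry: set[str], permissions: set[str], prohibitions: set[str]) -> set[str]:
    """Surfaced tools = (Registry ∩ Permissions) − Prohibitions."""
    return (registry & permissions) - prohibitions


def critic_check(plan_steps: list[dict], surfaced: set[str], intent_shape: str) -> list[str]:
    """Return a list of violations; an empty list means the plan passes."""
    violations = []
    for step in plan_steps:
        # Plan check 1: every tool used must be in the surfaced set.
        if step["tool"] not in surfaced:
            violations.append(f"tool not surfaced: {step['tool']}")
        # Plan check 2: a read_only intent excludes act capabilities.
        if intent_shape == "read_only" and step["capability_class"] == "act":
            violations.append(f"act denied for read_only intent: {step['tool']}")
    return violations


# Hypothetical capability names, for illustration only.
registry = {"kb.search", "payments.issue_refund", "crm.delete_account"}
permissions = {"kb.search", "payments.issue_refund", "crm.delete_account"}
prohibitions = {"crm.delete_account"}
surfaced = resolve_tools(registry, permissions, prohibitions)

plan = [
    {"tool": "kb.search", "capability_class": "read"},
    {"tool": "payments.issue_refund", "capability_class": "act"},
    {"tool": "crm.delete_account", "capability_class": "act"},
]
violations = critic_check(plan, surfaced, intent_shape="read_only")
```

In this sketch the prohibited tool never reaches the surfaced set, and the plan is rejected both for naming it and for using `act` capabilities under a `read_only` intent.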

Boundary enforcement model

Every plane boundary is a deterministic check, not a model check:

Context → Decision — CompiledContext only contains capabilities the Compiler resolved through Registry ∩ Permissions − Prohibitions with approval-mode filters applied.
Decision → Action — the Critic re-verifies the Plan; the Tool Gateway re-validates each ToolCallEnvelope at execution time.
Action → External — egress credentials are exchanged per call (STS-style); ingress credentials are verified for both the delegated user and the agent workload identity.
Memory write — capture is immutable; promotion checks consent, classification, and contradiction.
External → Audit — every accepted or denied effect emits a signed envelope; the chain breaks if any past entry is altered.

Identity propagation

Two identities flow through every Run Context, and the agent side is split into registry identity, workload identity, and a short-lived signed claim:

Delegated user identity — run_context.user.delegation carries an OAuth-style token reference, scopes, and subject claims. Used by delegated-mode tools.
Agent registry identity — run_context.agent.agent_urn is a versioned subject such as agent:contextos/support-refund@1.2.0. Used for ownership, lifecycle, manifest-scoped permissions, and audit.
Agent workload identity — run_context.agent.workload_identity is a SPIFFE-style URI (spiffe://contextos/agents/<role>). Used for service-to-service auth and for non-delegated calls.
Agent identity claim — run_context.agent.identity_claim binds the registry subject, workload, tenant, run ID, principal chain, scopes, expiry, and key ID for this run or A2A hop.

Both identities are signed metadata on every ToolCallEnvelope and on every audit record.

Token chaining and minimum privilege

Delegation tokens are exchanged at the Gateway for per-call scoped credentials (STS-style); long-lived bearer tokens never reach an adapter.
Scopes on the exchanged credential are the intersection of delegation.scopes, agent.identity_claim.scopes, permission.scopes, and the capability’s declared scope set — never the union.
A2A hops do not forward the inbound delegation token or blindly reuse the parent claim. The receiving agent re-attests its workload identity, receives a fresh narrower child claim, and re-evaluates its own policy bundle. See A2A trust boundary.
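
The intersection rule can be sketched as follows. This is illustrative only: the function name `exchanged_scopes`, the scope strings, and the choice to deny when the intersection is empty are assumptions, not part of the spec.

```python
def exchanged_scopes(
    delegation: set[str],
    claim: set[str],
    permission: set[str],
    capability: set[str],
) -> set[str]:
    """Scopes on the per-call credential: the intersection of all four sets, never the union."""
    scopes = delegation & claim & permission & capability
    if not scopes:
        # Assumption: an empty intersection denies the exchange outright.
        raise PermissionError("no common scope; the exchange is denied")
    return scopes


# Hypothetical scope names.
granted = exchanged_scopes(
    delegation={"refund:read", "refund:write", "profile:read"},
    claim={"refund:read", "refund:write"},
    permission={"refund:read", "refund:write", "ledger:read"},
    capability={"refund:write"},
)
```

Because each input can only remove scopes, adding a broader delegation or a broader claim never widens what the adapter receives.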

Workload identity rotation and revocation

Agent workload identities rotate on a fixed cadence; rotation is a signed change recorded against the agent registration.
A workload identity that fails attestation is denied at the Gateway before any policy is evaluated; this is logged as security_event:workload_identity_attestation_failed.
Agent signing keys rotate under change control. Historical keys may remain trusted only for replay until a declared deprecation window expires.
A revoked agent registration is removed from active discovery and denied at claim minting, A2A dispatch, and Tool Gateway execution.
Claim hashes, key IDs, agent URNs, and principal-chain summaries are logged; raw bearer tokens and raw signing secrets are never logged.

Approval-mode tiers (the policy contract)

The five canonical modes (read_only / local_write / network / delegated / destructive) are defined in Governance. Security implications:

read_only — no credential exchange beyond ingress; full audit; no approval gate.
local_write — idempotency required; tenant-scoped credentials; full audit.
network — egress allow-list; per-(tenant, capability) rate budget.
delegated — valid user delegation token required; scopes minimum-privilege; per-call evidence captured.
destructive — named approver; frozen evidence snapshot at gate; post-execution audit + reconciliation.

Policy may select a lower effective approval mode for a specific bounded request when the capability’s declared maximum allows it, but it cannot exceed that maximum.
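
A minimal sketch of that ceiling rule, assuming the five modes are ranked in the order they are listed above (the ranking, `MODE_ORDER`, and `effective_mode` are illustrative assumptions; Governance holds the authoritative definition):

```python
# Assumption: modes are ordered from least to most privileged as listed in this section.
MODE_ORDER = ("read_only", "local_write", "network", "delegated", "destructive")


def effective_mode(declared_max: str, selected: str) -> str:
    """Return the selected mode if it does not exceed the capability's declared maximum."""
    # tuple.index raises ValueError for an unknown mode, so unknown modes are denied too.
    if MODE_ORDER.index(selected) > MODE_ORDER.index(declared_max):
        raise PermissionError(f"{selected} exceeds declared maximum {declared_max}")
    return selected


lowered = effective_mode(declared_max="destructive", selected="delegated")
```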

Prompt injection defense

Prompts in ContextOS are treated as untrusted by default. The defense is structural, not heuristic:

Compile-time isolation. The Context Pack Compiler emits typed buckets (policy / tool / evidence / memory / business / session). Untrusted strings (KG documents, tool outputs, memory recall) flow through evidence and memory buckets that the Critic understands as advisory, not authoritative.
Tool surface narrowing. The model only sees the schemas of tools the Compiler resolved through Registry ∩ Permissions − Prohibitions. An injection that names a tool not in the surfaced set has no effect — the tool simply does not exist in the model’s view.
Deterministic policy outside the model. Every ToolCallEnvelope re-evaluates policy at the Gateway. Even if the model is fully manipulated by an injected instruction, the call is denied unless deterministic policy permits it.
Argument constraints, not just tool gates. arg_constraints (regex, enum, min/max, idempotency key required) bound the values the model can submit. An injection that says “send to attacker@…” against a capability with to_in: ["@…"] returns denied.
Redaction rules. runtime_controls.redaction_rules_active strip declared sensitive substrings before any prompt is sent to the model and on every ToolResultEnvelope ingestion.
Critic re-verify before execute. The plan’s tools, arg shapes, and intent class are re-checked deterministically.
Default deny on every plane boundary. An unrecognized capability, scope, or destination denies; it does not fall through.

Indirect injection from third-party content is the dominant LLM-era attack class. The defense is not “smarter prompts” — it is moving authority out of the prompt entirely.
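
To make the argument-constraint step concrete, here is a small sketch of a deterministic checker. The constraint dictionary shape (`required`, `enum`, `regex`, `min`, `max`), the function name `check_args`, and the example domain are assumptions for illustration; the spec's own `arg_constraints` schema may differ.

```python
import re


def check_args(args: dict, constraints: dict) -> list[str]:
    """Return constraint violations; any violation means the call is denied."""
    errors = []
    # Default deny: arguments the capability never declared are rejected.
    for name in args:
        if name not in constraints:
            errors.append(f"{name}: undeclared argument")
    for name, rule in constraints.items():
        if name not in args:
            if rule.get("required", False):
                errors.append(f"{name}: required")
            continue
        value = args[name]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: not in enum")
        if "regex" in rule and not re.fullmatch(rule["regex"], str(value)):
            errors.append(f"{name}: does not match pattern")
        if "min" in rule or "max" in rule:
            if not isinstance(value, (int, float)) or isinstance(value, bool):
                errors.append(f"{name}: not a number")
            elif value < rule.get("min", value) or value > rule.get("max", value):
                errors.append(f"{name}: out of bounds")
    return errors


# Hypothetical constraints for a refund capability.
constraints = {
    "amount": {"required": True, "min": 0.01, "max": 500},
    "currency": {"required": True, "enum": ["USD", "EUR"]},
    "to": {"required": True, "regex": r"[^@\s]+@example\.com"},
    "idempotency_key": {"required": True},
}
injected = {"amount": 9999, "currency": "USD", "to": "attacker@evil.test", "idempotency_key": "k-1"}
legit = {"amount": 42.5, "currency": "USD", "to": "billing@example.com", "idempotency_key": "k-2"}
```

The checker never inspects the prompt. A fully manipulated model can still only submit values inside the declared bounds.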

Tenant isolation

Every read in the Knowledge Graph and every recall in Memory is scoped by tenant_id at the storage layer, not by application code.
The Tool Gateway rejects any ToolCallEnvelope whose argument-derived tenant scope does not match run_context.tenant_id.
Cross-tenant traversal denials emit a security_event:cross_tenant_denied with the offending capability, evidence_refs, and the offending value; on-call is paged on a non-zero rate.
Adapter credentials are scoped per tenant; the Tool Gateway exchanges them per call rather than holding long-lived bearer tokens.
Caches (including the cached read-only aliases) key on tenant_id first; a cache hit across tenants is structurally impossible.
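
A tenant-first cache key and the Gateway's tenant-scope comparison can be sketched as below. `TenantScopedCache` and `tenant_scope_ok` are hypothetical names, and the in-memory dict stands in for whatever cache the runtime actually uses.

```python
class TenantScopedCache:
    """Cache whose key always starts with tenant_id, so lookups cannot cross tenants."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], object] = {}

    def put(self, tenant_id: str, key: str, value: object) -> None:
        self._store[(tenant_id, key)] = value

    def get(self, tenant_id: str, key: str):
        # A different tenant_id yields a different composite key, hence a miss.
        return self._store.get((tenant_id, key))


def tenant_scope_ok(run_tenant_id: str, arg_tenant_id: str) -> bool:
    """Gateway-side check: the argument-derived tenant must equal run_context.tenant_id."""
    return run_tenant_id == arg_tenant_id


cache = TenantScopedCache()
cache.put("tenant_acme_prod", "order:1001", {"status": "shipped"})
```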

Cryptographic primitives

The Trust plane treats provenance as a typed contract, not a deployment detail.

Artifact	Signed by	Verified at
Context Pack	pack owner role	Compiler at load; runtime refuses unsigned or unpinned refs
Adapter / capability registration	adapter owner	Gateway at registration; periodic re-verification
Policy bundle	governance role	Policy Engine at load; `bundle.signed_by` recorded with every `policy_decision_id`
Sandbox profile	`security_lead`	Sandbox kernel before container start
Decision Record	runtime signing key	replay harness; auditor on export
Tool transcript	runtime signing key	included in `audit.tool_transcript_id`

Hash-chained audit

Decision Records, tool transcripts, and policy_decisions[] form an append-only, hash-chained log keyed by trace_id. Each entry’s hash includes the previous entry’s hash; tampering is detectable on replay. Hash-chain construction follows the pattern in the trusted-cloud reference.
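
The chaining idea can be illustrated with a few lines of standard-library Python. This sketch is an assumption-laden simplification: the genesis value, the canonical JSON serialization, and the entry field names are invented here, and signing with the runtime key is omitted.

```python
import hashlib
import json

# Assumption: a fixed all-zero genesis hash anchors the first entry.
GENESIS = "sha256:" + "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list[dict], payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited payload or broken link fails verification."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


chain: list[dict] = []
append(chain, {"policy_decision_id": "pol_9901", "verdict": "deny"})
append(chain, {"tool_transcript_id": "tool_tx_118", "status": "completed"})
intact = verify(chain)

chain[0]["payload"]["verdict"] = "allow"  # simulate a post-hoc edit
after_tamper = verify(chain)
```

Editing the first payload invalidates its stored hash, so verification fails at that entry without needing any external reference copy.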

Key management

Signing keys are stored in a KMS; the runtime never holds a raw private key.
Rotation is a signed change recorded against the registry; verifiers accept any non-revoked key within the rotation window.
Revoked keys remain queryable for replay of historical runs against the keys valid at that time.

Secrets and credential exchange

The Tool Gateway is the only component that holds long-lived credentials, and only as references.

Per-call exchange. Every ToolCallEnvelope triggers an STS-style exchange producing a credential scoped to (tenant, capability, run_context.user.subject, scopes ∩ permission.scopes, max-lifetime ≤ wall-clock budget).
Vault references, not values. auth.delegated_user_token_ref and auth.agent_token_ref are opaque vault handles. Adapters never see raw tokens.
No secret transit through the model. Secrets that must reach a tool (e.g., per-tenant API keys) bypass the Compiler — they are resolved at the Gateway from the (tenant, capability) mapping.
Permitted env vars (sandbox) are an explicit allow-list. The closed list is defined per profile and signed.
Rotation. Rotation cadence per credential class is declared at the adapter; the registry refuses adapters without a declared rotation contract.
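
The per-call exchange can be pictured as minting a small, short-lived value object. The sketch below is hypothetical throughout: `ScopedCredential`, `exchange`, and the parameter names are illustrative, and a real exchange would call a token service rather than build the credential locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    tenant_id: str
    capability_id: str
    subject: str
    scopes: frozenset
    expires_at: float  # epoch seconds


def exchange(
    tenant_id: str,
    capability_id: str,
    subject: str,
    delegation_scopes: set[str],
    permission_scopes: set[str],
    now: float,
    requested_ttl_s: float,
    wall_clock_budget_s: float,
) -> ScopedCredential:
    """Mint a credential bound to one call: intersected scopes, lifetime capped by the budget."""
    scopes = frozenset(delegation_scopes) & frozenset(permission_scopes)
    if not scopes:
        raise PermissionError("no permitted scope for this call")
    ttl = min(requested_ttl_s, wall_clock_budget_s)  # max-lifetime <= wall-clock budget
    return ScopedCredential(tenant_id, capability_id, subject, scopes, now + ttl)


cred = exchange(
    tenant_id="tenant_acme_prod",
    capability_id="adp_payments.issue_refund",
    subject="user_123",  # hypothetical subject
    delegation_scopes={"refund:write", "profile:read"},
    permission_scopes={"refund:write"},
    now=1000.0,
    requested_ttl_s=300.0,
    wall_clock_budget_s=30.0,
)
```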

Egress controls

network, delegated, and destructive capabilities can produce outbound traffic. The Gateway constrains every dimension:

Destination allow-list. Each capability declares endpoint_in[]. Calls to any other host are denied at the Gateway, not at the network.
DNS pinning. Hostnames in endpoint_in resolve through the runtime’s DNS layer, which refuses CNAMEs to non-allow-listed origins.
TLS pinning (optional) per capability via pinned_sha256[] of acceptable certificate fingerprints; rotation is a signed registry change.
Rate budget. Every (tenant, capability) declares a per-window rate; the Gateway returns denied with error_code: rate_exceeded rather than queueing.
Egress audit. Every outbound request emits a span carrying trace_id, policy_decision_id, capability id, and destination — sufficient for traffic-side reconciliation against the Decision Record.
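
Two of these dimensions, the destination allow-list and the rate budget, are sketched below. The sketch assumes `endpoint_in` holds exact hostnames and uses a simple fixed window; both are illustrative assumptions, as are the names `egress_allowed` and `RateBudget`.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, endpoint_in: list[str]) -> bool:
    """Allow only exact hostname matches; anything unparsable or unlisted is denied."""
    host = urlsplit(url).hostname
    return host is not None and host in endpoint_in


class RateBudget:
    """Fixed-window budget per (tenant, capability); over-budget calls are denied, not queued."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self._windows: dict[tuple[str, str], tuple[float, int]] = {}

    def check(self, tenant_id: str, capability_id: str, now: float) -> dict:
        key = (tenant_id, capability_id)
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self._windows[key] = (start, count)
            return {"status": "denied", "error_code": "rate_exceeded"}
        self._windows[key] = (start, count + 1)
        return {"status": "allowed"}


# Hypothetical host and capability values.
allow = ["api.payments.example"]
budget = RateBudget(limit=2, window_s=60.0)
results = [budget.check("tenant_a", "adp_payments.issue_refund", t)["status"] for t in (0.0, 1.0, 2.0, 61.0)]
```

Matching on the parsed hostname rather than on a string prefix means a URL such as `https://api.payments.example.evil.test/` does not pass the allow-list.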

Data classification and redaction

Every artifact crossing a plane boundary carries a classification:

Class	Examples	Default handling
`PUBLIC`	non-tenant docs, marketing copy	retained per default policy
`INTERNAL`	aggregate metrics, generic intents	retained per tenant policy
`CONFIDENTIAL`	customer PII, financial detail	redaction-on-emit; promotion-gated for memory
`RESTRICTED`	secrets, IDV documents, regulated payloads	never enters the prompt; never promoted to memory

Redaction rules live in runtime_controls.redaction_rules_active and are evaluated at the Compiler (before prompt construction) and at the Gateway (on ToolResultEnvelope ingestion). A redaction failure is a release-blocking Safety regression.

Data residency and retention bands are declared per classification and tenant; the audit trail records the band against every persisted artifact.
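
As a rough illustration of how classification and redaction could combine before prompt construction, consider the sketch below. The function `prepare_for_prompt`, the regex-based rule format, and the card-number pattern are assumptions; the actual shape of `redaction_rules_active` is not specified here.

```python
import re
from typing import Optional

PROMPT_ELIGIBLE = {"PUBLIC", "INTERNAL", "CONFIDENTIAL"}


def prepare_for_prompt(text: str, classification: str, redaction_patterns: list[str]) -> Optional[str]:
    """Return redacted text, or None when the artifact must never enter the prompt."""
    # RESTRICTED content and unknown classes are excluded (default deny).
    if classification not in PROMPT_ELIGIBLE:
        return None
    for pattern in redaction_patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text


# Hypothetical rule: mask 16-digit card numbers written in groups of four.
rules = [r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"]
redacted = prepare_for_prompt("Card 4111 1111 1111 1111 on file", "CONFIDENTIAL", rules)
blocked = prepare_for_prompt("passport scan bytes", "RESTRICTED", rules)
```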

Memory-write controls

Memory is a primary security boundary because it is the channel by which untrusted content can persist into future runs.

Capture is immutable. Raw evidence is stored append-only with classification, source, and trace_id.
Promotion is explicit. A candidate is promoted only when it passes the consent-record check, the contradiction check against existing high-confidence facts, and the classification check (no RESTRICTED content is promotable).
Consent ledger. Every promotion records the consent basis: explicit user opt-in, contractual basis, or operator-authored policy. A promotion without a consent reference is rejected.
Tenant scoping. Memory recall scopes by tenant_id and subject claims; a cross-subject recall is a security_event:memory_cross_subject_denied.
Decay and erasure. Subject-level erasure cascades to embeddings, evidence references, and any derived strategy rules; a deletion request is itself recorded against trace_id for replay-aware audit.
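
The promotion checks can be sketched as a single deterministic gate. Everything named here is hypothetical (`Candidate`, `promotion_verdict`, the reason strings), and the contradiction check is reduced to an exact-value comparison against known facts, which is far simpler than a real implementation would be.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    tenant_id: str
    classification: str
    consent_ref: Optional[str]
    fact_key: str
    fact_value: str


def promotion_verdict(candidate: Candidate, existing_facts: dict[str, str]) -> tuple[bool, str]:
    """Apply classification, consent, and contradiction checks in order; the first failure rejects."""
    if candidate.classification == "RESTRICTED":
        return (False, "classification_restricted")
    if not candidate.consent_ref:
        return (False, "missing_consent_reference")
    known = existing_facts.get(candidate.fact_key)
    if known is not None and known != candidate.fact_value:
        return (False, "contradicts_existing_fact")
    return (True, "promoted")


facts = {"preferred_currency": "EUR"}
ok = promotion_verdict(
    Candidate("tenant_acme_prod", "INTERNAL", "consent_42", "preferred_language", "de"), facts
)
poisoned = promotion_verdict(
    Candidate("tenant_acme_prod", "INTERNAL", "consent_42", "preferred_currency", "USD"), facts
)
```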

Sandbox controls

For tool capabilities that execute untrusted code (e.g., user-provided scripts in a research workflow), the runtime exposes a Sandbox layer as a first-class component, not as a tool feature flag.

Sandbox profiles

A sandbox profile is a typed, signed contract declaring what the sandbox can and cannot do.

{
  "sandbox_profile_id": "sbx_research_default",
  "version": "1.0.0",
  "kernel": "docker",
  "image_pin": "sha256:...",
  "filesystem": { "host_mount": [], "tmpfs_mb": 256 },
  "network": { "ingress": "deny", "egress": "deny", "allowlist": [] },
  "compute": { "cpu_cores": 1.0, "memory_mb": 512, "wall_clock_ms": 30000 },
  "stdio": { "stdin_max_bytes": 65536, "stdout_max_bytes": 1048576, "stderr_max_bytes": 1048576 },
  "secrets": { "permitted_env_vars": [] },
  "result_classification": "INTERNAL"
}

Sandbox kernels

Kernel	When
`docker`	local-first deployments; image is pinned by content hash
`e2b`	hosted ephemeral sandbox; per-call attestation
`firecracker`	regulated workloads; microvm isolation
`wasm`	pure-compute thinking helpers; no syscalls

Invariants

Default deny everything. Capabilities that need code execution must reference a registered sandbox_profile_id; profiles cannot be inlined.
No host filesystem by default. Any host_mount requires a separate operator approval recorded as a signed change to the profile.
No inbound network. Egress only via explicit allowlist.
Hard wall-clock and memory caps enforced by the kernel, not the adapter.
Output is typed. stdout/stderr are captured as artifacts with classification; nothing flows into memory or the model context without going through the promotion pipeline.
Image pinning is mandatory. Tag-based image references (latest, stable) are refused.
Per-call attestation for e2b / firecracker kernels; attestation hash recorded against the Decision Record.
No secrets transit. permitted_env_vars is the closed allow-list; the sandbox never sees the broader credential store.
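
Several of these invariants can be checked statically when a profile is registered. The validator below is a sketch against the profile shape shown earlier; `validate_profile` and its error messages are assumptions, and it covers only image pinning, host mounts, and inbound network.

```python
import re

# A content-hash pin: "sha256:" followed by 64 lowercase hex characters.
PINNED_IMAGE = re.compile(r"sha256:[0-9a-f]{64}")


def validate_profile(profile: dict) -> list[str]:
    """Return invariant violations for a sandbox profile; an empty list means it passes."""
    errors = []
    if not PINNED_IMAGE.fullmatch(profile.get("image_pin", "")):
        errors.append("image_pin must be a sha256 content hash; tag references are refused")
    if profile.get("filesystem", {}).get("host_mount"):
        errors.append("host_mount requires a separate signed operator approval")
    if profile.get("network", {}).get("ingress") != "deny":
        errors.append("inbound network must be denied")
    return errors


good_profile = {
    "sandbox_profile_id": "sbx_research_default",
    "image_pin": "sha256:" + "a" * 64,  # placeholder digest, not a real image
    "filesystem": {"host_mount": [], "tmpfs_mb": 256},
    "network": {"ingress": "deny", "egress": "deny", "allowlist": []},
}
bad_profile = {**good_profile, "image_pin": "latest"}
```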

Lifecycle

Profile authored under change control with required reviewer (security_lead).
Promoted into the pack registry alongside Context Packs.
Bound to capabilities via sandbox_profile_id on the adapter declaration.
Retired profiles remain queryable for replay; cannot be re-promoted without a new version.

A2A trust boundary

A2A (agent-to-agent) hops are the highest-leverage place for trust to leak. The contract:

No delegation forwarding. The receiving agent does not inherit the caller’s delegation token. It re-attests its own workload identity and re-evaluates its own policy bundle against the original run_context.user.
Narrow child claims. A parent A2A call may authorize a child only through a fresh signed claim whose scopes are a subset of the parent scopes and the receiver manifest’s identity-scope ceiling.
Typed message envelopes with explicit correlation_id and parent_decision_id so the receiving Decision Record links to the parent for replay.
Approval-mode propagation. A delegated or destructive parent call that triggers an A2A hop must be re-acknowledged by the receiving agent; the receiver may select a lower effective mode only when its policy and the capability’s declared maximum allow it.
Per-hop audit. Every A2A message emits a span with the agent.agent_urn, agent.workload_identity, claim hash, and principal chain of both sender and receiver; the chain is verifiable on replay.
Loop guard. A2A graph depth and cycle detection are enforced by the Orchestrator; cycles are rejected with error_code: a2a_cycle_detected.
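
The narrowing and loop-guard rules lend themselves to a short sketch. The helper names, the `a2a_depth_exceeded` code, the scope strings, and all agent URNs other than `support-refund@1.2.0` are hypothetical; only `a2a_cycle_detected` comes from the contract above.

```python
from typing import Optional


def child_claim_scopes(parent_scopes: set[str], receiver_ceiling: set[str], requested: set[str]) -> set[str]:
    """A child claim is the requested scopes narrowed by the parent claim and the receiver's ceiling."""
    return requested & parent_scopes & receiver_ceiling


def check_hop(principal_chain: list[str], receiver_urn: str, max_depth: int) -> Optional[str]:
    """Return an error code if the hop would create a cycle or exceed the depth limit."""
    if receiver_urn in principal_chain:
        return "a2a_cycle_detected"
    if len(principal_chain) >= max_depth:
        return "a2a_depth_exceeded"  # hypothetical code for the depth limit
    return None


chain = ["agent:contextos/support-triage@2.0.0", "agent:contextos/support-refund@1.2.0"]
child = child_claim_scopes(
    parent_scopes={"refund:read", "refund:write"},
    receiver_ceiling={"refund:read", "ledger:read"},
    requested={"refund:read", "refund:write", "ledger:read"},
)
```

Because the child scopes are computed as an intersection, a receiver can never end up with a claim broader than either the parent claim or its own manifest ceiling.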

Supply chain

The runtime treats every artifact as supply-chain-relevant:

Context Packs. Pack registry holds pack_id@semver with content hash, signing key id, and an SBOM-style dependency tree (referenced packs, decision catalog versions, prompt fragment refs).
Adapters. Each adapter declares image hash (for hosted), source repo + commit (for in-repo), and the auth contract. Tag-based references are refused.
Models. Model id + content hash recorded against every Decision Record’s lineage; mismatch on replay surfaces as a non-determinism event.
Sandbox images pinned by content hash; tag references like latest / stable are rejected at profile validation.
Build provenance. Packs and adapters are produced through a CI pipeline that emits provenance attestations (Sigstore-compatible); the registry refuses artifacts whose attestation does not verify against the declared signer.

Attestation

For deployments that require cryptographic proof of “what executed what against which inputs,” ContextOS aligns with the differential-attestation pattern in the Trusted AI Agents in the Cloud reference. The Decision Record can carry an attestation_ref that binds the platform measurement, the artifact manifest (pack, model, policies, tools), the input envelope, the policy decisions, and the result. The replay harness consumes the manifest to reproduce the verdict; an external auditor can verify the record without re-executing tools.

This is optional infrastructure, not a baseline requirement: deployments that do not need attested execution still get hash-chained audit and signed Decision Records.

Boundary controls (this repo)

This repo is the public spec surface. Even there, controls are explicit:

Centralized security headers in next.config.ts: Content-Security-Policy (default-src 'self', script-src includes 'wasm-unsafe-eval' for Pagefind WebAssembly search without enabling production 'unsafe-eval', frame-ancestors 'none', object-src 'none', upgrade-insecure-requests, block-all-mixed-content); Strict-Transport-Security (max-age=63072000; includeSubDomains; preload); X-Frame-Options: DENY; Referrer-Policy: strict-origin-when-cross-origin; Permissions-Policy: camera=(), microphone=(), geolocation=(); X-Content-Type-Options: nosniff; Cross-Origin-Opener-Policy: same-origin; Cross-Origin-Resource-Policy: same-origin for pages, with cross-origin on static and generated social preview image routes; X-Permitted-Cross-Domain-Policies: none.
Optional Basic Auth at the edge via src/proxy.ts for private review deployments (gated by BASIC_AUTH_ENABLED, bypassed for local development, loopback hosts, and canonical public domains).
CI enforces npm audit --audit-level=high, gitleaks for credential scanning (with # gitleaks:allow for documented examples), and Semgrep p/ci for static analysis.
Dependabot updates skip Vercel deployments via vercel.json’s ignoreCommand to keep the deploy queue clean.

Compliance mapping

ContextOS aligns with widely-adopted control frameworks. The mapping is intentionally control-to-primitive, not control-to-document.

Framework	ContextOS primitive
NIST AI RMF Govern / Map / Measure / Manage	Trust-plane bundles / Decision Catalog / evaluators / audit + replay
ISO/IEC 42001 AI management system	policy lifecycle + evaluator runs + improvement-proposal change control
ISO/IEC 27001 / SOC 2	tenant isolation, identity propagation, hash-chained audit, KMS key rotation, vendor (adapter) management
EU AI Act	see Governance regulatory timeline; transparency obligations encoded as runtime-enforced rules
GDPR / India DPDPA	consent ledger on memory promotion; subject erasure cascades; data classification + residency on every artifact
OWASP LLM Top 10	see the mapping table
MITRE ATLAS	adversarial suites tracked per intent; Attack Success Rate metric

Vulnerability disclosure

Security issues in this repository should be reported per SECURITY.md: email piyush@piyush.me with a description, impact, reproduction steps, and any PoC. Public disclosure is requested only after coordinated investigation. Only the latest release is supported for security updates.

Auditability

Every Decision Record carries enough to reconstruct the run:

trace_id, run_id, session_id, tenant_id, pinned pack version, snapshot version.
agent_identity.subject, agent_identity.claim_hash, principal_chain, and identity key ID.
policy_decisions[] with policy_decision_id and matched rule_ids[].
tool_transcripts[] with policy_decision_id, evidence_refs, and audit metadata.
approvals[] with approver identity, frozen evidence snapshot hash, effective approval mode.
controls_active[] (must_refuse, must_escalate, approval_gates_active, redaction_rules_active).
lineage — pack version, model id + hash, snapshot id, policy bundle versions.

Audit envelopes are append-only, hash-chained per trace_id, and signed by the runtime signing key. Tampering is detectable on replay because the chain breaks; replay against a tampered chain returns replay_status: tamper_detected rather than a recomputed verdict.

This is the substrate for replay.

Incident response and replay

Replay is the primary IR primitive. Given a trace_id:

Resolve the pinned pack_version, snapshot, and policy bundle versions from lineage.
Recover the recorded invokeAgent envelope and tool transcripts.
Re-run the canonical loop offline against the recorded transcripts; verify the produced DecisionRecord byte-matches the persisted one.
Re-score against the current evaluators; the delta isolates whether behavior has drifted in the runtime, the data, or the evaluator itself.
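
The byte-match in the third step depends on a stable serialization. One way to sketch it, assuming Decision Records are JSON objects compared in a canonical form (sorted keys, compact separators), is shown below; the canonicalization choice and the `match` / `mismatch` labels are assumptions, not the harness's documented behavior.

```python
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize deterministically so that key order and whitespace cannot cause false mismatches."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def compare_replay(persisted: dict, replayed: dict) -> str:
    """Compare the persisted DecisionRecord with the one produced by the offline re-run."""
    return "match" if canonical_bytes(persisted) == canonical_bytes(replayed) else "mismatch"


persisted = {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "verdict": "approve"}
same_content = {"verdict": "approve", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}
drifted = {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "verdict": "deny"}
```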

Operational commitments:

Quarterly end-to-end replay drill against a chosen production trace_id.
Tail-based sampling retains every run that crossed destructive, that hit a loop guard, or that failed scorecard thresholds — these are the IR-relevant runs by construction.
Time-to-replay is an operational metric; an IR contract that cannot replay within hours is treated as a regression.
Rotation-aware verification. Replays against historical signing keys are accepted; rotation does not invalidate prior audit.

Interfaces

Inputs

Policy bundles (versioned, signed)
Identity assertions (delegated user, agent workload, optional attestation report)
Adapter capability declarations (with approval_mode, capability_class, arg_constraints, endpoint_in, rotation contract)
Sandbox profiles (signed)
Run Context (tenant_id, role, claims, budgets)

Outputs

Allow / deny verdicts with reasons and remediation hints
Approval-gate prompts with frozen evidence
Audit records bound to trace_id and signed
Security events (cross-tenant denials, credential rotations, sandbox violations, attestation failures, A2A cycle detections)
Replay datasets

Failure modes

Policy drift across environments.
Stale credentials at the Tool Gateway after a rotation.
Revoked agent registration still discoverable by a stale A2A card or adapter cache.
Child agent claim broader than parent claim or manifest ceiling.
Approval gates bypassed by a Planner that skips a checkpoint (mitigated by Critic re-verify).
Memory write that violates classification (mitigated by consent + classification check at candidate stage).
Audit gap when a custom adapter forgets to propagate W3C trace headers (caught by trace-coverage assertion).
Cached read-only alias hiding a permission change after a policy update.
Indirect prompt injection from a recently-promoted memory entry that bypassed candidate-stage classification.
Egress allow-list expressed as a tag/CNAME that resolves outside the intended origin (mitigated by DNS pinning).
A2A receiver that accepts inbound delegation without re-attesting workload identity.
Hash-chain truncation from a partial outage masquerading as a legitimate gap.

Operational concerns

Policy version pinning per environment; promotion is a deliberate step.
Secrets rotation cadence at the Tool Gateway, signed and registry-recorded.
Workload identity rotation cadence; revoked-keys window for replay.
Agent manifest lifecycle review and revoked-agent discovery sweeps.
Trusted-previous-key deprecation schedule after signing-key rotation.
Sampling stratified by risk tier; tail-based sampling forced for destructive runs.
Trace retention bands by data classification.
Quarterly IR drill that exercises replay end-to-end against a real trace_id.
DNS / TLS pinning maintenance for endpoint_in[] capabilities.
SBOM and adapter-attestation review on every adapter version bump.
Sandbox profile re-verification on kernel upgrade.

Evaluation metrics

Policy compliance rate (target: 100% on guardrails).
Approval-gate honored rate (target: 100%).
Cross-tenant denial rate (target: zero in steady state; any non-zero rate pages on-call).
Audit coverage (fraction of runs with full trace + manifests + decision record).
Replay determinism rate (DecisionRecord byte-match against pinned snapshot).
Time to replay for a given trace_id.
Mean time to detect / respond on Trust-plane events.
Attack Success Rate on the per-intent adversarial suite (target: zero successful attacks on gated release suites; see the trusted-cloud reference).
Redaction failure rate (target: 0% on CONFIDENTIAL / RESTRICTED classes).

Example

A condensed Trust-plane receipt on a ToolResultEnvelope:

{
  "tool_call_id": "tc_118",
  "run_id": "req_9f3a12",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "capability_id": "adp_payments.issue_refund",
  "tenant_id": "tenant_acme_prod",
  "status": "completed",
  "citations": ["policy:POLICY_RETURNS_V1#R_HIGH_VALUE_REQUIRES_APPROVAL"],
  "mutations": [{ "mutation_ref": "tool:adp_payments.issue_refund:tc_118" }],
  "policy_decision_id": "pol_9901",
  "metadata": {
    "approval_mode_effective": "destructive",
    "approver": "user_finance_lead_77",
    "approval_evidence_snapshot_hash": "sha256:b2a1...",
    "tool_transcript_id": "tool_tx_118",
    "redaction_applied": false,
    "chain_prev_hash": "sha256:7c4a...",
    "signed_by": "kid_runtime_2026Q2"
  },
  "latency_ms": 242
}

Common misconceptions

Security is not a single perimeter. It is enforced at every plane boundary.
The model is not the security boundary. Policy outside agent code is.
Audit is not logging. Audit is structured, signed, hash-chained, replayable, and tied to a DecisionRecord.
Sandbox is not a backup plan. It is the default for any capability that runs untrusted code.
Prompt injection is not solved by better prompts. The defense is moving authority out of the prompt — tool surfacing, deterministic policy, arg constraints.
A2A is not a way to launder authority. Each hop re-attests, re-evaluates, and re-records.
Compliance is a byproduct. The runtime produces audit and replay material as part of normal operation; compliance reports are derived, not authored.