Security and Compliance
Trust-plane controls — policy outside agent code, identity propagation, approval-mode tiers, sandboxing, attestation, and OTEL-bound audit.
Security in ContextOS is a runtime primitive, not a perimeter add-on. It is implemented by the Trust plane, which sits over every other plane and decides what is allowed before any side effect can occur.
Definition
A coordinated set of: deterministic policy enforcement at every plane boundary; user-delegated and agent-workload identity propagation through every Run Context; the approval-mode tier taxonomy bound to every adapter capability; cryptographic provenance (signed artifacts, hash-chained audit, optional attestation); OTEL-tied audit; and replayable evidence. No control depends on model self-policing.
Why it exists
Agents act on production systems. Prompts leak, tools misbehave, models drift, third parties go rogue. Security has to be enforced where actions actually happen — at the Tool Gateway, at the policy boundary, at the memory promotion gate, at the sandbox kernel — and recorded in a way that survives audit and incident response.
How it works
- Compile-time controls: the Context Pack Compiler emits runtime_controls (must_refuse, must_escalate, approval_gates_active, redaction_rules_active) baked into the prompt manifests.
- Plan-time controls: the Critic verifies plans against tool allow-lists, evidence requirements, and approval-mode declarations.
- Execute-time controls: the Tool Gateway brokers identity, validates schemas, enforces approval modes, and emits trace + audit envelopes.
- Memory-write controls: the memory promotion pipeline gates writes by class, contradiction check, and consent.
- Audit: every applied policy decision, gate verdict, tool transcript, and memory promotion is recorded against the trace_id of the Run Context and is hash-chained for tamper-evidence.
Threat model
ContextOS treats the agent system as multi-principal under partial trust, following the framing in Trusted AI Agents in the Cloud:
- The agent provider ships agent code and policies.
- The model provider ships base and tuned models.
- The tool ecosystem (MCP servers, OpenAPI services, A2A peers) defines capabilities.
- The cloud / runtime hosts execution.
- The enterprise / user delegates authority and demands proof.
No single principal is fully trusted. Compromise of the model, of any single tool, or of any single tenant must not compromise the whole system. The architecture’s job is to make every action mediated, evidenced, and attributable.
Attack classes the architecture defends against
| Class | Concrete examples | Primary defense |
|---|---|---|
| Prompt / context injection | indirect injection from KG documents, tool outputs, memory recall | redaction at compile; deterministic policy at execute; tool surface narrowed to capability allow-list |
| Excessive agency | model invokes tools outside intent | capability_class + intent shape verified by the Critic; act denied for read_only intents |
| Credential exfiltration | tool argument carries a secret to a third-party domain | egress allow-list; STS-style per-call credential exchange; redaction of secrets in compiled context |
| Tool argument tampering | model emits an out-of-bounds amount, recipient, or scope | arg_constraints on the capability registration; Critic re-verify; param-level policy rules |
| Cross-tenant traversal | KG read or memory recall reaches another tenant’s data | storage-level tenant_id scoping; cross-tenant denials emitted as security events |
| Repeated / amplified calls | model loops a destructive action | per-(tenant, capability) rate budgets; idempotency keys; numCalls policy constraint |
| Memory poisoning | adversarial input promoted to long-term memory | promotion-aware memory pipeline with consent + classification + contradiction checks |
| Supply chain | unsigned tool, mutable image, unpinned model | Context Pack and adapter signing; content-hash image pinning; pinned model + pack version per environment |
| Sandbox escape | code execution capability breaks isolation | default-deny sandbox profile; signed images; per-call attestation for e2b / firecracker |
| Audit tampering | post-hoc edit of decision history | append-only, hash-chained Decision Records bound to W3C trace_id |
OWASP LLM Top 10 mapping
| ID | Risk | ContextOS control |
|---|---|---|
| LLM01 | Prompt injection | redaction-at-compile + tool allow-list + Critic re-verify (see Prompt injection defense) |
| LLM02 | Insecure output handling | typed tool envelopes; outputs flow through promotion pipeline before reaching memory or context |
| LLM03 | Training data poisoning | spec scope: not in runtime threat model; managed by upstream model provider controls |
| LLM04 | Model DoS | RunBudget caps (total_tokens, wall_clock_ms, max_tool_calls); rate budgets per (tenant, capability) |
| LLM05 | Supply chain | signed Context Packs and adapters; pinned model + pack versions; image content hashes for sandbox kernels |
| LLM06 | Sensitive info disclosure | redaction rules in runtime_controls; data classification on every artifact |
| LLM07 | Insecure plugin / tool design | typed capability schemas; arg_constraints; capability_class enforcement |
| LLM08 | Excessive agency | approval-mode tiers; intent-shape verification; default-deny on actions |
| LLM09 | Overreliance | Decision Record exposes evidence_refs and confidence; evaluators score safety per run |
| LLM10 | Model theft | spec scope: out of scope at runtime; managed at the model-serving boundary |
The architecture also references MITRE ATLAS tactics for AI-system threat modeling; per-intent adversarial suites are tracked via the Attack Success Rate metric described in the trusted-cloud reference.
Defense in depth
Every plane has its own deterministic check. Compromise of one layer must not compromise the whole.
| Layer | Control point | What is checked |
|---|---|---|
| Compile | Context Pack Compiler | resolved tools = Registry ∩ Permissions − Prohibitions; runtime_controls baked into manifests |
| Plan | Critic | tools used ⊆ surfaced; intent shape (read_only excludes act); evidence requirements |
| Execute | Tool Gateway | identity, schema, approval mode, rate budget, idempotency, egress allow-list |
| Sandbox | Sandbox kernel | profile invariants enforced by the kernel, not by the adapter |
| Memory write | Promotion gate | classification, consent, contradiction |
| Audit | OTEL + Decision Record | hash-chained envelope linked by trace_id |
| Replay | Replay harness | DecisionRecord must be reproducible from pinned snapshot |
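The Compile and Plan rows reduce to plain set arithmetic. The following is a minimal Python sketch, assuming tool names are simple strings; the function names and sample tool names are illustrative and do not come from the spec.

```python
def resolve_tools(registry: set[str], permissions: set[str], prohibitions: set[str]) -> set[str]:
    """Compile-time surface: (Registry ∩ Permissions) − Prohibitions."""
    return (registry & permissions) - prohibitions


def plan_is_within_surface(plan_tools: set[str], surfaced: set[str]) -> bool:
    """Plan-time check: every tool the plan uses must have been surfaced."""
    return plan_tools <= surfaced


# Hypothetical inputs for illustration only.
registry = {"kg.search", "payments.issue_refund", "email.send"}
permissions = {"kg.search", "payments.issue_refund", "crm.update"}
prohibitions = {"payments.issue_refund"}

surfaced = resolve_tools(registry, permissions, prohibitions)
```

Here only kg.search survives: email.send is registered but not permitted, crm.update is permitted but not registered, and payments.issue_refund is explicitly prohibited.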
Boundary enforcement model
Every plane boundary is a deterministic check, not a model check:
- Context → Decision: CompiledContext only contains capabilities the Compiler resolved through Registry ∩ Permissions − Prohibitions with approval-mode filters applied.
- Decision → Action: the Critic re-verifies the Plan; the Tool Gateway re-validates each ToolCallEnvelope at execution time.
- Action → External: egress credentials are exchanged per call (STS-style); ingress credentials are verified for both the delegated user and the agent workload identity.
- Memory write: capture is immutable; promotion checks consent, classification, and contradiction.
- External → Audit: every accepted or denied effect emits a signed envelope; the chain breaks if any past entry is altered.
Identity propagation
Two identities flow through every Run Context, and the agent side is split into registry identity, workload identity, and a short-lived signed claim:
- Delegated user identity: run_context.user.delegation carries an OAuth-style token reference, scopes, and subject claims. Used by delegated-mode tools.
- Agent registry identity: run_context.agent.agent_urn is a versioned subject such as agent:contextos/support-refund@1.2.0. Used for ownership, lifecycle, manifest-scoped permissions, and audit.
- Agent workload identity: run_context.agent.workload_identity is a SPIFFE-style URI (spiffe://contextos/agents/<role>). Used for service-to-service auth and for non-delegated calls.
- Agent identity claim: run_context.agent.identity_claim binds the registry subject, workload, tenant, run ID, principal chain, scopes, expiry, and key ID for this run or A2A hop.
Both identities are signed metadata on every ToolCallEnvelope and on every audit record.
Token chaining and minimum privilege
- Delegation tokens are exchanged at the Gateway for per-call scoped credentials (STS-style); long-lived bearer tokens never reach an adapter.
- Scopes on the exchanged credential are the intersection of delegation.scopes, agent.identity_claim.scopes, permission.scopes, and the capability’s declared scope set, never the union.
- A2A hops do not forward the inbound delegation token or blindly reuse the parent claim. The receiving agent re-attests its workload identity, receives a fresh narrower child claim, and re-evaluates its own policy bundle. See A2A trust boundary.
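The intersection rule can be illustrated with a short Python sketch. It assumes scopes are plain strings; the function name and the sample scope values are hypothetical.

```python
def exchanged_scopes(*scope_sets):
    """Intersect every scope set; with no inputs, grant nothing (default deny)."""
    if not scope_sets:
        return frozenset()
    result = set(scope_sets[0])
    for scopes in scope_sets[1:]:
        result &= set(scopes)
    return frozenset(result)


# Hypothetical scope sets for one refund call.
delegation = {"refund:read", "refund:write", "orders:read"}
identity_claim = {"refund:read", "refund:write"}
permission = {"refund:write", "orders:read"}
capability = {"refund:write"}

granted = exchanged_scopes(delegation, identity_claim, permission, capability)
```

Because every input can only narrow the result, adding a broader delegation token never widens what the adapter receives.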
Workload identity rotation and revocation
- Agent workload identities rotate on a fixed cadence; rotation is a signed change recorded against the agent registration.
- A workload identity that fails attestation is denied at the Gateway before any policy is evaluated; this is logged as security_event:workload_identity_attestation_failed.
- Agent signing keys rotate under change control. Historical keys may remain trusted only for replay until a declared deprecation window expires.
- A revoked agent registration is removed from active discovery and denied at claim minting, A2A dispatch, and Tool Gateway execution.
- Claim hashes, key IDs, agent URNs, and principal-chain summaries are logged; raw bearer tokens and raw signing secrets are never logged.
Approval-mode tiers (the policy contract)
The five canonical modes (read_only / local_write / network / delegated / destructive) are defined in Governance. Security implications:
- read_only: no credential exchange beyond ingress; full audit; no approval gate.
- local_write: idempotency required; tenant-scoped credentials; full audit.
- network: egress allow-list; per-(tenant, capability) rate budget.
- delegated: valid user delegation token required; scopes minimum-privilege; per-call evidence captured.
- destructive: named approver; frozen evidence snapshot at gate; post-execution audit + reconciliation.
Policy may select a lower effective approval mode for a specific bounded request when the capability’s declared maximum allows it, but it cannot exceed that maximum.
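The ceiling rule can be sketched in Python as follows. This assumes the five modes form a strict order from least to most privileged in the sequence they are listed; that ordering is an assumption made for illustration, and the function name is hypothetical.

```python
# Assumption: modes are ordered from least to most privileged as listed.
MODE_ORDER = ("read_only", "local_write", "network", "delegated", "destructive")


def select_effective_mode(declared_max: str, requested: str) -> str:
    """Return the requested mode if it does not exceed the declared maximum."""
    if requested not in MODE_ORDER or declared_max not in MODE_ORDER:
        raise ValueError("unknown approval mode")  # unrecognized input is denied
    if MODE_ORDER.index(requested) > MODE_ORDER.index(declared_max):
        raise PermissionError(f"{requested} exceeds declared maximum {declared_max}")
    return requested
```

A capability declared as destructive may run a bounded request as delegated, but a capability declared as network can never be raised to destructive.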
Prompt injection defense
Prompts in ContextOS are treated as untrusted by default. The defense is structural, not heuristic:
- Compile-time isolation. The Context Pack Compiler emits typed buckets (policy / tool / evidence / memory / business / session). Untrusted strings (KG documents, tool outputs, memory recall) flow through evidence and memory buckets that the Critic understands as advisory, not authoritative.
- Tool surface narrowing. The model only sees the schemas of tools the Compiler resolved through Registry ∩ Permissions − Prohibitions. An injection that names a tool not in the surfaced set has no effect: the tool simply does not exist in the model’s view.
- Deterministic policy outside the model. Every ToolCallEnvelope re-evaluates policy at the Gateway. Even if the model is fully manipulated by an injected instruction, the call is denied unless deterministic policy permits it.
- Argument constraints, not just tool gates. arg_constraints (regex, enum, min/max, idempotency key required) bound the values the model can submit. An injection that says “send to attacker@…” against a capability with to_in: ["@…"] returns denied.
- Redaction rules. runtime_controls.redaction_rules_active strip declared sensitive substrings before any prompt is sent to the model and on every ToolResultEnvelope ingestion.
- Critic re-verify before execute. The plan’s tools, arg shapes, and intent class are re-checked deterministically.
- Default deny on every plane boundary. An unrecognized capability, scope, or destination denies; it does not fall through.
Indirect injection from third-party content is the dominant LLM-era attack class. The defense is not “smarter prompts” — it is moving authority out of the prompt entirely.
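To make the argument-constraint step concrete, here is a small Python sketch of a deterministic checker. The constraint dictionary shape, the function name, and the sample capability are assumptions for illustration; the spec names only the constraint kinds (regex, enum, min/max).

```python
import re


def violations(args: dict, arg_constraints: dict) -> list[str]:
    """Return every constraint violation; an empty list means the call may proceed."""
    found = []
    for name in args:
        if name not in arg_constraints:
            found.append(f"{name}: undeclared argument")  # default deny
    for name, rule in arg_constraints.items():
        if name not in args:
            found.append(f"{name}: missing")
            continue
        value = args[name]
        if "enum" in rule and value not in rule["enum"]:
            found.append(f"{name}: not in enum")
        if "regex" in rule and not re.fullmatch(rule["regex"], str(value)):
            found.append(f"{name}: pattern mismatch")
        if "min" in rule and value < rule["min"]:
            found.append(f"{name}: below min")
        if "max" in rule and value > rule["max"]:
            found.append(f"{name}: above max")
    return found


# Hypothetical constraints for a refund-notification capability.
constraints = {
    "to": {"regex": r"[^@\s]+@example\.com"},
    "amount": {"min": 1, "max": 500},
    "currency": {"enum": ["USD", "EUR"]},
}
```

An injected recipient outside the declared pattern fails the check regardless of how the model was persuaded to emit it.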
Tenant isolation
- Every read in the Knowledge Graph and every recall in Memory is scoped by tenant_id at the storage layer, not by application code.
- The Tool Gateway rejects any ToolCallEnvelope whose argument-derived tenant scope does not match run_context.tenant_id.
- Cross-tenant traversal denials emit a security_event:cross_tenant_denied with the offending capability, evidence_refs, and the offending value; on-call is paged on a non-zero rate.
- Adapter credentials are scoped per tenant; the Tool Gateway exchanges them per call rather than holding long-lived bearer tokens.
- Caches (including the cached read-only aliases) key on tenant_id first; a cache hit across tenants is structurally impossible.
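A tenant-first cache key can be sketched in a few lines of Python. The class and method names are hypothetical; the point is only that the tenant is part of the key, so a lookup from another tenant can never return the entry.

```python
class TenantScopedCache:
    """In-memory cache whose keys always start with the tenant id."""

    def __init__(self):
        self._entries = {}

    def put(self, tenant_id: str, key: str, value):
        self._entries[(tenant_id, key)] = value

    def get(self, tenant_id: str, key: str):
        # A different tenant_id produces a different composite key,
        # so entries are never shared across tenants.
        return self._entries.get((tenant_id, key))


cache = TenantScopedCache()
cache.put("tenant_acme_prod", "kg:order:42", {"status": "shipped"})
```

Isolation here comes from the key structure rather than from a permission check that application code could forget to call.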
Cryptographic primitives
The Trust plane treats provenance as a typed contract, not a deployment detail.
| Artifact | Signed by | Verified at |
|---|---|---|
| Context Pack | pack owner role | Compiler at load; runtime refuses unsigned or unpinned refs |
| Adapter / capability registration | adapter owner | Gateway at registration; periodic re-verification |
| Policy bundle | governance role | Policy Engine at load; bundle.signed_by recorded with every policy_decision_id |
| Sandbox profile | security_lead | Sandbox kernel before container start |
| Decision Record | runtime signing key | replay harness; auditor on export |
| Tool transcript | runtime signing key | included in audit.tool_transcript_id |
Hash-chained audit
Decision Records, tool transcripts, and policy_decisions[] form an append-only, hash-chained log keyed by trace_id. Each entry’s hash includes the previous entry’s hash; tampering is detectable on replay. Hash-chain construction follows the pattern in the trusted-cloud reference.
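The chaining idea can be sketched with the Python standard library. This is a simplified illustration, not the construction from the trusted-cloud reference: the entry layout, the genesis value, and the function names are assumptions, and signing is omitted.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload},
                      sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev_hash": prev, "payload": payload,
                  "hash": entry_hash(prev, payload)})


def verify(chain: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Editing any past payload changes its recomputed hash, so verification fails at that entry without needing to compare against a separate copy of the log.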
Key management
- Signing keys are stored in a KMS; the runtime never holds a raw private key.
- Rotation is a signed change recorded against the registry; verifiers accept any non-revoked key within the rotation window.
- Revoked keys remain queryable for replay of historical runs against the keys valid at that time.
Secrets and credential exchange
The Tool Gateway is the only component that holds long-lived credentials, and only as references.
- Per-call exchange. Every ToolCallEnvelope triggers an STS-style exchange producing a credential scoped to (tenant, capability, run_context.user.subject, scopes intersect permission.scopes, max-lifetime <= wall-clock budget).
- Vault references, not values. auth.delegated_user_token_ref and auth.agent_token_ref are opaque vault handles. Adapters never see raw tokens.
- No secret transit through the model. Secrets that must reach a tool (e.g., per-tenant API keys) bypass the Compiler; they are resolved at the Gateway from the (tenant, capability) mapping.
- Permitted env vars (sandbox) are an explicit allow-list. The closed list is defined per profile and signed.
- Rotation. Rotation cadence per credential class is declared at the adapter; the registry refuses adapters without a declared rotation contract.
Egress controls
network, delegated, and destructive capabilities can produce outbound traffic. The Gateway constrains every dimension:
- Destination allow-list. Each capability declares endpoint_in[]. Calls to any other host are denied at the Gateway, not at the network.
- DNS pinning. Hostnames in endpoint_in resolve through the runtime’s DNS layer, which refuses CNAMEs to non-allow-listed origins.
- TLS pinning (optional) per capability via pinned_sha256[] of acceptable certificate fingerprints; rotation is a signed registry change.
- Rate budget. Every (tenant, capability) declares a per-window rate; the Gateway returns denied with error_code: rate_exceeded rather than queueing.
- Egress audit. Every outbound request emits a span carrying trace_id, policy_decision_id, capability id, and destination, sufficient for traffic-side reconciliation against the Decision Record.
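Two of these checks, the destination allow-list and the rate budget, can be sketched in Python. The sketch assumes endpoint_in holds exact hostnames, requires https (an added assumption), and uses a fixed-window counter; the class name, the "allowed" status, and the injected clock are all hypothetical.

```python
from urllib.parse import urlsplit


def egress_allowed(url: str, endpoint_in: list[str]) -> bool:
    """Allow only https URLs whose exact hostname is declared in endpoint_in."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    return parts.hostname in {host.lower() for host in endpoint_in}


class RateBudget:
    """Fixed-window budget per (tenant, capability); denies instead of queueing."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._windows = {}  # (tenant, capability) -> (window_start, count)

    def check(self, tenant: str, capability: str, now: float) -> dict:
        start, count = self._windows.get((tenant, capability), (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # a new window begins
        if count >= self.limit:
            self._windows[(tenant, capability)] = (start, count)
            return {"status": "denied", "error_code": "rate_exceeded"}
        self._windows[(tenant, capability)] = (start, count + 1)
        return {"status": "allowed"}
```

Matching on the parsed hostname rather than on a string prefix means a URL such as https://api.payments.example@evil.test/ is evaluated as evil.test and denied.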
Data classification and redaction
Every artifact crossing a plane boundary carries a classification:
| Class | Examples | Default handling |
|---|---|---|
| PUBLIC | non-tenant docs, marketing copy | retained per default policy |
| INTERNAL | aggregate metrics, generic intents | retained per tenant policy |
| CONFIDENTIAL | customer PII, financial detail | redaction-on-emit; promotion-gated for memory |
| RESTRICTED | secrets, IDV documents, regulated payloads | never enters the prompt; never promoted to memory |
Redaction rules live in runtime_controls.redaction_rules_active and are evaluated at the Compiler (before prompt construction) and at the Gateway (on ToolResultEnvelope ingestion). A redaction failure is a release-blocking Safety regression.
Data residency and retention bands are declared per classification and tenant; the audit trail records the band against every persisted artifact.
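The handling rules for prompt construction can be sketched as a small Python function. The enum mirrors the four classes in the table; the function name, the redaction placeholder, and the sample email pattern are assumptions for illustration.

```python
import re
from enum import Enum


class DataClass(Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"
    RESTRICTED = "RESTRICTED"


def prepare_for_prompt(text: str, data_class: DataClass, redaction_patterns: list[str]):
    """Return prompt-safe text, or None when the artifact must not enter the prompt."""
    if data_class is DataClass.RESTRICTED:
        return None  # RESTRICTED never enters the prompt
    for pattern in redaction_patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text


# Hypothetical redaction rule: a simple email-address pattern.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
```

The same function could be applied to tool results on ingestion, which matches the two evaluation points the section describes.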
Memory-write controls
Memory is a primary security boundary because it is the channel by which untrusted content can persist into future runs.
- Capture is immutable. Raw evidence is stored append-only with classification, source, and trace_id.
- Promotion is explicit. A candidate is promoted only when it passes the consent-record check, the contradiction check against existing high-confidence facts, and the classification check (no RESTRICTED content is promotable).
- Consent ledger. Every promotion records the consent basis: explicit user opt-in, contractual basis, or operator-authored policy. A promotion without a consent reference is rejected.
- Tenant scoping. Memory recall scopes by tenant_id and subject claims; a cross-subject recall is a security_event:memory_cross_subject_denied.
- Decay and erasure. Subject-level erasure cascades to embeddings, evidence references, and any derived strategy rules; a deletion request is itself recorded against trace_id for replay-aware audit.
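The three promotion checks can be sketched as one deterministic gate in Python. The Candidate fields, the reason strings, and the key/value treatment of contradiction are simplifying assumptions; a real contradiction check would likely be richer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    key: str                    # what the fact is about (hypothetical field)
    value: str                  # the asserted fact
    data_class: str             # PUBLIC / INTERNAL / CONFIDENTIAL / RESTRICTED
    consent_ref: Optional[str]  # reference into the consent ledger, if any


def promotion_verdict(candidate: Candidate, high_confidence_facts: dict) -> tuple:
    """Return (promoted, reason); every failed check rejects the candidate."""
    if candidate.data_class == "RESTRICTED":
        return False, "classification"
    if not candidate.consent_ref:
        return False, "missing_consent"
    existing = high_confidence_facts.get(candidate.key)
    if existing is not None and existing != candidate.value:
        return False, "contradiction"
    return True, "promoted"
```

An adversarial input that contradicts an established fact, lacks a consent reference, or carries RESTRICTED content is rejected before it can persist into future runs.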
Sandbox controls
For tool capabilities that execute untrusted code (e.g., user-provided scripts in a research workflow), the runtime exposes a Sandbox layer as a first-class component, not as a tool feature flag.
Sandbox profiles
A sandbox profile is a typed, signed contract declaring what the sandbox can and cannot do.
{
"sandbox_profile_id": "sbx_research_default",
"version": "1.0.0",
"kernel": "docker",
"image_pin": "sha256:...",
"filesystem": { "host_mount": [], "tmpfs_mb": 256 },
"network": { "ingress": "deny", "egress": "deny", "allowlist": [] },
"compute": { "cpu_cores": 1.0, "memory_mb": 512, "wall_clock_ms": 30000 },
"stdio": { "stdin_max_bytes": 65536, "stdout_max_bytes": 1048576, "stderr_max_bytes": 1048576 },
"secrets": { "permitted_env_vars": [] },
"result_classification": "INTERNAL"
}
Sandbox kernels
| Kernel | When |
|---|---|
| docker | local-first deployments; image is pinned by content hash |
| e2b | hosted ephemeral sandbox; per-call attestation |
| firecracker | regulated workloads; microvm isolation |
| wasm | pure-compute thinking helpers; no syscalls |
Invariants
- Default deny everything. Capabilities that need code execution must reference a registered sandbox_profile_id; profiles cannot be inlined.
- No host filesystem by default. Any host_mount requires a separate operator approval recorded as a signed change to the profile.
- No inbound network. Egress only via explicit allowlist.
- Hard wall-clock and memory caps enforced by the kernel, not the adapter.
- Output is typed. stdout/stderr are captured as artifacts with classification; nothing flows into memory or the model context without going through the promotion pipeline.
- Image pinning is mandatory. Tag-based image references (latest, stable) are refused.
- Per-call attestation for e2b / firecracker kernels; attestation hash recorded against the Decision Record.
- No secrets transit. permitted_env_vars is the closed allow-list; the sandbox never sees the broader credential store.
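Several of these invariants can be checked statically against a profile document. The Python sketch below uses the field names from the sample profile; the validator itself, its messages, and the interpretation that a non-"deny" egress value requires a non-empty allowlist are assumptions.

```python
import re

DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def profile_violations(profile: dict) -> list[str]:
    """Return the invariants a sandbox profile breaks; an empty list means it passes."""
    problems = []
    if not DIGEST.fullmatch(profile.get("image_pin", "")):
        problems.append("image_pin must be a sha256 content hash")
    network = profile.get("network", {})
    if network.get("ingress") != "deny":
        problems.append("ingress must be deny")
    if network.get("egress") != "deny" and not network.get("allowlist"):
        problems.append("egress requires an explicit allowlist")
    if profile.get("filesystem", {}).get("host_mount"):
        problems.append("host_mount requires a separately approved signed change")
    return problems
```

Wall-clock and memory caps are deliberately absent from this check because the section assigns their enforcement to the kernel, not to validation code.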
Lifecycle
- Profile authored under change control with required reviewer (security_lead).
- Promoted into the pack registry alongside Context Packs.
- Bound to capabilities via sandbox_profile_id on the adapter declaration.
- Retired profiles remain queryable for replay; cannot be re-promoted without a new version.
A2A trust boundary
A2A (agent-to-agent) hops are the highest-leverage place for trust to leak. The contract:
- No delegation forwarding. The receiving agent does not inherit the caller’s delegation token. It re-attests its own workload identity and re-evaluates its own policy bundle against the original run_context.user.
- Narrow child claims. A parent A2A call may authorize a child only through a fresh signed claim whose scopes are a subset of the parent scopes and the receiver manifest’s identity-scope ceiling.
- Typed message envelopes with explicit correlation_id and parent_decision_id so the receiving Decision Record links to the parent for replay.
- Approval-mode propagation. A delegated or destructive parent call that triggers an A2A hop must be re-acknowledged by the receiving agent; the receiver may select a lower effective mode only when its policy and the capability’s declared maximum allow it.
- Per-hop audit. Every A2A message emits a span with the agent.agent_urn, agent.workload_identity, claim hash, and principal chain of both sender and receiver; the chain is verifiable on replay.
- Loop guard. A2A graph depth and cycle detection are enforced by the Orchestrator; cycles are rejected with error_code: a2a_cycle_detected.
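The narrow-child-claim rule and the loop guard can both be sketched in Python. The function names, the max_depth parameter, and the a2a_depth_exceeded code are hypothetical; only a2a_cycle_detected appears in the contract above, and claim signing is omitted.

```python
def mint_child_scopes(parent_scopes, receiver_ceiling, requested):
    """Grant the requested scopes only if they fit inside both the parent
    claim and the receiver manifest's identity-scope ceiling."""
    allowed = set(parent_scopes) & set(receiver_ceiling)
    if not set(requested) <= allowed:
        raise PermissionError("child claim broader than parent claim or manifest ceiling")
    return frozenset(requested)


def check_hop(principal_chain: list, receiver_urn: str, max_depth: int):
    """Return an error code for a rejected hop, or None when the hop may proceed."""
    if receiver_urn in principal_chain:
        return "a2a_cycle_detected"
    if len(principal_chain) >= max_depth:
        return "a2a_depth_exceeded"  # hypothetical code, not named in the source
    return None
```

Treating the principal chain as the visited set means a cycle is detected from the message itself, without the Orchestrator needing a global view of the call graph.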
Supply chain
The runtime treats every artifact as supply-chain-relevant:
- Context Packs. Pack registry holds pack_id@semver with content hash, signing key id, and an SBOM-style dependency tree (referenced packs, decision catalog versions, prompt fragment refs).
- Adapters. Each adapter declares image hash (for hosted), source repo + commit (for in-repo), and the auth contract. Tag-based references are refused.
- Models. Model id + content hash recorded against every Decision Record’s lineage; mismatch on replay surfaces as a non-determinism event.
- Sandbox images pinned by content hash; tag references like latest / stable are rejected at profile validation.
- Build provenance. Packs and adapters are produced through a CI pipeline that emits provenance attestations (Sigstore-compatible); the registry refuses artifacts whose attestation does not verify against the declared signer.
Attestation
For deployments that require cryptographic proof of “what executed what against which inputs,” ContextOS aligns with the differential-attestation pattern in the Trusted AI Agents in the Cloud reference. The Decision Record can carry an attestation_ref that binds the platform measurement, the artifact manifest (pack, model, policies, tools), the input envelope, the policy decisions, and the result. The replay harness consumes the manifest to reproduce the verdict; an external auditor can verify the record without re-executing tools.
This is optional infrastructure, not a baseline requirement: deployments that do not need attested execution still get hash-chained audit and signed Decision Records.
Boundary controls (this repo)
This repo is the public spec surface. Even there, controls are explicit:
- Centralized security headers in next.config.ts: Content-Security-Policy (default-src 'self', script-src includes 'wasm-unsafe-eval' for Pagefind WebAssembly search without enabling production 'unsafe-eval', frame-ancestors 'none', object-src 'none', upgrade-insecure-requests, block-all-mixed-content); Strict-Transport-Security (max-age=63072000; includeSubDomains; preload); X-Frame-Options: DENY; Referrer-Policy: strict-origin-when-cross-origin; Permissions-Policy: camera=(), microphone=(), geolocation=(); X-Content-Type-Options: nosniff; Cross-Origin-Opener-Policy: same-origin; Cross-Origin-Resource-Policy: same-origin for pages with cross-origin on static and generated social preview image routes; X-Permitted-Cross-Domain-Policies: none.
- Optional Basic Auth at the edge via src/proxy.ts for private review deployments (gated by BASIC_AUTH_ENABLED, bypassed for local development, loopback hosts, and canonical public domains).
- CI enforces npm audit --audit-level=high, gitleaks for credential scanning (with # gitleaks:allow for documented examples), and Semgrep p/ci for static analysis.
- Dependabot updates skip Vercel deployments via vercel.json’s ignoreCommand to keep the deploy queue clean.
Compliance mapping
ContextOS aligns with widely-adopted control frameworks. The mapping is intentionally control-to-primitive, not control-to-document.
| Framework | ContextOS primitive |
|---|---|
| NIST AI RMF Govern / Map / Measure / Manage | Trust-plane bundles / Decision Catalog / evaluators / audit + replay |
| ISO/IEC 42001 AI management system | policy lifecycle + evaluator runs + improvement-proposal change control |
| ISO/IEC 27001 / SOC 2 | tenant isolation, identity propagation, hash-chained audit, KMS key rotation, vendor (adapter) management |
| EU AI Act | see Governance regulatory timeline; transparency obligations encoded as runtime-enforced rules |
| GDPR / India DPDPA | consent ledger on memory promotion; subject erasure cascades; data classification + residency on every artifact |
| OWASP LLM Top 10 | see the mapping table |
| MITRE ATLAS | adversarial suites tracked per intent; Attack Success Rate metric |
Vulnerability disclosure
Security issues in this repository should be reported per SECURITY.md: email piyush@piyush.me with a description, impact, reproduction steps, and any PoC. Public disclosure is requested only after coordinated investigation. Only the latest release is supported for security updates.
Auditability
Every Decision Record carries enough to reconstruct the run:
- trace_id, run_id, session_id, tenant_id, pinned pack version, snapshot version.
- agent_identity.subject, agent_identity.claim_hash, principal_chain, and identity key ID.
- policy_decisions[] with policy_decision_id and matched rule_ids[].
- tool_transcripts[] with policy_decision_id, evidence_refs, and audit metadata.
- approvals[] with approver identity, frozen evidence snapshot hash, effective approval mode.
- controls_active[] (must_refuse, must_escalate, approval_gates_active, redaction_rules_active).
- lineage: pack version, model id + hash, snapshot id, policy bundle versions.
Audit envelopes are append-only, hash-chained per trace_id, and signed by the runtime signing key. Tampering is detectable on replay because the chain breaks; replay against a tampered chain returns replay_status: tamper_detected rather than a recomputed verdict.
This is the substrate for replay.
Incident response and replay
Replay is the primary IR primitive. Given a trace_id:
- Resolve the pinned pack_version, snapshot, and policy bundle versions from lineage.
- Recover the recorded invokeAgent envelope and tool transcripts.
- Re-run the canonical loop offline against the recorded transcripts; verify the produced DecisionRecord byte-matches the persisted one.
- Re-score against the current evaluators; the delta isolates whether behavior has drifted in the runtime, the data, or the evaluator itself.
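The byte-match in the third step only works if both records are serialized the same way. The Python sketch below assumes a canonical JSON form (sorted keys, compact separators, UTF-8); the source does not specify the canonicalization, and the "match" / "mismatch" labels are hypothetical.

```python
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialize a record deterministically so equal content yields equal bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def replay_status(persisted: dict, replayed: dict) -> str:
    """Compare the persisted and replayed Decision Records byte for byte."""
    if canonical_bytes(persisted) == canonical_bytes(replayed):
        return "match"
    return "mismatch"
```

Without canonicalization, a harmless difference in key order would register as non-determinism and mask real drift.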
Operational commitments:
- Quarterly drill end-to-end against a chosen production trace_id.
- Tail-based sampling retains every run that crossed destructive, that hit a loop guard, or that failed scorecard thresholds; these are the IR-relevant runs by construction.
- Time-to-replay is an operational metric; an IR contract that cannot replay within hours is treated as a regression.
- Rotation-aware verification. Replays against historical signing keys are accepted; rotation does not invalidate prior audit.
Interfaces
Inputs
- Policy bundles (versioned, signed)
- Identity assertions (delegated user, agent workload, optional attestation report)
- Adapter capability declarations (with approval_mode, capability_class, arg_constraints, endpoint_in, rotation contract)
- Sandbox profiles (signed)
- Run Context (tenant_id, role, claims, budgets)
Outputs
- Allow / deny verdicts with reasons and remediation hints
- Approval-gate prompts with frozen evidence
- Audit records bound to trace_id and signed
- Security events (cross-tenant denials, credential rotations, sandbox violations, attestation failures, A2A cycle detections)
- Replay datasets
Failure modes
- Policy drift across environments.
- Stale credentials at the Tool Gateway after a rotation.
- Revoked agent registration still discoverable by a stale A2A card or adapter cache.
- Child agent claim broader than parent claim or manifest ceiling.
- Approval gates bypassed by a Planner that skips a checkpoint (mitigated by Critic re-verify).
- Memory write that violates classification (mitigated by consent + classification check at candidate stage).
- Audit gap when a custom adapter forgets to propagate W3C trace headers (caught by trace-coverage assertion).
- Cached read-only alias hiding a permission change after a policy update.
- Indirect prompt injection from a recently-promoted memory entry that bypassed candidate-stage classification.
- Egress allow-list expressed as a tag/CNAME that resolves outside the intended origin (mitigated by DNS pinning).
- A2A receiver that accepts inbound delegation without re-attesting workload identity.
- Hash-chain truncation from a partial outage masquerading as a legitimate gap.
Operational concerns
- Policy version pinning per environment; promotion is a deliberate step.
- Secrets rotation cadence at the Tool Gateway, signed and registry-recorded.
- Workload identity rotation cadence; revoked-keys window for replay.
- Agent manifest lifecycle review and revoked-agent discovery sweeps.
- Trusted-previous-key deprecation schedule after signing-key rotation.
- Sampling stratified by risk tier; tail-based sampling forced for destructive runs.
- Trace retention bands by data classification.
- Quarterly IR drill that exercises replay end-to-end against a real trace_id.
- DNS / TLS pinning maintenance for endpoint_in[] capabilities.
- SBOM and adapter-attestation review on every adapter version bump.
- Sandbox profile re-verification on kernel upgrade.
Evaluation metrics
- Policy compliance rate (target: 100% on guardrails).
- Approval-gate honored rate (target: 100%).
- Cross-tenant denial rate (target: zero in steady state; non-zero pages on-call).
- Audit coverage (fraction of runs with full trace + manifests + decision record).
- Replay determinism rate (DecisionRecord byte-match against pinned snapshot).
- Time to replay for a given trace_id.
- Mean time to detect / respond on Trust-plane events.
- Attack Success Rate on the per-intent adversarial suite (target: zero successful attacks on gated release suites; see the trusted-cloud reference).
- Redaction failure rate (target: 0% on CONFIDENTIAL / RESTRICTED classes).
Example
A condensed Trust-plane receipt on a ToolResultEnvelope:
{
"tool_call_id": "tc_118",
"run_id": "req_9f3a12",
"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
"capability_id": "adp_payments.issue_refund",
"tenant_id": "tenant_acme_prod",
"status": "completed",
"citations": ["policy:POLICY_RETURNS_V1#R_HIGH_VALUE_REQUIRES_APPROVAL"],
"mutations": [{ "mutation_ref": "tool:adp_payments.issue_refund:tc_118" }],
"policy_decision_id": "pol_9901",
"metadata": {
"approval_mode_effective": "destructive",
"approver": "user_finance_lead_77",
"approval_evidence_snapshot_hash": "sha256:b2a1...",
"tool_transcript_id": "tool_tx_118",
"redaction_applied": false,
"chain_prev_hash": "sha256:7c4a...",
"signed_by": "kid_runtime_2026Q2"
},
"latency_ms": 242
}
Common misconceptions
- Security is not a single perimeter. It is enforced at every plane boundary.
- The model is not the security boundary. Policy outside agent code is.
- Audit is not logging. Audit is structured, signed, hash-chained, replayable, and tied to a DecisionRecord.
- Sandbox is not a backup plan. It is the default for any capability that runs untrusted code.
- Prompt injection is not solved by better prompts. The defense is moving authority out of the prompt — tool surfacing, deterministic policy, arg constraints.
- A2A is not a way to launder authority. Each hop re-attests, re-evaluates, and re-records.
- Compliance is a byproduct. The runtime produces audit and replay material as part of normal operation; compliance reports are derived, not authored.