Governance

Policy outside agent code, multidimensional action risk, compatibility approval modes, and the audit contract.

Foundational Spec · Last reviewed: 2026-07-11

At a glance

Trust plane · read_only · local_write · network · delegated · destructive · Control over the other four

Policy outside agent code — deterministic enforcement at compile, plan, and execute, bound to approval-mode tiers.

Inputs

Versioned policy bundles (JsonLogic rules)
Guardrails: must_refuse / must_escalate / redaction_rules
Run Context + claims (user, agent, tenant, role)
Tool capability declarations bound to an approval mode

Outputs

Allow/deny decisions with structured obligations
Approval-gate verdicts with frozen evidence snapshot
Audit records bound to the DecisionRecord
Effective approval mode + approver identity per execution

Lifecycle

intercept
evaluate
decide
gate
audit

Canonical types

PolicyBundle
ApprovalGate
ApprovalMode
AuditRecord

Governance is the Trust-plane capability that decides what an agent can do, when, on whose authority, and what evidence the action must produce. It is enforced at a deterministic boundary — never delegated to model self-policing.

Definition

A coordinated set of: policy bundles (versioned JsonLogic rules scoped by intent and risk), guardrails (must_refuse, must_escalate, redaction_rules), native action-risk vectors, compatibility approval modes, approval gates (human checkpoints for high-risk steps), and an audit contract that ties every applied rule and gate decision back to a Decision Record.

Why it exists

Enterprise systems require enforceable controls. Models drift, prompts leak, tools misbehave. Governance makes the runtime safe under change by moving the decision boundary outside the agent — no model can talk its way around a deterministic policy check, an approval-mode tier, or an evidence requirement.

How it works

Policy bundles — versioned rules scoped by intent, risk, and channel.
Guardrails — must-refuse and must-escalate rules that override all other logic.
Action-risk vectors — new capabilities declare effect, authority, reversibility, interaction, and data scope independently.
Approval-mode compatibility — existing v1 capabilities retain a coarse mode while they migrate to the risk vector.
Approval gates — human authorization for high-risk steps with frozen evidence snapshot.
Enforcement — policies evaluated at compile, plan, and execute stages.
Audit — every applied rule, gate decision, and policy outcome is recorded against the run.

Action-risk vector (2026)

ApprovalMode is retained for v1 interoperability, but it is not a complete risk model and must not be treated as a universal total order. It combines unrelated properties: network describes an effect boundary, delegated describes authority, and destructive describes reversibility. New policy evaluates the native ActionRisk dimensions:

Dimension	Values	Question answered
`effect`	`none`, `local_state`, `external_state`, `physical_world`	Where can state change?
`authority`	`agent`, `service`, `user_delegated`, `human_approved`	Whose authority permits the action?
`reversibility`	`read_only`, `reversible`, `compensatable`, `irreversible`	Can the effect be undone?
`interaction`	`api`, `browser`, `computer`, `agent_to_agent`, `human_handoff`	Which execution boundary is used?
`data_scope`	`PUBLIC`, `INTERNAL`, `CONFIDENTIAL`, `RESTRICTED`	What is the most sensitive data exposed?

Each dimension is a policy input, not a score. A browser action does not become safe because it is reversible; a read does not become permitted merely because it has no write effect. The policy engine applies conjunctive constraints, then emits an effective vector and the legacy mode projection when a v1 consumer requires one.

Example: sending a calendar invitation through browser automation is { effect: external_state, authority: user_delegated, reversibility: compensatable, interaction: browser }. Deleting a local credential is { effect: local_state, authority: human_approved, reversibility: irreversible, interaction: computer }. The old tier ladder cannot represent that distinction safely.
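The conjunctive evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the runtime's implementation: the `ActionRisk` dataclass layout, the `violations` helper, the sample constraint, and the `data_scope` values given to the two example actions are all assumptions made for the sketch. Only the dimension names and their allowed values come from the table.

```python
from dataclasses import dataclass

# Allowed values per dimension, copied from the table above.
DIMENSIONS = {
    "effect": {"none", "local_state", "external_state", "physical_world"},
    "authority": {"agent", "service", "user_delegated", "human_approved"},
    "reversibility": {"read_only", "reversible", "compensatable", "irreversible"},
    "interaction": {"api", "browser", "computer", "agent_to_agent", "human_handoff"},
    "data_scope": {"PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"},
}


@dataclass(frozen=True)
class ActionRisk:
    effect: str
    authority: str
    reversibility: str
    interaction: str
    data_scope: str

    def __post_init__(self):
        # Reject values outside the taxonomy instead of guessing a default.
        for name, allowed in DIMENSIONS.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not in the taxonomy")


def violations(risk: ActionRisk, allowed: dict) -> list:
    """Return every constrained dimension that fails.

    The check is conjunctive: an action passes only if the list is empty.
    No dimension can compensate for another, so there is no score.
    """
    return [dim for dim, ok in allowed.items() if getattr(risk, dim) not in ok]


# The two examples from the text; the data_scope values are assumed.
invite = ActionRisk("external_state", "user_delegated", "compensatable", "browser", "INTERNAL")
delete_credential = ActionRisk("local_state", "human_approved", "irreversible", "computer", "RESTRICTED")

# Hypothetical constraint for an unattended workflow.
constraint = {
    "authority": {"user_delegated", "human_approved"},
    "reversibility": {"read_only", "reversible", "compensatable"},
    "data_scope": {"PUBLIC", "INTERNAL"},
}

print(violations(invite, constraint))             # []
print(violations(delete_credential, constraint))  # ['reversibility', 'data_scope']
```

Returning the failing dimensions rather than a boolean keeps the denial explainable: the credential deletion fails on two independent grounds even though its authority is acceptable.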

Approval-mode tiers

Approval gates work better when bound to a shared vocabulary rather than to ad-hoc gate names per workflow. The five compatibility modes are retained for v1 contracts:

Mode	Examples	Default policy
`read_only`	lookups, search, retrieval	allow with audit
`local_write`	tenant-scoped writes that can be reverted in-tenant (notes, drafts, memory)	allow with idempotency key + audit
`network`	outbound calls, webhooks, third-party reads	allow with egress policy + rate budget
`delegated`	acts on behalf of a user against an external system (booking, message send, calendar write)	require valid user delegation token + per-call evidence
`destructive`	irreversible side effects (payment capture, account deletion, data export)	require named approver + frozen evidence snapshot + post-execution audit

Wiring

Tool capability declares approval_mode: destructive on the adapter contract.
Policy can select a lower effective approval mode for a bounded request when the capability’s declared maximum allows it; it cannot exceed the declared maximum or invent a mode outside the taxonomy.
The Decision Catalog records the effective mode and the approver identity per execution, enabling cross-workflow audit by risk class.
New adapters also declare action_risk; during migration, policy validates that the legacy mode is a conservative projection of the vector.
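The "cannot exceed the declared maximum" rule can be illustrated with a small clamp check. The sketch below assumes the five compatibility modes are compared in the order they appear in the table. That ordering is used only for this v1 check and is an assumption of the sketch, not a statement that the modes form a general risk order. The function name `effective_mode` is hypothetical.

```python
# Compatibility ladder, in the order the modes are listed in the table above.
# Assumption: this order is what "lower" and "maximum" refer to for v1 contracts.
MODE_LADDER = ["read_only", "local_write", "network", "delegated", "destructive"]


def effective_mode(declared_max: str, requested: str) -> str:
    """Return the mode policy may apply for one bounded request.

    Policy may select the declared maximum or anything below it. It may not
    exceed the maximum or use a mode outside the taxonomy.
    """
    if declared_max not in MODE_LADDER or requested not in MODE_LADDER:
        raise ValueError("mode is outside the approval-mode taxonomy")
    if MODE_LADDER.index(requested) > MODE_LADDER.index(declared_max):
        raise PermissionError(
            f"requested mode {requested!r} exceeds declared maximum {declared_max!r}"
        )
    return requested


# A capability declared as destructive, narrowed for a bounded request.
print(effective_mode("destructive", "delegated"))  # delegated
```

Raising instead of silently clamping makes a misconfigured rule visible at validation time rather than at execution time.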

Policy outside agent code

Policy decisions never rely on model self-policing.

Boundary enforcement model

Intercept every ToolCallEnvelope before adapter execution.
Evaluate deterministic allow/deny decisions against current Run Context and claims.
Refuse revoked agent registrations, expired or invalid identity claims, tenant mismatches, and child claims broader than the parent or manifest scope ceiling.
Enforce parameter-level constraints (required fields, value limits, regex/pattern checks).
Return structured denial obligations (escalate, request_approval, collect_evidence) to the Orchestrator.
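As a rough illustration of the boundary steps above, the following sketch checks identity and tenant claims, then parameter-level constraints, and returns a structured verdict. The envelope and claim field names (`tenant`, `args`, `revoked`, `expired`), the constraint schema, and the choice of which obligation accompanies which denial are assumptions made for the example. Only the obligation names come from the text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Verdict:
    allow: bool
    reasons: list = field(default_factory=list)
    obligations: list = field(default_factory=list)


def check_args(args: dict, constraints: dict) -> list:
    """Parameter-level checks: required fields, value limits, patterns."""
    problems = []
    for name, rule in constraints.items():
        if name not in args:
            if rule.get("required"):
                problems.append(f"{name}: missing")
            continue
        value = args[name]
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: exceeds max {rule['max']}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
            problems.append(f"{name}: pattern mismatch")
    return problems


def intercept(envelope: dict, claims: dict, constraints: dict) -> Verdict:
    """Runs before adapter execution; the adapter is never called on a deny."""
    if claims.get("revoked") or claims.get("expired"):
        return Verdict(False, ["identity_invalid"], ["escalate"])
    if claims.get("tenant") != envelope.get("tenant"):
        return Verdict(False, ["tenant_mismatch"], ["escalate"])
    problems = check_args(envelope.get("args", {}), constraints)
    if problems:
        # Assumed mapping: constraint failures ask for approval, not escalation.
        return Verdict(False, problems, ["request_approval"])
    return Verdict(True)


constraints = {
    "order_id": {"required": True, "pattern": r"ORD-\d+"},
    "refund_amount": {"required": True, "max": 3000},
}
call = {"tenant": "t1", "args": {"order_id": "ORD-42", "refund_amount": 5000}}
verdict = intercept(call, {"tenant": "t1"}, constraints)
print(verdict.allow, verdict.reasons, verdict.obligations)
# False ['refund_amount: exceeds max 3000'] ['request_approval']
```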

Policy language and authoring

Primary authoring surface: ContextOS policy DSL (JsonLogic-based in current runtime).
Optional accelerator: NL-to-policy compilation for draft rules.
Required safety checks for generated rules:
- permissiveness drift (rule expands allowed surface unexpectedly),
- restrictiveness drift (rule blocks critical safe paths),
- unsatisfiable conditions (no runtime context can satisfy the rule).
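One simple way to approximate these three checks is differential testing: evaluate the current rule and the generated rule against the same set of sample contexts and compare outcomes. The sketch below assumes rules are available as Python predicates and that a representative sample set exists. Both are assumptions, and a rule that no sample satisfies is only a warning sign, not a proof of unsatisfiability.

```python
from typing import Callable, List

Rule = Callable[[dict], bool]


def drift_report(old: Rule, new: Rule, samples: List[dict]) -> dict:
    """Compare two rule versions over the same sample contexts."""
    return {
        # Contexts the new rule allows that the old rule denied.
        "permissiveness_drift": [s for s in samples if new(s) and not old(s)],
        # Contexts the old rule allowed that the new rule now blocks.
        "restrictiveness_drift": [s for s in samples if old(s) and not new(s)],
        # True when no sample satisfies the new rule at all.
        "never_satisfied_in_samples": not any(new(s) for s in samples),
    }


# Hypothetical rules: the generated draft raised the refund limit.
def current_rule(ctx: dict) -> bool:
    return ctx["identity_verified"] and ctx["refund_amount"] <= 200


def generated_rule(ctx: dict) -> bool:
    return ctx["identity_verified"] and ctx["refund_amount"] <= 2000


samples = [
    {"identity_verified": True, "refund_amount": 150},
    {"identity_verified": True, "refund_amount": 900},
    {"identity_verified": False, "refund_amount": 50},
]

report = drift_report(current_rule, generated_rule, samples)
print(len(report["permissiveness_drift"]))   # 1
print(len(report["restrictiveness_drift"]))  # 0
print(report["never_satisfied_in_samples"])  # False
```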

Policy lifecycle

Author — create or update a policy bundle with versioned rules.
Validate — lint for unreachable rules, conflicts, and missing evidence.
Approve — security/governance review for high-risk domains.
Publish — promote to an environment-specific bundle version.
Enforce — apply at compile, plan, and execute stages.
Audit — record applied rules, evidence, and gate outcomes.

Runtime checkpoints

Compile-time — policy selection + guardrail activation.
Plan-time — verify steps and required evidence/approvals.
Execution-time — just-in-time checks with latest context.

Audit expectations

Emit a policy_decision_id and matched rule_ids[] for each boundary verdict.
Persist input claims, normalized arguments, and decision rationale.
Persist agent_identity.subject, agent_identity.claim_hash, principal_chain, kid, and scope summaries for each governed action.
Tie every enforced policy outcome to the Decision Record for replay and compliance evidence.
Approval-gate decisions persist the approver identity, frozen evidence snapshot hash, and the effective approval mode.
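The following sketch shows one possible shape for such a record. The field names `policy_decision_id`, `rule_ids`, `agent_identity.subject`, `agent_identity.claim_hash`, `principal_chain`, and `kid` come from the list above. The remaining keys, the helper names, and the use of SHA-256 over key-sorted JSON as the hash are assumptions of this example.

```python
import hashlib
import json


def canonical_hash(payload: dict) -> str:
    """SHA-256 over key-sorted JSON, so equal payloads always hash equally."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_audit_record(decision_record_id, policy_decision_id, rule_ids,
                       claims, normalized_args, rationale, gate=None) -> dict:
    record = {
        "decision_record_id": decision_record_id,  # ties the outcome to the Decision Record
        "policy_decision_id": policy_decision_id,
        "rule_ids": list(rule_ids),
        "input_claims": claims,
        "agent_identity": {
            "subject": claims["subject"],
            "claim_hash": canonical_hash(claims),
        },
        "principal_chain": list(claims.get("principal_chain", [])),
        "kid": claims.get("kid"),
        "normalized_arguments": normalized_args,
        "rationale": rationale,
    }
    if gate is not None:
        # Approval-gate decisions add approver, snapshot hash, and effective mode.
        record["approval"] = {
            "approver": gate["approver"],
            "evidence_snapshot_hash": canonical_hash(gate["evidence"]),
            "effective_approval_mode": gate["effective_mode"],
        }
    return record


claims = {"subject": "agent:refund-bot", "tenant": "t1", "kid": "key-1",
          "principal_chain": ["user:alice", "agent:refund-bot"]}
record = build_audit_record(
    "DR-001", "PD-001", ["R_HIGH_VALUE_REQUIRES_APPROVAL"], claims,
    {"refund_amount": 4500}, "High-value refunds require finance approval.",
    gate={"approver": "finance.lead", "evidence": {"order_lookup": "found"},
          "effective_mode": "destructive"},
)
print(record["approval"]["effective_approval_mode"])  # destructive
```

Storing a hash of the evidence snapshot rather than only a pointer lets a replay detect whether the evidence changed after the gate decision.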

Regulatory timeline checkpoints (EU)

For teams operating in or serving the EU market, governance controls should map to:

2025-02-02: AI literacy obligations and prohibited-practice rules become applicable.
2025-08-02: General-purpose AI (GPAI) obligations apply.
2026-08-02: transparency obligations (including Article 50) broadly apply.

Operationally: policy bundles must encode transparency, disclosure, and human-oversight controls as runtime-enforced requirements, not post-hoc documentation tasks.

Implementation mapping

Governance is implemented primarily by:

Policy Engine (rules, guardrails, approval gates)
Decision Catalog (evidence requirements)
Context Pack Compiler (runtime controls and redaction)

Implementation references

Policy bundle structure (example)

{
  "bundle_id": "POLICY_RETURNS_V4",
  "effective_from": "2026-01-01",
  "priority": 10,
  "policy_dsl": {
    "language": "jsonlogic",
    "rules": [
      {
        "rule_id": "R_REFUND_REQUIRES_IDV",
        "applies_to": { "intent": "support.refund" },
        "if": { "==": [{ "var": "request.context.identity_verified" }, true] },
        "then": { "allow": true, "requires": ["order_lookup"] },
        "rationale": "Refunds require verified identity.",
        "citations": ["policy/returns_v4#sec2.1"]
      },
      {
        "rule_id": "R_HIGH_VALUE_REQUIRES_APPROVAL",
        "applies_to": { "intent": "support.refund" },
        "if": {
          "and": [
            { "==": [{ "var": "user.role" }, "support_agent"] },
            { ">": [{ "var": "request.context.refund_amount" }, 3000] }
          ]
        },
        "then": {
          "allow": true,
          "approval_mode": "destructive",
          "requires_approval_gate": "GATE_FINANCE_APPROVAL",
          "arg_constraints": { "refund_amount": { "max": 3000, "unless_approved": true } }
        },
        "decision_binding": "decision.support.refund.execute",
        "rationale": "High-value refunds require finance approval."
      }
    ]
  },
  "prohibited_claims": ["refund_guaranteed"]
}
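To show how an `if` condition from the bundle above could be evaluated against a Run Context, here is a deliberately small evaluator covering only the operators that appear in these rules (`var`, `and`, `==`, `>`, `<=`). It is a sketch, not a JsonLogic implementation. Real JsonLogic uses loose equality for `==` and has `and` return operand values, whereas this version uses Python equality and returns booleans. Treating a missing variable as a failed comparison is also an assumption.

```python
import operator

COMPARATORS = {"==": operator.eq, ">": operator.gt, "<=": operator.le}


def get_var(data: dict, path: str):
    """Resolve a dotted path such as 'request.context.refund_amount'."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # a missing value never raises
        current = current[key]
    return current


def evaluate(expr, data: dict):
    if not isinstance(expr, dict):
        return expr  # literal value
    (op, args), = expr.items()  # each node has exactly one operator
    if op == "var":
        return get_var(data, args)
    if op == "and":
        # all() over a generator stops at the first falsy operand.
        return all(evaluate(arg, data) for arg in args)
    if op in COMPARATORS:
        left, right = (evaluate(arg, data) for arg in args)
        if op != "==" and (left is None or right is None):
            return False  # assumed: ordering against a missing value fails closed
        return COMPARATORS[op](left, right)
    raise ValueError(f"unsupported operator: {op}")


# The condition of R_HIGH_VALUE_REQUIRES_APPROVAL from the bundle above.
high_value = {"and": [
    {"==": [{"var": "user.role"}, "support_agent"]},
    {">": [{"var": "request.context.refund_amount"}, 3000]},
]}

ctx = {"user": {"role": "support_agent"},
       "request": {"context": {"refund_amount": 4500}}}
print(evaluate(high_value, ctx))  # True

ctx["request"]["context"]["refund_amount"] = 1200
print(evaluate(high_value, ctx))  # False
```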

Approval workflow (runtime)

Gate triggered — policy returns requires_approval_gate.
Context frozen — tool args + evidence snapshot stored.
Approver notified — role-based routing (e.g., fraud, finance).
Decision recorded — approve / deny with rationale.
Resume or halt — execution continues against the frozen evidence or is blocked.
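The freeze-then-resume behaviour in these steps can be sketched as follows. The `PendingApproval` class, the function names, and the SHA-256 snapshot hash are hypothetical. The point of the example is that execution resumes against the copy taken when the gate triggered, not against whatever the live arguments have become since.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def snapshot_hash(tool_args: dict, evidence: dict) -> str:
    payload = json.dumps({"args": tool_args, "evidence": evidence}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class PendingApproval:
    gate: str
    tool_args: dict
    evidence: dict
    frozen_hash: str
    approver: Optional[str] = None
    decision: Optional[str] = None  # "approve" or "deny"
    rationale: Optional[str] = None


def freeze(gate: str, tool_args: dict, evidence: dict) -> PendingApproval:
    """Context frozen: deep-copy args and evidence, then hash the copies."""
    args_copy, evidence_copy = copy.deepcopy(tool_args), copy.deepcopy(evidence)
    return PendingApproval(gate, args_copy, evidence_copy,
                           snapshot_hash(args_copy, evidence_copy))


def record_decision(pending: PendingApproval, approver: str,
                    approved: bool, rationale: str) -> None:
    pending.approver = approver
    pending.decision = "approve" if approved else "deny"
    pending.rationale = rationale


def resume(pending: PendingApproval) -> dict:
    """Resume or halt: return the frozen args only for an intact, approved gate."""
    if pending.decision != "approve":
        raise PermissionError(f"gate {pending.gate} was not approved")
    if snapshot_hash(pending.tool_args, pending.evidence) != pending.frozen_hash:
        raise RuntimeError("frozen snapshot was modified after the gate triggered")
    return pending.tool_args


live_args = {"order_id": "ORD-42", "refund_amount": 4500}
pending = freeze("GATE_FINANCE_APPROVAL", live_args, {"order_lookup": "found"})
live_args["refund_amount"] = 9999  # a later mutation does not reach the snapshot
record_decision(pending, "finance.lead", True, "Within exception budget")
print(resume(pending)["refund_amount"])  # 4500
```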

Conflict resolution

Priority wins — higher-priority bundles override lower ones.
Guardrails first — must-refuse / must-escalate are absolute.
Explicit deny beats allow — if two rules conflict, deny unless explicitly overridden by a higher-priority rule.
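One reading of these three rules, expressed as code: guardrails are checked first, then only the highest-priority matches are considered, and a deny among them beats an allow. Several details are assumptions of this sketch rather than statements from the spec: a larger `priority` number means higher priority, `must_refuse` outranks `must_escalate`, and an empty match set denies by default. The `Match` type and `resolve` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Match:
    rule_id: str
    priority: int                    # assumed: larger number = higher priority
    allow: bool
    guardrail: Optional[str] = None  # "must_refuse" or "must_escalate"


def resolve(matches: List[Match]) -> Tuple[str, Optional[str]]:
    """Return (outcome, rule_id) for the rules that matched one request."""
    # 1. Guardrails are absolute and ignore priority.
    for kind, outcome in (("must_refuse", "refuse"), ("must_escalate", "escalate")):
        for match in matches:
            if match.guardrail == kind:
                return outcome, match.rule_id
    if not matches:
        return "deny", None  # assumed default when nothing matched
    # 2. Priority wins: keep only the highest-priority matches.
    top = max(match.priority for match in matches)
    winners = [match for match in matches if match.priority == top]
    # 3. Explicit deny beats allow among the winners.
    for match in winners:
        if not match.allow:
            return "deny", match.rule_id
    return "allow", winners[0].rule_id


matches = [
    Match("R_ALLOW_REFUND", priority=10, allow=True),
    Match("R_BLOCK_REGION", priority=10, allow=False),
    Match("R_OLD_DENY", priority=5, allow=False),
]
print(resolve(matches))  # ('deny', 'R_BLOCK_REGION')

# A higher-priority allow overrides the lower-priority deny.
print(resolve([Match("R_VIP_OVERRIDE", 20, True), Match("R_OLD_DENY", 5, False)]))
# ('allow', 'R_VIP_OVERRIDE')
```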

Interfaces

Inputs

Policy DSL bundles and invariants
Approval workflows and role definitions
Risk classifications per intent or task
Run Context (user, agent, tenant, claims)

Outputs

Allow/deny decisions with reasons
Required evidence references
Effective approval mode
Audit trails of policy application

Failure modes

Policy drift across environments.
Missing approval gates for high-risk actions.
Evidence requirements not enforced at the right checkpoint.
Overly strict policies causing deadlocks.
Auto-generated rules with unsatisfiable conditions slipping past validation.

Operational concerns

Policy version pinning per environment.
Separation of duties for policy changes.
Policy evaluation latency budgets.
Approval queue SLAs by risk tier.
Policy rollback and deprecation windows.
Regulatory control mapping to NIST AI RMF and ISO/IEC 42001 control families.

Evaluation metrics

Policy compliance rate.
Approval latency and rate by tier.
Evidence attachment success rate.
Audit gap rate (target: zero).
Permission-violation rate.

Example

A VIP-instant-refund rule that selects a lower effective approval mode within the capability’s declared maximum:

{
  "rule_id": "R_VIP_INSTANT_REFUND",
  "applies_to": { "intent": "support.refund" },
  "if": {
    "and": [
      { "==": [{ "var": "intent" }, "support.refund"] },
      { "==": [{ "var": "request.context.user.is_vip" }, true] },
      { "<=": [{ "var": "request.context.refund_amount" }, 200] }
    ]
  },
  "then": {
    "allow": true,
    "approval_mode": "delegated",
    "requires_approval_gate": null
  },
  "decision_binding": "decision.support.refund.execute",
  "rationale": "VIP members get instant refunds up to limit; effective mode is delegated for this bounded request."
}

An identity- and supplier-bound execution example:

{
  "rule_id": "R_BOOKING_CANCEL_OWNERSHIP_AND_WINDOW",
  "applies_to": { "intent": "booking.cancel" },
  "if": {
    "and": [
      { "==": [{ "var": "request.context.booking.user_id" }, { "var": "user.user_id" }] },
      { "==": [{ "var": "request.context.supplier.cancel_window_open" }, true] }
    ]
  },
  "then": {
    "allow": true,
    "approval_mode": "delegated",
    "requires": ["supplier_policy_ref", "booking_ownership_proof"]
  },
  "else": { "allow": false, "reason": "supplier_window_or_identity_failed" },
  "decision_binding": "decision.booking.cancel.eligibility"
}

Common misconceptions

Governance is not just logging. It is active enforcement at runtime with evidence-bound audit.
Approval gates are not a bottleneck when scoped by risk. Most calls are read_only or local_write and never see a gate.
Policy is not the model’s responsibility. The model proposes; the boundary decides.
Approval modes are not interchangeable with gate names. Modes are compatibility classifications; gate names are runtime artifacts selected by policy.
Approval modes are not a total risk order. Native policy evaluates the ActionRisk dimensions; the v1 mode is a compatibility projection.