Intent / Risk Classifier

Decision-plane component that resolves intent + risk_class for the Compiler and Critic.

Reference Design · Last reviewed: 2026-05-04

At a glance

Decision plane · read_only · local_write · network · delegated · destructive · Bounded execution loop

Resolves canonical intent + risk class so the Compiler and Critic can size budgets, sampling, and gates correctly.

Inputs

RunContext.intent (caller-declared, validated by the classifier)
invokeAgent.input.message, channel, locale
invokeAgent.input.context (any structured signals already present)
Intent catalog snapshot

Outputs

Canonical intent_id from the Intent-Task Catalog
risk_class ∈ read_only / local_write / network / delegated / destructive
confidence and alternatives[] for the Critic to consider

Canonical types

IntentId
RiskClass
ClassifierVerdict

Reference Architecture

The Intent / Risk Classifier turns a raw user message (and the Run Context) into a canonical intent_id and a risk_class that the Intent-Task Catalog can route on.

Definition

A small, fast classifier (typically a distilled LLM or a deterministic ruleset for known channels) that emits an intent ∈ catalog plus a risk_class ∈ approval-mode tiers. Outputs are injected into the Run Context before policy resolution and tool surfacing.

Why it exists

Without a single classification step, each component re-derives intent from the prompt on its own, and the resulting interpretations drift apart. The classifier centralizes the mapping so the Compiler, Planner, and Critic agree on what the request actually is.

Inputs

RunContext.intent (if the caller declared one — preferred; classifier validates)
invokeAgent.input.message, channel, locale
invokeAgent.input.context (any structured signals already present)
Intent catalog snapshot

Outputs

Canonical intent_id from the Intent-Task Catalog
risk_class ∈ read_only / local_write / network / delegated / destructive
confidence and alternatives[] for the Critic to consider
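
One possible shape for these outputs, as a minimal Python sketch. The type names IntentId, RiskClass, and ClassifierVerdict come from the canonical types listed for this component; the field layout, the Alternative helper, and the numeric ordering of risk classes (least to most risky, following the order they are listed in) are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import NewType, Tuple

IntentId = NewType("IntentId", str)


class RiskClass(IntEnum):
    # Assumed ordering: least to most risky, in the order the tiers are listed.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


@dataclass(frozen=True)
class Alternative:
    # Hypothetical helper: one runner-up candidate for the Critic to consider.
    intent_id: IntentId
    confidence: float


@dataclass(frozen=True)
class ClassifierVerdict:
    intent_id: IntentId
    risk_class: RiskClass
    confidence: float
    alternatives: Tuple[Alternative, ...] = ()

    def __post_init__(self) -> None:
        # Reject confidences that cannot be read as probabilities.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
```

Making RiskClass an ordered enum lets later checks compare tiers directly (for example, intent risk against a safety-mode ceiling) without a lookup table.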

How it works

If RunContext.intent is supplied and matches the catalog, validate and return it (preferred path).
Otherwise, run the classifier against the message + channel + locale; produce top-k candidates with confidence.
Apply tiebreakers: highest specificity, then most recent template version.
Cross-check the resulting intent’s risk_class against RunContext.safety_mode; refuse if intent risk exceeds safety mode without explicit policy permission.
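
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the reference implementation: CatalogEntry, its specificity and template_version fields, the score callable, the RiskConflict error, and the policy_permits flag are all hypothetical names. It also assumes that safety_mode is expressed as a risk-class ceiling, that a validated caller-declared intent gets confidence 1.0, and that tiebreakers apply only between candidates with equal confidence. For brevity, the scorer takes only the message rather than message + channel + locale.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Assumed ordering, least to most risky, taken from the order the tiers are listed in.
RISK_ORDER = ("read_only", "local_write", "network", "delegated", "destructive")


@dataclass(frozen=True)
class CatalogEntry:
    intent_id: str
    risk_class: str
    specificity: int       # hypothetical field: higher means more specific
    template_version: int  # hypothetical field: higher means more recent


class RiskConflict(Exception):
    """Intent risk exceeds the safety mode and no policy permits it."""


def resolve(
    declared: Optional[str],
    message: str,
    catalog: Dict[str, CatalogEntry],
    score: Callable[[str], List[Tuple[str, float]]],
    safety_mode: str,
    policy_permits: bool = False,
    k: int = 3,
) -> dict:
    if declared is not None and declared in catalog:
        # Step 1: preferred path, the caller-declared intent is in the catalog.
        chosen, confidence, alternatives = catalog[declared], 1.0, []
    else:
        # Step 2: score the message and keep only intents present in the catalog.
        ranked = [(catalog[i], c) for i, c in score(message) if i in catalog]
        if not ranked:
            raise LookupError("no catalog intent matched the message")
        # Step 3: confidence first; ties broken by specificity, then template version.
        ranked.sort(
            key=lambda p: (p[1], p[0].specificity, p[0].template_version),
            reverse=True,
        )
        (chosen, confidence), rest = ranked[0], ranked[1:k]
        alternatives = [(entry.intent_id, conf) for entry, conf in rest]
    # Step 4: refuse when intent risk exceeds the safety mode without permission.
    exceeds = RISK_ORDER.index(chosen.risk_class) > RISK_ORDER.index(safety_mode)
    if exceeds and not policy_permits:
        raise RiskConflict(
            f"{chosen.intent_id}: {chosen.risk_class} exceeds {safety_mode}"
        )
    return {
        "intent_id": chosen.intent_id,
        "risk_class": chosen.risk_class,
        "confidence": confidence,
        "alternatives": alternatives,
    }
```

Raising before returning a verdict keeps the risk cross-check ahead of compilation, which matches the refusal behavior described for risk-class conflicts.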

Failure modes

Classifier confidence below threshold — the Critic surfaces a clarifying question instead of guessing.
Intent resolves to a deprecated entry — refuse and emit a typed error.
Risk-class conflict between intent and safety_mode — refuse before compilation.
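
A sketch of how these three failure modes might map onto typed outcomes. The Outcome enum, the triage function, the 0.6 threshold, and the order in which the checks run are assumptions made for illustration; the source states only what each failure should lead to, not how it is encoded.

```python
from enum import Enum

# Assumed ordering, least to most risky, taken from the order the tiers are listed in.
RISK_ORDER = ("read_only", "local_write", "network", "delegated", "destructive")


class Outcome(Enum):
    ACCEPT = "accept"
    CLARIFY = "clarify"                      # Critic asks a clarifying question
    DEPRECATED_INTENT = "deprecated_intent"  # refuse with a typed error
    RISK_CONFLICT = "risk_conflict"          # refuse before compilation


def triage(
    confidence: float,
    deprecated: bool,
    intent_risk: str,
    safety_mode: str,
    threshold: float = 0.6,   # hypothetical value; real thresholds are tuned
    policy_permits: bool = False,
) -> Outcome:
    # Hard refusals are checked first so a low-confidence match on a
    # deprecated or over-risk intent never turns into a clarifying question.
    if deprecated:
        return Outcome.DEPRECATED_INTENT
    exceeds = RISK_ORDER.index(intent_risk) > RISK_ORDER.index(safety_mode)
    if exceeds and not policy_permits:
        return Outcome.RISK_CONFLICT
    if confidence < threshold:
        return Outcome.CLARIFY
    return Outcome.ACCEPT
```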

Operational concerns

Classifier model pinned per environment; upgrades go through release-gate evaluation.
p50 / p99 latency budget folded into the Planner timeout.
Per-tenant classifier quotas.
Drift monitoring against golden classification sets.

Evaluation metrics

Top-1 precision/recall on golden intents.
Calibration of confidence buckets.
Refusal rate (clarifying-question routes).
Risk-class drift detection.
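
The first three metrics could be computed from a golden set along these lines. This is a sketch under assumptions: each record is a hypothetical (gold_intent, predicted_intent, confidence) tuple, confidence buckets are equal-width, and a "refusal" is any prediction whose confidence falls below the clarifying-question threshold.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (gold_intent, predicted_intent, confidence)


def top1_precision_recall(records: List[Record], intent: str) -> Tuple[float, float]:
    # Per-intent precision and recall over top-1 predictions.
    tp = sum(1 for g, p, _ in records if p == intent and g == intent)
    fp = sum(1 for g, p, _ in records if p == intent and g != intent)
    fn = sum(1 for g, p, _ in records if p != intent and g == intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibration_buckets(
    records: List[Record], n_buckets: int = 5
) -> Dict[int, Tuple[float, float, int]]:
    # Maps bucket index -> (mean confidence, observed accuracy, count).
    # A well-calibrated classifier has mean confidence close to accuracy.
    stats = defaultdict(lambda: [0.0, 0, 0])  # confidence sum, correct, count
    for gold, pred, conf in records:
        bucket = min(int(conf * n_buckets), n_buckets - 1)
        stats[bucket][0] += conf
        stats[bucket][1] += int(gold == pred)
        stats[bucket][2] += 1
    return {b: (s / n, c / n, n) for b, (s, c, n) in stats.items()}


def refusal_rate(records: List[Record], threshold: float) -> float:
    # Share of requests that would be routed to a clarifying question.
    if not records:
        return 0.0
    return sum(1 for _, _, conf in records if conf < threshold) / len(records)
```

Tracking these per pinned classifier version gives a baseline to compare against when an upgrade goes through release-gate evaluation.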