
Intent / Risk Classifier

Decision-plane component that resolves intent + risk_class for the Compiler and Critic.

Reference Design
At a glance
Decision plane · read_only · local_write · network · delegated · destructive · Bounded execution loop

Resolves canonical intent + risk class so the Compiler and Critic can size budgets, sampling, and gates correctly.

Inputs
  • RunContext.intent (caller-declared, validated by the classifier)
  • invokeAgent.input.message, channel, locale
  • invokeAgent.input.context (any structured signals already present)
  • Intent catalog snapshot
Outputs
  • Canonical intent_id from the Intent-Task Catalog
  • risk_class ∈ read_only / local_write / network / delegated / destructive
  • confidence and alternatives[] for the Critic to consider
Canonical types
  • IntentId
  • RiskClass
  • ClassifierVerdict

Reference Architecture

The Intent / Risk Classifier turns a raw user message (and the Run Context) into a canonical intent name and a risk_class that the Intent-Task Catalog can route on.

Definition

A small, fast classifier (typically a distilled LLM or a deterministic ruleset for known channels) that emits an intent ∈ catalog plus a risk_class ∈ approval-mode tiers. Outputs are injected into the Run Context before policy resolution and tool surfacing.

Why it exists

Without a single classification step, each component re-derives intent from the prompt on its own, and their interpretations drift apart. The classifier centralizes the mapping so the Compiler, Planner, and Critic agree on what the request actually is.

Inputs

  • RunContext.intent, if the caller declared one (preferred; the classifier validates it)
  • invokeAgent.input.message, channel, locale
  • invokeAgent.input.context (any structured signals already present)
  • Intent catalog snapshot

Outputs

  • Canonical intent_id from the Intent-Task Catalog
  • risk_class ∈ read_only / local_write / network / delegated / destructive
  • confidence and alternatives[] for the Critic to consider
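
As a rough illustration, the outputs above could be modelled with the three canonical types along the following lines. This is a minimal sketch, not the reference schema: the field layout of ClassifierVerdict, the Alternative helper, the numeric ordering of RiskClass (taken from the order in which the tiers are listed), and the sample intent names are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import NewType

# IntentId is assumed to be an opaque string key into the Intent-Task Catalog.
IntentId = NewType("IntentId", str)


class RiskClass(IntEnum):
    # Assumption: tiers are ordered from least to most risky,
    # following the order in which the document lists them.
    READ_ONLY = 0
    LOCAL_WRITE = 1
    NETWORK = 2
    DELEGATED = 3
    DESTRUCTIVE = 4


@dataclass(frozen=True)
class Alternative:
    # Hypothetical shape for one entry of alternatives[].
    intent_id: IntentId
    confidence: float


@dataclass(frozen=True)
class ClassifierVerdict:
    intent_id: IntentId
    risk_class: RiskClass
    confidence: float
    alternatives: tuple[Alternative, ...] = ()

    def __post_init__(self) -> None:
        # Reject confidences outside the unit interval early.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# Example verdict; the intent names are made up for illustration.
verdict = ClassifierVerdict(
    intent_id=IntentId("billing.refund"),
    risk_class=RiskClass.NETWORK,
    confidence=0.82,
    alternatives=(Alternative(IntentId("billing.dispute"), 0.11),),
)
```

Making the verdict immutable (frozen) is one way to keep downstream components from adjusting it after it has been injected into the Run Context.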

How it works

  1. If RunContext.intent is supplied and matches the catalog, validate and return it (preferred path).
  2. Otherwise, run the classifier against the message + channel + locale; produce top-k candidates with confidence.
  3. Apply tiebreakers: highest specificity, then most recent template version.
  4. Cross-check the resulting intent’s risk_class against RunContext.safety_mode; refuse if intent risk exceeds safety mode without explicit policy permission.
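
The four steps above can be sketched as a single function. Several details here are assumptions rather than part of the design: confidence is used as the primary sort key with specificity and template version as tiebreakers, safety_mode is expressed as one of the risk tiers, the tiers are ordered as listed, a declared intent is given confidence 1.0 and still passes through the step 4 check, and the names CatalogEntry, Verdict, resolve, and RiskExceedsSafetyMode are hypothetical.

```python
from dataclasses import dataclass

# Assumption: tiers are ordered least to most risky, as listed in the document.
RISK_ORDER = ["read_only", "local_write", "network", "delegated", "destructive"]


@dataclass(frozen=True)
class CatalogEntry:
    intent_id: str
    risk_class: str
    specificity: int       # hypothetical field: higher means more specific
    template_version: int  # hypothetical field: higher means more recent


@dataclass(frozen=True)
class Verdict:
    intent_id: str
    risk_class: str
    confidence: float
    alternatives: tuple


class RiskExceedsSafetyMode(Exception):
    """Raised when the intent's risk_class exceeds the allowed safety mode."""


def resolve(declared_intent, safety_mode, catalog, classify, policy_permits=False):
    # Step 1: prefer a caller-declared intent that matches the catalog.
    if declared_intent is not None and declared_intent in catalog:
        entry, confidence, alternatives = catalog[declared_intent], 1.0, ()
    else:
        # Step 2: classify() returns (intent_id, confidence) candidates.
        candidates = [(catalog[i], c) for i, c in classify() if i in catalog]
        if not candidates:
            raise LookupError("no catalog intent matched")
        # Step 3: ties on confidence fall back to specificity, then version.
        candidates.sort(
            key=lambda ec: (ec[1], ec[0].specificity, ec[0].template_version),
            reverse=True,
        )
        (entry, confidence), rest = candidates[0], candidates[1:]
        alternatives = tuple((e.intent_id, c) for e, c in rest)
    # Step 4: refuse if risk exceeds the safety mode without policy permission.
    too_risky = RISK_ORDER.index(entry.risk_class) > RISK_ORDER.index(safety_mode)
    if too_risky and not policy_permits:
        raise RiskExceedsSafetyMode(entry.intent_id)
    return Verdict(entry.intent_id, entry.risk_class, confidence, alternatives)


# Illustrative catalog snapshot; the intent names are made up.
catalog = {
    "files.read": CatalogEntry("files.read", "read_only", 1, 1),
    "files.read_recent": CatalogEntry("files.read_recent", "read_only", 2, 1),
    "files.delete": CatalogEntry("files.delete", "destructive", 1, 3),
}
verdict = resolve("files.read", "read_only", catalog, classify=lambda: [])
```

Passing the classifier in as a callable keeps the sketch independent of whether a distilled LLM or a deterministic ruleset sits behind it.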

Failure modes

  • Classifier confidence below threshold — the Critic surfaces a clarifying question instead of guessing.
  • Intent resolves to a deprecated entry — refuse and emit a typed error.
  • Risk-class conflict between intent and safety_mode — refuse before compilation.
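
One way to keep these three failure modes distinguishable downstream is to give each its own typed outcome. The sketch below is illustrative only: the threshold value, the error codes, the order of the checks, and the names triage, NeedsClarification, DeprecatedIntentError, and RiskClassConflictError are all assumptions.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.6  # hypothetical value; the document does not specify one


class ClassifierRefusal(Exception):
    """Base class for typed refusals; 'code' is a hypothetical machine-readable tag."""
    code = "classifier_refusal"


class DeprecatedIntentError(ClassifierRefusal):
    code = "intent_deprecated"


class RiskClassConflictError(ClassifierRefusal):
    code = "risk_class_conflict"


@dataclass(frozen=True)
class NeedsClarification:
    # Candidates the Critic could offer in a clarifying question.
    candidates: tuple


def triage(intent_id, confidence, alternatives, deprecated, risk_conflict):
    # Low confidence is not an error: hand the candidates to the Critic.
    if confidence < CONFIDENCE_THRESHOLD:
        return NeedsClarification((intent_id, *alternatives))
    # Deprecated catalog entries are refused with a typed error.
    if intent_id in deprecated:
        raise DeprecatedIntentError(intent_id)
    # Risk-class conflicts are refused before compilation starts.
    if risk_conflict:
        raise RiskClassConflictError(intent_id)
    return intent_id
```

Callers can then branch on the exception type or on its code attribute instead of parsing error messages.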

Operational concerns

  • Classifier model pinned per environment; upgrades go through release-gate evaluation.
  • p50 / p99 latency budget folded into the Planner timeout.
  • Per-tenant classifier quotas.
  • Drift monitoring against golden classification sets.
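
Drift monitoring against a golden set could look roughly like the following. This is a sketch under assumptions: the golden set is modelled as a mapping from a message id to an (intent_id, risk_class) pair, and the drift_report name and the max_disagreement tolerance are hypothetical.

```python
def drift_report(golden, current, max_disagreement=0.02):
    """Compare current classifier output with a golden set.

    Both arguments map message id -> (intent_id, risk_class).
    A message missing from 'current' counts as drifted on both fields.
    """
    missing = (None, None)
    intent_drift = sorted(
        k for k in golden if current.get(k, missing)[0] != golden[k][0]
    )
    risk_drift = sorted(
        k for k in golden if current.get(k, missing)[1] != golden[k][1]
    )
    drifted = set(intent_drift) | set(risk_drift)
    rate = len(drifted) / len(golden) if golden else 0.0
    return {
        "intent_drift": intent_drift,
        "risk_drift": risk_drift,
        "disagreement_rate": rate,
        "ok": rate <= max_disagreement,
    }


# Illustrative data; the message ids and intent names are made up.
golden = {
    "m1": ("files.read", "read_only"),
    "m2": ("files.delete", "destructive"),
}
current = {
    "m1": ("files.read", "read_only"),
    "m2": ("files.delete", "local_write"),
}
report = drift_report(golden, current)
```

Reporting intent drift and risk-class drift separately makes it easier to spot the case where the intent is still right but its risk tier has silently changed.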

Evaluation metrics

  • Top-1 precision/recall on golden intents.
  • Calibration of confidence buckets.
  • Refusal rate (clarifying-question routes).
  • Risk-class drift detection.
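
The first two metrics can be computed from a golden set with a few lines of code. The sketch below assumes parallel lists of golden labels, top-1 predictions, and confidences, and uses equal-width confidence buckets; the function names and the bucket count are hypothetical.

```python
from collections import defaultdict


def top1_precision_recall(golden, predicted, intent):
    """Precision and recall of top-1 predictions for a single intent."""
    pairs = list(zip(golden, predicted))
    tp = sum(1 for g, p in pairs if g == intent and p == intent)
    fp = sum(1 for g, p in pairs if g != intent and p == intent)
    fn = sum(1 for g, p in pairs if g == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibration_buckets(confidences, correct, n_buckets=5):
    """Group predictions into equal-width confidence buckets.

    Returns bucket index -> (mean confidence, accuracy, count).
    A well-calibrated classifier has mean confidence close to accuracy.
    """
    buckets = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the top bucket.
        idx = min(int(conf * n_buckets), n_buckets - 1)
        buckets[idx].append((conf, ok))
    return {
        idx: (
            sum(c for c, _ in items) / len(items),
            sum(1 for _, ok in items if ok) / len(items),
            len(items),
        )
        for idx, items in buckets.items()
    }


# Illustrative golden labels and predictions; the intent names are made up.
golden = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]
precision_a, recall_a = top1_precision_recall(golden, predicted, "a")
```

A large gap between mean confidence and accuracy in any bucket suggests the confidence threshold used for clarifying-question routing needs re-tuning.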