Reviewer Agents

The seven canonical reviewer agents — architecture, security, reliability, product, data, cost, compliance — their concerns, output schema, and integration with approval gates.

Reference · Last reviewed: 2026-05-09

At a glance

Reviewer Agents are specialized harness components that inspect proposed changes (pack diffs, policy edits, tool additions, planner skill updates, agent-generated PRs) before they reach a human approver. They are not a replacement for human review. They take over the mechanical checks, where human effort would otherwise scale linearly with change volume, so that humans can focus on judgment, architecture, and risk.

A reviewer agent is itself a versioned harness primitive: it has a skill, a rubric, an output schema, a golden set, and a release gate. It is treated like any other component of the Improvement Loop — its findings produce typed proposals that land in the same change-control queue as everything else.

Why specialized reviewers, not one giant reviewer

A single “review the change” agent fails in three predictable ways:

Concerns get blurred. Security findings get diluted by stylistic noise; cost regressions hide behind feature wins.
Severity calibration drifts. A reviewer that sees everything calibrates to the median, missing tail-risk findings.
Ownership becomes diffuse. Nobody owns the rubric, so it ages silently.

Specialized reviewers solve all three: each has a single concern, a tightly-scoped rubric, and a named owner. Severity is calibrated against a concern-specific golden set, not a generic one.

The seven canonical reviewers

Reviewer	Owns the answer to
Architecture	Does this change respect plane boundaries and dependency direction?
Security	Does this change leak PII or secrets, or weaken sandbox isolation?
Reliability	Will this change behave under partial failure, retries, or rollback?
Product	Does this change actually serve the user intent at the edges?
Data	Does this change preserve evidence coverage, event schema, and analytics signal?
Cost	Does this change keep run-budget headroom and tool-call counts within target?
Compliance	Does this change preserve audit, consent, and regulated-action gates?

Architecture reviewer

Owns: plane boundaries, dependency direction, primitive layering.

Checks include:

No direct DB call from a tool implementation (tools must go through the Adapter Mesh).
No cross-domain dependency without an adapter manifest.
No business action without a decision_id declared in the Decision Catalog.
No tool added without a schema and capability class.
No agent response written without a trace_id propagated end-to-end.

Severity floor: any finding that breaks the reference architecture is blocker.

Security reviewer

Owns: sensitive data, secrets, auth, sandbox profile, injection, isolation.

Checks include:

No plaintext sensitive field in logs or traces (policy_id: no_plaintext_sensitive_data_logging).
All secrets resolved through the secrets adapter; no inline credentials.
Tool sandbox profile matches the tool’s side-effect classification.
Identity propagation (CEID, SID, agent workload identity) is preserved across every Tool Gateway call.
No prompt fragment includes user-controlled content without explicit injection-safe rendering.

Severity floor: any sensitive-data leak is blocker and emits a security event regardless of approval state.

Reliability reviewer

Owns: timeouts, retries, fallbacks, idempotency, rollback paths.

Checks include:

Every tool call has an explicit timeout; no inherited defaults from a generic HTTP layer.
Retries are scoped (max attempts, backoff, idempotency key); no unbounded retry loops.
Every destructive tool emits a reversal token, idempotency key, or compensating action.
Failure-playbook coverage exists for the typed verdicts the change can produce.
New planner / executor / critic skills declare their loop guard and budget posture.

Severity floor: a destructive tool without a reversal path is blocker.
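
The reversal-path check above could be sketched roughly as follows. This is a minimal illustration, not the harness's implementation: the manifest fields `side_effect_class` and `output_schema.properties` are assumed names, and only the `policy_id` is taken from the example envelope later on this page.

```python
# Hypothetical manifest shape: side_effect_class and output_schema.properties
# are assumptions for illustration, not confirmed harness field names.
REVERSAL_FIELDS = {"reversal_token", "idempotency_key", "compensating_action"}


def reversal_findings(manifest: dict) -> list:
    """Return one blocker finding if a destructive tool declares no reversal path."""
    if manifest.get("side_effect_class") != "destructive":
        return []  # the check only applies to destructive tools
    declared = set(manifest.get("output_schema", {}).get("properties", {}))
    if declared & REVERSAL_FIELDS:
        return []  # at least one reversal mechanism is declared
    return [{
        "severity": "blocker",
        "issue": "Destructive tool declares no reversal path",
        "policy_id": "no_destructive_action_without_reversal",
    }]


refund = {
    "side_effect_class": "destructive",
    "output_schema": {"properties": {"refund_id": {"type": "string"}}},
}
print(reversal_findings(refund))
```

A manifest that adds `reversal_token` to its output properties would return an empty list under this sketch.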

Product reviewer

Owns: user experience, intent fidelity, edge cases.

Checks include:

Response is clear, minimal, and useful at the user surface.
Edge cases on the intent (empty result, expired state, partial failure) are explicitly handled.
User-facing messages do not expose internal system details, internal IDs, or stack-shaped strings.
Clarification questions are asked only when the harness genuinely lacks the information; otherwise the agent acts on its best evidence and says so.
Tone, naming, and surface area match the product principles registered in the repo.

Severity floor: leaking internal details to the end user is blocker.

Data reviewer

Owns: event schema, evidence coverage, analytics impact.

Checks include:

Every DecisionRecord carries the evidence_refs declared by the Decision Spec.
New event types are registered in the schema registry before they ship.
Analytics events include the canonical correlation IDs (trace_id, run_id, decision_id) — no event is “for product” only.
Memory promotions (working → episodic → semantic → durable) carry the consent and contradiction evidence required by the Memory doc.

Severity floor: a DecisionRecord without resolvable evidence_refs is blocker.
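
One way to picture the evidence check is a lookup of each reference against a store of known evidence IDs. The sketch below assumes the store can be represented as a set of IDs and that an empty `evidence_refs` list also counts as unresolvable. Both are assumptions for illustration.

```python
from typing import Optional


def unresolved_refs(decision_record: dict, evidence_store: set) -> list:
    """List the evidence_refs that cannot be found in the (hypothetical) store."""
    return [ref for ref in decision_record.get("evidence_refs", [])
            if ref not in evidence_store]


def data_severity(decision_record: dict, evidence_store: set) -> Optional[str]:
    """Apply the severity floor: missing or unresolvable evidence is a blocker."""
    refs = decision_record.get("evidence_refs", [])
    if not refs or unresolved_refs(decision_record, evidence_store):
        return "blocker"
    return None


store = {"dr_2026_05_07_q88"}
print(data_severity({"evidence_refs": ["dr_2026_05_07_q88"]}, store))
print(data_severity({"evidence_refs": ["dr_missing"]}, store))
```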

Cost reviewer

Owns: tokens, infra cost, retrieval cost, run-budget headroom.

Checks include:

Token usage per decision stays within the intent’s RunBudget band; new pack versions cannot regress economics by more than the configured guard.
Tool-call count per decision is bounded; loop guards activate before the budget is exhausted.
Retrieval breadth (top_k, max_hops) is justified by an evaluator metric, not by intuition.
Caching: any new prompt fragment that can be reused is part of the cache surface.
Model selection: the AI Gateway & LLM Router tier is appropriate for the intent’s risk class.

Severity floor: a regression on economics_cents_per_decision greater than the guardrail is blocker.
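
The economics guard reduces to a relative-change comparison. In the sketch below, the 10% default guard and the function names are illustrative assumptions; the actual guard is whatever the harness configures.

```python
from typing import Optional


def economics_regression(baseline_cents: float, candidate_cents: float) -> float:
    """Relative change in cents per decision; positive means more expensive."""
    if baseline_cents <= 0:
        raise ValueError("baseline must be positive")
    return (candidate_cents - baseline_cents) / baseline_cents


def cost_severity(baseline_cents: float, candidate_cents: float,
                  guard: float = 0.10) -> Optional[str]:
    """Blocker only when the regression is strictly greater than the guard."""
    if economics_regression(baseline_cents, candidate_cents) > guard:
        return "blocker"
    return None


print(cost_severity(4.0, 4.2))  # 5% regression, within the assumed guard
print(cost_severity(4.0, 5.0))  # 25% regression, beyond the assumed guard
```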

Compliance reviewer

Owns: audit, consent, regulated actions, ApprovalMode binding.

Checks include:

Every regulated action has a typed gate and a named approver role.
ApprovalMode binding (read_only / local_write / network / delegated / destructive) matches the action’s true side-effect class; lower effective modes require an explicit policy rule and rationale.
Consent records exist for any action touching regulated data.
Audit trail covers the full lifecycle: proposal, approval, execution, reversal.
Regulatory mapping (NIST AI RMF, ISO/IEC 42001, sector-specific frames) is current for any new control.

Severity floor: any destructive or regulated action without a binding gate is blocker.
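
The ApprovalMode binding rule can be read as an ordering check. The sketch below assumes the five modes are listed from least to most privileged and that binding a stricter mode than required is acceptable. Neither assumption is stated on this page, and the parameter names are hypothetical.

```python
# Assumed ordering, least to most privileged, taken from the order in which
# the modes are listed above.
APPROVAL_MODES = ("read_only", "local_write", "network", "delegated", "destructive")


def binding_ok(bound_mode: str, true_side_effect: str,
               policy_rule: str = "", rationale: str = "") -> bool:
    """Accept a binding that matches (or exceeds) the true side-effect class.

    A lower effective mode passes only with both a policy rule and a rationale.
    """
    bound = APPROVAL_MODES.index(bound_mode)
    required = APPROVAL_MODES.index(true_side_effect)
    if bound >= required:
        return True
    return bool(policy_rule and rationale)


print(binding_ok("read_only", "destructive"))
print(binding_ok("network", "destructive",
                 policy_rule="example_rule", rationale="example rationale"))
```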

Output schema

Every reviewer emits the same envelope, regardless of concern:

{
  "review_id": "rv_2026_05_08_a17",
  "reviewer_agent": "security-reviewer",
  "reviewer_version": "1.4.0",
  "subject": {
    "kind": "pack_change",
    "pack_id": "ctxpack.support",
    "from_version": "5.1.0",
    "to_version": "5.2.0",
    "diff_ref": "git:repo@a13b9c2..7777eaa"
  },
  "status": "fail",
  "findings": [
    {
      "finding_id": "f_01",
      "severity": "blocker",
      "file": "harness/tools/payment.refund.json",
      "issue": "Reversal token field is missing from the tool manifest",
      "policy_id": "no_destructive_action_without_reversal",
      "recommendation": "Add reversal_token to the output_schema; reference issuer_adapter."
    }
  ],
  "evidence_refs": ["dr_2026_05_07_q88"],
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "completed_at": "2026-05-08T09:14:00Z"
}

Severity vocabulary is fixed: blocker, must-fix, should-fix, nit. A blocker from any reviewer prevents promotion past the relevant rollout stage; a must-fix requires either the fix or an ADR (decision record) explaining the deferral.
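
A consumer of the envelope might validate it before aggregating. The following is a rough sketch: the severity vocabulary comes from this page, but which fields count as required is an assumption inferred from the example envelope, not a published schema.

```python
SEVERITIES = ("blocker", "must-fix", "should-fix", "nit")

# Assumed required fields, inferred from the example envelope above.
REQUIRED_ENVELOPE = ("review_id", "reviewer_agent", "reviewer_version",
                     "subject", "status", "findings")
REQUIRED_FINDING = ("finding_id", "severity", "file", "issue", "recommendation")


def envelope_errors(review: dict) -> list:
    """Return human-readable problems; an empty list means the envelope passes."""
    errors = [f"missing envelope field: {key}"
              for key in REQUIRED_ENVELOPE if key not in review]
    for index, finding in enumerate(review.get("findings", [])):
        for key in REQUIRED_FINDING:
            if key not in finding:
                errors.append(f"findings[{index}]: missing field {key}")
        severity = finding.get("severity")
        if severity is not None and severity not in SEVERITIES:
            errors.append(f"findings[{index}]: unknown severity {severity!r}")
    return errors


draft = {"review_id": "rv_example", "findings": [
    {"finding_id": "f_01", "severity": "critical", "file": "a.json",
     "issue": "example", "recommendation": "example"}]}
for problem in envelope_errors(draft):
    print(problem)
```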

Integration with approval gates

Reviewer findings are inputs to the governance approval gate, not a replacement for it.

proposal arrives
    ↓
reviewers fan out in parallel
    ↓
findings aggregate by severity
    ↓
if any blocker: gate denied, proposal re-enters change-control as 'changes_requested'
if any must-fix: gate held; either fix lands or ADR is filed
if only should-fix / nit: gate goes to human approver with summary
    ↓
human approver makes the final call (judgment, risk, scope)
    ↓
if approved: rollout begins at 0%_shadow per the harness rollout stages

This split is deliberate: reviewers are excellent at mechanical and pattern-driven checks, and humans are excellent at judgment under ambiguity. Reviewers should never auto-approve. They can only auto-deny on a blocker.
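
The aggregation step in the flow above can be sketched as a small pure function. Only `changes_requested` is a state name taken from this page; `held` and `human_review`, and the way ADR deferrals are passed in, are illustrative assumptions. Note that the function has no approved outcome, which mirrors the rule that reviewers never auto-approve.

```python
from enum import Enum


class GateState(Enum):
    DENIED = "changes_requested"   # state name used on this page
    HELD = "held"                  # hypothetical label
    HUMAN_REVIEW = "human_review"  # hypothetical label


def gate_state(reviews: list, adr_deferred=frozenset()) -> GateState:
    """Aggregate findings from all reviewers by severity.

    adr_deferred holds (review_id, finding_id) pairs whose must-fix has been
    deferred through a filed ADR; blockers can never be deferred this way.
    """
    blocker = must_fix = False
    for review in reviews:
        for finding in review.get("findings", []):
            key = (review["review_id"], finding["finding_id"])
            if finding["severity"] == "blocker":
                blocker = True
            elif finding["severity"] == "must-fix" and key not in adr_deferred:
                must_fix = True
    if blocker:
        return GateState.DENIED
    if must_fix:
        return GateState.HELD
    return GateState.HUMAN_REVIEW


reviews = [{"review_id": "r1",
            "findings": [{"finding_id": "f_01", "severity": "must-fix"}]}]
print(gate_state(reviews))
print(gate_state(reviews, adr_deferred={("r1", "f_01")}))
```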

Reviewer skills are first-class harness components

Each reviewer is a versioned skill in harness/reviewers/:

harness/reviewers/
  architecture/
    skill.md               # role, scope, rubric
    policies.json          # JsonLogic checks the reviewer applies
    fixtures/              # known-good and known-bad changes
    golden_set.json        # labeled findings for evaluator scoring
  security/
    skill.md
    policies.json
    fixtures/
    golden_set.json
  ...

The skill follows the same minimal-skill pattern that works elsewhere in the harness (Meta-Harness Appendix D calls this out as the strongest single lever on search quality): constrain what is forbidden and what artifacts to produce, but leave the reviewer free to inspect anything.

A reviewer’s golden set is exercised on every reviewer-skill change. False-positive rate, false-negative rate, and severity-calibration drift are tracked the same way evaluator metrics are tracked in Evaluation and Observability.
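
Scoring a reviewer against its golden set could look roughly like this. The golden-set shape (fixture ID mapped to the set of expected policy IDs) is an assumption, since the format of golden_set.json is not shown here. Because a set of labeled findings has no natural count of true negatives, the false-positive figure below is computed as the share of emitted findings that were not labeled, which is strictly a false-discovery rate.

```python
def golden_set_rates(expected: dict, actual: dict) -> dict:
    """Compare emitted findings with labeled findings, fixture by fixture.

    expected / actual: fixture_id -> set of policy_ids (hypothetical shape).
    """
    tp = fp = fn = 0
    for fixture_id, want in expected.items():
        got = actual.get(fixture_id, set())
        tp += len(want & got)   # labeled and found
        fp += len(got - want)   # found but not labeled
        fn += len(want - got)   # labeled but missed
    return {
        "false_positive_rate": fp / (tp + fp) if tp + fp else 0.0,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
    }


expected = {"known_bad_1": {"p1", "p2"}, "known_good_1": set()}
actual = {"known_bad_1": {"p1"}, "known_good_1": {"p3"}}
print(golden_set_rates(expected, actual))
```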

Failure modes

Reviewer becomes a style cop. Findings devolve into nits and humans stop reading. Mitigation: severity floor by concern; cap nit-rate per review.
Reviewer over-blocks. Every change becomes a blocker; throughput collapses. Mitigation: track per-reviewer block rate; rebaseline rubric quarterly.
Reviewer drifts. New patterns appear in the codebase that the rubric doesn’t cover. Mitigation: feed Improvement-Loop insights of kind gap_detected back into reviewer rubrics.
Reviewer is bypassed. A team adds an “emergency lane” that skips reviewers. Mitigation: emergency handling is a typed approval-gate state on the existing approval-mode taxonomy, emits a security event, and forces post-hoc review.
Reviewer hallucinates findings. Mitigation: every finding must cite a policy_id from the policy bundle or a rule from the reviewer’s own rubric; ungrounded findings are dropped at the gate.
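
The grounding rule for hallucinated findings lends itself to a simple filter at the gate. In this sketch, `rubric_rule_id` is a hypothetical field name for a citation of the reviewer's own rubric; the page only names `policy_id`.

```python
def is_grounded(finding: dict, policy_ids: set, rubric_rule_ids: set) -> bool:
    """A finding is grounded if it cites a known policy or a known rubric rule."""
    return (finding.get("policy_id") in policy_ids
            or finding.get("rubric_rule_id") in rubric_rule_ids)


def drop_ungrounded(findings: list, policy_ids: set, rubric_rule_ids: set) -> list:
    """Keep only grounded findings, preserving their order."""
    return [f for f in findings if is_grounded(f, policy_ids, rubric_rule_ids)]


bundle = {"no_plaintext_sensitive_data_logging"}
rubric = {"sec_rule_07"}  # hypothetical rubric rule ID
findings = [
    {"finding_id": "f_01", "policy_id": "no_plaintext_sensitive_data_logging"},
    {"finding_id": "f_02", "policy_id": "made_up_policy"},
    {"finding_id": "f_03", "rubric_rule_id": "sec_rule_07"},
]
print([f["finding_id"] for f in drop_ungrounded(findings, bundle, rubric)])
```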

Operational concerns

Reviewers run in parallel; the slowest reviewer sets gate latency.
Reviewer cost budget is separate from production RunBudget; a reviewer cannot block an emergency rollback by exceeding its budget.
Reviewer skills are versioned independently; pinning a reviewer to an older skill is allowed for replay but never for live gating.
Reviewer outputs are append-only and replayable; the same diff against the same reviewer version must produce the same findings.
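
The replay guarantee in the last point can be tested by fingerprinting the parts of a review that must be stable. The sketch below assumes `review_id`, `trace_id`, and `completed_at` legitimately differ between runs and are therefore excluded; that choice is an assumption, not a documented rule.

```python
import hashlib
import json


def findings_fingerprint(review: dict) -> str:
    """Hash the reviewer identity, the diff, and the findings in canonical form."""
    stable = {
        "reviewer_agent": review["reviewer_agent"],
        "reviewer_version": review["reviewer_version"],
        "diff_ref": review["subject"]["diff_ref"],
        "findings": review["findings"],
    }
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first_run = {
    "review_id": "rv_a", "completed_at": "2026-05-08T09:14:00Z",
    "reviewer_agent": "security-reviewer", "reviewer_version": "1.4.0",
    "subject": {"diff_ref": "git:repo@7777eaa..b21c5d4"},
    "findings": [{"finding_id": "f_01", "severity": "blocker"}],
}
replay = dict(first_run, review_id="rv_b", completed_at="2026-05-09T10:00:00Z")
print(findings_fingerprint(first_run) == findings_fingerprint(replay))
```

A replay whose fingerprint differs from the original run would indicate non-deterministic reviewer output for the same diff and reviewer version.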

Evaluation metrics

Block rate per reviewer — fraction of reviews ending in any blocker. Target band depends on concern (security ~5–15%, cost ~10–25%); outside-band signals miscalibration.
False-positive rate — blocker findings overturned at human approval.
False-negative rate — incidents whose root cause was a finding the reviewer missed.
Severity calibration drift — distribution of severities over time; sharp shifts trigger rubric review.
Time-to-finding — median wall-clock from proposal arrival to reviewer verdict.
Coverage — fraction of diffs that exercise at least one rule in the reviewer’s rubric.
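
As an illustration of the first metric, block rate and its band check might be computed as below. The bands are the approximate figures quoted above; the agent name `cost-reviewer` is assumed by analogy with `security-reviewer`.

```python
# Approximate target bands from this page; "cost-reviewer" is an assumed name.
BLOCK_RATE_BANDS = {
    "security-reviewer": (0.05, 0.15),
    "cost-reviewer": (0.10, 0.25),
}


def block_rate(reviews: list) -> float:
    """Fraction of reviews that contain at least one blocker finding."""
    if not reviews:
        return 0.0
    blocked = sum(
        any(f["severity"] == "blocker" for f in review.get("findings", []))
        for review in reviews
    )
    return blocked / len(reviews)


def in_band(reviewer_agent: str, rate: float) -> bool:
    """True when the rate sits inside the reviewer's target band."""
    low, high = BLOCK_RATE_BANDS[reviewer_agent]
    return low <= rate <= high


history = [{"findings": [{"severity": "blocker"}]}] + [{"findings": []}] * 9
rate = block_rate(history)
print(rate, in_band("security-reviewer", rate))
```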

Example: a security reviewer catching a regression

A pack diff adds a new tool, adp_support.fetch_thread. The Architecture reviewer passes; the Security reviewer fails with one blocker and one must-fix:

{
  "review_id": "rv_2026_05_08_b04",
  "reviewer_agent": "security-reviewer",
  "reviewer_version": "1.4.0",
  "subject": {
    "kind": "tool_addition",
    "tool_name": "adp_support.fetch_thread",
    "diff_ref": "git:repo@7777eaa..b21c5d4"
  },
  "status": "fail",
  "findings": [
    {
      "finding_id": "f_01",
      "severity": "blocker",
      "file": "harness/tools/support.fetch_thread.json",
      "issue": "Output schema includes raw user_email and phone_e164 with no redaction profile bound",
      "policy_id": "no_plaintext_sensitive_data_logging",
      "recommendation": "Bind redaction_profile=customer_pii_v3 on output_schema; or move fields behind a delegated read."
    },
    {
      "finding_id": "f_02",
      "severity": "must-fix",
      "file": "harness/tools/support.fetch_thread.json",
      "issue": "capability_class=observe but tool returns identity-mapped fields; should be capability_class=recall with consent evidence",
      "policy_id": "capability_class_matches_side_effect",
      "recommendation": "Reclassify as recall; require consent_record_ref in the input_schema."
    }
  ]
}

The blocker denies promotion, and the proposal re-enters change-control as changes_requested. The Improvement Loop captures the pattern as gap_detected so the Security reviewer’s rubric can be extended. The must-fix needs either the reclassification or an ADR explaining the deferral.
