Reviewer Agents
The seven canonical reviewer agents — architecture, security, reliability, product, data, cost, compliance — their concerns, output schema, and integration with approval gates.
Reviewer Agents are specialized harness components that inspect proposed changes — pack diffs, policy edits, tool additions, planner skill updates, agent-generated PRs — before they reach a human approver. They are not a replacement for human review. They are how humans stop being the linear-scaling bottleneck on mechanical checks so they can focus on judgment, architecture, and risk.
A reviewer agent is itself a versioned harness primitive: it has a skill, a rubric, an output schema, a golden set, and a release gate. It is treated like any other component of the Improvement Loop — its findings produce typed proposals that land in the same change-control queue as everything else.
Why specialized reviewers, not one giant reviewer
A single “review the change” agent fails in three predictable ways:
- Concerns get blurred. Security findings get diluted by stylistic noise; cost regressions hide behind feature wins.
- Severity calibration drifts. A reviewer that sees everything calibrates to the median, missing tail-risk findings.
- Ownership becomes diffuse. Nobody owns the rubric, so it ages silently.
Specialized reviewers solve all three: each has a single concern, a tightly-scoped rubric, and a named owner. Severity is calibrated against a concern-specific golden set, not a generic one.
The seven canonical reviewers
| Reviewer | Owns the answer to |
|---|---|
| Architecture | Does this change respect plane boundaries and dependency direction? |
| Security | Does this change leak PII or secrets, or weaken sandbox isolation? |
| Reliability | Will this change behave under partial failure, retries, or rollback? |
| Product | Does this change actually serve the user intent at the edges? |
| Data | Does this change preserve evidence coverage, event schema, and analytics signal? |
| Cost | Does this change keep run-budget headroom and tool-call counts within target? |
| Compliance | Does this change preserve audit, consent, and regulated-action gates? |
Architecture reviewer
Owns: plane boundaries, dependency direction, primitive layering.
Checks include:
- No direct DB call from a tool implementation (tools must go through the Adapter Mesh).
- No cross-domain dependency without an adapter manifest.
- No business action without a decision_id declared in the Decision Catalog.
- No tool added without a schema and capability class.
- No agent response written without a trace_id propagated end-to-end.
Severity floor: any finding that breaks the reference architecture is blocker.
Security reviewer
Owns: sensitive data, secrets, auth, sandbox profile, injection, isolation.
Checks include:
- No plaintext sensitive field in logs or traces (policy_id: no_plaintext_sensitive_data_logging).
- All secrets resolved through the secrets adapter; no inline credentials.
- Tool sandbox profile matches the tool’s side-effect classification.
- Identity propagation (CEID, SID, agent workload identity) is preserved across every Tool Gateway call.
- No prompt fragment includes user-controlled content without explicit injection-safe rendering.
Severity floor: any sensitive-data leak is blocker and emits a security event regardless of approval state.
Reliability reviewer
Owns: timeouts, retries, fallbacks, idempotency, rollback paths.
Checks include:
- Every tool call has an explicit timeout; no inherited defaults from a generic HTTP layer.
- Retries are scoped (max attempts, backoff, idempotency key); no unbounded retry loops.
- Every destructive tool emits a reversal token, idempotency key, or compensating action.
- Failure-playbook coverage exists for the typed verdicts the change can produce.
- New planner / executor / critic skills declare their loop guard and budget posture.
Severity floor: a destructive tool without a reversal path is blocker.
Product reviewer
Owns: user experience, intent fidelity, edge cases.
Checks include:
- Response is clear, minimal, and useful at the user surface.
- Edge cases on the intent (empty result, expired state, partial failure) are explicitly handled.
- User-facing messages do not expose internal system details, internal IDs, or stack-shaped strings.
- Clarification questions are asked only when the harness genuinely lacks the information; otherwise the agent acts on its best evidence and says so.
- Tone, naming, and surface area match the product principles registered in the repo.
Severity floor: leaking internal details to the end user is blocker.
Data reviewer
Owns: event schema, evidence coverage, analytics impact.
Checks include:
- Every DecisionRecord carries the evidence_refs declared by the Decision Spec.
- New event types are registered in the schema registry before they ship.
- Analytics events include the canonical correlation IDs (trace_id, run_id, decision_id); no event is “for product” only.
- Memory promotions (working → episodic → semantic → durable) carry the consent and contradiction evidence required by the Memory doc.
Severity floor: a DecisionRecord without resolvable evidence_refs is blocker.
Cost reviewer
Owns: tokens, infra cost, retrieval cost, run-budget headroom.
Checks include:
- Token usage per decision stays within the intent’s RunBudget band; new pack versions cannot regress economics by more than the configured guard.
- Tool-call count per decision is bounded; loop guards activate before the budget is exhausted.
- Retrieval breadth (top_k, max_hops) is justified by an evaluator metric, not by intuition.
- Caching: any new prompt fragment that can be reused is part of the cache surface.
- Model selection: the AI Gateway & LLM Router tier is appropriate for the intent’s risk class.
Severity floor: a regression on economics_cents_per_decision greater than the guardrail is blocker.
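The cost severity floor can be sketched as a small arithmetic check. This is a minimal illustration, assuming the guardrail is expressed as a relative fraction of the baseline (the text does not say whether it is relative or absolute); the function names and the guard_fraction parameter are hypothetical.

```python
from typing import Optional


def economics_regression(baseline_cents: float, candidate_cents: float) -> float:
    """Relative change in cost per decision; positive means the candidate is more expensive."""
    if baseline_cents <= 0:
        raise ValueError("baseline_cents must be positive")
    return (candidate_cents - baseline_cents) / baseline_cents


def cost_floor_severity(
    baseline_cents: float, candidate_cents: float, guard_fraction: float
) -> Optional[str]:
    """Return "blocker" when the regression exceeds the guardrail, otherwise None.

    guard_fraction is a hypothetical relative guard, e.g. 0.10 for a 10% limit.
    """
    regression = economics_regression(baseline_cents, candidate_cents)
    return "blocker" if regression > guard_fraction else None
```

For example, moving from 4.0 to 4.6 cents per decision is a 15% regression, which exceeds a 10% guard and would be reported as a blocker; a cheaper candidate never triggers the floor.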
Compliance reviewer
Owns: audit, consent, regulated actions, ApprovalMode binding.
Checks include:
- Every regulated action has a typed gate and a named approver role.
- ApprovalMode binding (read_only / local_write / network / delegated / destructive) matches the action’s true side-effect class; lower effective modes require an explicit policy rule and rationale.
- Consent records exist for any action touching regulated data.
- Audit trail covers the full lifecycle: proposal, approval, execution, reversal.
- Regulatory mapping (NIST AI RMF, ISO/IEC 42001, sector-specific frames) is current for any new control.
Severity floor: any destructive or regulated action without a binding gate is blocker.
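The ApprovalMode binding check above might look roughly like the following. This sketch assumes the five modes are ordered from least to most privileged in the order the text lists them, which the text does not state explicitly; check_mode_binding and its parameters are hypothetical names. A binding higher than the side-effect class is not flagged here, which is also an assumption.

```python
from typing import Optional

# Assumed ordering, least to most privileged, taken from the order listed in the text.
APPROVAL_MODES = ("read_only", "local_write", "network", "delegated", "destructive")


def check_mode_binding(
    bound_mode: str,
    side_effect_class: str,
    policy_rule: Optional[str] = None,
    rationale: Optional[str] = None,
) -> Optional[str]:
    """Return an issue string when the binding is too low and unjustified, else None."""
    bound = APPROVAL_MODES.index(bound_mode)  # raises ValueError on an unknown mode
    required = APPROVAL_MODES.index(side_effect_class)
    if bound >= required:
        return None
    # A lower effective mode is acceptable only with both a policy rule and a rationale.
    if policy_rule and rationale:
        return None
    return (
        f"ApprovalMode {bound_mode!r} is lower than side-effect class "
        f"{side_effect_class!r} with no policy rule and rationale"
    )
```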
Output schema
Every reviewer emits the same envelope, regardless of concern:
{
  "review_id": "rv_2026_05_08_a17",
  "reviewer_agent": "security-reviewer",
  "reviewer_version": "1.4.0",
  "subject": {
    "kind": "pack_change",
    "pack_id": "ctxpack.support",
    "from_version": "5.1.0",
    "to_version": "5.2.0",
    "diff_ref": "git:repo@a13b9c2..7777eaa"
  },
  "status": "fail",
  "findings": [
    {
      "finding_id": "f_01",
      "severity": "blocker",
      "file": "harness/tools/payment.refund.json",
      "issue": "Reversal token field is missing from the tool manifest",
      "policy_id": "no_destructive_action_without_reversal",
      "recommendation": "Add reversal_token to the output_schema; reference issuer_adapter."
    }
  ],
  "evidence_refs": ["dr_2026_05_07_q88"],
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "completed_at": "2026-05-08T09:14:00Z"
}
Severity vocabulary is fixed: blocker, must-fix, should-fix, nit. A blocker from any reviewer prevents promotion past the relevant rollout stage; a must-fix requires either the fix or an ADR (decision record) explaining the deferral.
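A consumer of the envelope could check its shape and the fixed severity vocabulary before acting on it. The sketch below is illustrative only: the required key sets are inferred from the example envelope rather than from a published schema, and validate_review is a hypothetical helper.

```python
from typing import Any

# Fixed severity vocabulary from the text, ordered most to least severe.
SEVERITIES = ("blocker", "must-fix", "should-fix", "nit")

# Keys observed in the example envelope; treating them as required is an assumption.
REQUIRED_ENVELOPE_KEYS = {
    "review_id", "reviewer_agent", "reviewer_version",
    "subject", "status", "findings",
}
REQUIRED_FINDING_KEYS = {"finding_id", "severity", "issue", "recommendation"}


def validate_review(review: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the envelope looks well-formed."""
    problems = []
    missing = REQUIRED_ENVELOPE_KEYS - review.keys()
    if missing:
        problems.append(f"missing envelope keys: {sorted(missing)}")
    for i, finding in enumerate(review.get("findings", [])):
        missing_f = REQUIRED_FINDING_KEYS - finding.keys()
        if missing_f:
            problems.append(f"finding {i}: missing keys {sorted(missing_f)}")
        if finding.get("severity") not in SEVERITIES:
            problems.append(f"finding {i}: unknown severity {finding.get('severity')!r}")
    return problems
```

Rejecting unknown severities at this point keeps a label such as "critical" from silently bypassing the blocker and must-fix handling downstream.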
Integration with approval gates
Reviewer findings are inputs to the governance approval gate, not a replacement for it.
proposal arrives
↓
reviewers fan out in parallel
↓
findings aggregate by severity
↓
if any blocker: gate denied, proposal re-enters change-control as 'changes_requested'
if any must-fix: gate held; either fix lands or ADR is filed
if only should-fix / nit: gate goes to human approver with summary
↓
human approver makes the final call (judgment, risk, scope)
↓
if approved: rollout begins at 0%_shadow per the harness rollout stages
This split is deliberate: reviewers are excellent at mechanical and pattern-driven checks, and humans are excellent at judgment under ambiguity. Reviewers should never auto-approve. They can only auto-deny on a blocker.
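The aggregation step in the flow above can be sketched as a pure function over reviewer envelopes. GateState and aggregate_gate are hypothetical names; the three outcomes mirror the flow, and there is deliberately no approved state because only the human approver can approve.

```python
from enum import Enum


class GateState(Enum):
    DENIED = "denied"              # proposal re-enters change-control as changes_requested
    HELD = "held"                  # waits for a fix to land or an ADR to be filed
    HUMAN_REVIEW = "human_review"  # forwarded to the human approver with a summary


def aggregate_gate(reviews: list[dict]) -> GateState:
    """Collapse findings from all reviewers into a single gate state by worst severity."""
    severities = {
        finding["severity"]
        for review in reviews
        for finding in review.get("findings", [])
    }
    if "blocker" in severities:
        return GateState.DENIED
    if "must-fix" in severities:
        return GateState.HELD
    # Only should-fix / nit findings, or no findings at all: a human still decides.
    return GateState.HUMAN_REVIEW
```

A clean run from every reviewer still returns HUMAN_REVIEW, which encodes the rule that reviewers never auto-approve.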
Reviewer skills are first-class harness components
Each reviewer is a versioned skill in harness/reviewers/:
harness/reviewers/
  architecture/
    skill.md          # role, scope, rubric
    policies.json     # JsonLogic checks the reviewer applies
    fixtures/         # known-good and known-bad changes
    golden_set.json   # labeled findings for evaluator scoring
  security/
    skill.md
    policies.json
    fixtures/
    golden_set.json
  ...
The skill follows the same minimal-skill pattern that works elsewhere in the harness (Meta-Harness Appendix D calls this out as the strongest single lever on search quality): constrain what is forbidden and what artifacts to produce, but leave the reviewer free to inspect anything.
A reviewer’s golden set is exercised on every reviewer-skill change. False-positive rate, false-negative rate, and severity-calibration drift are tracked the same way evaluator metrics are tracked in Evaluation and Observability.
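Scoring a reviewer against its golden set could be sketched as a set comparison. The format of golden_set.json is not given in the text, so this example assumes each finding is identified by a (file, policy_id) pair, and it defines both rates with explicit denominators; those definitions and the name score_against_golden are assumptions.

```python
def score_against_golden(produced: set, expected: set) -> dict:
    """Compare produced findings with golden labels.

    false_positive_rate: share of produced findings that are not in the golden labels.
    false_negative_rate: share of golden labels that the reviewer did not produce.
    """
    false_positives = produced - expected
    false_negatives = expected - produced
    return {
        "false_positive_rate": len(false_positives) / len(produced) if produced else 0.0,
        "false_negative_rate": len(false_negatives) / len(expected) if expected else 0.0,
    }
```

Running this on every reviewer-skill change and storing the result per reviewer version would give the trend lines that the drift tracking needs.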
Failure modes
- Reviewer becomes a style cop. Findings devolve into nits and humans stop reading. Mitigation: severity floor by concern; cap nit-rate per review.
- Reviewer over-blocks. Every change becomes a blocker; throughput collapses. Mitigation: track per-reviewer block rate; rebaseline rubric quarterly.
- Reviewer drifts. New patterns appear in the codebase that the rubric doesn’t cover. Mitigation: feed Improvement-Loop insights of kind gap_detected back into reviewer rubrics.
- Reviewer is bypassed. A team adds an “emergency lane” that skips reviewers. Mitigation: emergency handling is a typed approval-gate state on the existing approval-mode taxonomy, emits a security event, and forces post-hoc review.
- Reviewer hallucinates findings. Mitigation: every finding must cite a policy_id from the policy bundle or a rule from the reviewer’s own rubric; ungrounded findings are dropped at the gate.
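The grounding rule in the last mitigation can be sketched as a filter applied at the gate. The policy_id field comes from the output envelope; the rule_id field for rubric-local rules and the function name drop_ungrounded are hypothetical.

```python
def drop_ungrounded(findings, policy_ids, rubric_rule_ids):
    """Split findings into (grounded, dropped) based on whether they cite a known rule.

    A finding is grounded if its policy_id is in the policy bundle, or its
    rule_id (a hypothetical field) is in the reviewer's own rubric.
    """
    known_policies = set(policy_ids)
    known_rules = set(rubric_rule_ids)
    grounded, dropped = [], []
    for finding in findings:
        cites_policy = finding.get("policy_id") in known_policies
        cites_rule = finding.get("rule_id") in known_rules
        (grounded if cites_policy or cites_rule else dropped).append(finding)
    return grounded, dropped
```

Keeping the dropped findings, rather than discarding them silently, leaves a record that can be reviewed for reviewer drift.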
Operational concerns
- Reviewers run in parallel; the slowest reviewer sets gate latency.
- Reviewer cost budget is separate from production RunBudget; a reviewer cannot block an emergency rollback by exceeding its budget.
- Reviewer skills are versioned independently; pinning a reviewer to an older skill is allowed for replay but never for live gating.
- Reviewer outputs are append-only and replayable; the same diff against the same reviewer version must produce the same findings.
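The replay guarantee in the last bullet could be checked by fingerprinting the parts of an envelope that must be stable across runs. This is a sketch under assumptions: which fields count as run-specific (review_id, trace_id, completed_at are excluded here) is a guess based on the example envelope, and findings_fingerprint is a hypothetical helper.

```python
import hashlib
import json


def findings_fingerprint(review: dict) -> str:
    """Hash the reviewer identity, the diff, and the findings in a canonical form.

    Run-specific fields (review_id, trace_id, completed_at) are deliberately
    left out so that two replays of the same diff can be compared.
    """
    payload = {
        "reviewer_agent": review["reviewer_agent"],
        "reviewer_version": review["reviewer_version"],
        "diff_ref": review["subject"]["diff_ref"],
        "findings": sorted(review["findings"], key=lambda f: f["finding_id"]),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two replays of the same diff against the same reviewer version should yield equal fingerprints; a mismatch signals nondeterminism in the reviewer.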
Evaluation metrics
- Block rate per reviewer: fraction of reviews ending in any blocker. Target band depends on concern (security ~5–15%, cost ~10–25%); outside-band signals miscalibration.
- False-positive rate: blocker findings overturned at human approval.
- False-negative rate: incidents whose root cause was a finding the reviewer missed.
- Severity calibration drift: distribution of severities over time; sharp shifts trigger rubric review.
- Time-to-finding: median wall-clock from proposal arrival to reviewer verdict.
- Coverage: fraction of diffs that exercise at least one rule in the reviewer’s rubric.
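Block rate and its target band can be computed directly from stored envelopes. In this sketch the security and cost bands are the approximate figures quoted above; the key "cost-reviewer" is assumed by analogy with "security-reviewer", and the function names are hypothetical.

```python
from typing import Optional

# Approximate target bands from the text; other concerns have no stated band.
BLOCK_RATE_BANDS = {
    "security-reviewer": (0.05, 0.15),
    "cost-reviewer": (0.10, 0.25),  # reviewer name assumed by analogy
}


def block_rate(reviews: list) -> float:
    """Fraction of reviews that contain at least one blocker finding."""
    if not reviews:
        return 0.0
    blocked = sum(
        1 for review in reviews
        if any(f["severity"] == "blocker" for f in review.get("findings", []))
    )
    return blocked / len(reviews)


def in_band(reviewer_agent: str, rate: float) -> Optional[bool]:
    """True or False when a band is known for the reviewer, None when it is not."""
    band = BLOCK_RATE_BANDS.get(reviewer_agent)
    if band is None:
        return None
    low, high = band
    return low <= rate <= high
```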
Example: a security reviewer catching a regression
A pack diff adds a new tool adp_support.fetch_thread. The Architecture reviewer passes; the Security reviewer fails:
{
  "review_id": "rv_2026_05_08_b04",
  "reviewer_agent": "security-reviewer",
  "reviewer_version": "1.4.0",
  "subject": {
    "kind": "tool_addition",
    "tool_name": "adp_support.fetch_thread",
    "diff_ref": "git:repo@7777eaa..b21c5d4"
  },
  "status": "fail",
  "findings": [
    {
      "finding_id": "f_01",
      "severity": "blocker",
      "file": "harness/tools/support.fetch_thread.json",
      "issue": "Output schema includes raw user_email and phone_e164 with no redaction profile bound",
      "policy_id": "no_plaintext_sensitive_data_logging",
      "recommendation": "Bind redaction_profile=customer_pii_v3 on output_schema; or move fields behind a delegated read."
    },
    {
      "finding_id": "f_02",
      "severity": "must-fix",
      "file": "harness/tools/support.fetch_thread.json",
      "issue": "capability_class=observe but tool returns identity-mapped fields; should be capability_class=recall with consent evidence",
      "policy_id": "capability_class_matches_side_effect",
      "recommendation": "Reclassify as recall; require consent_record_ref in the input_schema."
    }
  ]
}
The gate denies promotion; the Improvement Loop captures the pattern as gap_detected so the Security reviewer’s rubric can be extended; an ADR is filed if the team wants to defer the must-fix.
See also
- Harness Engineering — the discipline that this primitive composes into
- Governance — approval modes and policy bundles
- Improvement Loop — how reviewer findings feed change control
- Evaluation and Observability — how reviewers are themselves evaluated