Blog series
5 posts · 35 min read

Reviewers & improvement

Reviewer agents, staged rollouts, and operator corrections that become versioned StrategyRules.

1
May 12, 2026·11 min read

Autotune the Harness: Baking the Improvement Loop into ContextOS

Autotune should not mean a script that edits prompts in production. In ContextOS it is a release-gated loop over harness artifacts: traces, scorecards, replay sets, bounded candidate changes, human approval, and staged rollout.

2
March 15, 2026·6 min read

Building a Compliance Reviewer Agent in 60 Lines and a Golden Set

A concrete walkthrough of one reviewer agent: the typed envelope it emits, the rubric it follows, the golden set that pins its judgment, and where the envelope lands in change control. Make the compliance reviewer your first reviewer; it is the cheapest one to start with.

3
March 18, 2026·4 min read

Building a Reliability Reviewer Agent: 70 Lines Past the Compliance One

The compliance reviewer was the cheapest reviewer to start with. The reliability reviewer is the second cheapest, and copies most of the scaffolding. Same envelope, same change-control queue, different rubric — timeouts, retries, idempotency, fallbacks, and rollback declarations.

4
April 11, 2026·7 min read

Pack Rollout in Five Stages: Shipping a Context Pack Without Blowing Up Production

0% shadow → 1% internal → 5% low-risk → 25% monitored → 100%. Each stage is a contract, not a vibe. Here are the routing rules, the scorecard-delta SQL, the advance-stage script, and the kill switch — concrete enough to run on Monday.
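
As a rough illustration of the stage ladder in that summary, here is a minimal sketch, not the post's actual advance-stage script. The `Stage` type, the `next_stage` function, the stage names, and the choice to have the kill switch drop back to 0% shadow are all assumptions made for this example; only the percentages come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str          # hypothetical shorthand for the stage
    traffic_pct: int   # share of traffic routed to the new pack


# Percentages come from the summary above; the names are illustrative.
STAGES = (
    Stage("shadow", 0),
    Stage("internal", 1),
    Stage("low-risk", 5),
    Stage("monitored", 25),
    Stage("full", 100),
)


def next_stage(current: str, gate_passed: bool, kill_switch: bool = False) -> Stage:
    """Pick the stage to run next.

    Assumed policy: the kill switch returns to shadow (0%), a failed gate
    holds the current stage, and a passed gate advances exactly one stage.
    """
    names = [s.name for s in STAGES]
    idx = names.index(current)  # raises ValueError for an unknown stage
    if kill_switch:
        return STAGES[0]
    if not gate_passed:
        return STAGES[idx]
    return STAGES[min(idx + 1, len(STAGES) - 1)]
```

Because a passed gate moves up only one stage at a time, a pack cannot jump from 1% to 100% in a single step, which is one way to read "each stage is a contract".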

5
April 15, 2026·7 min read

From Operator Correction to Released StrategyRule: The Improvement Loop, Coded

An operator overrides one refund decision. Three weeks later, that single override has become a versioned StrategyRule preventing the same class of mistake on every future run. Here is the chain — schemas, code, and replay — that turns that correction into shipped harness behavior.
