Reviewers & improvement
Reviewer agents, rollouts, and operator corrections that become versioned StrategyRules.
Autotune the Harness: Baking the Improvement Loop into ContextOS
Autotune should not mean a script that edits prompts in production. In ContextOS it is a release-gated loop over harness artifacts: traces, scorecards, replay sets, bounded candidate changes, human approval, and staged rollout.
Building a Compliance Reviewer Agent in 60 Lines and a Golden Set
A concrete walkthrough of one reviewer agent: the typed envelope it emits, the rubric it follows, the golden set that pins its judgment, and where the envelope lands in change control. Build the compliance reviewer first; it is the cheapest one to start with.
Building a Reliability Reviewer Agent: 70 Lines Past the Compliance One
The compliance reviewer was the cheapest reviewer to start with. The reliability reviewer is the second cheapest, and copies most of the scaffolding. Same envelope, same change-control queue, different rubric — timeouts, retries, idempotency, fallbacks, and rollback declarations.
Pack Rollout in Five Stages: Shipping a Context Pack Without Blowing Up Production
0% shadow → 1% internal → 5% low-risk → 25% monitored → 100%. Each stage is a contract, not a vibe. Here are the routing rules, the scorecard-delta SQL, the advance-stage script, and the kill switch — concrete enough to run on Monday.
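As a rough illustration of "each stage is a contract", the stage ladder above can be sketched as a small state machine. The stage names and traffic percentages come from the summary; the `Rollout` class, the `advance` and `kill` helpers, the `min_delta` threshold, and the name "full" for the 100% stage are hypothetical and are not taken from the article's actual scripts.

```python
from dataclasses import dataclass

# Stage names and traffic shares follow the summary above.
# "full" is an assumed label for the 100% stage.
STAGES = [
    ("shadow", 0),
    ("internal", 1),
    ("low-risk", 5),
    ("monitored", 25),
    ("full", 100),
]


@dataclass(frozen=True)
class Rollout:
    """Hypothetical rollout state: current stage plus a kill-switch flag."""
    stage_index: int = 0
    killed: bool = False

    @property
    def stage(self):
        return STAGES[self.stage_index]


def advance(rollout, scorecard_delta, min_delta=0.0):
    """Move forward one stage only if the scorecard delta meets the threshold.

    A killed rollout never advances; the last stage is a fixed point.
    """
    if rollout.killed or scorecard_delta < min_delta:
        return rollout
    next_index = min(rollout.stage_index + 1, len(STAGES) - 1)
    return Rollout(stage_index=next_index)


def kill(rollout):
    """Kill switch: drop back to 0% traffic and block further advances."""
    return Rollout(stage_index=0, killed=True)
```

In this sketch a regression (a negative `scorecard_delta`) simply holds the pack at its current stage, while `kill` returns it to shadow and keeps it there until someone resets the state by hand.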
From Operator Correction to Released StrategyRule: The Improvement Loop, Coded
An operator overrides one refund decision. Three weeks later, that single override has become a versioned StrategyRule preventing the same class of mistake on every future run. Here is the chain — schemas, code, and replay — that turns that correction into shipped harness behavior.
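The chain from correction to released rule might look like the sketch below. Every field name, the `propose_rule` and `release_if_replay_passes` helpers, and the `decide` callback are assumptions made for illustration; the article's real schemas may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OperatorCorrection:
    """Hypothetical record of one operator override."""
    run_id: str
    original_decision: str
    corrected_decision: str
    reason: str


@dataclass(frozen=True)
class StrategyRule:
    """Hypothetical versioned rule derived from a correction."""
    rule_id: str
    version: int
    condition: str
    action: str
    source_run_id: str
    released: bool = False


def propose_rule(correction, rule_id, condition, prior_version=0):
    """Turn a correction into an unreleased candidate rule with a bumped version."""
    return StrategyRule(
        rule_id=rule_id,
        version=prior_version + 1,
        condition=condition,
        action=correction.corrected_decision,
        source_run_id=correction.run_id,
    )


def release_if_replay_passes(rule, replay_cases, decide):
    """Mark the rule released only if every replay case yields the expected decision.

    replay_cases is a list of (case, expected_decision) pairs and
    decide(rule, case) returns the decision the harness would make.
    An empty replay set never releases a rule.
    """
    passed = bool(replay_cases) and all(
        decide(rule, case) == expected for case, expected in replay_cases
    )
    return replace(rule, released=True) if passed else rule
```

Keeping the rule immutable and versioned means a failed replay leaves the candidate untouched, and each release can be traced back to the run that triggered it through `source_run_id`.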