Harness Engineering

22 essays tagged with Harness Engineering.

May 19, 2026·33 min read·Intermediate

Agent Harness: An Architectural Framework for Production AI Agents

A whitepaper on typed contracts, policy gates, traces, verification loops, and release control for production AI agents.

May 14, 2026·7 min read·Intermediate

Harness Improvement Loops Need Replayable Environments

Why harness improvement needs replayable episodes, bounded mutations, scorecards, source closure, and promotion gates.

May 13, 2026·18 min read·Beginner

Product Managers: How to Think About and Build Complex Agentic Systems

A practical PM guide to building agentic systems with workflow maps, intents, context packs, tools, records, evals, and rollout gates.

May 12, 2026·17 min read·Intermediate

How to Develop an Agent with an Agent Harness, End to End

An end-to-end field guide for building agents as measurable harnesses: context, planning, tools, records, evals, rollout, and learning.

May 12, 2026·11 min read·Intermediate

Autotune the Harness: Baking the Improvement Loop into ContextOS

How ContextOS treats autotune as a gated loop over traces, scorecards, replay sets, bounded candidates, approval, and rollout.

May 12, 2026·13 min read·Intermediate

How Great AI Engineers Build Agents: Datasets, Scores, and Harnesses That Improve

Why strong AI engineers build datasets, scorecards, traces, and improvement loops instead of treating agents as prompts plus tools.

May 12, 2026·6 min read·Intermediate

Harness Candidates Are Model Checkpoints: How to Improve Agents Without Silent Mutation

How to treat every prompt, retrieval, tool, policy, and evaluator change as a scored, reviewed, reversible harness candidate.

May 9, 2026·9 min read·Intermediate

AGENTS.md Done Right: The Navigation File That Actually Helps Coding Agents

How to write AGENTS.md as a short, scoped, testable navigation file for coding agents instead of a bloated prompt dump.

May 9, 2026·21 min read·Intermediate

The Agent Harness Audit: A Production Readiness Checklist for Governed AI Agents

A production readiness audit for agent harnesses: forty-four runtime controls grouped into eight evidence-backed outcomes.

May 9, 2026·6 min read·Intermediate

Replay Harness in Code: Reproducing a DecisionRecord Byte-for-Byte

A TypeScript build-along for replay: input loading, hash-chain verification, canonical loop replay, and DecisionRecord diffing.

May 8, 2026·5 min read·Intermediate

End-to-End Refund: How 12 Primitives Compose in One Production Run

A single refund run traced through 12 ContextOS primitives, from invokeAgent envelope to byte-equal replay.

May 7, 2026·6 min read·Intermediate

Failure Playbooks: The Typed Verdict Map

How to replace generic retry loops with typed failure verdicts, compensations, escalation paths, and reversal-token checks.

May 6, 2026·5 min read·Intermediate

Approval Gates in Code: The Destructive-Mode Handshake

A build-along for approval gates: frozen evidence, human signatures, gateway redemption, and replayable destructive-action handshakes.

May 5, 2026·5 min read·Intermediate

Build the Tool Gateway: The Boundary That Actually Stops a Bad Action

A build-along for the Tool Gateway: adapter manifests, typed envelopes, resolver checks, dispatch, and destructive-action boundaries.

May 2, 2026·5 min read·Intermediate

The Critic: verify, score, consolidate — in 80 Lines

A compact Critic implementation that verifies plans, scores outcomes, consolidates results, and records caveats.

April 25, 2026·6 min read·Intermediate

Promotion-Aware Memory: Capture, Review, Promote, Recall in Code

A build-along for agent memory: capture, review, promote, recall, contradiction checks, and governed memory writes.

April 21, 2026·6 min read·Intermediate

Build the Context Pack Compiler: Eight Stages, Eight Files

A build-along for the Context Pack compiler: eight deterministic stages that turn runtime inputs into a typed compiled context.

April 15, 2026·7 min read·Intermediate

From Operator Correction to Released StrategyRule: The Improvement Loop, Coded

How one operator correction becomes a reviewed, replayed, versioned StrategyRule that prevents repeat agent failures.

April 11, 2026·7 min read·Intermediate

Pack Rollout in Five Stages: Shipping a Context Pack Without Blowing Up Production

A five-stage rollout model for Context Packs: shadow, internal, low-risk, monitored expansion, and full release, plus rollback.

April 5, 2026·6 min read·Intermediate

Wiring the Five Evaluators: Policy, Utility, Latency, Safety, Cost

A build-along for wiring policy, utility, latency, safety, and cost evaluators into a release-gated scorecard.

March 18, 2026·4 min read·Intermediate

Building a Reliability Reviewer Agent: 70 Lines Past the Compliance One

How to extend the reviewer pattern for reliability: timeouts, retries, idempotency, fallback behavior, and rollback declarations.

March 15, 2026·6 min read·Intermediate

Building a Compliance Reviewer Agent in 60 Lines and a Golden Set

How to build a compliance reviewer agent with a typed verdict envelope, rubric, golden set, and change-control queue.
