Blog

Operator-grade essays on harness engineering, context packs, replay, evaluators, and running agents in production.

Get new posts by email

ContextOS essays, field notes, and implementation guides.

RSS remains available for feed readers.

Start with the path that matches your job

The best ContextOS essays are built as field guides: pick the role path, then go deeper through the series.


Start here

If you have read nothing yet, read these in order.

AI literacy series

Mental models for business leaders, domain experts, and operators learning how to think about real agentic systems.

May 26, 2026·16 min

AI Tokenomics: From Cost per Token to Cost per Trusted Outcome

AI tokenomics connects cost per token, agentic cost multipliers, routing, evals, governance, and cost per trusted outcome.

May 23, 2026·12 min

The Autonomy Budget: How Enterprises Should Decide What AI Agents Are Allowed to Do

A practical governance model for granting AI agents bounded authority based on risk, evidence, policy confidence, evals, and approval.

May 13, 2026·20 min

AI Agents for Business Leaders: Build the Airport, Not Just the Plane

A practical executive playbook for agentic AI: define the work, evidence, authority, scorecards, approvals, security, observability, and improvement loop.

May 13, 2026·4 min

Before Your Team Asks for an AI Agent, Map the Real Work

A practical guide for business teams mapping real work before building agents: actors, evidence, tools, decisions, risks, exceptions, and feedback loops.

May 13, 2026·4 min

Trusting AI at Work: Approvals, Boundaries, and Receipts

A plain-English guide to agent trust: what AI can read, draft, send, change, approve, and how receipts make decisions accountable.

May 13, 2026·4 min

How to Judge AI Work: Scorecards, Not Vibes

A practical guide for business teams evaluating AI agents with scorecards, examples, traces, human corrections, and launch gates instead of demos and vibes.

May 13, 2026·8 min

AI Does Not Launch Once: Feedback Loops After Go-Live

A plain-English guide to operating agents after launch: corrections, recurring failures, proposal queues, rollout, rollback, and review.

Product management series

How product managers shape real agentic systems with intents, authority, scorecards, rollout gates, and improvement loops.

Agent engineering series

How strong AI engineers build agents with datasets, scorecards, traces, and harness improvement loops.

Architecture & foundations

The five planes, why prompts alone do not scale, what context engineering means.

Building the runtime

Compile, gateway, Critic, evaluators, failure handling — the per-request pipeline.

Trust, audit, governance

Replay, approval modes, approval-gate handshakes, and the security boundary.

Memory & evidence

How agents remember, what gets promoted, how knowledge is grounded.

Enterprise use cases

Concrete agentic workflows for incident response, financial crime, regulated operations, data stewardship, and software delivery.

Reviewers & improvement

Reviewer agents, rollouts, operator corrections becoming versioned StrategyRules.