Tag: Scorecards

7 essays tagged with Scorecards.

May 13, 2026·4 min read·Beginner

How to Judge AI Work: Scorecards, Not Vibes

A practical guide for business teams evaluating AI agents with scorecards, examples, traces, human corrections, and launch gates instead of demos and vibes.

May 13, 2026·6 min read·Beginner

Scorecards Before Screens: Evals and Launch Gates for PMs Building Agents

A PM guide to defining agent quality with datasets, trace reviews, scorecards, release gates, and business metrics before building the agent UI.

May 12, 2026·13 min read·Intermediate

How Great AI Engineers Build Agents: Datasets, Scores, and Harnesses That Improve

Why strong AI engineers build datasets, scorecards, traces, and improvement loops instead of treating agents as prompts plus tools.

May 12, 2026·6 min read·Intermediate

Scorecards Over Vibes: The Five Metrics That Keep Agents Honest

The five metrics that keep agents honest: policy, utility, latency, safety, and economics.

May 9, 2026·18 min read·Intermediate

The Agent Harness Audit: A Production Readiness Checklist for Governed AI Agents

A production readiness audit for agent harnesses: forty runtime controls grouped into eight evidence-backed outcomes.

May 2, 2026·5 min read·Intermediate

The Critic: verify, score, consolidate — in 80 Lines

A compact Critic implementation that verifies plans, scores outcomes, consolidates results, and records caveats.

April 5, 2026·6 min read·Intermediate

Wiring the Five Evaluators: Policy, Utility, Latency, Safety, Cost

A build-along for wiring policy, utility, latency, safety, and cost evaluators into a release-gated scorecard.
