Multi-agent systems are easy to draw and hard to operate.
The common failure is an org chart made of prompts:
Research Agent -> Planning Agent -> Execution Agent -> QA Agent

It looks sophisticated. In practice it often creates more latency, murkier ownership, more context loss, and more places where nobody knows why the final answer happened.
Product managers need a stricter rule:
Add another agent only when the product needs a separate context, authority, tool surface, or scorecard.
Use the airport analogy again. A control tower does not create a “landing agent,” “runway agent,” and “weather agent” because the diagram looks cleaner. It separates responsibilities because the work has different data, timing, authority, and failure modes.
That is the ContextOS view of multi-agent products: a parent orchestrator, specialist lanes, a Critic, a Tool Gateway, and one final receipt.
Workflow, agent, or multi-agent?
Before designing a multi-agent product, decide the runtime shape.
| Shape | Product fit | PM warning |
|---|---|---|
| Single call | Short, low-risk answer | Do not overbuild |
| Fixed workflow | Known steps, predictable handoffs | Better than “agent” for many products |
| Planner / Executor / Critic | Adaptive tool use and recovery needed | Requires trace and budget discipline |
| Orchestrator + lanes | Parallel or specialized work creates measurable value | Must preserve one owner for final decision |
| Long-running session | Work spans hours, days, or systems | Requires checkpoints and progress contracts |
Anthropic’s effective-agent guidance makes the same practical distinction: workflows use predefined code paths, while agents dynamically direct tool use. The PM implication is simple: do not buy autonomy you cannot score.
The control tower pattern
In ContextOS, a complex multi-agent system should look like this:
- Parent Orchestrator: owns intent, RunContext, budget, and the final DecisionRecord
- Specialist Lanes: run bounded subtasks with scoped tools and context
- Critic: verifies plans, scores lane outputs, accepts or rejects synthesis
- Tool Gateway: enforces schemas, policy, approval modes, and audit
- DecisionRecord: records final outcome, evidence, approvals, trace, and replay handle

The parent orchestrator is the control tower. Specialist agents are crews. Crews can inspect and prepare. They do not clear the runway.
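The components above can be sketched as typed records. This is a minimal illustration, not a reference implementation; the field names (`RunContext`, `LaneOutput`, `DecisionRecord`, and their attributes) are assumptions drawn from the descriptions in this section.

```python
from dataclasses import dataclass

# Hypothetical types for the control-tower roles described above.

@dataclass
class RunContext:
    intent: str                 # e.g. "renewal.prepare_proposal"
    budget_tokens: int          # hard spend ceiling for this run
    allowed_tools: list[str]    # scoped tool surface for one lane

@dataclass
class LaneOutput:
    lane: str
    artifact: dict              # typed result, not a full transcript
    evidence_refs: list[str]    # every claim points at evidence

@dataclass
class DecisionRecord:
    outcome: str
    evidence: list[str]
    approvals: list[str]
    trace_id: str               # replay handle for audit

# Only the parent orchestrator mints the final DecisionRecord.
record = DecisionRecord(
    outcome="proposal_approved",
    evidence=["contract:123#clause-7"],
    approvals=["deal_desk"],
    trace_id="run-001",
)
```

The point of the types is the authority boundary: lanes emit `LaneOutput`, and only the parent assembles a `DecisionRecord`.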
When to split into specialist lanes
Use this test:
| Split condition | Example |
|---|---|
| Different evidence set | Contract review needs signed terms; billing needs SKU catalog |
| Different tool surface | Compliance can call screening tools; comms can draft emails |
| Different risk class | Intake is read-only; payment activation is destructive |
| Different evaluator | Legal accuracy and customer tone need different rubrics |
| Parallelizable work | KYC, contract extraction, and environment setup can run concurrently |
| Different owner | Legal, finance, support, and implementation have separate accountability |
If none of these are true, keep it in one workflow.
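The split test above can be made executable. This is a sketch under assumed names: the boundary fields are taken from the table, and `should_split` is a hypothetical helper, not an API.

```python
# Hypothetical predicate for the split test: a subtask earns its own lane
# only if at least one real boundary differs, or the work parallelizes.
BOUNDARIES = ("evidence_set", "tool_surface", "risk_class", "evaluator", "owner")

def should_split(parent: dict, subtask: dict) -> bool:
    differs = any(parent.get(b) != subtask.get(b) for b in BOUNDARIES)
    return differs or subtask.get("parallelizable", False)

# Same evidence, tools, risk, evaluator, and owner: keep one workflow.
assert not should_split({"owner": "legal"}, {"owner": "legal"})
```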
Multi-agent product anti-patterns
| Anti-pattern | Why it fails | Better pattern |
|---|---|---|
| Agent per department | Mirrors org politics, not work boundaries | Intent and evidence-based lanes |
| Worker can mutate final state | No single accountable decision | Parent accepts worker output before effects |
| Every worker sees everything | Context bloat and leakage | Scoped Context Pack per lane |
| Agent debate without evidence | More tokens, same uncertainty | Require evidence refs and Critic verdicts |
| No lane-specific evals | Cannot tell which specialist regressed | Score by lane and final outcome |
| Shared tool pool | Risk bleed across lanes | Tool Gateway per lane authority |
The PM should reject multi-agent diagrams that do not show authority, evidence, and final ownership.
Worked example: enterprise renewal desk
Goal:
Help account teams prepare, approve, and send enterprise renewal proposals.
The naive product idea:
A renewal agent that handles renewals.
The control tower version:
| Lane | Job | Context | Tools | Risk |
|---|---|---|---|---|
| Account Intake | Normalize account, renewal date, owners | CRM, account notes | read CRM | read_only |
| Usage Analyst | Analyze adoption and expansion signals | product analytics | query metrics | network |
| Contract Reviewer | Extract terms, renewal clauses, restrictions | contract repo | read contracts | read_only |
| Pricing Specialist | Draft pricing options | price book, discount policy | create quote draft | local_write |
| Risk Reviewer | Identify churn, legal, and finance risks | history, exceptions | policy eval | network |
| Comms Drafter | Draft customer-facing renewal narrative | approved facts | draft email | local_write |
| Deal Desk Gate | Approve discount or non-standard terms | full packet | approval gate | destructive |
The parent orchestrator owns the renewal packet and final DecisionRecord.
The PM spec for each lane
Each specialist lane needs a mini-spec:
```yaml
lane: pricing_specialist
parent_intent: renewal.prepare_proposal
mission: draft pricing options and discount rationale
context_pack:
  required:
    - account_tier
    - current_contract_value
    - usage_trend
    - approved_price_book
    - discount_policy
tools:
  allowed:
    - pricebook.lookup
    - quote.create_draft
  denied:
    - quote.send_to_customer
approval_mode: local_write
output:
  type: pricing_recommendation
  fields:
    - recommended_package
    - discount_percent
    - rationale
    - evidence_refs
evals:
  - discount_policy_compliance
  - margin_floor_preserved
  - rationale_evidence_coverage
```

If a lane cannot be specified this way, it is not ready to be a separate agent.
Parent orchestration rules
The parent orchestrator should have rules like:
- It may spawn lanes only from approved task templates.
- It must pass each lane a scoped RunContext.
- It must set lane budgets.
- It must reject lane outputs without required evidence refs.
- It must not let lane outputs directly produce side effects.
- It must synthesize one final plan.
- It must produce one final DecisionRecord.
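Several of these rules reduce to an acceptance gate the parent runs on every lane output. A sketch under assumed output shape (`evidence_refs`, `side_effects` keys are illustrative, not a standard schema):

```python
# Sketch of the parent's acceptance gate: lane outputs without evidence
# refs, or that attempted side effects, are rejected before synthesis.
def accept_lane_output(output: dict, budget_remaining: int) -> tuple[bool, str]:
    if budget_remaining <= 0:
        return False, "budget_exhausted"
    if not output.get("evidence_refs"):
        return False, "missing_evidence_refs"
    if output.get("side_effects"):
        return False, "lanes_may_not_produce_side_effects"
    return True, "accepted"

# A draft with no evidence refs is rejected, not silently merged.
ok, reason = accept_lane_output({"artifact": {"discount_percent": 12}},
                                budget_remaining=500)
```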
This is orchestration, not "coordination by vibes."
The Critic is the product safety net
The Critic is not a “QA agent” bolted on at the end.
It verifies:
| Check | Product question |
|---|---|
| Plan validity | Is this path allowed for the intent? |
| Evidence sufficiency | Do we have the facts needed to decide? |
| Tool authorization | Are these tools allowed for this RunContext? |
| Approval mode | Is the right gate required before side effects? |
| Lane quality | Did each specialist return a typed, usable result? |
| Final receipt | Does the DecisionRecord explain the work? |
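Each row in the table above can become an executable check. A hedged sketch: `critic_verdict` and the plan/context field names are assumptions chosen to mirror the table, not a real Critic API.

```python
# Hypothetical Critic pass: each product question becomes a check, and an
# empty failure list means the synthesis is accepted.
def critic_verdict(plan: dict, run_context: dict) -> list[str]:
    failures = []
    if plan["intent"] not in run_context["allowed_intents"]:
        failures.append("plan_validity")
    if not plan.get("evidence_refs"):
        failures.append("evidence_sufficiency")
    unauthorized = set(plan.get("tools", [])) - set(run_context["allowed_tools"])
    if unauthorized:
        failures.append("tool_authorization")
    if plan.get("has_side_effects") and not plan.get("approval_gate"):
        failures.append("approval_mode")
    return failures  # empty list == accept
```

This is why the Critic is where acceptance criteria become executable: a PM-written rubric row maps one-to-one onto a check like the ones above.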
For PMs, the Critic is where many acceptance criteria become executable.
Context management for multi-agent products
Do not share one giant prompt across all agents.
Use per-lane context:
| Context strategy | PM meaning |
|---|---|
| Up-front briefing | Stable mission, policy, owner, output shape |
| Just-in-time retrieval | Let lane fetch specific evidence when needed |
| Compaction | Preserve decisions and open questions, drop raw chatter |
| Structured notes | Persist progress outside the context window |
| Parent summary | Return typed output, not full lane transcript |
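Per-lane scoping can be sketched directly. The evidence store and field names here are hypothetical; the point is that a lane's Context Pack is built from its spec's `required` list, never from a shared transcript.

```python
# Illustrative shared evidence store; a lane never sees all of it.
EVIDENCE_STORE = {
    "account_tier": "enterprise",
    "usage_trend": "expanding",
    "contract_text": "...",   # other lanes' evidence stays out of scope
}

def build_context_pack(required: list[str], briefing: str) -> dict:
    """Scoped Context Pack: stable briefing plus only the required evidence."""
    return {
        "briefing": briefing,
        "evidence": {k: EVIDENCE_STORE[k] for k in required},
    }

pack = build_context_pack(
    ["account_tier", "usage_trend"],
    briefing="Draft pricing options per discount policy.",
)
# "contract_text" is absent from the pack: scoped context, not a shared prompt.
```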
This follows the practical lesson from context engineering: context is finite and should be treated as an attention budget.
Product metrics for multi-agent systems
Do not only measure final task success.
Measure the system shape:
| Metric | Why it matters |
|---|---|
| Lane spawn rate | Detects unnecessary decomposition |
| Lane acceptance rate | Shows whether specialists produce useful artifacts |
| Parent rejection reasons | Reveals unclear lane contracts |
| Cross-lane contradiction rate | Shows context or policy conflicts |
| Tool denial rate by lane | Reveals authority mismatch |
| Critical path latency | Measures whether parallelism actually helps |
| Final DecisionRecord completeness | Determines audit readiness |
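Most of these metrics fall out of the trace log. A minimal sketch, assuming a hypothetical flat event format with one record per lane attempt:

```python
# Hypothetical trace events: one record per lane output the parent judged.
events = [
    {"lane": "pricing", "accepted": True},
    {"lane": "pricing", "accepted": False},
    {"lane": "risk",    "accepted": True},
]

def acceptance_rate(events: list[dict], lane: str) -> float:
    """Lane acceptance rate: share of a lane's outputs the parent accepted."""
    hits = [e for e in events if e["lane"] == lane]
    return sum(e["accepted"] for e in hits) / len(hits)
```

A lane whose acceptance rate stays low is the signal from the table above: the lane contract is unclear, and the parent's rejection reasons say how.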
If multi-agent architecture does not improve utility, latency, or risk control, remove it.
Rollout path
Roll out multi-agent systems by lanes:
- Shadow the parent workflow with no lane side effects.
- Enable one read-only lane.
- Add lane-specific scorecards.
- Enable parallel lanes only after trace review shows value.
- Add delegated actions behind approval gates.
- Add destructive paths last, with rollback rehearsed.
The safest launch is not “all agents on.” It is “one lane earns trust at a time.”
PM checklist
Before approving a multi-agent design, ask:
- Why is a fixed workflow not enough?
- Which lanes have different context, tools, risk, or evals?
- Who owns each lane?
- What typed artifact does each lane return?
- Can the parent reject a lane output?
- Which lane can create side effects?
- Which approval gates apply?
- What is the final DecisionRecord?
- Which trace shows the full parent/child path?
- What metric proves multi-agent is better than single-agent?
If the diagram cannot answer these questions, it is not architecture. It is decoration.