Most failed agent PRDs are written like feature PRDs.
They describe a surface:
Users can ask the agent to help with vendor onboarding.
That is not enough. A real agentic product is a runtime that makes decisions, gathers evidence, calls tools, escalates exceptions, and learns from corrections. The PRD must describe the work contract, not only the UI.
The practical move is to turn the PRD into an Intent-Task Catalog. In ContextOS, intents are the stable names for work. Task templates are the approved ways that work may be done. Decision specs define the receipts. Approval modes define authority. Evals define whether the change is allowed to ship.
Think of the PRD as a city map. The Intent Catalog is the transit map: named routes, allowed stops, transfer points, restricted zones, and service guarantees.
Start with jobs, not prompts
The first PM question is not “what should the agent say?”
It is:
What recurring job should this system complete, and under what authority?
Use this table in discovery:
| Discovery question | Why it matters |
|---|---|
| Who performs the work today? | Identifies operator, reviewer, and owner roles |
| What business outcome changes if it works? | Prevents novelty features with no operational value |
| Which systems contain the evidence? | Becomes Context Pack and Tool Gateway scope |
| Which decisions are made? | Becomes Decision Specs |
| Which actions affect the world? | Becomes approval mode and policy scope |
| What failures are unacceptable? | Becomes must-never clauses and release gates |
| Who corrects mistakes today? | Becomes FeedbackStore and reviewer workflow |
If the team cannot answer these questions, the agent idea is not ready for implementation.
The PM spec shape
A PM-facing agent spec should have five sections:
product_outcome:
target: reduce vendor onboarding cycle time from 10d to 4d
user: procurement_ops_manager
customer: vendor_admin
work_scope:
supported_intents:
- vendor.onboarding.intake
- vendor.onboarding.compliance_check
- vendor.onboarding.contract_review
- vendor.onboarding.erp_setup
out_of_scope:
- negotiating new contract terms
- approving sanctions exceptions
authority:
default_mode: assist
delegated_actions:
- create_vendor_draft
- request_missing_documents
destructive_actions:
- activate_vendor_payment_profile
evidence:
required_sources:
- signed_contract
- tax_document
- bank_verification
- sanctions_screening_result
launch_gate:
shadow_runs: 100
policy_floor: 1.0
safety_floor: 1.0
max_operator_correction_rate: 0.12This looks more operational than a normal PRD because agentic products are operational systems.
Convert vague needs into intents
Bad:
Vendor onboarding agent.
Better:
| Intent | User request shape | Product outcome | Risk class |
|---|---|---|---|
vendor.onboarding.intake | ”Start onboarding this supplier” | Complete required fields and missing-document list | read_only |
vendor.onboarding.compliance_check | ”Can this supplier be approved?” | Evidence-backed compliance recommendation | network |
vendor.onboarding.contract_review | ”Does the contract support this setup?” | Extract obligations and conflicts | read_only |
vendor.onboarding.erp_setup | ”Prepare the vendor in ERP” | Draft vendor record for approval | delegated |
vendor.onboarding.payment_activation | ”Activate payments” | Execute only with finance approval | destructive |
Each row can be owned, tested, scored, and rolled out independently. That is the point.
Define task templates
An intent names the work. A task template defines the approved path.
Example:
task_template_id: task_vendor_compliance_check
intent_id: vendor.onboarding.compliance_check
owner_role: procurement_risk
risk_class: network
default_plan:
- id: collect_required_docs
tool: vendor_docs.lookup
- id: run_sanctions_screen
tool: compliance.sanctions_check
- id: compare_contract_terms
tool: contract.extract_obligations
- id: produce_decision
decision: vendor.compliance.recommendation
critic_requirements:
evidence_refs:
- signed_contract
- tax_document
- sanctions_screening_result
must_escalate:
- sanctions_match
- missing_bank_verification
- contract_region_mismatchThe Planner can adapt, but it adapts inside this envelope. That is how PM intent becomes runtime control.
Write must-never clauses
Every agent PRD needs a must-never section.
Not “avoid mistakes.” Specific constraints.
| Weak clause | Useful clause |
|---|---|
| Be accurate | Do not recommend approval without sanctions_screening_result evidence |
| Be safe | Do not activate payment profile without finance approval |
| Be helpful | If bank verification is missing, draft a vendor request instead of guessing |
| Be compliant | Escalate regulated-region conflicts to procurement risk |
Must-never clauses become policy rules, Critic checks, and release-gate tests.
Define authority before UI
Authority is the hidden source of most agent product bugs.
The PM must decide:
| Authority question | Product answer |
|---|---|
| Can the agent read data? | Which systems and tenants? |
| Can it draft changes? | Which drafts and who reviews? |
| Can it send messages? | Which channels and approval mode? |
| Can it commit changes? | Which actions, thresholds, and approvers? |
| Can it remember corrections? | Which memories require promotion? |
This maps to Governance, ApprovalMode, and RunContext.
Specify the receipt
The final answer is not enough.
For every intent, define the DecisionRecord the product expects.
decision_record:
intent: vendor.onboarding.compliance_check
must_include:
- vendor_id
- task_template_id
- context_pack_version
- policy_bundle_version
- evidence_refs
- tool_results
- critic_verdict
- unresolved_obligations
- escalation_reason
- trace_idThis is the product receipt. It is what support, compliance, and engineering inspect when something goes wrong.
Define acceptance criteria against traces
A PM should not accept “the response looked good” as a launch gate.
Use trace-based acceptance:
| Acceptance target | What to inspect |
|---|---|
| Intent accuracy | Did the trace classify the request correctly? |
| Context quality | Did the compiled context include required evidence? |
| Tool use | Did the agent choose the right tool with valid args? |
| Policy | Did the right approval mode apply? |
| Escalation | Did ambiguous cases route to human review? |
| Receipt | Did the DecisionRecord explain the outcome? |
This aligns with OpenAI’s agent eval guidance: traces are the fastest way to identify workflow-level issues while behavior is still being debugged; datasets and eval runs come after the failure modes are known.
The PM-owned eval seed set
Before engineering tunes prompts, the PM should provide the first eval seed set.
Start with 25 examples:
| Example type | Count | Purpose |
|---|---|---|
| Straight-through success | 5 | Shows the happy path |
| Missing evidence | 5 | Tests refusal and clarification |
| Policy denial | 5 | Tests must-refuse behavior |
| Approval required | 5 | Tests human gate routing |
| Ambiguous / edge | 5 | Tests escalation quality |
For each row, include:
input: "Please activate vendor ACME for payments."
expected_intent: vendor.onboarding.payment_activation
expected_verdict: gate_required
required_evidence:
- signed_contract
- bank_verification
- finance_approval
must_not:
- activate_without_gate
- claim approval already existsThis seed set becomes the start of the dev and release_test split.
The PRD review meeting
Do not review agent PRDs only with design and engineering.
Include:
- operator,
- policy owner,
- security or compliance reviewer,
- data owner,
- support lead,
- engineering lead,
- product analytics owner.
The goal is not consensus on every prompt. The goal is agreement on work boundaries, evidence, authority, and scorecards.
Done means cataloged
The PRD is ready when every supported behavior maps to a ContextOS artifact:
| PRD section | ContextOS artifact |
|---|---|
| User problem | Intent |
| Workflow path | TaskTemplate |
| Required evidence | Context Pack |
| Tool access | Tool Gateway manifest |
| Authority | RunContext + ApprovalMode |
| Product rule | Policy bundle |
| Outcome receipt | DecisionRecord |
| Launch criteria | Scorecard + eval set |
| Correction path | FeedbackStore + Improvement Loop |
That is how a product idea becomes an agentic system engineers can build safely.