Agentic Context Engineering
The Context-plane discipline of compiling governed, bounded, observable context for each agent run.
Context as a compiled runtime artifact: declared in packs, resolved under policy, budgeted, traced, and improved from replay.
- RunContext with tenant, actor, intent, safety mode, budget, and trace
- Versioned Context Pack declarations
- Policy, tool, evidence, memory, and session state
- Pressure signals from prior evaluated runs
- CompiledContext
- Policy, tool, evidence, memory, and omission manifests
- Runtime controls and budget report
- Diagnostics for the Improvement Loop
- ContextPack
- CompiledContext
- BudgetReport
- ContextManifest
Agentic Context Engineering is the discipline of deciding what the model is allowed to see for a specific run, then making that decision reproducible.
It is not prompt writing. A prompt is one output of the Context plane. The real artifact is the CompiledContext: the bounded prompt plus the manifests, controls, omissions, source versions, and budget report that explain how it was produced.
Definition
Agentic Context Engineering turns a versioned Context Pack, a RunContext, and runtime evidence into a CompiledContext envelope.
The envelope must answer four questions:
| Question | Required artifact |
|---|---|
| What source material was eligible? | pack version, registry refs, policy refs, memory refs, evidence refs |
| What did the compiler include? | compiled prompt, bucket manifests, tool manifest, evidence manifest |
| What did it exclude or compress? | omission manifest, truncation reason, budget report |
| Which controls were active? | runtime controls, approval gates, redaction rules, loop guards |
If those artifacts are missing, the system may still be using good prompts, but it is not doing ContextOS-grade context engineering.
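The four questions can be turned into a mechanical completeness check. The sketch below is illustrative only: the field names are borrowed from the CompiledContext example on this page, while `REQUIRED_ARTIFACTS`, `missing_artifacts`, and the grouping of fields per question are assumptions, not part of any ContextOS schema.

```python
# Hypothetical mapping from the four questions to envelope fields.
# Field names follow the CompiledContext example; the grouping is an assumption.
REQUIRED_ARTIFACTS = {
    "eligible_sources": ["pack_ref"],
    "included": ["compiled_prompt_ref", "manifests"],
    "excluded_or_compressed": ["budget_report"],
    "active_controls": ["runtime_controls"],
}


def missing_artifacts(envelope: dict) -> dict[str, list[str]]:
    """Return, per question, the envelope fields that are absent."""
    gaps: dict[str, list[str]] = {}
    for question, fields in REQUIRED_ARTIFACTS.items():
        absent = [name for name in fields if name not in envelope]
        # The omission manifest sits under manifests.omissions in the example shape.
        if question == "excluded_or_compressed":
            if "omissions" not in envelope.get("manifests", {}):
                absent.append("manifests.omissions")
        if absent:
            gaps[question] = absent
    return gaps
```

An empty result means every question has at least a placeholder artifact; it says nothing about whether the artifacts are correct, which is what replay is for.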
Why It Exists
Enterprise agents fail when context assembly is informal:
- Policy is copied into prompts and quietly goes stale.
- Tool availability is implied rather than bound to identity and approval mode.
- Retrieval injects evidence without provenance, freshness, or access scope.
- Memory writes return as future context before consent or promotion.
- Token pressure silently drops the facts needed for an auditable decision.
The Context plane makes context assembly explicit. It lets the Decision plane reason over a known input envelope and lets the Trust plane replay the run later.
Compiler Contract
The Context Pack Compiler follows the same eight-stage shape used in How It Works:
| Stage | Purpose | Output |
|---|---|---|
| Intent classification | Resolve the requested intent and risk class. | intent binding, confidence, unresolved entities |
| Policy resolution | Evaluate relevant policy bundles against the run. | policy manifest, active gates, prohibitions |
| Tool surfacing | Intersect registry, permissions, policy, and safety mode. | tool manifest |
| Evidence retrieval | Retrieve scoped evidence under ontology, identity, and graph rules. | evidence manifest |
| Memory recall | Select eligible promoted memory only. | memory manifest |
| Token budget allocation | Allocate per-bucket limits from the run budget. | budget plan |
| Bucket assembly | Rank, redact, compress, and pack context blocks. | compiled prompt, omission manifest |
| Runtime controls | Emit controls for the Decision and Action planes. | loop guards, approval gates, refusal/escalation rules |
The compiler may use model assistance for classification, summarization, or compression, but the final envelope must still be typed, versioned, and traceable.
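A minimal sketch of the eight-stage shape, assuming each stage is a plain function that reads the accumulated state and returns its own output. The stage identifiers, `compile_context`, and the handler signature are hypothetical; the real compiler interface is owned by the implementation guide.

```python
from typing import Callable

# Stage identifiers mirror the table above; the snake_case names are assumptions.
STAGES = [
    "intent_classification",
    "policy_resolution",
    "tool_surfacing",
    "evidence_retrieval",
    "memory_recall",
    "token_budget_allocation",
    "bucket_assembly",
    "runtime_controls",
]

Stage = Callable[[dict], dict]


def compile_context(run_context: dict, handlers: dict[str, Stage]) -> dict:
    """Run every stage in order and record which stages executed."""
    missing = [name for name in STAGES if name not in handlers]
    if missing:
        # Refuse to compile a partial envelope rather than skipping a stage.
        raise ValueError(f"no handler for stages: {missing}")
    state: dict = {"run_context": run_context, "trace": []}
    for name in STAGES:
        state[name] = handlers[name](state)  # each stage sees all earlier outputs
        state["trace"].append(name)
    return state
```

Keeping the stage order in data rather than in control flow makes it straightforward to time each stage and to confirm during replay that none was skipped.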
Context Buckets
ContextOS uses buckets so teams can reason about pressure and governance consistently.
| Bucket | Typical content | Common failure |
|---|---|---|
| business | goals, tone, operating rules, domain constraints | generic behavior that ignores business reality |
| policy | active rules, gates, refusals, escalation conditions | model acts from stale or informal policy |
| tool | eligible capabilities, schemas, approval modes | planner calls tools it cannot execute |
| evidence | retrieved records, documents, graph facts, receipts | unsupported claims or wrong joins |
| memory | promoted episodic, semantic, and durable memory | contaminated or unapproved recall |
| session | recent conversation and run state | loss of continuity or repeated work |
Working state that only belongs to the current turn should remain session state. Durable recall belongs in memory only after the Memory Model promotion rules allow it.
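One way to picture per-bucket budgeting is a weighted split of the run budget across the six buckets. This is a sketch under stated assumptions: the integer weights, the `allocate_budget` helper, and the choice to hand the rounding remainder to the evidence bucket are all hypothetical, not ContextOS behavior.

```python
BUCKETS = ["business", "policy", "tool", "evidence", "memory", "session"]


def allocate_budget(limit_tokens: int, weights: dict[str, int]) -> dict[str, int]:
    """Split limit_tokens across buckets in proportion to integer weights."""
    total = sum(weights[bucket] for bucket in BUCKETS)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    plan = {bucket: limit_tokens * weights[bucket] // total for bucket in BUCKETS}
    # Assumption: tokens lost to integer rounding go to the evidence bucket.
    plan["evidence"] += limit_tokens - sum(plan.values())
    return plan
```

The plan always sums to the run budget, so any later omission can be attributed to a specific bucket limit rather than to an unexplained shortfall.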
CompiledContext Shape
The implementation guide owns the full schema. A useful mental model:
{
  "compiled_context_id": "cc_run_refund_001",
  "run_id": "run_refund_001",
  "pack_ref": "ctxpack.support@1.0.0",
  "compiled_prompt_ref": "sha256:...",
  "manifests": {
    "policy": ["POLICY_RETURNS_V1:R_HIGH_VALUE_REQUIRES_APPROVAL"],
    "tools": ["adp_orders.lookup", "adp_payments.issue_refund"],
    "evidence": ["kg:order:ord_881#snapshot_2026_05_09"],
    "memory": [],
    "omissions": [
      {
        "bucket": "session",
        "reason": "budget_pressure",
        "source_ref": "msg_older_than_window"
      }
    ]
  },
  "runtime_controls": {
    "max_tool_calls": 8,
    "max_replan_attempts": 2,
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"]
  },
  "budget_report": {
    "limit_tokens": 12000,
    "used_tokens": 7420,
    "truncated": true
  }
}

Quality Signals
Context quality is not a vibe. The compiler should emit signals that can be scored later:
| Signal | Meaning |
|---|---|
| Relevance | The block matched the current intent and entities. |
| Authority | The source is allowed and trusted for this decision type. |
| Freshness | The block is inside its validity window or snapshot pin. |
| Density | The block carries useful information per token. |
| Coverage | Required evidence and policy inputs are present. |
| Pressure | Budget pressure did not remove required material. |
These signals feed Evaluation and Observability and the Improvement Loop. They should be stored as diagnostics, not hidden in logs.
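Storing signals as diagnostics implies a typed record rather than free-form log text. A minimal sketch, assuming each signal is normalized to a score between 0 and 1; the `SignalRecord` type, the `weak_signals` helper, and the 0.5 floor are illustrative assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SignalRecord:
    """Hypothetical diagnostic record; scores are assumed to lie in [0, 1]."""
    subject_ref: str
    relevance: float
    authority: float
    freshness: float
    density: float
    coverage: float
    pressure: float


def weak_signals(record: SignalRecord, floor: float = 0.5) -> list[str]:
    """Return the names of signals scoring below the floor, sorted by name."""
    scores = asdict(record)
    scores.pop("subject_ref")  # keep only the numeric signals
    return sorted(name for name, value in scores.items() if value < floor)
```

A record like this can be attached to the compile trace and scored later without re-parsing logs.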
Autotune Surfaces
The Context plane is the safest first place to apply autotune because many changes are bounded, replayable, and reversible. A Context Pack may declare tunable surfaces, but the compiler must refuse proposals outside that declaration.
| Surface | Candidate examples | Guardrail |
|---|---|---|
| Retrieval | top_k, max_hops, source priority, freshness window | Required evidence coverage and authority score cannot regress. |
| Bucket budgets | evidence_tokens, memory_tokens, tool_tokens, compression threshold | Required blocks cannot be omitted under budget pressure. |
| Prompt fragments | Small instruction-block changes with token and forbidden-term limits | Policy, redaction, and tool manifests remain outside model discretion. |
| Memory recall | promoted-memory class filters, recency window, contradiction handling | Unpromoted or consent-missing memory cannot enter CompiledContext. |
| Runtime controls | loop guards, escalation hints, refusal messages | Approval gates and must-refuse rules cannot be weakened by a cost target. |
Every candidate should produce a TuningProposal that names the target intent, target metric, baseline pack, replay sets, expected scorecard delta, and rollback target. The Context plane does not promote the candidate; the Trust plane gates it through replay, review, and staged rollout.
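The refusal rule can be sketched as a check of a proposal against the pack's declared surfaces. The `TuningProposal` fields follow the list in the paragraph above; the `changes` mapping, the dotted surface names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TuningProposal:
    """Fields named in the text, plus a hypothetical `changes` mapping."""
    target_intent: str
    target_metric: str
    baseline_pack: str
    replay_sets: tuple[str, ...]
    expected_scorecard_delta: float
    rollback_target: str
    changes: dict = field(default_factory=dict)  # surface name -> candidate value


def undeclared_changes(proposal: TuningProposal, declared: set[str]) -> list[str]:
    """Surfaces the proposal touches that the pack never declared tunable."""
    return sorted(name for name in proposal.changes if name not in declared)


def accept_for_replay(proposal: TuningProposal, declared: set[str]) -> bool:
    """Admit a proposal to replay only; promotion stays with the Trust plane."""
    return (
        not undeclared_changes(proposal, declared)
        and bool(proposal.replay_sets)
        and bool(proposal.rollback_target)
    )
```

Acceptance here only means the proposal is well-formed and in scope. It does not promote anything.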
Boundary With Adjacent Pages
| Page | Owns |
|---|---|
| Cognitive Core | where the compiler sits in the runtime loop |
| Context Pack | the concrete pack schema and lifecycle |
| API Contracts | invocation and runtime envelope examples |
| Memory Model | what can be recalled or written back |
| Governance | policy and approval-mode taxonomy |
Agentic Context Engineering is the discipline. The Context Pack Compiler is the component. The Context Pack is the source artifact. CompiledContext is the runtime artifact.
Failure Modes
| Failure | Runtime response |
|---|---|
| Required evidence is missing | return a typed missing_evidence verdict before planning destructive action |
| Policy bundle fails to evaluate | fail closed for enforced policy; do not treat as “not fired” |
| Tool manifest conflicts with safety mode | exclude the tool and record the exclusion |
| Memory candidate has no promotion or consent record | omit from compiled context |
| Token budget drops a required block | reject or escalate instead of silently truncating |
| Source snapshot cannot be pinned | mark replay as incomplete and block high-risk decisions |
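The fail-closed row is the easiest one to get wrong in code, because an exception during evaluation can be mistaken for a rule that did not fire. A minimal sketch, assuming a policy rule is a callable over the run; `evaluate_rule`, the verdict strings, and the non-enforced branch are assumptions for illustration.

```python
from typing import Callable


def evaluate_rule(rule: Callable[[dict], bool], run: dict, enforced: bool = True) -> str:
    """Evaluate one policy rule and return a typed verdict string."""
    try:
        fired = rule(run)
    except Exception:
        # An evaluation error is never reported as "not_fired".
        # Assumption: non-enforced rules record the error instead of blocking.
        return "blocked" if enforced else "error_recorded"
    return "fired" if fired else "not_fired"
```

Callers can then treat `blocked` the same way as an active prohibition, which keeps a broken policy bundle from silently widening what the agent may do.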
Operational Metrics
- Context compile latency by stage.
- Required evidence coverage.
- Omission rate by bucket and reason.
- Tool manifest eligibility mismatch rate.
- Policy manifest evaluation error rate.
- Replay match rate for CompiledContext reconstruction.
- Budget pressure by intent and pack version.
- Decision quality delta after context-pack changes.
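As an illustration of one metric from this list, omission rate by bucket and reason can be derived directly from stored envelopes. The sketch assumes envelopes shaped like the CompiledContext example and defines the rate as omissions per compiled run; that definition and the function names are assumptions.

```python
from collections import Counter


def omission_counts(envelopes: list[dict]) -> Counter:
    """Count omissions keyed by (bucket, reason) across compiled envelopes."""
    counts: Counter = Counter()
    for envelope in envelopes:
        for omission in envelope.get("manifests", {}).get("omissions", []):
            counts[(omission["bucket"], omission["reason"])] += 1
    return counts


def omission_rate(envelopes: list[dict]) -> dict[tuple[str, str], float]:
    """Omissions per compiled run, keyed by (bucket, reason)."""
    if not envelopes:
        return {}
    total = len(envelopes)
    return {key: n / total for key, n in omission_counts(envelopes).items()}
```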
Example
For support.refund, the compiler should include:
- the active refund policy bundle,
- the eligible read and refund tools,
- order and customer evidence refs,
- the support.refund.execute decision binding,
- GATE_FINANCE_APPROVAL when the amount exceeds INR 3000,
- a budget report showing whether any evidence or session state was compressed.
The Planner can then propose work against a known envelope. The Critic can reject a plan if required context is missing before the refund tool is ever called.
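The refund example can be sketched in a few lines. The gate name, tool names, and INR 3000 threshold come from this page; `refund_gates`, `critic_accepts`, and the specific rejection rules are hypothetical simplifications of what a Critic might check.

```python
REFUND_GATE_THRESHOLD_INR = 3000  # threshold taken from the example above


def refund_gates(amount_inr: float) -> list[str]:
    """Approval gates to activate for a refund of the given amount."""
    if amount_inr > REFUND_GATE_THRESHOLD_INR:
        return ["GATE_FINANCE_APPROVAL"]
    return []


def critic_accepts(plan_tools: list[str], envelope: dict) -> bool:
    """Reject a plan that uses unsurfaced tools or lacks evidence refs."""
    manifests = envelope["manifests"]
    tools_ok = all(tool in manifests["tools"] for tool in plan_tools)
    evidence_ok = bool(manifests["evidence"])  # assumption: at least one ref required
    return tools_ok and evidence_ok
```

Because both checks read only the envelope, they can run before any tool call and can be repeated unchanged during replay.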
Common Misconceptions
- “Better prompts solve context.” Better prompts help, but governance comes from typed inputs, manifests, and replay.
- “Retrieval is the Context plane.” Retrieval is one input. The Context plane also resolves policy, tools, memory, budgets, and controls.
- “Summaries are always safe compression.” Summaries must carry source refs, transformation metadata, and quality signals.
- “The model can decide which policy matters.” Policy selection and enforcement happen outside model discretion.