Agentic Context Engineering

The Context-plane discipline of compiling governed, bounded, observable context for each agent run.

Foundational Spec
Last reviewed: 2026-05-12

At a glance

Context plane
Per-request compilation

Context as a compiled runtime artifact: declared in packs, resolved under policy, budgeted, traced, and improved from replay.

Inputs

RunContext with tenant, actor, intent, safety mode, budget, and trace
Versioned Context Pack declarations
Policy, tool, evidence, memory, and session state
Pressure signals from prior evaluated runs

Outputs

CompiledContext
Policy, tool, evidence, memory, and omission manifests
Runtime controls and budget report
Diagnostics for the Improvement Loop

Canonical types

ContextPack
CompiledContext
BudgetReport
ContextManifest

Agentic Context Engineering is the discipline of deciding what the model is allowed to see for a specific run, then making that decision reproducible.

It is not prompt writing. A prompt is one output of the Context plane. The real artifact is the CompiledContext: the bounded prompt plus the manifests, controls, omissions, source versions, and budget report that explain how it was produced.

Definition

Agentic Context Engineering turns a versioned Context Pack, a RunContext, and runtime evidence into a CompiledContext envelope.

The envelope must answer four questions:

Question	Required artifact
What source material was eligible?	pack version, registry refs, policy refs, memory refs, evidence refs
What did the compiler include?	compiled prompt, bucket manifests, tool manifest, evidence manifest
What did it exclude or compress?	omission manifest, truncation reason, budget report
Which controls were active?	runtime controls, approval gates, redaction rules, loop guards

If those artifacts are missing, the system may still be using good prompts, but it is not doing ContextOS-grade context engineering.
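The four questions can be read as a presence check on the envelope. The sketch below is illustrative only: the key names follow the mental-model JSON on this page, while the question-to-key mapping, `lookup`, and `unanswered_questions` are assumptions and not part of any ContextOS API. Registry refs are left out because the mental model does not show where they live.

```python
from typing import Any

# Hypothetical mapping from each question to the envelope keys that answer it.
REQUIRED_ARTIFACTS = {
    "eligible_sources": ["pack_ref", "manifests.policy",
                         "manifests.memory", "manifests.evidence"],
    "included": ["compiled_prompt_ref", "manifests.tools", "manifests.evidence"],
    "excluded_or_compressed": ["manifests.omissions", "budget_report"],
    "active_controls": ["runtime_controls"],
}


def lookup(envelope: dict, dotted_path: str) -> tuple[bool, Any]:
    """Return (found, value). Presence matters, not truthiness:
    an empty memory manifest is still an answer."""
    node: Any = envelope
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node


def unanswered_questions(envelope: dict) -> dict[str, list[str]]:
    """Map each question to the artifact paths that are absent."""
    missing = {}
    for question, paths in REQUIRED_ARTIFACTS.items():
        absent = [p for p in paths if not lookup(envelope, p)[0]]
        if absent:
            missing[question] = absent
    return missing
```

An envelope that carries only a prompt reference fails all four questions, which is the "good prompts, but not context engineering" case described above.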

Why It Exists

Enterprise agents fail when context assembly is informal:

Policy is copied into prompts and quietly goes stale.
Tool availability is implied rather than bound to identity and approval mode.
Retrieval injects evidence without provenance, freshness, or access scope.
Memory writes resurface as context in later runs before consent or promotion.
Token pressure silently drops the facts needed for an auditable decision.

The Context plane makes context assembly explicit. It lets the Decision plane reason over a known input envelope and lets the Trust plane replay the run later.

Compiler Contract

The Context Pack Compiler follows the same eight-stage shape used in How It Works:

Stage	Purpose	Output
Intent classification	Resolve the requested intent and risk class.	intent binding, confidence, unresolved entities
Policy resolution	Evaluate relevant policy bundles against the run.	policy manifest, active gates, prohibitions
Tool surfacing	Intersect registry, permissions, policy, and safety mode.	tool manifest
Evidence retrieval	Retrieve scoped evidence under ontology, identity, and graph rules.	evidence manifest
Memory recall	Select eligible promoted memory only.	memory manifest
Token budget allocation	Allocate per-bucket limits from the run budget.	budget plan
Bucket assembly	Rank, redact, compress, and pack context blocks.	compiled prompt, omission manifest
Runtime controls	Emit controls for the Decision and Action planes.	loop guards, approval gates, refusal/escalation rules

The compiler may use model assistance for classification, summarization, or compression, but the final envelope must still be typed, versioned, and traceable.
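One way to picture the contract is a fixed, ordered pipeline in which every stage must be present and every stage output is recorded. This is a minimal sketch under assumptions: `STAGE_ORDER`, `Draft`, and `compile_context` are hypothetical names, and the stage bodies are supplied by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Stage names taken from the table above, in the order listed.
STAGE_ORDER = (
    "intent_classification",
    "policy_resolution",
    "tool_surfacing",
    "evidence_retrieval",
    "memory_recall",
    "token_budget_allocation",
    "bucket_assembly",
    "runtime_controls",
)


@dataclass
class Draft:
    """Working state passed through the stages (hypothetical shape)."""
    run_context: dict
    outputs: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)


Stage = Callable[[Draft], dict]


def compile_context(run_context: dict, stages: dict[str, Stage]) -> Draft:
    """Run all eight stages in order; refuse to run a partial pipeline."""
    missing = [name for name in STAGE_ORDER if name not in stages]
    if missing:
        raise ValueError(f"stages not implemented: {missing}")
    draft = Draft(run_context=run_context)
    for name in STAGE_ORDER:
        # Each stage can read earlier outputs through draft.outputs.
        draft.outputs[name] = stages[name](draft)
        draft.trace.append(name)
    return draft
```

A stage may call a model internally, but what it returns is a plain typed value that lands in `outputs` and `trace`, which keeps the final envelope inspectable.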

Context Buckets

ContextOS uses buckets so teams can reason about pressure and governance consistently.

Bucket	Typical content	Common failure
`business`	goals, tone, operating rules, domain constraints	generic behavior that ignores business reality
`policy`	active rules, gates, refusals, escalation conditions	model acts from stale or informal policy
`tool`	eligible capabilities, schemas, approval modes	planner calls tools it cannot execute
`evidence`	retrieved records, documents, graph facts, receipts	unsupported claims or wrong joins
`memory`	promoted episodic, semantic, and durable memory	contaminated or unapproved recall
`session`	recent conversation and run state	loss of continuity or repeated work

Working state that belongs only to the current turn should remain session state. Durable recall belongs in memory only after the Memory Model promotion rules allow it.
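Because pressure is reasoned about per bucket, the token budget allocation stage can be pictured as splitting the run budget across the six buckets. The sketch below uses proportional weights, which is an assumption for illustration; the weights, `allocate_budget`, and the rounding rule are hypothetical and not taken from the spec.

```python
BUCKETS = ("business", "policy", "tool", "evidence", "memory", "session")


def allocate_budget(limit_tokens: int, weights: dict[str, float]) -> dict[str, int]:
    """Split limit_tokens across buckets in proportion to their weights.

    Buckets without a weight get zero tokens. Any rounding remainder goes to
    the highest-weight bucket so the plan always sums to limit_tokens.
    """
    unknown = set(weights) - set(BUCKETS)
    if unknown:
        raise ValueError(f"unknown buckets: {sorted(unknown)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    plan = {b: int(limit_tokens * weights.get(b, 0) / total) for b in BUCKETS}
    remainder = limit_tokens - sum(plan.values())
    largest = max(BUCKETS, key=lambda b: weights.get(b, 0))
    plan[largest] += remainder
    return plan


# Example weights are invented for illustration.
plan = allocate_budget(
    12000,
    {"business": 1, "policy": 2, "tool": 1, "evidence": 4, "memory": 1, "session": 1},
)
```

With these example weights the evidence bucket receives four tenths of the run budget. A real pack would likely also declare per-bucket floors for required blocks, which this sketch does not model.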

CompiledContext Shape

The implementation guide owns the full schema. A useful mental model:

{
  "compiled_context_id": "cc_run_refund_001",
  "run_id": "run_refund_001",
  "pack_ref": "ctxpack.support@1.0.0",
  "compiled_prompt_ref": "sha256:...",
  "manifests": {
    "policy": ["POLICY_RETURNS_V1:R_HIGH_VALUE_REQUIRES_APPROVAL"],
    "tools": ["adp_orders.lookup", "adp_payments.issue_refund"],
    "evidence": ["kg:order:ord_881#snapshot_2026_05_09"],
    "memory": [],
    "omissions": [
      {
        "bucket": "session",
        "reason": "budget_pressure",
        "source_ref": "msg_older_than_window"
      }
    ]
  },
  "runtime_controls": {
    "max_tool_calls": 8,
    "max_replan_attempts": 2,
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"]
  },
  "budget_report": {
    "limit_tokens": 12000,
    "used_tokens": 7420,
    "truncated": true
  }
}
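To make "typed" concrete, the mental model above can be loaded into frozen dataclasses so that a missing field fails at parse time rather than at decision time. This is a partial sketch covering only some of the fields shown; the class layout and `parse_compiled_context` are assumptions, and the implementation guide remains the authority on the real schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Omission:
    bucket: str
    reason: str
    source_ref: str


@dataclass(frozen=True)
class BudgetReport:
    limit_tokens: int
    used_tokens: int
    truncated: bool

    @property
    def headroom_tokens(self) -> int:
        return self.limit_tokens - self.used_tokens


@dataclass(frozen=True)
class CompiledContext:
    compiled_context_id: str
    run_id: str
    pack_ref: str
    compiled_prompt_ref: str
    omissions: tuple
    approval_gates_active: tuple
    budget_report: BudgetReport


def parse_compiled_context(text: str) -> CompiledContext:
    """Parse the JSON envelope; a missing key raises KeyError or TypeError."""
    raw = json.loads(text)
    return CompiledContext(
        compiled_context_id=raw["compiled_context_id"],
        run_id=raw["run_id"],
        pack_ref=raw["pack_ref"],
        compiled_prompt_ref=raw["compiled_prompt_ref"],
        omissions=tuple(Omission(**o) for o in raw["manifests"]["omissions"]),
        approval_gates_active=tuple(raw["runtime_controls"]["approval_gates_active"]),
        budget_report=BudgetReport(**raw["budget_report"]),
    )
```

Frozen dataclasses are used so the envelope cannot be mutated after compilation, which matches the idea of a runtime artifact that is replayed rather than edited.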

Quality Signals

Context quality is not a vibe. The compiler should emit signals that can be scored later:

Signal	Meaning
Relevance	The block matched the current intent and entities.
Authority	The source is allowed and trusted for this decision type.
Freshness	The block is inside its validity window or snapshot pin.
Density	The block carries useful information per token.
Coverage	Required evidence and policy inputs are present.
Pressure	Budget pressure did not remove required material.

These signals feed the Improvement Loop as well as Evaluation and Observability. They should be stored as diagnostics, not hidden in logs.
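A diagnostics record per context block might look like the following. The 0-to-1 scale, the `BlockDiagnostics` class, and the "weakest signal" helper are assumptions made for illustration; the page names the six signals but does not define how they are scored.

```python
from dataclasses import dataclass

SIGNALS = ("relevance", "authority", "freshness", "density", "coverage", "pressure")


@dataclass(frozen=True)
class BlockDiagnostics:
    """Scores for one context block, each assumed to lie in [0.0, 1.0]."""
    block_ref: str
    scores: dict

    def __post_init__(self):
        # Require exactly the six named signals so records stay comparable.
        if set(self.scores) != set(SIGNALS):
            raise ValueError(f"expected signals {SIGNALS}, got {sorted(self.scores)}")
        for name, value in self.scores.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} out of range: {value}")

    def weakest_signal(self) -> tuple:
        """Return the (name, score) pair with the lowest score."""
        return min(self.scores.items(), key=lambda item: item[1])


diag = BlockDiagnostics(
    block_ref="kg:order:ord_881#snapshot_2026_05_09",
    scores={"relevance": 0.9, "authority": 1.0, "freshness": 0.4,
            "density": 0.7, "coverage": 0.8, "pressure": 1.0},
)
```

Storing the record as structured data, rather than as a log line, is what lets a later scorer ask which signal was weakest for a given block.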

Autotune Surfaces

The Context plane is the safest first place to apply autotune because many changes are bounded, replayable, and reversible. A Context Pack may declare tunable surfaces, but the compiler must refuse proposals outside that declaration.

Surface	Candidate examples	Guardrail
Retrieval	`top_k`, `max_hops`, source priority, freshness window	Required evidence coverage and authority score cannot regress.
Bucket budgets	`evidence_tokens`, `memory_tokens`, `tool_tokens`, compression threshold	Required blocks cannot be omitted under budget pressure.
Prompt fragments	Small instruction-block changes with token and forbidden-term limits	Policy, redaction, and tool manifests remain outside model discretion.
Memory recall	promoted-memory class filters, recency window, contradiction handling	Unpromoted or consent-missing memory cannot enter `CompiledContext`.
Runtime controls	loop guards, escalation hints, refusal messages	Approval gates and must-refuse rules cannot be weakened by a cost target.

Every candidate should produce a TuningProposal that names the target intent, target metric, baseline pack, replay sets, expected scorecard delta, and rollback target. The Context plane does not promote the candidate; the Trust plane gates it through replay, review, and staged rollout.
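The refusal rule can be sketched as a check of the proposal against the surfaces a pack declares. The field names mirror the list in the paragraph above; `surface`, `parameter`, `value`, the `declared` mapping, and `check_proposal` are hypothetical additions for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TuningProposal:
    target_intent: str
    target_metric: str
    baseline_pack: str
    replay_sets: tuple
    expected_scorecard_delta: float
    rollback_target: str
    # Hypothetical fields describing the change itself.
    surface: str
    parameter: str
    value: Any


def check_proposal(proposal: TuningProposal, declared: dict) -> list:
    """Return reasons to refuse the proposal; an empty list means it may
    proceed to the Trust plane. Nothing is promoted here."""
    problems = []
    allowed = declared.get(proposal.surface)
    if allowed is None:
        problems.append(f"surface not declared: {proposal.surface}")
    elif proposal.parameter not in allowed:
        problems.append(f"parameter not tunable: {proposal.surface}.{proposal.parameter}")
    if not proposal.replay_sets:
        problems.append("no replay sets named")
    if not proposal.rollback_target:
        problems.append("no rollback target named")
    return problems


# Example declaration: only two retrieval parameters are tunable.
declared_surfaces = {"retrieval": {"top_k", "max_hops"}}
```

Returning reasons instead of a boolean keeps the refusal itself traceable, in line with the rest of the envelope.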

Boundary With Adjacent Pages

Page	Owns
Cognitive Core	where the compiler sits in the runtime loop
Context Pack	the concrete pack schema and lifecycle
API Contracts	invocation and runtime envelope examples
Memory Model	what can be recalled or written back
Governance	policy and approval-mode taxonomy

Agentic Context Engineering is the discipline. The Context Pack Compiler is the component. The Context Pack is the source artifact. CompiledContext is the runtime artifact.

Failure Modes

Failure	Runtime response
Required evidence is missing	return a typed `missing_evidence` verdict before planning destructive action
Policy bundle fails to evaluate	fail closed for enforced policy; do not treat as “not fired”
Tool manifest conflicts with safety mode	exclude the tool and record the exclusion
Memory candidate has no promotion or consent record	omit from compiled context
Token budget drops a required block	reject or escalate instead of silently truncating
Source snapshot cannot be pinned	mark replay as incomplete and block high-risk decisions
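The "reject instead of silently truncating" row can be illustrated with a small packer that records optional omissions but raises when a required block does not fit. This is a simplified sketch: `Block`, `RequiredBlockDropped`, and `pack_blocks` are hypothetical names, and real bucket assembly would also rank, redact, and compress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    ref: str
    bucket: str
    tokens: int
    required: bool


class RequiredBlockDropped(Exception):
    """Raised when budget pressure would remove a required block."""


def pack_blocks(blocks: list, limit_tokens: int) -> tuple:
    """Pack required blocks first, then optional ones, within limit_tokens.

    Returns (included_blocks, omission_records). Optional blocks that do not
    fit are recorded as omissions; a required block that does not fit raises.
    """
    included, omissions, used = [], [], 0
    # sorted() is stable, so blocks keep their relative order within each group.
    for block in sorted(blocks, key=lambda b: not b.required):
        if used + block.tokens <= limit_tokens:
            included.append(block)
            used += block.tokens
        elif block.required:
            raise RequiredBlockDropped(block.ref)
        else:
            omissions.append({
                "bucket": block.bucket,
                "reason": "budget_pressure",
                "source_ref": block.ref,
            })
    return included, omissions
```

The caller decides whether `RequiredBlockDropped` becomes a rejection or an escalation; the point of the sketch is only that the drop is never silent.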

Operational Metrics

Context compile latency by stage.
Required evidence coverage.
Omission rate by bucket and reason.
Tool manifest eligibility mismatch rate.
Policy manifest evaluation error rate.
Replay match rate for CompiledContext reconstruction.
Budget pressure by intent and pack version.
Decision quality delta after context-pack changes.
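As one example of how such a metric could be derived from stored envelopes, the sketch below computes omission rate by bucket and reason. It assumes the envelope shape from the mental model above and defines the rate as the share of runs with at least one omission for that pair; both the definition and `omission_rates` are assumptions.

```python
from collections import Counter


def omission_rates(envelopes: list) -> dict:
    """Return {(bucket, reason): share of runs with at least one such omission}."""
    if not envelopes:
        return {}
    counts = Counter()
    for envelope in envelopes:
        omissions = envelope.get("manifests", {}).get("omissions", [])
        # Use a set so several omissions of one kind count once per run.
        counts.update({(o["bucket"], o["reason"]) for o in omissions})
    return {key: n / len(envelopes) for key, n in counts.items()}
```

Counting per run rather than per omission keeps one heavily truncated run from dominating the rate.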

Example

For support.refund, the compiler should include:

the active refund policy bundle,
the eligible read and refund tools,
order and customer evidence refs,
the support.refund.execute decision binding,
GATE_FINANCE_APPROVAL when the amount exceeds INR 3000,
a budget report showing whether any evidence or session state was compressed.

The Planner can then propose work against a known envelope. The Critic can reject a plan if required context is missing before the refund tool is ever called.
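A Critic-style pre-check for this example might look like the sketch below. It assumes the envelope keys from the mental-model JSON; `refund_context_gaps` and the constant names are hypothetical. The decision binding is not checked because the mental model does not show where it is stored.

```python
FINANCE_APPROVAL_THRESHOLD_INR = 3000  # threshold taken from the example above
FINANCE_GATE = "GATE_FINANCE_APPROVAL"


def refund_context_gaps(envelope: dict, amount_inr: float) -> list:
    """List the required context missing for a support.refund run."""
    manifests = envelope.get("manifests", {})
    controls = envelope.get("runtime_controls", {})
    gaps = []
    if not manifests.get("policy"):
        gaps.append("refund policy bundle")
    if not manifests.get("tools"):
        gaps.append("read and refund tools")
    if not manifests.get("evidence"):
        gaps.append("order and customer evidence refs")
    if "budget_report" not in envelope:
        gaps.append("budget report")
    # The gate is required only above the threshold.
    gate_needed = amount_inr > FINANCE_APPROVAL_THRESHOLD_INR
    if gate_needed and FINANCE_GATE not in controls.get("approval_gates_active", []):
        gaps.append(FINANCE_GATE)
    return gaps
```

If the returned list is non-empty, the plan can be rejected before `adp_payments.issue_refund` is ever proposed for execution.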

Common Misconceptions

“Better prompts solve context.” Better prompts help, but governance comes from typed inputs, manifests, and replay.
“Retrieval is the Context plane.” Retrieval is one input. The Context plane also resolves policy, tools, memory, budgets, and controls.
“Summaries are always safe compression.” Summaries must carry source refs, transformation metadata, and quality signals.
“The model can decide which policy matters.” Policy selection and enforcement happen outside model discretion.