Agentic Context Engineering

The Context-plane discipline of compiling governed, bounded, observable context for each agent run.

Foundational Spec
At a glance
Context plane | Per-request compilation

Context as a compiled runtime artifact: declared in packs, resolved under policy, budgeted, traced, and improved from replay.

Inputs
  • RunContext with tenant, actor, intent, safety mode, budget, and trace
  • Versioned Context Pack declarations
  • Policy, tool, evidence, memory, and session state
  • Pressure signals from prior evaluated runs
Outputs
  • CompiledContext
  • Policy, tool, evidence, memory, and omission manifests
  • Runtime controls and budget report
  • Diagnostics for the Improvement Loop
Canonical types
  • ContextPack
  • CompiledContext
  • BudgetReport
  • ContextManifest

Agentic Context Engineering is the discipline of deciding what the model is allowed to see for a specific run, then making that decision reproducible.

It is not prompt writing. A prompt is one output of the Context plane. The real artifact is the CompiledContext: the bounded prompt plus the manifests, controls, omissions, source versions, and budget report that explain how it was produced.

Definition

Agentic Context Engineering turns a versioned Context Pack, a RunContext, and runtime evidence into a CompiledContext envelope.

The envelope must answer four questions:

Question | Required artifact
What source material was eligible? | pack version, registry refs, policy refs, memory refs, evidence refs
What did the compiler include? | compiled prompt, bucket manifests, tool manifest, evidence manifest
What did it exclude or compress? | omission manifest, truncation reason, budget report
Which controls were active? | runtime controls, approval gates, redaction rules, loop guards

If those artifacts are missing, the system may still be using good prompts, but it is not doing ContextOS-grade context engineering.
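
The four questions can be turned into a mechanical completeness check. The sketch below is illustrative only: the key names in `REQUIRED_ARTIFACTS` are assumptions chosen for the example, not the ContextOS schema, and the check tests only that a key is present, not that its content is adequate.

```python
# Hypothetical mapping from each question to the envelope keys that answer it.
# The key names are assumptions for illustration, not the canonical schema.
REQUIRED_ARTIFACTS = {
    "eligible_sources": ["pack_ref", "manifests"],
    "included": ["compiled_prompt_ref", "manifests"],
    "excluded_or_compressed": ["budget_report"],
    "active_controls": ["runtime_controls"],
}


def unanswered_questions(envelope: dict) -> list[str]:
    """Return the questions whose required keys are absent from the envelope."""
    missing = []
    for question, keys in REQUIRED_ARTIFACTS.items():
        if any(key not in envelope for key in keys):
            missing.append(question)
    return missing


# A bare prompt reference answers none of the four questions.
prompt_only = {"compiled_prompt_ref": "sha256:..."}
print(unanswered_questions(prompt_only))
```

A prompt-only payload fails every check, which is the distinction this section draws between prompt writing and a compiled envelope.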

Why It Exists

Enterprise agents fail when context assembly is informal:

  • Policy is copied into prompts and quietly goes stale.
  • Tool availability is implied rather than bound to identity and approval mode.
  • Retrieval injects evidence without provenance, freshness, or access scope.
  • Memory writes return as future context before consent or promotion.
  • Token pressure silently drops the facts needed for an auditable decision.

The Context plane makes context assembly explicit. It lets the Decision plane reason over a known input envelope and lets the Trust plane replay the run later.

Compiler Contract

The Context Pack Compiler follows the same eight-stage shape used in How It Works:

Stage | Purpose | Output
Intent classification | Resolve the requested intent and risk class. | intent binding, confidence, unresolved entities
Policy resolution | Evaluate relevant policy bundles against the run. | policy manifest, active gates, prohibitions
Tool surfacing | Intersect registry, permissions, policy, and safety mode. | tool manifest
Evidence retrieval | Retrieve scoped evidence under ontology, identity, and graph rules. | evidence manifest
Memory recall | Select eligible promoted memory only. | memory manifest
Token budget allocation | Allocate per-bucket limits from the run budget. | budget plan
Bucket assembly | Rank, redact, compress, and pack context blocks. | compiled prompt, omission manifest
Runtime controls | Emit controls for the Decision and Action planes. | loop guards, approval gates, refusal/escalation rules

The compiler may use model assistance for classification, summarization, or compression, but the final envelope must still be typed, versioned, and traceable.
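
A minimal sketch of the staged shape, assuming each stage is a plain function that reads the run context and the envelope so far and returns the keys it contributes. The function names, the `stage_trace` key, and the stub stages are all hypothetical; the real compiler interface belongs to the implementation guide.

```python
from typing import Callable

# Stage names follow the table above; everything else here is illustrative.
STAGE_ORDER = [
    "intent_classification",
    "policy_resolution",
    "tool_surfacing",
    "evidence_retrieval",
    "memory_recall",
    "token_budget_allocation",
    "bucket_assembly",
    "runtime_controls",
]

Stage = Callable[[dict, dict], dict]


def compile_context(run_context: dict, stages: dict[str, Stage]) -> dict:
    """Run every stage in order and record which stages contributed output."""
    envelope: dict = {"run_id": run_context["run_id"], "stage_trace": []}
    for name in STAGE_ORDER:
        if name not in stages:
            # A missing stage is an error, never a silent skip.
            raise KeyError(f"no implementation registered for stage {name!r}")
        envelope.update(stages[name](run_context, envelope))
        envelope["stage_trace"].append(name)
    return envelope


def stub(key: str) -> Stage:
    """Build a placeholder stage that writes an empty result under `key`."""
    return lambda run_context, envelope: {key: []}


stages = {name: stub(name) for name in STAGE_ORDER}
result = compile_context({"run_id": "run_refund_001"}, stages)
print(result["stage_trace"])
```

Keeping the order in data rather than in control flow makes the trace comparable across runs, which is one way to support replay.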

Context Buckets

ContextOS uses buckets so teams can reason about pressure and governance consistently.

Bucket | Typical content | Common failure
business | goals, tone, operating rules, domain constraints | generic behavior that ignores business reality
policy | active rules, gates, refusals, escalation conditions | model acts from stale or informal policy
tool | eligible capabilities, schemas, approval modes | planner calls tools it cannot execute
evidence | retrieved records, documents, graph facts, receipts | unsupported claims or wrong joins
memory | promoted episodic, semantic, and durable memory | contaminated or unapproved recall
session | recent conversation and run state | loss of continuity or repeated work

Working state that belongs only to the current turn should remain session state. Durable recall belongs in memory only after the Memory Model promotion rules allow it.
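
Buckets also give the budget stage something concrete to divide. The sketch below splits a run budget across the six buckets in proportion to integer weights. The weights, the function name, and the choice to hand rounding leftovers to the evidence bucket are assumptions for illustration, not a documented ContextOS allocation rule.

```python
BUCKETS = ["business", "policy", "tool", "evidence", "memory", "session"]


def allocate_budget(limit_tokens: int, weights: dict[str, int]) -> dict[str, int]:
    """Split the run budget across buckets in proportion to their weights."""
    total = sum(weights[bucket] for bucket in BUCKETS)
    plan = {bucket: limit_tokens * weights[bucket] // total for bucket in BUCKETS}
    # Integer division can leave a few tokens unassigned; this sketch gives
    # them to the evidence bucket so the plan always sums to the limit.
    plan["evidence"] += limit_tokens - sum(plan.values())
    return plan


# Hypothetical weights; a real pack would declare its own.
weights = {"business": 1, "policy": 2, "tool": 1, "evidence": 4, "memory": 1, "session": 3}
plan = allocate_budget(12000, weights)
print(plan)
```

With these example weights and a 12000-token limit, evidence receives 4000 tokens and session 3000.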

CompiledContext Shape

The implementation guide owns the full schema. A useful mental model:

{
  "compiled_context_id": "cc_run_refund_001",
  "run_id": "run_refund_001",
  "pack_ref": "ctxpack.support@1.0.0",
  "compiled_prompt_ref": "sha256:...",
  "manifests": {
    "policy": ["POLICY_RETURNS_V1:R_HIGH_VALUE_REQUIRES_APPROVAL"],
    "tools": ["adp_orders.lookup", "adp_payments.issue_refund"],
    "evidence": ["kg:order:ord_881#snapshot_2026_05_09"],
    "memory": [],
    "omissions": [
      {
        "bucket": "session",
        "reason": "budget_pressure",
        "source_ref": "msg_older_than_window"
      }
    ]
  },
  "runtime_controls": {
    "max_tool_calls": 8,
    "max_replan_attempts": 2,
    "approval_gates_active": ["GATE_FINANCE_APPROVAL"],
    "must_refuse": [],
    "must_escalate": ["fraud_signal_high"]
  },
  "budget_report": {
    "limit_tokens": 12000,
    "used_tokens": 7420,
    "truncated": true
  }
}
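
The same mental model can be expressed as typed records. This is a partial sketch that mirrors a few fields from the JSON above; the class layout and the validation rule are assumptions, and the implementation guide remains the owner of the full schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Omission:
    bucket: str
    reason: str
    source_ref: str


@dataclass(frozen=True)
class BudgetReport:
    limit_tokens: int
    used_tokens: int
    truncated: bool

    def __post_init__(self) -> None:
        # Assumed invariant for this sketch: a report never exceeds its limit.
        if self.used_tokens > self.limit_tokens:
            raise ValueError("used_tokens exceeds limit_tokens")


@dataclass(frozen=True)
class CompiledContext:
    compiled_context_id: str
    run_id: str
    pack_ref: str
    compiled_prompt_ref: str
    budget_report: BudgetReport
    omissions: tuple[Omission, ...] = ()


ctx = CompiledContext(
    compiled_context_id="cc_run_refund_001",
    run_id="run_refund_001",
    pack_ref="ctxpack.support@1.0.0",
    compiled_prompt_ref="sha256:...",
    budget_report=BudgetReport(limit_tokens=12000, used_tokens=7420, truncated=True),
    omissions=(Omission("session", "budget_pressure", "msg_older_than_window"),),
)
headroom = ctx.budget_report.limit_tokens - ctx.budget_report.used_tokens
print(headroom)  # 4580
```

Frozen records fit the idea of an envelope that is produced once per run and then only read, traced, and replayed.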

Quality Signals

Context quality is not a vibe. The compiler should emit signals that can be scored later:

Signal | Meaning
Relevance | The block matched the current intent and entities.
Authority | The source is allowed and trusted for this decision type.
Freshness | The block is inside its validity window or snapshot pin.
Density | The block carries useful information per token.
Coverage | Required evidence and policy inputs are present.
Pressure | Budget pressure did not remove required material.

These signals feed Evaluation and Observability and the Improvement Loop. They should be stored as diagnostics, not hidden in logs.
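
One way to store the signals as diagnostics is a small per-block record. The sketch assumes each signal is scored from 0.0 to 1.0 with higher being better; that scale, the record name, and the `weakest_signal` helper are illustrative choices, not part of the spec.

```python
from dataclasses import dataclass

SIGNALS = ("relevance", "authority", "freshness", "density", "coverage", "pressure")


@dataclass(frozen=True)
class BlockDiagnostics:
    """Per-block quality signals; the 0.0 to 1.0 scale is an assumption."""

    block_ref: str
    relevance: float
    authority: float
    freshness: float
    density: float
    coverage: float
    pressure: float

    def weakest_signal(self) -> tuple[str, float]:
        """Return the lowest-scoring signal and its value."""
        scores = {name: getattr(self, name) for name in SIGNALS}
        name = min(scores, key=scores.__getitem__)
        return name, scores[name]


diag = BlockDiagnostics(
    block_ref="kg:order:ord_881#snapshot_2026_05_09",
    relevance=0.9,
    authority=1.0,
    freshness=0.2,
    density=0.7,
    coverage=1.0,
    pressure=1.0,
)
print(diag.weakest_signal())  # ('freshness', 0.2)
```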

Autotune Surfaces

The Context plane is the safest first place to apply autotune because many changes are bounded, replayable, and reversible. A Context Pack may declare tunable surfaces, but the compiler must refuse proposals outside that declaration.

Surface | Candidate examples | Guardrail
Retrieval | top_k, max_hops, source priority, freshness window | Required evidence coverage and authority score cannot regress.
Bucket budgets | evidence_tokens, memory_tokens, tool_tokens, compression threshold | Required blocks cannot be omitted under budget pressure.
Prompt fragments | Small instruction-block changes with token and forbidden-term limits | Policy, redaction, and tool manifests remain outside model discretion.
Memory recall | promoted-memory class filters, recency window, contradiction handling | Unpromoted or consent-missing memory cannot enter CompiledContext.
Runtime controls | loop guards, escalation hints, refusal messages | Approval gates and must-refuse rules cannot be weakened by a cost target.

Every candidate should produce a TuningProposal that names the target intent, target metric, baseline pack, replay sets, expected scorecard delta, and rollback target. The Context plane does not promote the candidate; the Trust plane gates it through replay, review, and staged rollout.
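
A minimal sketch of the refusal rule, assuming a proposal names one dotted surface such as `retrieval.top_k`. The first six fields follow the list in the paragraph above; `surface`, `candidate_value`, `admit_proposal`, and the example identifiers are hypothetical.

```python
from dataclasses import dataclass, replace


class ProposalRefused(Exception):
    """Raised when a proposal falls outside what the pack declared."""


@dataclass(frozen=True)
class TuningProposal:
    target_intent: str
    target_metric: str
    baseline_pack: str
    replay_sets: tuple[str, ...]
    expected_scorecard_delta: float
    rollback_target: str
    surface: str  # hypothetical: dotted name of the tuned surface
    candidate_value: object  # hypothetical: the proposed new value


def admit_proposal(proposal: TuningProposal, declared: set[str]) -> TuningProposal:
    """Admit a proposal only if it targets a declared tunable surface."""
    if proposal.surface not in declared:
        raise ProposalRefused(f"{proposal.surface} is not a declared tunable surface")
    if not proposal.replay_sets or not proposal.rollback_target:
        raise ProposalRefused("proposal must name replay sets and a rollback target")
    return proposal


declared_surfaces = {"retrieval.top_k", "budget.evidence_tokens"}
proposal = TuningProposal(
    target_intent="support.refund",
    target_metric="required_evidence_coverage",
    baseline_pack="ctxpack.support@1.0.0",
    replay_sets=("replay.support_refund.sample",),  # illustrative name
    expected_scorecard_delta=0.02,
    rollback_target="ctxpack.support@1.0.0",
    surface="retrieval.top_k",
    candidate_value=12,
)
admitted = admit_proposal(proposal, declared_surfaces)
```

Admission here only means the compiler accepts the candidate for evaluation. Promotion still goes through the Trust plane.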

Boundary With Adjacent Pages

Page | Owns
Cognitive Core | where the compiler sits in the runtime loop
Context Pack | the concrete pack schema and lifecycle
API Contracts | invocation and runtime envelope examples
Memory Model | what can be recalled or written back
Governance | policy and approval-mode taxonomy

Agentic Context Engineering is the discipline. The Context Pack Compiler is the component. The Context Pack is the source artifact. CompiledContext is the runtime artifact.

Failure Modes

Failure | Runtime response
Required evidence is missing | return a typed missing_evidence verdict before planning destructive action
Policy bundle fails to evaluate | fail closed for enforced policy; do not treat as “not fired”
Tool manifest conflicts with safety mode | exclude the tool and record the exclusion
Memory candidate has no promotion or consent record | omit from compiled context
Token budget drops a required block | reject or escalate instead of silently truncating
Source snapshot cannot be pinned | mark replay as incomplete and block high-risk decisions
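
The policy row in the table above is the easiest to get wrong, so here is a sketch of fail-closed evaluation. It assumes a rule is a plain callable over the run context; the enum, the function names, and the simplification that only a clean "not fired" permits ungated action are all illustrative.

```python
from enum import Enum
from typing import Callable


class PolicyOutcome(Enum):
    FIRED = "fired"
    NOT_FIRED = "not_fired"
    EVALUATION_FAILED = "evaluation_failed"


def evaluate_enforced(rule: Callable[[dict], bool], run_context: dict) -> PolicyOutcome:
    """Evaluate an enforced rule, keeping failure distinct from 'not fired'."""
    try:
        fired = rule(run_context)
    except Exception:
        # Fail closed: an evaluation error is its own outcome.
        return PolicyOutcome.EVALUATION_FAILED
    return PolicyOutcome.FIRED if fired else PolicyOutcome.NOT_FIRED


def allows_ungated_action(outcome: PolicyOutcome) -> bool:
    """Only a clean 'not fired' lets the run continue without a gate."""
    return outcome is PolicyOutcome.NOT_FIRED


def high_value_rule(run_context: dict) -> bool:
    # Hypothetical rule body; raises KeyError when the amount is missing.
    return run_context["amount_inr"] > 3000


print(evaluate_enforced(high_value_rule, {"amount_inr": 500}))
print(evaluate_enforced(high_value_rule, {}))
```

The second call has no amount, so the rule raises and the outcome is `EVALUATION_FAILED` rather than `NOT_FIRED`.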

Operational Metrics

  • Context compile latency by stage.
  • Required evidence coverage.
  • Omission rate by bucket and reason.
  • Tool manifest eligibility mismatch rate.
  • Policy manifest evaluation error rate.
  • Replay match rate for CompiledContext reconstruction.
  • Budget pressure by intent and pack version.
  • Decision quality delta after context-pack changes.
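
As an example of deriving one of these metrics from stored envelopes, the sketch below computes omission rate by bucket and reason. It assumes envelopes shaped like the CompiledContext JSON earlier on this page and defines "rate" as omissions per compiled context, which is an assumption rather than a documented definition.

```python
from collections import Counter


def omission_rates(envelopes: list[dict]) -> dict[tuple[str, str], float]:
    """Average omissions per compiled context, keyed by (bucket, reason)."""
    counts: Counter = Counter()
    for envelope in envelopes:
        for omission in envelope.get("manifests", {}).get("omissions", []):
            counts[(omission["bucket"], omission["reason"])] += 1
    if not envelopes:
        return {}
    return {key: count / len(envelopes) for key, count in counts.items()}


# Two illustrative envelopes: one dropped session content, one dropped nothing.
envelopes = [
    {
        "manifests": {
            "omissions": [
                {
                    "bucket": "session",
                    "reason": "budget_pressure",
                    "source_ref": "msg_older_than_window",
                }
            ]
        }
    },
    {"manifests": {"omissions": []}},
]
rates = omission_rates(envelopes)
print(rates)
```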

Example

For support.refund, the compiler should include:

  • the active refund policy bundle,
  • the eligible read and refund tools,
  • order and customer evidence refs,
  • the support.refund.execute decision binding,
  • GATE_FINANCE_APPROVAL when the amount exceeds INR 3000,
  • a budget report showing whether any evidence or session state was compressed.

The Planner can then propose work against a known envelope. The Critic can reject a plan if required context is missing before the refund tool is ever called.
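
A sketch of the two checks this example implies: activating the finance gate above the threshold, and a Critic-style check for required context. The function names and rejection reasons are hypothetical; the tool name, the gate name, and the `missing_evidence` label reuse identifiers that appear elsewhere on this page.

```python
FINANCE_APPROVAL_THRESHOLD_INR = 3000  # threshold taken from the example above


def approval_gates(amount_inr: float) -> list[str]:
    """Activate the finance gate only when the amount exceeds the threshold."""
    if amount_inr > FINANCE_APPROVAL_THRESHOLD_INR:
        return ["GATE_FINANCE_APPROVAL"]
    return []


def critic_rejections(envelope: dict) -> list[str]:
    """Return reasons to reject a refund plan; an empty list means none found."""
    reasons = []
    manifests = envelope.get("manifests", {})
    if not manifests.get("policy"):
        reasons.append("missing_policy")
    if not manifests.get("evidence"):
        reasons.append("missing_evidence")
    if "adp_payments.issue_refund" not in manifests.get("tools", []):
        reasons.append("refund_tool_not_surfaced")
    return reasons


envelope = {
    "manifests": {
        "policy": ["POLICY_RETURNS_V1:R_HIGH_VALUE_REQUIRES_APPROVAL"],
        "tools": ["adp_orders.lookup", "adp_payments.issue_refund"],
        "evidence": [],  # no order evidence was retrieved in this run
    }
}
print(approval_gates(4500))
print(critic_rejections(envelope))
```

Because the evidence manifest is empty, the check returns `missing_evidence` and the plan can be rejected before any refund call is attempted.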

Common Misconceptions

  • “Better prompts solve context.” Better prompts help, but governance comes from typed inputs, manifests, and replay.
  • “Retrieval is the Context plane.” Retrieval is one input. The Context plane also resolves policy, tools, memory, budgets, and controls.
  • “Summaries are always safe compression.” Summaries must carry source refs, transformation metadata, and quality signals.
  • “The model can decide which policy matters.” Policy selection and enforcement happen outside model discretion.