The first compile function we wrote was a 600-line method on a class called ContextBuilder. It mixed prompt assembly, policy resolution, tool filtering, and budget arithmetic in the same code path. Adding a single new bucket meant changing four places. Debugging a missing evidence ref meant reading the whole file.
Eight stages and eight short files later, the compile is something a new engineer can hold in their head in an afternoon. The trick was not making it shorter; it was naming the boundaries. The Context Pack Compiler is the most central piece of the harness — it is what produces the CompiledContext the model actually sees — and it earns being eight files instead of one.
The canonical spec is in Context Pack Compiler and the Context Pack itself in Context Pack. This post is the build-along — eight stages, eight TypeScript files, with what enters and what leaves at each stage.
2026 update: compile boundaries are audit boundaries
The eight-stage split is not only a maintainability trick. Each stage is an audit boundary. Intent, policy, tools, evidence, memory, budget, buckets, and manifests each answer a different incident question, and a useful replay diff should be able to name the stage where drift appeared.
That is why the compiler should emit stage-level hashes and diagnostics, not only a final prompt. If a replay fails, “bucket assembly changed” is actionable. “The prompt changed” is not.
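One way to sketch those stage-level hashes (the stable-stringify helper and the field names are assumptions of this sketch, not part of the spec):

```typescript
import { createHash } from "crypto"

// Stable stringify: sort object keys so the same value always hashes the same.
function stable(v: unknown): string {
  if (v === null || typeof v !== "object") return JSON.stringify(v)
  if (Array.isArray(v)) return `[${v.map(stable).join(",")}]`
  const keys = Object.keys(v).sort()
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stable((v as any)[k])}`).join(",")}}`
}

// One sha256 per stage output; diffing two runs names the stage that drifted.
export function stageHashes(stages: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {}
  for (const [name, value] of Object.entries(stages)) {
    out[name] = createHash("sha256").update(stable(value)).digest("hex")
  }
  return out
}
```

When a replay fails, compare the two hash maps stage by stage; the first mismatching key is where the drift appeared.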
The orchestrator
The whole compile is 20 lines once the stages are factored out:
(Counting imports it is a bit more; the function body itself is eight calls.)
import type { ContextPack, RunContext, InvokeRequest, CompiledContext } from "@/types"
import { classifyIntent } from "./intent"
import { resolvePolicy } from "./policy"
import { surfaceTools } from "./tools"
import { retrieveEvidence } from "./evidence"
import { recallMemory } from "./memory"
import { allocateBudget } from "./budget"
import { assembleBuckets } from "./buckets"
import { emitManifests } from "./manifests"
export async function compile(
pack: ContextPack,
ctx: RunContext,
req: InvokeRequest,
): Promise<CompiledContext> {
const intent = classifyIntent(pack, ctx, req)
const policy = resolvePolicy(pack, ctx, req, intent)
const tools = surfaceTools(pack, ctx, policy)
const evidence = await retrieveEvidence(pack, ctx, req, intent)
const memory = await recallMemory(pack, ctx, intent)
const budget = allocateBudget(pack, ctx, intent)
const buckets = assembleBuckets({ pack, ctx, req, intent, policy, tools, evidence, memory, budget })
return emitManifests({ pack, ctx, intent, policy, tools, evidence, memory, buckets, budget })
}

Three properties of this orchestrator earn their keep.
It is pure. There is no I/O except in the two explicitly async stages (evidence and memory), and even those receive their data from caller-supplied retrievers, so replay reproduces the same CompiledContext from the same inputs every time.
Each stage is a single function with typed input and output. There is no shared mutable state. The intent stage cannot accidentally read a tool, the budget stage cannot accidentally write a manifest. The boundaries enforce themselves.
The output is one envelope, the CompiledContext. Everything the Decision plane needs is in it; everything not in it is invisible to the model.
Stage 1 — intent classification
What enters: the request and the run context.
What leaves: a typed Intent from the Intent-Task Catalog.
import type { ContextPack, RunContext, InvokeRequest } from "@/types"
export type Intent = {
id: string // "support.refund.execute"
class: string // "support_low_risk" | "support_high_value" | ...
task: string // matches a row in the intent-task-catalog
}
export function classifyIntent(
pack: ContextPack,
ctx: RunContext,
req: InvokeRequest,
): Intent {
// request carries the proposed intent; we trust it but verify it against the catalog
const proposed = req.input.intent
const row = pack.intent_layer.catalog.find((r) => r.id === proposed)
if (!row) {
throw new Error(`intent ${proposed} not in pack ${pack.contract_meta.contract_name}`)
}
return { id: row.id, class: row.intent_class, task: row.task_id }
}

The classifier verifies the intent is one the pack has signed up to handle. An unknown intent is a refusal here, not a guess later. The full taxonomy is in the Intent-Task Catalog.
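A usage sketch of the refusal behavior, with the pack trimmed to just the fields classifyIntent reads (the ids here are illustrative, not from a real catalog):

```typescript
// Pack trimmed to the fields classifyIntent reads; ids are made up.
const pack = {
  contract_meta: { contract_name: "support-pack" },
  intent_layer: {
    catalog: [
      { id: "support.refund.execute", intent_class: "support_high_value", task_id: "T_REFUND" },
    ],
  },
}

// Same lookup-or-refuse shape as classifyIntent above.
export function lookup(proposed: string) {
  const row = pack.intent_layer.catalog.find((r) => r.id === proposed)
  if (!row) throw new Error(`intent ${proposed} not in pack ${pack.contract_meta.contract_name}`)
  return { id: row.id, class: row.intent_class, task: row.task_id }
}

// lookup("support.refund.execute") resolves; lookup("billing.cancel") throws
```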
Stage 2 — policy resolution
What enters: the policy bundle, the run context, the resolved intent.
What leaves: a typed list of PolicyDecision envelopes — which rules fired, what they require, what they forbid.
import type { ContextPack, RunContext, InvokeRequest } from "@/types"
import type { Intent } from "./intent"
import jsonLogic from "json-logic-js"
import { shortid } from "@/runtime/ids" // assumed helper: any short unique-id generator works here
export type PolicyDecision = {
policy_decision_id: string // pol_9900
rule_id: string // R_REFUND_REQUIRES_IDV
bundle_id: string // POLICY_RETURNS_V4
verdict: "allow" | "deny" | "require"
requires: string[] // capability/evidence ids the rule demands
forbids: string[] // capability ids the rule prohibits
rationale: string
}
export function resolvePolicy(
pack: ContextPack,
ctx: RunContext,
req: InvokeRequest,
intent: Intent,
): PolicyDecision[] {
const evalCtx = { run_context: ctx, request: req, intent }
const out: PolicyDecision[] = []
// bundles in priority order; first match-and-deny ends the cascade for that capability
for (const bundle of pack.policy_layer.policy_bundles
.slice()
.sort((a, b) => b.priority - a.priority)) {
for (const rule of bundle.policy_dsl.rules) {
if (!appliesToIntent(rule, intent)) continue
if (jsonLogic.apply(rule.if, evalCtx)) {
out.push({
policy_decision_id: `pol_${shortid()}`,
rule_id: rule.rule_id,
bundle_id: bundle.bundle_id,
verdict: rule.then.allow === false ? "deny"
: rule.then.requires?.length ? "require"
: "allow",
requires: rule.then.requires ?? [],
forbids: rule.then.forbids ?? [],
rationale: rule.rationale ?? "",
})
}
}
}
return out
}
function appliesToIntent(rule: any, intent: Intent): boolean {
return !rule.applies_to?.intent || rule.applies_to.intent === intent.id
}

The rules are JsonLogic predicates evaluated over a uniform evalCtx. The function does not modify anything; it produces the typed list of decisions, which downstream stages consume. The full Policy-Engine spec is in Governance.
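For concreteness, here is a plausible shape for one rule in policy_dsl.rules, paired with a hand-rolled stand-in for jsonLogic.apply that supports only the single operator the example needs (the rule id, threshold, and field paths are made up):

```typescript
// One rule in policy_dsl.rules; ids, threshold, and field paths are illustrative.
const exampleRule = {
  rule_id: "R_REFUND_REQUIRES_IDV",
  applies_to: { intent: "support.refund.execute" },
  if: { ">": [{ var: "request.input.amount" }, 100] },
  then: { requires: ["identity_verification"] },
  rationale: "refunds over 100 require identity verification",
}

// Stand-in for jsonLogic.apply supporting only ">" with a dotted `var` lookup,
// enough to show how a predicate reads the uniform evalCtx.
export function applyGt(pred: any, ctx: any): boolean {
  const [lhs, rhs] = pred[">"]
  const value = lhs.var.split(".").reduce((o: any, k: string) => o?.[k], ctx)
  return value > rhs
}

const evalCtx = { request: { input: { amount: 250 } }, intent: { id: "support.refund.execute" } }
// applyGt(exampleRule.if, evalCtx) fires, so stage 2 would emit verdict "require"
```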
Stage 3 — tool surfacing
What enters: registry, permissions, prohibitions, the resolved policy decisions.
What leaves: a list of callable tools whose effective approval_mode does not exceed safety_mode.
import type { ContextPack, RunContext } from "@/types"
import type { PolicyDecision } from "./policy"
import { MODE_RANK, type ApprovalMode } from "@/tools/types"
export type SurfacedTool = {
adapter_id: string
capability_id: string
approval_mode: ApprovalMode // resolved (manifest ceiling, possibly downgraded)
schema_ref: string
}
export function surfaceTools(
pack: ContextPack,
ctx: RunContext,
policy: PolicyDecision[],
): SurfacedTool[] {
const forbidden = new Set(policy.flatMap((d) => d.forbids))
const out: SurfacedTool[] = []
for (const adp of pack.tooling_layer.adapter_registry) {
for (const cap of adp.capabilities) {
const key = `${adp.adapter_id}.${cap.id}`
// permission required
const perm = pack.tooling_layer.permissions.find(
(p) => p.adapter_id === adp.adapter_id && p.capability === cap.id && p.allow,
)
if (!perm) continue
// run-level prohibitions (e.g. policy forbids)
if (ctx.prohibitions?.some((p) => p.adapter_id === adp.adapter_id
&& p.capability === cap.id)) continue
if (forbidden.has(key)) continue
// mode resolution: manifest ceiling, downgraded by policy if any
const downgrade = policy.find((d) => d.rule_id.endsWith(`__downgrade_${key}`))
const mode: ApprovalMode = (downgrade as any)?.downgrade_to ?? cap.approval_mode
// refuse to surface anything above the run's safety_mode
if (MODE_RANK[mode] > MODE_RANK[ctx.safety_mode]) continue
out.push({
adapter_id: adp.adapter_id,
capability_id: cap.id,
approval_mode: mode,
schema_ref: cap.schema_ref ?? adp.schema_ref,
})
}
}
return out
}

This is the same Registry ∩ Permissions − Prohibitions shape from Build the Tool Gateway, now run at compile time on the surface the model will see, rather than at call time on a single proposed call. Compile-time surfacing keeps the model from even seeing tools it cannot use.
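MODE_RANK is imported above but never shown; a plausible definition, assuming four approval modes (your "@/tools/types" may name and rank them differently):

```typescript
// Higher rank means more human friction before the capability fires.
export type ApprovalMode = "auto" | "notify" | "approval" | "dual_approval"

export const MODE_RANK: Record<ApprovalMode, number> = {
  auto: 0,          // fire without asking
  notify: 1,        // fire, then tell a human
  approval: 2,      // one human approves first
  dual_approval: 3, // two humans approve first
}

// The surfacing check above reduces to: never show the model a capability
// whose resolved mode outranks the run's safety_mode.
export const surfaceable = (mode: ApprovalMode, safety: ApprovalMode) =>
  MODE_RANK[mode] <= MODE_RANK[safety]
```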
Stage 4 — evidence retrieval
What enters: the intent and the request; the caller supplies a retriever.
What leaves: a list of EvidenceRefs, each pinned to a content-addressed snapshot.
import type { ContextPack, RunContext, InvokeRequest } from "@/types"
import type { Intent } from "./intent"
export type EvidenceRef = {
id: string // "kg:order:ord_881#snapshot_kg_2026_05_09_T0930"
class: string // "order" | "refund_window_evidence" | ...
classification: string // "PII" | "INTERNAL" | "PUBLIC"
payload_hash: string // sha256 of the resolved payload at snapshot time
}
export type EvidenceRetriever = (req: {
intent_id: string
query: Record<string, unknown>
}) => Promise<EvidenceRef[]>
export async function retrieveEvidence(
pack: ContextPack,
ctx: RunContext,
req: InvokeRequest,
intent: Intent,
): Promise<EvidenceRef[]> {
// The compiler does not call the KG itself; it asks the caller-supplied retriever.
// This keeps the compile pure and replay-deterministic.
const retriever: EvidenceRetriever = (ctx as any).deps?.evidence
if (!retriever) return []
const refs = await retriever({
intent_id: intent.id,
query: req.input.context_query ?? {},
})
// refuse refs whose snapshot does not match the run's pinned snapshot
const expectedSnapshot = ctx.kg_snapshot_id
return refs.filter((r) => r.id.includes(expectedSnapshot))
}Two design choices to call out.
The compiler does not call the Knowledge Graph itself. The retriever is injected. This keeps compile-time pure and makes replay-against-recorded-evidence trivial. The caller (live runtime, replay harness, simulator) is the one that decides where the evidence comes from.
The compiler filters by pinned snapshot. If the retriever returns a ref whose snapshot id does not match the run’s pinned KG snapshot, that ref is dropped. This is what guarantees that two compiles of the same pack against the same snapshot produce the same evidence list.
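In a replay harness, the injected retriever can simply serve recorded refs. A minimal sketch, assuming the recording is keyed by intent id (the types mirror the stage-4 file; the recording format is this sketch's assumption):

```typescript
// Types mirror the stage-4 file above.
type EvidenceRef = { id: string; class: string; classification: string; payload_hash: string }
type EvidenceRetriever = (req: { intent_id: string; query: Record<string, unknown> }) => Promise<EvidenceRef[]>

// A replay-harness retriever: serve recorded refs instead of querying the live KG.
export function recordedRetriever(recording: Record<string, EvidenceRef[]>): EvidenceRetriever {
  return async ({ intent_id }) => recording[intent_id] ?? []
}
```

Injected as the evidence dependency, the same compile runs identically live or from a recording.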
Stage 5 — memory recall
What enters: the intent; the caller supplies a recaller.
What leaves: a list of promoted memory recalls, never raw capture.
import type { ContextPack, RunContext } from "@/types"
import type { Intent } from "./intent"
export type PromotedMemory = {
id: string
intent_id: string
tier: "working" | "episodic" | "semantic" | "durable"
text: string
promoted_at: string
classification: string
}
export type MemoryRecaller = (req: {
intent_id: string
tenant_id: string
user_id?: string
}) => Promise<PromotedMemory[]>
export async function recallMemory(
pack: ContextPack,
ctx: RunContext,
intent: Intent,
): Promise<PromotedMemory[]> {
const recall: MemoryRecaller = (ctx as any).deps?.memory
if (!recall) return []
const recalls = await recall({
intent_id: intent.id,
tenant_id: ctx.tenant_id,
user_id: ctx.user_id,
})
// memory layer policy: max recalls per intent; promotion-only
const max = pack.memory_layer.recall_policy?.max_per_intent ?? 8
return recalls.slice(0, max)
}

The compiler reads only promoted memory. Raw capture is invisible at this layer. The promotion path (capture → review → promote) is documented in Promotion-Aware Memory: Capture → Review → Promote in Code, and that entire pipeline is what makes the compiler’s job here so small.
Stage 6 — budget allocation
What enters: the run’s bucket_tokens budget, the intent.
What leaves: a per-bucket allocation whose shares sum to the budget (less integer rounding from Math.floor).
import type { ContextPack, RunContext } from "@/types"
import type { Intent } from "./intent"
export type Bucket = "system" | "developer" | "task" | "policy" | "tools" | "evidence" | "memory" | "session"
export type BudgetAllocation = {
total_tokens: number
per_bucket: Record<Bucket, number>
}
export function allocateBudget(
pack: ContextPack,
ctx: RunContext,
intent: Intent,
): BudgetAllocation {
const total = ctx.run_budget?.bucket_tokens ?? pack.budget_layer?.default_total ?? 8000
// per-intent split if declared; default proportional otherwise
const declared = pack.budget_layer?.splits?.[intent.id]
const split: Record<Bucket, number> = declared ?? {
system: 0.05,
developer: 0.05,
task: 0.10,
policy: 0.10,
tools: 0.15,
evidence: 0.30,
memory: 0.15,
session: 0.10,
}
const per_bucket = Object.fromEntries(
Object.entries(split).map(([k, frac]) => [k, Math.floor(total * (frac as number))]),
) as Record<Bucket, number>
return { total_tokens: total, per_bucket }
}

The allocation is proportional and explicit. The split lives in the pack, which means it is versioned. When evidence keeps getting truncated for a given intent, the fix is to bump that intent’s evidence fraction in the pack — a release-gated change with golden replay, not a runtime knob.
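A small pack-lint goes naturally with this: check a declared split before it ships. The bucket names match the stage-6 code; the tolerance value is an arbitrary choice of this sketch.

```typescript
type Bucket = "system" | "developer" | "task" | "policy" | "tools" | "evidence" | "memory" | "session"

// Release-gate check: a declared split must cover every bucket and sum to 1
// (within float tolerance), or the pack change is rejected.
export function validateSplit(split: Record<Bucket, number>): void {
  const sum = Object.values(split).reduce((a, b) => a + b, 0)
  if (Math.abs(sum - 1) > 1e-9) throw new Error(`bucket split sums to ${sum}, expected 1`)
}
```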
Stage 7 — bucket assembly
What enters: every prior stage’s output.
What leaves: filled buckets; any over-budget bucket emits an explicit truncated: true flag.
import type { Intent } from "./intent"
import type { PolicyDecision } from "./policy"
import type { SurfacedTool } from "./tools"
import type { EvidenceRef } from "./evidence"
import type { PromotedMemory } from "./memory"
import type { BudgetAllocation, Bucket } from "./budget"
import { tokenize } from "@/runtime/tokens"
export type BucketContent = {
blocks: Array<{ kind: string; text: string; priority: number; truncated?: boolean }>
used_tokens: number
budget_tokens: number
truncated: boolean
}
export function assembleBuckets(args: {
pack: any; ctx: any; req: any; intent: Intent
policy: PolicyDecision[]
tools: SurfacedTool[]
evidence: EvidenceRef[]
memory: PromotedMemory[]
budget: BudgetAllocation
}): Record<Bucket, BucketContent> {
const out = {} as Record<Bucket, BucketContent>
out.system = packBucket(args.pack.tone_and_comms.system_blocks, args.budget.per_bucket.system)
out.developer = packBucket(args.pack.tone_and_comms.developer_blocks, args.budget.per_bucket.developer)
out.task = packBucket([{ kind: "task", text: args.req.input.message, priority: 100 }], args.budget.per_bucket.task)
out.policy = packBucket(policyAsBlocks(args.policy), args.budget.per_bucket.policy)
out.tools = packBucket(toolsAsBlocks(args.tools), args.budget.per_bucket.tools)
out.evidence = packBucket(evidenceAsBlocks(args.evidence), args.budget.per_bucket.evidence)
out.memory = packBucket(memoryAsBlocks(args.memory), args.budget.per_bucket.memory)
// assumes each recent turn is { text: string }, newest last, so the index doubles as priority
out.session = packBucket((args.req.session?.recent_turns ?? []).map((t: { text: string }, i: number) => ({ kind: "turn", text: t.text, priority: i })), args.budget.per_bucket.session)
return out
}
function packBucket(
blocks: Array<{ kind: string; text: string; priority: number }>,
budget: number,
): BucketContent {
// pack by priority desc; mark truncated on each block that did not fit
const sorted = blocks.slice().sort((a, b) => b.priority - a.priority)
const fit: BucketContent["blocks"] = []
let used = 0
let truncated = false
for (const b of sorted) {
const n = tokenize(b.text).length
if (used + n <= budget) {
fit.push(b)
used += n
} else {
fit.push({ ...b, truncated: true, text: "" })
truncated = true
}
}
return { blocks: fit, used_tokens: used, budget_tokens: budget, truncated }
}

The most important property of packBucket is that truncation is explicit, never silent. A block that did not fit appears in the output with truncated: true and an empty text — the manifest the next stage emits will include a bucket_truncations map that points to exactly which blocks were dropped. The audit trail covers what the model did and did not see.
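A worked example of that contract, with a whitespace splitter standing in for the real tokenize from "@/runtime/tokens" (the block texts and the 3-token budget are made up):

```typescript
// Stand-in tokenizer: whitespace split.
const tokenize = (s: string) => s.split(/\s+/).filter(Boolean)

const budget = 3 // tokens available for this bucket
const blocks = [
  { kind: "evidence", text: "order ord_881 refunded", priority: 90 },            // 3 tokens
  { kind: "evidence", text: "older lower priority evidence block", priority: 10 }, // 5 tokens
]

// Same loop shape as packBucket: pack by priority desc, mark what misses.
const fit: Array<{ kind: string; text: string; priority: number; truncated?: boolean }> = []
let used = 0
for (const b of blocks.slice().sort((a, b) => b.priority - a.priority)) {
  const n = tokenize(b.text).length
  if (used + n <= budget) { fit.push(b); used += n }
  else fit.push({ ...b, truncated: true, text: "" })
}
// fit[0] keeps its text; fit[1] survives only as a stub with truncated: true
```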
Stage 8 — manifests + runtime controls
What enters: every prior stage’s output plus the assembled buckets.
What leaves: the CompiledContext envelope.
import type { Intent } from "./intent"
import type { PolicyDecision } from "./policy"
import type { SurfacedTool } from "./tools"
import type { EvidenceRef } from "./evidence"
import type { PromotedMemory } from "./memory"
import type { BudgetAllocation, Bucket } from "./budget"
import type { BucketContent } from "./buckets"
import type { CompiledContext } from "@/types"
export function emitManifests(args: {
pack: any; ctx: any; intent: Intent
policy: PolicyDecision[]
tools: SurfacedTool[]
evidence: EvidenceRef[]
memory: PromotedMemory[]
budget: BudgetAllocation
buckets: Record<Bucket, BucketContent>
}): CompiledContext {
const policy_manifest = args.policy.map((p) => ({
policy_decision_id: p.policy_decision_id,
rule_id: p.rule_id,
bundle_id: p.bundle_id,
verdict: p.verdict,
}))
const tool_manifest = args.tools.map((t) => ({
adapter_id: t.adapter_id,
capability_id: t.capability_id,
approval_mode: t.approval_mode,
}))
const evidence_manifest = args.evidence.map((e) => ({
id: e.id,
payload_hash: e.payload_hash,
classification: e.classification,
}))
const runtime_controls = {
must_refuse: args.policy.filter((p) => p.verdict === "deny").map((p) => p.rule_id),
must_escalate: args.policy.filter((p) => p.requires.includes("approval"))
.map((p) => p.rule_id),
approval_gates_active: deriveApprovalGates(args.policy, args.tools),
redaction_rules_active: args.pack.policy_layer.guardrails.redaction_rules ?? [],
}
const bucket_truncations: Record<string, number> = {}
for (const [name, bucket] of Object.entries(args.buckets)) {
if (bucket.truncated) {
bucket_truncations[name] = bucket.blocks.filter((b) => b.truncated).length
}
}
// renderPrompt and deriveApprovalGates are file-local helpers, elided here:
// deriveApprovalGates collects the approval-gated capabilities from the
// decisions and tools; renderPrompt concatenates the buckets into the prompt.
const compiled_prompt = renderPrompt(args.buckets, args.pack)
return {
compiled_prompt,
manifests: { policy_manifest, tool_manifest, evidence_manifest },
runtime_controls,
budget_report: {
allocations: args.budget.per_bucket,
used_at_compile: Object.fromEntries(
Object.entries(args.buckets).map(([k, b]) => [k, b.used_tokens]),
) as any,
bucket_truncations,
},
pack_version: args.pack.contract_meta.contract_name + "@" + args.pack.contract_meta.contract_version,
intent_id: args.intent.id,
}
}

The CompiledContext is the single thing the Decision plane consumes. Anything that didn’t make it into a manifest is invisible. Anything that did make it carries enough metadata for the Critic to verify and for the replay harness to reproduce.
Compiler readiness checklist
| Stage | Must prove |
|---|---|
| Intent | Unknown or unsupported intents refuse instead of guessing. |
| Policy | Rule ids, bundle ids, verdicts, and required obligations are emitted as typed decisions. |
| Tools | Surfaced tools are the deterministic intersection of registry, permission, policy, and safety mode. |
| Evidence | Refs are snapshot-bound, classification-aware, and hash-addressed. |
| Memory | Recall reads promoted memory only, with tenant, subject, intent, and classification filters. |
| Budget | Bucket allocations are explicit, versioned, and sum to the run budget. |
| Buckets | Truncation is visible in the manifest; nothing silently disappears. |
| Manifest | CompiledContext contains the controls the Critic and replay harness need. |
What the eight stages buy you
Three things that are noticeable the day this lands.
Debugging is local. A missing evidence ref is a question for stage 4. A surprising tool-mode is stage 3. A truncated bucket is stage 7. The team grows muscle memory for which file to open, instead of tracing through a 600-line method.
Replay is trivial. Each stage is pure (the two async ones take injected retrievers). Replay re-runs the same eight functions with recorded inputs and gets the same output. The hash of the CompiledContext is deterministic; the replay harness reproduces it byte-for-byte.
Adding a new bucket is one file. New requirement: surface “recent corrections from the operator” alongside memory. That is one new stage 5b function and one new bucket entry — the rest of the compile is unchanged. The boundaries pay you back the first time you extend the compile.
Eight files. Twenty-line orchestrator. The same shape every compile takes, every time. Pull the orchestrator into your stack first; the eight stages can start as stubs and be fleshed out one at a time. By the third stage, the team can already see the leverage.