Give Claude Code, Cursor, and Codex Persistent, Auditable Memory

Coding agents are brilliant and amnesiac. Claude Code, Cursor, and Codex can refactor a module or trace a bug across a dozen files — and then start the next session knowing none of it. They re-derive your repo conventions, forget last week’s architecture decision, and rediscover the test command you corrected them on twice already.

The usual patch is to bolt a vector store onto the agent. That helps it retrieve, but it does not help you trust: you cannot see why the agent believed a fact, whether that fact is still current, or which prior decision it is acting on. For a coding agent operating on a real repo, that gap is the whole problem.

SecondBrain takes a different shape: a shared, local-first Memory API that any agent plugs into over HTTP and MCP. It is open source, runs on your machine, and every memory-bearing response carries a Citation envelope back to the exact chunk that produced it. The brand promise is blunt: AI you can audit.

The split that makes this work

The Memory API draws one clean line: it returns grounded data; the calling agent brings the LLM and the synthesis. SecondBrain does retrieval, provenance, decisions, and open loops. Claude Code does what it is already good at: reading the evidence and writing the code. Nobody re-implements memory inside each agent, and you get one durable store instead of a separate silo per agent.

Because the contract is HTTP plus MCP, the same surface is available two ways: a public /v1/ REST route for any client, and a named MCP twin for any MCP-capable agent. An MCP-aware tool like Claude Code can call secondbrain_recall as a native tool; a script can POST /v1/memory/recall. Same data, same citations.

The citation envelope is the point

Every item the Memory API returns is wrapped in a provenance envelope. This is not metadata you can ignore — it is the durable identity of the evidence:

{
  "content": "The Orion release freezes scope on 2026-04-18; refunds need policy evidence.",
  "citation": {
    "chunk_hash": "sha256:9f2c…",
    "source_path": "03_decisions/2026-04-18-release-scope.md",
    "anchor": "Scope freeze",
    "text_span": "Scope is frozen as of 2026-04-18.",
    "score": 0.91,
    "retrieved_at": "2026-06-27T10:14:02Z"
  },
  "source_type": "hybrid"
}

The chunk_hash is a content-addressed SHA-256 of the normalized chunk, so the evidence has a stable identity even as files move. source_path plus anchor is the human-readable location; text_span is the literal quoted evidence. For a coding agent this is the difference between “the model said so” and “here is the decision file, anchor, and span that the model is acting on.” Grounded answers run behind a citation-density gate that refuses thin-evidence responses by default, so the agent declines to bluff instead of confidently inventing a convention you never set.
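To make the content-addressing idea concrete, here is a minimal Python sketch of hashing a chunk and checking a citation against it. The normalization rule (Unicode NFC plus whitespace collapsing) and the helper names are assumptions for illustration; SecondBrain's actual normalization may differ, so hashes produced here should not be expected to match its output.

```python
import hashlib
import unicodedata


def normalize_chunk(text: str) -> str:
    """Assumed normalization: Unicode NFC, then collapse whitespace runs."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def chunk_hash(text: str) -> str:
    """Content-addressed identity: SHA-256 over the normalized chunk text."""
    digest = hashlib.sha256(normalize_chunk(text).encode("utf-8")).hexdigest()
    return f"sha256:{digest}"


def verify_citation(citation: dict, chunk_text: str) -> bool:
    """Check that a citation still points at this exact chunk.

    The hash must match the chunk, and the quoted text_span must appear in it.
    """
    span = normalize_chunk(citation["text_span"])
    return (
        citation["chunk_hash"] == chunk_hash(chunk_text)
        and span in normalize_chunk(chunk_text)
    )


chunk = "## Scope freeze\n\nScope is frozen as of 2026-04-18.\n"
citation = {
    "chunk_hash": chunk_hash(chunk),
    "source_path": "03_decisions/2026-04-18-release-scope.md",
    "anchor": "Scope freeze",
    "text_span": "Scope is frozen as of 2026-04-18.",
}

# The path is not part of the hash, so moving the file keeps the identity.
# (The archive/ path is a made-up location for this example.)
moved = dict(citation, source_path="archive/2026-04-18-release-scope.md")
print(verify_citation(moved, chunk))  # True

# Editing the chunk changes the hash, so the stale citation stops verifying.
print(verify_citation(citation, chunk.replace("04-18", "05-01")))  # False
```

The design point is that identity comes from content alone: a renamed file keeps its hash, while any edit to the evidence invalidates citations that quoted the old text.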

This is the operational answer to the problem we laid out in AI Agent Memory Is Broken: memory is not a search index, it is continuity you can audit.

The surface coding agents actually use

The API is small on purpose. The routes below cover the day-to-day loop, and each has a named MCP tool with the same behavior.

| HTTP route | MCP tool | What it does |
| --- | --- | --- |
| `POST /v1/memory/recall` | `secondbrain_recall` | Hybrid (semantic + keyword) retrieval over the workspace vault |
| `POST /v1/memory/ingest` | `secondbrain_ingest` | Add a file or raw text to the vault and index it |
| `POST /v1/memory/forget` | `secondbrain_forget` | Unindex content by `source_path` or `chunk_hash` |
| `POST /v1/memory/pack` | `secondbrain_pack` | Build a bounded ContextPack of facts, decisions, and loops for an intent |
| `POST /v1/grounded/answer` | `secondbrain_grounded_answer` | Closed-corpus QA with multi-step retrieval and citation enforcement |
| `GET /v1/decisions` | `secondbrain_decisions_list` | List recorded decisions in the workspace catalog |
| `GET /v1/open_loops` | `secondbrain_open_loops` | List unresolved `TODO` / `OPENLOOP` markers across the vault |
| `GET /v1/audit/event_log` | `secondbrain_audit` | Tail the event log; filter to one trajectory for replay |

The recall call takes a query and a top_k and returns ranked items, each with its citation:

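# Load the serve token into this shell; the curl call below reads SB_SERVE_TOKEN
eval "$(uv run sb serve-token env)"
# Passing -d makes curl send a POST with the JSON body
curl -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
     -H "Content-Type: application/json" \
     -d '{"query":"what is the current test command?","top_k":5}' \
     http://localhost:8765/v1/memory/recall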
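The same call from Python, as a sketch using only the standard library. The URL, headers, and body mirror the curl example; the function names are illustrative, and because the response schema is not spelled out here, the helper returns the parsed JSON unchanged and makes no assumptions about its shape.

```python
import json
import os
import urllib.request

BASE_URL = "http://localhost:8765"  # same host and port as the curl example


def build_recall_request(query: str, top_k: int, token: str) -> urllib.request.Request:
    """Build the POST /v1/memory/recall request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/memory/recall",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def recall(query: str, top_k: int = 5):
    """Send the request and return the decoded JSON response as-is."""
    token = os.environ["SB_SERVE_TOKEN"]  # set by the serve-token step
    request = build_recall_request(query, top_k, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running and SB_SERVE_TOKEN set, `recall("what is the current test command?")` performs the same request as the curl command. Keeping request construction separate from sending makes the call easy to inspect or test without a live server.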

Notice secondbrain_pack. Instead of dumping raw chunks into the agent’s context, you can ask for a ContextPack: a bounded bundle of the relevant facts, the decisions that govern them, and the open loops still in flight — each carrying its own citation. That is the difference between flooding a coding agent’s window and handing it exactly the compiled context for the task.
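To illustrate what "bounded" means here, the sketch below assembles a pack client-side under a character budget. It is not SecondBrain's packing algorithm: the PackItem type, the max_chars budget, the example notes/ paths, and the score-ordered greedy selection are all assumptions, chosen only to show that every included item keeps its source and the total size is capped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackItem:
    kind: str         # "fact", "decision", or "open_loop"
    content: str
    source_path: str  # stands in for the full citation envelope
    score: float


def build_pack(items: list[PackItem], max_chars: int) -> list[PackItem]:
    """Greedily keep the highest-scoring items that still fit the budget."""
    pack: list[PackItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.score, reverse=True):
        if used + len(item.content) <= max_chars:
            pack.append(item)
            used += len(item.content)
    return pack


candidates = [
    PackItem("decision", "Scope is frozen as of 2026-04-18.",
             "03_decisions/2026-04-18-release-scope.md", 0.91),
    PackItem("fact", "x" * 500, "notes/long-background.md", 0.80),  # too big
    PackItem("open_loop", "TODO: confirm refund policy evidence.",
             "notes/refunds.md", 0.75),
]

for item in build_pack(candidates, max_chars=120):
    print(f"[{item.kind}] {item.content}  <- {item.source_path}")
```

With a 120-character budget, the oversized background note is skipped while the decision and the open loop both make it in, each still traceable to its file.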

Memory that matures, not just accumulates

A vector store treats every write as equally true forever. SecondBrain’s Memory API (v1.1) adds a knowledge assimilation lifecycle so captured material has to earn durability.

For a coding agent the payoff is concrete: a freshly captured note (“we switched to pnpm test”) is intake, not gospel. It matures into practiced knowledge through review — so the agent does not act on an unverified one-off as if it were a settled repo convention. This is the promotion-aware discipline we argue for across the memory foundations, now exposed as an API.
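A toy model of that promotion discipline, as a sketch. Only the "intake" and "practiced" labels and the idea of promotion through review come from the text above; the two-review threshold, the Memory type, and the function names are hypothetical, and the real lifecycle may have more stages than shown.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    INTAKE = "intake"        # freshly captured, unverified
    PRACTICED = "practiced"  # confirmed through review


@dataclass
class Memory:
    content: str
    stage: Stage = Stage.INTAKE
    reviews: int = 0


REVIEWS_REQUIRED = 2  # hypothetical promotion threshold


def review(memory: Memory) -> Memory:
    """Record one review and promote once the threshold is reached."""
    memory.reviews += 1
    if memory.reviews >= REVIEWS_REQUIRED:
        memory.stage = Stage.PRACTICED
    return memory


def actionable(memories: list[Memory]) -> list[Memory]:
    """Only practiced knowledge is treated as a settled convention."""
    return [m for m in memories if m.stage is Stage.PRACTICED]


note = Memory("we switched to pnpm test")
print(actionable([note]))  # [] while the note is still intake

review(review(note))       # two reviews promote it
print([m.content for m in actionable([note])])  # ['we switched to pnpm test']
```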

Auditable by construction

Because the runtime is local and every layer leaves a trace, the trust claims are checkable rather than asserted:

GET /v1/cognitive/uplift returns a window-aggregated snapshot of what each cognitive layer contributed, mirrored by secondbrain_*_total Prometheus counters. If a layer never fires, its counter stays at zero and you can see it.
GET /v1/audit/event_log tails the workspace event log and filters to a single trajectory — so a session that went sideways can be replayed instead of guessed at.
Multi-tenancy behind SB_MULTI_TENANT=1 gives each workspace its own token, vault, and state, so one machine can serve several isolated projects.
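The "counter stays at zero and you can see it" check can be automated. The sketch below scans metrics in the standard Prometheus text exposition format for secondbrain_*_total series whose value is zero. The metric names in the sample are made up for illustration, and how you obtain the metrics text (which endpoint, which port) is left out because it is not specified here.

```python
def silent_counters(metrics_text: str, prefix: str = "secondbrain_") -> list[str]:
    """Return *_total series with the given prefix whose value is zero."""
    silent = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE lines
            continue
        if "}" in line:
            # Labeled series: name{labels} value [timestamp]
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            # Unlabeled series: name value [timestamp]
            series, _, rest = line.partition(" ")
        value = float(rest.split()[0])
        name = series.split("{", 1)[0]
        if name.startswith(prefix) and name.endswith("_total") and value == 0.0:
            silent.append(series)
    return silent


# Hypothetical sample; the real counter names may differ.
sample = """\
# TYPE secondbrain_recall_total counter
secondbrain_recall_total 42
# TYPE secondbrain_rerank_total counter
secondbrain_rerank_total{workspace="demo"} 0
process_cpu_seconds_total 0
"""

print(silent_counters(sample))  # ['secondbrain_rerank_total{workspace="demo"}']
```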

The whole memory contract is pinned in the OpenAPI file at contracts/memory_api_v1.yaml, and a single end-to-end test (tests/test_memory_api_v1_e2e.py) asserts the Citation-envelope invariant on every memory-bearing response. The promise is enforced in CI, not just in prose.
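For a sense of what such an invariant check might look like, here is a sketch, not the project's actual test. It walks any JSON payload and reports items that lack a complete citation. Two assumptions are baked in: any object with a "content" key counts as memory-bearing, and the required fields are the six shown in the example envelope earlier.

```python
REQUIRED_CITATION_FIELDS = {
    "chunk_hash", "source_path", "anchor", "text_span", "score", "retrieved_at",
}


def citation_violations(payload, path="$"):
    """Walk a JSON payload and list items that lack a complete citation.

    Assumption: any object with a "content" key is a memory-bearing item.
    """
    problems = []
    if isinstance(payload, dict):
        if "content" in payload:
            citation = payload.get("citation")
            if not isinstance(citation, dict):
                problems.append(f"{path}: missing citation")
            else:
                missing = REQUIRED_CITATION_FIELDS - set(citation)
                if missing:
                    problems.append(f"{path}: citation lacks {sorted(missing)}")
        for key, value in payload.items():
            problems += citation_violations(value, f"{path}.{key}")
    elif isinstance(payload, list):
        for index, value in enumerate(payload):
            problems += citation_violations(value, f"{path}[{index}]")
    return problems


good = {
    "content": "Scope is frozen.",
    "citation": {field: "x" for field in REQUIRED_CITATION_FIELDS},
}
bad = {"content": "uncited claim"}
response = {"items": [good, bad]}  # "items" is a hypothetical wrapper key

print(citation_violations(response))  # ['$.items[1]: missing citation']
```

A test built on this would simply assert that the returned list is empty for every memory-bearing response.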

Run it and point an agent at it

The Docker quickstart brings up sb serve with the Memory API live in one command — no provider key required for the first grounded answer, because retrieval and citations don’t need an LLM:

git clone https://github.com/contextosai/SecondBrain-collab.git
cd SecondBrain-collab
 
make quickstart-docker
# Memory API is now live at http://localhost:8765/v1/

From there, ingest your notes or repo docs with secondbrain_ingest, then have your agent call secondbrain_recall or secondbrain_pack before it answers. The first time Claude Code cites the exact decision file behind a suggestion instead of hand-waving, the difference is obvious.