SecondBrain: A Local-First Agent Operating System You Can Run, Inspect, and Trust

Most “AI agents” are still a smart model sitting behind a tool menu. They answer, retrieve, call an API, and write a note to a vector store. Useful, but thin. They forget what happened last week, they cannot show you why they trusted a fact, their actions are hard to bound, and a session that went wrong is almost impossible to replay.

SecondBrain is the opposite bet. It is an open-source, local-first agent operating system: cognition, memory, governed tools, durable sessions, workflows, quality loops, and bounded self-improvement in one inspectable runtime that runs on your own machine. State, traces, memory, knowledge, approvals, and artifacts live locally, where you can read them.

This post is the system tour: what it is, the loop it runs, the Memory API that lets other agents plug in, why local-first changes the trust posture, and how to run the whole thing in one command.

More than chat or RAG

Most AI tooling gives you one useful slice: chat, search, orchestration, tools, evals, or memory. Real agentic work needs those pieces to behave as one system. SecondBrain is built around a fuller loop, not a single capability.

That makes SecondBrain a local agent platform with a Memory API on top, not just another assistant shell.

The Memory API: the part other agents plug into

The cleanest on-ramp is the Memory API v1. Other agents — Claude Code, Cursor, Codex, ChatGPT, or your own — connect over HTTP and MCP. Every response carries a Citation envelope (chunk_hash, source_path, anchor, text_span, score, retrieved_at) so the caller can audit every claim back to the chunk that produced it.

eval "$(uv run sb serve-token env)"
curl -H "Authorization: Bearer ${SB_SERVE_TOKEN}" \
     -H "Content-Type: application/json" \
     -d '{"query":"migration plan","top_k":5}' \
     http://localhost:8765/v1/memory/recall

It is not a static index. The Antahkarana cognitive stack (Chitta synthesis, Manas predictive context, Viveka quality gate, Karma outcome ledger) and the Autotune prompt-mutation loop are wired into the serving path itself. Each layer’s contribution is published as a Prometheus counter (secondbrain_*_total) and as a JSON envelope at GET /v1/cognitive/uplift — so the self-improvement claim is auditable, not marketing. If a layer never fires, its counter stays at zero and you can see it.

A few details that matter for builders:

Public /v1/ HTTP routes with an OpenAPI contract at contracts/memory_api_v1.yaml, plus MCP tools with named parity for the same surface.
A citation-density gate on grounded answers that refuses thin-evidence responses by default.
A published quality scorecard: synthetic seed nDCG@10 = 0.94, MRR = 0.94, recall@10 = 1.00; and a dogfood run (SecondBrain indexing its own repo) measuring 2.1× better recall and 3.5× better precision than grep.
Multi-tenancy behind SB_MULTI_TENANT=1: per-workspace token, vault, and state.

Why local-first changes the trust posture

Local-first agents sit near private notes, source code, documents, credentials, calendars, and long-running work state. That proximity is only useful if the runtime can govern what the model sees and what the system does. Local-first does not mean unbounded — safety comes from explicit boundaries.

Because everything is local and inspectable, the failure modes have somewhere to live: traces you can read, approvals you can audit, and memory writes that go through review instead of silently becoming future truth.

A cognitive layer, not just retrieval

Chat and agent flows can pass through the Antahkarana stack before or around tool execution — structured attention, identity, goals, judgment, guardrails, memory, witness-style audit, and functional-state monitoring. It is framed as a Vedic-inspired engineering model with clear boundaries between traditional faculties and modern runtime responsibilities. If that idea interests you, the full architecture is in Antahkarana Stack: A Cognitive Layer for Local-First Agents.

The practical payoff is promotion-aware knowledge: sb capture, sb ingest, and sb ingest-pipeline move material from raw capture, to promoted knowledge, to compiled runtime context — with working, episodic, and curated long-term memory kept separate. Sleep-time sessions synthesize proposed durable memories into a review queue instead of writing them silently. This is the same promotion-aware model we argue for in AI Agent Memory Is Broken.

Self-improvement that stays bounded

The most over-promised idea in agents is “self-improvement.” SecondBrain treats Autotune as constrained optimization, not unrestricted self-editing. Lane specs declare the mutable files, evaluators, benchmark packs, and acceptance rules. Each candidate keeps its run record, diff, metrics, and cognitive trace.

sb autotune validate repl_prompt
sb autotune bench repl_prompt
sb autotune run repl_prompt --pack smoke
sb quality gate --surface all

The guardrails are the point: Prana, Manas, Buddhi, and Viveka gate budget, priority, attempt framing, and mutation safety before any change applies; deterministic evaluators and ensemble acceptance decide whether a candidate is kept; promotion gates block unsafe promotion when baselines are missing or judge agreement is insufficient; and accepted candidates are preserved as candidate branches, not auto-merged. This is the same replay-driven discipline we describe in Replayable Environments and the Harness Improvement Loop.

How it maps to ContextOS

SecondBrain is the most complete public reference of the ContextOS spec in running code. The five planes map cleanly onto its modules:

ContextOS plane	SecondBrain surface
Intelligence	Promotion-aware knowledge, working/episodic/long-term memory, Chitta consolidation
Context	Hybrid retrieval, context packs, Manas working-memory assembly
Decision	`AgenticRuntime` Planner → Executor → Critic, Buddhi judgment, durable background sessions
Action	Governed tool gateway, MCP and A2A capability planes, approval modes
Trust	Policies and approvals, traces, evals, replay cases, quality gates, Autotune

If you want to read the spec, start at Foundations. If you want to run the spec on your laptop and inspect every layer, start with the repo. The module-by-module mapping lives in the SecondBrain reference.

Run the whole thing in one command

The fastest first run is the repo-local quickstart. It keeps config, state, and a mutable demo-vault copy inside the checkout, seeded from synthetic fixtures — and it returns a real, citation-first answer without needing an LLM provider key. You need Python 3.12 or 3.13.

git clone https://github.com/contextosai/SecondBrain-collab.git
cd SecondBrain-collab
 
make quickstart

That creates a virtualenv, installs the package in editable mode, writes portable config, ingests the demo vault, builds the offline context index, and asks a grounded question with --no-synthesize so the first answer returns demo-vault citations instead of calling a provider. Prefer Docker? make quickstart-docker brings up sb serve with the Memory API live at http://localhost:8765/v1/.

Then the first commands look like this:

source .venv/bin/activate
export SB_CONFIG=$PWD/.secondbrain/config/config.yaml
export SB_STATE_DIR=$PWD/.secondbrain/state
export SB_VAULT_DIR=$PWD/.secondbrain/demo_vault
 
sb doctor
sb ingest "$SB_VAULT_DIR/00_inbox/orion-launch-notes.md"
sb context index
sb ask "What are the main active priorities for the Orion release?" --no-synthesize

Where to help

Some surfaces are more mature than others, and the repo is explicit about that. The core supported path — setup, doctor, ingest, context, ask, chat basics, brain.sdk, brain.patterns, docs, and examples — is the easiest place to start. The cleanest extension points are custom tools built with the brain.patterns.tools.tool decorator, workflow specs, MCP servers, A2A agent cards, and domain agents that reuse the shared context, memory, policy, and persistence layers.