# Memory System

Architecture overview of the persistent AI memory system.

Persistent project memory for Claude Code across sessions. Knowledge is stored as markdown files in `docs/memory/`, with optional hybrid search indexing.
## Core Concepts

- **Markdown is source of truth** — every memory is a `.md` file with YAML frontmatter, git-trackable and human-editable
- **SQLite is a cache** — rebuilding from markdown never loses data
- **3-tier graceful degradation** — always works, regardless of what you install
- **Local-only** — no data leaves your machine
## 3-Tier Architecture

The system detects available dependencies and picks the best backend automatically:
| | Tier 1 (Full) | Tier 2 (Lite) | Tier 3 (Markdown) |
|---|---|---|---|
| What | SQLite + FTS5 + sqlite-vec | MiniSearch daemon | Pure file I/O |
| Search | Hybrid: keyword + semantic + RRF fusion | BM25 + fuzzy + prefix | grep/substring |
| Semantic | Yes (vector embeddings) | No | No |
| Dependencies | better-sqlite3, sqlite-vec, Transformers.js | MiniSearch (~7 KB) | None |
| Index | On disk (SQLite file) | In-memory (rebuilds in <1 s) | None |
| Latency | <50 ms | <10 ms | <5 ms (LLM reasons over results) |
| Scale | 100K+ memories | ~10K | ~1K |
| Setup | `cf memory init` | `cf memory start-daemon` | Nothing |
### Tier selection

By default (`memory.tier: "auto"`), the MCP server picks the best available tier at startup:

- SQLite database exists and deps installed? → Tier 1
- Daemon running? → Tier 2
- Otherwise → Tier 3 (always available — just reads markdown files)

You can force a specific tier in `.coding-friend/config.json`:

```json
{ "memory": { "tier": "markdown" } }
```

This is useful for testing (e.g., forcing Tier 3 even when SQLite is installed) or when you want to skip auto-detection. See `memory.tier` for all values.
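The selection cascade can be sketched in TypeScript. This is an illustrative sketch only — the real server's function name and option shape (`forced`, `dbExists`, `daemonRunning`) are assumptions:

```typescript
type Tier = "full" | "lite" | "markdown";

// Hypothetical sketch of the startup cascade; option names are assumptions.
function detectTier(opts: {
  forced?: Tier;          // memory.tier from config, when not "auto"
  dbExists: boolean;      // SQLite database present and deps installed
  daemonRunning: boolean; // Tier 2 daemon reachable over its Unix socket
}): Tier {
  if (opts.forced) return opts.forced;   // config override skips detection
  if (opts.dbExists) return "full";      // Tier 1: SQLite + FTS5 + sqlite-vec
  if (opts.daemonRunning) return "lite"; // Tier 2: MiniSearch daemon
  return "markdown";                     // Tier 3: always available
}
```

Note the ordering: a forced tier wins unconditionally, and Tier 3 is the unconditional fallback, which is what makes the system "always work".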
### How tiers connect
```
                 Claude Code
                      │
           ┌──────────┴─────────┐
           │ MCP Server (stdio) │
           └──────────┬─────────┘
                      │
                 detectTier()
                      │
      ┌───────────────┼────────────────┐
      │               │                │
Tier 1 (Full)   Tier 2 (Lite)   Tier 3 (Markdown)
      │               │                │
┌─────┴──────────┐    │          grep/ripgrep
│ SqliteBackend  │    │                │
│ (in-process)   │    ▼                │
│                │ ┌───────────────┐   │
│ better-sqlite3 │ │    Daemon     │   │
│ sqlite-vec     │ │ (Unix socket) │   │
│ FTS5 + RRF     │ │               │   │
│                │ │  MiniSearch   │   │
│                │ │ (in-memory)   │   │
└─────┬──────────┘ └───────┬───────┘   │
      │                    │           │
      └────────────────────┼───────────┘
                           ▼
                ┌────────────────────┐
                │  docs/memory/*.md  │
                │ (source of truth)  │
                └────────────────────┘
```
- **Tier 1 & 3**: the MCP server connects directly to the backend — no daemon needed
- **Tier 2**: the MCP server connects to a Hono HTTP daemon over a Unix Domain Socket (UDS), which keeps the MiniSearch index in RAM
- **Auto-reconnect**: if the daemon stops mid-session (e.g., idle timeout), the client automatically respawns it on the next request — no manual restart needed
- All tiers read/write the same `docs/memory/*.md` markdown files (source of truth). SQLite is a rebuildable cache.
### Key technologies

- **FTS5** — SQLite's built-in full-text search engine; uses the BM25 ranking algorithm
- **sqlite-vec** — SQLite extension for vector storage and cosine-similarity search
- **MiniSearch** — lightweight (~7 KB) in-memory search with BM25 + fuzzy matching
- **Hono** — ultra-lightweight (~14 KB) HTTP framework powering the Tier 2 daemon
- **Unix Domain Socket (UDS)** — IPC transport for daemon communication (10-50x faster than TCP, no port conflicts)
- **Transformers.js** — runs ML embedding models in Node.js (no Python/GPU needed)
- **WAL mode** — SQLite Write-Ahead Logging for concurrent reads/writes without blocking
## Write & Sync

### How writes work

When you use the MCP tools (`memory_store`, `memory_update`, `memory_delete`), the backend writes to both markdown and SQLite in a single call:

1. Write the markdown file (source of truth)
2. Upsert/delete the SQLite row (immediate)
3. Generate the embedding vector (async, non-blocking)

No sync mechanism is needed — both storage layers are updated atomically by the same function.
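The write path can be sketched as follows. The backend method names here (`writeMarkdownFile`, `upsertRow`, `queueEmbedding`) are illustrative assumptions, not the actual API:

```typescript
interface Memory { id: string; title: string; content: string }

// Sketch of the three-step write: markdown first, SQLite second,
// embedding queued asynchronously. Method names are hypothetical.
async function store(
  mem: Memory,
  backend: {
    writeMarkdownFile(m: Memory): Promise<void>; // 1. source of truth
    upsertRow(m: Memory): Promise<void>;         // 2. SQLite cache
    queueEmbedding(id: string): void;            // 3. async, non-blocking
  }
): Promise<void> {
  await backend.writeMarkdownFile(mem); // markdown first: if this fails, nothing is cached
  await backend.upsertRow(mem);
  backend.queueEmbedding(mem.id);       // fire-and-forget; semantic search catches up later
}
```

Writing markdown before SQLite means a crash between the two steps leaves only a stale cache, which a rebuild can always repair.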
CLAUDE.md sync for project rules
Convention memories (preference type) are automatically synced to the project's CLAUDE.md file. When you store, update, or delete a convention memory, a corresponding entry is added, updated, or removed under a dedicated ## CF Memory: Project Rules section.
Other memory types (decisions, infrastructure, etc.) can opt-in to CLAUDE.md sync by setting sync_to_claude_md: true when calling memory_store or memory_update. This is useful for project-wide rules that live outside the conventions/ folder — e.g., architecture decisions that must be followed, or deployment procedures that must not be skipped.
Each entry is a concise one-liner (from the memory's description field) tracked via an HTML comment (<!-- cf:<memory-id> -->). This ensures Claude Code always has project rules loaded in its context — without needing to search memory on every session start.
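The marker-based upsert can be sketched like this. The section heading and comment format come from the description above; the helper itself and its name are our assumptions:

```typescript
const SECTION = "## CF Memory: Project Rules";

// Hypothetical sketch: insert or update a one-liner tracked by its
// <!-- cf:<memory-id> --> marker inside CLAUDE.md.
function upsertRule(claudeMd: string, memoryId: string, oneLiner: string): string {
  const marker = `<!-- cf:${memoryId} -->`;
  const entry = `- ${oneLiner} ${marker}`;
  const lines = claudeMd.split("\n");
  const idx = lines.findIndex((l) => l.includes(marker));
  if (idx !== -1) {
    lines[idx] = entry; // memory updated → replace its entry in place
    return lines.join("\n");
  }
  if (!claudeMd.includes(SECTION)) {
    // first synced memory: create the section at the end of the file
    return `${claudeMd.trimEnd()}\n\n${SECTION}\n${entry}\n`;
  }
  lines.splice(lines.indexOf(SECTION) + 1, 0, entry); // add under the heading
  return lines.join("\n");
}
```

Because the marker carries the memory ID, an update never duplicates an entry and a delete can find exactly the line to remove.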
### When markdown is edited directly

If you edit markdown files outside the MCP tools (e.g., in your editor, via git), SQLite is not automatically updated. The MCP server process has no file watcher.

To sync SQLite after manual edits:

```bash
cf memory rebuild
```

This scans all markdown files and rebuilds the entire SQLite index (rows + embeddings) from scratch.
The Tier 2 daemon (`cf memory start-daemon`) includes a file watcher that auto-rebuilds on markdown changes. In Tier 1 (SQLite via MCP), however, there is no watcher — you must rebuild manually.
## Capturing Memories

Three ways to store memories:

1. **Bootstrap** — `/cf-scan` inside a Claude Code session
   - Scans the entire project (README, configs, source code) and populates memory with architecture, conventions, tech stack, and key features
   - Safe to run multiple times — updates existing memories, never duplicates
   - Recommended for new projects or after major refactors
2. **Manually** — `/cf-remember` inside a Claude Code session
   - Claude picks the right category and writes a markdown file
   - Also indexes via the `memory_store` MCP tool
3. **Automatically** — when `memory.autoCapture` is enabled
   - The PreCompact hook captures session summaries as `episode` memories
   - Skills auto-capture notable findings:
     - `/cf-fix` — bug root causes
     - `/cf-review` — architectural patterns
     - `/cf-sys-debug` — debugging breakthroughs
`/cf-scan` vs `/cf-remember`:

| | `/cf-scan` | `/cf-remember` |
|---|---|---|
| When | First time, to refresh project knowledge, or anytime you want to "scan" some aspect of the project | After a coding session |
| Source | Scans the codebase directly | Extracts from conversation |
| Scope | Whole project (architecture, conventions, features) | Specific topic from current session |
| Typical output | 10-15 memories across all categories | 1-3 memories per invocation |
`/cf-remember` vs `cf memory`:

| | `/cf-remember` | `cf memory` |
|---|---|---|
| Where | Inside a Claude Code session | Terminal CLI |
| Does | Saves knowledge from the conversation | Manages the memory system |
| Examples | `/cf-remember auth uses JWT` | `cf memory status`, `cf memory rebuild` |
## MCP Integration

The MCP (Model Context Protocol) server bridges Claude Code and the memory backends via stdio transport — Claude Code spawns it automatically.
| Tool | Purpose |
|---|---|
| `memory_store` | Store a new memory |
| `memory_search` | Search (keyword/semantic/hybrid) |
| `memory_retrieve` | Get a specific memory by ID |
| `memory_list` | List memories with filtering |
| `memory_update` | Update an existing memory |
| `memory_delete` | Delete a memory |
You never call these directly — skills like `/cf-remember` and `/cf-fix` use them behind the scenes.

Run `cf memory mcp` to get the MCP configuration snippet for your project. You can also run `cf mcp` to see both the Learn MCP and Memory MCP configs together.

**Important:** The Memory MCP path is project-specific — it points to `docs/memory/` in your project. Always configure it in a local `.mcp.json` per project. Do not add it to the global `~/.claude/.mcp.json`, as it will only work for the one project whose path is hardcoded. Running `cf memory init` or `cf init` sets this up automatically.
## Search Pipeline

When Claude calls `memory_search`, the query goes through this pipeline (Tier 1):

1. **Query routing** — auto-detects intent: quoted strings/code patterns → keyword only; questions (how/why/what) → semantic only; everything else → hybrid (both + fusion)
2. **Keyword search (FTS5 + BM25)** — full-text search with weighted fields: title (10x), tags (6x), description (4x), content (1x)
3. **Semantic search (embeddings)** — text is converted to vectors by an embedding model and compared using cosine similarity. Finds meaning, not just keywords (e.g., "login verification" matches "JWT authentication"). Vectors are cached by content hash.
4. **Fusion (RRF)** — in hybrid mode, keyword and semantic results are merged with Reciprocal Rank Fusion (k=60). Each result is scored by rank position, `1/(60 + rank)`, and scores from both lists are summed.
5. **Post-processing** — temporal decay (90-day half-life, floor at 70%), deduplication (Jaccard similarity), optional type/tag filtering
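The fusion step is small enough to sketch in full. This is a generic RRF implementation matching the description above (k = 60, score `1/(k + rank)` summed across lists), not the project's actual code:

```typescript
// Reciprocal Rank Fusion: merge two ranked result lists into one.
// Each list contributes 1/(k + rank) per result; scores are summed.
function rrfFuse(keyword: string[], semantic: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [keyword, semantic]) {
    list.forEach((id, i) => {
      const rank = i + 1; // 1-based rank position
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}
```

A result that appears in both lists gets two contributions, so agreement between keyword and semantic search reliably pushes it to the top even when neither list ranks it first.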
## Memory Types

| Type | Folder | Answers | Example |
|---|---|---|---|
| `fact` | `features/` | "How does this work?" | "Auth uses JWT in httpOnly cookies" |
| `preference` | `conventions/` | "How does the user want it?" | "Always async/await, never .then()" |
| `context` | `decisions/` | "What state is the project in?" | "Migrating REST to GraphQL" |
| `episode` | `bugs/` | "What did we do before?" | "Fixed CORS on /api/upload" |
| `procedure` | `infrastructure/` | "What steps to do X?" | "Deploy: build, ship, merge, release" |

Each type maps to a subfolder in `docs/memory/` for easy browsing.
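The type-to-folder mapping from the table can be expressed as a simple lookup. A sketch only — the helper `memoryPath` and its slug parameter are hypothetical:

```typescript
// Type → subfolder mapping, as documented in the table above.
const TYPE_FOLDERS = {
  fact: "features",
  preference: "conventions",
  context: "decisions",
  episode: "bugs",
  procedure: "infrastructure",
} as const;

type MemoryType = keyof typeof TYPE_FOLDERS;

// Hypothetical helper: compute where a memory of a given type is written.
function memoryPath(type: MemoryType, slug: string): string {
  return `docs/memory/${TYPE_FOLDERS[type]}/${slug}.md`;
}
```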
## Markdown Format

```markdown
---
title: "API Authentication Pattern"
description: "Auth module uses JWT tokens stored in httpOnly cookies with RS256"
type: fact
tags: [auth, jwt, api, security]
importance: 3
created: 2026-03-12
updated: 2026-03-12
source: conversation
---

The auth module uses RS256-signed JWT tokens...
```
- `title` — short, descriptive name
- `description` — one-line searchable summary (<100 chars). Critical for search quality. Good: `"JWT auth with httpOnly cookies and RS256"`. Bad: `"How auth works"`.
- `type` — one of: `fact`, `preference`, `context`, `episode`, `procedure`
- `tags` — keywords for filtering and search ranking
- `importance` — 1 (low) to 5 (critical), default 3. Auto-captured episodes default to 2.
- `created` / `updated` — ISO date strings
- `source` — `conversation` (manual) or `auto-capture` (automatic)
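Splitting the frontmatter off the body is the first step in indexing a file like the one above. A minimal sketch, assuming the `---` delimiters shown; a real implementation would hand the captured block to a proper YAML parser:

```typescript
// Minimal sketch: separate the YAML frontmatter block from the body.
function splitFrontmatter(md: string): { frontmatter: string; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!m) return { frontmatter: "", body: md }; // no frontmatter present
  return { frontmatter: m[1], body: md.slice(m[0].length) };
}
```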
## Embedding Models

Semantic search (Tier 1) needs an embedding model to convert text into vectors. Two providers are available:

- **Transformers.js** (default) — runs in Node.js, no external service; the model is auto-downloaded (~23 MB)
- **Ollama** (optional) — uses a local Ollama server, supports more models, falls back to Transformers.js if unavailable
| Model | Dims | Size | Provider | Notes |
|---|---|---|---|---|
| `Xenova/all-MiniLM-L6-v2` | 384 | ~23 MB | Transformers.js | Default — auto-downloaded |
| `all-minilm:l6-v2` | 384 | ~23 MB | Ollama | Same model via Ollama |
| `nomic-embed-text` | 768 | ~274 MB | Ollama | Recommended upgrade |
| `mxbai-embed-large` | 1024 | ~670 MB | Ollama | Best quality |
| `snowflake-arctic-embed:s` | 384 | ~67 MB | Ollama | Alternative small model |
| `bge-base-en-v1.5` | 768 | ~130 MB | Ollama | English-optimized |
To use Ollama: install it, pull a model (`ollama pull nomic-embed-text`), then configure it in `.coding-friend/config.json`:

```json
{
  "memory": {
    "embedding": {
      "provider": "ollama",
      "model": "nomic-embed-text",
      "ollamaUrl": "http://localhost:11434"
    }
  }
}
```

Run `cf memory rebuild` after setup to embed existing memories.
When switching to a model with different dimensions (e.g., 384 → 768), vector search is auto-disabled until you run `cf memory rebuild`. Keyword search still works. Markdown files are never affected.

Environment variables: `MEMORY_EMBEDDING_PROVIDER` (`"transformers"` or `"ollama"`), `MEMORY_EMBEDDING_MODEL`, `MEMORY_EMBEDDING_OLLAMA_URL` (default `http://localhost:11434`). See the config.json reference for all options.
## Configuration

See the configuration reference for all memory options.