Memory System

Architecture overview of the persistent AI memory system

Persistent project memory for Claude Code across sessions. Stores knowledge as markdown files in docs/memory/, with optional hybrid search indexing.

Core Concepts

  • Markdown is source of truth — every memory is a .md file with YAML frontmatter, git-trackable and human-editable
  • SQLite is a cache — rebuilding from markdown never loses data
  • 3-tier graceful degradation — always works, regardless of what you install
  • Local-only — no data leaves your machine

3-Tier Architecture

The system detects available dependencies and picks the best backend automatically:

| | Tier 1 (Full) | Tier 2 (Lite) | Tier 3 (Markdown) |
| --- | --- | --- | --- |
| What | SQLite + FTS5 + sqlite-vec | MiniSearch daemon | Pure file I/O |
| Search | Hybrid: keyword + semantic + RRF fusion | BM25 + fuzzy + prefix | grep/substring |
| Semantic | Yes (vector embeddings) | No | No |
| Dependencies | better-sqlite3, sqlite-vec, Transformers.js | MiniSearch (~7KB) | None |
| Index | On disk (SQLite file) | In-memory (rebuilds in <1s) | None |
| Latency | <50ms | <10ms | <5ms (LLM reasons over results) |
| Scale | 100K+ memories | ~10K | ~1K |
| Setup | cf memory init | cf memory start-daemon | Nothing |

Tier selection

By default (memory.tier: "auto"), the MCP server picks the best available tier at startup:

  1. SQLite database exists and deps installed? → Tier 1
  2. Daemon running? → Tier 2
  3. Otherwise → Tier 3 (always available — just reads markdown files)
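
The selection order above can be sketched as a small pure function. The probe names here (sqliteDbExists, sqliteDepsLoadable, daemonResponding) are illustrative, not the server's actual API:

```typescript
// Hypothetical sketch of the tier-selection order; the real server's
// probe functions and names may differ.
type Tier = "full" | "lite" | "markdown";

interface TierProbes {
  sqliteDbExists: () => boolean;     // e.g. does the SQLite database file exist?
  sqliteDepsLoadable: () => boolean; // e.g. can better-sqlite3 be loaded?
  daemonResponding: () => boolean;   // e.g. does the Unix socket answer a ping?
}

function detectTier(p: TierProbes): Tier {
  if (p.sqliteDbExists() && p.sqliteDepsLoadable()) return "full"; // Tier 1
  if (p.daemonResponding()) return "lite";                         // Tier 2
  return "markdown";                                               // Tier 3: always available
}
```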

You can force a specific tier in .coding-friend/config.json:

{ "memory": { "tier": "markdown" } }

This is useful for testing (e.g., force Tier 3 even when SQLite is installed) or when you want to skip auto-detection. See memory.tier for all values.

How tiers connect

                        Claude Code
                            │
                  ┌─────────┴──────────┐
                  │ MCP Server (stdio) │
                  └─────────┬──────────┘
                            │
                     detectTier()
                            │
         ┌──────────────────┼───────────────────┐
         │                  │                   │
    Tier 1 (Full)     Tier 2 (Lite)     Tier 3 (Markdown)
         │                  │                   │
┌────────┴─────────┐        │             grep/ripgrep
│  SqliteBackend   │        │                   │
│  (in-process)    │        ▼                   │
│                  │  ┌────────────────┐        │
│  better-sqlite3  │  │    Daemon      │        │
│  sqlite-vec      │  │ (Unix socket)  │        │
│  FTS5 + RRF      │  │                │        │
│                  │  │   MiniSearch   │        │
│                  │  │  (in-memory)   │        │
└────────┬─────────┘  └───────┬────────┘        │
         │                    │                 │
         └────────────────────┼─────────────────┘
                              ▼
                   ┌────────────────────┐
                   │  docs/memory/*.md  │
                   │ (source of truth)  │
                   └────────────────────┘
  • Tier 1 & 3: MCP server connects directly to the backend — no daemon needed
  • Tier 2: MCP server connects to a Hono HTTP daemon over Unix Domain Socket (UDS), which keeps the MiniSearch index in RAM
  • Auto-reconnect: if the daemon stops mid-session (e.g., idle timeout), the client automatically respawns it on the next request — no manual restart needed
  • All tiers read/write the same docs/memory/*.md markdown files (source of truth). SQLite is a rebuildable cache.
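
The auto-reconnect behavior can be sketched as a retry wrapper (names are hypothetical; the real client's respawn logic is more involved):

```typescript
// Illustrative sketch of auto-reconnect: if a daemon request fails,
// respawn the daemon and retry once. Names are hypothetical.
async function requestWithRespawn<T>(
  send: () => Promise<T>,
  respawn: () => Promise<void>,
): Promise<T> {
  try {
    return await send();
  } catch {
    await respawn(); // daemon died (e.g. idle timeout): bring it back
    return send();   // single retry; a second failure propagates to the caller
  }
}
```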

Key technologies

  • FTS5 — SQLite's built-in full-text search engine, uses BM25 ranking algorithm
  • sqlite-vec — SQLite extension for vector storage and cosine similarity search
  • MiniSearch — Lightweight (~7KB) in-memory search with BM25 + fuzzy matching
  • Hono — Ultra-lightweight (~14KB) HTTP framework powering the Tier 2 daemon
  • Unix Domain Socket (UDS) — IPC transport for daemon communication (10-50x faster than TCP, no port conflicts)
  • Transformers.js — Runs ML embedding models in Node.js (no Python/GPU needed)
  • WAL mode — SQLite Write-Ahead Logging for concurrent reads/writes without blocking

Write & Sync

How writes work

When you use MCP tools (memory_store, memory_update, memory_delete), the backend writes to both markdown and SQLite in a single call:

  1. Write markdown file (source of truth)
  2. Upsert/delete SQLite row (immediate)
  3. Generate embedding vector (async, non-blocking)

No sync mechanism is needed — both storage layers are updated atomically by the same function.
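
The three steps can be sketched as one function, with the storage operations injected for clarity (the function and interface names are hypothetical, not the backend's actual API):

```typescript
// Hypothetical sketch of the three-step write path; interface and function
// names are illustrative, not the backend's actual API.
interface Memory { id: string; markdown: string; }

interface Stores {
  writeMarkdown: (m: Memory) => void;  // 1. source of truth
  upsertRow: (m: Memory) => void;      // 2. SQLite cache row
  embed: (m: Memory) => Promise<void>; // 3. vector embedding
}

function storeMemory(m: Memory, s: Stores): void {
  s.writeMarkdown(m); // markdown first: if this fails, nothing is cached
  s.upsertRow(m);     // row is visible to keyword search immediately
  void s.embed(m).catch(() => {
    // an embedding failure degrades semantic search only
  });
}
```

Note that the embedding promise is deliberately not awaited: if it fails, only semantic search degrades, while keyword search still sees the new row.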

CLAUDE.md sync for project rules

Convention memories (preference type) are automatically synced to the project's CLAUDE.md file. When you store, update, or delete a convention memory, a corresponding entry is added, updated, or removed under a dedicated ## CF Memory: Project Rules section.

Other memory types (decisions, infrastructure, etc.) can opt-in to CLAUDE.md sync by setting sync_to_claude_md: true when calling memory_store or memory_update. This is useful for project-wide rules that live outside the conventions/ folder — e.g., architecture decisions that must be followed, or deployment procedures that must not be skipped.

Each entry is a concise one-liner (from the memory's description field) tracked via an HTML comment (<!-- cf:<memory-id> -->). This ensures Claude Code always has project rules loaded in its context — without needing to search memory on every session start.
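
A sketch of the marker-based upsert, assuming a simple line-oriented CLAUDE.md (the helper name and section handling are illustrative, not the actual implementation):

```typescript
// Hedged sketch of the marker-based upsert; the helper name and section
// handling are illustrative, not the actual implementation.
const SECTION = "## CF Memory: Project Rules";

function upsertRule(claudeMd: string, id: string, oneLiner: string): string {
  const marker = `<!-- cf:${id} -->`;
  const entry = `- ${oneLiner} ${marker}`;
  const lines = claudeMd.split("\n");
  const existing = lines.findIndex((l) => l.includes(marker));
  if (existing !== -1) {
    lines[existing] = entry; // update the tracked entry in place
    return lines.join("\n");
  }
  if (!lines.includes(SECTION)) {
    // create the dedicated section on first use
    return `${claudeMd.trimEnd()}\n\n${SECTION}\n\n${entry}\n`;
  }
  lines.splice(lines.indexOf(SECTION) + 1, 0, entry); // add under the heading
  return lines.join("\n");
}
```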

When markdown is edited directly

If you edit markdown files outside the MCP tools (e.g., in your editor, via git), SQLite is not automatically updated. The MCP server process has no file watcher.

To sync SQLite after manual edits:

cf memory rebuild

This scans all markdown files and rebuilds the entire SQLite index (rows + embeddings) from scratch.

The Tier 2 daemon (cf memory start-daemon) includes a file watcher that auto-rebuilds on markdown changes. But in Tier 1 (SQLite via MCP), there is no watcher — you must rebuild manually.

Capturing Memories

Three ways to store memories:

  1. Bootstrap — /cf-scan inside a Claude Code session
    • Scans the entire project (README, configs, source code) and populates memory with architecture, conventions, tech stack, key features
    • Safe to run multiple times — updates existing memories, never duplicates
    • Recommended for new projects or after major refactors
  2. Manually — /cf-remember inside a Claude Code session
    • Claude picks the right category and writes a markdown file
    • Also indexes via memory_store MCP tool
  3. Automatically — when memory.autoCapture is enabled
    • PreCompact hook captures session summaries as episode memories
    • Skills auto-capture notable findings:
      • /cf-fix — bug root causes
      • /cf-review — architectural patterns
      • /cf-sys-debug — debugging breakthroughs

/cf-scan vs /cf-remember:

| | /cf-scan | /cf-remember |
| --- | --- | --- |
| When | First time, or to refresh project knowledge, or anytime you want to "scan" some aspect of the project | After a coding session |
| Source | Scans the codebase directly | Extracts from conversation |
| Scope | Whole project (architecture, conventions, features) | Specific topic from current session |
| Typical output | 10-15 memories across all categories | 1-3 memories per invocation |

/cf-remember vs cf memory:

| | /cf-remember | cf memory |
| --- | --- | --- |
| Where | Inside Claude Code session | Terminal CLI |
| Does | Saves knowledge from conversation | Manages the memory system |
| Examples | /cf-remember auth uses JWT | cf memory status, cf memory rebuild |

MCP Integration

The MCP (Model Context Protocol) server bridges Claude Code and memory backends via stdio transport — Claude Code spawns it automatically.

| Tool | Purpose |
| --- | --- |
| memory_store | Store a new memory |
| memory_search | Search (keyword/semantic/hybrid) |
| memory_retrieve | Get a specific memory by ID |
| memory_list | List memories with filtering |
| memory_update | Update existing memory |
| memory_delete | Delete a memory |

You never call these directly — skills like /cf-remember and /cf-fix use them behind the scenes.

Run cf memory mcp to get the MCP configuration snippet for your project. You can also run cf mcp to see both Learn MCP and Memory MCP configs together.

Important: The Memory MCP path is project-specific — it points to docs/memory/ in your project. Always configure it in a local .mcp.json per project. Do not add it to the global ~/.claude/.mcp.json, as it will only work for the one project whose path is hardcoded. Running cf memory init or cf init will set this up automatically.

Search Pipeline

When Claude calls memory_search, the query goes through this pipeline (Tier 1):

  1. Query routing — auto-detects intent: quoted strings/code patterns → keyword only, questions (how/why/what) → semantic only, everything else → hybrid (both + fusion)

  2. Keyword search (FTS5 + BM25) — full-text search with weighted fields: title (10x), tags (6x), description (4x), content (1x)

  3. Semantic search (embeddings) — text converted to vectors by an embedding model, compared using cosine similarity. Finds meaning, not just keywords (e.g., "login verification" matches "JWT authentication"). Vectors cached by content hash.

  4. Fusion (RRF) — in hybrid mode, keyword and semantic results merged with Reciprocal Rank Fusion (k=60). Each result scored by rank position 1/(60 + rank), scores from both lists summed.

  5. Post-processing — temporal decay (90-day half-life, floor at 70%), deduplication (Jaccard similarity), optional type/tag filtering
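
Steps 4 and 5 can be sketched as follows, assuming 1-based rank positions, k = 60, and a decay multiplier floored at 0.7 (these are readings of the figures above; the real scoring code may differ):

```typescript
// Sketch of steps 4-5 under stated assumptions: 1-based rank positions,
// k = 60, and a decay multiplier floored at 0.7. Illustrative only.
interface Ranked { id: string; }

// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per result,
// and results appearing in both lists sum their contributions.
function rrfFuse(keyword: Ranked[], semantic: Ranked[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of [keyword, semantic]) {
    list.forEach((r, i) => {
      scores.set(r.id, (scores.get(r.id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Temporal decay: halve the score every 90 days, never dropping below 70%.
function applyDecay(score: number, ageDays: number): number {
  return score * Math.max(0.7, Math.pow(0.5, ageDays / 90));
}
```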

Memory Types

| Type | Folder | Answers | Example |
| --- | --- | --- | --- |
| fact | features/ | "How does this work?" | "Auth uses JWT in httpOnly cookies" |
| preference | conventions/ | "How does the user want it?" | "Always async/await, never .then()" |
| context | decisions/ | "What state is the project in?" | "Migrating REST to GraphQL" |
| episode | bugs/ | "What did we do before?" | "Fixed CORS on /api/upload" |
| procedure | infrastructure/ | "What steps to do X?" | "Deploy: build, ship, merge, release" |

Each type maps to a subfolder in docs/memory/ for easy browsing.
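
The mapping can be expressed as a simple lookup (memoryPath is an illustrative helper, not part of the actual API):

```typescript
// The type-to-folder mapping above as a lookup table; memoryPath is an
// illustrative helper, not part of the actual API.
const TYPE_FOLDERS: Record<string, string> = {
  fact: "features/",
  preference: "conventions/",
  context: "decisions/",
  episode: "bugs/",
  procedure: "infrastructure/",
};

function memoryPath(type: string, slug: string): string {
  return `docs/memory/${TYPE_FOLDERS[type]}${slug}.md`;
}
```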

Markdown Format

---
title: "API Authentication Pattern"
description: "Auth module uses JWT tokens stored in httpOnly cookies with RS256"
type: fact
tags: [auth, jwt, api, security]
importance: 3
created: 2026-03-12
updated: 2026-03-12
source: conversation
---
The auth module uses RS256-signed JWT tokens...
  • title — short, descriptive name
  • description — one-line searchable summary (<100 chars). Critical for search quality. Good: "JWT auth with httpOnly cookies and RS256". Bad: "How auth works".
  • type — one of: fact, preference, context, episode, procedure
  • tags — keywords for filtering and search ranking
  • importance — 1 (low) to 5 (critical), default 3. Auto-captured episodes default to 2.
  • created / updated — ISO date strings
  • source — conversation (manual) or auto-capture (automatic)
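
A minimal reader for this format might look like the following. A real implementation would use a proper YAML parser; this sketch handles only the flat key: value lines shown above and is purely illustrative:

```typescript
// Minimal reader for files shaped like the example above. A real
// implementation would use a YAML parser; this sketch handles only
// flat `key: value` lines and is purely illustrative.
function parseFrontmatter(md: string): { meta: Record<string, string>; body: string } {
  const m = md.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!m) return { meta: {}, body: md }; // no frontmatter block
  const meta: Record<string, string> = {};
  for (const line of m[1].split("\n")) {
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const value = line.slice(sep + 1).trim().replace(/^"|"$/g, ""); // strip quotes
    meta[line.slice(0, sep).trim()] = value;
  }
  return { meta, body: m[2] };
}
```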

Embedding Models

Semantic search (Tier 1) needs an embedding model to convert text into vectors. Two providers available:

  • Transformers.js (default) — runs in Node.js, no external service, model auto-downloaded (~23 MB)
  • Ollama (optional) — uses a local Ollama server, supports more models, falls back to Transformers.js if unavailable

| Model | Dims | Size | Provider | Notes |
| --- | --- | --- | --- | --- |
| Xenova/all-MiniLM-L6-v2 | 384 | ~23 MB | Transformers.js | Default — auto-downloaded |
| all-minilm:l6-v2 | 384 | ~23 MB | Ollama | Same model via Ollama |
| nomic-embed-text | 768 | ~274 MB | Ollama | Recommended upgrade |
| mxbai-embed-large | 1024 | ~670 MB | Ollama | Best quality |
| snowflake-arctic-embed:s | 384 | ~67 MB | Ollama | Alternative small model |
| bge-base-en-v1.5 | 768 | ~130 MB | Ollama | English-optimized |

To use Ollama: install it, pull a model (ollama pull nomic-embed-text), then configure in .coding-friend/config.json:

{
  "memory": {
    "embedding": {
      "provider": "ollama",
      "model": "nomic-embed-text",
      "ollamaUrl": "http://localhost:11434"
    }
  }
}

Run cf memory rebuild after setup to embed existing memories.

When switching to a model with different dimensions (e.g., 384 → 768), vector search is auto-disabled until you run cf memory rebuild. Keyword search still works. Markdown files are never affected.
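
A sketch of this guard, assuming the index records the dimensions of its stored vectors (names are illustrative):

```typescript
// Illustrative guard for the dimension mismatch described above; the real
// backend's state and naming may differ.
interface VectorIndexState {
  storedDims: number | null; // null: nothing embedded yet
  modelDims: number;         // dimensions of the currently configured model
}

function vectorSearchEnabled(s: VectorIndexState): boolean {
  // A mismatch (e.g. 384 stored vs 768 configured) disables vector search
  // until a rebuild re-embeds everything; keyword search is unaffected.
  return s.storedDims === null || s.storedDims === s.modelDims;
}
```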

Environment variables: MEMORY_EMBEDDING_PROVIDER ("transformers" or "ollama"), MEMORY_EMBEDDING_MODEL, MEMORY_EMBEDDING_OLLAMA_URL (default http://localhost:11434). See config.json reference for all options.

Configuration

Check out the configuration reference for all memory options.