Every AI agent starts each session with amnesia. Yesterday’s architecture decision, last week’s debugging breakthrough, the naming convention your team agreed on three sprints ago — gone. You re-explain it, or the agent gets it wrong.
BHGBrain fixes this. It’s an open-source MCP server that gives your AI agents a persistent, searchable, shared memory that survives across sessions, tools, and teams.
How It Works
BHGBrain sits between your AI agents and a durable storage layer. Any MCP-compatible client — Claude Code, Codex, OpenClaw, Gemini — connects to BHGBrain and gains access to a shared knowledge base.
```
AI Agents (Claude / Codex / OpenClaw / Gemini)
  → MCP transport (stdio or HTTP)
    → BHGBrain server
      → Qdrant (semantic vector search)
      → SQLite (metadata, fulltext index, audit log, archive)
```
When an agent stores a memory, BHGBrain runs it through an intelligent pipeline:
- Normalization — Input is cleaned and standardized before hashing, improving deduplication accuracy across paraphrased or reformatted content
- Deduplication — Compares against existing memories using SHA-256 content hashing and cosine similarity (threshold: 0.92), with tier-adjusted thresholds for precision control
- Decision — Determines whether to add new knowledge, update existing entries, or discard duplicates
- Retention assignment — Assigns the memory to the appropriate retention tier (T0–T3) based on type, importance, and caller-specified preference
- Storage — Embeds, indexes, and persists accepted memories with importance scores that influence future search ranking
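The normalization and deduplication steps can be sketched roughly as follows. This is a minimal TypeScript sketch, not BHGBrain's actual code: only the SHA-256 hashing, cosine similarity, and 0.92 threshold come from the description above; the normalization rules and all function names are assumptions, and the real decision step can also update existing entries rather than just add or discard.

```typescript
import { createHash } from "node:crypto";

// Normalize before hashing so paraphrased or reformatted content dedupes.
// (Hypothetical rules; BHGBrain's real normalization may differ.)
function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

function contentHash(text: string): string {
  return createHash("sha256").update(normalize(text)).digest("hex");
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

type Decision = "add" | "discard";

// The 0.92 default matches the article; tiers may adjust it.
function decide(
  newText: string, existingText: string,
  newVec: number[], existingVec: number[],
  threshold = 0.92,
): Decision {
  if (contentHash(newText) === contentHash(existingText)) return "discard"; // exact duplicate
  if (cosine(newVec, existingVec) >= threshold) return "discard"; // near duplicate
  return "add";
}
```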
When an agent needs context, BHGBrain delivers it through hybrid RRF search — combining semantic similarity (70%) and fulltext matching (30%) via Reciprocal Rank Fusion, with configurable weights. Agents get the relevant memories, not the entire knowledge base.
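Reciprocal Rank Fusion is simple enough to sketch in a few lines. The 0.7/0.3 weights come from the description above; the `k = 60` smoothing constant is a common RRF default, and the function shape is an assumption, not BHGBrain's API.

```typescript
// Fuse two ranked lists of memory IDs with weighted Reciprocal Rank Fusion.
// A document scores w / (k + rank) per list it appears in; higher total wins.
function rrf(
  semantic: string[],  // IDs ranked by vector similarity
  fulltext: string[],  // IDs ranked by fulltext match
  wSem = 0.7,
  wFts = 0.3,
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  const accumulate = (list: string[], w: number) => {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + w / (k + rank + 1));
    });
  };
  accumulate(semantic, wSem);
  accumulate(fulltext, wFts);
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```

Note how a memory that appears in both lists outscores one that only tops a single list — that is the point of fusing the two rankings rather than picking one.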
Key Capabilities
- Tiered retention (T0–T3) — Memories live as long as they matter. T0 (foundation) never expires. T1 (institutional) lasts one year. T2 (operational) lasts 90 days. T3 (ephemeral) lasts 30 days. Each tier has configurable capacity budgets (T1: 100K, T2: 200K, T3: 200K entries).
- Sliding window TTL — Every access resets the expiry clock. A memory that keeps getting used stays alive automatically.
- Auto-promotion — Memories accessed 5+ times automatically promote to the next higher retention tier. High-value knowledge self-selects for permanence.
- Pre-expiry warnings — Memories are flagged 7 days before expiration, giving agents and operators time to act before knowledge is lost.
- Archive-before-delete — Expired memories are written to an archive table before removal. Nothing is permanently purged without a recoverable record.
- Hybrid RRF search — Reciprocal Rank Fusion combines semantic (70%) and fulltext (30%) results into a single ranked list. Weights are configurable per query.
- Semantic deduplication — Cosine similarity at 0.92 threshold catches near-duplicates. SHA-256 checksums catch exact ones. Content normalization runs before hashing so paraphrased inputs deduplicate correctly.
- Importance scoring — Each memory carries a 0–1 importance score that directly influences search result ranking. High-importance memories surface first.
- Categories / persistent policy slots — Named policy categories (e.g., `architecture-decisions`, `coding-standards`, `security-policies`) provide persistent slots for institutional knowledge that should always be available, independent of TTL.
- Shared memory across agents — Claude Code learns your API conventions; Codex picks them up automatically in the next session. One memory, every agent, zero drift.
- Memory classification — Memories are automatically typed as episodic (events), semantic (facts), or procedural (workflows).
- Namespace isolation — Separate projects, teams, or clients without cross-contamination. Global namespaces for cross-cutting standards.
- Collections — Group related memories within namespaces (e.g., `api-design`, `infrastructure`, `security`).
- Context injection — A special MCP resource delivers a budgeted context block at session start, so agents begin with relevant knowledge without manual prompting.
- Full CLI — List, search, manage categories, run garbage collection, create backups — all from the command line.
Enterprise-Ready by Default
BHGBrain isn’t a prototype. It’s built for production use from day one.
| Capability | Detail |
|---|---|
| Authentication | Bearer token required for non-loopback HTTP. Fail-closed — server refuses to start without credentials on external bindings. |
| Audit logging | Every write and delete logged with timestamp, namespace, client ID, and operation type. |
| Secret scanning | Memories checked for credential patterns before storage. Likely secrets are rejected. |
| Rate limiting | 100 requests/minute/client by default. |
| Graceful degradation | If Qdrant goes down, reads fall back to SQLite fulltext. If embeddings are unavailable, the server enters degraded mode instead of crashing. |
| Backup and restore | Full SQLite + Qdrant snapshots with integrity verification. |
| Capacity budgets | Per-tier entry limits (T1: 100K, T2: 200K, T3: 200K) prevent unbounded growth and keep storage predictable. |
| Pre-expiry warnings | Memories flagged 7 days before TTL expiration for review or re-promotion. |
| Archive-before-delete | Expired entries written to archive table before removal — no silent data loss. |
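The graceful-degradation read path in the table amounts to a try/fallback around the vector store. A sketch, with hypothetical function names — the real server's internals will differ:

```typescript
// Try vector search first; if Qdrant is unreachable, serve SQLite
// fulltext results instead of failing the request.
async function search(
  query: string,
  vectorSearch: (q: string) => Promise<string[]>,
  fulltextSearch: (q: string) => Promise<string[]>,
): Promise<{ ids: string[]; degraded: boolean }> {
  try {
    return { ids: await vectorSearch(query), degraded: false };
  } catch {
    // Degraded mode: fulltext-only results, flagged so callers can tell.
    return { ids: await fulltextSearch(query), degraded: true };
  }
}
```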
Who It’s For
- Teams running multi-agent workflows — When Claude Code, Codex, and OpenClaw all need to share the same project knowledge without drift.
- Enterprise IT departments — Organizations that need audit trails, authentication, and self-hosted infrastructure for AI memory.
- Consultants and agencies — Namespace isolation keeps client knowledge separate while sharing internal standards across engagements.
- Solo developers — Anyone whose AI memory needs have outgrown a `MEMORY.md` file.
Get Started in 5 Minutes
1. Start Qdrant
```shell
docker run -d --name qdrant -p 6333:6333 qdrant/qdrant
```
2. Install BHGBrain
```shell
git clone https://github.com/Big-Hat-Group-Inc/BHGBrain.git
cd BHGBrain && npm install && npm run build
```
3. Set your API key and run
```shell
export OPENAI_API_KEY=sk-...
node dist/index.js
```
4. Connect your agent
Add BHGBrain to your MCP client config — Claude Desktop, OpenClaw, or any MCP-compatible tool. Your agents can now remember and recall across sessions.
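For a Claude Desktop-style client, the entry might look like the sketch below. The server name, install path, and env keys are illustrative; check the repository README for the exact configuration BHGBrain expects.

```json
{
  "mcpServers": {
    "bhgbrain": {
      "command": "node",
      "args": ["/path/to/BHGBrain/dist/index.js"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```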
For detailed setup, configuration options, and the bootstrap interview prompt, see the full documentation on GitHub.
Multilingual Documentation
BHGBrain ships with full documentation in five languages:
| Language | README |
|---|---|
| English | github.com/Big-Hat-Group-Inc/BHGBrain |
| 中文 (Mandarin) | README.zh-CN.md |
| Deutsch | README.de.md |
| Français | README.fr.md |
| Español | README.es.md |
📦 GitHub: github.com/Big-Hat-Group-Inc/BHGBrain
📖 Deep dive: BHGBrain: Give Your AI Agents a Shared, Persistent Memory