BHGBrain: Persistent Memory for AI Agents

Every AI agent starts each session with amnesia. Yesterday’s architecture decision, last week’s debugging breakthrough, the naming convention your team agreed on three sprints ago — gone. You re-explain it, or the agent gets it wrong.

BHGBrain fixes this. It’s an open-source MCP server that gives your AI agents a persistent, searchable, shared memory that survives across sessions, tools, and teams.

How It Works

BHGBrain sits between your AI agents and a durable storage layer. Any MCP-compatible client — Claude Code, Codex, OpenClaw, Gemini — connects to BHGBrain and gains access to a shared knowledge base.

AI Agents (Claude / Codex / OpenClaw / Gemini)
  → MCP transport (stdio or HTTP)
    → BHGBrain server
      → Qdrant (semantic vector search)
      → SQLite (metadata, fulltext index, audit log, archive)

When an agent stores a memory, BHGBrain runs it through an intelligent pipeline:

  1. Normalization — Input is cleaned and standardized before hashing, improving deduplication accuracy across paraphrased or reformatted content
  2. Deduplication — Compares against existing memories using SHA-256 content hashing and cosine similarity (threshold: 0.92), with tier-adjusted thresholds for precision control
  3. Decision — Determines whether to add new knowledge, update existing entries, or discard duplicates
  4. Retention assignment — Assigns the memory to the appropriate retention tier (T0–T3) based on type, importance, and caller-specified preference
  5. Storage — Embeds, indexes, and persists accepted memories with importance scores that influence future search ranking
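The dedup-and-decide core of the steps above can be sketched as follows. This is a minimal illustration of the described behavior, not BHGBrain's actual internals; the helper names and the single-candidate comparison are assumptions.

```typescript
// Sketch of the store pipeline: normalize, hash, compare, decide.
// Hypothetical helper names — BHGBrain's real implementation may differ.
import { createHash } from "node:crypto";

type Decision = "add" | "update" | "discard";

// Step 1: normalization — lowercase and collapse whitespace so
// reformatted copies of the same content hash identically.
function normalize(text: string): string {
  return text.toLowerCase().replace(/\s+/g, " ").trim();
}

// Step 2a: exact dedup via SHA-256 over the normalized content.
function contentHash(text: string): string {
  return createHash("sha256").update(normalize(text)).digest("hex");
}

// Step 2b: near-dup check via cosine similarity of embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 3: decision against one existing candidate memory.
function decide(
  newHash: string,
  newVec: number[],
  existing: { hash: string; vec: number[] },
  threshold = 0.92, // the default similarity threshold described above
): Decision {
  if (newHash === existing.hash) return "discard";                // exact duplicate
  if (cosine(newVec, existing.vec) >= threshold) return "update"; // near-duplicate
  return "add";                                                   // genuinely new
}
```

In practice the comparison runs against the nearest neighbors returned by the vector index rather than a single candidate, but the decision logic is the same shape.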

When an agent needs context, BHGBrain delivers it through hybrid RRF search — combining semantic similarity (70%) and fulltext matching (30%) via Reciprocal Rank Fusion, with configurable weights. Agents get the relevant memories, not the entire knowledge base.
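Reciprocal Rank Fusion itself is simple: each result list contributes `weight / (k + rank)` per document, and the fused list is sorted by the summed scores. A minimal sketch, assuming the conventional constant k = 60 (BHGBrain's actual k value is not stated here):

```typescript
// Minimal RRF sketch: fuse two ranked lists of document IDs.
// k = 60 is the common default from the RRF literature — an assumption here.
function rrfFuse(
  semantic: string[],   // IDs ranked by vector similarity
  fulltext: string[],   // IDs ranked by fulltext match
  wSemantic = 0.7,
  wFulltext = 0.3,
  k = 60,
): string[] {
  const scores = new Map<string, number>();
  const accumulate = (ranked: string[], weight: number) => {
    ranked.forEach((id, i) => {
      // rank is 1-based, so position i contributes weight / (k + i + 1)
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + i + 1));
    });
  };
  accumulate(semantic, wSemantic);
  accumulate(fulltext, wFulltext);
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document that appears in both lists accumulates score from each, which is why results matching both semantically and lexically tend to rank first.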

Key Capabilities

  • Tiered retention (T0–T3) — Memories live as long as they matter. T0 (foundation) never expires. T1 (institutional) lasts one year. T2 (operational) lasts 90 days. T3 (ephemeral) lasts 30 days. Each tier has configurable capacity budgets (T1: 100K, T2: 200K, T3: 200K entries).
  • Sliding window TTL — Every access resets the expiry clock. A memory that keeps getting used stays alive automatically.
  • Auto-promotion — Memories accessed 5+ times automatically promote to the next higher retention tier. High-value knowledge self-selects for permanence.
  • Pre-expiry warnings — Memories are flagged 7 days before expiration, giving agents and operators time to act before knowledge is lost.
  • Archive-before-delete — Expired memories are written to an archive table before removal. Nothing is permanently purged without a recoverable record.
  • Hybrid RRF search — Reciprocal Rank Fusion combines semantic (70%) and fulltext (30%) results into a single ranked list. Weights are configurable per query.
  • Semantic deduplication — Cosine similarity at 0.92 threshold catches near-duplicates. SHA-256 checksums catch exact ones. Content normalization runs before hashing so paraphrased inputs deduplicate correctly.
  • Importance scoring — Each memory carries a 0–1 importance score that directly influences search result ranking. High-importance memories surface first.
  • Categories / persistent policy slots — Named policy categories (e.g., architecture-decisions, coding-standards, security-policies) provide persistent slots for institutional knowledge that should always be available, independent of TTL.
  • Shared memory across agents — Claude Code learns your API conventions; Codex picks them up automatically in the next session. One memory, every agent, zero drift.
  • Memory classification — Memories are automatically typed as episodic (events), semantic (facts), or procedural (workflows).
  • Namespace isolation — Separate projects, teams, or clients without cross-contamination. Global namespaces for cross-cutting standards.
  • Collections — Group related memories within namespaces (e.g., api-design, infrastructure, security).
  • Context injection — A special MCP resource delivers a budgeted context block at session start, so agents begin with relevant knowledge without manual prompting.
  • Full CLI — List, search, manage categories, run garbage collection, create backups — all from the command line.
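The interaction between sliding-window TTL and auto-promotion is worth making concrete. A sketch under stated assumptions: tier TTLs and the 5-access rule come from the list above, while the field names, the promote-on-exactly-the-fifth-access trigger, and promotion from T1 into permanent T0 are illustrative guesses.

```typescript
// Sketch of sliding-window TTL plus auto-promotion on access.
// Field names and the exact promotion trigger are assumptions.
type Tier = "T0" | "T1" | "T2" | "T3";

const TTL_DAYS: Record<Tier, number | null> = {
  T0: null, // foundation: never expires
  T1: 365,  // institutional: one year
  T2: 90,   // operational: 90 days
  T3: 30,   // ephemeral: 30 days
};

const NEXT_TIER: Record<Tier, Tier> = { T3: "T2", T2: "T1", T1: "T0", T0: "T0" };
const DAY_MS = 24 * 60 * 60 * 1000;

interface Memory {
  tier: Tier;
  accessCount: number;
  expiresAt: number | null; // epoch ms, or null for permanent
}

// Called on every read of a memory.
function touch(m: Memory, now: number): Memory {
  const accessCount = m.accessCount + 1;
  // Auto-promotion: the 5th access moves the memory one tier up.
  const tier = accessCount === 5 ? NEXT_TIER[m.tier] : m.tier;
  const ttl = TTL_DAYS[tier];
  // Sliding window: every access resets the expiry clock.
  return { tier, accessCount, expiresAt: ttl === null ? null : now + ttl * DAY_MS };
}
```

The combined effect is the self-selection the list describes: a memory that keeps getting read never expires in practice, and once it crosses the access threshold it also survives longer between reads.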

Enterprise-Ready by Default

BHGBrain isn’t a prototype. It’s built for production use from day one.

  • Authentication — Bearer token required for non-loopback HTTP. Fail-closed: the server refuses to start without credentials on external bindings.
  • Audit logging — Every write and delete is logged with timestamp, namespace, client ID, and operation type.
  • Secret scanning — Memories are checked for credential patterns before storage; likely secrets are rejected.
  • Rate limiting — 100 requests/minute/client by default.
  • Graceful degradation — If Qdrant goes down, reads fall back to SQLite fulltext. If embeddings are unavailable, the server enters degraded mode instead of crashing.
  • Backup and restore — Full SQLite + Qdrant snapshots with integrity verification.
  • Capacity budgets — Per-tier entry limits (T1: 100K, T2: 200K, T3: 200K) prevent unbounded growth and keep storage predictable.
  • Pre-expiry warnings — Memories are flagged 7 days before TTL expiration for review or re-promotion.
  • Archive-before-delete — Expired entries are written to an archive table before removal. No silent data loss.
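The graceful-degradation row can be sketched as a simple read-path fallback. This is an illustration of the described behavior only; the function names are assumptions, and the real calls would be async against Qdrant and SQLite.

```typescript
// Sketch of the read-path fallback: try vector search, fall back to
// fulltext when the vector store is unreachable. Names are hypothetical.
interface SearchResult { id: string; text: string; }

function searchWithFallback(
  query: string,
  vectorSearch: (q: string) => SearchResult[],   // Qdrant-backed
  fulltextSearch: (q: string) => SearchResult[], // SQLite FTS-backed
): { results: SearchResult[]; degraded: boolean } {
  try {
    return { results: vectorSearch(query), degraded: false };
  } catch {
    // Qdrant unreachable: serve fulltext-only results instead of failing.
    return { results: fulltextSearch(query), degraded: true };
  }
}
```

Flagging the response as degraded matters as much as the fallback itself: callers can surface reduced recall to the operator rather than silently returning weaker results.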

Who It’s For

  • Teams running multi-agent workflows — When Claude Code, Codex, and OpenClaw all need to share the same project knowledge without drift.
  • Enterprise IT departments — Organizations that need audit trails, authentication, and self-hosted infrastructure for AI memory.
  • Consultants and agencies — Namespace isolation keeps client knowledge separate while sharing internal standards across engagements.
  • Solo developers — Anyone whose AI memory needs have outgrown a MEMORY.md file.

Get Started in 5 Minutes

1. Start Qdrant

docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

2. Install BHGBrain

git clone https://github.com/Big-Hat-Group-Inc/BHGBrain.git
cd BHGBrain && npm install && npm run build

3. Set your API key and run

export OPENAI_API_KEY=sk-...
node dist/index.js

4. Connect your agent

Add BHGBrain to your MCP client config — Claude Desktop, OpenClaw, or any MCP-compatible tool. Your agents can now remember and recall across sessions.
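For Claude Desktop, a stdio entry in claude_desktop_config.json looks roughly like this; the server name, path, and env keys are examples — check the BHGBrain documentation for the exact configuration it expects:

```json
{
  "mcpServers": {
    "bhgbrain": {
      "command": "node",
      "args": ["/path/to/BHGBrain/dist/index.js"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
```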

For detailed setup, configuration options, and the bootstrap interview prompt, see the full documentation on GitHub.

Multilingual Documentation

BHGBrain ships with full documentation in five languages:

  • English — github.com/Big-Hat-Group-Inc/BHGBrain
  • 中文 (Mandarin) — README.zh-CN.md
  • Deutsch (German) — README.de.md
  • Français (French) — README.fr.md
  • Español (Spanish) — README.es.md

📦 GitHub: github.com/Big-Hat-Group-Inc/BHGBrain

📖 Deep dive: BHGBrain: Give Your AI Agents a Shared, Persistent Memory

Kevin Kaminski is a 17x Microsoft MVP with 25 years of enterprise IT experience specializing in Windows 365, Intune, Azure infrastructure, and AI agent deployment. He leads Big Hat Group, delivering consulting, training, and managed services for organizations modernizing their endpoint and cloud operations.

Learn More About Big Hat Group →

Ready to Get Started?

Book a discovery call to discuss your AI agent infrastructure needs.

Book a Discovery Call