Every AI coding agent you use — Claude Code, Codex, Copilot, Gemini — starts each session with amnesia. Yesterday’s debugging breakthrough, that architecture decision from last week, the coding standard your team agreed on three sprints ago — all gone. You either re-explain everything or hope the agent infers it from whatever files happen to be open.

This is the single biggest friction point in agentic workflows today. Not model quality. Not tool integration. Memory.

BHGBrain is our answer: an open-source MCP server that gives your AI agents a persistent, searchable, shared second brain.

The Problem: Every Agent Is a Goldfish

If you’ve used AI coding agents for any real project, you know the pattern:

  1. Morning session: You explain your architecture, constraints, and conventions. The agent does great work.
  2. Afternoon session: New context window. The agent suggests the exact pattern you told it to avoid four hours ago.
  3. Next day: A different agent (maybe Codex for a background task) has zero awareness of what Claude Code learned yesterday.

The workarounds are fragile:

  • MEMORY.md files work until they hit 500 lines and start consuming 15% of your context window on every message.
  • System prompts are static and can’t capture evolving knowledge.
  • Session transcripts are agent-specific and unsearchable.
  • Manually re-explaining is what we were trying to avoid by using agents in the first place.

The fundamental issue: each agent has its own ephemeral context, and none of them talk to each other.

What BHGBrain Does

BHGBrain is an MCP server — meaning any MCP-compatible AI client can connect to it as a tool. It exposes a simple set of operations:

  • remember: Store a memory with automatic type classification, deduplication, and tagging
  • recall: Semantic search — find relevant memories by meaning, not keywords
  • search: Hybrid search combining vector similarity and fulltext matching
  • forget: Delete a memory (with audit trail)
  • tag: Add or remove tags from memories
  • category: Manage persistent policy categories (architecture, coding standards, etc.)
  • collections: Organize memories into named collections
  • backup: Create and restore full backups
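Because BHGBrain speaks MCP, each of these tools is invoked as a standard JSON-RPC 2.0 `tools/call` request. A sketch of what a `remember` call might look like on the wire — the argument names (`content`, `namespace`, `tags`) are illustrative assumptions, not BHGBrain's exact schema:

```typescript
// Sketch of an MCP tools/call request for the `remember` tool.
// MCP uses JSON-RPC 2.0; the argument shape below is an assumption,
// not BHGBrain's documented schema.
const rememberRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "remember",
    arguments: {
      content:
        "All API routes use kebab-case and return RFC 7807 problem details.",
      namespace: "project-alpha",          // assumed parameter
      tags: ["api-design", "conventions"], // assumed parameter
    },
  },
};

console.log(JSON.stringify(rememberRequest, null, 2));
```

Any MCP client library builds this envelope for you; the point is that there is no proprietary protocol to integrate against.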

Under the hood, every memory gets:

  • Vector embeddings stored in Qdrant for semantic search
  • Metadata and fulltext index in SQLite for fast filtering and keyword search
  • Automatic deduplication — if you tell it the same thing twice, it merges instead of duplicating
  • Type classification — memories are categorized as episodic (events), semantic (facts), or procedural (workflows)

The Write Pipeline

When an agent calls remember, BHGBrain doesn’t just dump text into a database. It runs a multi-phase pipeline:

  1. Extraction — An LLM breaks raw input into atomic memory candidates with inferred types, tags, and importance scores.
  2. Decision — Each candidate is compared against existing memories in the same namespace. The system decides: ADD (new knowledge), UPDATE (refines existing), DELETE (invalidates old info), or NOOP (already known).
  3. Storage — Accepted memories get embedded, indexed, and persisted.

If the extraction model is unavailable, BHGBrain falls back to deterministic deduplication using content hashes and cosine similarity thresholds. It never silently drops a memory.
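The deterministic fallback can be sketched roughly like this — a minimal sketch, assuming a SHA-256 content hash for exact repeats and a 0.9 cosine threshold for near-duplicates (both the threshold and the function names are my assumptions, not BHGBrain's actual source):

```typescript
// Sketch of a deterministic dedup fallback: content hashes catch verbatim
// repeats, a cosine-similarity threshold catches near-duplicates.
// The 0.9 threshold and all names here are illustrative assumptions.
import { createHash } from "node:crypto";

type Decision = "ADD" | "UPDATE" | "NOOP";

interface StoredMemory {
  contentHash: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function decide(
  content: string,
  embedding: number[],
  existing: StoredMemory[],
): Decision {
  const hash = createHash("sha256").update(content).digest("hex");
  if (existing.some((m) => m.contentHash === hash)) return "NOOP"; // exact repeat
  if (existing.some((m) => cosine(embedding, m.embedding) > 0.9)) {
    return "UPDATE"; // near-duplicate: merge into the existing memory
  }
  return "ADD"; // genuinely new knowledge
}
```

The key property is that every path terminates in an explicit decision — nothing is dropped on the floor when the LLM is unreachable.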

The Read Path

When an agent needs context, it calls recall or search:

  • Semantic search finds memories by meaning (“how does authentication work in this project?”)
  • Fulltext search finds exact terms (“OIDC federated credentials”)
  • Hybrid search combines both using Reciprocal Rank Fusion (70% semantic, 30% fulltext by default)
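Reciprocal Rank Fusion scores each memory by its rank in both result lists: score = Σ weight / (k + rank). A minimal sketch of the fusion step, using the stated 70/30 default split and the conventional k = 60 constant (BHGBrain's actual constant and internals may differ):

```typescript
// Weighted Reciprocal Rank Fusion sketch: merge two ranked ID lists.
// k = 60 is the conventional RRF constant from the original paper;
// the 0.7/0.3 split mirrors the stated default. Internals are assumed.
function rrfFuse(semantic: string[], fulltext: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  const accumulate = (ids: string[], weight: number) => {
    ids.forEach((id, rank) => {
      // rank is 0-based, so the top hit contributes weight / (k + 1)
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank + 1));
    });
  };
  accumulate(semantic, 0.7);
  accumulate(fulltext, 0.3);
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A memory that appears in both lists — even at modest ranks — outscores one that tops only a single list, which is exactly the behavior you want from hybrid retrieval.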

There’s also memory://inject — a special MCP resource that delivers a budgeted context block at session start, so agents begin with relevant knowledge without manual prompting.

Why a Shared Brain Changes Everything

The real power isn’t that one agent can remember things. It’s that all your agents share the same memory.

Scenario: Multi-Agent Development

You’re running three agents on a project:

  • Claude Code for interactive development in your IDE
  • Codex for background tasks (test generation, refactoring)
  • OpenClaw for operations and deployment automation

Without shared memory, each agent operates in isolation. Claude Code learns your naming conventions; Codex generates tests that violate them. OpenClaw deploys a config that contradicts an architecture decision Claude Code captured yesterday.

With BHGBrain, when Claude Code stores “All API routes use kebab-case and return RFC 7807 problem details”, Codex picks it up in its next recall and generates compliant tests. OpenClaw’s deployment scripts align with the same conventions. One memory, three agents, zero drift.

Scenario: Onboarding and Knowledge Transfer

New team member starts. Instead of reading 40 pages of wiki docs (half outdated), they connect their AI agent to the team’s BHGBrain instance. The agent immediately has access to:

  • Architecture decisions and their rationale
  • Coding standards with concrete examples
  • Known pitfalls and workarounds
  • Project-specific terminology

The bootstrap prompt included with BHGBrain walks through a structured 10-section interview covering identity, responsibilities, goals, tools, entity maps, and operating rules — building a comprehensive work profile in about 30 minutes.

Scenario: Cross-Repository Continuity

Working across multiple repos (common in microservices, monorepo splits, or multi-org consulting)? BHGBrain’s namespace and collection system keeps knowledge organized:

  • Namespace project-alpha holds memories specific to that project
  • Namespace global holds cross-cutting standards
  • Collections within namespaces group related memories (e.g., api-design, infrastructure, security)

Agents query the right scope automatically. No cross-contamination, no lost context.
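In practice, scoping is just an argument on the query. A sketch of two `recall` calls — one project-scoped, one against the global namespace — where the argument names are my assumptions about the tool schema, not its documented shape:

```typescript
// Illustrative recall invocations scoped by namespace/collection.
// Argument names are assumptions, not BHGBrain's documented schema.
const projectQuery = {
  name: "recall",
  arguments: {
    query: "retry policy for payment webhooks",
    namespace: "project-alpha", // project-specific scope
    collection: "api-design",   // optional narrower grouping
  },
};

const globalQuery = {
  name: "recall",
  arguments: {
    query: "logging standards",
    namespace: "global",        // cross-cutting team standards
  },
};

console.log(projectQuery.arguments.namespace, globalQuery.arguments.namespace);
```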

Architecture: Simple and Self-Hosted

BHGBrain runs on your machine or your infrastructure. There’s no cloud dependency beyond the embedding API (and even that is optional with local models).

MCP Clients (Claude / Codex / OpenClaw / etc.)
  → MCP transport (HTTP or stdio)
  → BHGBrain server
      → Write pipeline (extraction + dedup + decision)
      → Qdrant (vector search)
      → SQLite (metadata, fulltext, categories, audit log)

Requirements:

  • Node.js 20+
  • Qdrant (Docker one-liner: docker run -d --name qdrant -p 6333:6333 qdrant/qdrant)
  • OpenAI API key (for embeddings) — or use Ollama with nomic-embed-text for fully local operation

Install and run:

git clone https://github.com/Big-Hat-Group-Inc/BHGBrain.git
cd BHGBrain
npm install && npm run build
export OPENAI_API_KEY=sk-...
export BHGBRAIN_TOKEN=$(node -e "console.log(require('crypto').randomBytes(32).toString('hex'))")
node dist/index.js

That’s it. Your agents can now connect via stdio (local) or HTTP (networked).

Enterprise-Ready by Default

BHGBrain isn’t a toy. It’s built with production concerns from day one:

  • Authentication: Bearer token required for non-loopback HTTP connections. Fail-closed — if the token env var isn’t set and you bind to a non-loopback address, the server refuses to start.
  • Rate limiting: 100 requests/minute/client by default.
  • Audit logging: Every write and delete is logged with timestamp, namespace, client ID, and operation type.
  • Secret scanning: Memories are checked for credential patterns before storage. Likely secrets are rejected.
  • Backup and restore: Full SQLite + Qdrant snapshots with integrity verification.
  • Graceful degradation: If Qdrant goes down, reads fall back to SQLite fulltext. If the embedding API is unavailable, the server enters degraded mode instead of crashing.
  • Structured logging: JSON logs with automatic token and content redaction.
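The fail-closed authentication rule above can be sketched as a startup check — a minimal sketch with illustrative names, not BHGBrain's actual source:

```typescript
// Sketch of a fail-closed bind check: binding to a non-loopback address
// without an auth token refuses to start. Names are illustrative.
function isLoopback(host: string): boolean {
  return host === "127.0.0.1" || host === "::1" || host === "localhost";
}

function checkAuthConfig(bindHost: string, token: string | undefined): void {
  if (!isLoopback(bindHost) && !token) {
    // Fail closed: network-exposed bind + no token = do not start.
    throw new Error(
      `Refusing to bind to ${bindHost} without BHGBRAIN_TOKEN set`,
    );
  }
}

checkAuthConfig("127.0.0.1", undefined); // fine: loopback needs no token
checkAuthConfig("0.0.0.0", "s3cret");    // fine: exposed but authenticated
```

Failing closed inverts the usual default: misconfiguration produces a loud startup error rather than a silently open server.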

Getting Started in 5 Minutes

1. Start Qdrant

docker run -d --name qdrant -p 6333:6333 qdrant/qdrant

2. Install BHGBrain

git clone https://github.com/Big-Hat-Group-Inc/BHGBrain.git
cd BHGBrain && npm install && npm run build

3. Configure Your Agent

For Claude Desktop, add to claude_desktop_config.json:

{
  "mcpServers": {
    "bhgbrain": {
      "command": "node",
      "args": ["/path/to/BHGBrain/dist/index.js"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}

For HTTP clients (OpenClaw, mcporter, remote agents):

{
  "mcpServers": {
    "bhgbrain": {
      "transport": "http",
      "url": "http://127.0.0.1:3721",
      "headers": { "Authorization": "Bearer YOUR_TOKEN" }
    }
  }
}

4. Start Remembering

Ask your agent to remember something:

“Remember: our API uses kebab-case routes, returns RFC 7807 problem details, and all endpoints require Bearer token authentication.”

BHGBrain stores it, classifies it as semantic, tags it appropriately, and makes it available to every connected agent via recall.

5. Bootstrap Your Brain

Run the included bootstrap interview to build a comprehensive work profile:

# Paste the contents of BootstrapPrompt.txt into a fresh AI conversation
# The agent will interview you across 10 sections and produce a structured profile

CLI for Power Users

BHGBrain includes a full CLI for direct management:

bhgbrain search "authentication patterns" --mode hybrid
bhgbrain list --limit 20
bhgbrain category set "Coding Standards" --file ./standards.md
bhgbrain stats
bhgbrain gc --consolidate    # Merge similar memories, flag stale ones
bhgbrain backup create

When to Use BHGBrain vs. MEMORY.md

  • Scale: MEMORY.md hits context bloat around ~100 memories; BHGBrain handles 500,000+
  • Search: MEMORY.md loads the full file every session; BHGBrain retrieves a relevant subset semantically
  • Multi-agent: MEMORY.md is per-agent, per-workspace; BHGBrain is shared across all MCP clients
  • Deduplication: manual vs. automatic (hash + cosine similarity)
  • Types: flat text vs. episodic, semantic, and procedural classification
  • Audit: none vs. a full audit log
  • Backup: Git vs. dedicated backup/restore with integrity checks

MEMORY.md is fine for a single agent on a single project with a handful of notes. BHGBrain is for teams, multi-agent workflows, and anyone whose AI memory needs have outgrown a text file.

What’s Next

BHGBrain v1 focuses on getting the core right: reliable storage, smart deduplication, hybrid search, and multi-client access. The roadmap includes:

  • Multi-user RBAC — team-level access control
  • Encryption at rest — for regulated environments
  • Cloud sync — optional synchronization across machines
  • Working memory TTL — short-lived scratch memories that auto-expire

Try It

BHGBrain is open source under the MIT license.

GitHub: github.com/Big-Hat-Group-Inc/BHGBrain

If your AI agents keep forgetting what you told them yesterday, give them a brain that lasts.


Kevin Kaminski is the founder of Big Hat Group, where we build tools and consulting practices around enterprise AI, Windows 365, and cloud infrastructure. BHGBrain grew out of our own frustration with agent amnesia across the multi-agent workflows we run daily.