Three days after shipping tiered retention and hybrid search, we hit the exact failure mode that justified building a persistent memory server in the first place: the SQLite database on one machine was empty, while Qdrant Cloud still had every vector. All the content — the actual text of every memory — lived only in SQLite. The vectors were intact but useless without the text they encoded.
That incident drove BHGBrain 1.3. This release makes the memory server resilient to storage failures, adds multi-device support for teams running BHGBrain on more than one machine, and ships a disaster recovery tool that can rebuild your entire local database from the shared vector store.
The Problem: Dual-Store, Single Point of Failure
BHGBrain’s architecture uses two stores: SQLite for content, metadata, and full-text search; Qdrant for vector embeddings and semantic similarity. Every write goes to both. Every search joins across both.
The design assumed SQLite was durable. It is — sql.js flushes atomically to disk via write-to-temp-then-rename. But if the database file is lost, recreated, or starts fresh on a different machine, all content is gone. Qdrant has the embeddings and some metadata (tags, type, importance), but not the actual memory text. Search results that depend on the SQLite join silently return nothing.
This is exactly what happened. Two BHGBrain instances — one on a primary workstation, one on a Windows 365 Cloud PC — pointed at the same Qdrant Cloud cluster. The Cloud PC’s SQLite was empty. Every recall returned zero results, even though Qdrant held dozens of vectors with high-confidence matches.
Fix 1: Content in Qdrant Payloads
The first change is straightforward: store the full memory content in the Qdrant payload alongside the vector.
Before 1.3, the Qdrant upsert payload contained only metadata — namespace, type, tags, importance, retention_tier, decay_eligible, expires_at. The content text was SQLite-only.
Now, writeMemory and updateMemory include content, summary, category, source, created_at, and device_id in every Qdrant upsert. Qdrant becomes the redundant backup for content. SQLite is still the primary store for queries, full-text search, and lifecycle management — but losing it no longer means losing data.
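As a rough illustration of the payload change, the sketch below builds a 1.3-style payload from a memory record. The field names come from the lists above; the types, the MemoryRecord shape, and the buildPayload helper are assumptions for illustration, not BHGBrain's actual code.

```typescript
// Hypothetical record shape: field names follow the post, types are assumed.
interface MemoryRecord {
  id: string;
  namespace: string;
  type: string;
  tags: string[];
  importance: number;
  retention_tier: string;
  decay_eligible: boolean;
  expires_at: string | null;
  content: string;
  summary: string;
  category: string;
  source: string;
  created_at: string;
}

// Metadata-only payload, as written before 1.3.
type LegacyPayload = Pick<
  MemoryRecord,
  "namespace" | "type" | "tags" | "importance" | "retention_tier" | "decay_eligible" | "expires_at"
>;

// 1.3 payload: the legacy metadata plus the fields needed to rebuild the record.
type Payload = LegacyPayload &
  Pick<MemoryRecord, "content" | "summary" | "category" | "source" | "created_at"> & {
    device_id: string;
  };

// Hypothetical helper: copies the record into the payload and stamps the device.
function buildPayload(memory: MemoryRecord, deviceId: string): Payload {
  return {
    namespace: memory.namespace,
    type: memory.type,
    tags: memory.tags,
    importance: memory.importance,
    retention_tier: memory.retention_tier,
    decay_eligible: memory.decay_eligible,
    expires_at: memory.expires_at,
    content: memory.content,
    summary: memory.summary,
    category: memory.category,
    source: memory.source,
    created_at: memory.created_at,
    device_id: deviceId,
  };
}
```

The point ID stays outside the payload in this sketch, since Qdrant already keys each point by ID.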
Fix 2: Automatic Search Fallback
The search service’s buildSearchResults method used to look up every Qdrant result in SQLite. If a memory ID wasn’t found locally, it was dropped. This was the silent failure that made the bug so confusing — Qdrant returned matches, but the caller saw empty results.
Now, when a Qdrant result has no matching SQLite row, the search service constructs the result directly from the Qdrant payload:
Qdrant returns match → SQLite lookup → found? → return full record
                                     → not found? → payload has content?
                                                    → yes → return from payload
                                                    → no → skip (pre-1.3 memory)
This means a fresh BHGBrain instance pointing at an existing Qdrant cluster will immediately return search results — no SQLite data required. The results include everything stored in the payload: content, summary, type, tags, retention tier, device ID, and creation timestamp.
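The decision flow above can be sketched as a small function. This is a minimal illustration under stated assumptions: QdrantHit, SqliteLookup, resolveHit, and buildResults are hypothetical names, and the real buildSearchResults method returns more fields than the two shown here.

```typescript
// Hypothetical shapes for illustration only.
interface QdrantHit {
  id: string;
  score: number;
  payload: { content?: string; [key: string]: unknown };
}

interface SearchResult {
  id: string;
  content: string;
  origin: "sqlite" | "qdrant_payload";
}

// Stand-in for the local SQLite lookup by memory ID.
type SqliteLookup = (id: string) => { id: string; content: string } | undefined;

function resolveHit(hit: QdrantHit, lookup: SqliteLookup): SearchResult | null {
  const row = lookup(hit.id);
  if (row) {
    // Found locally: return the full SQLite record.
    return { id: row.id, content: row.content, origin: "sqlite" };
  }
  const content = hit.payload.content;
  if (typeof content === "string" && content.length > 0) {
    // Not found locally, but the payload carries content: build from payload.
    return { id: hit.id, content, origin: "qdrant_payload" };
  }
  // Pre-1.3 point with no content in the payload: nothing to return.
  return null;
}

function buildResults(hits: QdrantHit[], lookup: SqliteLookup): SearchResult[] {
  return hits
    .map((hit) => resolveHit(hit, lookup))
    .filter((result): result is SearchResult => result !== null);
}
```

Tagging each result with its origin is a choice made for this sketch; it makes it easy to see in tests which path produced a result.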
Fix 3: The Repair Tool
For full recovery — rebuilding SQLite so full-text search, lifecycle management, and access tracking work properly — there’s now a repair MCP tool:
{
  "tool": "bhgbrain.repair",
  "params": {
    "dry_run": true
  }
}
The repair tool:
- Discovers all bhgbrain_* collections in Qdrant via the collections API
- Scrolls every point in every collection (batched pagination)
- For each point: checks if the ID exists in local SQLite
- If missing and the Qdrant payload contains content: inserts a full MemoryRecord into SQLite
- If missing and no content in payload: skips (pre-1.3 memory, content unrecoverable)
- Reports statistics: collections scanned, points scanned, recovered, skipped, errors
Use dry_run: true to preview what would be recovered without writing anything. Use device_id to filter recovery to memories from a specific device.
This is the disaster recovery path. It’s also the onboarding path for new devices — point a fresh BHGBrain instance at your existing Qdrant cluster, run repair, and the local SQLite is populated with everything from the shared store.
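The repair steps can be sketched as a loop over in-memory stand-ins. Everything here is an assumption made for illustration: the Collection and LocalStore interfaces replace the real Qdrant scroll pagination and the SQLite insert, and the stats field names are guesses based on the list of reported statistics.

```typescript
// Hypothetical shapes; the real BHGBrain types and Qdrant client calls may differ.
interface Point {
  id: string;
  payload: { content?: string; device_id?: string };
}

interface Collection {
  name: string;
  points: Point[]; // stands in for paging through the Qdrant scroll API
}

interface LocalStore {
  has(id: string): boolean;
  insert(id: string, content: string): void; // may throw on a failed write
}

interface RepairStats {
  collections_scanned: number;
  points_scanned: number;
  recovered: number;
  skipped: number;
  errors: number;
}

function repair(
  collections: Collection[],
  local: LocalStore,
  options: { dry_run?: boolean; device_id?: string } = {},
): RepairStats {
  const stats: RepairStats = {
    collections_scanned: 0,
    points_scanned: 0,
    recovered: 0,
    skipped: 0,
    errors: 0,
  };
  for (const collection of collections) {
    if (!collection.name.startsWith("bhgbrain_")) continue;
    stats.collections_scanned += 1;
    for (const point of collection.points) {
      stats.points_scanned += 1;
      // Optional filter: only recover memories written by one device.
      if (options.device_id !== undefined && point.payload.device_id !== options.device_id) continue;
      if (local.has(point.id)) continue; // already present locally
      const content = point.payload.content;
      if (content === undefined) {
        stats.skipped += 1; // pre-1.3 point, content unrecoverable
        continue;
      }
      try {
        // In a dry run, count what would be recovered without writing.
        if (!options.dry_run) local.insert(point.id, content);
        stats.recovered += 1;
      } catch {
        stats.errors += 1;
      }
    }
  }
  return stats;
}
```

Because a dry run counts a point as recovered without inserting it, running the same call again without dry_run should report the same recovered count.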
Multi-Device Memory
The bigger architectural change in 1.3 is first-class support for multiple BHGBrain instances sharing a single Qdrant backend.
The Architecture
Device A (Workstation)         Device B (Cloud PC)
┌──────────────────┐           ┌──────────────────┐
│ SQLite (local)   │           │ SQLite (local)   │
│ device_id: ws-1  │           │ device_id: w365  │
└────────┬─────────┘           └────────┬─────────┘
         │                              │
         └──────────┬───────────────────┘
                    │
         ┌──────────▼──────────┐
         │ Qdrant Cloud        │
         │ (shared backend)    │
         │ content + vectors   │
         │ device_id index     │
         └─────────────────────┘
Each device maintains its own SQLite database. Qdrant is the shared layer. Every write stores content in both stores and tags the memory with the originating device_id.
Device Identity
Each instance resolves a stable device_id on startup:
- Explicit config: device.id in config.json
- Environment variable: BHGBRAIN_DEVICE_ID
- Auto-generated: Derived from os.hostname(), lowercased and sanitized
The resolved ID is persisted to config.json on first run. It appears in every Qdrant payload, every SQLite record, and every search result — so you always know which device created a memory.
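A minimal sketch of that resolution, assuming the three sources are tried in the order listed. The function names and the exact sanitization rule (anything outside a-z, 0-9, and hyphen becomes a hyphen) are assumptions; the post only says the hostname is lowercased and sanitized.

```typescript
// Assumed sanitization rule: lowercase, collapse runs of other characters
// into a single hyphen, then trim leading and trailing hyphens.
function sanitizeDeviceId(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical resolver: config first, then environment, then hostname.
function resolveDeviceId(
  config: { device?: { id?: string } },
  env: Record<string, string | undefined>,
  hostname: string,
): string {
  const explicit = config.device?.id;
  if (explicit) return explicit;

  const fromEnv = env["BHGBRAIN_DEVICE_ID"];
  if (fromEnv) return fromEnv;

  return sanitizeDeviceId(hostname);
}
```

In a Node process this would typically be called with process.env and os.hostname(); both are passed in as parameters here so the function stays easy to test.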
Cross-Device Visibility
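Both devices see all memories written under 1.3. When Device B searches for something Device A stored, the Qdrant search returns the match. If Device B’s SQLite doesn’t have the record, the search fallback constructs the result from the Qdrant payload. The one exception is pre-1.3 memories, whose payloads carry no content: those stay visible only on the device whose SQLite holds them.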
| Source | Device A sees | Device B sees |
|---|---|---|
| Device A’s memories (SQLite) | Full record | Qdrant fallback |
| Device B’s memories (SQLite) | Qdrant fallback | Full record |
For full local functionality (full-text search, lifecycle tracking, access counts), run repair on the device to populate its SQLite from Qdrant.
Configuration
Point both devices at the same Qdrant cluster with different device IDs:
// Device A
{
  "device": { "id": "workstation" },
  "qdrant": {
    "mode": "external",
    "external_url": "https://your-cluster.cloud.qdrant.io",
    "api_key_env": "QDRANT_API_KEY"
  }
}
// Device B
{
  "device": { "id": "cloud-pc" },
  "qdrant": {
    "mode": "external",
    "external_url": "https://your-cluster.cloud.qdrant.io",
    "api_key_env": "QDRANT_API_KEY"
  }
}
That’s it. Both instances share the same memory pool. Both tag their writes with provenance. Both can see everything.
Documentation: 8 Mermaid Diagrams
The README now includes detailed Mermaid diagrams covering every major subsystem:
- Architecture — component diagram showing the full server stack
- Multi-Device Topology — shared Qdrant with local SQLite per device
- Write Pipeline — complete deduplication decision flowchart
- Tier Assignment — priority-ordered classification logic
- Tier Lifecycle — state diagram with promotions, TTLs, and expiry flow
- Hybrid Search — parallel semantic/fulltext paths through RRF fusion
- Backup & Restore — sequence diagram for create and restore flows
- Repair Flow — disaster recovery flowchart
All four translations (German, Spanish, French, Simplified Chinese) are updated to full parity with the English README, including all diagrams and the new multi-device section.
Upgrading to 1.3
No manual migration required. On first start after upgrade:
- SQLite gains a nullable device_id column (existing memories stay null)
- Qdrant collections get a device_id keyword index
- Config gets a device.id field auto-resolved from hostname
- All new writes store content in Qdrant payloads
Pre-1.3 memories without content in Qdrant continue working normally through SQLite. They just can’t be recovered via repair if SQLite is lost — the content wasn’t in Qdrant when they were written.
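To show why the SQLite step needs no manual migration, here is a sketch of an idempotent schema check. It covers only the SQLite column, not the Qdrant index or config steps. The table name memories and the planDeviceIdMigration helper are assumptions, not BHGBrain's actual schema or code.

```typescript
// Hypothetical helper: given the columns a table already has, return the SQL
// needed to add device_id, or nothing if the column is already there.
function planDeviceIdMigration(existingColumns: string[], table: string = "memories"): string[] {
  if (existingColumns.includes("device_id")) return [];
  // No NOT NULL and no DEFAULT, so SQLite leaves existing rows as NULL.
  return [`ALTER TABLE ${table} ADD COLUMN device_id TEXT`];
}
```

Running the plan on every start is safe in this sketch, because a database that already has the column produces an empty list of statements.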
The Takeaway
The single biggest lesson from the SQLite incident: if your system has two stores, and one of them silently degrades, you need the other one to pick up the slack without operator intervention. BHGBrain 1.3 makes Qdrant the safety net for SQLite and vice versa. Search falls back automatically. Recovery is a single tool call. Multi-device sharing works because both stores carry the full picture.
If you’re running BHGBrain on more than one machine, upgrade to 1.3 and run repair on each device. If you’re running it on one machine, upgrade anyway: with content stored in Qdrant, your next SQLite mishap is a non-event instead of data loss.
BHGBrain is open source, MIT licensed, and available at github.com/Big-Hat-Group-Inc/BHGBrain.
Kevin Kaminski is a principal at Big Hat Group, focused on enterprise AI infrastructure, Microsoft 365, and Windows 365. He builds open-source tools for teams running AI agents at work.