Build With Moenu buildwithmoenu.com
Case Study

duSraBheja — Personal AI Second Brain & MCP Server

Live

A personal open-brain that ingests everything I capture and makes it answerable by AI agents through MCP.

Overview

Project framing

A personal AI second brain that doesn't ask me to organize anything. I drop things into a Discord channel and ask later. The system handles the capture, classification, embedding, and merge into a canonical memory — and exposes the answer surface to Claude Code, Codex, and a public chat through MCP.

Problem

The problem isn't capturing information — it's retrieving it when you need it and knowing what you know. Notes app, browser bookmarks, Apple Notes, Discord screenshots, git history — knowledge lives in eight places and I never find it again. Every tool I tried required manual organization I don't actually do.

Role

Designer, engineer, operator. Solo build.

Why now

MCP changed the answer surface. Once Claude Code and Codex could call into a brain over MCP, the second-brain problem stopped being about UI and started being about pipelines, retrieval quality, and what the public surface should and should not say.

Topline

Constraints

  • Free-tier model budget — every chat, vision, and embedding call routes through NIM; token counts and model names get logged but cost is zero.
  • Single-droplet ops — no horizontal scale, so the public path was engineered to be cache-friendly and fast.

Outcomes

  • Live in production on a single DigitalOcean droplet running five services (brain-redis, brain-bot, brain-worker, brain-mcp, brain-api).
  • Powers the public chat at /brain and the Claude Code / Codex tool surface I use daily.
  • Captures from six+ sources, classifies with Llama 3.1 8B, embeds via nv-embedqa-e5-v5 (1024d), merges via Llama 3.3 70B.

Architecture

From capture to answer

How a single Discord message becomes something the brain can answer with.

Step 1

Capture

A message, image, PDF, or link lands in Discord #inbox. The bot enqueues a job, never blocks.

Step 2

Extract

Worker routes by MIME type and pulls text from PDFs, images (vision OCR), Excel, DOCX, or web links.
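Routing by MIME type can be sketched as a small dispatch table. This is a minimal illustration, not the worker's actual code: the extractor functions, the `pick_extractor` name, and the `is_link` flag are all hypothetical, and the real extractors would call a PDF parser, a vision model, and so on.

```python
from typing import Callable, Dict, Optional

Extractor = Callable[[str], str]

# Hypothetical extractors: each stands in for a real parser or vision OCR call.
def extract_pdf(source: str) -> str:
    return f"pdf:{source}"

def extract_image(source: str) -> str:
    return f"image-ocr:{source}"

def extract_spreadsheet(source: str) -> str:
    return f"spreadsheet:{source}"

def extract_docx(source: str) -> str:
    return f"docx:{source}"

def extract_link(source: str) -> str:
    return f"link:{source}"

# Exact MIME matches; image/* is handled by prefix below.
EXACT_ROUTES: Dict[str, Extractor] = {
    "application/pdf": extract_pdf,
    "application/vnd.ms-excel": extract_spreadsheet,
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": extract_spreadsheet,
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": extract_docx,
}

def pick_extractor(content_type: Optional[str], is_link: bool = False) -> Extractor:
    """Return the extractor for a captured item, or raise if nothing matches."""
    if is_link:
        return extract_link
    if content_type is None:
        raise ValueError("attachment has no MIME type")
    # Drop parameters such as "; charset=..." before matching.
    mime = content_type.split(";")[0].strip().lower()
    if mime in EXACT_ROUTES:
        return EXACT_ROUTES[mime]
    if mime.startswith("image/"):
        return extract_image
    raise ValueError(f"unsupported MIME type: {mime}")
```

Raising on an unknown type, rather than guessing, keeps unsupported attachments visible so they can be staged for review.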

03
Step 3

Classify

Llama 3.1 8B returns strict JSON: category, confidence, entities, tags, suggested action. Below 0.75 → review queue.
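The strict-JSON contract and the 0.75 gate can be sketched as a parse step followed by a routing step. This is a sketch under assumptions: the five field names follow the list above, but their exact spelling (for example `suggested_action`), the `Classification` dataclass, and the route labels are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List

CONFIDENCE_GATE = 0.75
REQUIRED_KEYS = {"category", "confidence", "entities", "tags", "suggested_action"}

@dataclass
class Classification:
    category: str
    confidence: float
    entities: List[str]
    tags: List[str]
    suggested_action: str

def parse_classification(raw: str) -> Classification:
    """Parse the model's reply and reject anything that breaks the contract."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("classifier output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return Classification(
        category=str(data["category"]),
        confidence=confidence,
        entities=list(data["entities"]),
        tags=list(data["tags"]),
        suggested_action=str(data["suggested_action"]),
    )

def route(result: Classification) -> str:
    """Below the gate goes to review; at or above it is auto-accepted."""
    return "auto_accept" if result.confidence >= CONFIDENCE_GATE else "review_queue"
```

Treating a malformed reply as an error, not as a low-confidence result, keeps contract violations distinct from genuinely ambiguous captures.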

Step 4

Embed

Text chunked at 512 tokens with 64 overlap, embedded via nv-embedqa-e5-v5 into 1024d vectors.
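A 512-token window with 64 tokens of overlap means each chunk starts 448 tokens after the previous one. The sketch below shows that sliding window over an already tokenized list; how the text is tokenized is not covered here, and the `chunk_tokens` name is hypothetical.

```python
from typing import List

def chunk_tokens(tokens: List[str], size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # 448 with the defaults
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks
```

For a 1,000-token input this yields three chunks starting at tokens 0, 448, and 896; the last chunk is shorter than 512.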

Step 5

Librarian merge

Llama 3.3 70B decides: merge into an existing Note or create a new canonical one. Source artifacts are kept with provenance.
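One way to picture the merge-or-create step is as a function that applies the model's decision while recording which source artifact contributed. Everything here is an assumption made for illustration: the decision shape (`action`, `note_id`, `merged_body`), the `Note` fields, and the id scheme are hypothetical, and the LLM call itself is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Note:
    note_id: str
    body: str
    source_ids: List[str] = field(default_factory=list)  # provenance: artifacts behind this note

def apply_decision(notes: Dict[str, Note], decision: dict, artifact_id: str, text: str) -> Note:
    """Apply a merge/create decision and keep the source artifact linked to the note."""
    action = decision.get("action")
    if action == "merge":
        note = notes[decision["note_id"]]  # KeyError if the model names an unknown note
        note.body = decision.get("merged_body", note.body + "\n" + text)
    elif action == "create":
        note = Note(note_id=f"note-{len(notes) + 1}", body=text)
        notes[note.note_id] = note
    else:
        raise ValueError(f"unknown action: {action!r}")
    if artifact_id not in note.source_ids:
        note.source_ids.append(artifact_id)
    return note
```

Keeping `source_ids` on the note means a merged note can always be traced back to the captures it came from.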

Step 6

Answer

Same canonical store powers /brain chat, MCP tools for Claude Code & Codex, and the private dashboard.

Low-confidence path

Items below the 0.75 gate get a clarification question and a review-queue row instead of being silently dropped.

Cognition trigger

Every 20 merges, an on-demand synthesis pass runs across recent signals to surface threads I haven't seen.
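The every-20-merges trigger amounts to a counter check. A minimal sketch, assuming an in-process counter; the `SynthesisTrigger` class is hypothetical, and a real deployment would presumably persist the count somewhere that survives restarts.

```python
class SynthesisTrigger:
    """Signals a synthesis pass on every Nth recorded merge."""

    def __init__(self, every: int = 20) -> None:
        if every <= 0:
            raise ValueError("every must be positive")
        self.every = every
        self.merges = 0

    def record_merge(self) -> bool:
        """Count one merge; return True when a synthesis pass is due."""
        self.merges += 1
        return self.merges % self.every == 0
```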

Outcome: One brain, six capture surfaces, two answer surfaces.

Architecture narrative

Discord #inbox is the capture entry point. The bot enqueues into ARQ on Redis, and the worker runs an async pipeline: extract (PDF/image/Excel/DOCX/link), classify (Llama 3.1 8B → strict JSON), embed (nv-embedqa-e5-v5, 1024d), and librarian merge (Llama 3.3 70B) into a canonical Note. Classification is gated at 0.75 confidence; anything below that goes to a review queue and a clarification flow.

Storage is a six-layer canonical memory (Evidence → Observation → Episode → Thread → Entity → Synthesis); story is presentation, not storage. FastMCP exposes search / ask / capture / context / protocol / story tools to Claude Code and Codex. A 5-page private Atlas dashboard sits on the same data: every page is one or two indexed SQL queries, with no on-render LLM calls.

The public surface is a separate, owner-approved snapshot: PublicFactRecord rows that surface only after I approve them in the approval queue.

System shape

Five Docker services on one droplet; one canonical data store; two distinct answer surfaces.

Capture

Discord #inbox

Bot cog enqueues every message + attachment as an ARQ job.

Local collector

Apple Notes, Chrome history, git, project files, life exports — periodic scans.

Pipeline

ARQ worker

Five concurrent jobs max, 5-minute timeout, async throughout.
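ARQ reads worker configuration from a plain settings class, so the two limits above map onto two attributes. A minimal sketch: `max_jobs` and `job_timeout` are ARQ's own setting names, while the `process_capture` job and its payload shape are hypothetical. Redis connection settings are omitted, which assumes ARQ's default local Redis.

```python
import asyncio

async def process_capture(ctx: dict, payload: dict) -> str:
    """Hypothetical job body; ARQ passes a context dict as the first argument."""
    await asyncio.sleep(0)  # placeholder for extract -> classify -> embed -> merge
    return f"processed {payload['message_id']}"

class WorkerSettings:
    # Job functions this worker is allowed to run.
    functions = [process_capture]
    # At most five jobs in flight at once.
    max_jobs = 5
    # Per-job timeout in seconds (5 minutes).
    job_timeout = 300
```

With ARQ installed, a worker for this module would be started with `arq <module>.WorkerSettings`.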

Extract → Classify → Embed → Merge

Strict JSON contracts at every step; failures are staged to a review queue, not a dead-letter queue.

Memory

PostgreSQL + pgvector

Canonical Notes + the six-layer memory (Evidence → Synthesis); vector search is project-aware.

AuditLog

Every LLM call logs model, tokens, cost, duration, trace_id.
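Per-call audit logging fits naturally into a context manager that times the call and writes one row when it finishes. A sketch under assumptions: the five fields follow the list above, but `AuditEntry`, `audited_call`, and the in-memory `AUDIT_LOG` list (standing in for a database table) are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional

@dataclass
class AuditEntry:
    model: str
    trace_id: str
    tokens: int = 0
    cost_usd: float = 0.0
    duration_s: float = 0.0

AUDIT_LOG: List[AuditEntry] = []  # stand-in for the real audit table

@contextmanager
def audited_call(model: str, trace_id: Optional[str] = None) -> Iterator[AuditEntry]:
    """Time the wrapped block and append one audit row, even if the call raises."""
    entry = AuditEntry(model=model, trace_id=trace_id or uuid.uuid4().hex)
    started = time.perf_counter()
    try:
        yield entry  # the caller fills in tokens/cost from the API response
    finally:
        entry.duration_s = time.perf_counter() - started
        AUDIT_LOG.append(entry)
```

A call site would wrap the request in `with audited_call("model-name") as entry:` and set `entry.tokens` from the response's usage data.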

Answer surfaces

FastMCP server

search / ask / capture / context / protocol / story tools for Claude Code & Codex.

Private dashboard

Five Atlas pages — What's New, Inbox, Library, Projects, Public Facts. No on-render LLM calls.

Public surface

Owner-approved PublicFactRecord allowlist + scrubbed chat. The site you're reading.

Confidence gate

0.75 separates auto-accept from owner-review. Below the line, the brain asks a clarification question instead of guessing.

Public ≠ Private

The public site reads from a separate snapshot. Promoting anything to public is a manual approval — never an automatic derivation.
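The approval gate can be sketched as a filter whose default is "not public". `PublicFactRecord` is the name used above; its fields and the `build_public_snapshot` function are hypothetical and only illustrate the allowlist idea.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class PublicFactRecord:
    fact_id: str
    text: str
    approved: bool = False  # nothing is public until explicitly approved

def build_public_snapshot(records: Iterable[PublicFactRecord]) -> List[PublicFactRecord]:
    """Return only the rows the owner has approved; everything else stays private."""
    return [record for record in records if record.approved]
```

Defaulting `approved` to `False` means a new or migrated row can never leak into the snapshot by omission.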

Key Decisions
Pivot off TypeScript on day 10

Rewrite the whole stack in Python with Discord/ARQ.

The TS stack had clean infra (NATS, Temporal, pgvector) but was bleeding memory across ten child processes and Node's async story wasn't native. Python + ARQ gave me native async/await, SQLAlchemy 2.0, real MCP, and simpler ops.

Threw away ten days of work. Bought a stack I could ship to production solo.

Drop Anthropic + OpenAI for NIM

Move every chat, vision, and embedding call to NVIDIA NIM's free tier via one OpenAI-compatible client.

The cost curve was wrong for a personal brain. NIM free-tier has no per-token cost and the Llama 3.3 70B + nv-embedqa-e5-v5 quality is good enough for my retrieval and merge workloads.

Locked into NIM's free-tier rate limits and model lineup. Worth it for $0/month.

Owner-approved public surface, not derived

Build a separate snapshot for the public site that only surfaces facts I've explicitly approved.

A second brain that auto-publishes is a liability. I want the brain to remember everything privately and only say in public what I've signed off on.

More owner work (approval queue) for a fundamentally safer public surface.

Struggles

Vector search silently returned wrong results for a day. asyncpg's parameter binding for pgvector isn't the same as psycopg2's — queries were running but ranking was nonsense.

Three-commit fix, about 65 lines. The deeper cost was confidence in retrieval. Recovery was discipline: every retrieval path got a test before being shipped, and the librarian and retriever got project-aware ranking. That habit is how I now ship retrieval changes without anxiety.
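The sketch below is not the actual fix, whose details are not given here. It illustrates one common way to bind a vector with asyncpg and the kind of ranking check a retrieval test can assert against. Assumptions: a hypothetical `notes` table with an `embedding` column, the query vector passed as pgvector's text form and cast with `::vector`, and a pure-Python cosine ranking used as the reference. The pgvector Python package also offers a `register_vector` helper for asyncpg as an alternative to the text cast.

```python
import math
from typing import Dict, List, Sequence

def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. "[1.0,0.5]"."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# asyncpg uses $1-style placeholders (psycopg2 uses %s); <=> is cosine distance.
# Table and column names are hypothetical.
NEAREST_SQL = """
SELECT id FROM notes
ORDER BY embedding <=> $1::vector
LIMIT $2
"""

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def rank_in_memory(query: Sequence[float], rows: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Reference ranking a retrieval test can compare database results against."""
    return sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))[:k]
```

A retrieval test would load a few fixture vectors, run `NEAREST_SQL` with `to_pgvector_literal(query)` as the first argument, and assert that the returned ids match `rank_in_memory`.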

Public snapshot rebuild silently wiped owner-curated content on every cron tick. A previous refactor stubbed the narrative builder and the rebuild kept calling it; months of approved content evaporated between deploys.

Two layers: a carry-forward in the rebuild so stub-derived fields preserve the previous snapshot's values, and an allowlist-gated project rebuild so the delete-then-write loop never deletes rows it can't replace. Then moved the narrative source out of code and into seed files in the repo.
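The two layers can be sketched as two small pure functions. This is an illustration under assumptions, not the rebuild code itself: snapshots are modeled as plain dicts, empty or missing values are treated as stub output, and both function names are hypothetical.

```python
from typing import Dict, Optional, Set

def carry_forward(previous: Dict[str, Optional[str]],
                  rebuilt: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Keep the previous snapshot's value wherever the rebuild produced nothing."""
    merged = dict(previous)
    for key, value in rebuilt.items():
        if value:  # None or "" is treated as stub output and ignored
            merged[key] = value
    return merged

def rebuild_projects(existing: Dict[str, dict],
                     fresh: Dict[str, dict],
                     allowlist: Set[str]) -> Dict[str, dict]:
    """Replace only allowlisted rows that have a fresh replacement; never delete the rest."""
    result = dict(existing)
    for slug in allowlist:
        if slug in fresh:
            result[slug] = fresh[slug]
    return result
```

In both functions the previous state is the starting point and the rebuild only overwrites what it can actually supply, so a stubbed builder degrades to a no-op instead of a wipe.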
Learnings

  • Schema discipline survives rewrites; framework choices don't.
  • Retrieval tests are the cheapest insurance you can buy.
  • Anything that has to survive a refactor lives in a file or env var, not a Python function.