QMD · LLM WIKI

QMD is an on-device search engine for everything worth remembering. It indexes Markdown notes, meeting transcripts, documentation, and knowledge bases, then lets you search them by keyword or in natural language. Built by Tobi Lütke.

This wiki itself runs on QMD, and the agent uses it daily through MCP (Model Context Protocol).

How It Works

QMD combines three search methods into a hybrid result:

BM25 for fast keyword search and exact hits
Vector search for semantic similarity using embeddings
LLM reranking for the final relevance ordering

Everything runs locally through node-llama-cpp with GGUF models. No cloud calls are required.
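
This page does not say how the keyword and vector result lists are merged before reranking. As a rough sketch of one common approach, reciprocal rank fusion (RRF), the Python below combines two ranked lists of document IDs. The function name, the constant k=60, and the sample document IDs are illustrative assumptions, not QMD internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one list, best first.

    Assumption: RRF is a generic stand-in here; QMD's real fusion
    step may work differently.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the document's score.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword pass and a semantic pass.
bm25_hits = ["deploy.md", "timeline.md", "notes.md"]
vector_hits = ["howto.md", "deploy.md", "notes.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

In this toy input, deploy.md ranks first because it sits near the top of both lists. In a pipeline like the one described above, a reranking model would then reorder the fused candidates.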

Version 2.1 (April 2026)

Code-aware splitting - AST-based (abstract syntax tree) chunking for code files, so functions and classes are not cut in half; a big win for technical RAG (retrieval-augmented generation)
Performance improvements
Official benchmarks
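
To make the idea of AST-based chunking concrete, here is a minimal sketch that splits Python source at top-level function and class boundaries using Python's standard ast module. It only illustrates the general technique: QMD's own chunker, the languages it supports, and its chunk-size rules are not described on this page, and the function name chunk_python_source is made up for this example.

```python
import ast


def chunk_python_source(source):
    """Split Python source into (name, text) chunks, one per top-level definition.

    Assumption: this is an illustration built on the stdlib `ast` module,
    not QMD's chunker.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # imports and loose statements are skipped in this sketch
        # Start at the first decorator, if any, so it stays with its definition.
        first = min([node.lineno] + [d.lineno for d in node.decorator_list])
        # lineno is 1-based and end_lineno is inclusive (Python 3.8+).
        text = "\n".join(lines[first - 1:node.end_lineno])
        chunks.append((node.name, text))
    return chunks


sample = '''import os

def load(path):
    with open(path) as f:
        return f.read()

class Index:
    def add(self, doc):
        self.docs.append(doc)
'''

chunks = chunk_python_source(sample)
for name, text in chunks:
    print(name, len(text.splitlines()))
```

Each chunk holds a complete definition, whereas a fixed-size splitter could end a chunk in the middle of load or Index.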

CLI Commands

qmd search "project timeline"           # BM25 keyword search
qmd vsearch "how to deploy"             # semantic search
qmd query "quarterly planning process"  # hybrid + reranking

qmd get "meetings/2024-01-15.md"        # fetch a document
qmd get "#abc123"                       # fetch by doc ID
qmd multi-get "journals/2025-05*.md"    # batch fetch via glob

qmd embed                               # generate embeddings
qmd update                              # refresh the index
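
For scripting, the three search commands above can be wrapped in a few lines of Python. This is a sketch under assumptions: it presumes qmd is on the PATH, writes its results to stdout, and exits with a non-zero status on failure, none of which is confirmed here. The helper names build_qmd_command and run_qmd are hypothetical.

```python
import subprocess

# Search subcommands taken from the command list above.
SEARCH_MODES = {"search", "vsearch", "query"}


def build_qmd_command(mode, text):
    """Return the argv list for one of the three search subcommands."""
    if mode not in SEARCH_MODES:
        raise ValueError(f"unknown search mode: {mode!r}")
    # An argv list avoids shell quoting problems with spaces in the query.
    return ["qmd", mode, text]


def run_qmd(mode, text):
    """Run the command and return its stdout as text.

    Assumption: qmd prints results to stdout and exits non-zero on failure.
    """
    result = subprocess.run(
        build_qmd_command(mode, text),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(build_qmd_command("query", "quarterly planning process"))
```

With qmd installed, calling run_qmd("search", "project timeline") would execute the first command in the list and return whatever it prints.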

MCP Integration

QMD ships with an MCP server for direct agent integration:

{
  "mcpServers": {
    "qmd": {
      "command": "qmd",
      "args": ["mcp"]
    }
  }
}
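
MCP clients differ in where they keep this configuration, so the file location is left out here. As a small sketch, the Python below merges the qmd entry shown above into an existing client configuration without disturbing other servers. The helper name add_qmd_server and the pre-existing "other" server are hypothetical.

```python
import copy
import json

# The server entry from the configuration shown above.
QMD_ENTRY = {"command": "qmd", "args": ["mcp"]}


def add_qmd_server(config):
    """Return a copy of an MCP client config with the qmd server added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})["qmd"] = copy.deepcopy(QMD_ENTRY)
    return merged


# Hypothetical existing config with one other server already registered.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}

updated = add_qmd_server(existing)
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original configuration available if the merged result needs to be discarded.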

Exposed tools: query, get, multi_get, status

To serve multiple clients, use the HTTP transport via qmd mcp --http on port 8181; the server then loads the models into VRAM once and shares them across clients.

Context System

QMD's key feature is collection-level context metadata: a short description attached to each collection, which improves relevance for LLMs.

qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs "Work documentation"

Collections

qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings

Installation

npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd

Community Response

“Code-aware splitting alone makes this worth it” - @ymlynsky
“shipping benchmarks with the release instead of ‘its fast trust me’ energy” - @PromptSlinger
“Code-Aware Splitting is a massive win for technical RAG. Naive chunking usually breaks function context or class logic” - @yashns1