QMD
Updated 2026-04-06
An on-device search engine for everything worth remembering. It indexes Markdown notes, meeting transcripts, documentation, and knowledge bases, then lets you search by keyword or natural language. Built by Tobi Lütke.
This wiki itself runs on QMD, and the agent uses it daily through MCP.
How It Works
QMD combines three search methods into a hybrid result:
- BM25 for fast keyword search and exact hits
- Vector search for semantic similarity using embeddings
- LLM reranking for the final relevance ordering
Everything runs locally through node-llama-cpp with GGUF models. No cloud calls are required.
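How a keyword result list and a vector result list might be merged before reranking can be sketched in a few lines. The example below uses reciprocal rank fusion (RRF), a common fusion technique. This page does not say which fusion method QMD uses, so the function name, the constant k=60, and the sample document paths are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a QMD setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrieval stages.
bm25_hits = ["notes/deploy.md", "docs/release.md", "notes/ideas.md"]
vector_hits = ["docs/release.md", "meetings/2024-01-15.md", "notes/deploy.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents found by both stages rise to the top of the fused list. In a pipeline like the one described above, a reranking step would then reorder the leading candidates.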
Version 2.1 (April 2026)
- Code-aware splitting - AST-based chunking for code files, so functions and classes are not cut in half; a big win for technical RAG
- Performance improvements
- Official benchmarks, published with the release
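The code-aware splitting item in the list above can be illustrated with a small sketch. It uses Python's standard `ast` module to cut a source file at top-level function and class boundaries. That choice is an assumption made to keep the example self-contained, not a description of QMD's parser or chunk format, and `chunk_python_source` is a hypothetical name.

```python
import ast


def chunk_python_source(source):
    """Split Python source into one chunk per top-level function or class.

    Returns a list of (name, text) pairs. Code outside definitions
    (imports, constants) is ignored in this simplified sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is 1-based; end_lineno is inclusive (Python 3.8+).
            start = node.lineno - 1
            if node.decorator_list:
                # Keep decorators attached to the definition they modify.
                start = min(d.lineno for d in node.decorator_list) - 1
            text = "\n".join(lines[start:node.end_lineno])
            chunks.append((node.name, text))
    return chunks


SAMPLE = '''import math

def area(r):
    return math.pi * r * r

class Circle:
    def __init__(self, r):
        self.r = r

    def area(self):
        return area(self.r)
'''

for name, text in chunk_python_source(SAMPLE):
    print(f"--- {name} ({len(text.splitlines())} lines)")
```

A fixed-size splitter could end a chunk in the middle of `Circle`; here the class and both of its methods stay in one chunk.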
CLI Commands
qmd search "project timeline" # BM25 keyword search
qmd vsearch "how to deploy" # semantic search
qmd query "quarterly planning process" # hybrid + reranking
qmd get "meetings/2024-01-15.md" # fetch a document
qmd get "#abc123" # fetch by doc ID
qmd multi-get "journals/2025-05*.md" # batch fetch via glob
qmd embed # generate embeddings
qmd update # refresh the index
MCP Integration
QMD ships with an MCP server for direct agent integration:
{
"mcpServers": {
"qmd": {
"command": "qmd",
"args": ["mcp"]
}
}
}
Exposed tools: query, get, multi_get, status
To serve multiple clients, use the HTTP transport: qmd mcp --http listens on port 8181 and loads the models into VRAM once.
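If a client already has an MCP configuration, the qmd entry needs to be merged in rather than pasted over the existing servers. The following is a minimal sketch of that merge. The helper name `add_qmd_server` and the sample `other` server are hypothetical, and where the config file lives depends on the client, which this page does not specify.

```python
import json


def add_qmd_server(config):
    """Return a copy of an MCP client config with the qmd server entry added.

    Existing servers are preserved; an existing "qmd" entry is overwritten.
    The input dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["qmd"] = {"command": "qmd", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = add_qmd_server(existing)
print(json.dumps(updated, indent=2))
```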
Context System
QMD's key feature is collection-level context metadata: a short description attached to a collection, which improves relevance for LLMs.
qmd context add qmd://notes "Personal notes and ideas"
qmd context add qmd://docs "Work documentation"
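One way to picture what collection-level context does: the description registered for a collection travels with each hit from that collection, so a downstream LLM can see where a snippet came from. The sketch below is illustrative only. The `CONTEXTS` mapping, the `attach_context` helper, and the longest-prefix matching rule are assumptions, not QMD's internals.

```python
# Descriptions mirroring the two `qmd context add` commands above.
CONTEXTS = {
    "qmd://notes": "Personal notes and ideas",
    "qmd://docs": "Work documentation",
}


def attach_context(uri, snippet, contexts=CONTEXTS):
    """Prefix a snippet with the description of the collection it belongs to."""
    # Collect every registered prefix that covers this document URI.
    matches = [p for p in contexts if uri == p or uri.startswith(p + "/")]
    if not matches:
        # No context registered for this collection: return the snippet as-is.
        return snippet
    # Prefer the most specific (longest) matching prefix.
    prefix = max(matches, key=len)
    return f"[{contexts[prefix]}] {snippet}"


print(attach_context("qmd://notes/ideas.md", "Try hybrid search for the wiki."))
print(attach_context("qmd://journals/2025-05-01.md", "Went for a walk."))
```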
Collections
qmd collection add ~/notes --name notes
qmd collection add ~/Documents/meetings --name meetings
Installation
npm install -g @tobilu/qmd
# or
bun install -g @tobilu/qmd
Community Response
“Code-aware splitting alone makes this worth it” - @ymlynsky
“shipping benchmarks with the release instead of ‘its fast trust me’ energy” - @PromptSlinger
“Code-Aware Splitting is a massive win for technical RAG. Naive chunking usually breaks function context or class logic” - @yashns1