LLM Knowledge Base
Updated 2026-04-09
A workflow developed by Andrej Karpathy in which raw sources are compiled by an LLM into a structured, cross-linked Markdown wiki and continuously extended over time. Not a RAG system that starts from zero for every question, but a compiling, accumulating knowledge system that gets richer with every source.
The Real Problem with RAG
Most systems such as NotebookLM, file-upload chats, and standard RAG pipelines do the same thing: when you ask a question, they retrieve chunks from raw documents and the LLM synthesizes an answer. The problem is that nothing accumulates. Question 100 starts from zero just like question 1. Need to synthesize five documents? You do it from scratch every time. Contradictions between sources are not resolved; they are simply reproduced at random.
The alternative is to let the LLM compile sources into a persistent wiki and keep that wiki current.
Core Idea
“The wiki is a persistent, compounding artifact.” — Andrej Karpathy
Three layers:
- Raw Sources — untouched original material such as articles, papers, images, and repos. The LLM reads them but never edits them.
- The Wiki — LLM-generated Markdown pages: summaries, entity pages, concept pages, comparisons. The LLM writes; you read.
- The Schema — AGENTS.md/CLAUDE.md: how the wiki is structured, which conventions apply, and what happens during ingest. This is the document that turns the LLM from a generic chatbot into a disciplined wiki maintainer.
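The schema can be as small as one file. A minimal AGENTS.md sketch; the directory names, steps, and conventions below are illustrative, not Karpathy's actual file:

```markdown
# AGENTS.md (sketch)

## Structure
- sources/  — raw material, never edited by the LLM
- wiki/     — generated pages; one page per entity or concept
- index.md  — catalog of all pages, updated on every ingest
- log.md    — append-only chronology of ingests and queries

## On ingest
1. Read the new source in sources/ in full.
2. Write a summary page in wiki/ with wikilinks to related pages.
3. Update every existing page the source touches (typically 10–15).
4. Add the page to index.md and append an entry to log.md.

## Conventions
- Use [[wikilinks]] for all cross-references; no bare URLs in body text.
- Flag contradictions inline; never silently overwrite an earlier claim.
```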
Operations
Ingest: add a new source, let the LLM read it, discuss the core points, write a summary page, update the index, and touch 10 to 15 existing pages. One source can create dozens of useful connections.
Query: ask questions against the wiki. The LLM finds relevant pages and synthesizes an answer; good answers can be written back into the wiki. Exploration accumulates just like sources do.
Lint: run a regular health check for contradictions between pages, outdated claims, orphan pages with no incoming links, important concepts without their own page, and obvious data gaps.
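Most of the lint pass is mechanical link-checking, which is exactly what a script can do. A minimal sketch of one check, orphan detection, assuming a flat vault of `.md` files that cross-reference with `[[wikilinks]]` (the function name and vault layout are illustrative):

```python
import re
from pathlib import Path

# Matches the target of [[Page]], [[Page|alias]], and [[Page#section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def find_orphans(vault: Path) -> set[str]:
    """Return pages that no other page links to (index.md excluded)."""
    pages = {p.stem: p for p in vault.glob("*.md")}
    linked: set[str] = set()
    for name, path in pages.items():
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target != name:  # self-links don't count as incoming
                linked.add(target)
    return {n for n in pages if n not in linked and n != "index"}
```

The same scan extends naturally to the other lint checks: link targets with no corresponding file are broken links, and hub pages are simply the most-linked entries.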
Indexing & Logging
index.md — a catalog of all pages with short descriptions. The LLM reads this first for every query. It works well up to roughly 100 sources or several hundred pages without needing embedding infrastructure.
log.md — an append-only chronology of what was ingested, when, and which questions were asked. A useful convention is ## [2026-04-02] ingest | Title, which stays easy to inspect with tools like grep "^## \[" log.md | tail -5.
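Because the log is append-only and every entry starts with the same prefix, it is trivial to write and read programmatically. A sketch assuming the ## [YYYY-MM-DD] kind | Title convention above (the function names are hypothetical):

```python
import re
from datetime import date
from pathlib import Path

def log_entry(log: Path, title: str, kind: str = "ingest") -> None:
    """Append one entry to the append-only log: ## [YYYY-MM-DD] kind | Title."""
    with log.open("a", encoding="utf-8") as f:
        f.write(f"## [{date.today().isoformat()}] {kind} | {title}\n")

def recent_entries(log: Path, n: int = 5) -> list[str]:
    """Pure-Python equivalent of: grep "^## \\[" log.md | tail -5"""
    lines = log.read_text(encoding="utf-8").splitlines()
    return [l for l in lines if re.match(r"^## \[", l)][-n:]
```

Keeping the format greppable is the point: any tool, LLM or shell one-liner, can reconstruct the vault's history without parsing anything fancier than a line prefix.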
Tools (Karpathy)
- Obsidian as IDE and frontend. Its graph view shows what is connected, which pages are hubs, and which ones are orphans.
- Obsidian Web Clipper for turning articles into Markdown.
- qmd for local BM25/vector hybrid search across Markdown, built by Tobi Lütke.
- Marp for generating Markdown slides directly from wiki content.
- Dataview (Obsidian plugin) for dynamic tables from frontmatter.
Use Cases
- Personal knowledge — health, goals, self-development: journal entries, articles, and podcast notes compiled into long-term memory.
- Research — papers, reports, and interviews accumulated into an evolving thesis over weeks or months.
- Reading a book — work through chapters one by one and end up with a companion wiki, similar to a fan wiki.
- Team wiki — Slack threads, meeting transcripts, and project documents where the LLM handles the maintenance no one wants to do.
- Competitive analysis, due diligence, trip planning, course notes.
Why This Works
The exhausting part of knowledge bases is not reading or thinking. It is the bookkeeping: updating cross-references, marking contradictions, and keeping pages consistent. Humans give up because the maintenance cost grows faster than the value. LLMs do not get tired, do not forget cross-references, and can touch 15 files in one pass.
This connects directly to Vannevar Bush’s Memex (1945): a personal, curated knowledge system with associative paths between documents. Bush could not solve who would do the maintenance. The LLM does.
This System
This vault is a direct implementation of that idea. The concrete workflow is documented in AGENTS.md.
Connections
- Andrej Karpathy — author of the original gist document and the origin of this workflow.
- Obsidian — the frontend. Without graph view and wikilinks the idea loses much of its value.
- Vibe Coding — related idea: the LLM as an active participant rather than a passive tool.
- Hermes Agent — the agent that maintains this vault overnight.
Sources
- llm-wiki — Karpathy’s full gist with architecture and practical tips (2026-04-04).
- @karpathy on X - LLM Knowledge Bases — original short description (2026-04-02).
- @itsolelehmann on X - LLM Knowledge Bases — summary for a broader audience.