conversations as a knowledge base
A lot of the knowledge that matters when building software never makes it into the codebase. It lives in meetings, message threads and conversations where someone explains why we're doing it this way and not that way. Business context, tradeoff discussions, and people articulating what they care about shape every decision, but most of that information evaporates.
I've been thinking about what it would look like to actually capture it.
the gap between talking and building
When a team discusses a feature, the conversation contains a lot of useful signal: user requirements surface naturally, edge cases get raised, someone explains the business constraint that rules out the obvious approach. All of that context is exactly what you'd want available when you sit down to write the code.
The problem has always been that turning conversations into structured documentation is tedious enough that it doesn't happen consistently. Meeting notes get taken sometimes, decisions get recorded occasionally, but there's no reliable pipeline from "we talked about it" to "the information is available where it's needed."
I think LLMs change this. They're good at reading messy, unstructured text and extracting what matters from it. That makes them a useful translation layer between how we actually think and talk (which is loosely structured, full of tangents, repetitive) and the structured documentation that's useful for building software. The capture doesn't need to be clean because the retrieval can be smart.
a lightweight setup
I've been experimenting with a workflow that I think makes this practical. The basic idea is you dump everything into one place, let an indexing layer make it searchable, and sync relevant context into project repos where agents can use it.
The components:
Obsidian daily notes as the capture layer. One file per day, completely freeform. I paste meeting transcripts in, jot down ideas, drop notes from conversations. There's no structure to maintain, which is the point. If capturing something requires organising it first, it probably won't get captured.
Git as the source of truth. The knowledge base is a single repo and Obsidian Git syncs it automatically every few minutes.
OpenViking as the indexing layer. It sits on top of the repo and builds a semantic index, with tiered levels of detail. Agents can request a one-sentence summary of a chunk, or the core information, or the full original content, depending on how much context they need. This keeps token usage reasonable.
A sync script that pulls project docs (READMEs, design docs) into the knowledge base so everything is searchable in one place, and pushes relevant knowledge base content into project repos via .ai/context/ directories where coding agents can read it.
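The sync script could be as simple as copying files in both directions. Here's a minimal sketch, assuming a layout where the knowledge base keeps per-project doc mirrors under `projects/` and each project repo gets a `.ai/context/` directory; the function names and paths are illustrative, not the actual script:

```python
# Minimal two-way sync sketch. The repo layout, function names, and paths
# are assumptions for illustration, not the real script.
import shutil
from pathlib import Path

def pull_project_docs(project: Path, kb: Path) -> None:
    """Copy a project's top-level markdown docs (README, design docs)
    into the knowledge base so they are indexed alongside the notes."""
    dest = kb / "projects" / project.name
    dest.mkdir(parents=True, exist_ok=True)
    for doc in project.glob("*.md"):
        shutil.copy2(doc, dest / doc.name)

def push_context(project: Path, notes: list[Path]) -> None:
    """Copy relevant knowledge-base notes into the project's
    .ai/context/ directory, where coding agents can read them."""
    ctx = project / ".ai" / "context"
    ctx.mkdir(parents=True, exist_ok=True)
    for note in notes:
        shutil.copy2(note, ctx / note.name)
```

Deciding *which* notes are relevant to push is the interesting part; that selection could be done by the indexing layer rather than by hand.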
what this means for coding
When an AI coding agent starts a task, it can query the knowledge base for relevant context. Not just the code and the README, but the conversation where the team discussed why the feature works this way, the meeting where someone raised the edge case, the note where I worked through the tradeoff.
LLMs are good at this kind of high-dimensional retrieval. A conversation transcript might not mention the specific function an agent is working on, but it contains the reasoning and constraints that should inform how that function gets built. The agent can read between the lines in a way that keyword search can't.
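The tiered detail levels make this affordable. A sketch of the idea, with an invented interface (only the concept of summary/core/full tiers comes from the setup above, not this API):

```python
# Sketch of tiered context assembly. The Chunk interface is invented for
# illustration; only the idea of detail tiers is from the workflow described.
from dataclasses import dataclass

@dataclass
class Chunk:
    summary: str   # one-sentence gist (cheapest)
    core: str      # the key information
    full: str      # original content (most tokens)

def context_for(chunks: list[Chunk], budget: int) -> str:
    """Assemble context within a token budget: for each chunk (assumed
    ranked by relevance), take the richest tier that still fits."""
    parts, used = [], 0
    for chunk in chunks:
        for text in (chunk.full, chunk.core, chunk.summary):
            cost = len(text.split())  # crude stand-in for a token count
            if used + cost <= budget:
                parts.append(text)
                used += cost
                break
    return "\n\n".join(parts)
```

The effect is that the most relevant conversation comes through verbatim while the long tail arrives as one-line summaries, which is usually all an agent needs to decide whether to dig deeper.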
I think this is especially useful across multiple projects. Conversations about one project often contain insights relevant to another. A shared knowledge base means that context can flow between them rather than staying siloed.
connecting to structured specs
I've been using a tool called SpecKit for structured specifications on this blog. It's a bit much for a blog, but for larger software with multiple stakeholders, having formal specs matters. I think the interesting thing is how a knowledge base like this feeds into that process. The conversations that inform specs don't need to be re-explained each time, because they're already captured and searchable. I'll probably write more about SpecKit once I've used it on something more substantial.
My core thought here is that the bottleneck isn't a lack of tools for capturing information, it's making capture easy enough that it actually happens. If all I need to do is paste a transcript or write a few sentences into today's note, and the indexing and retrieval happen automatically, then the knowledge base grows as a side effect of working normally.