RAG Index

Your codebase, chats, and instruction files — embedded once, queryable forever.

Most "AI for code" tools re-embed your repo every session, charge you per token to do it, and throw the index away when the chat ends. You pay for the same context window, over and over, and the model still doesn't remember what you discussed last Tuesday.

The RAG Index is the cloud layer that fixes this. It maintains persistent embeddings across three sources that normally live in completely different places: your code (AST-aware, not just text chunks), your chat history (ingested from Claude Code, Copilot, and other CLI agents), and the instruction files generated locally for each package in your repo.
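To make "AST-aware, not just text chunks" concrete, here is a minimal sketch in Python using the standard `ast` module: each top-level function or class becomes one chunk, so an embedding never covers half a definition or an arbitrary 500-character window. This is an illustration of the idea only, not the product's actual chunker.

```python
import ast

def chunk_source(source: str) -> list[str]:
    """Split Python source into one chunk per top-level function or class,
    so each chunk is a complete semantic unit rather than a fixed-size
    slice of text. Illustrative sketch, not the RAG Index implementation."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment recovers the exact source text for the node
            chunks.append(ast.get_source_segment(source, node))
    return chunks

code = """
def add(a, b):
    return a + b

class Greeter:
    def hello(self):
        return "hi"
"""
chunks = chunk_source(code)
print(chunks)
```

A text chunker would happily split `Greeter` between its class line and its method; the AST walk cannot.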

Retrieval happens on demand — agents query the index when they actually need context, not preemptively at the start of every session. That keeps token costs predictable and keeps the agent grounded in your decisions rather than its best guess at what's in the file.
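The embed-once, query-on-demand pattern can be sketched with a toy in-memory index. The `TinyIndex` class and its cosine-similarity ranking below are purely illustrative — they are not the hosted service's API — but they show the shape: entries are embedded once up front, and a lookup happens only at the moment the agent needs context.

```python
import math

class TinyIndex:
    """Toy stand-in for a persistent index: vectors are stored once,
    and retrieval is a query at the moment context is needed.
    Names and structure are hypothetical, for illustration only."""

    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector: list[float], text: str) -> None:
        # Embed once: the vector is stored, never recomputed per session.
        self.entries.append((vector, text))

    def query(self, vector: list[float], k: int = 1) -> list[str]:
        # Rank stored entries by cosine similarity to the query vector.
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
            return dot / norm

        ranked = sorted(self.entries, key=lambda e: cos(vector, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

idx = TinyIndex()
idx.add([1.0, 0.0], "code chunk: auth middleware")
idx.add([0.0, 1.0], "chat note: we chose JWT over sessions")
top = idx.query([0.9, 0.1])
```

The contrast with re-embedding is the whole point: `add` runs once per source change, while `query` is the only per-session cost.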

You control what gets indexed. Mount the repos you want, exclude the ones you don't. Embeddings stay tied to your tenant. Nothing crosses over to training data, nothing gets aggregated, and you can purge the index at any time.
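A mount/exclude setup might look something like the fragment below. Every key name here is hypothetical — a sketch of the shape such a configuration could take, not the product's real schema.

```yaml
# Hypothetical config sketch — key names are illustrative, not a real schema.
index:
  mount:
    - repo: github.com/acme/api-server       # code, chunked AST-aware
    - repo: github.com/acme/web-frontend
  exclude:
    - repo: github.com/acme/secrets-infra    # never embedded
  sources:
    chat_history: true                       # Claude Code, Copilot, other CLI agents
    instruction_files: true                  # per-package instruction files
  retention:
    purge_on_request: true                   # embeddings stay tenant-scoped
```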

Pricing is usage-based — you pay for what you embed and what you query, not a flat tier that punishes light users and undercharges heavy ones. For solo developers running a handful of repos, this typically lands in the single digits of dollars per month. For teams with larger codebases, the index scales linearly without surprise overages.

The point isn't "RAG." RAG is plumbing. The point is that your entire development context — code, conversations, decisions — becomes one queryable surface instead of seven disconnected tools.