# Building kb-manage: A Rust CLI for Kiro's Local Knowledge Bases
Kiro is an AI coding assistant that maintains local knowledge bases — HNSW vector indexes built from your source code, chunked and embedded with a MiniLM model. The problem: these KBs are entirely local. There's no built-in way to share indexed knowledge with your team, update contexts outside of Kiro, or search across everything from the command line.
So I built kb-manage — a Rust CLI that reads, searches, exports, imports, and updates Kiro's on-disk knowledge bases directly.
## The On-Disk Format

Kiro stores its KBs at `~/Library/Application Support/kiro-cli/knowledge_bases/kiro_default/`. The layout:
```
kiro_default/
├── contexts.json           # UUID → context metadata
├── <context-uuid>/
│   ├── data.json           # entries with text, metadata, and 384-dim vectors
│   ├── index.hnsw.graph    # HNSW graph topology
│   └── index.hnsw.data     # HNSW point data
└── models/
    ├── model.onnx          # all-MiniLM-L6-v2
    └── tokenizer.json
```
Each context is a UUID directory containing a `data.json` (array of chunked entries with their embedding vectors) and an HNSW index serialized by the `hnsw_rs` crate. The `contexts.json` file maps UUIDs to metadata — name, source path, include/exclude patterns, item count.

Reverse-engineering the exact format took some `xxd` comparisons and trial-and-error with `hnsw_rs` parameters. The magic numbers: `DistCosine`, `f32`, 16 layers (exactly — it panics otherwise), 16 max connections, `ef=200`.
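For concreteness, here's a minimal sketch of building an index with those parameters. The constructor argument order and the `insert` signature reflect my reading of `hnsw_rs` 0.3 — treat them as assumptions to check against the crate docs for your version:

```rust
use hnsw_rs::prelude::*; // re-exports DistCosine from anndists in hnsw_rs 0.3

/// Build an index with Kiro's parameters: cosine distance over f32,
/// 16 max connections, exactly 16 layers, ef_construction = 200.
fn build_index(embeddings: &[Vec<f32>]) {
    let hnsw = Hnsw::<f32, DistCosine>::new(
        16,               // max_nb_connection
        embeddings.len(), // capacity hint
        16,               // nb_layer: must be exactly 16, or Kiro panics on load
        200,              // ef_construction
        DistCosine {},
    );
    for (id, vector) in embeddings.iter().enumerate() {
        // Signature per my reading of hnsw_rs 0.3: (&[f32], usize).
        hnsw.insert((vector.as_slice(), id)); // one 384-dim vector per entry
    }
    // hnsw_rs's dump routine writes a <basename>.hnsw.graph /
    // <basename>.hnsw.data pair -- the index.hnsw.* files in the layout above.
}
```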
## The Embedding Pipeline

Kiro uses all-MiniLM-L6-v2 for embeddings. To produce compatible vectors, kb-manage runs the same model through ONNX Runtime (the `ort` crate) with the HuggingFace tokenizer:
Input text → Tokenize → ONNX inference → Mean pool (masked) → L2 normalize → 384-dim vector
The mean pooling step applies the attention mask before averaging — without this, padding tokens pollute the embedding. The final L2 normalization means cosine similarity reduces to a dot product, which is what the HNSW index expects.
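As a sketch, here's what those two post-processing steps look like in plain Rust over the raw token embeddings (function and variable names are mine, not kb-manage's):

```rust
/// Masked mean pooling followed by L2 normalization.
/// `hidden` holds one 384-dim vector per input token (the model output);
/// `mask` is the attention mask (1 = real token, 0 = padding).
fn pool_and_normalize(hidden: &[Vec<f32>], mask: &[i64]) -> Vec<f32> {
    let dim = hidden[0].len(); // 384 for all-MiniLM-L6-v2
    let mut pooled = vec![0.0f32; dim];
    let mut count = 0.0f32;

    // Average only the positions the mask marks as real tokens, so
    // padding never leaks into the sentence embedding.
    for (token, &m) in hidden.iter().zip(mask) {
        if m == 1 {
            for (p, &x) in pooled.iter_mut().zip(token) {
                *p += x;
            }
            count += 1.0;
        }
    }
    for p in pooled.iter_mut() {
        *p /= count;
    }

    // L2-normalize so cosine similarity reduces to a dot product.
    let norm = pooled.iter().map(|x| x * x).sum::<f32>().sqrt();
    for p in pooled.iter_mut() {
        *p /= norm;
    }
    pooled
}
```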
## Chunking: Matching Kiro's Strategy

Getting the chunking right was critical for the `update` command. By analyzing overlaps between consecutive chunks in Kiro's existing data, I found the pattern: collapse whitespace, then cut ~4000-character chunks with ~1000 characters of overlap (stride ~3000). The chunks use character indices, not byte indices — important when your data contains multi-byte characters like box-drawing glyphs.
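A sketch of that strategy as I reconstructed it (the constants are approximate, and Kiro's exact boundary handling evidently differs slightly, as the test below shows):

```rust
/// Cut a document into overlapping chunks the way Kiro appears to:
/// collapse whitespace, then take 4000-char windows every 3000 chars,
/// so consecutive chunks share ~1000 chars. Char indices throughout.
fn chunk(text: &str) -> Vec<String> {
    const CHUNK: usize = 4000;
    const STRIDE: usize = 3000;

    // Collapse runs of whitespace into single spaces.
    let collapsed: Vec<char> = text
        .split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .chars()
        .collect();

    let mut chunks = Vec::new();
    let mut start = 0;
    while start < collapsed.len() {
        let end = (start + CHUNK).min(collapsed.len());
        // Indexing a Vec<char> sidesteps the byte-boundary panics you'd
        // hit slicing a &str containing multi-byte UTF-8.
        chunks.push(collapsed[start..end].iter().collect());
        if end == collapsed.len() {
            break;
        }
        start += STRIDE;
    }
    chunks
}
```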
Testing against a real context (Webex-Messages: 51 markdown files, ~34K lines), kb-manage produces 509 chunks vs Kiro's 508. The one-entry difference comes down to a minor boundary variance. Close enough that search results are identical.
## Atomic Updates

The `update` command re-indexes a context from its source files without going through Kiro's interactive CLI. The atomic swap strategy:
1. Build the new context in a `.tmp-{uuid}` directory on the same filesystem
2. Rename the live context directory to `.old-{uuid}`
3. Rename `.tmp-{uuid}` to the live path
4. Update `contexts.json` via temp file + rename
5. Delete `.old-{uuid}`
The race window is just the gap between steps 3 and 4. For a local tool, that's an acceptable tradeoff — and it replaced a fragile Python script that was driving kiro-cli interactively through pexpect.
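In std-library terms, the swap reduces to a couple of renames. A minimal sketch (error recovery and the `contexts.json` rewrite are elided; names are mine):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The swap from the list above. `base` is the knowledge_bases directory,
/// `uuid` the context being replaced; the new index was already built in
/// `.tmp-{uuid}` (step 1).
fn swap_context(base: &Path, uuid: &str) -> io::Result<()> {
    let live = base.join(uuid);
    let old = base.join(format!(".old-{uuid}"));
    let tmp = base.join(format!(".tmp-{uuid}"));

    fs::rename(&live, &old)?; // step 2: park the live directory
    fs::rename(&tmp, &live)?; // step 3: promote the new one (atomic rename)

    // Step 4 goes here: write contexts.json to a temp file and rename it
    // into place. The gap between step 3 and that rename is the race window.

    fs::remove_dir_all(&old)?; // step 5: discard the old copy
    Ok(())
}
```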
## What It Does
Five subcommands cover the workflow:
- `contexts` — list all indexed contexts with entry counts and source paths
- `search` — semantic search across contexts with natural language queries, optional `--context` filter and `--verbose` flag for full chunk text
- `export` — write search results or entire contexts to a Kiro-compatible directory
- `import` — merge a shared KB directory into your local Kiro installation
- `update` — re-index a context from source with atomic swap
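Wired up with clap 4's derive macros, the surface looks roughly like this (a sketch — the argument names are illustrative, not kb-manage's exact ones):

```rust
use clap::{Parser, Subcommand};
use std::path::PathBuf;

#[derive(Parser)]
#[command(name = "kb-manage")]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// List all indexed contexts with entry counts and source paths
    Contexts,
    /// Semantic search with a natural-language query
    Search {
        query: String,
        #[arg(long)]
        context: Option<String>, // restrict to one context
        #[arg(long)]
        verbose: bool, // print full chunk text
    },
    /// Write search results or entire contexts to a Kiro-compatible directory
    Export { output: PathBuf },
    /// Merge a shared KB directory into the local Kiro installation
    Import { input: PathBuf },
    /// Re-index a context from source with atomic swap
    Update { name: String },
}

fn main() {
    let _cli = Cli::parse(); // dispatch on _cli.command
}
```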
The sharing workflow: export a context, commit it to a git repo, teammates clone and import. The exported format is binary-compatible with Kiro's layout.
## The Stack
| Crate | Purpose |
|---|---|
| `clap` 4 | CLI with derive macros |
| `ort` 2.0.0-rc.12 | ONNX Runtime for MiniLM inference |
| `tokenizers` 0.22 | HuggingFace tokenizer |
| `hnsw_rs` 0.3.4 | HNSW index construction and serialization |
| `anndists` 0.1.4 | `DistCosine` for `hnsw_rs` |
| `serde` / `serde_json` | JSON serialization |
| `uuid`, `time` | ID generation and timestamps |
The release binary is 28MB. Not tiny, but ONNX Runtime is doing the heavy lifting.
## Lessons Learned
A few things that bit me along the way:
- `hnsw_rs` `nb_layer` must be exactly 16. Not "at least 16." Exactly 16. Otherwise it panics when Kiro tries to load the index.
- `ort` v2 tensor creation is picky. The pattern is `Tensor::from_array(([1usize, len], data.into_boxed_slice()))` — getting the shape tuple wrong gives unhelpful errors.
- Not all contexts have the same payload schema. Some contexts are missing `language` or `file_type` fields. Without `#[serde(default)]` on those fields, deserialization silently fails and you get zero entries with no error message.
- Chunk boundaries must use char indices, not byte indices. The Webex data contains multi-byte UTF-8 characters. Slicing at byte boundaries panics; slicing at char boundaries works.
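The serde fix, as a sketch (field names illustrate the payload, not necessarily kb-manage's exact struct):

```rust
use serde::Deserialize;

// Without the defaults, a single context whose entries lack `language`
// or `file_type` fails to deserialize, and the caller just sees zero entries.
#[derive(Deserialize)]
struct Payload {
    text: String,
    #[serde(default)]
    language: String,  // absent in some contexts
    #[serde(default)]
    file_type: String, // absent in some contexts
}
```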
## What's Next
The tool works well enough for daily use. Possible additions: a `merge` subcommand to combine multiple contexts into one, incremental updates that only re-embed changed files, and maybe a watch mode that re-indexes on file changes. But for now, it does what I need — search my KBs from the terminal, share them with the team, and update them without launching Kiro.
