Building kb-manage: A Rust CLI for Kiro's Local Knowledge Bases

I built a Rust CLI called kb-manage that reads, searches, exports, imports, and updates Kiro's local HNSW knowledge bases — reverse-engineering the on-disk format, embedding pipeline, and chunking strategy to produce binary-compatible indexes.

Kiro is an AI coding assistant that maintains local knowledge bases — HNSW vector indexes built from your source code, chunked and embedded with a MiniLM model. The problem: these KBs are entirely local. There's no built-in way to share indexed knowledge with your team, update contexts outside of Kiro, or search across everything from the command line.

So I built kb-manage — a Rust CLI that reads, searches, exports, imports, and updates Kiro's on-disk knowledge bases directly.

The On-Disk Format

Kiro stores its KBs at ~/Library/Application Support/kiro-cli/knowledge_bases/kiro_default/. The layout:

kiro_default/
├── contexts.json                  # UUID → context metadata
├── <context-uuid>/
│   ├── data.json                  # entries with text, metadata, and 384-dim vectors
│   ├── index.hnsw.graph           # HNSW graph topology
│   └── index.hnsw.data            # HNSW point data
└── models/
    ├── model.onnx                 # all-MiniLM-L6-v2
    └── tokenizer.json

Each context is a UUID directory containing a data.json (array of chunked entries with their embedding vectors) and an HNSW index serialized by the hnsw_rs crate. The contexts.json file maps UUIDs to metadata — name, source path, include/exclude patterns, item count.

Reverse-engineering the exact format took some xxd comparisons and trial-and-error with hnsw_rs parameters. The magic numbers: DistCosine, f32, 16 layers (exactly 16, or Kiro panics when loading the index), 16 max connections, ef_construction = 200.
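
In code, that compatibility contract looks roughly like this (a sketch against hnsw_rs 0.3's constructor; build_index and vectors are my names, and only the constants come from the reverse-engineering):

    use anndists::dist::DistCosine;
    use hnsw_rs::prelude::*;

    // Parameters recovered from Kiro's serialized indexes.
    const NB_LAYER: usize = 16;          // exactly 16, or Kiro panics on load
    const MAX_NB_CONNECTION: usize = 16;
    const EF_CONSTRUCTION: usize = 200;

    fn build_index(vectors: &[Vec<f32>]) {
        let hnsw = Hnsw::<f32, DistCosine>::new(
            MAX_NB_CONNECTION,
            vectors.len(), // capacity: one point per chunk
            NB_LAYER,
            EF_CONSTRUCTION,
            DistCosine {},
        );
        for (id, v) in vectors.iter().enumerate() {
            hnsw.insert((v, id)); // ids index back into data.json entries
        }
        // hnsw's file dump then serializes the pair of files Kiro expects:
        // index.hnsw.graph and index.hnsw.data.
    }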

The Embedding Pipeline

Kiro uses all-MiniLM-L6-v2 for embeddings. To produce compatible vectors, kb-manage runs the same model through ONNX Runtime (ort crate) with the HuggingFace tokenizer:

Input text → Tokenize → ONNX inference → Mean pool (masked) → L2 normalize → 384-dim vector
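
A condensed sketch of the pipeline up to inference, assuming the ort 2.0.0-rc.12 and tokenizers 0.22 APIs (the embed function, paths, and error handling are illustrative, and exact ort signatures have shifted between rc releases):

    use ort::session::Session;
    use ort::value::Tensor;
    use tokenizers::Tokenizer;

    fn embed(text: &str) -> Result<(), Box<dyn std::error::Error>> {
        // Kiro ships both files under models/; paths here are illustrative.
        let tokenizer = Tokenizer::from_file("models/tokenizer.json")?;
        let mut session = Session::builder()?.commit_from_file("models/model.onnx")?;

        let enc = tokenizer.encode(text, true)?;
        let len = enc.get_ids().len();
        let ids: Vec<i64> = enc.get_ids().iter().map(|&i| i as i64).collect();
        let mask: Vec<i64> = enc.get_attention_mask().iter().map(|&m| m as i64).collect();
        let types: Vec<i64> = enc.get_type_ids().iter().map(|&t| t as i64).collect();

        // Batch of 1, len tokens: the shape-tuple pattern noted under Lessons.
        let _outputs = session.run(ort::inputs![
            "input_ids" => Tensor::from_array(([1usize, len], ids.into_boxed_slice()))?,
            "attention_mask" => Tensor::from_array(([1usize, len], mask.into_boxed_slice()))?,
            "token_type_ids" => Tensor::from_array(([1usize, len], types.into_boxed_slice()))?,
        ])?;
        // The model's last_hidden_state output is [1, len, 384]: one row per
        // token, feeding the masked mean-pool + L2-normalize step described next.
        Ok(())
    }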

The mean pooling step applies the attention mask before averaging — without this, padding tokens pollute the embedding. The final L2 normalization means cosine similarity reduces to a dot product, which is what the HNSW index expects.
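
Both steps fit in a few lines of plain Rust. A minimal sketch, assuming the model's last hidden state arrives as a flat row-major seq_len × 384 slice:

    /// Masked mean pooling followed by L2 normalization.
    /// `token_embeddings`: flat [seq_len x hidden] activations, hidden = 384.
    /// `mask`: the tokenizer's attention mask (1 = real token, 0 = padding).
    fn pool_and_normalize(token_embeddings: &[f32], mask: &[u32], hidden: usize) -> Vec<f32> {
        let mut pooled = vec![0.0f32; hidden];
        let mut count = 0.0f32;
        for (i, &m) in mask.iter().enumerate() {
            if m == 1 {
                // Only real tokens contribute; padding is skipped entirely.
                let row = &token_embeddings[i * hidden..(i + 1) * hidden];
                for (p, &v) in pooled.iter_mut().zip(row) {
                    *p += v;
                }
                count += 1.0;
            }
        }
        for p in pooled.iter_mut() {
            *p /= count.max(1.0);
        }
        // L2 normalize: unit vectors make cosine similarity a plain dot product.
        let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(1e-12);
        for p in pooled.iter_mut() {
            *p /= norm;
        }
        pooled
    }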

Chunking: Matching Kiro's Strategy

Getting the chunking right was critical for the update command. By analyzing overlaps between consecutive chunks in Kiro's existing data, I found the pattern: collapse whitespace, then cut ~4000 character chunks with ~1000 characters of overlap (stride ~3000). The chunks use character indices, not byte indices — important when your data contains multi-byte characters like box-drawing glyphs.
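
Here's that strategy as code; the exact boundary handling is my reconstruction from the data, which is why the chunk counts below differ by one:

    /// Chunking sketch matching the recovered strategy: collapse whitespace,
    /// then cut 4000-char windows advancing 3000 chars (1000 chars of overlap).
    fn chunk(text: &str) -> Vec<String> {
        const CHUNK: usize = 4000;
        const STRIDE: usize = 3000;
        // Collapse runs of whitespace into single spaces.
        let collapsed = text.split_whitespace().collect::<Vec<_>>().join(" ");
        // Index by chars, not bytes: the data contains multi-byte glyphs.
        let chars: Vec<char> = collapsed.chars().collect();
        let mut chunks = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let end = (start + CHUNK).min(chars.len());
            chunks.push(chars[start..end].iter().collect());
            if end == chars.len() {
                break;
            }
            start += STRIDE;
        }
        chunks
    }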

Tested against a real context (Webex-Messages: 51 markdown files, ~34K lines), kb-manage produces 509 chunks to Kiro's 508. The one-entry difference comes from a minor boundary variance, and it's close enough that search results are identical.

Atomic Updates

The update command re-indexes a context from its source files without going through Kiro's interactive CLI. The atomic swap strategy:

  1. Build the new context in a .tmp-{uuid} directory on the same filesystem
  2. Rename the live context directory to .old-{uuid}
  3. Rename .tmp-{uuid} to the live path
  4. Update contexts.json via temp file + rename
  5. Delete .old-{uuid}

The race window is just the gap between steps 3 and 4. For a local tool, that's an acceptable tradeoff — and it replaced a fragile Python script that was driving kiro-cli interactively through pexpect.
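
In std::fs terms the swap is a handful of renames. A sketch, with swap_context and its arguments as illustrative names (production code also has to recover if a rename fails mid-sequence):

    use std::fs;
    use std::path::Path;

    fn swap_context(kb_root: &Path, uuid: &str, updated_json: &str) -> std::io::Result<()> {
        let live = kb_root.join(uuid);
        let old = kb_root.join(format!(".old-{uuid}"));
        // Step 1 happened earlier: the new index was built under .tmp-{uuid}
        // on the same filesystem, so every rename below is atomic.
        fs::rename(&live, &old)?;                                  // step 2
        fs::rename(kb_root.join(format!(".tmp-{uuid}")), &live)?;  // step 3
        // Step 4: rewrite contexts.json via temp file + rename.
        let json_tmp = kb_root.join("contexts.json.tmp");
        fs::write(&json_tmp, updated_json)?;
        fs::rename(&json_tmp, kb_root.join("contexts.json"))?;
        fs::remove_dir_all(&old)?;                                 // step 5
        Ok(())
    }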

What It Does

Five subcommands cover the workflow:

  • contexts — list all indexed contexts with entry counts and source paths
  • search — semantic search across contexts with natural-language queries, plus an optional --context filter and a --verbose flag that prints full chunk text
  • export — write search results or entire contexts to a Kiro-compatible directory
  • import — merge a shared KB directory into your local Kiro installation
  • update — re-index a context from source with atomic swap

The sharing workflow: export a context, commit it to a git repo, teammates clone and import. The exported format is binary-compatible with Kiro's layout.

The Stack

  Crate               Purpose
  clap 4              CLI with derive macros
  ort 2.0.0-rc.12     ONNX Runtime for MiniLM inference
  tokenizers 0.22     HuggingFace tokenizer
  hnsw_rs 0.3.4       HNSW index construction and serialization
  anndists 0.1.4      DistCosine for hnsw_rs
  serde / serde_json  JSON serialization
  uuid, time          ID generation and timestamps

The release binary is 28MB. Not tiny, but ONNX Runtime is doing the heavy lifting.

Lessons Learned

A few things that bit me along the way:

  • hnsw_rs nb_layer must be exactly 16. Not "at least 16." Exactly 16. Otherwise it panics when Kiro tries to load the index.
  • ort v2 tensor creation is picky. The pattern is Tensor::from_array(([1usize, len], data.into_boxed_slice())) — getting the shape tuple wrong gives unhelpful errors.
  • Not all contexts have the same payload schema. Some contexts are missing language or file_type fields. Without #[serde(default)] on those fields, deserialization silently fails and you get zero entries with no error message (see the sketch after this list).
  • Chunk boundaries must use char indices, not byte indices. The Webex data contains multi-byte UTF-8 characters. Slicing at byte boundaries panics; slicing at char boundaries works.
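
The serde fix from the third bullet, as a sketch (the struct here is a hypothetical slice of the real payload schema):

    use serde::Deserialize;

    // `text` is real (every entry has it); #[serde(default)] lets contexts
    // that omit `language` or `file_type` deserialize to empty strings
    // instead of failing with zero entries.
    #[derive(Deserialize)]
    struct Payload {
        text: String,
        #[serde(default)]
        language: String,
        #[serde(default)]
        file_type: String,
    }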

What's Next

The tool works well enough for daily use. Possible additions: a merge subcommand to combine multiple contexts into one, incremental updates that only re-embed changed files, and maybe a watch mode that re-indexes on file changes. But for now, it does what I need — search my KBs from the terminal, share them with the team, and update them without launching Kiro.
