Building kb-manage: A Rust CLI for Kiro's Local Knowledge Bases

I built a Rust CLI called kb-manage that reads, searches, exports, imports, and updates Kiro's local HNSW knowledge bases — reverse-engineering the on-disk format, embedding pipeline, and chunking strategy to produce binary-compatible indexes.

Kiro is an AI coding assistant that maintains local knowledge bases — HNSW vector indexes built from your source code, chunked and embedded with a MiniLM model. The problem: these KBs are entirely local. There's no built-in way to share indexed knowledge with your team, update contexts outside of Kiro, or search across everything from the command line.

So I built kb-manage — a Rust CLI that reads, searches, exports, imports, and updates Kiro's on-disk knowledge bases directly.

The On-Disk Format

Kiro stores its KBs at ~/Library/Application Support/kiro-cli/knowledge_bases/kiro_default/. The layout:

kiro_default/
├── contexts.json                  # UUID → context metadata
├── <context-uuid>/
│   ├── data.json                  # entries with text, metadata, and 384-dim vectors
│   ├── index.hnsw.graph           # HNSW graph topology
│   └── index.hnsw.data            # HNSW point data
└── models/
    ├── model.onnx                 # all-MiniLM-L6-v2
    └── tokenizer.json

Each context is a UUID directory containing a data.json (array of chunked entries with their embedding vectors) and an HNSW index serialized by the hnsw_rs crate. The contexts.json file maps UUIDs to metadata — name, source path, include/exclude patterns, item count.

Reverse-engineering the exact format took some xxd comparisons and trial-and-error with hnsw_rs parameters. The magic numbers: DistCosine, f32, 16 layers (exactly 16, or Kiro panics when loading the index), 16 max connections, ef_construction = 200.
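
In code, that compatibility contract looks roughly like this (a sketch against hnsw_rs 0.3's constructor; build_index and vectors are my names, and only the constants come from the reverse-engineering):

    use anndists::dist::DistCosine;
    use hnsw_rs::prelude::*;

    // Parameters recovered from Kiro's serialized indexes.
    const NB_LAYER: usize = 16;          // exactly 16, or Kiro panics on load
    const MAX_NB_CONNECTION: usize = 16;
    const EF_CONSTRUCTION: usize = 200;

    fn build_index(vectors: &[Vec<f32>]) {
        let hnsw = Hnsw::<f32, DistCosine>::new(
            MAX_NB_CONNECTION,
            vectors.len(), // capacity: one point per chunk
            NB_LAYER,
            EF_CONSTRUCTION,
            DistCosine {},
        );
        for (id, v) in vectors.iter().enumerate() {
            hnsw.insert((v, id)); // ids index back into data.json entries
        }
        // hnsw's file dump then serializes the pair of files Kiro expects:
        // index.hnsw.graph and index.hnsw.data.
    }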

The Embedding Pipeline

Kiro uses all-MiniLM-L6-v2 for embeddings. To produce compatible vectors, kb-manage runs the same model through ONNX Runtime (ort crate) with the HuggingFace tokenizer:

Input text → Tokenize → ONNX inference → Mean pool (masked) → L2 normalize → 384-dim vector
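
A condensed sketch of the pipeline up to inference, assuming the ort 2.0.0-rc.12 and tokenizers 0.22 APIs (the embed function, paths, and error handling are illustrative, and exact ort signatures have shifted between rc releases):

    use ort::session::Session;
    use ort::value::Tensor;
    use tokenizers::Tokenizer;

    fn embed(text: &str) -> Result<(), Box<dyn std::error::Error>> {
        // Kiro ships both files under models/; paths here are illustrative.
        let tokenizer = Tokenizer::from_file("models/tokenizer.json")?;
        let mut session = Session::builder()?.commit_from_file("models/model.onnx")?;

        let enc = tokenizer.encode(text, true)?;
        let len = enc.get_ids().len();
        let ids: Vec<i64> = enc.get_ids().iter().map(|&i| i as i64).collect();
        let mask: Vec<i64> = enc.get_attention_mask().iter().map(|&m| m as i64).collect();
        let types: Vec<i64> = enc.get_type_ids().iter().map(|&t| t as i64).collect();

        // Batch of 1, len tokens: the shape-tuple pattern noted under Lessons.
        let _outputs = session.run(ort::inputs![
            "input_ids" => Tensor::from_array(([1usize, len], ids.into_boxed_slice()))?,
            "attention_mask" => Tensor::from_array(([1usize, len], mask.into_boxed_slice()))?,
            "token_type_ids" => Tensor::from_array(([1usize, len], types.into_boxed_slice()))?,
        ])?;
        // The model's last_hidden_state output is [1, len, 384]: one row per
        // token, feeding the masked mean-pool + L2-normalize step described next.
        Ok(())
    }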

The mean pooling step applies the attention mask before averaging — without this, padding tokens pollute the embedding. The final L2 normalization means cosine similarity reduces to a dot product, which is what the HNSW index expects.
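
Both steps fit in a few lines of plain Rust. A minimal sketch, assuming the model's last hidden state arrives as a flat row-major seq_len × 384 slice:

    /// Masked mean pooling followed by L2 normalization.
    /// `token_embeddings`: flat [seq_len x hidden] activations, hidden = 384.
    /// `mask`: the tokenizer's attention mask (1 = real token, 0 = padding).
    fn pool_and_normalize(token_embeddings: &[f32], mask: &[u32], hidden: usize) -> Vec<f32> {
        let mut pooled = vec![0.0f32; hidden];
        let mut count = 0.0f32;
        for (i, &m) in mask.iter().enumerate() {
            if m == 1 {
                // Only real tokens contribute; padding is skipped entirely.
                let row = &token_embeddings[i * hidden..(i + 1) * hidden];
                for (p, &v) in pooled.iter_mut().zip(row) {
                    *p += v;
                }
                count += 1.0;
            }
        }
        for p in pooled.iter_mut() {
            *p /= count.max(1.0);
        }
        // L2 normalize: unit vectors make cosine similarity a plain dot product.
        let norm = pooled.iter().map(|v| v * v).sum::<f32>().sqrt().max(1e-12);
        for p in pooled.iter_mut() {
            *p /= norm;
        }
        pooled
    }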

Chunking: Matching Kiro's Strategy

Getting the chunking right was critical for the update command. By analyzing overlaps between consecutive chunks in Kiro's existing data, I found the pattern: collapse whitespace, then cut ~4000 character chunks with ~1000 characters of overlap (stride ~3000). The chunks use character indices, not byte indices — important when your data contains multi-byte characters like box-drawing glyphs.
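
Here's that strategy as code; the exact boundary handling is my reconstruction from the data, which is why the chunk counts below differ by one:

    /// Chunking sketch matching the recovered strategy: collapse whitespace,
    /// then cut 4000-char windows advancing 3000 chars (1000 chars of overlap).
    fn chunk(text: &str) -> Vec<String> {
        const CHUNK: usize = 4000;
        const STRIDE: usize = 3000;
        // Collapse runs of whitespace into single spaces.
        let collapsed = text.split_whitespace().collect::<Vec<_>>().join(" ");
        // Index by chars, not bytes: the data contains multi-byte glyphs.
        let chars: Vec<char> = collapsed.chars().collect();
        let mut chunks = Vec::new();
        let mut start = 0;
        while start < chars.len() {
            let end = (start + CHUNK).min(chars.len());
            chunks.push(chars[start..end].iter().collect());
            if end == chars.len() {
                break;
            }
            start += STRIDE;
        }
        chunks
    }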

Tested against a real context (Webex-Messages: 51 markdown files, ~34K lines), kb-manage produces 509 chunks to Kiro's 508. The one-entry difference comes from a minor boundary variance, and it's close enough that search results are identical.

Atomic Updates

The update command re-indexes a context from its source files without going through Kiro's interactive CLI. The atomic swap strategy:

  1. Build the new context in a .tmp-{uuid} directory on the same filesystem
  2. Rename the live context directory to .old-{uuid}
  3. Rename .tmp-{uuid} to the live path
  4. Update contexts.json via temp file + rename
  5. Delete .old-{uuid}

The race window is just the gap between steps 3 and 4. For a local tool, that's an acceptable tradeoff — and it replaced a fragile Python script that was driving kiro-cli interactively through pexpect.
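
In std::fs terms the swap is a handful of renames. A sketch, with swap_context and its arguments as illustrative names (production code also has to recover if a rename fails mid-sequence):

    use std::fs;
    use std::path::Path;

    fn swap_context(kb_root: &Path, uuid: &str, updated_json: &str) -> std::io::Result<()> {
        let live = kb_root.join(uuid);
        let old = kb_root.join(format!(".old-{uuid}"));
        // Step 1 happened earlier: the new index was built under .tmp-{uuid}
        // on the same filesystem, so every rename below is atomic.
        fs::rename(&live, &old)?;                                  // step 2
        fs::rename(kb_root.join(format!(".tmp-{uuid}")), &live)?;  // step 3
        // Step 4: rewrite contexts.json via temp file + rename.
        let json_tmp = kb_root.join("contexts.json.tmp");
        fs::write(&json_tmp, updated_json)?;
        fs::rename(&json_tmp, kb_root.join("contexts.json"))?;
        fs::remove_dir_all(&old)?;                                 // step 5
        Ok(())
    }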

What It Does

Five subcommands cover the workflow:

  • contexts — list all indexed contexts with entry counts and source paths
  • search — semantic search across contexts with natural-language queries, plus an optional --context filter and a --verbose flag that prints full chunk text
  • export — write search results or entire contexts to a Kiro-compatible directory
  • import — merge a shared KB directory into your local Kiro installation
  • update — re-index a context from source with atomic swap

The sharing workflow: export a context, commit it to a git repo, teammates clone and import. The exported format is binary-compatible with Kiro's layout.

The Stack

  Crate               Purpose
  clap 4              CLI with derive macros
  ort 2.0.0-rc.12     ONNX Runtime for MiniLM inference
  tokenizers 0.22     HuggingFace tokenizer
  hnsw_rs 0.3.4       HNSW index construction and serialization
  anndists 0.1.4      DistCosine for hnsw_rs
  serde / serde_json  JSON serialization
  uuid, time          ID generation and timestamps

The release binary is 28MB. Not tiny, but ONNX Runtime is doing the heavy lifting.

Lessons Learned

A few things that bit me along the way:

  • hnsw_rs nb_layer must be exactly 16. Not "at least 16." Exactly 16. Otherwise it panics when Kiro tries to load the index.
  • ort v2 tensor creation is picky. The pattern is Tensor::from_array(([1usize, len], data.into_boxed_slice())) — getting the shape tuple wrong gives unhelpful errors.
  • Not all contexts have the same payload schema. Some contexts are missing language or file_type fields. Without #[serde(default)] on those fields, deserialization silently fails and you get zero entries with no error message (see the sketch after this list).
  • Chunk boundaries must use char indices, not byte indices. The Webex data contains multi-byte UTF-8 characters. Slicing at byte boundaries panics; slicing at char boundaries works.
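
The serde fix from the third bullet, as a sketch (the struct here is a hypothetical slice of the real payload schema):

    use serde::Deserialize;

    // `text` is real (every entry has it); #[serde(default)] lets contexts
    // that omit `language` or `file_type` deserialize to empty strings
    // instead of failing with zero entries.
    #[derive(Deserialize)]
    struct Payload {
        text: String,
        #[serde(default)]
        language: String,
        #[serde(default)]
        file_type: String,
    }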

What's Next

The tool works well enough for daily use. Possible additions: a merge subcommand to combine multiple contexts into one, incremental updates that only re-embed changed files, and maybe a watch mode that re-indexes on file changes. But for now, it does what I need — search my KBs from the terminal, share them with the team, and update them without launching Kiro.
