
Reverse-Engineering Amazon Kiro CLI: What 31MB of Strings Reveals

I ran `strings` on the Amazon Kiro CLI binary and spent an evening picking through the output. Here's what I found — from embedded prompt fragments and local ML inference to the archaeology of Amazon's AI branding pivots.

I recently got my hands on the Amazon Kiro CLI binary — the agentic AI coding assistant that Amazon has been quietly building. Rather than just using it, I did what any self-respecting engineer would do: I ran `strings` on it.

The result was a 31MB text file. Here's what I found.
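
For the uninitiated: strings(1) does nothing magical; it scans a file for runs of printable bytes, four or longer by default. A minimal Rust equivalent (my toy, not anything from Kiro) fits on a page:

```rust
use std::{env, fs};

/// Minimal strings(1): print every run of 4+ printable ASCII bytes.
fn main() -> std::io::Result<()> {
    let path = env::args().nth(1).expect("usage: strings <binary>");
    let bytes = fs::read(path)?;
    let mut run: Vec<u8> = Vec::new();
    // The trailing zero byte flushes the final run.
    for &b in bytes.iter().chain(std::iter::once(&0u8)) {
        if b.is_ascii_graphic() || b == b' ' {
            run.push(b);
        } else {
            if run.len() >= 4 {
                println!("{}", String::from_utf8_lossy(&run));
            }
            run.clear();
        }
    }
    Ok(())
}
```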

What Is Kiro?

If you haven't been tracking Amazon's AI branding journey, here's the lineage:

  • Amazon CodeWhisperer (2022–2024) — AI code completion, their Copilot competitor
  • Amazon Q Developer (April 2024) — CodeWhisperer folded into the broader Amazon Q AI assistant
  • Amazon Kiro (2025) — The current incarnation: a full agentic CLI/IDE with subagents, MCP support, todo lists, context compaction, and local inference

The binary I examined is an x86-64 Linux ELF, written in Rust, built via Cargo on a GitHub Actions CI runner. The build path baked into the binary confirms it: /home/runner/work/kiro-cli/kiro-cli/.

Despite the rebrand, the old names live on in the crate names: amzn-codewhisperer-client, amzn-codewhisperer-streaming-client, semantic-search-client. Legacy is forever.

The Tech Stack

The strings dump is a treasure map of dependencies. Here's the full picture:

Async Runtime & Networking
  • Tokio 1.49.0 (async runtime)
  • hyper 0.14 + 1.8, reqwest 0.12, h2 (HTTP stack)
  • rustls 0.21 + 0.23, ring 0.17, aws-lc-rs (TLS)

AWS Integration
  • aws-config, aws-sdk-sts, aws-sdk-ssooidc, aws-sigv4, aws-runtime — the full AWS auth stack

AI/ML (Local Inference!)
  • candle-core 0.9.2 — a pure-Rust transformer inference engine
  • tokenizers 0.21.4 — HuggingFace tokenizer library

Code Intelligence
  • Tree-sitter grammars for JavaScript, TypeScript, Python, PHP, Rust, Go, Java, and more
  • grep-regex, grep-searcher, aho-corasick, regex-automata — ripgrep-style fast search

TUI
  • ratatui 0.29, crossterm, skim (fuzzy finder), rustyline

Storage
  • SQLite (embedded)

Compression (notably aggressive)
  • zstd, zlib-rs, flate2, bzip2, lzma, zopfli

Protocols
  • MCP (Model Context Protocol) via rmcp 0.15.0
  • SACP 10.1.0 (likely an internal Amazon protocol)
  • OAuth2 for auth flows

Image Processing (unexpected!)
  • png, gif, exr, rav1e, zune-jpeg — suggesting screenshot/image analysis capabilities

Crypto
  • OpenSSL, AES, RSA, ECDSA — plus post-quantum algorithms: MLKEM, MLDSA

Internal Crates
  • chat-cli, chat-cli-v2, agent, code-agent-sdk
  • use_subagent.rs, delegate.rs — multi-agent orchestration

The Juicy Part: Embedded Prompts

No full system prompts are baked into the binary — those are fetched from the server at runtime. But there are substantial prompt-adjacent strings embedded: tool descriptions, system notes, and behavioural instructions that reveal how the agent works internally.

Context Compaction

When conversations get too long, Kiro injects a message instructing the agent to treat a compressed summary as authoritative context:

"[Summary of earlier conversation]... treat this as authoritative context and explicitly acknowledge relevant parts in responses."

This is the same pattern most agentic systems use, but it's interesting to see the exact framing.
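
Mechanically, compaction is just: summarize the old turns, splice the summary in as a single message, drop the rest. A sketch of that shape, with the quoted framing above baked in (the types and the summarizer are my stand-ins, not Kiro's code):

```rust
/// Hypothetical message type; Kiro's real structures aren't visible in a strings dump.
struct Message {
    role: &'static str,
    content: String,
}

/// Replace all but the most recent `keep` turns with one authoritative summary.
fn compact(history: &mut Vec<Message>, keep: usize, summarize: impl Fn(&[Message]) -> String) {
    if history.len() <= keep {
        return;
    }
    let split = history.len() - keep;
    let summary = summarize(&history[..split]);
    let injected = Message {
        role: "user",
        content: format!(
            "[Summary of earlier conversation] {summary} Treat this as \
             authoritative context and explicitly acknowledge relevant \
             parts in responses."
        ),
    };
    // Splice the single summary message in place of everything before `split`.
    history.splice(..split, std::iter::once(injected));
}
```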

Context Window Checkpoints

When hitting context limits, Kiro creates a structured handoff document with the system note:

[SYSTEM NOTE: This is an automated checkpoint creation request, not from the user]

The checkpoint follows a rigid structure:

  • ## OBJECTIVE
  • ## USER GUIDANCE
  • ## COMPLETED
  • ## TECHNICAL CONTEXT
  • ## TOOLS EXECUTED
  • ## NEXT STEPS
  • ## TODO LIST

This is essentially a save-game file for the AI conversation. When context is exhausted, Kiro can pick up from the checkpoint without the user having to re-explain anything.
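
Rendering one of these is mechanical. A sketch; the system note and headings are quoted from the binary, the function itself is invented:

```rust
/// Assemble a checkpoint document from (heading, body) sections.
fn render_checkpoint(sections: &[(&str, &str)]) -> String {
    let mut doc = String::from(
        "[SYSTEM NOTE: This is an automated checkpoint creation request, not from the user]\n\n",
    );
    for (heading, body) in sections {
        doc.push_str(&format!("## {heading}\n{body}\n\n"));
    }
    doc
}

// Usage: render_checkpoint(&[("OBJECTIVE", "..."), ("USER GUIDANCE", "..."), ("NEXT STEPS", "...")]);
```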

Nudge Messages

The binary contains several corrective system messages that get injected when things go wrong:

  • "SYSTEM NOTE: the actual tool use arguments were too complicated to be generated" — the model tried to generate tool args that were too complex
  • "The generated tool was too large, try again but this time split up the work between multiple tool uses" — nudging toward smaller atomic operations
  • "You took too long to respond - try to split up the work into smaller steps." — a timeout nudge

There's also a "Failed to send failsafe prompt:" error string, confirming a failsafe prompting mechanism — likely a last-resort injection when the agent goes off the rails.
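
Reproducing the pattern takes a dozen lines: detect the failure mode, inject the canned correction as the next message. The enum and wiring below are mine; the three strings are verbatim from the dump:

```rust
/// Failure modes observed in the embedded nudge strings.
enum AgentFailure {
    ArgsTooComplicated,
    ToolUseTooLarge,
    ResponseTimeout,
}

/// Map each failure to the corrective message found in the binary.
fn nudge(failure: AgentFailure) -> &'static str {
    match failure {
        AgentFailure::ArgsTooComplicated => {
            "SYSTEM NOTE: the actual tool use arguments were too complicated to be generated"
        }
        AgentFailure::ToolUseTooLarge => {
            "The generated tool was too large, try again but this time split up \
             the work between multiple tool uses"
        }
        AgentFailure::ResponseTimeout => {
            "You took too long to respond - try to split up the work into smaller steps."
        }
    }
}
```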

Embedded Tool Definitions

Tool schemas are partially baked into the binary as JSON:

  • glob — file listing with .gitignore support and exclusion patterns
  • todo_list — with complete, add, remove operations. Notably, the schema requires a list_id parameter that must always be provided after every todo_list call
  • web_search — exactly what it sounds like
  • thinking — an internal reasoning tool described as "improving quality of complex tasks by breaking them into atomic actions"

The thinking tool is particularly interesting — it's a structured scratchpad that forces the model to decompose problems before acting.
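
For flavour, here's what the thinking tool's embedded definition plausibly looks like, reconstructed with serde_json. The name and description are quoted from the strings dump; the schema layout is the conventional tool-definition shape, not necessarily Kiro's exact field names:

```rust
use serde_json::{json, Value};

/// Reconstruction of the embedded thinking tool definition.
fn thinking_tool() -> Value {
    json!({
        "name": "thinking",
        "description": "improving quality of complex tasks by breaking them into atomic actions",
        "input_schema": {
            "type": "object",
            "properties": {
                // The parameter name is a guess; the binary only surfaces the description.
                "thought": { "type": "string" }
            },
            "required": ["thought"]
        }
    })
}
```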

Documentation Consolidation

One embedded instruction string at line 18010 reads:

"If consolidate is true, you MUST create consolidated documentation files for each target in consolidate_targets. For each consolidate_target file that already exists, You MUST merge..."

This suggests Kiro has a documentation generation/consolidation feature that enforces strict merge semantics — it won't just overwrite existing docs, it merges them.
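
The contract boils down to read-modify-write instead of clobber. A toy sketch (the append-under-a-marker strategy is my invention; the strings only tell us that a merge MUST happen):

```rust
use std::{fs, path::Path};

/// Merge new documentation into an existing target instead of overwriting it.
fn consolidate(target: &Path, new_docs: &str) -> std::io::Result<()> {
    let merged = match fs::read_to_string(target) {
        // Target exists: keep what's there and fold the new content in.
        Ok(existing) => format!("{existing}\n\n<!-- consolidated -->\n{new_docs}"),
        // No existing file: a plain write is fine.
        Err(_) => new_docs.to_string(),
    };
    fs::write(target, merged)
}
```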

Local ML: MiniLM for Semantic Search

This was one of the most interesting findings. Kiro embeds candle-core (a pure-Rust ML inference engine) and HuggingFace tokenizers, and references both:

  • sentence-transformers/all-MiniLM-L6-v2 (22M parameters, 384-dimensional embeddings)
  • sentence-transformers/all-MiniLM-L12-v2 (33M parameters, 384-dimensional embeddings)

But the model weights are NOT in the binary. Instead, the download flow works like this:

  1. First run → downloads MiniLM safetensors + tokenizer from https://desktop-release.q.us-east-1.amazonaws.com/models as a zip
  2. Extracts locally and validates against embedded SHA hashes (model.safetensors: 53aa51172d142c89d9012cce15ae4d6cc0ca6895)
  3. Subsequent runs → checks if model files exist and are valid, skips download
  4. At runtime → loads model via candle-core for local semantic search

Relevant strings confirm the flow:

  • "Downloading hosted model:"
  • "Model already exists and is valid"
  • "Model files invalid or missing, will re-download"
  • "Successfully downloaded and extracted model:"
  • "Loading model" (with a progress bar template)
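
That 40-hex-character digest reads like SHA-1. Here's a sketch of the validate-or-download logic those strings imply; the URL and digest are verbatim from the binary, while the crate choices (sha1, hex, reqwest) and the structure are mine:

```rust
use sha1::{Digest, Sha1};
use std::{fs, path::Path};

const MODEL_URL: &str = "https://desktop-release.q.us-east-1.amazonaws.com/models";
const EXPECTED_SHA1: &str = "53aa51172d142c89d9012cce15ae4d6cc0ca6895";

/// Check the on-disk model against the digest embedded in the binary.
fn model_is_valid(path: &Path) -> bool {
    fs::read(path)
        .map(|bytes| hex::encode(Sha1::digest(&bytes)) == EXPECTED_SHA1)
        .unwrap_or(false)
}

fn ensure_model(path: &Path) -> Result<(), Box<dyn std::error::Error>> {
    if model_is_valid(path) {
        println!("Model already exists and is valid");
        return Ok(());
    }
    println!("Model files invalid or missing, will re-download");
    // The real flow fetches a zip and extracts safetensors + tokenizer;
    // the unzip step is omitted here for brevity.
    let bytes = reqwest::blocking::get(MODEL_URL)?.bytes()?;
    fs::write(path, &bytes)?;
    println!("Successfully downloaded and extracted model: {}", path.display());
    Ok(())
}
```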

This powers the knowledge tool — local codebase indexing using MiniLM embeddings stored in an HNSW index, with BM25 as a fallback for keyword search. All running locally, no cloud round-trip needed for code search.
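
Strip away candle and HNSW and the heart of the retrieval step is cosine similarity over 384-dimensional vectors. A brute-force sketch, standing in for the real index:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Rank indexed chunks against a MiniLM query embedding, return top-k indices.
/// HNSW makes this sublinear; brute force is the honest simplification here.
fn top_k(query: &[f32], index: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.into_iter().take(k).map(|(i, _)| i).collect()
}
```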

Multi-Agent Architecture

The presence of use_subagent.rs and delegate.rs confirms Kiro uses multi-agent orchestration. The main agent can spawn sub-agents for specific tasks, gated by a permission system:

  • "Permission request from server:" — approval flow for sensitive operations
  • "Failed to send approval result:" — error handling for the permission system

Combined with the MCP support (via rmcp 0.15.0), Kiro can integrate with external tool servers — the same protocol that Claude, Cursor, and other AI tools are converging on.
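
A plausible shape for that permission gate, entirely reconstructed (only the quoted log string is real):

```rust
enum Approval {
    Granted,
    Denied,
}

/// Gate a subagent delegation behind explicit user approval.
fn delegate_with_approval(
    task: &str,
    ask_user: impl Fn(&str) -> Approval,
    run_subagent: impl Fn(&str) -> Result<String, String>,
) -> Result<String, String> {
    // Sensitive operations surface as "Permission request from server: ..."
    println!("Permission request from server: {task}");
    match ask_user(task) {
        Approval::Granted => run_subagent(task),
        Approval::Denied => Err(format!("permission denied for: {task}")),
    }
}
```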

What This Tells Us

A few takeaways from the spelunking:

  1. Local-first intelligence — Running MiniLM locally for code search is smart. It means semantic search works offline and doesn't leak your codebase to the cloud for basic navigation.

  2. The context problem is universal — Even Amazon's team is wrestling with context window limits. Their checkpoint/compaction system is sophisticated but the problem is fundamentally the same one every agentic system faces.

  3. Rust for AI tooling is maturing — candle-core for inference, tokenizers for preprocessing, HNSW for vector search — the entire ML pipeline runs in Rust. No Python subprocess needed.

  4. The nudge pattern — Injecting corrective system messages when the agent misbehaves ("too complicated", "too large", "too slow") is a practical approach to guardrailing that's worth stealing.

  5. Legacy never dies — Three brand names deep and the CodeWhisperer crate names are still there. Some things are universal across all software organizations.


All analysis was performed on a strings dump of the binary. No reverse engineering of machine code was performed — just reading the tea leaves that strings(1) surfaced from the compiled artifact.
