The Query Engine
The heart of Claude Code. Owns the conversation lifecycle: system prompt construction, streaming API calls, tool execution loops, permission tracking, token accounting, and session persistence.
Anatomical Markers
Config receives Tools, Command[], MCPServerConnection[], and AgentDefinition[]. These define the complete capability surface the LLM can invoke during the conversation.
getAppState/setAppState bridge the QueryEngine to the React UI. Every tool execution can read and mutate application state, which triggers re-renders in the Ink terminal.
maxTurns, maxBudgetUsd, and taskBudget provide hard limits on the conversation. Token usage is tracked in totalUsage, and cost is calculated via the cost-tracker.
submitMessage() is an async generator that yields SDKMessage events — text chunks, tool calls, permission requests, compact boundaries, status updates. Callers iterate with for await.
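Consumption looks roughly like the sketch below. The SDKMessage variants and the stub generator body are illustrative assumptions, not the real SDK types; only the for await iteration pattern is taken from the text.

```typescript
// Sketch of consuming submitMessage(); the SDKMessage variants and the
// stub generator body are illustrative assumptions, not the real types.
type SDKMessage =
  | { type: "text_chunk"; text: string }
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "status"; usage: { inputTokens: number; outputTokens: number } };

async function* submitMessage(input: string): AsyncGenerator<SDKMessage> {
  // Stand-in for the real engine: one text chunk, then a status event.
  yield { type: "text_chunk", text: `echo: ${input}` };
  yield { type: "status", usage: { inputTokens: 3, outputTokens: 4 } };
}

// Callers iterate with for await and branch on msg.type to render
// each event (text, tool call, status) as it arrives.
async function run(input: string): Promise<string[]> {
  const seen: string[] = [];
  for await (const msg of submitMessage(input)) {
    seen.push(msg.type);
  }
  return seen;
}
```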
MESSAGE FLOW
User Input (text or ContentBlockParam[])
│
▼
submitMessage() — AsyncGenerator<SDKMessage>
│
├── 1. Build system prompt
│ ├── fetchSystemPromptParts() → static + dynamic sections
│ ├── Inject CLAUDE.md memory
│ ├── Inject tool descriptions
│ └── Inject environment context (OS, git, cwd)
│
├── 2. Construct API call
│ ├── Model selection (user override → config → default)
│ ├── Thinking config (adaptive / enabled / disabled)
│ ├── Message normalization (strip UI-only messages)
│ └── Token budget check → trigger compact if needed
│
├── 3. query() — Stream API call via Anthropic SDK
│ ├── SSE streaming: text_delta, thinking_delta, tool_use
│ ├── Yield SDKMessage events as they arrive
│ └── Handle rate limits, retries, fallback models
│
├── 4. Tool Loop (if tool_use blocks present)
│ ├── For each tool_use block:
│ │ ├── Permission check (canUseTool)
│ │ ├── Execute tool (tool.execute(input, context))
│ │ ├── Yield tool result as SDKMessage
│ │ └── Append result to messages
│ ├── Feed all results back → goto step 3
│ └── Repeat until: no tool calls, maxTurns hit, or budget exceeded
│
├── 5. Post-processing
│ ├── Session persistence (recordTranscript)
│ ├── Token usage accumulation
│ ├── File history snapshot
│ └── Memory extraction trigger
│
└── 6. Yield final status + usage summary
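The tool loop in step 4 reduces to plain control flow: call the model, execute any requested tools, feed the results back, and repeat until a turn produces no tool calls or a limit is hit. The types and helper signatures below are assumptions for illustration; the real engine streams and yields events rather than returning a message array.

```typescript
// Minimal sketch of the tool loop (steps 3-4). Types are illustrative.
interface ToolUse { name: string; input: unknown }
interface ModelTurn { text: string; toolUses: ToolUse[] }

type CallModel = (messages: string[]) => ModelTurn;
type RunTool = (t: ToolUse) => string;

function toolLoop(
  callModel: CallModel,
  runTool: RunTool,
  maxTurns: number,
): string[] {
  const messages: string[] = ["user input"];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(messages);       // step 3: API call
    messages.push(reply.text);
    if (reply.toolUses.length === 0) break;  // no tool calls -> done
    for (const t of reply.toolUses) {
      messages.push(runTool(t));             // execute + append result
    }
    // loop: all results are fed back into the next model call
  }
  return messages;
}
```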
Streaming-First
Every response streams via async generator. Text appears token-by-token. The REPL renders each chunk immediately through Ink. No buffering.
Permission-Gated Tools
The canUseTool callback wraps every tool invocation. Denials are tracked and reported back to the SDK. The tool loop respects user decisions.
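A minimal sketch of that gating, assuming illustrative signatures for canUseTool, the executor, and the denial log (the real callback and its reporting path live in the SDK):

```typescript
// Hypothetical shapes: a decision object and a gate callback.
type Decision = { allowed: boolean; reason?: string };
type CanUseTool = (toolName: string, input: unknown) => Decision;

function executeGated(
  toolName: string,
  input: unknown,
  canUseTool: CanUseTool,
  execute: (input: unknown) => string,
  deniedLog: string[],
): string {
  const decision = canUseTool(toolName, input);
  if (!decision.allowed) {
    deniedLog.push(toolName); // track the denial so it can be reported back
    return `Permission denied: ${decision.reason ?? "user declined"}`;
  }
  return execute(input);
}
```

Returning the denial as a tool result (rather than throwing) lets the loop continue while still surfacing the user's decision to the model.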
Conversation Persistence
Messages accumulate in mutableMessages across turns. Sessions are persisted to disk for --resume. Compaction prevents token overflow.
CONTEXT COMPACTION ENGINE
src/services/compact/ — 145KB across 11 files
When conversations grow long, token counts approach the context window limit. The compaction engine is a three-tier system that progressively summarizes history to keep the conversation alive without losing critical context.
Session Memory Compact
Lightweight. If a session memory file exists and is recent, it creates a compact boundary using the memory as the summary. No LLM call needed.
Full Compaction
Forks a subagent that calls the LLM to generate a conversation summary. Old messages are replaced with the summary, which becomes the new conversation start.
Micro-Compact
Targeted inline trimming of large tool results — file reads, grep output, shell output — which are replaced with [Old tool result content cleared].
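The tier ordering above amounts to a fallback chain: try the cheapest strategy first and escalate only on failure. A sketch, with assumed function shapes; the tier names come from the text:

```typescript
// Fallback chain over the three compaction tiers (shapes are illustrative).
type CompactResult = { ok: boolean; tier?: string };

function compact(
  trySessionMemory: () => boolean, // tier 1: reuse memory file, no LLM call
  tryFull: () => boolean,          // tier 2: subagent-generated summary
  tryMicro: () => boolean,         // tier 3: trim large tool results inline
): CompactResult {
  if (trySessionMemory()) return { ok: true, tier: "session-memory" };
  if (tryFull()) return { ok: true, tier: "full" };
  if (tryMicro()) return { ok: true, tier: "micro" };
  return { ok: false }; // all tiers failed; caller must force /compact
}
```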
autoCompact monitors token count after each API response.

Token Usage Check:
│
├── Below threshold ────────────────────── Continue normally
│     (contextWindow - 20K output - 13K buffer)
│
├── Above WARNING threshold (20K buffer) ── Show yellow indicator
│
├── Above ERROR threshold ──────────────── Show red indicator
│
├── Above AUTO_COMPACT threshold ────────── Trigger compaction:
│     │
│     ├── 1. Try Session Memory Compact
│     │     └── Success? → Done (no API call!)
│     │
│     ├── 2. Try Full Compaction
│     │     ├── Fork subagent with summary prompt
│     │     ├── Generate conversation summary
│     │     ├── Replace old messages with boundary
│     │     ├── Re-attach: open files, plans, MCP state, skills
│     │     └── Post-compact cleanup + hooks
│     │
│     └── 3. Micro-Compact fallback
│           └── Trim large tool results inline
│
└── Above BLOCKING limit ───────────────── Force /compact before next query

Circuit Breaker: After 3 consecutive failures, stop retrying (saved ~250K wasted API calls/day globally — BQ 2026-03-10)
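The threshold ladder can be sketched as a classifier. The 20K reserved-output and 13K buffer constants appear in the text; the exact warning/error cutoffs and the function itself are a reader's assumptions for illustration:

```typescript
// Assumed constants taken from the compaction diagram.
const RESERVED_OUTPUT = 20_000; // tokens held back for the model's reply
const BUFFER = 13_000;          // safety margin below the window

type Level = "ok" | "warning" | "error" | "auto-compact" | "blocking";

// Classify current usage against the ladder. The warning and error
// bands below auto-compact are illustrative orderings, not exact values.
function usageLevel(tokens: number, contextWindow: number): Level {
  const autoCompactAt = contextWindow - RESERVED_OUTPUT - BUFFER;
  if (tokens >= contextWindow) return "blocking";
  if (tokens >= autoCompactAt) return "auto-compact";
  if (tokens >= autoCompactAt - BUFFER) return "error";
  if (tokens >= autoCompactAt - RESERVED_OUTPUT) return "warning";
  return "ok";
}
```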
Key Constants
Configuration
DISABLE_COMPACT=1
DISABLE_AUTO_COMPACT=1
    Keeps manual /compact working
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80
    Compact at 80% of context window
CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000
    Cap context window to 100K tokens
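For example, the overrides can be combined per invocation, assuming the standard claude CLI entry point:

```shell
# Cap the effective context window at 100K tokens and trigger
# auto-compact at 80% of it, for this invocation only.
CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000 \
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80 \
claude
```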
Post-Compact Restoration
After compaction replaces old messages with a summary boundary, critical state is re-attached:
- Open files (max 5, 5K tok each)
- Plans (re-injected)
- MCP state (instructions delta)
- Skills (max 25K tok)