CLASS QueryEngine / src/QueryEngine.ts / 1,297 lines

The Query Engine

The heart of Claude Code. Owns the conversation lifecycle: system prompt construction, streaming API calls, tool execution loops, permission tracking, token accounting, and session persistence.

ASYNC_GENERATOR: submitMessage() yields SDKMessage

QueryEngine.ts (TypeScript): class QueryEngine
```typescript
export type QueryEngineConfig = {
  cwd: string
  tools: Tools
  commands: Command[]
  mcpClients: MCPServerConnection[]
  agents: AgentDefinition[]
  canUseTool: CanUseToolFn
  getAppState: () => AppState
  setAppState: (f: (prev: AppState) => AppState) => void
  initialMessages?: Message[]
  readFileCache: FileStateCache
  customSystemPrompt?: string
  appendSystemPrompt?: string
  userSpecifiedModel?: string
  thinkingConfig?: ThinkingConfig
  maxTurns?: number
  maxBudgetUsd?: number
  taskBudget?: { total: number }
  jsonSchema?: Record<string, unknown>
}
```
```typescript
export class QueryEngine {
  private config: QueryEngineConfig
  private mutableMessages: Message[]
  private abortController: AbortController
  private permissionDenials: SDKPermissionDenial[]
  private totalUsage: NonNullableUsage
  private discoveredSkillNames = new Set<string>()
  private loadedNestedMemoryPaths = new Set<string>()

  // One QueryEngine per conversation. submitMessage() starts a new turn.
  constructor(config: QueryEngineConfig) {
    this.config = config
    this.mutableMessages = config.initialMessages ?? []
    this.abortController = config.abortController ?? createAbortController()
    this.permissionDenials = []
    this.totalUsage = EMPTY_USAGE
  }

  // …

  async *submitMessage(
    prompt: string | ContentBlockParam[],
    options?: { uuid?: string; isMeta?: boolean },
  ): AsyncGenerator<SDKMessage, void, unknown> {
```

Anatomical_Markers

[ MK-01: TOOL_POOL ]

Config receives Tools, Command[], MCPServerConnection[], and AgentDefinition[]. These define the complete capability surface the LLM can invoke during the conversation.

[ MK-02: STATE_BRIDGE ]

getAppState/setAppState bridge the QueryEngine to the React UI. Every tool execution can read and mutate application state, which triggers re-renders in the Ink terminal.
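The bridge mirrors React's functional-setState pattern. A minimal sketch, assuming a hypothetical AppState shape (the real AppState is much larger):

```typescript
// Hypothetical minimal AppState; the real one holds the full UI state.
type AppState = { pendingToolCount: number }

let state: AppState = { pendingToolCount: 0 }
const getAppState = (): AppState => state
const setAppState = (f: (prev: AppState) => AppState): void => {
  // In Claude Code, this is where the Ink terminal re-render is triggered.
  state = f(state)
}

// A tool execution bumping a counter through the bridge:
setAppState(prev => ({ ...prev, pendingToolCount: prev.pendingToolCount + 1 }))
```

Passing an updater function rather than a new state object avoids lost updates when multiple tool executions mutate state concurrently.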

[ MK-03: BUDGET_CONTROL ]

maxTurns, maxBudgetUsd, and taskBudget provide hard limits on the conversation. Token usage is tracked in totalUsage and cost calculated via the cost-tracker.

[ MK-04: ASYNC_GENERATOR ]

submitMessage() is an async generator that yields SDKMessage events — text chunks, tool calls, permission requests, compact boundaries, status updates. Callers iterate with for await.
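How a caller consumes the generator, sketched with a stand-in implementation. The SDKMessage shape here is a simplified assumption, not the real union type:

```typescript
// Simplified stand-in for the real SDKMessage union.
type SDKMessage = { type: "status" | "text" | "tool_use"; content: string }

// Stand-in generator with the same consumption shape as QueryEngine.submitMessage().
async function* submitMessage(prompt: string): AsyncGenerator<SDKMessage, void, unknown> {
  yield { type: "status", content: "turn started" }
  yield { type: "text", content: `echo: ${prompt}` }
}

async function main(): Promise<string[]> {
  const seen: string[] = []
  for await (const msg of submitMessage("hi")) {
    seen.push(msg.type) // the REPL renders each event as it arrives
  }
  return seen
}
```

Because `for await` pulls events one at a time, the caller can render, cancel, or branch between chunks without any buffering layer.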

Private Fields: 8
Config Fields: 18

MESSAGE FLOW

User Input (text or ContentBlockParam[])
    │
    ▼
submitMessage() — AsyncGenerator<SDKMessage>
    │
    ├── 1. Build system prompt
    │   ├── fetchSystemPromptParts() → static + dynamic sections
    │   ├── Inject CLAUDE.md memory
    │   ├── Inject tool descriptions
    │   └── Inject environment context (OS, git, cwd)
    │
    ├── 2. Construct API call
    │   ├── Model selection (user override → config → default)
    │   ├── Thinking config (adaptive / enabled / disabled)
    │   ├── Message normalization (strip UI-only messages)
    │   └── Token budget check → trigger compact if needed
    │
    ├── 3. query() — Stream API call via Anthropic SDK
    │   ├── SSE streaming: text_delta, thinking_delta, tool_use
    │   ├── Yield SDKMessage events as they arrive
    │   └── Handle rate limits, retries, fallback models
    │
    ├── 4. Tool Loop (if tool_use blocks present)
    │   ├── For each tool_use block:
    │   │   ├── Permission check (canUseTool)
    │   │   ├── Execute tool (tool.execute(input, context))
    │   │   ├── Yield tool result as SDKMessage
    │   │   └── Append result to messages
    │   ├── Feed all results back → goto step 3
    │   └── Repeat until: no tool calls, maxTurns hit, or budget exceeded
    │
    ├── 5. Post-processing
    │   ├── Session persistence (recordTranscript)
    │   ├── Token usage accumulation
    │   ├── File history snapshot
    │   └── Memory extraction trigger
    │
    └── 6. Yield final status + usage summary
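The core of steps 3-4 is a loop: call the model, execute any requested tools, feed results back, repeat. A minimal sketch with assumed names (the real loop also streams, checks budgets, and yields an SDKMessage per event):

```typescript
// Assumed shapes for illustration only.
type ToolUse = { name: string; input: unknown }
type ApiResponse = { text: string; toolUses: ToolUse[] }

async function runTurn(
  callModel: (history: string[]) => Promise<ApiResponse>,
  executeTool: (t: ToolUse) => Promise<string>,
  history: string[],
  maxTurns: number,
): Promise<string[]> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await callModel(history)
    history.push(res.text)
    if (res.toolUses.length === 0) break // no tool calls: the turn is complete
    for (const t of res.toolUses) {
      history.push(await executeTool(t)) // append result, feed back (goto step 3)
    }
  }
  return history
}
```

The loop terminates on whichever comes first: a response with no tool calls, or the `maxTurns` cap.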

Streaming-First

Every response streams via async generator. Text appears token-by-token. The REPL renders each chunk immediately through Ink. No buffering.

Permission-Gated Tools

The canUseTool callback wraps every tool invocation. Denials are tracked and reported back to the SDK. The tool loop respects user decisions.
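A sketch of that gate, under the assumption that the permission callback returns an allow/deny decision (the exact CanUseToolFn signature may differ):

```typescript
// Assumed decision shape for illustration.
type PermissionResult = { behavior: "allow" | "deny"; message?: string }
type CanUseToolFn = (toolName: string, input: unknown) => Promise<PermissionResult>

const permissionDenials: { toolName: string; message: string }[] = []

async function gatedExecute(
  canUseTool: CanUseToolFn,
  toolName: string,
  input: unknown,
  execute: () => Promise<string>,
): Promise<string> {
  const decision = await canUseTool(toolName, input)
  if (decision.behavior === "deny") {
    // Track the denial so it can be reported back to the SDK.
    permissionDenials.push({ toolName, message: decision.message ?? "denied" })
    return `Tool use denied: ${toolName}` // fed back to the model as a tool result
  }
  return execute()
}
```

Note that a denial is not an exception: it becomes a tool result the model sees, so it can adjust its plan instead of crashing the turn.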

Conversation Persistence

Messages accumulate in mutableMessages across turns. Sessions are persisted to disk for --resume. Compaction prevents token overflow.


CONTEXT COMPACTION ENGINE

src/services/compact/ — 145KB across 11 files

When conversations grow long, token counts approach the context window limit. The compaction engine is a three-tier system that progressively summarizes history to keep the conversation alive without losing critical context.

TIER 1

Session Memory Compact

Lightweight. If a session memory file exists and is recent, creates a compact boundary using the memory as summary. No LLM call needed.

sessionMemoryCompact.ts (21KB)

TIER 2

Full Compaction

Forks a subagent that calls the LLM to generate a conversation summary. Old messages replaced with the summary as a new conversation start.

compact.ts (60KB) + prompt.ts (16KB)

TIER 3

Micro-Compact

Targeted inline trimming of large tool results — file reads, grep output, shell output — replaced with [Old tool result content cleared].

microCompact.ts (19KB)
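Micro-compact in miniature. The placeholder string is quoted from the description above; the token-estimation scheme (about 4 characters per token) and the per-result threshold are assumptions:

```typescript
const PLACEHOLDER = "[Old tool result content cleared]"

// Replace oversized tool-result text inline, leaving small results untouched.
function microCompact(results: string[], maxTokensPerResult = 2_000): string[] {
  return results.map(text =>
    Math.ceil(text.length / 4) > maxTokensPerResult ? PLACEHOLDER : text,
  )
}
```

Because only tool results are trimmed, the user's messages and the model's own reasoning survive intact.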
COMPACTION_TRIGGER_FLOW (autoCompact.ts)

autoCompact monitors the token count after each API response:

Token Usage Check:
  │
  ├── Below threshold ────────────────────── Continue normally
  │   (contextWindow - 20K output - 13K buffer)
  │
  ├── Above WARNING threshold (20K buffer) ── Show yellow indicator
  │
  ├── Above ERROR threshold ──────────────── Show red indicator
  │
  ├── Above AUTO_COMPACT threshold ────────── Trigger compaction:
  │   │
  │   ├── 1. Try Session Memory Compact
  │   │       └── Success? → Done (no API call!)
  │   │
  │   ├── 2. Try Full Compaction
  │   │       ├── Fork subagent with summary prompt
  │   │       ├── Generate conversation summary
  │   │       ├── Replace old messages with boundary
  │   │       ├── Re-attach: open files, plans, MCP state, skills
  │   │       └── Post-compact cleanup + hooks
  │   │
  │   └── 3. Micro-Compact fallback
  │           └── Trim large tool results inline
  │
  └── Above BLOCKING limit ───────────────── Force /compact before next query

Circuit Breaker: After 3 consecutive failures, stop retrying
(Saved ~250K wasted API calls/day globally — BQ 2026-03-10)
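The circuit breaker, sketched: a failure counter that opens after three consecutive failures and resets on any success. Function names here are assumptions:

```typescript
const MAX_CONSECUTIVE_FAILURES = 3
let consecutiveFailures = 0

// Gate: skip compaction attempts once the breaker is open.
function shouldAttemptCompaction(): boolean {
  return consecutiveFailures < MAX_CONSECUTIVE_FAILURES
}

// Record outcome: any success closes the breaker again.
function recordCompactionResult(ok: boolean): void {
  consecutiveFailures = ok ? 0 : consecutiveFailures + 1
}
```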
      

Key Constants

AUTOCOMPACT_BUFFER_TOKENS 13,000
WARNING_THRESHOLD_BUFFER 20,000
MAX_OUTPUT_TOKENS_SUMMARY 20,000
MAX_CONSECUTIVE_FAILURES 3
POST_COMPACT_MAX_FILES 5
POST_COMPACT_TOKEN_BUDGET 50,000
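How the constants combine into thresholds, following the formula in the diagram above (contextWindow minus 20K reserved output minus the 13K buffer, with a warning band 20K below that); the exact composition is an assumption:

```typescript
const AUTOCOMPACT_BUFFER_TOKENS = 13_000
const WARNING_THRESHOLD_BUFFER = 20_000
const MAX_OUTPUT_TOKENS = 20_000 // reserved for the model's response

function thresholds(contextWindow: number): { autoCompact: number; warning: number } {
  // Usable input budget before auto-compact must fire.
  const autoCompact = contextWindow - MAX_OUTPUT_TOKENS - AUTOCOMPACT_BUFFER_TOKENS
  // Yellow indicator appears one warning-buffer earlier.
  return { autoCompact, warning: autoCompact - WARNING_THRESHOLD_BUFFER }
}
```

For a 200K context window this puts auto-compact at 167K tokens and the warning at 147K.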

Configuration

DISABLE ALL
DISABLE_COMPACT=1
DISABLE AUTO ONLY
DISABLE_AUTO_COMPACT=1

Keeps manual /compact working

OVERRIDE THRESHOLD
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80

Compact at 80% of context window

CUSTOM WINDOW
CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000

Cap context window to 100K tokens
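How those variables might be applied, sketched; the parsing and precedence logic is an assumption, only the variable names come from the table above:

```typescript
function effectiveWindow(
  env: Record<string, string | undefined>,
  modelWindow: number,
): { compactionDisabled: boolean; autoDisabled: boolean; window: number; compactAt?: number } {
  // CLAUDE_CODE_AUTO_COMPACT_WINDOW caps the usable context window.
  const custom = Number(env["CLAUDE_CODE_AUTO_COMPACT_WINDOW"])
  const window = Number.isFinite(custom) && custom > 0 ? custom : modelWindow

  // CLAUDE_AUTOCOMPACT_PCT_OVERRIDE compacts at a percentage of that window.
  const pct = Number(env["CLAUDE_AUTOCOMPACT_PCT_OVERRIDE"])
  const compactAt = Number.isFinite(pct) && pct > 0 ? (window * pct) / 100 : undefined

  return {
    compactionDisabled: env["DISABLE_COMPACT"] === "1",
    autoDisabled: env["DISABLE_AUTO_COMPACT"] === "1", // manual /compact still works
    window,
    compactAt,
  }
}
```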

Post-Compact Restoration

After compaction replaces old messages with a summary boundary, critical state is re-attached:

Open files (max 5, 5K tokens each)
Active plan (re-injected)
MCP state (instructions delta)
Invoked skills (max 25K tokens)
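The open-files caps above can be sketched as a selection pass; the selection order and token estimation are assumptions:

```typescript
const POST_COMPACT_MAX_FILES = 5
const MAX_TOKENS_PER_FILE = 5_000

// Keep at most 5 open files, truncating each to its per-file token cap.
function selectFilesForRestore(
  files: { path: string; tokens: number }[],
): { path: string; tokens: number }[] {
  return files.slice(0, POST_COMPACT_MAX_FILES).map(f => ({
    path: f.path,
    tokens: Math.min(f.tokens, MAX_TOKENS_PER_FILE),
  }))
}
```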