The Query Engine
The heart of Claude Code. Owns the conversation lifecycle: system prompt construction, streaming API calls, tool execution loops, permission tracking, token accounting, and session persistence.
Anatomical Markers
Config receives Tools, Command[], MCPServerConnection[], and AgentDefinition[]. These define the complete capability surface the LLM can invoke during the conversation.
getAppState/setAppState bridge the QueryEngine to the React UI. Every tool execution can read and mutate application state, which triggers re-renders in the Ink terminal.
maxTurns, maxBudgetUsd, and taskBudget provide hard limits on the conversation. Token usage is tracked in totalUsage, and cost is calculated via the cost-tracker.
submitMessage() is an async generator that yields SDKMessage events — text chunks, tool calls, permission requests, compact boundaries, status updates. Callers iterate with for await.
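Consumption looks roughly like the sketch below. The SDKMessage variants and the stub generator body are illustrative assumptions, not the real SDK types; only the for await iteration pattern is taken from the text.

```typescript
// Sketch of consuming submitMessage(); the SDKMessage variants and the
// stub generator body are illustrative assumptions, not the real types.
type SDKMessage =
  | { type: "text_chunk"; text: string }
  | { type: "tool_call"; name: string; input: unknown }
  | { type: "status"; usage: { inputTokens: number; outputTokens: number } };

async function* submitMessage(input: string): AsyncGenerator<SDKMessage> {
  // Stand-in for the real engine: one text chunk, then a status event.
  yield { type: "text_chunk", text: `echo: ${input}` };
  yield { type: "status", usage: { inputTokens: 3, outputTokens: 4 } };
}

// Callers iterate with for await and branch on msg.type to render
// each event (text, tool call, status) as it arrives.
async function run(input: string): Promise<string[]> {
  const seen: string[] = [];
  for await (const msg of submitMessage(input)) {
    seen.push(msg.type);
  }
  return seen;
}
```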
MESSAGE FLOW
User Input (text or ContentBlockParam[])
│
▼
submitMessage() — AsyncGenerator<SDKMessage>
│
├── 1. Build system prompt
│ ├── fetchSystemPromptParts() → static + dynamic sections
│ ├── Inject CLAUDE.md memory
│ ├── Inject tool descriptions
│ └── Inject environment context (OS, git, cwd)
│
├── 2. Construct API call
│ ├── Model selection (user override → config → default)
│ ├── Thinking config (adaptive / enabled / disabled)
│ ├── Message normalization (strip UI-only messages)
│ └── Token budget check → trigger compact if needed
│
├── 3. query() — Stream API call via Anthropic SDK
│ ├── SSE streaming: text_delta, thinking_delta, tool_use
│ ├── Yield SDKMessage events as they arrive
│ └── Handle rate limits, retries, fallback models
│
├── 4. Tool Loop (if tool_use blocks present)
│ ├── For each tool_use block:
│ │ ├── Permission check (canUseTool)
│ │ ├── Execute tool (tool.execute(input, context))
│ │ ├── Yield tool result as SDKMessage
│ │ └── Append result to messages
│ ├── Feed all results back → goto step 3
│ └── Repeat until: no tool calls, maxTurns hit, or budget exceeded
│
├── 5. Post-processing
│ ├── Session persistence (recordTranscript)
│ ├── Token usage accumulation
│ ├── File history snapshot
│ └── Memory extraction trigger
│
└── 6. Yield final status + usage summary
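The tool loop in step 4 reduces to plain control flow: call the model, execute any requested tools, feed the results back, and repeat until a turn produces no tool calls or a limit is hit. The types and helper signatures below are assumptions for illustration; the real engine streams and yields events rather than returning a message array.

```typescript
// Minimal sketch of the tool loop (steps 3-4). Types are illustrative.
interface ToolUse { name: string; input: unknown }
interface ModelTurn { text: string; toolUses: ToolUse[] }

type CallModel = (messages: string[]) => ModelTurn;
type RunTool = (t: ToolUse) => string;

function toolLoop(
  callModel: CallModel,
  runTool: RunTool,
  maxTurns: number,
): string[] {
  const messages: string[] = ["user input"];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(messages);       // step 3: API call
    messages.push(reply.text);
    if (reply.toolUses.length === 0) break;  // no tool calls -> done
    for (const t of reply.toolUses) {
      messages.push(runTool(t));             // execute + append result
    }
    // loop: all results are fed back into the next model call
  }
  return messages;
}
```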
Streaming-First
Every response streams via async generator. Text appears token-by-token. The REPL renders each chunk immediately through Ink. No buffering.
Permission-Gated Tools
The canUseTool callback wraps every tool invocation. Denials are tracked and reported back to the SDK. The tool loop respects user decisions.
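A minimal sketch of that gating, assuming illustrative signatures for canUseTool, the executor, and the denial log (the real callback and its reporting path live in the SDK):

```typescript
// Hypothetical shapes: a decision object and a gate callback.
type Decision = { allowed: boolean; reason?: string };
type CanUseTool = (toolName: string, input: unknown) => Decision;

function executeGated(
  toolName: string,
  input: unknown,
  canUseTool: CanUseTool,
  execute: (input: unknown) => string,
  deniedLog: string[],
): string {
  const decision = canUseTool(toolName, input);
  if (!decision.allowed) {
    deniedLog.push(toolName); // track the denial so it can be reported back
    return `Permission denied: ${decision.reason ?? "user declined"}`;
  }
  return execute(input);
}
```

Returning the denial as a tool result (rather than throwing) lets the loop continue while still surfacing the user's decision to the model.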
Conversation Persistence
Messages accumulate in mutableMessages across turns. Sessions are persisted to disk for --resume. Compaction prevents token overflow.
CONTEXT COMPACTION ENGINE
src/services/compact/ — 145KB across 11 files
When conversations grow long, token counts approach the context window limit. The compaction engine is a three-tier system that progressively summarizes history to keep the conversation alive without losing critical context.
Session Memory Compact
Lightweight. If a session memory file exists and is recent, it creates a compact boundary using the memory as the summary. No LLM call needed.
Full Compaction
Forks a subagent that calls the LLM to generate a conversation summary. Old messages are replaced with the summary, which becomes the new conversation start.
Micro-Compact
Targeted inline trimming of large tool results — file reads, grep output, shell output — which are replaced with [Old tool result content cleared].
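The tier ordering above amounts to a fallback chain: try the cheapest strategy first and escalate only on failure. A sketch, with assumed function shapes; the tier names come from the text:

```typescript
// Fallback chain over the three compaction tiers (shapes are illustrative).
type CompactResult = { ok: boolean; tier?: string };

function compact(
  trySessionMemory: () => boolean, // tier 1: reuse memory file, no LLM call
  tryFull: () => boolean,          // tier 2: subagent-generated summary
  tryMicro: () => boolean,         // tier 3: trim large tool results inline
): CompactResult {
  if (trySessionMemory()) return { ok: true, tier: "session-memory" };
  if (tryFull()) return { ok: true, tier: "full" };
  if (tryMicro()) return { ok: true, tier: "micro" };
  return { ok: false }; // all tiers failed; caller must force /compact
}
```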
autoCompact monitors token count after each API response.

Token Usage Check:
│
├── Below threshold ────────────────────── Continue normally
│     (contextWindow - 20K output - 13K buffer)
│
├── Above WARNING threshold (20K buffer) ── Show yellow indicator
│
├── Above ERROR threshold ──────────────── Show red indicator
│
├── Above AUTO_COMPACT threshold ────────── Trigger compaction:
│     │
│     ├── 1. Try Session Memory Compact
│     │     └── Success? → Done (no API call!)
│     │
│     ├── 2. Try Full Compaction
│     │     ├── Fork subagent with summary prompt
│     │     ├── Generate conversation summary
│     │     ├── Replace old messages with boundary
│     │     ├── Re-attach: open files, plans, MCP state, skills
│     │     └── Post-compact cleanup + hooks
│     │
│     └── 3. Micro-Compact fallback
│           └── Trim large tool results inline
│
└── Above BLOCKING limit ───────────────── Force /compact before next query

Circuit Breaker: After 3 consecutive failures, stop retrying (saved ~250K wasted API calls/day globally — BQ 2026-03-10)
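The threshold ladder can be sketched as a classifier. The 20K reserved-output and 13K buffer constants appear in the text; the exact warning/error cutoffs and the function itself are a reader's assumptions for illustration:

```typescript
// Assumed constants taken from the compaction diagram.
const RESERVED_OUTPUT = 20_000; // tokens held back for the model's reply
const BUFFER = 13_000;          // safety margin below the window

type Level = "ok" | "warning" | "error" | "auto-compact" | "blocking";

// Classify current usage against the ladder. The warning and error
// bands below auto-compact are illustrative orderings, not exact values.
function usageLevel(tokens: number, contextWindow: number): Level {
  const autoCompactAt = contextWindow - RESERVED_OUTPUT - BUFFER;
  if (tokens >= contextWindow) return "blocking";
  if (tokens >= autoCompactAt) return "auto-compact";
  if (tokens >= autoCompactAt - BUFFER) return "error";
  if (tokens >= autoCompactAt - RESERVED_OUTPUT) return "warning";
  return "ok";
}
```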
Key Constants
Configuration
DISABLE_COMPACT=1
DISABLE_AUTO_COMPACT=1
    Keeps manual /compact working
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80
    Compact at 80% of context window
CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000
    Cap context window to 100K tokens
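For example, the overrides can be combined per invocation, assuming the standard claude CLI entry point:

```shell
# Cap the effective context window at 100K tokens and trigger
# auto-compact at 80% of it, for this invocation only.
CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000 \
CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80 \
claude
```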
Post-Compact Restoration
After compaction replaces old messages with a summary boundary, critical state is re-attached:
- Open files (max 5, 5K tok each)
- Plans (re-injected)
- MCP state (instructions delta)
- Skills (max 25K tok)