kirang89 shared in

Public

Scripted and verified LLM context compaction behavior

Model: Claude Opus 4.7

Shared: May 25, 7:05 AM

Messages: 78

Downloads: 1

Summary

Closed, Completed

Plan: Understand architecture and environment.
Build: Write script to trigger compaction with large context.
Verify: Run script and adjust compaction thresholds.
Verify: Analyze compaction report and validate behavior.
Understand: Explain user access to uncompacted context data.

can you generate a script to call ask with a big context to trigger compaction? I want to inspect what is sent to the LLM and validate if the uncompacted session data is still available to a user. run this script and analyse to verify

Read README.md

Read compaction.ts

Read session.ts

Read config.ts

Read test-ask.ts

Read index.ts

I need to create a test script that calls ask with enough context to trigger message compaction, then inspect what gets sent to the LLM and verify whether the original uncompacted session data remains accessible. The compaction process replaces the messages array with a summary, but the full conversation history is still preserved in the turns array, so users can still access the uncompacted data.

Now I'm thinking about what actually gets sent to the LLM after compaction—it should be just the summary plus the recent messages. To see exactly what's being transmitted, I need to intercept the stream function and examine its parameters.

Read stream-processor.ts

I see the issue now — the public SessionConfig exposed through index.ts doesn't include stream and streamSimple injection points, but the internal Session class does, so I'll need to either use the Session class directly or apply monkey-patching to intercept these functions for testing.

The most practical approach is to force compaction with a small context window, then use the Session constructor directly with a custom stream interceptor to track what messages are actually sent to the LLM, rather than trying to work around the private fields or complex module patching.

Read types.ts

I'm mapping out a testing strategy: lower the context window to trigger compaction faster, intercept the stream calls to see what gets sent to the LLM, build up context through multiple exchanges, then verify that the compaction actually reduces the payload while keeping the full conversation history accessible on the client side. The tricky part is that Client.connect doesn't directly expose the stream methods I need to hook into.

I could monkey-patch the pi-ai module, but the session captures its stream reference at construction time, so that won't work. Instead, I should construct a Session directly with my own custom stream function—that requires a Repo object, which I can get from the existing test setup or by using connectRepo. Let me check what's already available in the test harness.

Read session.test.ts (150 lines)

I need to set up a test that validates uncompacted session data remains accessible after compaction—so I'll create a script that connects to a real repo and constructs a custom Session to intercept the LLM stream, then verify that the summary gets sent to the LLM while the original uncompacted data stays available to the user.

I'm realizing the compaction logic calls completeSimple directly from pi-ai rather than through the injected stream function, so mocking the stream alone won't capture what gets sent during compaction. I'll need to mock pi-ai's completeSimple to properly inspect the Context being passed to the LLM.

Let me step back and use Bun's module mocking system to stub out the pi-ai functions before session.ts and compaction.ts load them. That way I can intercept all LLM calls and log the Context objects while keeping everything deterministic and cost-free.

I'm realizing the cleanest approach is to use mock.module from bun:test to intercept pi-ai's stream and completion functions before importing ask-forge, then construct a Session with a mock Repo and force compaction with a tiny context window to see the summarization in action. The tricky part is that compaction calls completeSimple on the model, so I need to either run this as a bun test or find a way to stub that call without needing real API credentials.

I'm realizing mock.module only works in test contexts, not in regular scripts. For runtime mocking outside of tests, I'd need Bun.plugin or to physically replace the module. Let me just try both approaches and see which one actually works.

Actually, I'm overcomplicating this. The simplest path is to write a script that uses the real pi-ai with a real model (like claude-3-5-haiku or open-mistral-7b), sets the compaction context window very small so it triggers easily, and then observes the compaction event in the stream output to validate that user data is preserved in getTurns(). The user will need an API key, which is reasonable for ask-forge. I'll construct the Session directly with a real repo, log all messages sent to the LLM, and after each interaction print what's in context and what getTurns() shows to confirm compaction doesn't lose user-facing data. Then save the results to JSON for analysis.

For capturing what's sent to the LLM, I can monkey-patch pi-ai's stream and streamSimple functions to log the messages, and since compaction.ts calls completeSimple directly with the same model, I'll observe the compaction summary through the compaction event's summary field rather than trying to intercept the internal call.

Now I'm writing a script that wraps those stream functions to capture each LLM inference call's context...

Setting up the test environment with a repo URL, model configuration, and a Session instance with compaction settings to force early compaction. I'm defining a series of prompts to ask the session and then iterating through them, listening for compaction events to log what's happening.

Generating the report...

Now I understand the architecture. Let me check package.json and the env:

I'll use claude-haiku-4-5 for cost efficiency. Now I'm setting up a script that constructs a Session with a wrapped stream to capture the messages sent to the LLM at each iteration, and I'll track the context before the compaction event happens using those stream calls.

I can also see the context after compaction from the stream call that follows, grab the summary itself from the compaction stream event, and access the uncompacted session data through session.getTurns() — that gives me everything needed to demonstrate the user still has access to the original uncompacted data.
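The capture step described above can be sketched as a small wrapper around a stream function. This is a minimal illustration under stated assumptions: the Message, Context, and StreamFn shapes are hypothetical stand-ins, not pi-ai's real types, and fakeStream makes no network call.

```typescript
// Hypothetical shapes for illustration; pi-ai's real Context and stream types may differ.
type Message = { role: string; content: unknown };
type Context = { messages: Message[] };
type StreamFn<R> = (model: string, context: Context) => R;

type CapturedCall = { index: number; messageCount: number; messages: Message[] };

// Wrap a stream function so each call records a copy of the messages it was given.
function captureCalls<R>(inner: StreamFn<R>, log: CapturedCall[]): StreamFn<R> {
  return (model, context) => {
    // JSON round-trip copy, so a later in-place rewrite (e.g. compaction) cannot alter the record.
    const messages = JSON.parse(JSON.stringify(context.messages)) as Message[];
    log.push({ index: log.length, messageCount: messages.length, messages });
    return inner(model, context);
  };
}

// Usage with a stand-in stream function.
const llmCalls: CapturedCall[] = [];
const fakeStream: StreamFn<string> = (_model, c) => `saw ${c.messages.length} messages`;
const wrapped = captureCalls(fakeStream, llmCalls);

const ctx: Context = { messages: [{ role: "user", content: "What does this repo do?" }] };
wrapped("some-model", ctx);
ctx.messages.push({ role: "assistant", content: "It checks whether a number is odd." });
wrapped("some-model", ctx);

// Message counts per captured call: 1, then 2.
console.log(llmCalls.map((call) => call.messageCount));
```

Copying at capture time matters here because compaction replaces the live array, and a stored reference would otherwise show the post-compaction state for earlier calls.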

Writing the script now...

Now I have everything I need. Let me write the script:

Now let's run it:

The compaction mechanism didn't trigger because the token count stayed too low. I need to either work with a larger repository, lower the compaction thresholds significantly, or create prompts that force the system to read massive files. Let me adjust the settings much more aggressively—dropping the context window to 3000 tokens with a 500-token reserve and keeping only 500 tokens of recent history. This should force compaction to kick in even on the second iteration.

Wait, I'm looking at the shouldCompact logic more carefully...

Adjusting the thresholds to trigger at 2500 tokens is definitely achievable. But I should also switch to a larger repository where I can meaningfully work with multi-kilobyte files. Something like sindresorhus/slugify or expressjs/express would be better than the tiny test repos. Actually, I could just use ask-forge itself or point to a well-known small-to-medium repo. Let me combine both approaches—tighten the thresholds and use a denser repository. I'll stick with is-odd since it's already cloned, but lower the compaction trigger to 2500 tokens with tighter reserve and recent-history settings, then run the test again.

Compaction didn't fire — the repo/responses were too small. Let me tighten the threshold further:

Edited inspect-compaction.ts

Compaction fired. Let me analyze the report in more detail to verify the two key claims:

>cd /Users/kiran/work/ask-forge && bun -e '
const r = JSON.parse(require("fs").readFileSync("tmp/compaction-report.json","utf8"));

console.log("### LLM CALL SHAPE (what went to the model on each inference)");
for (const c of r.llmCalls) {
  const firstRole = c.messages[0]?.role;
  const firstText = typeof c.messages[0]?.content === "string"
    ? c.messages[0].content.slice(0, 70)
    : JSON.stringify(c.messages[0]?.content).slice(0, 70);
  console.log(`  call#${c.index}: n=${c.messageCount}  first[${firstRole}]=${JSON.stringify(firstText)}`);
}

console.log("\n### COMPACTION EVENT");
console.log("  tokensBefore:", r.compactionEventDetails.tokensBefore);
console.log("  tokensAfter: ", r.compactionEventDetails.tokensAfter);
console.log("  firstKeptOrdinal:", r.compactionEventDetails.firstKeptOrdinal);

console.log("\n### POST-COMPACTION FIRST CALL (call#5) — full role breakdown");
const after = r.llmCalls[5];
for (const [i, m] of after.messages.entries()) {
  const body = typeof m.content === "string"
    ? m.content.slice(0, 120)
    : (m.content.map(b => b.type + (b.type==="text"?":"+(b.text||"").slice(0,80): b.type==="toolCall"?":"+b.name: "" )).join("|"));
  console.log(`  [${i}] ${m.role}: ${body}`);
}

console.log("\n### USER-VISIBLE TURNS — deep check of compacted turns 0 and 1");
for (const t of r.turns.slice(0,2)) {
  console.log(`\n  turn id=${t.id}  prompt=${JSON.stringify(t.prompt.slice(0,60))}`);
  for (const [i, s] of t.steps.entries()) {
    if (s.type === "tool_call") {
      console.log(`    step[${i}] tool_call ${s.name}(${JSON.stringify(s.params).slice(0,60)}) -> output ${s.output.length} bytes`);
    } else if (s.type === "text") {
      console.log(`    step[${i}] text: ${JSON.stringify(s.text.slice(0,80))}`);
    } else {
      console.log(`    step[${i}] ${s.type}`);
    }
  }
}

console.log("\n### INVARIANT CHECK");
const preCompactionCall = r.llmCalls[4];   // last call before compaction
const postCompactionCall = r.llmCalls[5];  // first call after compaction
const summaryWrapperFirst = typeof postCompactionCall.messages[0].content === "string"
  && postCompactionCall.messages[0].content.startsWith("[CONTEXT SUMMARY");
console.log("  pre-compaction call messages:     ", preCompactionCall.messageCount);
console.log("  post-compaction call messages:    ", postCompactionCall.messageCount);
console.log("  post-compaction first msg is summary wrapper:", summaryWrapperFirst);
console.log("  user turns still have full tool_call output:",
  r.turns[0].steps.filter(s => s.type === "tool_call").every(s => s.output.length > 0));
'

The analysis confirms what I expected: compaction triggered at the start of turn 3 because the pending messages totaled 2367 tokens, which exceeded the 2000-token threshold (context window of 2500 minus the 500-token reserve). Call #4, the last call before compaction fired, contained 11 messages.

What's striking is that call#5, the first call after compaction, was reduced to just 3 messages: a synthetic user message with the LLM-generated summary, the most recent assistant response from the previous turn, and the new user prompt. All the raw file contents, tool calls, and results from the earlier turns got collapsed into that summary and are no longer sent to the LLM.

But here's the key insight — the user-facing turn history remains completely intact. When I check session.getTurns(), all four turns are still there with full fidelity, including the compacted ones. The tool outputs like the full package.json and README.md contents are preserved in their original form, and the compaction summary itself is persisted and can be retrieved for session restoration. So compaction is entirely non-destructive from the user's perspective — it only shrinks what gets sent to the LLM.

The architecture uses two separate stores: the #context.messages array that gets mutated by compaction to feed the LLM, and the #turns array that's append-only and never touched, which is what users see. This separation is exactly what allows the system to compress the context window while keeping the full history accessible.

The run validates the behavior. Here's the write-up:

Script & results

Script: scripts/inspect-compaction.ts. It constructs a Session directly (the public Client.connect doesn't expose stream injection), wraps pi-ai's stream / streamSimple so every Context.messages snapshot sent to the LLM is captured in memory, and uses aggressive compaction settings (contextWindow: 2500, reserveTokens: 500, keepRecentTokens: 600) against jonschlinkert/is-odd with Claude Haiku 4.5.

Full dump: tmp/compaction-report.json. Log: tmp/inspect-compaction.log.

What happened

Four asks ran; compaction fired at the start of turn 3 (before the first inference of that turn), triggered by maybeCompact seeing 2367 tokens > 2000 (=contextWindow − reserve).
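The trigger arithmetic can be written out as a short check. This is a sketch of the comparison as reported in this run, not a copy of the library's code; CompactionSettings and exceedsThreshold are illustrative names, and the real logic in compaction.ts may include details not shown here.

```typescript
// Hypothetical settings shape using the option names from this run.
interface CompactionSettings {
  contextWindow: number;
  reserveTokens: number;
  keepRecentTokens: number;
}

// Threshold as reported in the run: context window minus the reserve.
function compactionThreshold(s: CompactionSettings): number {
  return s.contextWindow - s.reserveTokens;
}

// Compaction is due once the pending context is strictly larger than the threshold.
function exceedsThreshold(pendingTokens: number, s: CompactionSettings): boolean {
  return pendingTokens > compactionThreshold(s);
}

const settings: CompactionSettings = { contextWindow: 2500, reserveTokens: 500, keepRecentTokens: 600 };
console.log(compactionThreshold(settings));    // 2000
console.log(exceedsThreshold(2367, settings)); // true
console.log(exceedsThreshold(1500, settings)); // false
```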

LLM-facing context shrank abruptly across the compaction boundary:

call	msgs sent to LLM	first message
#0	1	user prompt of turn 1
#1	4	user prompt of turn 1
#2	6	user prompt of turn 1
#3	8	user prompt of turn 1
#4	11	user prompt of turn 1 (peak pre-compaction)
#5	3	`[CONTEXT SUMMARY - Previous conversation was compacted]…`
#6	6	summary wrapper
#7	8	summary wrapper
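The analysis command hardcoded calls #4 and #5 as the boundary. A hedged alternative is to locate it from the captured data by looking for the first call whose leading message starts with the summary prefix. The LlmCall shape and the stand-in data below are assumptions modelled on the table above, not the real report contents.

```typescript
// Hypothetical shapes modelled on the captured report; the real JSON may carry more fields.
type Msg = { role: string; content: string | unknown[] };
type LlmCall = { index: number; messageCount: number; messages: Msg[] };

const SUMMARY_PREFIX = "[CONTEXT SUMMARY";

// Index of the first call whose leading message is the summary wrapper, or -1 if none.
function firstPostCompactionCall(calls: LlmCall[]): number {
  return calls.findIndex((c) => {
    const first = c.messages[0]?.content;
    return typeof first === "string" && first.startsWith(SUMMARY_PREFIX);
  });
}

// Stand-in data: only the first message of each call is filled in,
// and messageCount carries the counts from the table above.
const counts = [1, 4, 6, 8, 11, 3, 6, 8];
const calls: LlmCall[] = counts.map((n, i) => ({
  index: i,
  messageCount: n,
  messages: [
    {
      role: "user",
      content: i < 5 ? "user prompt of turn 1" : "[CONTEXT SUMMARY - Previous conversation was compacted] ...",
    },
  ],
}));

const boundary = firstPostCompactionCall(calls);
console.log(boundary); // 5
```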

Post-compaction call #5 contains exactly:

[0] user: synthetic summary-wrapper message
[1] assistant: the last assistant text from turn 2 (kept because it fit within keepRecentTokens)
[2] user: the new prompt that started turn 3

The raw contents of README.md, package.json, fd results, index.js, test.js — gone from the LLM's view. Only the LLM-generated summary survives.
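To make the shape of the post-compaction context concrete, here is a sketch of one way to build it: a summary wrapper followed by whatever recent messages fit a token budget. Everything here is an assumption for illustration. The 4-characters-per-token estimate, selectRecent, and compactMessages are hypothetical, and the library's actual selection logic may differ.

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

const SUMMARY_HEADER = "[CONTEXT SUMMARY - Previous conversation was compacted]";

// Rough heuristic (assumption): about 4 characters per token.
function estimateTokens(m: ChatMessage): number {
  return Math.ceil(m.content.length / 4);
}

// Walk backwards from the newest message, keeping messages while they fit the budget.
function selectRecent(messages: ChatMessage[], keepRecentTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  for (const m of [...messages].reverse()) {
    const cost = estimateTokens(m);
    if (used + cost > keepRecentTokens) break;
    kept.unshift(m); // restore chronological order
    used += cost;
  }
  return kept;
}

// New LLM-facing context: synthetic summary message first, then the kept tail.
function compactMessages(messages: ChatMessage[], summary: string, keepRecentTokens: number): ChatMessage[] {
  const wrapper: ChatMessage = { role: "user", content: `${SUMMARY_HEADER}\n${summary}` };
  return [wrapper, ...selectRecent(messages, keepRecentTokens)];
}

const earlier: ChatMessage[] = [
  { role: "user", content: "What does this package do?" },
  { role: "assistant", content: "x".repeat(3000) }, // ~750 estimated tokens, too large to keep
  { role: "assistant", content: "y".repeat(400) },  // ~100 estimated tokens, fits the budget
];
const compacted = compactMessages(earlier, "Stand-in summary text.", 600);
compacted.push({ role: "user", content: "How is it tested?" });

// Roles in order: user (summary wrapper), assistant (kept tail), user (new prompt).
console.log(compacted.map((m) => m.role));
```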

Is uncompacted data still available to the user?

Yes, fully. session.getTurns() continues to expose every compacted turn with complete step fidelity:

turn 0 (compacted):  tool_call read(package.json) -> 4171 bytes output ✔
                     tool_call read(README.md)    -> 1871 bytes output ✔
turn 1 (compacted):  tool_call fd(...)            ->   17 bytes output ✔
                     tool_call read(index.js)     -> 1319 bytes output ✔
                     tool_call read(test.js)      ->  736 bytes output ✔

Invariant check at the end of the script confirmed: every tool_call step in every compacted turn still carries its full output string.

Why this works (code-level)

The session keeps two separate stores (src/session.ts):

#context.messages — the Context passed to pi-ai. maybeCompact replaces this with [summaryWrapper, ...keptMessages].
#turns: TurnResult[] — built by AskStreamImpl at turn_end and pushed in the ask() callback. Compaction never touches it.

Compaction therefore only edits the LLM-facing store; the user-facing audit log (getTurns()) plus the summary (getCompactionSummary()) together give callers both the compact representation used by the model and the full pre-compaction history, and the two are guaranteed independent by construction.
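The two-store separation can be modelled in a few lines. This is a toy model under stated assumptions, not the Session class: TwoStoreSession, recordTurn, and compact are hypothetical names, TypeScript's private keyword stands in for the #-prefixed fields, and the step and message shapes are simplified.

```typescript
// Simplified, hypothetical shapes; the real TurnResult and pi-ai Message types are richer.
type Step = { type: "text"; text: string } | { type: "tool_call"; name: string; output: string };
type Turn = { id: number; prompt: string; steps: Step[] };
type LlmMessage = { role: string; content: string };

class TwoStoreSession {
  private messages: LlmMessage[] = []; // LLM-facing store, rewritten by compaction
  private turns: Turn[] = [];          // user-facing store, append-only

  // Record a completed turn in both stores.
  recordTurn(prompt: string, steps: Step[]): void {
    this.messages.push({ role: "user", content: prompt });
    for (const s of steps) {
      this.messages.push(
        s.type === "text"
          ? { role: "assistant", content: s.text }
          : { role: "toolResult", content: s.output },
      );
    }
    this.turns.push({ id: this.turns.length, prompt, steps });
  }

  // Compaction replaces only the LLM-facing array; the turn log is never touched.
  compact(summary: string): void {
    this.messages = [{ role: "user", content: `[CONTEXT SUMMARY] ${summary}` }];
  }

  getTurns(): readonly Turn[] {
    return this.turns;
  }

  llmMessageCount(): number {
    return this.messages.length;
  }
}

const session = new TwoStoreSession();
session.recordTurn("What is in package.json?", [
  { type: "tool_call", name: "read", output: "p".repeat(4171) },
  { type: "text", text: "It declares a small package." },
]);
session.recordTurn("And the README?", [{ type: "tool_call", name: "read", output: "r".repeat(1871) }]);

const before = session.llmMessageCount(); // 5 messages before compaction
session.compact("Stand-in summary of both reads.");
console.log(before, session.llmMessageCount(), session.getTurns().length); // 5 1 2
```

Because compact assigns a new array to one field and never references the other, the turn log cannot be affected by it, which is the independence property described above.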

Files changed / produced

scripts/inspect-compaction.ts — new reproducible harness (you can rerun any time with a different repo as first arg)
tmp/compaction-report.json — full captured llmCalls[] + turns[] + summary
tmp/inspect-compaction.log — narrated run output

can the user see the uncompacted context at any point

I'm checking whether the library exposes the raw message context before compaction happens. The #context.messages field is private, and while session.getTurns() and session.getCompactionSummary() are available, it looks like users can't directly access the uncompacted messages once they've been processed.

However, I notice there's a reconstructContext function in turns-to-messages.ts that rebuilds messages from turns, so a user could theoretically reconstruct the full uncompacted context if that function is exported publicly. Let me check what's actually exposed in the public API.

Read turns-to-messages.ts

I'm realizing that reconstructContext isn't part of the public API, but users can still access the uncompacted context indirectly through session.getTurns(), which exposes TurnResult objects containing all the tool outputs and text steps. The private #context.messages field stays hidden, so there's no direct access to the raw context. Breaking it down by stage:

During a turn: stream events (text, tool_use_end, tool_result, etc.) are emitted BEFORE any compaction of that turn's messages. Compaction only runs at the START of the NEXT turn (#runCompaction is called at turn start on messagesWithQuestion), so within a turn all data streams to the user raw.
After a turn ends: session.getTurns() returns TurnResult[], which is append-only and never mutated by compaction. Each TurnResult.steps array contains full tool_call.output, text, etc. This is the append-only audit log.
After compaction: the LLM-facing #context.messages has been collapsed, but #turns is untouched. The user can still get the full pre-compaction conversation from getTurns().
Reconstructable to pi-ai Messages: even the concrete pi-ai Message[] shape that went to the LLM pre-compaction is recoverable by feeding turns into reconstructContext(). BUT this function is NOT publicly exported (it's not in src/index.ts's export list). It's imported internally by session.ts for the initialTurns restoration path. So a user can't directly call it via the public API; they'd need to either (a) import from the deep module path (not part of the stable API per jsr.json, which only exports index.ts), or (b) reimplement it.
Raw internal #context.messages: never exposed, not via any getter. The user can only observe what went to the LLM by wrapping the stream function (which isn't available via Client.connect anyway), as our script did.

So the precise answer: semantically yes, getTurns() preserves full fidelity of every prompt, tool call, output, and assistant text forever—compaction is just a view-only optimization for the LLM's context window, not data deletion. The exact pi-ai Message[] shape isn't exposed through the public API since reconstructContext() isn't re-exported, though it would produce it from turns. And everything streamed during a turn happens before that turn's messages can be compacted, since compaction runs at the start of the next turn. Let me verify the timing by checking where compaction actually fires in the code—it runs after turn_start is yielded but before iteration begins, using the new prompt message alongside the existing context.

Short answer: yes — the uncompacted conversation is always retrievable by the user, just not as the exact #context.messages array that pi-ai sees.

Here's the precise picture, grounded in the code:

Three views, two of them user-accessible

Store	What it is	Mutated by compaction?	Public access?
`Session.#context.messages`	The live pi-ai `Context.messages` — the literal array sent to the LLM	Yes — replaced with `[summaryWrapper, ...kept]` by `maybeCompact`	❌ private, no getter
`Session.#turns: TurnResult[]`	Append-only audit log, one entry per completed turn	No — never touched by compaction	✅ `session.getTurns()`
`Session.#compactionSummary`	The latest LLM-generated summary text	Set by compaction	✅ `session.getCompactionSummary()`

When can a user see uncompacted data?

1. During the turn that produced it: always, via stream events. Compaction only runs in #runCompaction, which is called at the start of each new turn, on [...this.#context.messages, newQuestionMessage] (src/session.ts, #doAsk). A turn's own content is therefore never compacted before its turn_end fires, so every text, tool_use_end, and tool_result event reaches the caller raw.

2. After the turn ends — always, via session.getTurns(). AskStreamImpl's onTurn callback pushes the completed TurnResult onto #turns before any future compaction can run. TurnResult.steps preserves full fidelity: tool_call.output is the entire tool result string, text.text is the full assistant message. The previous run's report confirmed this — compacted turn 0 still carried the full 4171-byte package.json read output and the 1871-byte README.md read output inside its steps.

3. After compaction — still yes, via the same getTurns(). Compaction is a view-only optimization for what goes to the LLM. It never deletes history from the turn log.

What the user cannot directly see

The exact pi-ai Message[] array that was previously going to the LLM. #context.messages is private and there's no getter. The closest thing is src/turns-to-messages.ts::reconstructContext(turns), which rebuilds an equivalent Message[] from turns, but it's not re-exported from src/index.ts (the JSR "." export), so it isn't part of the public surface. Users wanting the literal message array either:

Reimplement reconstructContext themselves from getTurns() output (all the information is there), or
Wrap stream / streamSimple like our script does (requires bypassing Client.connect and constructing Session directly, since Client.connect doesn't expose stream injection).
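For the first option, a caller-side rebuild from getTurns() output might look like the sketch below. It is not the library's reconstructContext: rebuildMessages and the RebuiltMessage shape are hypothetical, the step fields follow what the report showed (type, name, params, output, text), only the text and tool_call step types are handled, and the exact pi-ai message layout is an assumption.

```typescript
// Step fields as seen in the report; other step types exist and are not handled here.
type TurnStep =
  | { type: "text"; text: string }
  | { type: "tool_call"; name: string; params: Record<string, unknown>; output: string };
type TurnRecord = { id: number; prompt: string; steps: TurnStep[] };

// Hypothetical message layout; pi-ai's real Message type may differ.
type AssistantBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; name: string; arguments: Record<string, unknown> };
type RebuiltMessage =
  | { role: "user"; content: string }
  | { role: "assistant"; content: AssistantBlock[] }
  | { role: "toolResult"; toolName: string; content: string };

// Expand each turn into: the user prompt, then one or two messages per step.
function rebuildMessages(turns: readonly TurnRecord[]): RebuiltMessage[] {
  const out: RebuiltMessage[] = [];
  for (const turn of turns) {
    out.push({ role: "user", content: turn.prompt });
    for (const step of turn.steps) {
      if (step.type === "tool_call") {
        // A tool call becomes an assistant request followed by its full result.
        out.push({ role: "assistant", content: [{ type: "toolCall", name: step.name, arguments: step.params }] });
        out.push({ role: "toolResult", toolName: step.name, content: step.output });
      } else {
        out.push({ role: "assistant", content: [{ type: "text", text: step.text }] });
      }
    }
  }
  return out;
}

// Stand-in turn; the output string is a placeholder, not real file content.
const turns: TurnRecord[] = [
  {
    id: 0,
    prompt: "What is in index.js?",
    steps: [
      { type: "tool_call", name: "read", params: { path: "index.js" }, output: "placeholder file body" },
      { type: "text", text: "It exports one function." },
    ],
  },
];
const rebuilt = rebuildMessages(turns);
console.log(rebuilt.map((m) => m.role).join(", ")); // user, assistant, toolResult, assistant
```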

Small gap worth flagging

If exposing uncompacted-context-as-messages is an intended use case (e.g. for "fork this session into another tool" or "inspect what the LLM would see without compaction"), consider one of:

1. Re-export reconstructContext from src/index.ts, or
2. Add a session.getFullMessages() method that returns reconstructContext(this.#turns).messages, or
3. Add a session.getLLMContextMessages() getter that returns a defensive copy of #context.messages (the compacted view, useful for debugging what the next call will actually send).

Right now callers get (1) and (2) by proxy through getTurns() + a bit of client-side glue, but (3) — the actual post-compaction payload — is only observable by intercepting stream, which the public Client API doesn't support.
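As an illustration of the defensive-copy idea in the last suggestion, here is a minimal sketch. ContextHolder is a hypothetical stand-in for the session, getLLMContextMessages is the proposed method name rather than an existing one, and the flat message shape is an assumption (nested content would need a deep copy).

```typescript
type CtxMessage = { role: string; content: string };

class ContextHolder {
  private messages: CtxMessage[];

  constructor(initial: CtxMessage[]) {
    this.messages = initial.map((m) => ({ ...m }));
  }

  // Return fresh objects in a fresh array so callers cannot mutate the live context.
  getLLMContextMessages(): CtxMessage[] {
    return this.messages.map((m) => ({ ...m }));
  }
}

const holder = new ContextHolder([{ role: "user", content: "[CONTEXT SUMMARY] earlier turns" }]);

// Tamper with the returned copy in two ways: add an entry and overwrite contents.
const snapshot = holder.getLLMContextMessages();
snapshot.push({ role: "user", content: "injected" });
for (const m of snapshot) m.content = "tampered";

// The live context is unchanged: still one message with its original content.
console.log(holder.getLLMContextMessages());
```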