Fix Stuck AI Planner and Enable Live Output

Summary

I see this output on my screen, but the plan_PROGRESS.md exists and there is no claude process running. Is the program stuck?

Let me investigate the current state of things.

Searched files: **/plan_PROGRESS.md

Read ralph.sh

The user says the plan_PROGRESS.md exists and there's no Claude process running, but they see output on screen suggesting things are stuck. Let me check the actual state - the ps aux output shows there IS a claude process running (PID 940936). But that's likely this current Claude session I'm in.

Let me look for:

Any plan_PROGRESS.md files
Any Ralph lock files
What screen output they might be seeing

The glob found no plan_PROGRESS.md files. Let me look more broadly.

I don't see any plan_PROGRESS.md file in the repo. Let me look more broadly and check for lock files.

Searched files: **/*PROGRESS*

Request interrupted by user for tool use

Look in ~/src/mage

Read plan_PROGRESS.md

Now I can see the issue. The Ralph process IS running (PID 940763), but it's stuck on the notify.sh script (PID 940793). The notify.sh process is consuming 99.9% CPU and has been running for over 2 minutes - it's stuck in an infinite loop or something similar.

Ralph is stuck at the notification step before even launching Claude for the planning iteration. The notify.sh is spinning (99.9% CPU) trying to send the "Ralph Started" notification.

Let me look at the notify.sh to understand what's going wrong.

Found the problem. Ralph is running (PID 940763), but it's stuck on notify.sh (PID 940793) which is consuming 99.9% CPU and has been spinning for over 2 minutes trying to send the startup notification. It never gets past the "Ralph Started" notify call, so Claude is never invoked for the plan.

Let me check what's wrong with notify.sh.

Read notify.sh

Now I see the issue. Look at line 275 in the Discord section:

bash

This is an infinite loop! The message contains *Ralph Started* which matches \*([^*]+)\*. The replacement converts *Ralph Started* to **Ralph Started**. But then on the next iteration, **Ralph Started** still matches the regex \*([^*]+)\* because the pattern matches the outer * characters and captures *Ralph Started* inside. Wait, let me think more carefully...

Actually, the regex \*([^*]+)\* matches a *, followed by one or more non-* characters, followed by *. With the input containing *Ralph Started*, the pattern captures "Ralph Started" and replaces it with **Ralph Started**. But this creates a problem because the replacement still has * characters that can match the pattern again on the next loop iteration, causing it to keep matching and replacing indefinitely. I see the issue now—when the regex matches the middle *Ralph Started* within **Ralph Started**, it replaces it with **Ralph Started**, creating ***Ralph Started***. On the next iteration, it finds another *...* pattern and adds more asterisks, causing an infinite loop. The real question is whether we should even be processing this through the Discord path in the first place.

Looking at the send_discord() function, it checks if RALPH_DISCORD_WEBHOOK_URL is set and returns early if it's not configured. But the main script still calls this function regardless. However, from the process output, I don't see a Claude process running, which suggests something else is happening here.

Actually, wait—that bash process spinning at 99.9% CPU IS the notify.sh script itself. So it's definitely stuck in an infinite loop. The regex replacement loop for converting Slack-style markdown to Discord format on line 275 must be the culprit. Since the process is consuming that much CPU, the user must have Discord configured, and the while loop is getting stuck trying to convert the *bold* syntax.

The issue is that replacing *text* with **text** still leaves * characters in the string, so the pattern keeps matching infinitely. I need to fix this with a different approach—either using sed with proper regex, processing character-by-character, or using a marker to prevent re-matching. The simplest solution is to avoid the while loop entirely and do a single-pass replacement instead.

Writing bash replacement logic...

Yes, it's stuck. The notify.sh process (PID 940793) is stuck in an infinite loop at 99.9% CPU. It never gets to launch Claude.

The bug is in notify.sh line 275-277 — the Discord formatting conversion:

bash

The message *Ralph Started* gets converted to **Ralph Started**, but **Ralph Started** still matches the regex \*([^*]+)\* (the inner *Ralph Started* portion matches), so it grows to ***Ralph Started***, then ****Ralph Started****, etc. — forever.

You need to kill the stuck processes to unstick things:

kill 940793   (the stuck notify.sh)

Would you like me to fix the infinite loop bug in notify.sh?

yes

Now let me kill the stuck process so you can re-run Ralph.

It looks like killing the notify.sh caused the ralph.sh to continue (since set -euo pipefail might have been set, but the notify function in ralph.sh has error handling). The ralph.sh seems to have restarted a new iteration, spawning new notify.sh processes (PIDs 942596 and 942607). But these are not at 99.9% CPU, which suggests the fix is working — the new notify.sh processes are running normally (0.0% CPU).

Wait, there are TWO notify.sh processes now. Let me check if they're stuck. They're at 0.0% CPU and state S+ (sleeping), which is normal — they're likely in a curl call or similar. Let me wait a moment to see if they complete.

Actually, I'm noticing both the original ralph.sh and a new one are still running, which is unexpected. But the new notify processes look healthy at 0.0% CPU with S+ state, so the fix appears to be working — I should give it a moment to stabilize and verify.

The old process died and Ralph restarted automatically. The new notify.sh processes (942596, 942607) are at 0.0% CPU and sleeping normally — the fix is working. Let me verify they finish cleanly.

Fixed. The notify.sh processes completed successfully and Ralph has now launched Claude (PID 943048) to do the actual planning work.

The bug: The while loop replacing *bold* with **bold** was infinite because the output **bold** still contains a *bold* match. The fix: Single-pass left-to-right processing that consumes matched portions so already-converted text is never re-examined.
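To make the single-pass idea concrete, here is a minimal sketch. It is not the code from notify.sh, which is not reproduced in this log. The function name `slack_to_discord_bold` and the regex are assumptions chosen for illustration. The point is that each iteration moves the converted span into `out` and continues only with the unprocessed tail, so converted text is never matched again.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not the actual notify.sh code): convert Slack-style
# *bold* to Discord-style **bold** in a single left-to-right pass.
slack_to_discord_bold() {
    local rest="$1" out=""
    # Group 1: text before the first '*'; group 2: the bold text;
    # group 3: everything after the closing '*'.
    local re='^([^*]*)\*([^*]+)\*(.*)$'
    while [[ "$rest" =~ $re ]]; do
        # Emit the converted span; it is never examined again.
        out+="${BASH_REMATCH[1]}**${BASH_REMATCH[2]}**"
        # Only the tail is re-examined, and it shrinks on every
        # iteration, so the loop must terminate.
        rest="${BASH_REMATCH[3]}"
    done
    printf '%s\n' "${out}${rest}"
}

slack_to_discord_bold '*Ralph Started* in ~/src/mage'
```

A loop that instead re-runs the match against the whole, already-rewritten string keeps finding `*Ralph Started*` inside `**Ralph Started**`, which is the failure described above.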

I am running it now and it prints the following, but it seems stuck:

jc@Gizmo:~/src/mage$ RALPH_MODEL=claude-opus-4-6 ralph plan.md plan

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RALPH - Autonomous AI Development Loop
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Plan: plan.md
Mode: plan
Progress: plan_PROGRESS.md
Model: claude-opus-4-6
Commit: enabled
Notify: Discord
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Exit conditions:

Planning completes when task list is created (Status: IN_PROGRESS)
Plan mode runs once then exits automatically
Press Ctrl+C to stop manually

Lock acquired: /tmp/ralph_9b32b1156d685b82ade4dd4698c62e4d.lock

══════════════════ ITERATION 1 ══════════════════

The user is running Ralph again and it seems stuck after "ITERATION 1" with no further output. The notify.sh fix worked (it got past the notification), but now it seems stuck at the Claude invocation step. Let me check what's happening.

The key line in ralph.sh is around line 728:

bash

Since output goes to a temp file, nothing shows on screen while Claude is running.

Claude might just be processing the request, so let me verify if there's actually a Claude process running in the background.

Let me check if Claude is actually running or if it's stuck somewhere else.

It's not stuck. Claude (PID 943048) is actively running — it's just that Ralph redirects all Claude output to a temp file (> "$claude_output_file" 2>"$claude_error_file" on line 731 of ralph.sh), so you won't see anything on screen until the iteration finishes.

With claude-opus-4-6 doing a planning pass on a full codebase, it can take several minutes. The process is at 5.1% CPU and using ~250MB memory, which looks like normal active work.

Just give it time — when Claude finishes, Ralph will dump the output and print "Iteration 1 complete".

Is there any content in the temp file?

Both temp files are 0 bytes. That's expected: Claude's stdout is redirected with > "$claude_output_file", and a program writing to a file instead of a terminal typically buffers its output, so nothing may reach the file until Claude finishes. The process is still running and hasn't produced its final output yet.

Still actively running (memory grew from 251MB to 260MB, CPU time from 10s to 12s). It's working — just hasn't finished yet.
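The check above (did CPU time move between two looks?) can be sketched as a small script. This is an illustration only: the helper names `cpu_time_to_seconds` and `cpu_advanced` are hypothetical, and the parser assumes the procps-style `[DD-]HH:MM:SS` format that `ps -o time=` prints on Linux.

```bash
#!/usr/bin/env bash

# Convert a ps TIME value ([DD-]HH:MM:SS or MM:SS) to whole seconds.
cpu_time_to_seconds() {
    local t="$1" days=0 a b c
    if [[ "$t" == *-* ]]; then
        days="${t%%-*}"
        t="${t#*-}"
    fi
    IFS=: read -r a b c <<< "$t"
    if [[ -z "$c" ]]; then
        # MM:SS form: shift fields so a=hours, b=minutes, c=seconds.
        c="$b"; b="$a"; a=0
    fi
    # 10# forces base 10 so values like "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$a * 3600 + 10#$b * 60 + 10#$c ))
}

# Hypothetical helper: succeed if the process accumulated CPU time
# during the interval (default 5 seconds).
cpu_advanced() {
    local pid="$1" interval="${2:-5}" before after
    before="$(ps -o time= -p "$pid" | tr -d ' ')"
    [[ -n "$before" ]] || return 2   # no such process
    sleep "$interval"
    after="$(ps -o time= -p "$pid" | tr -d ' ')"
    [[ -n "$after" ]] || return 2    # process exited while waiting
    (( $(cpu_time_to_seconds "$after") > $(cpu_time_to_seconds "$before") ))
}

cpu_time_to_seconds "00:00:12"
```

Usage would look like `cpu_advanced 943048 30 && echo "still working"`. Because `ps` reports whole seconds, a lightly loaded process needs a longer interval to show movement. Advancing CPU time alone does not prove useful progress: the spinning notify.sh earlier in this session was also accumulating CPU time.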

I'm used to seeing incremental output while claude is working and thinking. Would we be able to modify this program to have that happen?

The user wants to see Claude's output in real-time rather than having it buffered to a file. Currently ralph.sh captures all output to a temp file and only displays it after Claude finishes. The user wants streaming/incremental output.

The key part is around line 728-731:

bash

To get incremental output while still capturing it for error detection, we can use tee to both display and capture:

Piping through tee to display output while capturing...

But there's a complication: with set -euo pipefail, if claude fails, we need to capture the exit code properly. With tee in a pipe, pipefail means the pipeline returns the exit code of the failing command, so a claude failure is still caught even though tee succeeds. Looking at the code more carefully, the output is only displayed conditionally, so I need to check where that conditional logic is and how it interacts with this change.

The plan: use tee to stream output directly while still capturing the exit code, and remove the conditional cat that suppresses output on retry attempts, since streaming always shows what's happening anyway. Let me look at the actual code section to understand the current structure better.

Refactoring the command...

Yes — the current code redirects all stdout to a temp file and only cats it after Claude finishes. We just need to use tee to stream to the terminal while still capturing to the file.

Read ralph.sh (15 lines)

Done. The change pipes Claude's stdout through tee, which sends it to both the terminal (so you see it live) and the temp file (so error detection still works). Stderr still goes to the error file for retry logic.
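The shape of that change can be sketched as follows. This is not the actual ralph.sh code: `run_streamed` is a hypothetical wrapper, and a small `bash -c` command stands in for the Claude invocation. It shows stdout going to both the terminal and a file, stderr going to its own file, and `pipefail` preserving the exit code of the command on the left of the pipe.

```bash
#!/usr/bin/env bash
set -o pipefail   # pipeline status = last command that failed, not just tee's

# Hypothetical wrapper: run a command, show its stdout live, and keep
# copies of stdout and stderr for later checks.
run_streamed() {
    local out_file="$1" err_file="$2"
    shift 2
    "$@" 2>"$err_file" | tee "$out_file"
}

out_file="$(mktemp)"
err_file="$(mktemp)"

# Stand-in for the Claude invocation: prints a line, writes to stderr,
# and exits with status 3.
rc=0
run_streamed "$out_file" "$err_file" \
    bash -c 'echo "working..."; echo "boom" >&2; exit 3' || rc=$?

echo "exit code: $rc"
```

Without `pipefail`, the pipeline would report the exit status of `tee`, which is almost always 0, and a failed run would look like a success.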

You'll see the streaming output next time you run Ralph. The current run will finish with the old behavior since it's already in progress.

Should I let it finish or restart it so I can see it working?

Let it finish — it's already been running a few minutes and planning mode is a single iteration. Restarting would just lose that work and start over. You'll see the streaming output on your next ralph run.

Let's commit our changes

The diff shows changes across 3 files. Our changes (from this session) are:

notify.sh: The infinite loop fix in Discord bold conversion
ralph.sh: The tee streaming fix

But there are also other changes that were already present before our session (the git status at the start showed all 3 files as modified):

notify.sh: decrypt_ralph_env changes, SEND_ERRORS initialization
ralph.sh: decrypt ordering, model variable, hardcoded sonnet → $RALPH_MODEL
setup-notifications.sh: removing local keywords outside functions

I should commit all of these together with a message that captures the fixes we made plus the pre-existing improvements like making the model configurable and reordering the decrypt calls.

git commit -m "$(cat <<'EOF'
Fix infinite loop in Discord notification and stream Claude output

- Fix notify.sh Discord bold conversion (*x* -> **x**) that looped
  forever because the output still matched the input regex pattern.
  Replace with single-pass left-to-right processing.
- Stream Claude output to terminal in real-time via tee instead of
  buffering to a temp file and displaying after completion.
- Make model configurable via RALPH_MODEL env var (was hardcoded to sonnet).
- Move decrypt-env before config validation so decrypted values get validated.
- Fix 'local' keyword used outside functions in setup-notifications.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"

Committed as 6a520a1. Working tree is clean. One commit ahead of origin if you want to push.

Can you put up a PR for this change?

The remote is aaron777collins/portableralph - this isn't the user's own repo, it's a fork situation or they're a contributor. I need to create a branch, push it, and create a PR. Let me create a feature branch from the current commit.

Wait - the current commit is on master directly. I need to create a branch from the commit, push it, and create a PR against master.

Actually, let me check if the user has a fork or if this is the upstream. The user's name is "jc" based on the system, and the repo is "aaron777collins/portableralph". So this is likely not their repo. They might need to fork first, or they might already have push access.

I'll just try creating a branch and pushing it - if permissions are an issue, we can figure that out then.

I'll create a branch and push it for the PR.

Git is configured for SSH but the remote uses HTTPS. Let me fix that.

You don't have push access to aaron777collins/portableralph. Let me fork it and push there instead.

gh pr create --repo aaron777collins/portableralph --head thebigjc:fix/discord-infinite-loop-and-streaming --base master --title "Fix infinite loop in Discord notification and stream Claude output" --body "$(cat <<'EOF'
## Summary
- **Fix infinite loop in `notify.sh`**: The Discord bold formatting conversion (`*text*` → `**text**`) used a `while` loop where the output still matched the input regex, causing 100% CPU spin forever. Replaced with single-pass left-to-right string processing.
- **Stream Claude output in real-time**: Changed `ralph.sh` to pipe Claude's stdout through `tee` instead of buffering to a temp file, so users see incremental output while Claude works.
- **Make model configurable**: Use `$RALPH_MODEL` env var instead of hardcoded `sonnet`.
- **Fix decrypt ordering**: Move `decrypt-env` before config validation so decrypted values are validated.
- **Fix `local` outside functions**: Remove invalid `local` keywords in `setup-notifications.sh`.

## Test plan
- [ ] Run `ralph notify test` with Discord configured — verify notification sends without hanging
- [ ] Run `ralph plan.md plan` — verify Claude output streams to terminal in real-time
- [ ] Run with `RALPH_MODEL=claude-opus-4-6 ralph plan.md plan` — verify model override works
- [ ] Run `ralph notify setup` — verify setup completes without bash errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"

PR created: https://github.com/aaron777collins/portableralph/pull/4

Could you update the README for the new model parameter we added, and add it to the PR?

Searched files: README*

Read README.md

I need to find where to add the model configuration info. Looking at the README, I see there's a "Configuration" subsection under Windows Support that mentions ~/.ralph.env, and there's a usage section. I think the best place is to add a "Configuration" section at the top level (not under Windows) that documents RALPH_MODEL and other config options, or add it near the usage section.

Actually, looking at the existing structure, there's no top-level "Configuration" section. The config file (~/.ralph.env) is mentioned in Windows Support > Configuration. The auto-commit config is shown via ralph config commit on/off but not documented in the README.

I'll add a Configuration section right after the Usage examples to document the model setting and config file, keeping it brief and focused.

I'll add a Configuration section documenting the model parameter. The natural place is after the Usage/Examples section and before Plan File Format.

Done. The PR at https://github.com/aaron777collins/portableralph/pull/4 now includes the README update with a new Configuration section documenting RALPH_MODEL and RALPH_AUTO_COMMIT.

Do we think our non-streaming version is still running?

Yep, still running. Ralph (942971) and its Claude process (943048) are both alive. Claude has accumulated 25s of CPU time and is using ~257MB memory — still actively working on the plan.

I restarted it and I don't see any incremental output

I see - claude -p (print mode) outputs the final result, not streaming. The --verbose flag might add some logging to stderr, but the main output is still buffered.

The issue is that claude -p mode doesn't stream output incrementally - it waits for the full response and then prints it. The tee won't help because there's nothing to stream until Claude is done.

To get streaming output, we'd need --output-format stream-json which gives real-time streaming. But that outputs JSON chunks, not plain text.

Actually wait - let me re-think. claude -p does produce text output. The question is whether it streams it line by line as it's generated or buffers the whole thing. Looking at the help, --output-format has a stream-json option for "realtime streaming". The default text format likely buffers until complete.

So the real question is whether claude -p with text output actually streams line-by-line or if it's block-buffered when piped through tee. Some CLI tools do stream incrementally, but it might depend on whether stdout is a TTY or a pipe.

The buffering issue could be solved with stdbuf -oL to force line buffering, or I could use --output-format stream-json which is explicitly designed for streaming output. Actually, the simplest approach might be to skip tee entirely and instead use output redirection with stdbuf to maintain line buffering while writing to both the terminal and a file.

But wait—I should verify whether claude -p actually streams text output at all. The existence of a separate --output-format stream-json option suggests that the default "text" format might be buffered rather than streamed. If that's the case, I'd need to either use the stream-json format and parse it for display, or accept that -p mode doesn't stream and find a different solution. Though many CLI tools do stream text output even in non-interactive mode—the buffering issue is usually at the OS level when stdout is piped, causing libc to switch from line buffering to 4KB block buffering. This means you won't see output until 4KB of data accumulates.

The fix for that is stdbuf -oL to force line-buffering, or stdbuf -o0 for unbuffered.

Let me try that approach: stdbuf -oL claude -p ... | tee ...

But actually, it might be that claude -p itself buffers output until done. In that case, no amount of stdbuf will help.

Let me suggest using --output-format stream-json which is explicitly designed for streaming. We'd need to parse the JSON though.

Hmm, but that's complex. Let me think about what the user actually wants. They want to see Claude's thinking/output as it happens, like they do in interactive mode. In -p mode, this might not be possible with text format.

I think the simplest approach is to try stdbuf -oL first and see if that's the issue, but if the problem is that claude -p itself buffers text output, we'd need --output-format stream-json and parse the JSON chunks to extract the actual content.

The problem is that claude -p in text mode likely buffers all output until completion — tee can't stream what isn't being written yet. Let me check.

Still 0 bytes — confirms claude -p buffers everything until done. tee has nothing to stream.

The CLI has --output-format stream-json which does stream in real-time, but it outputs JSON chunks, not plain text. We'd need to parse them. Let me check what the JSON stream looks like.

Read claude_stream_test.txt

Now I can see the stream-json format. Each line is a JSON object with a type field. The key types are:

system - init, hooks
assistant - the model's response messages
result - final result

For streaming, the assistant messages would come as they're generated. In this simple test, there was only one message, but for a longer conversation with tool use, there would be multiple messages.

The issue is that to get incremental text output, we need to:

Use --output-format stream-json
Parse the JSON stream line by line
Extract text content from assistant messages
Display it while also capturing the full output

I could use jq to extract text from assistant messages and tee to capture the raw JSON simultaneously, but that's getting complicated. A simpler approach would be to pipe the stream through a filter that extracts the text while preserving the full JSON for error detection.

Now I'm wondering about the actual structure of stream-json output: specifically whether it streams individual tokens or complete messages, and how the --include-partial-messages flag affects this. The documentation suggests that without this flag, stream-json only emits finished messages, but I need to verify what event types actually contain the streaming text.

Switching to stream-json means the output file will contain JSON instead of plain text, which breaks the current error detection logic in ralph.sh. I need a simpler approach that captures the raw JSON stream while still extracting and displaying the relevant content (tool use and text) to the terminal in real time, then parse the result for errors afterward.

Writing the bash pipeline... Filtering the output stream...

Actually, I should step back here. The user wants to see Claude's work in action—the tool calls and text responses. The simplest approach is to use --output-format stream-json with tee to save everything, then pipe through a filter to extract just the text. If jq is available, great; otherwise I can fall back to basic text processing. But this is becoming a substantial feature, so maybe I should present the options and let them decide how involved they want to get.

Wait, I need to reconsider the ralph.sh integration. The script currently checks claude_output_file and claude_error_file for results and errors. If I switch to stream-json, the output becomes JSON lines instead of plain text, which means the error detection and result extraction logic would need updating. The result object contains both the final text and exit status, so I'd need to parse that differently. Though the RALPH_DONE check reads from a progress file rather than Claude's output, so that part stays intact.

The real solution here is straightforward: use stream-json format to get events as they happen, parse the JSON with jq to display text content and tool names to the user, and extract the is_error field from the result line for error detection instead of checking the output file.

Since the user already asked me to fix the streaming output, I should just implement this. Let me look at what needs to change in the current code that pipes Claude's output through tee and then examine the error detection logic that follows.

Updating the command and error handling...

I'm realizing the implementation is straightforward: pipe the streaming JSON output through jq to extract and display the text content in real-time, while error detection remains unchanged since it only relies on exit codes and stderr. The main consideration is handling systems without jq installed. I'll add a fallback that gracefully degrades to showing raw output if the tool isn't available.

Writing the bash implementation...

I need to verify the exact JSON structure for tool use in stream-json format since my test case didn't include any tool calls, so I'm not entirely sure how those messages are formatted in the streaming output. I'm realizing the key difference: without --include-partial-messages, we get complete messages after each tool call, so we'd see output in chunks tied to tool execution. With it enabled, we get streaming text as Claude generates it, which is better for real-time feedback. For the best user experience with long responses, I should use the flag to show progressive output.

But that means handling cumulative text carefully to avoid displaying duplicates. Given the complexity, I think the pragmatic approach is to start with the simpler version: just using stream-json without the flag. This already gives us visibility into tool use and text between calls, which is a significant improvement. Let me implement that first and keep things straightforward.

I'm concerned about error handling with jq in the pipeline: if it fails, I might lose the Claude exit code detection. I could use PIPESTATUS to track individual command failures, but that's bash-specific and adds complexity. Let me stick with the current approach and handle errors as they come up.

Still working through the pipeline exit codes...

I'm going back to the simple pipeline approach with pipefail enabled: if either claude or jq fails, I'll catch the non-zero exit code, and that's good enough for error detection.

Actually, I can simplify this even further by using grep to filter the streaming output instead of jq, then parsing each line to extract just the text content for display.

Wait, that's getting too complicated. Let me stick with the jq approach—it's cleaner and handles the JSON parsing properly. I'll pipe the stream through jq with the --unbuffered flag to filter for assistant messages and extract the text content. If jq isn't installed, the raw output is still saved to the file anyway, so it's not a big loss.

Actually, I should check if there's a simpler result type that gives me the final assembled text instead of having to parse individual content blocks.

Building the bash implementation...

Right — claude -p with text output buffers everything until completion. The tee has nothing to stream.

The fix is to use --output-format stream-json, which emits JSON lines in real-time as Claude works, then filter through jq to display the text content. Let me update the implementation.
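A minimal sketch of that pipeline is below. It is an illustration and not the ralph.sh change itself. `show_assistant_text` and `sample_stream` are hypothetical names, the sample events stand in for real `claude -p --output-format stream-json` output, and the field layout (`.message.content[]` holding blocks with `type` and `text`) is an assumed schema to verify against a captured stream.

```bash
#!/usr/bin/env bash

# Print only the text blocks of assistant events. The field names used
# here are an assumed schema, not taken from this log.
show_assistant_text() {
    if command -v jq >/dev/null 2>&1; then
        jq -r --unbuffered '
            select(.type == "assistant")
            | .message.content[]?
            | select(.type == "text")
            | .text
        '
    else
        cat >/dev/null   # no jq: nothing is shown live, raw stream is still saved
    fi
}

# Stand-in for the real CLI call: three sample JSON-lines events.
sample_stream() {
    printf '%s\n' \
        '{"type":"system","subtype":"init"}' \
        '{"type":"assistant","message":{"content":[{"type":"text","text":"Reading the plan."}]}}' \
        '{"type":"result","is_error":false,"result":"done"}'
}

raw_file="$(mktemp)"
# tee keeps the full JSON stream; jq shows only the readable text.
sample_stream | tee "$raw_file" | show_assistant_text
```

Keeping the raw stream in a file means the final `result` event, including its `is_error` field, remains available after the run even though only the text is displayed.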

Read ralph.sh (10 lines)

Now let me verify the error detection code still works — it uses stderr and exit code, not stdout, so it should be fine.

Read ralph.sh (25 lines)

Good — error detection uses claude_error_file (stderr) and exit code only, unaffected by the format change.

Let me also verify jq is available on the system.

Good. Now let me amend this into the PR.

git add ralph.sh && git commit -m "$(cat <<'EOF'
fix: Use stream-json for real-time Claude output streaming

claude -p with text output buffers everything until completion, so
tee had nothing to stream. Switch to --output-format stream-json
which emits JSON events in real-time, then pipe through jq to
extract and display text content as Claude works. Falls back to
buffered output if jq is not available.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EOF
)"

Pushed. The PR now uses --output-format stream-json which emits JSON lines in real-time as Claude works, piped through jq to display just the text content. If jq isn't installed, it falls back to buffered output.

You can test it on your next Ralph run — kill the current one and restart if you want to see it now.

The streaming seems to be working, but the output is very sporadic. Are we filtering the JSON in some way?

RALPH - Autonomous AI Development Loop
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Plan: plan.md
Mode: plan
Progress: plan_PROGRESS.md
Model: claude-opus-4-6
Commit: enabled
Notify: Discord
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Exit conditions:

Planning completes when task list is created (Status: IN_PROGRESS)
Plan mode runs once then exits automatically
Press Ctrl+C to stop manually

Lock acquired: /tmp/ralph_9b32b1156d685b82ade4dd4698c62e4d.lock

══════════════════ ITERATION 1 ══════════════════

I'll start by reading the plan file and progress file, then explore the codebase to understand what exists.
Now let me read the ROADMAP.md and explore the Rust codebase structure in parallel.

The user is seeing only the text content from assistant messages, but it's sporadic because we're only showing text from complete assistant messages (not partial/streaming). Between each text output, Claude is doing tool calls (reading files, searching code, etc.) which don't show any output with our current filter.

The issue is:

We only filter for type == "assistant" messages with text content
We don't show tool use activity (what tools Claude is using)
Without --include-partial-messages, we only see text after each complete turn

We should also show tool use activity so the user can see Claude is working. Let me update the jq filter to also show tool use events. And we should add --include-partial-messages to get streaming text as it's generated rather than only after each turn completes.

I need to update the jq filter to display tool use names alongside text content, and add the flag for real-time streaming. But I'm realizing that with --include-partial-messages, each message contains the full cumulative text generated so far rather than just the new delta, which would cause duplication issues unless I track what's already been displayed. Let me think through whether that's actually how it works.

Looking at the stream format, I can extract tool use activity from the assistant message content blocks to show the user what Claude is doing in real-time. I should also consider whether to display the result type alongside the tool names, and I need to understand how the different event types are structured in the stream-json output.

Refining the jq filter to capture tool names and their inputs, though I'll need to be careful about input size since some operations like file writes could produce massive payloads. Let me simplify to just show the tool names for clarity.

Updating the filter...

Actually, I can make this smarter by extracting relevant context from each tool's input—file paths for Read operations, patterns for Grep, command snippets for shell execution—so the output shows meaningful details without bloating the display.

Yes — the jq filter only shows text from assistant messages, so you see nothing while Claude is reading files, searching code, etc. (which is most of the work during planning). Let me update the filter to also show tool activity.

Read ralph.sh (20 lines)

Now you'll see tool activity in blue between the text output, like:

I'll explore the codebase structure.
  → Read: /path/to/src/main.rs
  → Grep /some_pattern/
  → Bash: cargo test
Now let me analyze the results...
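A filter producing output along those lines might look like the sketch below. It is a simplified illustration, not the filter committed to ralph.sh: it omits the color and uses one uniform `name: detail` format. The function name `show_activity`, the sample events, and the input field names (`file_path`, `pattern`, `command`, `description`) are assumptions about the stream-json schema.

```bash
#!/usr/bin/env bash

# Show assistant text plus a one-line summary of each tool call.
# Requires jq; field names are an assumed schema.
show_activity() {
    jq -r --unbuffered '
        select(.type == "assistant")
        | .message.content[]?
        | if .type == "text" then
            .text
          elif .type == "tool_use" then
            # Pick one short, human-readable detail from the tool input.
            (.input.file_path // .input.pattern // .input.command
              // .input.description // "" | tostring | .[0:80]) as $detail
            | "  → \(.name): \($detail)"
          else
            empty
          end
    '
}

# Stand-in events for illustration only.
sample_events() {
    printf '%s\n' \
        '{"type":"assistant","message":{"content":[{"type":"text","text":"Exploring the codebase."}]}}' \
        '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{"file_path":"/path/to/src/main.rs"}}]}}' \
        '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{"command":"cargo test"}}]}}'
}

sample_events | show_activity
```

Truncating the detail to 80 characters keeps large inputs, such as file contents passed to a write tool, from flooding the terminal.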

Want to restart Ralph to test it, or should I commit and push to the PR first?

That's much better

Lock acquired: /tmp/ralph_9b32b1156d685b82ade4dd4698c62e4d.lock

══════════════════ ITERATION 1 ══════════════════

I'll start by reading the plan file and progress file, then explore the codebase to understand what exists.
Now let me read the ROADMAP.md and explore the Rust codebase structure in parallel.
^CError: Claude CLI error at iteration 1 (attempt 1/3): interrupted by user (Ctrl+C)
Non-retryable error detected: interrupted by user (Ctrl+C)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Claude CLI Error (after 1 attempts): interrupted by user (Ctrl+C)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Error: Stopping Ralph due to Claude CLI failure at iteration 1 after 1 attempts
jc@Gizmo:~/src/mage$ RALPH_MODEL=claude-opus-4-6 ralph plan.md plan

Exit conditions:

Planning completes when task list is created (Status: IN_PROGRESS)
Plan mode runs once then exits automatically
Press Ctrl+C to stop manually

Lock acquired: /tmp/ralph_9b32b1156d685b82ade4dd4698c62e4d.lock

══════════════════ ITERATION 1 ══════════════════

I'll start by reading the plan and progress files, then explore the codebase to understand what exists.
→ Read: /mnt/dest-btrfs/home/jc/src/mage/plan.md
→ Read: /mnt/dest-btrfs/home/jc/src/mage/plan_PROGRESS.md
Now let me read the ROADMAP and explore the codebase structure to understand what exists and what's needed.
→ Task: Read the file /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/ROADMAP.md in its entirety
→ Task: I need to understand the current state of the ECL (Eclogue) set implementation.
→ Task: I need to understand the current engine capabilities and test coverage. Do the f
→ Task: Search across all card set files in /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/ROADMAP.md
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/src/sets/ecl.rs
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/abilities.rs
→ Bash: ls -la /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/
→ Bash: find /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/src/sets -name "*.rs" | h
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/game.rs
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/lib.rs
→ Bash: ls -la /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-tests/src/
→ Bash: ls -la /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-tests/tests/
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/game.rs
→ Grep /register_cards/
→ Grep /Effect::Custom(|StaticEffect::Custom(/
→ Grep /Effect::Custom/
→ Grep /StaticEffect::Custom/
→ Grep /todo!|unimplemented!/
→ Grep /execute_effect|fn.*effect/
→ Bash: cd /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/src/sets && grep -r "Effect
→ Bash: cd /mnt/dest-btrfs/home/jc/src/mage/mtg-rl && cargo check -p mtg-engine 2>&1 | t
→ Grep /pub fn register_cards/
→ Bash: cd /mnt/dest-btrfs/home/jc/src/mage/mtg-rl && cargo check -p mtg-cards 2>&1 | ta
→ Bash: cd /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/src/sets && grep -rho 'Effe
→ Grep /register/
→ Bash: cd /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/src/sets && grep -ro 'Effec
→ Grep /registry.register(/
→ Grep /match.effect.{|Effect::/
→ Read: /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-engine/src/abilities.rs
→ Bash: grep -c 'registry.register(' /mnt/dest-btrfs/home/jc/src/mage/mtg-rl/mtg-cards/

Let's update the PR. Also update the PR description to show a neutral (non-specific) version of the streaming output.

Now let me update the PR description.

gh pr edit 4 --repo aaron777collins/portableralph --body "$(cat <<'EOF'
## Summary
- **Fix infinite loop in `notify.sh`**: The Discord bold formatting conversion (`*text*` → `**text**`) used a `while` loop where the output still matched the input regex, causing 100% CPU spin forever. Replaced with single-pass left-to-right string processing.
- **Stream Claude output in real-time**: Changed `ralph.sh` to use `--output-format stream-json` and pipe through `jq` to display text and tool activity as Claude works, instead of buffering all output until completion.
- **Make model configurable**: Use `$RALPH_MODEL` env var instead of hardcoded `sonnet`. Document in README.
- **Fix decrypt ordering**: Move `decrypt-env` before config validation so decrypted values are validated.
- **Fix `local` outside functions**: Remove invalid `local` keywords in `setup-notifications.sh`.

### Streaming output example

```
══════════════════ ITERATION 1 ══════════════════

I'll start by reading the plan and progress files, then explore the codebase.
  → Read: /home/user/project/plan.md
  → Read: /home/user/project/plan_PROGRESS.md
Now let me explore the codebase structure to understand what's needed.
  → Task: Explore the project structure and dependencies
  → Read: /home/user/project/src/lib.rs
  → Grep /register_cards/
  → Bash: cargo check -p my-crate 2>&1 | tail -20
Based on my analysis, here are the implementation tasks...
  → Edit: /home/user/project/plan_PROGRESS.md
```

## Test plan
- [ ] Run `ralph notify test` with Discord configured — verify notification sends without hanging
- [ ] Run `ralph plan.md plan` — verify streaming output shows text and tool activity in real-time
- [ ] Run with `RALPH_MODEL=claude-opus-4-6 ralph plan.md plan` — verify model override works
- [ ] Run `ralph notify setup` — verify setup completes without bash errors
- [ ] Run without `jq` installed — verify graceful fallback to buffered output

🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"

Done. PR updated at https://github.com/aaron777collins/portableralph/pull/4 with the new streaming commit and a generic example showing what the output looks like.
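
As a side note, the single-pass idea behind the `notify.sh` fix can be illustrated in Python. This is a sketch of the technique, not the bash code in the PR: the name `to_discord_bold` is hypothetical, and the lookarounds are my own choice for skipping text that is already bold.

```python
import re

# Match *text* only when the asterisks are not already part of **text**.
SINGLE_STAR = re.compile(r"(?<!\*)\*([^*\n]+)\*(?!\*)")

def to_discord_bold(message):
    """Convert *text* to **text** in one left-to-right pass.

    re.sub never rescans its own output, so the replacement cannot
    re-trigger the pattern the way the original while loop did.
    """
    return SINGLE_STAR.sub(r"**\1**", message)
```

Because already-bold spans are skipped, running the function twice gives the same result as running it once.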

Read the code in this repo (https://github.com/iannuttall/ralph) and compare it to our program. Does it have substantial features we might want?

I now have a comprehensive understanding of the entire iannuttall/ralph codebase. Here is the full feature inventory.

---

# Comprehensive Feature Inventory: iannuttall/ralph

**Repository:** https://github.com/iannuttall/ralph
**Description:** A minimal, file-based agent loop for autonomous coding.
**Version:** 0.1.3 (npm package: `@iannuttall/ralph`)
**License:** MIT

---

## 1. CORE ARCHITECTURE

### 1.1 File-Based Agent Loop
- **What it does:** Treats files and git as memory rather than model context. Each iteration starts fresh, reads on-disk state, and commits work for one story at a time.
- **How:** The main loop (`loop.sh`) is a bash script that iterates up to `MAX_ITERATIONS` (default 25), selecting one story per iteration, rendering a prompt, invoking an agent CLI, and processing the result. The `while`/`for` loop with `sleep 2` between iterations drives the cycle.

### 1.2 Context Engineering (malloc/free metaphor)
- **What it does:** Prevents context pollution by starting each agent invocation from scratch. References a formal "Context Engineering" document explaining why single-task fresh context is superior.
- **How:** Each loop iteration renders a fresh prompt file and pipes/passes it to the agent CLI. No conversation history carries over between iterations. The reference document (`.agents/ralph/references/CONTEXT_ENGINEERING.md`) provides the theoretical framework.

### 1.3 Template Hierarchy
- **What it does:** Allows project-local overrides of the default templates.
- **How:** The CLI checks for `.agents/ralph/` in the current project first; falls back to bundled defaults shipped with the npm package. Reported to the user at startup ("Using local templates" vs "Using bundled templates").

### 1.4 State Separation
- **What it does:** Separates portable configuration (`.agents/ralph/`) from per-project runtime state (`.ralph/`).
- **How:** `.agents/ralph/` contains prompts, config, and references (copyable between repos). `.ralph/` contains progress logs, guardrails, error logs, activity logs, run logs, and temp files (gitignored, project-specific).

---

## 2. PRD (PRODUCT REQUIREMENTS DOCUMENT) SYSTEM

### 2.1 Interactive PRD Generation
- **What it does:** Generates a structured JSON PRD from a natural language description via an AI agent.
- **How:** `ralph prd "description"` or interactive prompt via `@clack/prompts`. Constructs a prompt instructing the agent to ask 5-10 clarifying questions, then produce JSON output. Supports `--out <path>` for custom output paths.

### 2.2 JSON PRD Format
- **What it does:** Defines a machine-readable project specification with stories, dependencies, quality gates, and metadata.
- **How:** JSON schema includes `version`, `project`, `overview`, `goals`, `nonGoals`, `successMetrics`, `openQuestions`, `stack`, `routes`, `uiNotes`, `dataModel`, `importFormat`, `rules`, `qualityGates`, and `stories[]`. Each story has `id`, `title`, `status`, `dependsOn`, `description`, `acceptanceCriteria`.

### 2.3 PRD Multi-File Support
- **What it does:** Supports multiple PRD files in `.agents/tasks/` with interactive selection.
- **How:** If multiple JSON files exist in `.agents/tasks/` and no `--prd` flag is given, uses `@clack/prompts` `select()` to let the user choose. Single file is auto-selected.

### 2.4 PRD Overview Generation
- **What it does:** Generates a human-readable overview markdown file from a JSON PRD.
- **How:** `ralph overview` reads the PRD JSON, counts stories by status (open/in_progress/done), lists quality gates, and writes a `prd-<slug>.overview.md` alongside the JSON file.

### 2.5 PRD Skill (Agent Prompt)
- **What it does:** Provides detailed instructions to the AI agent on how to create a good PRD.
- **How:** The `skills/prd/SKILL.md` instructs the agent to ask clarifying questions in batches of 5, always ask about quality gates, produce JSON with sequential US-XXX IDs, ensure stories are small enough for single iterations, include examples and negative cases in acceptance criteria.

---

## 3. STORY MANAGEMENT

### 3.1 Automatic Story Selection
- **What it does:** Picks the next actionable story based on status and dependency resolution.
- **How:** Python script (`select_story()`) reads the PRD JSON, finds the first `open` story whose `dependsOn` stories are all `done`. Uses `fcntl.flock()` for file locking to support concurrent access.

### 3.2 Story Status Lifecycle
- **What it does:** Tracks story states: `open` -> `in_progress` -> `done`.
- **How:** When selected, a story is marked `in_progress` with `startedAt` timestamp. On completion (agent outputs `<promise>COMPLETE</promise>`), marked `done` with `completedAt`. On failure or no completion signal, reset to `open`. All status updates include `updatedAt` timestamps.

### 3.3 Dependency Resolution
- **What it does:** Respects inter-story dependencies so stories execute in the correct order.
- **How:** Each story has a `dependsOn` array of story IDs. A story is only eligible for selection if all its dependencies have status `done`.
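
A minimal sketch of the selection rule described in 3.1 and 3.3, assuming the story fields listed in 2.2 (`id`, `status`, `dependsOn`). It is an illustration, not the repository's actual `select_story()` code.

```python
def select_story(stories):
    """Return the first open story whose dependencies are all done, else None."""
    done = {s["id"] for s in stories if s.get("status") == "done"}
    for story in stories:
        if story.get("status") != "open":
            continue
        # A story with no dependsOn list is always eligible.
        if all(dep in done for dep in story.get("dependsOn", [])):
            return story
    return None
```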

### 3.4 Stale Story Recovery
- **What it does:** Automatically reopens stories stuck in `in_progress` for too long (e.g., from crashed loops).
- **How:** Configurable `STALE_SECONDS` (default 0 = disabled). If set, the `select_story()` Python script checks `startedAt` timestamps and resets stale `in_progress` stories to `open`.
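
A sketch of the stale check under stated assumptions: `startedAt` is taken to be an ISO 8601 timestamp with a UTC offset (the actual format is not shown here), and `reopen_stale` is a hypothetical name.

```python
from datetime import datetime, timezone

def reopen_stale(stories, stale_seconds, now):
    """Reset in_progress stories older than stale_seconds to open.

    Returns the IDs that were reopened. A value of 0 disables the check,
    matching the documented default.
    """
    if stale_seconds <= 0:
        return []
    reopened = []
    for story in stories:
        if story.get("status") != "in_progress":
            continue
        started = story.get("startedAt")
        if not started:
            continue
        # Assumes an offset-aware ISO 8601 string such as 2025-01-01T00:00:00+00:00.
        age = (now - datetime.fromisoformat(started)).total_seconds()
        if age > stale_seconds:
            story["status"] = "open"
            reopened.append(story["id"])
    return reopened
```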

### 3.5 Concurrency Protection (File Locking)
- **What it does:** Prevents race conditions when multiple Ralph instances access the same PRD.
- **How:** Uses `fcntl.flock()` (exclusive lock) in both `select_story()` and `update_story_status()` Python functions when reading/writing the PRD JSON file. Falls back gracefully if `fcntl` is unavailable.
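
The locking pattern can be sketched as follows. The function name mirrors the one mentioned above, but the body is an illustration, not the repository's code. It assumes the PRD keeps its stories under a top-level `stories` key, as in 2.2.

```python
import json

try:
    import fcntl  # Unix only
except ImportError:
    fcntl = None  # fall back to unlocked access, as the description suggests

def update_story_status(prd_path, story_id, status):
    """Set one story's status while holding an exclusive lock on the PRD file."""
    with open(prd_path, "r+", encoding="utf-8") as f:
        if fcntl is not None:
            fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            prd = json.load(f)
            found = False
            for story in prd.get("stories", []):
                if story.get("id") == story_id:
                    story["status"] = status
                    found = True
            if found:
                # Rewrite in place so the lock covers the read and the write.
                f.seek(0)
                json.dump(prd, f, indent=2)
                f.truncate()
            return found
        finally:
            if fcntl is not None:
                fcntl.flock(f, fcntl.LOCK_UN)
```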

### 3.6 Completion Signal
- **What it does:** Determines whether an agent run successfully completed a story.
- **How:** The loop checks the agent's output log for the literal string `<promise>COMPLETE</promise>`. If found, story is marked `done`; otherwise, it is reset to `open`.
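
A small sketch tying the completion signal to the status lifecycle from 3.2. `finish_story` is a hypothetical helper, and the timestamp is passed in as a string because the real timestamp format is not shown here.

```python
COMPLETE_MARKER = "<promise>COMPLETE</promise>"

def finish_story(story, agent_output, now_iso):
    """Mark the story done if the marker appears in the output, else reopen it."""
    if COMPLETE_MARKER in agent_output:
        story["status"] = "done"
        story["completedAt"] = now_iso
    else:
        story["status"] = "open"
    story["updatedAt"] = now_iso
    return story["status"]
```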

---

## 4. AGENT RUNNER SYSTEM

### 4.1 Multi-Agent Support
- **What it does:** Supports four different AI coding agents: Codex, Claude, Droid (Factory), and OpenCode.
- **How:** `agents.sh` defines both headless and interactive command templates for each agent. The `--agent` CLI flag or `DEFAULT_AGENT` config selects which to use.

### 4.2 Agent Command Templates
- **What it does:** Provides two modes per agent: headless (for build loops) and interactive (for PRD generation).
- **How:** Headless commands pipe/pass prompts via stdin or file. Interactive commands open the agent in a session mode. Templates use `{prompt}` placeholder for file paths.
  - **Codex:** `codex exec --yolo --skip-git-repo-check -` (headless) / `codex --yolo {prompt}` (interactive)
  - **Claude:** `claude -p --dangerously-skip-permissions "$(cat {prompt})"` (headless) / `claude --dangerously-skip-permissions {prompt}` (interactive)
  - **Droid:** `droid exec --skip-permissions-unsafe -f {prompt}` (headless) / `droid --skip-permissions-unsafe {prompt}` (interactive)
  - **OpenCode:** `opencode run "$(cat {prompt})"` (headless) / `opencode --prompt {prompt}` (interactive)

### 4.3 Agent Health Check (Ping)
- **What it does:** Verifies an agent CLI is installed and responsive.
- **How:** `ralph ping` sends the prompt "Reply with `<end>pong</end>` only." to the selected agent and checks the output for that exact string. Validates the agent binary exists first via `command -v`.

### 4.4 Agent Install Hints
- **What it does:** Provides installation instructions when an agent CLI is not found.
- **How:** When `require_agent()` fails to find the agent binary, it prints the appropriate install command (e.g., `npm i -g @openai/codex`, `curl -fsSL https://claude.ai/install.sh | bash`).

### 4.5 OpenCode Server Mode
- **What it does:** Supports faster performance with OpenCode via persistent server.
- **How:** Commented-out configuration in `agents.sh` for `--attach http://localhost:4096` flag, avoiding cold boot on every run.

---

## 5. PROMPT ENGINEERING

### 5.1 Build Prompt Template
- **What it does:** Constructs a detailed, structured prompt for each agent iteration with all necessary context.
- **How:** `PROMPT_build.md` is a Mustache-style template with `{{VAR}}` placeholders. The `render_prompt()` function (Python) substitutes paths, story details, quality gates, run metadata, and configuration flags into the template.
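
A minimal sketch of `{{VAR}}` substitution. The real `render_prompt()` signature is not shown here, so the two-argument form and the choice to leave unknown placeholders untouched are assumptions.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render_prompt(template, values):
    """Replace each {{NAME}} with values[NAME]; unknown names are left as-is."""
    def replace(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))
    return PLACEHOLDER.sub(replace, template)
```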

### 5.2 Injected Context
- **What it does:** Provides the agent with all the files and paths it needs to operate autonomously.
- **How:** The rendered prompt includes paths to: PRD, AGENTS.md, progress log, guardrails, guardrails reference, context engineering reference, error log, activity log, activity logger command, run ID, iteration number, run log path, and run summary path.

### 5.3 Structured Task Instructions
- **What it does:** Gives the agent a strict ordered workflow to follow.
- **How:** The prompt defines an 11-step procedure: read guardrails, read error log, read PRD, audit codebase, follow AGENTS.md, implement story, run verification, run build/dev, perform security/performance/regression audit, commit (if allowed), append progress entry.

### 5.4 Progress Entry Format
- **What it does:** Standardizes how the agent records its work.
- **How:** A strict markdown template for progress entries including: thread ID, run/iteration info, guardrails reviewed flag, commit hash, verification commands and results, files changed, implementation notes, and learnings for future iterations.

### 5.5 Non-Negotiable Rules
- **What it does:** Constrains agent behavior to prevent scope creep and common mistakes.
- **How:** Explicit rules in the prompt: implement only the selected story, do not ask questions, do not change unrelated code, do not assume -- read code first, no placeholders/stubs, do not edit the PRD JSON, commit all changes, perform security/performance/regression review before committing.

### 5.6 Browser Testing Requirement
- **What it does:** Mandates browser verification for frontend stories.
- **How:** The build prompt instructs: if the story changes UI, load the `dev-browser` skill, navigate to the page, verify changes, take screenshots. "A frontend story is NOT complete until browser verification passes."

---

## 6. GUARDRAILS SYSTEM ("Signs")

### 6.1 Persistent Guardrails
- **What it does:** Maintains a file of "lessons learned" that is read before every iteration.
- **How:** `.ralph/guardrails.md` is created with seed content (core signs: "Read Before Writing", "Test Before Commit") and loaded into every agent prompt. Agents are instructed to add new signs when they encounter repeated failures.

### 6.2 Signs Metaphor
- **What it does:** Provides a framework for categorizing and creating guardrails.
- **How:** Reference document (`.agents/ralph/references/GUARDRAILS.md`) defines four sign types: Preventive, Corrective, Process, and Architecture. Each sign has Trigger, Instruction, Added-after, and optional Example fields.

### 6.3 Sign Lifecycle Management
- **What it does:** Guides creation, refinement, and retirement of guardrails.
- **How:** The reference document explains: add after same mistake happens twice, refine if not working (make trigger more specific), retire when underlying issue is fixed at a deeper level.

### 6.4 Error Log Integration
- **What it does:** Tracks repeated failures separately from guardrails.
- **How:** `.ralph/errors.log` records iteration failures with timestamps. The agent is instructed to read this before starting work and to add new entries when errors occur. Persistent errors should be promoted to guardrails.

---

## 7. LOGGING AND OBSERVABILITY

### 7.1 Activity Log
- **What it does:** Records timestamped events for every iteration.
- **How:** `.ralph/activity.log` with `[YYYY-MM-DD HH:MM:SS] message` format. Written by both the loop shell script (iteration start/end) and the agent (via the activity logger command). Has a "Run Summary" section for quick overview.

### 7.2 Activity Logger (Helper Script)
- **What it does:** Provides a simple CLI for agents to log activity.
- **How:** `log-activity.sh` (or `ralph log "message"` via CLI) appends timestamped messages to the activity log. Agents are instructed to log: start of work, major code changes, tests/verification, and progress log updates.
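
The log line format from 7.1 can be sketched like this. The real helper is a shell script, so these Python function names are hypothetical.

```python
from datetime import datetime

def format_activity(message, when=None):
    """Return a line in the '[YYYY-MM-DD HH:MM:SS] message' format."""
    when = when or datetime.now()
    return f"[{when.strftime('%Y-%m-%d %H:%M:%S')}] {message}"

def log_activity(path, message, when=None):
    """Append one formatted line to the activity log at path."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_activity(message, when) + "\n")
```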

### 7.3 Per-Run Log Files
- **What it does:** Captures complete agent output for each iteration.
- **How:** Agent stdout/stderr is piped through `tee` to `.ralph/runs/run-<tag>-iter-<n>.log`. Each run also gets a `.md` summary file.

### 7.4 Run Summary Metadata
- **What it does:** Records structured metadata about each iteration.
- **How:** `write_run_meta()` creates a markdown file with: Run ID, iteration number, mode, story, start/end times, duration, status, log file path, git head before/after, commits made, changed files, and uncommitted changes.

### 7.5 Run Summary Aggregation
- **What it does:** Appends a one-line summary to the activity log's "Run Summary" section.
- **How:** `append_run_summary()` (Python) inserts a formatted line (timestamp, run tag, iteration, mode, story, duration, status) right after the "## Run Summary" header.

### 7.6 Progress Log
- **What it does:** Append-only log of what was accomplished across iterations.
- **How:** `.ralph/progress.md` is initialized with a header and "Codebase Patterns" section. Each agent iteration appends a structured progress entry (per the template). Includes a "Learnings for future iterations" section for knowledge transfer between iterations.

---

## 8. GIT INTEGRATION

### 8.1 Auto-Commit Behavior
- **What it does:** Agents commit their own changes as part of the loop.
- **How:** The build prompt instructs agents to `git add -A` and commit using the `$commit` skill. The loop checks for uncommitted changes after each iteration and logs a warning if any remain.

### 8.2 No-Commit Mode
- **What it does:** Allows dry-run iterations without git commits.
- **How:** `--no-commit` CLI flag sets `NO_COMMIT=true`, injected into the prompt as `{{NO_COMMIT}}`. The prompt instructs: "If No-commit is true, do NOT commit or push changes."

### 8.3 Git Diff Tracking
- **What it does:** Records what changed in each iteration at the git level.
- **How:** `git_head()` captures HEAD before/after each iteration. `git_commit_list()` and `git_changed_files()` compute the diff. `git_dirty_files()` reports any uncommitted changes. All are recorded in the run summary.

### 8.4 Commit Skill
- **What it does:** Provides structured commit message guidelines (Conventional Commits format).
- **How:** `skills/commit/SKILL.md` defines: commit types (feat/fix/refactor/perf/test/ci/docs/chore/style/security), required scope in kebab-case, 50-char subject line limit, present tense imperative verbs, body format for complex changes, git trailers, breaking change conventions. Also covers PR creation format.

---

## 9. SKILLS SYSTEM

### 9.1 Skill Installation
- **What it does:** Installs agent-specific skills (commit, dev-browser, prd) to the appropriate directory.
- **How:** `ralph install --skills` prompts for agent type and scope (local/global), then copies skill directories to the correct path:
  - Codex: `.codex/skills/`
  - Claude: `.claude/skills/`
  - Droid: `.factory/skills/`
  - OpenCode: `.local/share/opencode/skills/` or `.opencode/skills/`

### 9.2 Dev-Browser Skill
- **What it does:** Provides browser automation for testing frontend changes.
- **How:** A full Playwright-based browser automation system with: standalone and extension modes, persistent page state, ARIA snapshot accessibility tree inspection, screenshot capture, named page management, heredoc script execution via `npx tsx`. Includes a relay server for Chrome extension mode.

### 9.3 PRD Skill
- **What it does:** Guides the AI agent in generating well-structured PRDs.
- **How:** Detailed SKILL.md with instructions for: asking clarifying questions, JSON schema specification, story sizing guidance, dependency ordering, quality gate inclusion, example/negative case requirements.

---

## 10. CLI FEATURES

### 10.1 Command Structure
- **What it does:** Provides a clean CLI with multiple commands.
- **How:** Node.js entry point (`bin/ralph`) parses args manually, supports: `install`, `prd`, `ping`, `log`, `build`, `overview`, `help`. Uses `@clack/prompts` for interactive UI elements.

### 10.2 Auto-Update Check
- **What it does:** Checks for newer versions on npm and offers to update.
- **How:** On every CLI invocation (if TTY), runs `npm view @iannuttall/ralph version`, compares with installed version using `compareVersions()`, and prompts to update if newer. Skippable via `RALPH_SKIP_UPDATE_CHECK=1`.
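
A sketch of a numeric version comparison. The behavior of the real `compareVersions()` is not shown here, so this assumes plain dotted numeric versions and ignores pre-release tags.

```python
def compare_versions(a, b):
    """Return -1, 0, or 1 when a is older than, equal to, or newer than b."""
    def parts(version):
        return [int(p) for p in version.lstrip("v").split(".")]
    pa, pb = parts(a), parts(b)
    # Pad the shorter list so "1.0" and "1.0.0" compare equal.
    length = max(len(pa), len(pb))
    pa += [0] * (length - len(pa))
    pb += [0] * (length - len(pb))
    return (pa > pb) - (pa < pb)
```

Comparing integers rather than strings matters here: as strings, `0.1.10` would sort before `0.1.3`.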

### 10.3 Interactive Prompts
- **What it does:** Provides polished interactive UI for selections and confirmations.
- **How:** Uses `@clack/prompts` library for: PRD file selection, update confirmation, skill installation agent/scope selection, PRD description input.

### 10.4 Dry-Run Mode
- **What it does:** Allows testing the loop without invoking real agents.
- **How:** `RALPH_DRY_RUN=1` environment variable skips agent execution, creates stub PRD files, and exits cleanly. Used in the test suite.

---

## 11. CONFIGURATION

### 11.1 Config File
- **What it does:** Allows overriding all default paths and settings.
- **How:** `.agents/ralph/config.sh` is sourced by `loop.sh`. Supports overriding: `PRD_PATH`, `PROGRESS_PATH`, `GUARDRAILS_PATH`, `ERRORS_LOG_PATH`, `ACTIVITY_LOG_PATH`, `TMP_DIR`, `RUNS_DIR`, `GUARDRAILS_REF`, `CONTEXT_REF`, `ACTIVITY_CMD`, `AGENT_CMD`, `MAX_ITERATIONS`, `NO_COMMIT`, `STALE_SECONDS`, `AGENTS_PATH`, `PROMPT_BUILD`.

### 11.2 CLI Flag Overrides
- **What it does:** Allows per-run overrides without changing config files.
- **How:** `--prd <path>`, `--progress <path>`, `--agent <name>`, `--no-commit` flags on the CLI. Environment variables (`PRD_PATH`, `PROGRESS_PATH`, `AGENT_CMD`) also override defaults.

### 11.3 Path Resolution
- **What it does:** Handles both relative and absolute paths consistently.
- **How:** `abs_path()` function resolves relative paths against `ROOT_DIR`. `RALPH_ROOT` environment variable allows the CLI to set the project root independently of the script location.
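
The resolution rule can be sketched in Python. The real `abs_path()` is a shell function, so this is an equivalent illustration on POSIX paths, not the actual code.

```python
import os

def abs_path(path, root_dir):
    """Return path unchanged if absolute, otherwise resolved against root_dir."""
    if os.path.isabs(path):
        return os.path.normpath(path)
    return os.path.normpath(os.path.join(root_dir, path))
```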

---

## 12. ERROR HANDLING

### 12.1 Agent Failure Recovery
- **What it does:** Handles agent crashes gracefully and resets story state.
- **How:** Agent exit code is captured via `set +e`. Non-zero exit resets the story to `open`, logs the error, and continues to the next iteration. Exit codes 130/143 (SIGINT/SIGTERM) cause immediate exit.
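
The exit-code handling described above can be sketched as a small decision function. `handle_agent_exit` and its return values are hypothetical. Whether the story is also reset on an interrupt is not stated, so this sketch leaves it untouched in that case.

```python
def handle_agent_exit(exit_code, story):
    """Decide what the loop should do after the agent exits.

    Returns "abort" for SIGINT/SIGTERM (130/143), "continue" after a
    failure (story reset to open), or "ok" on a clean exit.
    """
    if exit_code in (130, 143):
        return "abort"  # assumption: story state is left as-is on interrupt
    if exit_code != 0:
        story["status"] = "open"
        return "continue"
    return "ok"
```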

### 12.2 Uncommitted Change Detection
- **What it does:** Warns when an agent leaves uncommitted changes.
- **How:** After each iteration, `git_dirty_files()` checks `git status --porcelain`. If dirty files exist and `NO_COMMIT=false`, an error is logged to `errors.log` with a reference to the run summary.
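
A sketch of extracting file names from `git status --porcelain` output (two status characters, a space, then the path). It is an illustration, not the repository's `git_dirty_files()`, and it does not handle quoted paths.

```python
def dirty_files(porcelain_output):
    """Return the paths listed in `git status --porcelain` output."""
    files = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # skip the two status columns and the separator
        if " -> " in path:
            # Renames are shown as "old -> new"; keep the new name.
            path = path.split(" -> ", 1)[1]
        files.append(path)
    return files
```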

### 12.3 PRD Parse Error Handling
- **What it does:** Gracefully handles malformed PRD JSON.
- **How:** The `select_story()` Python script catches JSON parse errors and writes a structured error response. The loop checks the `ok` field and reports meaningful messages.

### 12.4 Missing Agent Detection
- **What it does:** Fails early with helpful messages when the required agent is not installed.
- **How:** `require_agent()` checks `command -v` for the agent binary before starting the loop, then prints install instructions if missing.

---

## 13. TESTING

### 13.1 CLI Smoke Tests
- **What it does:** Verifies basic CLI functionality without requiring agents.
- **How:** `tests/cli-smoke.mjs` runs: `ralph --help`, `ralph prd` (with `RALPH_DRY_RUN=1`), `ralph overview`, and checks output files exist.

### 13.2 Agent Loop Smoke Tests
- **What it does:** Tests the loop for each supported agent in dry-run mode.
- **How:** `tests/agent-loops.mjs` creates temp projects with a PRD, runs `ralph build 1 --no-commit --agent=<name>` with `RALPH_DRY_RUN=1` for codex, claude, and droid.

### 13.3 Agent Ping Tests
- **What it does:** Verifies real agents are responsive.
- **How:** `tests/agent-ping.mjs` runs `ralph ping --agent=<name>` for each installed agent, skipping those not found.

### 13.4 Real Agent Integration Tests
- **What it does:** Runs a full end-to-end loop with real agents.
- **How:** `tests/real-agents.mjs` creates a temp project with two dependent stories (create a file, then create a second file), runs `ralph build 2`, then asserts: all stories are `done`, at least one git commit exists, and progress log was created.

---

## 14. FEATURES NOTABLY ABSENT (compared to what portableralph might have)

Based on this analysis, iannuttall/ralph does **not** have:
- **Model/provider selection** (e.g., choosing between GPT-4, Claude Sonnet, etc.) -- it delegates entirely to the agent CLI
- **Desktop notifications** (no notification system)
- **Plan mode** (explicitly removed: `ralph plan` prints an error saying "Use `ralph build` with a JSON PRD")
- **Token/cost tracking**
- **Concurrency at the iteration level** (runs one agent at a time, though file locking supports multiple Ralph instances on the same PRD)
- **Session management** beyond the iteration loop (each iteration is fully independent)
- **Custom model parameters** (temperature, max tokens, etc.)
- **SSRF protection or URL validation**
- **Rate limiting**

Here's the comparison:

What iannuttall/ralph has that portableralph doesn't

Worth considering:

| Feature | iannuttall/ralph | portableralph |
| --- | --- | --- |
| Structured PRD system | JSON PRDs with stories, dependencies, acceptance criteria. AI generates them via `ralph prd "description"` | Freeform markdown plan files |
| Story dependency resolution | Stories have `dependsOn`: only picks stories whose deps are `done` | Flat task list, relies on Claude to pick order |
| Guardrails / "Signs" | Persistent lessons-learned file read every iteration. Agent adds new guardrails after repeated failures | No cross-iteration learning |
| Per-iteration activity/run logs | Detailed per-run log files, activity log, run summaries with git diff tracking | Single progress file only |
| Multi-agent support | Claude, Codex, Droid (Factory), OpenCode | Claude only |
| Stale story recovery | Auto-reopens stories stuck `in_progress` after configurable timeout | No stale detection |
| Completion signal | Explicit `<promise>COMPLETE</promise>` marker per story | `RALPH_DONE` for entire run |
| Skills system | Installable skills (commit conventions, browser testing, PRD generation) | Prompt templates only |
| Browser testing skill | Playwright-based browser verification for frontend stories | None |

Less interesting (portableralph already has equivalents or better):

| Feature | Notes |
| --- | --- |
| File locking / concurrency | portableralph already has lock files |
| No-commit mode | portableralph has `DO_NOT_COMMIT` + `ralph config commit off` |
| Auto-update check | portableralph has `ralph update` |
| Git diff tracking | Nice-to-have but minor |
| Template overrides | portableralph has prompt templates already |

What portableralph has that iannuttall/ralph doesn't

- Notifications (Slack, Discord, Telegram, Email, custom scripts)
- Model selection (`RALPH_MODEL`)
- Streaming output (our new feature)
- Retry logic with exponential backoff
- Plan mode (analyze first, build second)
- Windows/PowerShell support
- Rate limiting, URL validation

My take

The most valuable ideas to steal are:

- **Guardrails file:** a persistent lessons-learned doc that Claude reads every iteration. This is cheap to implement and high-value: it prevents the same mistake from being repeated across iterations.
- **Story dependencies:** letting tasks declare `dependsOn` so the picker respects ordering rather than hoping Claude picks wisely.
- **Per-run logging:** capturing full agent output per iteration (not just progress) makes debugging much easier when something goes wrong mid-run.

The PRD system and multi-agent support are bigger lifts with less clear ROI for portableralph's audience. The browser testing skill is cool but very niche.

I like the lesson-learned document. How would we implement that?
