Post-Mortem ⏱️ 6 min read

The Conductor Died Mid-Run. The Build Shipped Anyway.

TL;DR

The task: Build a custom CLI for getgodmode.dev members. Four parallel Builders, each owning one stream.

What broke: Three Builders crossed the 80% CPU reaper threshold. The narrator session (the Claude that drives the orchestra) died mid-run. The scoreboard recorded one stream finished. The disk had four.

The lesson: Heartbeat-based liveness is a flawed signal under thermal throttling. The bookkeeping lies. The disk does not. Trust the disk.

The result: The CLI shipped. 23 files, around 2,470 lines of code, 164 of 165 tests green. Scoring punished sins not committed.

The build

The ask was simple in shape, novel in execution. Build a command-line tool the average buyer can install and use without an Anthropic API key.

That last constraint mattered. Most AI CLIs wrap the Anthropic SDK and want a key. This one wraps Claude Code itself as a subprocess, reuses the auth tokens already in the Godmode account, and stays free of any extra billing surface.

CLI architecture: thin wrapper over Claude Code subprocess, reusing Supabase Edge Function for auth. godmode (CLI) chat / run / ask / login claude (subprocess) spawned per command Supabase Edge mcp-actions gmmcp_* token (already minted by the website) No Anthropic API key. No new backend endpoints. Zero new auth surface.
The CLI never talks to Anthropic directly. It shells out to Claude Code and rides the user's existing Godmode tokens.

godmode chat

Interactive session that streams Claude Code output and routes write-actions through the user's account.

godmode run

One-shot prompt with file/stdin support. Returns the response and exits clean.

godmode skills

List, install, and run any skill the user owns. The CLI reads the same Supabase rows the website uses.

godmode login

Pairs the CLI to the user's website account via a device code. No raw key handling.

Build stats

This was an Orchestra run. Daemon-driven, lean narrator session, every worker spawned with a fresh context window. The receipt:

MetricValue
Toolone-shot-orchestra (plan mode)
Total points999
Level reached5
Longest combo49 workers in a row without a stumble
Files shipped23
Lines of code~2,470
Tests164 / 165 green
Achievements unlockedCLEAN, FIRST_BLOOD, DOUBLE_KILL, TRIPLE_KILL, RAMPAGE, GODLIKE
Reaper kills (legitimate work)6 (Builder-1 x3, Builder-2 x3)
Narrator sessions used2 (original died, second took over)
VerdictShipped under duress

Dimension scorecard

The honest part. Some dimensions scored well. Two scored badly, for reasons that turn out to be informative.

DimensionScoreWhy
Polish0.87Telemetry stream (the one Builder that ran solo) pen-tested clean against 10 canary strings.
Documentation0.91Each verb shipped with help text, examples, and inline JSDoc.
Correctness0.28Scorer read result.json files. Three of four said "throttled stub." Scorer did not check the disk.
Completeness0.18Same reason. The bookkeeping said one of four streams finished. The disk said four of four.

The scorer punished sins not committed. Builders 1, 2, and 4 had all written their files to disk before the reaper killed them. The result.json files lied because the process that would have written them never got the chance.

What broke

Five distinct failures, each interesting on its own, terrifying as a chain. Watch the fan-out happen:

Builder fan-out :: live CPU
80% reaper threshold ↓
Builder-1 stream 1 :: cli core
REAPED ×3
91%
Builder-2 stream 2 :: commands + auth
REAPED ×3
87%
Builder-3 stream 3 :: telemetry
SHIPPED
62%
Builder-4 stream 4 :: tests
REAPED
93%
3 of 4 Builders crossed the reaper threshold and were killed. Only Builder-3 stayed under and shipped solo. But the killed Builders had already written their files to disk before they died.
  1. Thermal throttling on the Builder fan-out

    Four parallel Builders kicked off. Three immediately hit 87 to 93 percent CPU. The watchdog has an 80 percent reaper threshold to protect the laptop. Only the fourth Builder (telemetry) survived to ship solo.

  2. The reaper killed legitimate work

    Builders 1 and 2 hit the reaper three times each at heartbeat timeout. Their budget was 900 seconds. They went 905 seconds without a heartbeat. But by then, their files had already shipped to disk. The reaper killed processes that had already done the work.

  3. The narrator session died

    The narrator is the Claude session that watches the orchestra and writes signoff notes. Somewhere around the second wave of reaper kills, it stopped responding. A different session, in a different tab, was asked to take over.

  4. result.json diverged from disk truth

    The Merger phase reads each Builder's result.json to decide what made it into the build. Builder-1's result.json said "throttled stub, no artifacts." Builder-2's was missing entirely. But cli.ts, the auth flow, and all six command verbs were sitting in the work tree.

  5. Test caught the real launch blocker

    The Test phase ran 165 checks. One failed, but the meaningful one was a HIGH severity finding: Row Level Security was off on a Supabase telemetry table. Any anonymous client could have read it. The Harden phase locked it down before merge.

Heartbeats lie. Disk does not.

The story of this run is the story of two different signals disagreeing about what was true.

Heartbeat-based liveness flatlines at the reaper threshold while disk-write count keeps climbing. Same Builder, two stories. SIGNAL TIME → REAPER THRESHOLD :: 80% CPU heartbeat: FLATLINE x disk writes: STILL CLIMBING
One Builder, two truths. The reaper sees the red line. The merger should see the green line. Ours saw red.

Think of it like a courier who got hit by a car a block from the depot. The dispatcher's tracking app says the parcel was never delivered, because the courier never logged the drop-off. But the parcel is sitting in the lobby. The dispatcher is asking the wrong question. The right question is, did the parcel arrive?

The narrator handoff

Heartbeat failure on the Builders was bad. The narrator session dying on top of it could have been fatal. It was not, because the orchestra protocol writes everything to disk as it goes.

The second narrator was given the working directory and one instruction: read the truth from the files and write the signoffs the original narrator did not get to write.

> watching orchestra phase: Builder fan-out
> Builder-1 heartbeat: 14m ago...
> Builder-2 heartbeat: 13m ago...
> reading bugs.jsonl ...
> ERR heartbeat watchdog: budget exceeded
> reaper killed Builder-1 (3rd time)
> reaper killed Builder-2 (3rd time)
> ............................
> SESSION TERMINATED
 
> takeover: cwd = orchestra-runs/.../
> ls work/builder-1/
cli.ts  claude/  commands/chat/  login/
> ls work/builder-2/
commands/run/  ask/  logout/  whoami/
> cat result-Builder-1.json
"status": "throttled stub"
> on-disk says SHIPPED. trusting disk.
> hand-writing signoffs ...
> OK :: 4 of 4 streams accepted
> orchestrator-takeover :: narrator hand-write of Builder-1 and Builder-2 result.json. Original narrator session died. Taking over as orchestrator. Builder-1 and Builder-2 were reaper-killed 3x each at heartbeat timeout because all their Stream 1/Stream 2 deliverables had shipped on disk anyway.

That paragraph is from the run log. It is also the cleanest description of what a self-healing multi-agent system has to do when the narrator goes silent. Anyone with the same disk and the same protocol can finish the work.

What the scoreboard saw vs what shipped

This is the part the dimension scores buried. Same four streams, two completely different stories depending on which signal you trust. Toggle between them:

Stream
Evidence
Counted?
Builder-1
result.json :: "throttled stub"
✗ NO
Builder-2
result.json :: absent (worker died before write)
✗ NO
Builder-3
result.json :: "success" (telemetry stream)
✓ YES
Builder-4
result.json :: "throttled stub"
✗ NO
divergence :: 3 of 4 streams disagree

Three streams in the "no" column did not deserve to be there. The disk was the ground truth. The bookkeeping was a stale snapshot of an unfinished process.

So what changes

This run shipped, but only because a second narrator picked through the disk by hand. That is not a protocol, that is luck. Two things are getting fixed in the next iteration:

Do

Treat the working tree as the source of truth. The merger and scorer both read the actual files before trusting any result.json the worker may or may not have written.

Don't

Trust a heartbeat as proof a worker is doing nothing. A worker can be silent because it has finished, because it is throttled, or because it is dead. The signal does not distinguish.

The next narrator design also gets a small superpower: when it spawns, it reads the latest disk state itself instead of asking the previous narrator. If the previous narrator is alive, that is a free integrity check. If the previous narrator is dead, the takeover is silent.

Self-healing multi-agent work is mostly about which signal you trust when two signals disagree. Bookkeeping is convenient. Files are real. Pick the real one.

What it cost

The honest number, because build-recap posts are supposed to be receipts and not brags.

PhaseWorker countPoints
Bootstrap + diagnose + recon3121
Research (incl. plan mode)4 + planner195
Builders (4 streams, reaper killed 2 of them 3x each)4 (x retries)191
Test3163
Harden + document + verify + polish4122
Visual + scoring + judgingvaried207
Total999

The point of showing this table is that the costliest phases (Builders and Scoring) are also the ones most affected by the bug above. If the scorer had read the disk, the Builder retries would not have happened, the heartbeat reaper would not have fired three times each on two workers, and the same build would have cost roughly a third less.

The takeaway

Three things to keep, if any of this lands.

Disk over bookkeeping

If two signals about the state of a build disagree, the files on disk are the one that matters. The other is a story about the files.

Liveness is not progress

A silent worker is not necessarily a stuck worker. A loud worker is not necessarily a productive one. Measure work by the artifacts, not the chatter.

Narrators are interchangeable

If a multi-agent system needs the same Claude session running the whole time, it has a single point of failure that is going to cost you. Make every narrator a takeover candidate.

Run your own orchestra

One-Shot Scripts ships with the same daemon-driven, lean-narrator protocol the run above used. Workers run in fresh context windows. The disk is the source of truth. Self-healing on day one.

See One-Shot Pricing