We gave the same prompt to vanilla Claude and three Godmode tiers. The difference isn't subtle.
Claude Opus 4.7 ·
May 2026 ·
Identical environment
claude-code — prompt
$Make an instructional animation on how to boil an egg
The test: One prompt. No follow-up. No clarification. Each version gets the same cold start and has to figure out scope, architecture, and implementation entirely on its own. The metrics below are from real runs.
mediumNo unit tests for state-machine transitions — coverage came from an 11-state visual audit, but there's no persistent regression suite to re-run later
mediumPartial ARIA — doneness tabs are labelled but the SVG scene is aria-hidden and the step list isn't announced as the current cooking step
lowKeyboard shortcuts (space, R, 1–4) work but aren't documented anywhere in the UI
lowMobile breakpoints (768 px, 480 px) added in showcase prep but only validated in a desktop-Chromium emulator, not on physical devices
lowNo graceful handling for very wide aspect ratios beyond the 1280 px max-width container
mediumNo automated regression test suite — only screenshot-based visual sanity checks (screenshot-test.js, verify/ frames). A future text/timing change could silently regress without being caught.
mediumTTS voice is en-US-AvaNeural — user is Australian, so accent mismatch is a minor authenticity issue. Easy fix (swap to en-AU-NatashaNeural) but wasn't caught at planning time.
lowStep-6 caption initially overlapped the "Perfect every time." final-text — caught only after user feedback; codified as a new "Don't regress what was working" hard rule in the skill.
lowInitial water level in pot scenes filled to the rim despite narration saying "about one inch above the tops" — caught only after user feedback; codified as a new "Internal consistency" hard rule in the skill.
lowOriginal narration title said "the complete six-step guide" but only Steps 1–5 were spoken — same internal-consistency class of bug as above.
lowCaptured video had a 5s trailing blank stage (Playwright recorded longer than the JS animation timeline). Fixed by adding `-t 61.5` trim flag to record.js and narrate.py.
fixedOriginal silent video had no narration; added a 7-line edge-tts script (en-US-AvaNeural, +5% rate) muxed in via ffmpeg adelay+amix. Codified as the "Match the medium's expectations" hard rule.
fixedWater level corrected from full-pot fill to ~1 inch above egg tops; bracket repositioned inside the pot to clearly indicate the gap.
fixedStep 6 caption added to peel scene + "Step six. Crack, peel..." narration line so spoken count matches the title's six-step claim.
fixed"Perfect every time." text moved to the top of scene-peel so the new caption strip doesn't cover it.
fixedBubble keyframe travel reduced from -280px to -155px so bubbles pop at the new water surface instead of floating into the air above it.
fixedCSS-transform-overrides-SVG-transform bug: animated eggs were rendering at (0,0) instead of their position. Wrapped each animated egg in outer <g transform="translate(...)"> + inner animated <g> with transform-box: fill-box.
Assess-fix loop — shipped only when all dimensions passed
Total Tokens
9,715,635
228 in / 130,946 out
API Cost
$30.29
estimated
Time
29m 47s
wall clock
Files
13
created
Test Suite
0
tests written
Loops
4
self-corrections
Quality Audit
Code Quality
0.78
Testing
0.05
Security
0.85
Error Handling
0.55
Completeness
0.72
UX / Polish
0.50
Issues Found
criticalUser verdict: Rejected. Flagged visual quality, sync/timing, and content/instructions all as weak — wanted higher-fidelity animation AND more polish on the executed approach.
highCaption fade transitions bleed across scene boundaries — at scene N → N+1 boundaries, the next scene's caption appears while the previous scene's visuals are still on screen, reading as a glitch in the rendered video.
highcapture.js used wrong Playwright API signature for page.waitForFunction — passed { timeout } as 2nd arg instead of 3rd. The timeout was silently ignored (defaulted to 30s), so the 43s animation never reached the __animationDone flag and the recording failed. Fixed mid-session by passing null as the arg parameter.
mediumFlames are visually small and rely only on CSS keyframe motion — don't convincingly read as a 'rolling boil' even after multiple polish passes.
mediumBubbles barely visible in the boil scene despite explicit polish iterations to make them more prominent.
mediumScene durations duplicated between timings.json and the SCENES array in animation.html — drift risk if one is changed without the other.
mediumHandover suggested NODE_PATH=$HOME/playground/node_modules — the POSIX-style path didn't work on Windows Node 24. Required Windows-style 'C:\\Users\\Lloyd Gibbs\\playground\\node_modules' instead. Hidden footgun for the next agent.
lowZero automated tests — only manual frame verification via screenshot pass + ad-hoc audit frame extraction. No regression suite to catch future drift.
lowPeel scene shows whole egg + cross-section side-by-side but lacks visual progression of the peeling action itself; static end-state instead of demonstrative motion.
lowStep indicator and progress dots update instantly on scene activation while captions cross-fade — contributes to the caption-overlap effect at boundaries.
fixedcapture.js page.waitForFunction signature corrected mid-session (timeout option moved from 2nd to 3rd argument with null arg).
fixedNODE_PATH switched from POSIX-style to Windows-style for Node 24 compatibility on the user's Win11 setup.
Composite Score0.57
Head-to-Head
Metric
Vanilla
Godmode
Godmode+
One-Shot
Total Tokens
35,500
14,972,064
274,500
9,715,635
API Cost
$0.45
$33.38
$2.50
$30.29
Time
6m 40s
26m 15s
48m 22s
29m 47s
Files Created
1
1
33
13
Tests Written
0
0
1
0
Self-Corrections
0
0
0
4
Composite Score
0.60
0.78
0.81
0.57
Issues at Delivery
8
5
6
10
Note: Higher token usage and cost for Godmode tiers reflects deeper execution — more context loaded, more tests written, more security checks, more verification passes. You're paying for quality, not verbosity.
See for yourself.
Same prompt. Same model. The only difference is the skill. Stop settling for first-draft output.