Merging the day's four pipeline fixes — and proving them live
Four parallel chats → one safe merge → a robot watches the real app do each thing · 2026-06-23
Four parallel chats → one safe merge → a robot watches the real app do each thing
Master summary — the gist in 30 seconds
TL;DRFour separate AI work-sessions each fixed part of the sales pipeline; this session safely combined all four into the live app and then had a tester-robot watch the real app to confirm every fix actually works.
Input: 4 feature branches (1 committed, 3 only half-saved) + your spoken requirements. Output: one merged, deployed app (build fc2b70a+v086) + a live screenshot-by-screenshot report showing 13 of 14 behaviours working for real.
Why this mattersCombining unfinished parallel work is where things silently break. A backup-first merge plus a vision-based live test means 'done' is proven on real pixels, not assumed from green unit tests.
flowchart LR
A[4 chats / 4 branches] --> B[Security backup]
B --> C[Merge into main]
C --> D[Deploy v086]
D --> E[Robot tests live app]
E --> F[13/14 proven]
1 · Reading what you actually asked for
TL;DREach chat had one typed request PLUS extra requests you typed while the AI was still working — and those were almost missed.
Input: 4 raw chat transcripts. Output: the full list of asks per chat, including the 'queued' follow-ups.
Why it mattersMessages typed mid-work are stored differently (as 'queued commands', not normal messages). A naive read drops them — which would have meant testing only half of what you wanted. Catching this is the difference between a real audit and a rubber stamp.
flowchart TD
M[Your message] -->|sent when idle| U[Normal message]
Q[Your message] -->|typed mid-run| K[Queued command]
U --> P[Parsed first try]
K --> X[Missed first try]
X -->|fixed| P
2 · A backup-first, conflict-aware merge
TL;DRSaved a full restore point, then merged the one real branch and consolidated the three half-saved ones into the trunk.
Input: 1 committed branch + 3 branches whose work was only sitting uncommitted. Output: a clean main with all four, 431 tests green, plus a bundle you can roll back to.
Why it mattersThree of the four 'branches' had no commits — their work lived loose in the working tree. Without the backup + careful consolidation, that work could vanish on the first wrong git move.
TL;DRAn Opus agent drove the live app step by step, judged each screenshot, and checked the data really changed — not just the picture.
Input: the deployed app + a list of expected behaviours. Output: a live report — 13 of 14 behaviours pass on real pixels; the 1 exception is verified by data + code instead of by emailing a real person.
Why it mattersIt is easy to fake 'all green'. Judging the actual screen plus the underlying state catches the bugs you found by hand last time. AUTOSEND stayed ON and no real lead was ever emailed.
flowchart TD
S[Drive live app] --> P[Screenshot]
P --> G[Vision judge]
P --> D[State check]
G --> V{Pixels AND data agree?}
D --> V
V -->|yes| OK[LIVE pass]
V -->|cannot send| BL[Blocked but verified]
4 · What's left + the open risk
TL;DRSign the test oracle, optionally finish the one send-side check, and watch out: another session is editing the same repo on a different branch.
Input: a PENDING_REVIEW run + a moving repo. Output: your sign-off, an optional live send test, and a push when you're ready.
Why it mattersThe run waits for YOUR signature by design (an AI must not sign its own homework). And because multiple sessions share this checkout, the branch moved under us — so verify the live build vs git before trusting anything against the current code.
timeline
title Next steps
Now : Sign ebo.signed
Then : Optional live send test (+zz recipient)
Watch : Reconcile concurrent branch 3a712d7
Finally : Push main on your trigger