Follow-ups from an independent review of #67 (flow do --auto). The feature shipped (admin-merged after @rr0hit's pre-rebase approval); these are non-blocking hardening items. None are user-visible regressions — the run-status tracking is self-healing via read-time pid reconciliation — but a few writes aren't strictly race-safe and the env strip is claude-specific. Recording them in one issue so they are not lost — unassigned for now (no owner; pick up when someone has bandwidth).
1. [high] Launch/finalize race can briefly mislabel a finished run
flow do --auto writes auto_run_status='running' (in recordAutoRunLaunched) after launchAutoRun starts the detached supervisor, which runs concurrently. If the harness exits almost instantly (e.g. claude not on PATH → immediate fail), the supervisor's finalizeAutoRun('dead') can land before the parent's record, which then clobbers the terminal row back to running with an already-dead pid. Reconciliation flips it back to dead on the next show/list, so it's self-healing, but there's a wrong-status window. (A completed→dead downgrade would require the run to succeed in microseconds, so that variant is theoretical.)
Fix: make finalize authoritative — guard recordAutoRunLaunched (and/or finalize) with a conditional WHERE on status/auto_run_finished so a finished run can't be resurrected to running. Keep --force relaunch working (clear the prior terminal state as part of the claim).
2. [medium] Completed run mislabeled dead on a transient DB error
In cmdAutoExec, after autoRunner returns nil the code re-reads the task to check status=='done'. If that GetTask errors (transient SQLITE_BUSY), it falls through to dead even though the run completed.
Fix: retry the read (DB already has a 30s busy_timeout) or treat read-failure as indeterminate rather than asserting dead.
3. [medium] TOCTOU on the "already running" guard
The auto in-flight guard (reconcileAutoRun + status check) runs before the status-flip tx with no lock, so two concurrent flow do --auto <same-task> can both pass it and spawn duplicate supervisors (last-writer-wins on pid; both write the same session jsonl). Low likelihood — launchd schedules distinct run-slugs — but the guard is advertised as a safety check and isn't truly race-safe.
Fix: fold the in-flight re-check into the status-flip transaction, or claim atomically in recordAutoRunLaunched (0 rows affected → lost the race → kill the just-spawned supervisor).
4. [low] autoChildEnv strips only claude's session-id env var
It drops CLAUDE_CODE_SESSION_ID, but AutoRunArgv is harness-generic and the supervisor resolves whatever harness the task is pinned to. A future codex/gemini adapter's session-id env (CODEX_THREAD_ID, GEMINI_SESSION_ID) would leak through, reintroducing the reverse-lookup confusion this strip prevents.
Fix: strip every h.SessionIDEnvVar() across allHarnesses(), not just claude's.
5. [low] --force auto-relaunch orphans a still-alive supervisor silently
When a prior run is genuinely still running and --force is passed, recordAutoRunLaunched overwrites the pid; the old supervisor keeps running and writing the same transcript, untracked. Intentional (mirrors interactive --force), but unlike the interactive path it emits no warning.
Fix: warn when --force overrides a still-alive auto run (name the orphaned old pid).
6. [low] pid-reuse window in reconciliation
A stale running row's pid could be reassigned after a reboot/long uptime, so reconciliation keeps it running (or matches a wrong pid). Inherent to pid-based liveness.
Fix (optional): also confirm the pid's argv mentions __auto-exec <slug> before treating it as alive (the harness already scans ps).
Test gaps
Source: independent post-rebase review of #67. Cleared as correct in the same review: the migration/column-ordering, AutoRunArgv (session pin + inject + skip-perms), --with-file threading, and the launch-failure rollback.
Follow-ups from an independent review of #67 (
flow do --auto). The feature shipped (admin-merged after @rr0hit's pre-rebase approval); these are non-blocking hardening items. None are user-visible regressions — the run-status tracking is self-healing via read-time pid reconciliation — but a few writes aren't strictly race-safe and the env strip is claude-specific. Recording them in one issue so they are not lost — unassigned for now (no owner; pick up when someone has bandwidth).1. [high] Launch/finalize race can briefly mislabel a finished run
flow do --autowritesauto_run_status='running'(inrecordAutoRunLaunched) afterlaunchAutoRunstarts the detached supervisor, which runs concurrently. If the harness exits almost instantly (e.g.claudenot on PATH → immediate fail), the supervisor'sfinalizeAutoRun('dead')can land before the parent's record, which then clobbers the terminal row back torunningwith an already-dead pid. Reconciliation flips it back todeadon the nextshow/list, so it's self-healing, but there's a wrong-status window. (Acompleted→deaddowngrade would require the run to succeed in microseconds, so that variant is theoretical.)Fix: make finalize authoritative — guard
recordAutoRunLaunched(and/or finalize) with a conditionalWHEREon status/auto_run_finishedso a finished run can't be resurrected torunning. Keep--forcerelaunch working (clear the prior terminal state as part of the claim).2. [medium] Completed run mislabeled
deadon a transient DB errorIn
cmdAutoExec, afterautoRunnerreturns nil the code re-reads the task to checkstatus=='done'. If thatGetTaskerrors (transientSQLITE_BUSY), it falls through todeadeven though the run completed.Fix: retry the read (DB already has a 30s busy_timeout) or treat read-failure as indeterminate rather than asserting
dead.3. [medium] TOCTOU on the "already running" guard
The auto in-flight guard (
reconcileAutoRun+ status check) runs before the status-flip tx with no lock, so two concurrentflow do --auto <same-task>can both pass it and spawn duplicate supervisors (last-writer-wins on pid; both write the same session jsonl). Low likelihood — launchd schedules distinct run-slugs — but the guard is advertised as a safety check and isn't truly race-safe.Fix: fold the in-flight re-check into the status-flip transaction, or claim atomically in
recordAutoRunLaunched(0 rows affected → lost the race → kill the just-spawned supervisor).4. [low]
autoChildEnvstrips only claude's session-id env varIt drops
CLAUDE_CODE_SESSION_ID, butAutoRunArgvis harness-generic and the supervisor resolves whatever harness the task is pinned to. A future codex/gemini adapter's session-id env (CODEX_THREAD_ID,GEMINI_SESSION_ID) would leak through, reintroducing the reverse-lookup confusion this strip prevents.Fix: strip every
h.SessionIDEnvVar()acrossallHarnesses(), not just claude's.5. [low]
--forceauto-relaunch orphans a still-alive supervisor silentlyWhen a prior run is genuinely still running and
--forceis passed,recordAutoRunLaunchedoverwrites the pid; the old supervisor keeps running and writing the same transcript, untracked. Intentional (mirrors interactive--force), but unlike the interactive path it emits no warning.Fix: warn when
--forceoverrides a still-alive auto run (name the orphaned old pid).6. [low] pid-reuse window in reconciliation
A stale
runningrow's pid could be reassigned after a reboot/long uptime, so reconciliation keeps itrunning(or matches a wrong pid). Inherent to pid-based liveness.Fix (optional): also confirm the pid's argv mentions
__auto-exec <slug>before treating it as alive (the harness already scansps).Test gaps
cmdAutoExecGetTask-error downgrade (Install path: pick brew / curl | sh / go install #2)flow do --autoTOCTOU (Linux support via pluggable terminal spawner #3)autoChildEnvstripping assertion (Consolidate flowde into flow (hide the second binary) #4)autoLauncherdetachment path (Setsid/ fd inheritance /Process.Releaseare fully stubbed today)Source: independent post-rebase review of #67. Cleared as correct in the same review: the migration/column-ordering,
AutoRunArgv(session pin + inject + skip-perms),--with-filethreading, and the launch-failure rollback.