Skip to content

flow do --auto: harden run-status writes + harness-generic env strip (follow-ups from #67 review) #68

@anshulsao

Description

@anshulsao

Follow-ups from an independent review of #67 (flow do --auto). The feature shipped (admin-merged after @rr0hit's pre-rebase approval); these are non-blocking hardening items. None are user-visible regressions — the run-status tracking is self-healing via read-time pid reconciliation — but a few writes aren't strictly race-safe and the env strip is claude-specific. Recording them in one issue so they are not lost — unassigned for now (no owner; pick up when someone has bandwidth).

1. [high] Launch/finalize race can briefly mislabel a finished run

flow do --auto writes auto_run_status='running' (in recordAutoRunLaunched) after launchAutoRun starts the detached supervisor, which runs concurrently. If the harness exits almost instantly (e.g. claude not on PATH → immediate fail), the supervisor's finalizeAutoRun('dead') can land before the parent's record, which then clobbers the terminal row back to running with an already-dead pid. Reconciliation flips it back to dead on the next show/list, so it's self-healing, but there's a wrong-status window. (A completeddead downgrade would require the run to succeed in microseconds, so that variant is theoretical.)
Fix: make finalize authoritative — guard recordAutoRunLaunched (and/or finalize) with a conditional WHERE on status/auto_run_finished so a finished run can't be resurrected to running. Keep --force relaunch working (clear the prior terminal state as part of the claim).

2. [medium] Completed run mislabeled dead on a transient DB error

In cmdAutoExec, after autoRunner returns nil the code re-reads the task to check status=='done'. If that GetTask errors (transient SQLITE_BUSY), it falls through to dead even though the run completed.
Fix: retry the read (DB already has a 30s busy_timeout) or treat read-failure as indeterminate rather than asserting dead.

3. [medium] TOCTOU on the "already running" guard

The auto in-flight guard (reconcileAutoRun + status check) runs before the status-flip tx with no lock, so two concurrent flow do --auto <same-task> can both pass it and spawn duplicate supervisors (last-writer-wins on pid; both write the same session jsonl). Low likelihood — launchd schedules distinct run-slugs — but the guard is advertised as a safety check and isn't truly race-safe.
Fix: fold the in-flight re-check into the status-flip transaction, or claim atomically in recordAutoRunLaunched (0 rows affected → lost the race → kill the just-spawned supervisor).

4. [low] autoChildEnv strips only claude's session-id env var

It drops CLAUDE_CODE_SESSION_ID, but AutoRunArgv is harness-generic and the supervisor resolves whatever harness the task is pinned to. A future codex/gemini adapter's session-id env (CODEX_THREAD_ID, GEMINI_SESSION_ID) would leak through, reintroducing the reverse-lookup confusion this strip prevents.
Fix: strip every h.SessionIDEnvVar() across allHarnesses(), not just claude's.

5. [low] --force auto-relaunch orphans a still-alive supervisor silently

When a prior run is genuinely still running and --force is passed, recordAutoRunLaunched overwrites the pid; the old supervisor keeps running and writing the same transcript, untracked. Intentional (mirrors interactive --force), but unlike the interactive path it emits no warning.
Fix: warn when --force overrides a still-alive auto run (name the orphaned old pid).

6. [low] pid-reuse window in reconciliation

A stale running row's pid could be reassigned after a reboot/long uptime, so reconciliation keeps it running (or matches a wrong pid). Inherent to pid-based liveness.
Fix (optional): also confirm the pid's argv mentions __auto-exec <slug> before treating it as alive (the harness already scans ps).

Test gaps


Source: independent post-rebase review of #67. Cleared as correct in the same review: the migration/column-ordering, AutoRunArgv (session pin + inject + skip-perms), --with-file threading, and the launch-failure rollback.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions