feat(tamanu/start): probe started services and add --logs #436

Open
passcod wants to merge 3 commits into main from start-probes

@passcod passcod commented May 30, 2026

🤖 tamanu start previously returned as soon as the supervisor reported the units active — but an active unit isn't the same as the container accepting connections, so a service could be "started" yet not actually serving. This makes start verify readiness using the same HTTP probes the restart command already had, and adds a --logs option to watch startup.

First commit extracts the probe primitives (http_client, probe_until_ready, probe_url, probe_once, parse_duration, and the probe-URL construction) out of restart.rs into a shared probe module, with restart delegating to it — no behaviour change.
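
As a rough illustration of what a primitive like probe_until_ready does, here is a minimal sketch using only the standard library. The signature, the closure standing in for probe_once, and the error text are assumptions made for this example; the real helpers in the probe module use an HTTP client and may look quite different.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical sketch: call `probe_once` repeatedly until it reports
/// ready or `deadline` passes. On success, returns the number of attempts.
pub fn probe_until_ready<F>(
    mut probe_once: F,
    deadline: Instant,
    interval: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> bool,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        if probe_once() {
            return Ok(attempts);
        }
        // Give up once the budget is spent.
        let now = Instant::now();
        if now >= deadline {
            return Err(format!("not ready after {attempts} attempts"));
        }
        // Wait before retrying, but never sleep past the deadline.
        sleep(interval.min(deadline.saturating_duration_since(now)));
    }
}
```

Taking a deadline rather than a per-call timeout means several services can be probed one after another against the same overall budget.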

Second commit adds to start:

  • After the post-start caddy reload, it re-discovers the now-running services and probes every behind-caddy, expected-Up instance for readiness within a single overall budget (--probe-timeout, default 1m). If any fails to respond in time, start bails with an error naming the service. --no-probe-http skips the check.
  • --logs streams the tamanu service logs (journalctl on systemd, pm2 logs on Windows) for the duration of the start, torn down via a Drop guard when the command returns (on success or bail).
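
The Drop-guard teardown described for --logs can be sketched roughly as below. LogFollower and with_logs are hypothetical names for this example, and the program and arguments are passed in by the caller rather than hard-coded, so this is not the PR's actual code.

```rust
use std::process::{Child, Command, Stdio};

/// Hypothetical guard that owns a log-follower child process.
pub struct LogFollower {
    child: Child,
}

impl LogFollower {
    /// Spawn the follower; it keeps running until the guard is dropped.
    pub fn spawn(program: &str, args: &[&str]) -> std::io::Result<Self> {
        let child = Command::new(program)
            .args(args)
            .stdin(Stdio::null())
            .spawn()?;
        Ok(Self { child })
    }
}

impl Drop for LogFollower {
    fn drop(&mut self) {
        // Best effort: the follower may already have exited.
        let _ = self.child.kill();
        // Reap the process so it does not linger as a zombie.
        let _ = self.child.wait();
    }
}

/// Run `work` while the follower streams logs. The guard is dropped when
/// this function returns, whether `work` produced Ok or Err.
pub fn with_logs<T>(
    program: &str,
    args: &[&str],
    work: impl FnOnce() -> Result<T, String>,
) -> Result<T, String> {
    let _guard = LogFollower::spawn(program, args).map_err(|e| e.to_string())?;
    work()
}
```

Because the cleanup lives in Drop, an early return through `?` or a bail tears the follower down the same way as a successful run.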

The instance-selection logic is a pure helper covered by unit tests; the Windows build was cross-checked.
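
For a sense of why a pure helper is easy to unit test, a selection function of this kind might look like the sketch below. The Instance struct, its field names, and instances_to_probe are illustrative assumptions, not the types used in the PR.

```rust
/// Hypothetical view of a discovered service instance.
#[derive(Debug, Clone, PartialEq)]
pub struct Instance {
    pub name: String,
    /// Whether the instance is served through caddy.
    pub behind_caddy: bool,
    /// Whether the instance is expected to be running after start.
    pub expected_up: bool,
}

/// Pure selection: keep only the instances that are both behind caddy
/// and expected to be up. No I/O, so it can be tested with plain data.
pub fn instances_to_probe(all: &[Instance]) -> Vec<&Instance> {
    all.iter()
        .filter(|i| i.behind_caddy && i.expected_up)
        .collect()
}
```

Keeping discovery and probing outside this function means the tests need no running services, only hand-built Instance values.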

passcod and others added 3 commits May 30, 2026 13:47
Move the HTTP readiness-probe helpers (http_client, parse_duration,
probe_until_ready, probe_url, probe_once) out of restart.rs into a new
probe.rs, and extract the probe-URL construction (container IP for
systemd, pm2 PORT for pm2) into a shared instance_probe_url. restart's
probe_instance now calls into the shared module; no behaviour change.

Co-authored-by: Claude <noreply@anthropic.com>
After bringing services up, probe the behind-caddy HTTP services (API,
frontend, patient portal) for readiness within a single one-minute
budget (--probe-timeout). The supervisor reporting a unit active isn't
the same as the container accepting connections, so this catches
services that started but never came up — start now bails (naming the
failing service) instead of silently reporting success.
--no-probe-http skips the check.

Add --logs to stream the tamanu service logs (journalctl -f on systemd,
pm2 logs on pm2) for the duration of the start/probe work. The follower
is held in a Drop guard so it's torn down on both success and bail.

Co-authored-by: Claude <noreply@anthropic.com>
@passcod passcod enabled auto-merge May 30, 2026 05:47