packer: move to Reflog's Option B — lantern-box installs via cloud-init #248

Open
myleshorton wants to merge 3 commits into main from fisk/packer-option-b

@myleshorton

Summary

Implements Reflog's Option B from #infrastructure-and-services thread ts=1776197690.140869 (2026-04-16): the lantern-box binary is no longer baked into the packer image; instead, cloud-init apt-installs it on first boot, parameterized by release tag. Packer images become mostly static per provider and are rebuilt only when the image itself changes.

Paired with the lantern-cloud PR that lands the control plane: https://github.com/getlantern/lantern-cloud/pull/new/af/central-vps-updates-schema (design doc at docs/design/central-vps-updates.md).

Commits

  1. packer: strip pre-baked lantern-box install — remove the .deb download/install block from deploy/packer/provision.sh. Keep systemd drop-ins and env file scaffolding (they apply when cloud-init's apt install creates lantern-box.service).
  2. packer: remove lantern-box-update cron + stop rebuilding on every release
    • Delete /etc/cron.d/lantern-box-update and /usr/local/bin/lantern-box-update from provision.sh (~100 lines). The cron was producing up to 266 silent errors/hour at peak, with no host.name on the log lines; central-orchestration hot-swap replaces it with per-route observability.
    • build-images.yaml: trigger on push to main when deploy/packer/** or the workflow itself changes, not on every release. Removes the 30-min CI hit + cross-region image push that Reflog originally flagged.
    • release.yaml: drop the Trigger Packer image builds step.
    • auto-tag.yaml: drop deploy/packer/** from the version-bump path filter.

Deploy order

Before merging + rolling out new images built from this code: set bandit_vps_default_release_tag (or a per-track override in bandit_vps_image_targets) in the lantern-cloud settings. Otherwise cloud-init's apt-install step is skipped, new VMs boot without lantern-box installed, and systemctl enable --now lantern-box during config push fails.

The lantern-cloud PR adds the schema + plumbing for setting those values. Merge the lantern-cloud PR first, set the default via psql or lc CLI, then merge this PR and trigger a fresh packer build.
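
The ordering constraint above can be enforced with a small pre-flight guard before any image rollout. This is a minimal sketch, not part of this PR: the function name `require_release_tag` is hypothetical, and the `vX.Y.Z` tag shape is an assumption inferred from the `v0.0.73` example in the test plan.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight guard (not part of this PR).
# Fails when the release tag is empty or does not look like vX.Y.Z.
# The vX.Y.Z shape is an assumption based on the v0.0.73 example.
require_release_tag() {
  local tag="${1:-}"
  local re='^v[0-9]+\.[0-9]+\.[0-9]+$'
  if [[ -z "$tag" ]]; then
    echo "bandit_vps_default_release_tag is unset; cloud-init would skip the lantern-box install" >&2
    return 1
  fi
  if [[ ! "$tag" =~ $re ]]; then
    echo "unexpected release tag format: $tag" >&2
    return 1
  fi
  echo "release tag OK: $tag"
}
```

For example, `require_release_tag "$TAG"` could run before dispatching build-images.yaml, where `TAG` holds whatever value was read from the lantern-cloud settings; how that value is fetched is left out here.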

Test plan

  • Merge lantern-cloud PR first
  • In staging: UPDATE settings SET value = 'v0.0.73' WHERE key = 'bandit_vps_default_release_tag' (or equivalent via lc)
  • Manually dispatch build-images.yaml in this repo for linode+alicloud (skip OCI for speed on first run)
  • Provision a new bandit VPS in staging, observe cloud-init log to confirm apt-get install lantern-box=0.0.73 runs
  • SSH in, dpkg-query -W -f='${Version}' lantern-box returns 0.0.73
  • Observe the route in the lantern-dashboard bandit feed
  • Bump bandit_vps_default_release_tag to a newer tag; watch BanditVPSHotSwapWorker converge the route
  • After a stable staging run, rebuild prod images and deploy
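
The version check in the test plan can be scripted for repeated staging runs. The sketch below assumes the .deb version is the release tag with its leading `v` stripped, as the `v0.0.73` / `0.0.73` pair above suggests; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this PR).

# tag_to_deb_version: strip a single leading "v" (v0.0.73 -> 0.0.73).
# Assumes the .deb version is the tag without its "v" prefix.
tag_to_deb_version() {
  printf '%s\n' "${1#v}"
}

# check_installed_version: compare the version dpkg reports for
# lantern-box against the expected release tag.
check_installed_version() {
  local tag="$1" want have
  want="$(tag_to_deb_version "$tag")"
  have="$(dpkg-query -W -f='${Version}' lantern-box 2>/dev/null || true)"
  if [[ "$have" == "$want" ]]; then
    echo "lantern-box $have matches $tag"
  else
    echo "lantern-box version mismatch: want $want, have ${have:-<not installed>}" >&2
    return 1
  fi
}
```

Running `check_installed_version v0.0.73` on the new VPS after cloud-init finishes would cover the dpkg-query step. The exact-match comparison would need loosening if the package version ever carries a revision suffix.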

Reflog in #infrastructure-and-services thread ts=1776197690.140869
(2026-04-16) noted that rebuilding the full N-provider × M-region packer
image matrix on every lantern-box release is expensive (30min of CI per
release, traffic/API usage from pushing to every region), and proposed
moving the lantern-box .deb install to cloud-init.

This commit takes that step on the packer side. The lantern-box binary
is no longer downloaded or installed during `packer build`; the image
contributes only:
 - runtime deps (ca-certificates, tzdata, nftables)
 - /etc/lantern-box/ and /var/lib/lantern-box/ dirs
 - otelcol-contrib + systemd drop-in for host metrics
 - systemd drop-in for lantern-box's OTel env (still applies when
   lantern-box.service appears on disk via cloud-init's apt install)
 - /etc/cron.d/lantern-box-update fallback cron (to be removed in a
   follow-up once central orchestration is stable)

Cloud-init now owns the .deb install via the ReleaseTag field on
PackerCloudInitConfig, landed in
`getlantern/lantern-cloud` a6f92260f. See
`docs/design/central-vps-updates.md` for the full rollout plan.

**Deploy caveat**: before rolling out a new image built from this code,
bandit_vps_default_release_tag (or a per-track override in
bandit_vps_image_targets) MUST be set in lantern-cloud settings.
Otherwise cloud-init's apt-install step is skipped, the box boots
without lantern-box installed, and the provision worker's
`systemctl enable --now lantern-box` will fail. Revert path: re-merge
this commit's reverse diff.

The VERSION env var is kept because the image name still includes it as
a label; the script itself no longer uses it, and its comment is updated
to match.
…ery release

Follow-ups to the Option B strip (91b027f). Both requested by Reflog in
the same thread ts=1776197690.140869 in #infrastructure-and-services.

Cron removal:
 The /etc/cron.d/lantern-box-update + /usr/local/bin/lantern-box-update
 cron was silently failing 266 times/hour at peak with
 "install failed: expected 0.0.70 but got 0.0.68" and no host.name on
 the log lines. Under central orchestration (landed in the lantern-cloud
 PR), BanditVPSHotSwapWorker SSHes in, apt-installs the target tag,
 and writes current_release_tag, with per-route success/failure
 observability. If SSH keeps failing, BanditVPSAutoreplaceWorker drains
 the route via destroy+pool rebuild. So the cron is strictly redundant.

 Drops ~100 lines from provision.sh. Also drops /var/log/lantern-box
 dir creation and the logrotate config (only the cron wrote there).

CI optimization:
 - build-images.yaml: trigger on push to main when deploy/packer/** or
   the workflow itself changes, not on every lantern-box release.
   Under Option B the binary lives in the .deb published to GitHub
   Releases, not in the image, so a plain Go release needs no rebuild.
 - release.yaml: drop the "Trigger Packer image builds" step. Releases
   still produce .debs via goreleaser, but images stay put.
 - auto-tag.yaml: drop deploy/packer/** from the version-bump path
   filter. Packer-only changes no longer cause a version bump → release
   cycle; build-images.yaml picks them up directly.
 - prepare job in build-images.yaml: on push events, derive version
   from the latest git tag (image is version-agnostic content-wise
   under Option B, but the name is still used as a prefix by the
   per-provider latestImage() helpers in lantern-cloud).

Net effect: a typical Go release goes goreleaser → .deb → GitHub
Releases → hot-swap. No image rebuild, no 36-region OCI fan-out, no
cross-region image push. The 30 minutes of CI per release that Reflog
flagged, plus the API/traffic cost, go away. Packer images rebuild only
when the image itself changes.
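
The version fallback described for the prepare job could look roughly like the sketch below. The function name `resolve_version` is hypothetical, and `git describe --tags --abbrev=0` is one plausible way to get the latest tag, not necessarily what the workflow uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the prepare-job fallback (not the actual
# workflow code): use the dispatched version if one was given,
# otherwise fall back to the nearest tag reachable from HEAD.
resolve_version() {
  local requested="${1:-}"
  if [[ -n "$requested" ]]; then
    printf '%s\n' "$requested"
    return 0
  fi
  # Requires tags to be present in the checkout.
  git describe --tags --abbrev=0
}
```

On a manual dispatch the explicit input wins; on a push event the argument is empty and the tag lookup runs. The lookup only works if the checkout has fetched tags, which a shallow clone may not have.
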
Copilot AI review requested due to automatic review settings April 22, 2026 19:02

Copilot AI left a comment

Pull request overview

Updates the Packer image build/deploy flow to implement “Option B”: stop baking a specific lantern-box version into VM images and instead install the desired version at first boot via cloud-init, reducing image rebuild frequency and CI cost.

Changes:

  • Removes the pre-baked lantern-box .deb install and the per-host auto-update cron from the Packer provisioning script.
  • Adjusts GitHub Actions workflows so Packer image builds trigger on deploy/packer/** (and manual dispatch) rather than on every release.
  • Updates Packer README/docs to describe the new “version-agnostic image” approach.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| deploy/packer/provision.sh | Removes baked-in install/cron and documents cloud-init installation; keeps systemd drop-ins and base deps. |
| deploy/packer/README.md | Documents that lantern-box is no longer included in the image and adds operator guidance. |
| .github/workflows/release.yaml | Removes the step that triggered Packer builds on every release. |
| .github/workflows/build-images.yaml | Changes triggers to push-path based rebuilds and makes version input optional with “latest tag” fallback. |
| .github/workflows/auto-tag.yaml | Removes deploy/packer/** from the auto-tag path filter to avoid version bumps for image-only changes. |

Comment thread deploy/packer/provision.sh Outdated
Comment thread deploy/packer/provision.sh Outdated
Comment thread deploy/packer/README.md
Comment thread .github/workflows/build-images.yaml
Comment thread .github/workflows/build-images.yaml Outdated
- provision.sh: stale comment said the packer image still contributes
  an "auto-update fallback cron" — the cron was removed in an earlier
  commit. Drop the phrase.
- provision.sh: add wireguard-tools to match the Dockerfile + README
  package list (per the "keep in sync with Dockerfile" comment above
  the apt-get install). The lantern-box .deb doesn't declare it as a
  runtime dep in goreleaser's nfpms section, so apt won't pull it in
  on first boot — we need to install it here.
- provision.sh: the old "command -v lantern-box" verification at the
  tail of the script was stranded when we stripped the pre-baked
  install. Under Option B, lantern-box is expected to be absent until
  cloud-init runs. Replace with a verification that checks the things
  the packer image actually contributes: the systemd drop-ins, the
  /etc/lantern-box + /var/lib/lantern-box dirs, the OTel config, the
  Lanternet CA cert, and the tailscale + otelcol-contrib sidecars.
- README.md: reconcile the intro line with the "Not in the image"
  section below. The old "Pre-baked VM images with lantern-box
  installed" claim contradicted Option B and would have confused
  future readers.
- build-images.yaml: switched `${{ inputs.version }}` and
  `${{ inputs.builders }}` to `github.event.inputs.*`. The `inputs`
  context is only populated for workflow_dispatch + workflow_call,
  not push. Evaluators have historically been permissive here but
  `github.event.inputs.*` is strictly more defensive on a push
  trigger without any downside (this workflow doesn't use
  workflow_call).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
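
The replacement verification described in the commit above checks what the image contributes rather than the lantern-box binary. This is a minimal sketch of that idea, not the code in provision.sh: the helper names are hypothetical, and the `tailscale` and `otelcol-contrib` command names are assumptions. Only the two directory paths come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual provision.sh code).

# check_dirs ROOT DIR...: every DIR must exist under ROOT.
check_dirs() {
  local root="$1" missing=0 dir
  shift
  for dir in "$@"; do
    if [[ ! -d "${root}${dir}" ]]; then
      echo "missing directory: ${dir}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# check_commands CMD...: every CMD must be on PATH.
check_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: ${cmd}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# verify_image: lantern-box itself is expected to be absent until
# cloud-init runs, so it is deliberately not checked here.
verify_image() {
  check_dirs "" /etc/lantern-box /var/lib/lantern-box || return 1
  # Command names below are assumed, not taken from provision.sh.
  check_commands tailscale otelcol-contrib || return 1
}
```

Taking the root prefix as a parameter lets the directory check run against a scratch directory in tests as well as against `/` at the end of a packer build.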

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.


Comment on lines +96 to 101
# daemon-reload is a no-op here for the (not-yet-installed) lantern-box
# service, but the otelcol-contrib service below needs it to pick up its
# env drop-in. The apt install that runs under cloud-init will
# daemon-reload again after the service unit appears on disk.
systemctl daemon-reload
