Add Docker caching in deployments #4831

Merged
backspace merged 7 commits into main from deployment-speedup-cs-11143 on May 15, 2026

@backspace backspace commented May 14, 2026

This temporarily inlines the gh-actions repository’s docker-build-ecr action so that it can use ECR for caching, which yields the mild speed gains described below. It won’t hugely speed up deployments because the Docker images are built in parallel, but it’ll at least reduce concurrency.

If this works well with real-world usage, I’ll follow up to upstream it into gh-actions.

Claude:

Build-time impact

Compared 8 successful staging deploys from main against 4 warm-cache deploys on this branch. Times are seconds.

| Docker image | Baseline median (n=8) | Baseline p25–p75 | Cold ECR | Warm ECR median (n=4) | Δ vs baseline |
| --- | --- | --- | --- | --- | --- |
| prerender | 305 | 273–368 | 293 | 258 | −15% |
| realm-server | 214 | 195–274 | 180 | 105 | −51% |
| worker | 203 | 195–234 | 187 | 110 | −46% |
| pg-migration | 206 | 192–228 | 176 | 115 | −44% |
| prerender-mgr | 213 | 194–220 | 177 | 107 | −50% |
| bot-runner | 213 | 196–410 | 205 | 178 | −16% |
| ai-bot | 209 | 197–239 | 187 | 181 | −13% |
| host (control) | 215 | 207–233 | 215 | 215 | 0% (control) |

Critical-path Docker build (slowest of the parallel image builds) drops 305s → 258s (−47s). Total Docker CI time (sum of the medians across the 7 image builds, excluding the host control) drops 1,563s → 1,054s (−33%).
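
As a sanity check on the arithmetic, the deltas and totals can be recomputed from the medians in the table. This is a minimal sketch and not part of the PR: the dictionary and helper names are illustrative, and the numbers are copied from the table above.

```python
# Median build times in seconds, copied from the table above:
# image -> (baseline median, warm-ECR median). host is excluded (control).
MEDIANS = {
    "prerender": (305, 258),
    "realm-server": (214, 105),
    "worker": (203, 110),
    "pg-migration": (206, 115),
    "prerender-mgr": (213, 107),
    "bot-runner": (213, 178),
    "ai-bot": (209, 181),
}


def pct_change(baseline: float, warm: float) -> int:
    """Relative change of warm vs baseline, rounded to a whole percent."""
    return round((warm - baseline) / baseline * 100)


deltas = {image: pct_change(b, w) for image, (b, w) in MEDIANS.items()}

# Images build in parallel, so the critical path is the slowest single build.
critical_baseline = max(b for b, _ in MEDIANS.values())
critical_warm = max(w for _, w in MEDIANS.values())

# Total build time is the sum over the 7 image builds.
total_baseline = sum(b for b, _ in MEDIANS.values())
total_warm = sum(w for _, w in MEDIANS.values())

for image, delta in deltas.items():
    print(f"{image:14s} {delta:+d}%")
print(f"critical path: {critical_baseline}s -> {critical_warm}s")
print(f"total: {total_baseline}s -> {total_warm}s "
      f"({pct_change(total_baseline, total_warm):+d}%)")
```

The recomputed values match the Δ column, the 47s critical-path saving, and the 33% drop in total build time.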

Notes

  • Five Dockerfiles got the cache-friendly pnpm-fetch layer ordering in this PR (prerender, realm-server, worker, pg-migration, prerender-manager). Those show the biggest drops. bot-runner and ai-bot use unchanged Dockerfiles and only gain from the GHA→ECR cache-backend swap.
  • host is a control: built by a separate workflow this PR doesn't touch. The 215s warm median exactly matches the baseline median, confirming the savings above aren't runner-side variance.
  • One of the four warm runs (25880095835) saw an ECR cache-pull transfer-bandwidth spike that doubled prerender and tripled pg-migration build times despite confirmed cache hits in the buildx logs. Medians absorb it.
  • Baseline ranges are wide (e.g. bot-runner 184–468s) because runner cold-start variance routinely swings GHA build times by ±100s. p25–p75 is more useful than min–max.

Data:

The new datapoint stabilizes the picture nicely: host control is now exactly 215 vs 215 baseline (a clean 0%), and the variance-prone medians settled. prerender critical-path savings firmed up to −15% / −47s. The 321s pg-migration outlier from the third warm run is balanced out by the 110s here.

github-actions Bot commented May 14, 2026

Host Test Results

    1 files      1 suites   1h 40m 28s ⏱️
2 658 tests 2 643 ✅ 15 💤 0 ❌
2 677 runs  2 662 ✅ 15 💤 0 ❌

Results for commit 3fcf018.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   9m 31s ⏱️ + 2m 5s
1 377 tests +3  1 377 ✅ +4  0 💤 ±0  0 ❌  - 1 
1 458 runs  +3  1 458 ✅ +4  0 💤 ±0  0 ❌  - 1 

Results for commit 3fcf018. ± Comparison against earlier commit ae708a3.

backspace and others added 2 commits May 14, 2026 11:42
Reorder the dep-install steps so the `pnpm fetch` layer only invalidates
when the lockfile changes, not on every source edit. Previously the
lockfile COPY was followed immediately by `ADD . ./`, which meant any
file change blew away the fetch and forced ~2-3 min of re-downloading
into the pnpm store. With `cache-from/cache-to: type=gha,mode=max`
already wired through docker-ecr, this should cut several minutes off
each Docker build on the common no-dep-change path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two consecutive staging deploys on this branch showed the GHA cache
backend was making things worse: a "cache hit" on the pnpm fetch layer
took ~84s to download from GHA, plus ~144s to push the updated cache
back — totaling ~230s of cache-only network I/O for an image whose
actual `pnpm fetch` step takes ~30s when run fresh.

Temporarily replace the call to `cardstack/gh-actions/.../docker-ecr.yml`
with an in-repo composite action (`.github/actions/docker-build-ecr`)
that does the same auth + build + push but caches to the same ECR
repository (`<repo>:buildcache`) using `type=registry,mode=max` with
the OCI-manifest flags ECR requires. ECR pulls inside us-east-1 are
much faster than GHA cache traffic, so the layer ordering in the
pnpm Dockerfiles should finally produce a net win.

Inlining (rather than updating cardstack/gh-actions in place) lets us
iterate on the cache config without round-tripping through that repo.
We can fold it back once we're confident the approach holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
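
The cost comparison in the commit message above can be written out as a small calculation. This is an illustrative sketch only: the constant and function names are made up for the example, the timings are the approximate figures quoted in the message, and it ignores any other layers the cache may also cover.

```python
# Approximate timings in seconds, taken from the commit message above.
GHA_CACHE_PULL_S = 84    # downloading the cached pnpm fetch layer from GHA
GHA_CACHE_PUSH_S = 144   # pushing the updated cache back to GHA
FRESH_PNPM_FETCH_S = 30  # running `pnpm fetch` with no cache at all


def cache_net_saving(pull_s: float, push_s: float, fresh_s: float) -> float:
    """Seconds saved by using the cache; a negative value means the cache costs time."""
    return fresh_s - (pull_s + push_s)


cache_io_s = GHA_CACHE_PULL_S + GHA_CACHE_PUSH_S
net_s = cache_net_saving(GHA_CACHE_PULL_S, GHA_CACHE_PUSH_S, FRESH_PNPM_FETCH_S)

print(f"cache-only network I/O: ~{cache_io_s}s")
print(f"net saving vs. a fresh pnpm fetch: {net_s}s")
```

With these figures the cache transfer alone is roughly 230s against about 30s of work avoided, so caching pays off only if pulling and pushing the cache is cheaper than redoing the work it replaces.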
@backspace backspace force-pushed the deployment-speedup-cs-11143 branch from adfed76 to b7aecb1 Compare May 14, 2026 17:30
The SHA we initially pinned (v2.0.1) still uses Node.js 20, which is
deprecated on GitHub Actions runners — every build job in the latest
deploy run flagged "Node.js 20 actions are deprecated" against this
one action. v2.1.5 switched to node24 (verified by inspecting the
tagged commit's action.yml). All other pinned actions in the local
docker-build-ecr composite (`actions/checkout`, `setup-buildx-action`,
`configure-aws-credentials`, `build-push-action`) are already on
node24, so this single bump clears the warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@backspace backspace changed the title Add deployment speedups Add Docker caching in deployments May 14, 2026

WORKDIR /boxel/packages/postgres

CMD ./node_modules/.bin/ts-node --transpileOnly ./scripts/fix-migration-names.ts && ./node_modules/.bin/node-pg-migrate --check-order false --migrations-table migrations up && sleep infinity

This is an existing issue, but CS-11154 tracks it.

@backspace backspace marked this pull request as ready for review May 14, 2026 21:10
@backspace backspace requested a review from a team May 14, 2026 21:37
@backspace backspace merged commit c2c70fc into main May 15, 2026
95 of 96 checks passed