chore: add disk space cleanup steps to CI pipeline (Phase 3) #8279
vhvb1989 wants to merge 5 commits into
Add a reusable cleanup-disk-space.yml step template and insert it at strategic points in the Linux CI jobs to address 71 'Free disk space on / is lower than 5%' warnings.
Cleanup points in build-cli.yml:
- After test run: clean Go cache, NuGet, and .NET temp artifacts
- After release build: clean Go build cache
- After Linux package build: clean Docker images/containers
Cleanup points in cross-build-cli.yml:
- After Linux ARM64 package build: clean Docker and Go cache
All cleanup steps are Linux-only, use continueOnError to avoid breaking builds, and log disk usage before/after for observability.
Resolves #7783
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
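The PR page does not show the template body. Below is a minimal sketch of how a Linux-only step template with these properties could be structured, covering two of the targets. The boolean parameters CleanDocker and CleanGoCache and a string Condition parameter defaulting to succeeded() are assumptions based on names used elsewhere in this PR; the step layout is illustrative, not the PR's actual file.

```yaml
# Hypothetical sketch of eng/pipelines/templates/steps/cleanup-disk-space.yml.
# Parameter names follow this PR; types, defaults, and step layout are assumptions.
parameters:
  - name: CleanDocker
    type: boolean
    default: false
  - name: CleanGoCache
    type: boolean
    default: false
  - name: Condition
    type: string
    default: succeeded()

steps:
  # Log root-volume usage before any cleanup runs.
  - bash: df -h /
    displayName: Disk usage before cleanup
    condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
    continueOnError: true

  # Compile-time ${{ if }}: the step is omitted entirely when the flag is false.
  - ${{ if eq(parameters.CleanDocker, true) }}:
    - bash: docker system prune -af --volumes
      displayName: Clean Docker images and containers
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - ${{ if eq(parameters.CleanGoCache, true) }}:
    - bash: go clean -cache && go clean -testcache
      displayName: Clean Go build and test caches
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  # Log usage again so the reclaimed space is visible in the job log.
  - bash: df -h /
    displayName: Disk usage after cleanup
    condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
    continueOnError: true
```

Keeping each command in its own step means a failed prune marks only that step as succeeded-with-issues (via continueOnError), instead of its exit code being masked by a later df call in the same script.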
Pull request overview
This PR adds a reusable Azure Pipelines step template to reclaim disk space on Linux build agents and wires it into the CLI build job templates to reduce “Free disk space on / is lower than 5%” warnings.
Changes:
- Added a reusable cleanup-disk-space.yml step template with parameterized cleanup targets (Docker, Go cache, NuGet, .NET temp) and before/after disk usage logging.
- Inserted cleanup steps into build-cli.yml at multiple points in the job to reduce transient disk pressure.
- Inserted cleanup steps into cross-build-cli.yml after the Linux ARM64 package build.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| eng/pipelines/templates/steps/cleanup-disk-space.yml | New reusable cleanup template (Linux-only) for reclaiming disk space via targeted cache/prune steps. |
| eng/pipelines/templates/jobs/build-cli.yml | Adds multiple invocations of the cleanup template during the BuildCLI job. |
| eng/pipelines/templates/jobs/cross-build-cli.yml | Adds a cleanup invocation after the Linux ARM64 package build. |
- Remove CleanGoCache from post-test cleanup to preserve cache for release build
- Add BuildLinuxPackages condition to Docker cleanup in build-cli.yml
- Add BuildLinuxPackages condition to Docker+Go cleanup in cross-build-cli.yml
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
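The BuildLinuxPackages gate can be expressed as a compile-time conditional around the template invocation. The sketch below assumes that BuildLinuxPackages is a boolean parameter of the job template; the PR does not show whether it is a template parameter or a runtime variable. The package-build step itself is elided.

```yaml
# Sketch only: assumes BuildLinuxPackages is a boolean parameter of build-cli.yml.
steps:
  # ... build-linux-packages step (fpm Docker container) ...

  # Docker cleanup is inserted only when the package build is part of the job,
  # since otherwise the fpm container images were never built.
  - ${{ if eq(parameters.BuildLinuxPackages, true) }}:
    - template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
      parameters:
        CleanDocker: true
```

If BuildLinuxPackages is instead a runtime variable, the same gate would move into the Condition parameter, for example and(succeeded(), eq(variables['BuildLinuxPackages'], 'true')).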
- Remove redundant 2>/dev/null || true from cleanup commands; rely on continueOnError
- Update PR description to reflect that post-test cleanup no longer includes Go cache
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run azure-dev - cli
Azure Pipelines could not run because the pipeline triggers exclude this branch/path.
📋 Prioritization Note: Thanks for the contribution! The linked issue isn't in the current milestone yet.
This cleanup template is a good targeted way to reduce Linux agent disk pressure, and the latest iteration addresses the earlier cache-timing and condition feedback. The Linux gating, continueOnError, and before/after df -h / logging make the cleanup behavior observable without making cleanup failures block the build.
I found two follow-up items worth addressing or confirming before merge:
- eng/pipelines/templates/jobs/build-cli.yml, lines 174-178: The post-test cleanup uses the template's default condition of succeeded(). If the test step fails, Publish test results still runs because it uses succeededOrFailed(), but this cleanup step is skipped. Since reclaiming disk space is still useful after failed tests, pass Condition: succeededOrFailed() for this invocation.
- The affected Azure Pipelines template path does not appear to have been exercised on this PR. The /azp run azure-dev - cli attempt was excluded by pipeline triggers, so the new compile-time ${{ if }} blocks and composed runtime conditions have not been validated in the target ADO pipeline. Please run the affected pipeline to confirm template expansion for the new cleanup wiring.
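The first review item comes down to one extra parameter on the post-test invocation. A sketch of that invocation follows; the CleanNuGet and CleanDotNet values reflect the earlier commit that removed Go cache cleanup from this point, and the surrounding test steps are elided.

```yaml
steps:
  # ... test run and "Publish test results" (condition: succeededOrFailed()) ...

  # Post-test cleanup: also runs when tests failed, so any later
  # succeededOrFailed() steps still get the reclaimed disk space.
  - template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
    parameters:
      CleanNuGet: true
      CleanDotNet: true
      Condition: succeededOrFailed()
```

Because succeededOrFailed() is false for cancelled runs, a cancelled build would still skip the cleanup.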
- Add Go cache cleanup BEFORE Linux package build (not just after)
- Split cleanup: Go cache before Docker build, Docker after
- Add temp/cache directory overrides (DOCKER_TMPDIR, TMPDIR, GOCACHE) to CrossBuildCLI LinuxARM64 matrix entry, matching BuildCLI Linux
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
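Matrix-level overrides work because Azure Pipelines exposes matrix variables to each step as environment variables, so tools launched by those steps (for example, the Go toolchain via GOCACHE) see the redirected locations without per-step changes. A sketch of the LinuxARM64 entry follows. The paths are hypothetical placeholders because the PR does not state the values used for BuildCLI_Linux.

```yaml
# Sketch of the LinuxARM64 matrix entry in stages/build-and-test.yml.
# Other existing variables for this entry are elided.
strategy:
  matrix:
    LinuxARM64:
      # Intended to mirror the BuildCLI_Linux entry; these paths are placeholders.
      DOCKER_TMPDIR: /mnt/docker-tmp
      TMPDIR: /mnt/tmp
      GOCACHE: /mnt/go-cache
```

The target directories must exist (or be creatable) before tools use them. A TMPDIR that points at a missing directory makes temp-file creation fail.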
Add Docker prune and system cache cleanup (apt, temp files) BEFORE the build-linux-packages step. The Docker build + Copacetic security scan consume ~4GB, pushing disk from 92% to 98% and generating 54+ disk space warnings. By cleaning Docker images and system caches before the build, we free enough space to stay under the 95% warning threshold.
Changes:
- cleanup-disk-space.yml: Add CleanSystemCaches parameter (apt-get clean, remove apt lists/archives, temp Go build files)
- build-cli.yml: Add Docker + system cache cleanup before Linux packages
- cross-build-cli.yml: Add Docker + system cache cleanup before Linux packages
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
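A sketch of how the CleanSystemCaches branch could look inside cleanup-disk-space.yml. It makes three assumptions: the agent has passwordless sudo (typical for hosted Linux agents, but not guaranteed on self-hosted ones); "temp Go build files" means Go's go-build* work directories under TMPDIR; and GOCACHE does not point into those paths.

```yaml
# Hypothetical CleanSystemCaches branch; the full template would also
# compose its Condition parameter into the condition below.
parameters:
  - name: CleanSystemCaches
    type: boolean
    default: false

steps:
  - ${{ if eq(parameters.CleanSystemCaches, true) }}:
    - bash: |
        set -euo pipefail
        # Remove downloaded .deb packages from the apt archive cache.
        sudo apt-get clean
        # Remove package index files; the next apt-get update re-downloads them.
        sudo rm -rf /var/lib/apt/lists/*
        # Remove leftover Go build work directories from the temp directory.
        rm -rf "${TMPDIR:-/tmp}"/go-build*
      displayName: Clean system caches
      condition: and(succeeded(), eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true
```

With set -e, the first failing command stops the script, and continueOnError turns that into a warning rather than a failed build. This matches the earlier decision to drop 2>/dev/null || true.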
jongio left a comment
One item from @hemarina's review isn't addressed in the latest commits: the post-test cleanup step in build-cli.yml defaults to succeeded(), so it won't run after test failures. Disk reclamation is still useful in that scenario for any succeededOrFailed() steps that follow.
@JeffreyCA's approval was submitted against d0b97c9 (before the latest 2 commits). May need re-approval depending on branch protection rules.
The pipeline still hasn't been exercised in ADO: the /azp run attempt was excluded by branch triggers. It's worth validating template expansion before merging.
displayName: Publish test results
condition: succeededOrFailed()

- template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
This invocation should pass Condition: succeededOrFailed() so cleanup still runs after test failures. The default condition is succeeded(), so disk won't be reclaimed if the test step fails, even though subsequent succeededOrFailed() steps (and, on self-hosted pools, the agent's next job) would benefit from the freed space.
Phase 3: Reduce disk space warnings from Linux CI agents
Resolves #7783
Problem
The 1ES Pipeline Template checks disk space between every task and emits "Free disk space on / is lower than 5%" warnings when usage exceeds 95%. Linux agents in the ADO pipeline were generating 71+ such warnings per build.
Solution
Created a reusable disk cleanup step template and inserted it at strategic points in the CI pipeline to keep disk usage below the 95% warning threshold.
Changes
- eng/pipelines/templates/steps/cleanup-disk-space.yml (new): Reusable cleanup template with parameterized targets:
  - CleanDocker: docker system prune -af --volumes
  - CleanGoCache: go clean -cache && go clean -testcache
  - CleanNuGet: dotnet nuget locals all --clear
  - CleanDotNet: Remove temp NuGet/dotnet artifacts
  - CleanSystemCaches: Clean apt cache, apt lists, temp Go build files
  - continueOnError: true, with df -h diagnostics
- eng/pipelines/templates/jobs/build-cli.yml: 3 cleanup insertions
- eng/pipelines/templates/jobs/cross-build-cli.yml: 2 cleanup insertions
- eng/pipelines/templates/stages/build-and-test.yml: Added DOCKER_TMPDIR, TMPDIR, GOCACHE overrides to the LinuxARM64 cross-build matrix (matching BuildCLI_Linux)
Key Design Decisions
- BuildContainerImage + Copacetic security scan consumes ~4GB. Cleaning Docker images and system caches BEFORE the build keeps disk under the 95% threshold during Copacetic scanning.
- BuildLinuxPackages: Only runs when the fpm Docker container is actually built.
- continueOnError: true: Cleanup failures should never block the build.
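As a rough sanity check, the figures from the pre-build cleanup commit (disk rising from 92% to 98% during the ~4GB Docker build and Copacetic scan) indicate how much the pre-build cleanup has to reclaim. The estimate assumes the whole rise comes from that ~4GB on a root volume of size V, and uses R for the space the pre-build cleanup frees:

```latex
\begin{aligned}
V &\approx \frac{4\ \text{GB}}{0.98 - 0.92} \approx 66.7\ \text{GB} \\
0.92\,V - R + 4\ \text{GB} &\le 0.95\,V \\
\Longrightarrow\quad R &\ge 4\ \text{GB} - 0.03\,V = 0.03\,V \approx 2\ \text{GB}
\end{aligned}
```

Under these assumptions, the cleanup needs to free roughly 2GB (about 3 percentage points) before the build for peak usage to stay at or below the 95% warning threshold. The estimate ignores any other disk growth during the build.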