
chore: add disk space cleanup steps to CI pipeline (Phase 3) #8279

Open
vhvb1989 wants to merge 5 commits into main from ci-warnings-reduction


@vhvb1989 (Member) commented May 20, 2026

Phase 3: Reduce disk space warnings from Linux CI agents

Resolves #7783

Problem

The 1ES Pipeline Template checks disk space between every task and emits "Free disk space on / is lower than 5%" warnings when usage exceeds 95%. Linux agents in the ADO pipeline were generating 71+ such warnings per build.
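To reproduce the threshold on an agent, the check reduces to simple arithmetic: less than 5% free on / is the same as more than 95% used. The step below is a hypothetical diagnostic (not part of this PR, and not how 1ES implements its check) that reads the root volume's usage with GNU df and raises a pipeline warning past 95%. Because df rounds its Use% up and excludes reserved blocks, treat it as an approximation of the 1ES number.

```yaml
# Hypothetical diagnostic step, not part of the PR.
# Approximates the 1ES "free space < 5%" check as "used > 95%".
- bash: |
    # GNU df prints a "Use%" header, then e.g. " 92%"; keep only the digits.
    used=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
    echo "Root volume usage: ${used}%"
    if [ "$used" -gt 95 ]; then
      # Azure Pipelines logging command that surfaces a warning in the run summary.
      echo "##vso[task.logissue type=warning]Root volume is ${used}% full"
    fi
  displayName: Check root disk usage
  condition: eq(variables['Agent.OS'], 'Linux')
```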

Solution

Created a reusable disk cleanup step template and inserted it at strategic points in the CI pipeline to keep disk usage below the 95% warning threshold.

Changes

  1. eng/pipelines/templates/steps/cleanup-disk-space.yml (new) — Reusable cleanup template with parameterized targets:

    • CleanDocker: docker system prune -af --volumes
    • CleanGoCache: go clean -cache && go clean -testcache
    • CleanNuGet: dotnet nuget locals all --clear
    • CleanDotNet: Remove temp NuGet/dotnet artifacts
    • CleanSystemCaches: Clean apt cache, apt lists, temp Go build files
    • All steps Linux-only, continueOnError: true, with df -h diagnostics
  2. eng/pipelines/templates/jobs/build-cli.yml — 3 cleanup insertions:

    • After tests: NuGet + .NET temp cleanup
    • Before Linux packages: Go cache + Docker + system caches cleanup (proactive)
    • After Linux packages: Docker cleanup (reactive)
  3. eng/pipelines/templates/jobs/cross-build-cli.yml — 2 cleanup insertions:

    • Before Linux packages: Go cache + Docker + system caches cleanup
    • After Linux packages: Docker cleanup
  4. eng/pipelines/templates/stages/build-and-test.yml — Added DOCKER_TMPDIR, TMPDIR, GOCACHE overrides to LinuxARM64 cross-build matrix (matching BuildCLI_Linux)
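A template with this shape might look roughly like the sketch below. The five Clean* parameter names come from the list above and the Condition parameter from the review discussion; the step bodies, the succeeded() default, and the /tmp/NuGetScratch path used for CleanDotNet are assumptions, not the PR's actual file. Each step composes the caller's Condition with a Linux check and sets continueOnError: true, matching the stated design.

```yaml
# Sketch of eng/pipelines/templates/steps/cleanup-disk-space.yml (not the PR's file).
parameters:
  - name: CleanDocker
    type: boolean
    default: false
  - name: CleanGoCache
    type: boolean
    default: false
  - name: CleanNuGet
    type: boolean
    default: false
  - name: CleanDotNet
    type: boolean
    default: false
  - name: CleanSystemCaches
    type: boolean
    default: false
  - name: Condition        # runtime condition supplied by the caller (assumed default)
    type: string
    default: succeeded()

steps:
  - bash: df -h /
    displayName: Disk usage before cleanup
    condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
    continueOnError: true

  # Compile-time insertion: each block exists only when its flag is true.
  - ${{ if eq(parameters.CleanDocker, true) }}:
    - bash: docker system prune -af --volumes
      displayName: Clean Docker images, containers, and volumes
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - ${{ if eq(parameters.CleanGoCache, true) }}:
    - bash: go clean -cache && go clean -testcache
      displayName: Clean Go build and test caches
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - ${{ if eq(parameters.CleanNuGet, true) }}:
    - bash: dotnet nuget locals all --clear
      displayName: Clear NuGet local caches
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - ${{ if eq(parameters.CleanDotNet, true) }}:
    # Assumed location of temporary NuGet/.NET artifacts.
    - bash: rm -rf /tmp/NuGetScratch
      displayName: Remove temporary NuGet/.NET artifacts
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - ${{ if eq(parameters.CleanSystemCaches, true) }}:
    - bash: |
        sudo apt-get clean                 # empties /var/cache/apt/archives
        sudo rm -rf /var/lib/apt/lists/*   # downloaded package lists
        rm -rf /tmp/go-build*              # leftover Go build work directories
      displayName: Clean apt caches and temporary Go build files
      condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
      continueOnError: true

  - bash: df -h /
    displayName: Disk usage after cleanup
    condition: and(${{ parameters.Condition }}, eq(variables['Agent.OS'], 'Linux'))
    continueOnError: true
```

The Clean* flags are resolved when the template expands, so disabled cleanups never appear in the run, while Condition is evaluated at runtime and can therefore depend on earlier step results.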

Key Design Decisions

  • Pre-build Docker cleanup: The 1ES BuildContainerImage + Copacetic security scan consume ~4GB. Cleaning Docker images and system caches BEFORE the build keeps disk under the 95% threshold during Copacetic scanning.
  • Post-test cleanup excludes Go cache: Go cache is needed for the release build; cleaning it after tests would force a full rebuild
  • Docker cleanup conditioned on BuildLinuxPackages: Only runs when the fpm Docker container is actually built
  • continueOnError: true: Cleanup failures should never block the build
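The proactive/reactive split around the Linux package build could be wired up roughly as follows. This sketch assumes the cleanup template accepts CleanGoCache, CleanDocker, CleanSystemCaches, and a Condition string (names from the PR discussion), and it assumes BuildLinuxPackages is a runtime variable holding 'true'. If it is a template parameter instead, a compile-time ${{ if }} block would replace the runtime condition.

```yaml
# Hypothetical excerpt from the BuildCLI job's steps in build-cli.yml.
# Assumption: BuildLinuxPackages is a runtime variable set to 'true'.
steps:
  # Proactive: reclaim space before the fpm Docker build and the
  # 1ES container image build + Copacetic scan.
  - template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
    parameters:
      CleanGoCache: true
      CleanDocker: true
      CleanSystemCaches: true
      Condition: and(succeeded(), eq(variables['BuildLinuxPackages'], 'true'))

  # ... the Linux package build step runs here (omitted) ...

  # Reactive: remove the images and layers the package build left behind.
  - template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
    parameters:
      CleanDocker: true
      Condition: and(succeeded(), eq(variables['BuildLinuxPackages'], 'true'))
```

Per the PR description, the template itself restricts its steps to Linux and sets continueOnError: true, so callers only decide what to clean and when.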

Add a reusable cleanup-disk-space.yml step template and insert it at
strategic points in the Linux CI jobs to address 71 'Free disk space
on / is lower than 5%' warnings.

Cleanup points in build-cli.yml:
- After test run: clean Go cache, NuGet, and .NET temp artifacts
- After release build: clean Go build cache
- After Linux package build: clean Docker images/containers

Cleanup points in cross-build-cli.yml:
- After Linux ARM64 package build: clean Docker and Go cache

All cleanup steps are Linux-only, use continueOnError to avoid breaking
builds, and log disk usage before/after for observability.

Resolves #7783

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a reusable Azure Pipelines step template to reclaim disk space on Linux build agents and wires it into the CLI build job templates to reduce “Free disk space on / is lower than 5%” warnings.

Changes:

  • Added a reusable cleanup-disk-space.yml step template with parameterized cleanup targets (Docker, Go cache, NuGet, .NET temp) and before/after disk usage logging.
  • Inserted cleanup steps into build-cli.yml at multiple points in the job to reduce transient disk pressure.
  • Inserted cleanup steps into cross-build-cli.yml after Linux ARM64 package build.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

  • eng/pipelines/templates/steps/cleanup-disk-space.yml: New reusable cleanup template (Linux-only) for reclaiming disk space via targeted cache/prune steps.
  • eng/pipelines/templates/jobs/build-cli.yml: Adds multiple invocations of the cleanup template during the BuildCLI job.
  • eng/pipelines/templates/jobs/cross-build-cli.yml: Adds a cleanup invocation after the Linux ARM64 package build.

Comment thread: eng/pipelines/templates/jobs/build-cli.yml (outdated)
Comment thread: eng/pipelines/templates/jobs/build-cli.yml
Comment thread: eng/pipelines/templates/jobs/cross-build-cli.yml
- Remove CleanGoCache from post-test cleanup to preserve cache for release build
- Add BuildLinuxPackages condition to Docker cleanup in build-cli.yml
- Add BuildLinuxPackages condition to Docker+Go cleanup in cross-build-cli.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread: eng/pipelines/templates/steps/cleanup-disk-space.yml (outdated)
Comment thread: eng/pipelines/templates/jobs/build-cli.yml
- Remove redundant 2>/dev/null || true from cleanup commands; rely on continueOnError
- Update PR description to reflect post-test cleanup no longer includes Go cache

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
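The change in this commit swaps in-script error suppression for the step-level setting. A minimal before/after sketch (displayName values are illustrative, not copied from the PR):

```yaml
# Before: the script hides stderr and always exits 0, so a failed prune
# looks like a success and leaves no trace in the run summary.
- bash: docker system prune -af --volumes 2>/dev/null || true
  displayName: Clean Docker
  continueOnError: true

# After: the real exit code and stderr reach the log; continueOnError: true
# records a failure as "succeeded with issues" instead of failing the job.
- bash: docker system prune -af --volumes
  displayName: Clean Docker
  continueOnError: true
```

With both mechanisms present, continueOnError never has anything to act on, so the suppression only removes diagnostics.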
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@vhvb1989 marked this pull request as ready for review May 21, 2026 03:09

@vhvb1989 (Member, Author) commented:

/azp run azure-dev - cli

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

github-actions Bot commented May 21, 2026

📋 Prioritization Note

Thanks for the contribution! The linked issue isn't in the current milestone yet.
Review may take a bit longer — reach out to @rajeshkamal5050 or @kristenwomack if you'd like to discuss prioritization.

@hemarina (Contributor) left a comment

This cleanup template is a good targeted way to reduce Linux agent disk pressure, and the latest iteration addresses the earlier cache-timing and condition feedback. The Linux gating, continueOnError, and before/after df -h / logging make the cleanup behavior observable without making cleanup failures block the build.

I found two follow-up items worth addressing or confirming before merge:

  1. eng/pipelines/templates/jobs/build-cli.yml lines 174-178
    The post-test cleanup uses the template default condition of succeeded(). If the test step fails, Publish test results still runs because it uses succeededOrFailed(), but this cleanup step will be skipped. Since reclaiming disk is still useful after failed tests, pass Condition: succeededOrFailed() for this invocation.

  2. The affected Azure Pipelines template path does not appear to have been exercised on this PR. The /azp run azure-dev - cli attempt was excluded by pipeline triggers, so the new compile-time ${{ if }} blocks and composed runtime conditions have not been validated in the target ADO pipeline. Please run the affected pipeline to confirm template expansion for the new cleanup wiring.
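Addressing the first item could look like the sketch below. Only the displayName and condition of the publish step come from the diff excerpt in this thread; the PublishTestResults@2 task type and the cleanup template's CleanNuGet, CleanDotNet, and Condition parameter names are assumptions based on the PR description.

```yaml
# Existing publish step (task type assumed; inputs left at their defaults).
- task: PublishTestResults@2
  displayName: Publish test results
  condition: succeededOrFailed()

# Post-test cleanup: NuGet + .NET temp only (the Go cache is kept for the
# release build), and it now runs even when the test step has failed.
- template: /eng/pipelines/templates/steps/cleanup-disk-space.yml
  parameters:
    CleanNuGet: true
    CleanDotNet: true
    Condition: succeededOrFailed()
```

succeededOrFailed() runs the step unless the run was canceled, which mirrors the publish step directly above it.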

- Add Go cache cleanup BEFORE Linux package build (not just after)
- Split cleanup: Go cache before Docker build, Docker after
- Add temp/cache directory overrides (DOCKER_TMPDIR, TMPDIR, GOCACHE)
  to CrossBuildCLI LinuxARM64 matrix entry, matching BuildCLI Linux

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
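The matrix overrides in this commit might take a shape like the following. The three variable names come from the commit message; the generic strategy.matrix job and the $(Agent.TempDirectory)-based paths are placeholders, not the values in build-and-test.yml. Agent.TempDirectory is emptied by the agent after each job, which is one reason to place scratch space there; the PR may point these at a different location.

```yaml
# Hypothetical job shape; the real file passes its matrix through template parameters.
jobs:
  - job: CrossBuildCLI
    strategy:
      matrix:
        LinuxARM64:
          # Matrix entries become pipeline variables and environment variables.
          DOCKER_TMPDIR: $(Agent.TempDirectory)/docker-tmp   # placeholder path
          TMPDIR: $(Agent.TempDirectory)/tmp                 # placeholder path
          GOCACHE: $(Agent.TempDirectory)/go-cache           # placeholder path
    steps:
      # TMPDIR must exist before tools try to create files in it.
      - bash: mkdir -p "$DOCKER_TMPDIR" "$TMPDIR" "$GOCACHE"
        displayName: Create redirected temp and cache directories
```

Go reads GOCACHE for its build cache and TMPDIR for temporary work directories; how DOCKER_TMPDIR reaches the Docker daemon is not shown in the PR and is not modeled here.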
@vhvb1989 requested a review from RickWinter as a code owner May 26, 2026 23:32
Add Docker prune and system cache cleanup (apt, temp files) BEFORE the
build-linux-packages step. The Docker build + Copacetic security scan
consume ~4GB, pushing disk from 92% to 98% and generating 54+ disk space
warnings. By cleaning Docker images and system caches before the build,
we free enough space to stay under the 95% warning threshold.

Changes:
- cleanup-disk-space.yml: Add CleanSystemCaches parameter (apt-get clean,
  remove apt lists/archives, temp Go build files)
- build-cli.yml: Add Docker + system cache cleanup before Linux packages
- cross-build-cli.yml: Add Docker + system cache cleanup before Linux packages

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jongio (Member) left a comment

One item from @hemarina's review isn't addressed in the latest commits: the post-test cleanup step in build-cli.yml defaults to succeeded(), so it won't run after test failures. Disk reclamation is still useful in that scenario for any succeededOrFailed() steps that follow.

@JeffreyCA's approval was submitted against d0b97c9 (before the latest 2 commits). May need re-approval depending on branch protection rules.

The pipeline still hasn't been exercised in ADO: the /azp run attempt was excluded by branch triggers. Worth validating template expansion before merging.

Inline comment on eng/pipelines/templates/jobs/build-cli.yml:

displayName: Publish test results
condition: succeededOrFailed()

- template: /eng/pipelines/templates/steps/cleanup-disk-space.yml

This invocation should pass Condition: succeededOrFailed() so cleanup still runs after test failures. The default condition is succeeded(), so disk won't be reclaimed if the test step fails, yet subsequent succeededOrFailed() steps (and the agent's next job on self-hosted pools) would benefit from the freed space.


Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues:
chore: add disk space cleanup steps to CI pipeline (Phase 3)

5 participants