Skip to content

test(aci): disable Test_ACI to cut corerp-cloud CI time (~2x)#12247

Open
sylvainsf wants to merge 1 commit into
mainfrom
disable-aci-functional-test
Open

test(aci): disable Test_ACI to cut corerp-cloud CI time (~2x)#12247
sylvainsf wants to merge 1 commit into
mainfrom
disable-aci-functional-test

Conversation

@sylvainsf

Copy link
Copy Markdown
Contributor

Description

Test_ACI is the single biggest contributor to corerp-cloud functional-test wall time, and disabling it roughly halves the cloud functional workflow's duration.

It deploys real Azure Container Instances (plus a supporting VNet, NSG, and internal load balancer) via the built-in compute.kind: 'aci' path, then independently re-reads those Azure resources in PostStepVerify. Because the subscription-shared StandardCores ACI quota (ContainerGroupQuotaReached) is frequently exhausted by concurrent CI runs, the test is wrapped in a 2x retry with 60s backoff — so on quota contention the entire app deploy is retried end-to-end.

Timing impact (measured)

Per-test durations from a recent corerp-cloud run (junit results):

Test Duration
Test_ACI ~11m36s
Test_AWS_LogsLogGroup_Existing 0m52s
Test_AWSRedeployWithUpdatedResourceUpdatesResource 0m51s
Test_AWS_MultiIdentifier_Resource 0m44s
Test_TerraformRecipe_AzureResourceGroup 0m44s
every other test in the leg < 1m each
  • Test_ACI alone accounts for more than half of the corerp-cloud leg's wall-clock time.
  • The corerp-cloud leg is the long pole of the functional-test-cloud workflow (Build Radius for test ~8m40s runs first, then the cloud legs).
  • The cloud workflow has been swinging ~27 → 42 minutes on main for weeks, and that entire swing tracks Test_ACI: when ACI hits quota and burns its retries it runs ~36m+ and drags the whole workflow toward ~42m; when ACI is fast (~11m) the workflow lands near ~27m.

In short, this one test roughly doubles the cloud functional run on bad-quota days while every other test finishes in under a minute.

What changed

  • Add t.Skip(...) at the top of Test_ACI with a comment explaining the cost and the conditions for re-enabling.
  • The fast Test_isTransientAzureError unit test in the same file is left enabled — it has no deploy and runs in milliseconds.

Follow-up

Re-enable once the ACI tests are isolated onto their own quota-aware lane (so an exhausted StandardCores quota can't serialize in front of the ~26 sub-minute tests), and/or the CI subscription's container-group quota is raised. Related: #12044 (Test_ACI cleanup timeout flake) and #12163 (restructure the test matrix into functional vs integration/E2E).

Type of change

  • This pull request is a minor refactor, code cleanup, test improvement, or other maintenance task and doesn't change the functionality of Radius.

Contributor checklist

  • An overview of proposed schema changes is included in a linked GitHub issue.
    • Not applicable
  • A design document is added or updated under eng/design-notes/, if new APIs are being introduced.
    • Not applicable
  • The design document has been reviewed and approved by Radius maintainers/approvers.
    • Not applicable
  • A PR for resource-types-contrib is created, if resource types or recipes are affected.
    • Not applicable
  • A PR for dashboard is created, if the Radius Dashboard is affected.
    • Not applicable
  • A PR for the documentation repository is created, if the changes affect documentation or any user-facing behavior.
    • Not applicable

Test_ACI provisions real Azure Container Instances (plus a VNet/NSG/ILB)
and is wrapped in a 2x retry with 60s backoff to tolerate the
subscription-shared 'StandardCores' ACI quota (ContainerGroupQuotaReached).
When the quota is exhausted by concurrent CI runs the whole deploy is
retried end-to-end, so a single run takes ~11-12 minutes -- more than half
of the entire corerp-cloud functional leg's wall-clock time, versus under
a minute for every other test in the leg.

Skip the test until the ACI tests can be isolated onto their own
quota-aware lane. The fast Test_isTransientAzureError unit test in the same
file remains enabled.

See #12044 and #12163.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
Copilot AI review requested due to automatic review settings June 25, 2026 06:44
@sylvainsf sylvainsf requested review from a team as code owners June 25, 2026 06:44
@github-actions

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@radius-functional-tests

radius-functional-tests Bot commented Jun 25, 2026

Copy link
Copy Markdown

Radius functional test overview

🔍 Go to test action run

Click here to see the test run details
Name Value
Repository radius-project/radius
Commit ref 435cd10
Unique ID func2b378aa135
Image tag pr-func2b378aa135
  • KinD: v0.29.0
  • Dapr: 1.14.4
  • Azure KeyVault CSI driver: 1.4.2
  • Azure Workload identity webhook: 1.3.0
  • Bicep recipe location ghcr.io/radius-project/dev/test/testrecipes/test-bicep-recipes/<name>:pr-func2b378aa135
  • Terraform recipe location http://tf-module-server.radius-test-tf-module-server.svc.cluster.local/<name>.zip (in cluster)
  • applications-rp test image location: ghcr.io/radius-project/dev/applications-rp:pr-func2b378aa135
  • dynamic-rp test image location: ghcr.io/radius-project/dev/dynamic-rp:pr-func2b378aa135
  • controller test image location: ghcr.io/radius-project/dev/controller:pr-func2b378aa135
  • ucp test image location: ghcr.io/radius-project/dev/ucpd:pr-func2b378aa135
  • deployment-engine test image location: ghcr.io/radius-project/deployment-engine:latest

Test Status

⌛ Building Radius and pushing container images for functional tests...
✅ Container images build succeeded
⌛ Publishing Bicep Recipes for functional tests...
✅ Recipe publishing succeeded
⌛ Starting ucp-cloud functional tests...
⌛ Starting corerp-cloud functional tests...
✅ ucp-cloud functional tests succeeded
✅ corerp-cloud functional tests succeeded

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces functional-test-cloud runtime by disabling the Test_ACI cloud functional test, which provisions real Azure Container Instances and can hit subscription-shared quota contention (triggering end-to-end deploy retries). This is a targeted CI-time optimization for the corerp-cloud test leg and aligns with the stated follow-up plan to re-enable once ACI tests are isolated or quota is increased.

Changes:

  • Add an unconditional t.Skip(...) at the start of Test_ACI with a detailed rationale and links to tracking issues (#12044, #12163).
  • Leave the rest of the file (including non-deploy unit coverage like transient-error classification helpers) unchanged/available for future re-enable.

@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 52.88%. Comparing base (2626297) to head (435cd10).

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12247      +/-   ##
==========================================
- Coverage   52.88%   52.88%   -0.01%     
==========================================
  Files         751      751              
  Lines       48353    48353              
==========================================
- Hits        25573    25570       -3     
- Misses      20383    20385       +2     
- Partials     2397     2398       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown

Unit Tests

    2 files  ±0    450 suites  ±0   7m 45s ⏱️ +8s
5 591 tests ±0  5 589 ✅ ±0  2 💤 ±0  0 ❌ ±0 
6 788 runs  ±0  6 786 ✅ ±0  2 💤 ±0  0 ❌ ±0 

Results for commit 435cd10. ± Comparison against base commit 2626297.

@DariuszPorowski

Copy link
Copy Markdown
Member

@sylvainsf yesterday, I fixed auto-purge for tests, so quota should not be the issue anymore. Your call :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants