test(aci): disable Test_ACI to cut corerp-cloud CI time (~2x) #12247
sylvainsf wants to merge 1 commit into
Test_ACI provisions real Azure Container Instances (plus a VNet/NSG/ILB) and is wrapped in a 2x retry with 60s backoff to tolerate the subscription-shared 'StandardCores' ACI quota (ContainerGroupQuotaReached). When the quota is exhausted by concurrent CI runs, the whole deploy is retried end-to-end, so a single run takes ~11-12 minutes: more than half of the entire corerp-cloud functional leg's wall-clock time, versus under a minute for every other test in the leg.

Skip the test until the ACI tests can be isolated onto their own quota-aware lane. The fast Test_isTransientAzureError unit test in the same file remains enabled.

See #12044 and #12163.

Signed-off-by: Sylvain Niles <sylvainniles@microsoft.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none
Radius functional test overview
Test Status: ⌛ Building Radius and pushing container images for functional tests...
Pull request overview
This PR reduces functional-test-cloud runtime by disabling the Test_ACI cloud functional test, which provisions real Azure Container Instances and can hit subscription-shared quota contention (triggering end-to-end deploy retries). This is a targeted CI-time optimization for the corerp-cloud test leg and aligns with the stated follow-up plan to re-enable once ACI tests are isolated or quota is increased.
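The PR itself adds an unconditional t.Skip. If the test later moves to a dedicated lane, one possible shape is an opt-in switch checked before skipping, so the default CI legs stay fast while a quota-aware job can still run it. This is a hypothetical sketch: the RADIUS_RUN_ACI_TESTS variable and the aciSkipReason helper do not exist in the repository. The decision is kept as a plain function taking a getenv-style lookup so it can be exercised outside go test.

```go
package main

import (
	"fmt"
	"os"
)

// runACIEnvVar is a hypothetical opt-in switch; it is not part of Radius.
const runACIEnvVar = "RADIUS_RUN_ACI_TESTS"

// aciSkipReason returns a non-empty reason when the expensive ACI test should
// be skipped, and "" when the caller has opted in. Inside the test it would
// be used as:
//
//	if reason := aciSkipReason(os.Getenv); reason != "" {
//		t.Skip(reason)
//	}
func aciSkipReason(getenv func(string) string) string {
	if getenv(runACIEnvVar) == "true" {
		return ""
	}
	return "Test_ACI deploys real ACI resources and can exhaust the shared StandardCores quota; set " +
		runACIEnvVar + "=true to run it on a quota-aware lane"
}

func main() {
	// Result depends on the current environment.
	fmt.Println(aciSkipReason(os.Getenv))
	// Simulated opt-in: no skip reason is returned.
	fmt.Println(aciSkipReason(func(string) string { return "true" }) == "") // true
}
```

Injecting the lookup function rather than calling os.Getenv directly keeps the decision testable without mutating the process environment.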
Changes:
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #12247 +/- ##
==========================================
- Coverage 52.88% 52.88% -0.01%
==========================================
Files 751 751
Lines 48353 48353
==========================================
- Hits 25573 25570 -3
- Misses 20383 20385 +2
- Partials 2397 2398 +1
@sylvainsf yesterday, I fixed auto-purge for tests, so quota should not be the issue anymore. Your call :)
Description
Test_ACI is the single biggest contributor to corerp-cloud functional-test wall time, and disabling it roughly halves the cloud functional workflow's duration. It deploys real Azure Container Instances (plus a supporting VNet, NSG, and internal load balancer) via the built-in compute.kind: 'aci' path, then independently re-reads those Azure resources in PostStepVerify. Because the subscription-shared StandardCores ACI quota (ContainerGroupQuotaReached) is frequently exhausted by concurrent CI runs, the test is wrapped in a 2x retry with 60s backoff, so on quota contention the entire app deploy is retried end-to-end.
Timing impact (measured)
Per-test durations from a recent corerp-cloud run (junit results):
Test_ACI
Test_AWS_LogsLogGroup_Existing
Test_AWSRedeployWithUpdatedResourceUpdatesResource
Test_AWS_MultiIdentifier_Resource
Test_TerraformRecipe_AzureResourceGroup
- Test_ACI alone accounts for more than half of the corerp-cloud leg's wall-clock time.
- The corerp-cloud leg is the long pole of the functional-test-cloud workflow (Build Radius for test, ~8m40s, runs first, then the cloud legs).
- […] main for weeks, and that entire swing tracks Test_ACI: when ACI hits quota and burns its retries, it runs ~36m+ and drags the whole workflow toward ~42m; when ACI is fast (~11m), the workflow lands near ~27m.

In short, this one test roughly doubles the cloud functional run on bad-quota days, while every other test finishes in under a minute.
What changed
- Added t.Skip(...) at the top of Test_ACI, with a comment explaining the cost and the conditions for re-enabling.
- The Test_isTransientAzureError unit test in the same file is left enabled; it performs no deploy and runs in milliseconds.
Follow-up
Re-enable once the ACI tests are isolated onto their own quota-aware lane (so an exhausted StandardCores quota can't serialize in front of the ~26 sub-minute tests) and/or the CI subscription's container-group quota is raised. Related: #12044 (Test_ACI cleanup timeout flake) and #12163 (restructure the test matrix into functional vs. integration/E2E).
Type of change
Contributor checklist
eng/design-notes/, if new APIs are being introduced.