Fix Azure deprovisioner deleting DNS records for v2 self-managed clusters #79984
…ters
The deprovisioner's is_ci_rg() function did not recognize v2 self-managed
guest cluster resource groups (e.g., public-ea334332a2-public-ea334332a2-dx4l4).
These RGs use a {type}-{10hex}-{type}-{hex}-{suffix} naming convention instead
of the {20hex}-{hex} pattern used by legacy guest clusters.
Because these RGs were not recognized, Phase 3 (DNS sweep) treated their DNS
records as orphaned and deleted them — including *.apps wildcard A records —
while the clusters were still running. This caused all hosted clusters to fail
with console operator DNS resolution errors.
Changes:
- Add v2 guest cluster RG pattern to is_ci_rg()
- Extract the 10-hex job hash as the active prefix for both management and
v2 guest RGs, so DNS records containing that hash are protected
- Log active prefixes for easier debugging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
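To make the two naming conventions concrete, here is a minimal Python sketch of the kind of check described above. It is an illustration only, not the deprovisioner's code: the helper name is_ci_rg_sketch and both regular expressions are assumptions inferred from the {20hex}-{hex} and {type}-{10hex}-{type}-{hex}-{suffix} descriptions, and the management cluster pattern is left out because its exact form is not given here.

```python
import re

# Assumed patterns, inferred from the naming conventions in the commit message.
# The real script's expressions may differ.

# Legacy guest RGs: {20hex}-{hex}
LEGACY_GUEST_RG = re.compile(r"^[0-9a-f]{20}-[0-9a-f]+$")

# v2 self-managed guest RGs: {type}-{10hex}-{type}-{hex}-{suffix}
# The type may itself contain a hyphen (e.g. "oauth-lb").
V2_GUEST_RG = re.compile(
    r"^[a-z][a-z-]*-[0-9a-f]{10}-[a-z][a-z-]*-[0-9a-f]+-[a-z0-9]+$"
)


def is_ci_rg_sketch(name: str) -> bool:
    """Return True if the RG name matches one of the assumed CI patterns."""
    return bool(LEGACY_GUEST_RG.match(name) or V2_GUEST_RG.match(name))


if __name__ == "__main__":
    for rg in (
        "public-ea334332a2-public-ea334332a2-dx4l4",  # example from the PR
        "some-unrelated-rg",                          # hypothetical non-CI RG
    ):
        print(rg, is_ci_rg_sketch(rg))
```

With only the legacy pattern in place, the first name would be rejected, which is the situation that led Phase 3 to treat its DNS records as orphaned.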
Walkthrough
This PR enhances the Azure deprovision script to recognize and process a new v2 self-managed guest resource group naming scheme. The RG detection function is updated, the Phase 3 DNS sweep prefix extraction is reworked to handle multiple RG patterns explicitly, and the output reporting is adjusted to display prefix lists.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bryan-cox, csrwng
/pj-rehearse skip
@bryan-cox: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
@bryan-cox: all tests passed!
Summary
- The is_ci_rg() function did not recognize v2 self-managed guest cluster resource groups (e.g., public-ea334332a2-public-ea334332a2-dx4l4), causing Phase 3 (DNS sweep) to delete their *.apps wildcard DNS records while clusters were still running
- The sweep deleted the *.apps.{public,private,upgrade,autoscaling,oauth-lb}-ea334332a2 records during an in-flight v2 self-managed rehearsal
- Adds the v2 guest cluster RG pattern to is_ci_rg() and fixes prefix extraction to use the 10-hex job hash, protecting all DNS records for active jobs
Test plan
- periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-azure-v2-self-managed on PR #79876 (CNTRLPLANE-3206: add periodic e2e job for self-managed Azure v2) to confirm DNS records survive
🤖 Generated with Claude Code
Summary by CodeRabbit
This PR updates the Azure HyperShift deprovisioning script to fix a critical bug in which the deprovisioner deleted active DNS records for v2 self-managed clusters.
Background
The deprovisioner is a periodic job that runs in the OpenShift CI infrastructure to clean up stale resources. It operates in three phases:
The Problem
v2 self-managed guest cluster resource groups use a different naming convention ({type}-{10hex}-{type}-{hex}-{suffix}, e.g., public-ea334332a2-public-ea334332a2-dx4l4) than legacy clusters and management clusters. The previous is_ci_rg() function didn't recognize this pattern, so Phase 3 treated v2 cluster DNS records as orphaned and deleted them, including critical wildcard *.apps records, while the clusters were still running. This broke hosted cluster DNS resolution.
The Fix
- Updated is_ci_rg() function: added a regex pattern to recognize v2 self-managed guest cluster RGs alongside the existing management and legacy guest patterns.
- Improved Phase 3 prefix extraction logic: instead of a generic string manipulation approach, the script now uses pattern-specific extraction, pulling the -([0-9a-f]{10})- segment out of management (-mgmt-) and v2 guest RG names using regex capture groups.
- Enhanced logging: the script now prints the list of active cluster prefixes found during Phase 3, making it easier to debug and verify correct identification of active clusters.
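The prefix extraction and the resulting DNS protection can be sketched as follows. This is a hedged Python illustration rather than the script itself: the function names active_prefixes and is_protected, the substring-based protection check, and the management RG name in the example are all hypothetical, and only the -([0-9a-f]{10})- capture group comes from the description above.

```python
import re

# Capture the 10-hex job hash delimited by hyphens, as described above.
JOB_HASH = re.compile(r"-([0-9a-f]{10})-")


def active_prefixes(rg_names):
    """Collect the job hash from every RG name that contains one."""
    prefixes = set()
    for name in rg_names:
        match = JOB_HASH.search(name)
        if match:
            prefixes.add(match.group(1))
    return prefixes


def is_protected(record_name, prefixes):
    """Assumed rule: a DNS record is kept if it contains any active hash."""
    return any(prefix in record_name for prefix in prefixes)


if __name__ == "__main__":
    rgs = [
        "public-ea334332a2-public-ea334332a2-dx4l4",  # v2 guest RG from the PR
        "example-mgmt-0a1b2c3d4e-rg",                 # hypothetical management RG
    ]
    prefixes = active_prefixes(rgs)
    # Log the active prefixes, mirroring the logging change described above.
    print("active prefixes:", sorted(prefixes))
    print(is_protected("*.apps.public-ea334332a2", prefixes))
    print(is_protected("*.apps.public-ffffffffff", prefixes))
```

Keying protection on the job hash rather than on the full RG name means every record belonging to the job (public, private, upgrade, and so on) is covered by a single prefix.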
Affected Infrastructure
This change directly impacts the Azure HyperShift test infrastructure deprovisioning, specifically the periodic cleanup job for v2 self-managed hosted clusters. It ensures DNS records for active jobs are protected during the cleanup process.