
TRT-2669: Revert #31112 "NO-JIRA: Remove fixed bugs on CO conditions (2)" #31201

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from smg247:revert-31112-17792990653N
May 21, 2026

Conversation

@smg247
Member

@smg247 smg247 commented May 20, 2026

Reverts #31112; tracked by TRT-2669

Per OpenShift policy, we are reverting this breaking change to get CI and/or nightly payloads flowing again.

PR #31112 removed the image-registry operator from the exception list in the machine-scaling test's AfterEach assertion (test/extended/machines/scale.go:280). The image-registry operator's node-ca DaemonSet legitimately toggles Progressing=True for ~0.5s when nodes are added or removed during scaling, so the underlying bug (OCPBUGS-62626) is not actually fixed.
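
As a rough illustration of the exception list described above, the following Go sketch shows how a per-operator allowance might be consulted before an assertion fails. The `condition` type, the `knownException` helper, and the `example-operator` name are hypothetical stand-ins and are not taken from test/extended/machines/scale.go; only the image-registry operator, the Progressing=True condition, and OCPBUGS-62626 come from this description.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for a ClusterOperator
// status condition (the real test uses the OpenShift config API types).
type condition struct {
	Type   string
	Status string
}

// knownException returns the tracking bug for a tolerated condition, or ""
// when the condition should be treated as a test failure.
// Assumption: only Progressing=True is tolerated, and only for image-registry.
func knownException(operator string, c condition) string {
	if c.Type != "Progressing" || c.Status != "True" {
		return ""
	}
	switch operator {
	case "image-registry":
		return "OCPBUGS-62626"
	}
	return ""
}

func main() {
	c := condition{Type: "Progressing", Status: "True"}
	for _, op := range []string{"image-registry", "example-operator"} {
		if bug := knownException(op, c); bug != "" {
			fmt.Printf("%s: tolerated (%s)\n", op, bug)
		} else {
			fmt.Printf("%s: unexpected Progressing=True\n", op)
		}
	}
}
```

Removing the `image-registry` case from such a helper is the kind of change that would turn the brief node-ca toggle into a hard test failure.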

Impact: Payload 5.0.0-0.nightly-2026-05-20-101113 was Rejected. The aws-ovn-serial-2of2 blocking job failed deterministically on both attempts:

To unrevert this, revert this PR, and layer an additional separate commit on top that addresses the problem. Before merging the unrevert, please run these jobs on the PR and check the result of these jobs to confirm the fix has corrected the problem:

  • e2e-aws-ovn-serial (the job that was broken)

CC: @hongkailiu

Summary by CodeRabbit

  • Tests
    • Enhanced cluster-operator monitoring exceptions to better tolerate expected states during stable and upgrade flows (network, monitoring, image-registry, node-tuning, olm, storage)
    • Added specific non-fatal/progressing allowances for network Degraded, monitoring update failures, image-registry progressing reasons, and node-tuning progressing states
    • Updated diagnostic issue references to OCPBUGS-62623/62626/62630/62632/62633 for clearer troubleshooting

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 20, 2026
@openshift-ci-robot

openshift-ci-robot commented May 20, 2026

@smg247: This pull request references TRT-2669 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.


@coderabbitai

coderabbitai Bot commented May 20, 2026

Walkthrough

Updates legacy CVO monitor test exception mappings for stable, upgrade, and progressing state checks and updates OCPBUGS URL strings used by the scale test's except helper for several operators.

Changes

CVO Monitor Test Operator Exceptions

| Layer | File(s) | Summary |
| --- | --- | --- |
| Stable-system operator exceptions | pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go | Adds network operator exception for Degraded=True in stable-system state transition checks. |
| Upgrade-state operator exceptions | pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go | Adds monitoring operator exception for Available=False and Unknown with specific Prometheus/alertmanager/console/plugins update failure reasons during upgrade checks. |
| Progressing-state operator exceptions | pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go | Adds image-registry progressing reasons (NodeCADaemonUnavailable::Ready, DeploymentNotCompleted), updates network progressing reasons mapping, and adds node-tuning progressing reasons (Reconciling, ProfileProgressing). |
| Test exemption URL mappings | test/extended/machines/scale.go | Updates bug-tracker URLs in the scale test helper's except function for dns, image-registry, network, node-tuning, and storage operators to new OCPBUGS-* identifiers. |
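
The progressing-state allowances summarized above could be pictured as a lookup from operator name to tolerated reasons. This is only a sketch: the `progressingReasons` map and the `reasonAllowed` helper are hypothetical names, and the real operators.go may structure these checks differently (for example as a switch statement). The operator names and reason strings are the ones listed in the walkthrough; the network reasons are omitted because the walkthrough does not name them.

```go
package main

import "fmt"

// progressingReasons maps an operator to the Progressing reasons that the
// walkthrough lists as tolerated. Hypothetical structure for illustration.
var progressingReasons = map[string][]string{
	"image-registry": {"NodeCADaemonUnavailable::Ready", "DeploymentNotCompleted"},
	"node-tuning":    {"Reconciling", "ProfileProgressing"},
}

// reasonAllowed reports whether the given Progressing reason is tolerated
// for the operator. Unknown operators have no tolerated reasons.
func reasonAllowed(operator, reason string) bool {
	for _, r := range progressingReasons[operator] {
		if r == reason {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(reasonAllowed("image-registry", "DeploymentNotCompleted"))
	fmt.Println(reasonAllowed("node-tuning", "DeploymentNotCompleted"))
}
```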

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • openshift/origin#31112: Modifies the same CVO operator exception mappings and the except() exemptions in test/extended/machines/scale.go.

Suggested labels

lgtm

Suggested reviewers

  • sjenning
  • p0lyn0mial
  • deads2k
  • petr-muller

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: reverting a previous commit (#31112) that had removed operator exceptions. The revert restores exception mappings for the image-registry and other operators. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Test names in scale.go are static: "grow and decrease when scaling..." with no dynamic content, variables, or interpolation in test declarations. |
| Test Structure And Quality | ✅ Passed | Test exhibits proper Ginkgo patterns: single responsibility, BeforeEach/AfterEach cleanup, timeouts on Eventually, meaningful assertions. PR changes only restore operator exception mappings. |
| Microshift Test Compatibility | ✅ Passed | The test is protected via [apigroup:machine.openshift.io] tag; MicroShift CI automatically skips tests with unavailable apigroups. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. The changes only update exception mappings in existing test code and operator monitoring logic, reverting previous deletions. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR only modifies test/monitoring code (operators.go and scale.go), not deployment manifests, operator code, or controllers. No scheduling constraints introduced. |
| Ote Binary Stdout Contract | ✅ Passed | No stdout writes in process-level code; fmt.Sprintf calls build strings, except functions nested in test blocks, no main/init/TestMain functions. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added. PR reverts previous changes to exception mappings in existing test structures only. |


@tmshort
Contributor

tmshort commented May 20, 2026

#31172 may fix these issues, but do we want to wait for its tests to pass?

@smg247
Member Author

smg247 commented May 20, 2026

My plan is to see which PR passes tests first (unless there is some infra issue requiring us to get this in without full verification)

@openshift-ci openshift-ci Bot requested review from p0lyn0mial and sjenning May 20, 2026 17:48
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go`:
- Around line 361-369: In case "monitoring" the boolean precedence causes
condition.Type == configv1.OperatorAvailable to only apply to the first branch;
wrap the inner OR in parentheses so the type check covers both branches.
Concretely, update the conditional in the switch's monitoring case (the check
using condition.Type, condition.Status, and condition.Reason) so it reads:
condition.Type == configv1.OperatorAvailable && ( (condition.Status ==
configv1.ConditionFalse && (condition.Reason == "PlatformTasksFailed" || ... ||
condition.Reason == "UpdatingPrometheusOperatorFailed")) || (condition.Status ==
configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed") ).
This ensures the OperatorAvailable guard applies to the
ConditionUnknown/UpdatingPrometheusFailed branch as well.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 52af68d9-aa8e-4be8-858c-2dfd39ce192a

📥 Commits

Reviewing files that changed from the base of the PR and between 428b9a0 and f0eee1e.

📒 Files selected for processing (2)
  • pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go
  • test/extended/machines/scale.go

@openshift-ci openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 20, 2026
@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@smg247
Member Author

smg247 commented May 20, 2026

/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2

@hongkailiu
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2026
@smg247
Member Author

smg247 commented May 20, 2026

/override ci/prow/e2e-aws-ovn-serial-2of2
/override ci/prow/e2e-aws-ovn-fips
/override ci/prow/e2e-aws-ovn-microshift
/override ci/prow/e2e-aws-ovn-microshift-serial

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-ovn-fips, ci/prow/e2e-aws-ovn-microshift, ci/prow/e2e-aws-ovn-microshift-serial, ci/prow/e2e-aws-ovn-serial-2of2


…ed-CO-bugs-b"

This reverts commit 428b9a0, reversing
changes made to 4486518.
@smg247 smg247 force-pushed the revert-31112-17792990653N branch from f0eee1e to c2e9eb5 Compare May 20, 2026 22:03
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 20, 2026

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go (1)

361-370: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Operator type guard is skipped for the ConditionUnknown branch due to boolean precedence.

The condition.Type == configv1.OperatorAvailable check only applies to the first branch. The ConditionUnknown && UpdatingPrometheusFailed branch can match non-Available condition types and accidentally suppress real failures.

🔧 Proposed fix to correct the precedence
 		case "monitoring":
 			if condition.Type == configv1.OperatorAvailable &&
-				(condition.Status == configv1.ConditionFalse &&
+				((condition.Status == configv1.ConditionFalse &&
 					(condition.Reason == "PlatformTasksFailed" ||
 						condition.Reason == "UpdatingAlertmanagerFailed" ||
 						condition.Reason == "UpdatingConsolePluginComponentsFailed" ||
 						condition.Reason == "UpdatingPrometheusK8SFailed" ||
 						condition.Reason == "UpdatingPrometheusOperatorFailed")) ||
-				(condition.Status == configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed") {
+					(condition.Status == configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed")) {
 				return "https://issues.redhat.com/browse/OCPBUGS-23745"
 			}
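
The precedence problem flagged above can be reproduced in isolation. In Go, `&&` binds more tightly than `||`, so an expression of the form `A && B || C` is evaluated as `(A && B) || C`. The sketch below uses a simplified, hypothetical `cond` struct with plain strings in place of the configv1 types, and a single False-status reason instead of the full list; `buggy` and `fixed` are illustrative names, not functions from operators.go.

```go
package main

import "fmt"

// cond is a hypothetical, string-based stand-in for an operator condition.
type cond struct {
	Type, Status, Reason string
}

// buggy mirrors the shape of the flagged check: the Type guard only applies
// to the first branch because && binds tighter than ||.
func buggy(c cond) bool {
	return c.Type == "Available" &&
		(c.Status == "False" && c.Reason == "PlatformTasksFailed") ||
		(c.Status == "Unknown" && c.Reason == "UpdatingPrometheusFailed")
}

// fixed wraps both status/reason branches in one parenthesized group so the
// Type guard covers them both.
func fixed(c cond) bool {
	return c.Type == "Available" &&
		((c.Status == "False" && c.Reason == "PlatformTasksFailed") ||
			(c.Status == "Unknown" && c.Reason == "UpdatingPrometheusFailed"))
}

func main() {
	// A non-Available condition that happens to match the second branch.
	c := cond{Type: "Degraded", Status: "Unknown", Reason: "UpdatingPrometheusFailed"}
	fmt.Println(buggy(c), fixed(c)) // prints "true false"
}
```

With the ungrouped form, a Degraded condition carrying the UpdatingPrometheusFailed reason is treated as a known exception; with the grouped form it is not.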
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go`
around lines 361 - 370, The boolean precedence causes the ConditionUnknown
branch to bypass the operator type guard: in the "monitoring" case ensure the
configv1.OperatorAvailable check applies to both branches by grouping the entire
status/reason OR-expression with the condition.Type ==
configv1.OperatorAvailable check (or by repeating the Type check for the
ConditionUnknown branch); update the conditional around condition.Type,
condition.Status and condition.Reason (including the "UpdatingPrometheusFailed"
reason) so only OperatorAvailable conditions are considered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d60aec1d-c70b-43cb-87fe-ffafe22bf1bd

📥 Commits

Reviewing files that changed from the base of the PR and between f0eee1e and c2e9eb5.

📒 Files selected for processing (2)
  • pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go
  • test/extended/machines/scale.go

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@hongkailiu
Member

I compared the changes between https://github.com/openshift/origin/pull/31112/changes and the difference is that the latter removes olm/OCPBUGS-62635 but the former does not do it any more because it is removed by https://github.com/openshift/origin/pull/31172/changes#diff-7f3e3f1bd5edd0c5d4b9963c7074fb65d4145dc3ad3ba42190d3bf713f98addcL358 which caused the rebase here.

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongkailiu, smg247


@tmshort
Contributor

tmshort commented May 21, 2026

/test e2e-vsphere-ovn-upi

@smg247
Member Author

smg247 commented May 21, 2026

e2e-aws-ovn-serial-2of2 is going to pass; that is the one we were trying to fix. The vsphere failures look completely unrelated. I am going to override the others to get this in tonight.

@smg247
Member Author

smg247 commented May 21, 2026

/override ci/prow/e2e-vsphere-ovn-upi
/override ci/prow/e2e-metal-ipi-ovn-ipv6
/override ci/prow/e2e-aws-ovn-serial-2of2

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-ovn-serial-2of2, ci/prow/e2e-metal-ipi-ovn-ipv6, ci/prow/e2e-vsphere-ovn-upi


@smg247
Member Author

smg247 commented May 21, 2026

/verified by ci

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label May 21, 2026
@openshift-ci-robot

@smg247: This PR has been marked as verified by ci.


@smg247
Member Author

smg247 commented May 21, 2026

/override ci/prow/unit
/override ci/prow/verify
/override ci/prow/images
/override ci/prow/okd-scos-images

@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@smg247: Overrode contexts on behalf of smg247: ci/prow/images, ci/prow/okd-scos-images, ci/prow/unit, ci/prow/verify


@openshift-merge-bot openshift-merge-bot Bot merged commit 00c4cba into openshift:main May 21, 2026
21 checks passed
@openshift-ci
Contributor

openshift-ci Bot commented May 21, 2026

@smg247: all tests passed!


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review
  • verified: Signifies that the PR passed pre-merge verification criteria


4 participants