TRT-2669: Revert #31112 "NO-JIRA: Remove fixed bugs on CO conditions (2)"#31201
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@smg247: This pull request references TRT-2669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughUpdates legacy CVO monitor test exception mappings for stable, upgrade, and progressing state checks and updates OCPBUGS URL strings used by the scale test's ChangesCVO Monitor Test Operator Exceptions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
#31172 may fix these issues, but do we want to wait for its tests to pass? |
|
My plan is to see which PR passes tests first (unless there is some infra issue requiring us to get this in without full verification) |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go`:
- Around line 361-369: In case "monitoring" the boolean precedence causes
condition.Type == configv1.OperatorAvailable to only apply to the first branch;
wrap the inner OR in parentheses so the type check covers both branches.
Concretely, update the conditional in the switch's monitoring case (the check
using condition.Type, condition.Status, and condition.Reason) so it reads:
condition.Type == configv1.OperatorAvailable && ( (condition.Status ==
configv1.ConditionFalse && (condition.Reason == "PlatformTasksFailed" || ... ||
condition.Reason == "UpdatingPrometheusOperatorFailed")) || (condition.Status ==
configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed") ).
This ensures the OperatorAvailable guard applies to the
ConditionUnknown/UpdatingPrometheusFailed branch as well.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 52af68d9-aa8e-4be8-858c-2dfd39ce192a
📒 Files selected for processing (2)
pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.gotest/extended/machines/scale.go
|
Scheduling required tests: |
|
/test e2e-aws-ovn-serial-1of2 |
|
/lgtm |
|
/override ci/prow/e2e-aws-ovn-serial-2of2 |
|
@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-ovn-fips, ci/prow/e2e-aws-ovn-microshift, ci/prow/e2e-aws-ovn-microshift-serial, ci/prow/e2e-aws-ovn-serial-2of2 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
f0eee1e to
c2e9eb5
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go (1)
361-370:⚠️ Potential issue | 🟠 Major | ⚡ Quick winOperator type guard is skipped for the
ConditionUnknownbranch due to boolean precedence.The
condition.Type == configv1.OperatorAvailablecheck only applies to the first branch. TheConditionUnknown && UpdatingPrometheusFailedbranch can match non-Availablecondition types and accidentally suppress real failures.🔧 Proposed fix to correct the precedence
case "monitoring": if condition.Type == configv1.OperatorAvailable && - (condition.Status == configv1.ConditionFalse && + ((condition.Status == configv1.ConditionFalse && (condition.Reason == "PlatformTasksFailed" || condition.Reason == "UpdatingAlertmanagerFailed" || condition.Reason == "UpdatingConsolePluginComponentsFailed" || condition.Reason == "UpdatingPrometheusK8SFailed" || condition.Reason == "UpdatingPrometheusOperatorFailed")) || - (condition.Status == configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed") { + (condition.Status == configv1.ConditionUnknown && condition.Reason == "UpdatingPrometheusFailed")) { return "https://issues.redhat.com/browse/OCPBUGS-23745" }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go` around lines 361 - 370, The boolean precedence causes the ConditionUnknown branch to bypass the operator type guard: in the "monitoring" case ensure the configv1.OperatorAvailable check applies to both branches by grouping the entire status/reason OR-expression with the condition.Type == configv1.OperatorAvailable check (or by repeating the Type check for the ConditionUnknown branch); update the conditional around condition.Type, condition.Status and condition.Reason (including the "UpdatingPrometheusFailed" reason) so only OperatorAvailable conditions are considered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go`:
- Around line 361-370: The boolean precedence causes the ConditionUnknown branch
to bypass the operator type guard: in the "monitoring" case ensure the
configv1.OperatorAvailable check applies to both branches by grouping the entire
status/reason OR-expression with the condition.Type ==
configv1.OperatorAvailable check (or by repeating the Type check for the
ConditionUnknown branch); update the conditional around condition.Type,
condition.Status and condition.Reason (including the "UpdatingPrometheusFailed"
reason) so only OperatorAvailable conditions are considered.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d60aec1d-c70b-43cb-87fe-ffafe22bf1bd
📒 Files selected for processing (2)
pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.gotest/extended/machines/scale.go
|
Scheduling required tests: |
|
I compared the changes between https://github.com/openshift/origin/pull/31112/changes and the difference is that the later removes olm/OCPBUGS-62635 but the former does not do it any more because it is removed by https://github.com/openshift/origin/pull/31172/changes#diff-7f3e3f1bd5edd0c5d4b9963c7074fb65d4145dc3ad3ba42190d3bf713f98addcL358 which caused the rebase here. /lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hongkailiu, smg247 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-vsphere-ovn-upi |
|
e2e-aws-ovn-serial-2of2 is going to pass, that is the one we were trying to fix. The vsphere failures looks completely unrelated. I am going to override the others to get this in tonight. |
|
/override ci/prow/e2e-vsphere-ovn-upi |
|
@smg247: Overrode contexts on behalf of smg247: ci/prow/e2e-aws-ovn-serial-2of2, ci/prow/e2e-metal-ipi-ovn-ipv6, ci/prow/e2e-vsphere-ovn-upi DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/verified by ci |
|
@smg247: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/override ci/prow/unit |
|
@smg247: Overrode contexts on behalf of smg247: ci/prow/images, ci/prow/okd-scos-images, ci/prow/unit, ci/prow/verify DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@smg247: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Reverts #31112 ; tracked by TRT-2669
Per OpenShift policy, we are reverting this breaking change to get CI and/or nightly payloads flowing again.
This PR removed the
image-registryoperator from the exception list in the machine-scaling test's AfterEach assertion (test/extended/machines/scale.go:280). The image-registry operator'snode-caDaemonSet legitimately togglesProgressing=Truefor ~0.5s when nodes are added/removed during scaling. The underlying bug (OCPBUGS-62626) is not actually fixed.Impact: Payload
5.0.0-0.nightly-2026-05-20-101113was Rejected. Theaws-ovn-serial-2of2blocking job failed deterministically on both attempts:To unrevert this, revert this PR, and layer an additional separate commit on top that addresses the problem. Before merging the unrevert, please run these jobs on the PR and check the result of these jobs to confirm the fix has corrected the problem:
e2e-aws-ovn-serial(the job that was broken)CC: @hongkailiu
Summary by CodeRabbit