OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions to SNO #31172

Open
tmshort wants to merge 1 commit into openshift:main from tmshort:fix-OPRUN-4569

Conversation

@tmshort
Contributor

@tmshort tmshort commented May 13, 2026

The OLMv1 fixes in cluster-olm-operator are now in release-5.0:

  • PR openshift/cluster-olm-operator#202: 2 replicas + PDB on HA topology prevents Available=False and spurious Progressing=True during rolling updates (OCPBUGS-62517, OCPBUGS-62635)

Remove the OCPBUGS-62517 exception for olm Available=False entirely. The testUpgradeOperatorStateTransitions function already has a blanket SNO exemption, so single-node is covered.

For the Progressing-related exceptions, add clientConfig to testUpgradeOperatorProgressingStateTransitions so it can detect topology, then scope both remaining exceptions to SNO only:

  • OCPBUGS-62635: olm Progressing=True during MCO window. On HA the 2-replica PDB fix prevents this; on SNO there is still 1 replica and the node reboot restarts all pods simultaneously.
  • OCPBUGS-63672: operator-lifecycle-manager-packageserver Progressing=True with an empty reason. On HA, isAPIServiceBackendDisrupted() detects terminating pods and returns RetryableError. On SNO, the OS-level reboot kills all pods at once, so no terminating pod is observed and the detection does not fire.
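
To illustrate the OCPBUGS-63672 case, here is a minimal Go sketch of terminating-pod detection and why it can miss on SNO. The real isAPIServiceBackendDisrupted() implementation is not shown in this PR, so this is only an assumption about its general shape. The pod type, terminatingPod, anyTerminating, and the pod names are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// pod is a hypothetical, minimal stand-in for a Kubernetes Pod. Only the
// deletion timestamp matters here: non-nil means the pod is terminating.
type pod struct {
	name              string
	deletionTimestamp *time.Time
}

// terminatingPod builds a pod that has been marked for deletion.
func terminatingPod(name string) pod {
	now := time.Now()
	return pod{name: name, deletionTimestamp: &now}
}

// anyTerminating reports whether at least one backing pod is terminating.
// A caller could treat "true" as a retryable disruption (assumption).
func anyTerminating(pods []pod) bool {
	for _, p := range pods {
		if p.deletionTimestamp != nil {
			return true
		}
	}
	return false
}

func main() {
	// HA rolling update: one pod drains while another keeps serving.
	ha := []pod{terminatingPod("packageserver-a"), {name: "packageserver-b"}}
	// SNO after a node reboot: the old pods are already gone, so only a
	// fresh, non-terminating pod is visible.
	sno := []pod{{name: "packageserver-c"}}

	fmt.Println(anyTerminating(ha), anyTerminating(sno)) // true false
}
```

In the SNO case nothing is ever seen in the terminating state, so a check of this kind has no signal to act on. That is why the exception is kept for single-node only.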

The operator-lifecycle-manager exception (OCPBUGS-65583) is intentionally kept; OLMv0 is in maintenance mode.

Assisted-by: claude

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced operator state transition validation to account for cluster topology in upgrade scenarios.
    • Refined exception handling for OLM operator state transitions to apply only to single-node deployments where applicable.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 13, 2026
@openshift-ci-robot

openshift-ci-robot commented May 13, 2026

@tmshort: This pull request references OPRUN-4569 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@tmshort tmshort changed the title OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions… OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions to SNO May 13, 2026
@coderabbitai

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5d92b20a-bc21-4435-a43f-a64300ad37e6

📥 Commits

Reviewing files that changed from the base of the PR and between ddb8197 and 7f86353.

📒 Files selected for processing (1)
  • pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go

Walkthrough

This PR refactors operator upgrade test exception handling in the cluster version operator's monitor tests. It removes an exception for the olm operator in state-transition tests and conditions two progressing-state exceptions (olm and operator-lifecycle-manager-packageserver) to apply only on single-node deployments by introducing control-plane topology detection.
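
The topology detection mentioned above can be sketched roughly as follows. This is not the code from operators.go. TopologyMode is a local stand-in for the control-plane topology value reported by the cluster (the strings "SingleReplica" and "HighlyAvailable" match the OpenShift Infrastructure status values), and topologyGetter is a hypothetical seam replacing the lookup that the real test performs through clientConfig.

```go
package main

import "fmt"

// TopologyMode is a local stand-in for the control-plane topology value,
// defined here so the sketch compiles without the openshift/api module.
type TopologyMode string

const (
	HighlyAvailable TopologyMode = "HighlyAvailable"
	SingleReplica   TopologyMode = "SingleReplica"
)

// topologyGetter is a hypothetical replacement for the cluster lookup.
type topologyGetter func() (TopologyMode, error)

// isSingleNode reports whether the control plane is single-replica (SNO).
// On lookup failure it returns false plus the error, so SNO-only exceptions
// stay disabled instead of being applied by accident.
func isSingleNode(get topologyGetter) (bool, error) {
	mode, err := get()
	if err != nil {
		return false, fmt.Errorf("unable to determine control-plane topology: %w", err)
	}
	return mode == SingleReplica, nil
}

func main() {
	sno, err := isSingleNode(func() (TopologyMode, error) { return SingleReplica, nil })
	fmt.Println(sno, err) // true <nil>
}
```

Defaulting to "not single node" on error is a design choice of this sketch. The PR thread does not say how the real function handles a failed lookup.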

Changes

Operator test exception scoping

| Layer | File(s) | Summary |
| --- | --- | --- |
| Remove olm exception from state transition test | pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go | The olm operator exception for ControllerManager_Deploying reason (OCPBUGS-62517) is removed from testUpgradeOperatorStateTransitions, eliminating the special case handling for OperatorAvailable=False with that reason suffix. |
| Add topology detection and scope progressing exceptions | pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go | testUpgradeOperatorProgressingStateTransitions now calls getControlPlaneTopology with clientConfig to derive an isSingleNode flag, and gates the olm (ControllerManager_Deploying, OCPBUGS-62635) and operator-lifecycle-manager-packageserver (empty reason, OCPBUGS-63672) exceptions to apply only when isSingleNode is true. |
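
As a rough illustration of the gating described in this summary, the following Go sketch decides whether a Progressing=True transition is tolerated. It is a simplification under stated assumptions: the transition type and progressingException function are hypothetical, the reason is compared by exact match (the real test may match differently, for example by suffix), and only the two SNO-scoped exceptions are modeled.

```go
package main

import "fmt"

// transition is a hypothetical, simplified view of one observed
// ClusterOperator Progressing=True condition change.
type transition struct {
	operator string
	reason   string
}

// progressingException returns the bug reference that excuses the
// transition, or "" when the transition must be reported as a failure.
// Both exceptions apply only when the cluster is single node.
func progressingException(tr transition, isSingleNode bool) string {
	if !isSingleNode {
		return ""
	}
	switch {
	case tr.operator == "olm" && tr.reason == "ControllerManager_Deploying":
		return "OCPBUGS-62635"
	case tr.operator == "operator-lifecycle-manager-packageserver" && tr.reason == "":
		return "OCPBUGS-63672"
	}
	return ""
}

func main() {
	olm := transition{operator: "olm", reason: "ControllerManager_Deploying"}
	fmt.Printf("SNO: %q, HA: %q\n",
		progressingException(olm, true), progressingException(olm, false))
	// SNO: "OCPBUGS-62635", HA: ""
}
```

Checking isSingleNode first means that on HA clusters neither exception can apply, which matches the intent of treating these transitions as real failures there.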

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • openshift/origin#31171: Both PRs modify operator exception handling in testUpgradeOperatorStateTransitions and testUpgradeOperatorProgressingStateTransitions, with this PR tightening exceptions to single-node deployments and removing the state-transition exception.
  • openshift/origin#31138: Both PRs update testUpgradeOperatorProgressingStateTransitions to scope exception handling based on control-plane topology using clientConfig and the isSingleNode flag.

Suggested labels

ok-to-test

Suggested reviewers

  • sjenning
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (11 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main changes: removing OLMv1 exceptions and scoping OLMv0 exceptions to single-node deployments, which aligns with the core modifications in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names use stable, deterministic values from predefined enumerations (KnownOperators, condition types, Bugzilla mappings). No dynamic information that changes between runs is present. |
| Test Structure And Quality | ✅ Passed | Test code properly structured: removes OLMv1 exception, adds topology detection, scopes SNO exceptions appropriately. Error handling with messages present. Follows repository patterns. |
| Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests were added. The PR modifies monitor test utility functions returning []*junitapi.JUnitTestCase, which are part of the MonitorTest framework, not Ginkgo specs. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e tests (It/Describe/Context/When). It only modifies utility functions for monitoring tests. The custom check applies to new e2e tests only. No SNO issues found. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies test code only, not deployment manifests or operator controllers. No scheduling constraints introduced. Changes improve topology awareness by scoping exceptions to detected topology. |
| Ote Binary Stdout Contract | ✅ Passed | File is a test library, not an OTE binary. No process-level code or stdout writes present. Logrus calls write to stderr by default. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. The modified file contains monitoring test library functions only, not Ginkgo tests. Check is not applicable. |


@openshift-ci openshift-ci Bot requested review from p0lyn0mial and sjenning May 13, 2026 15:44
@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tmshort
Once this PR has been reviewed and has the lgtm label, please assign jogeo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate aggregated-aws-ovn-single-node-upgrade-5.0-micro 10

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 13, 2026
@tmshort
Contributor Author

tmshort commented May 13, 2026

/test e2e-aws-ovn-single-node

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 10

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/804bfe50-4ee4-11f1-9c63-4ffc4aa22055-0

@tmshort
Contributor Author

tmshort commented May 13, 2026

/test e2e-aws-ovn-single-node

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-master-aggregated-aws-ovn-single-node-upgrade-5.0-micro 5

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-main-aws-ovn-single-node-upgrade-5.0-micro 5

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-gcp-csi
/test e2e-metal-ip-ovn-ipv6

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-metal-ipi-ovn-ipv6

@tmshort
Contributor Author

tmshort commented May 14, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/804bfe50-4ee4-11f1-9c63-4ffc4aa22055-0

These tests passed!

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-gcp-csi

@openshift-ci
Contributor

openshift-ci Bot commented May 14, 2026

@tmshort: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-gcp-csi | ddb8197 | link | true | /test e2e-gcp-csi |


@openshift-trt

openshift-trt Bot commented May 15, 2026

Job Failure Risk Analysis for sha: ddb8197

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-csi Medium
[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should not change condition/Available
This test has passed 97.81% of 6952 runs on release 5.0 [Overall] in the last week.
---
verify the cluster readiness and stability
This test has passed 95.17% of 6026 runs on release 5.0 [Overall] in the last week.
---
verify operator conditions insights
This test has passed 97.24% of 6021 runs on release 5.0 [Overall] in the last week.
---
[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should not change condition/Degraded
This test has passed 97.81% of 6952 runs on release 5.0 [Overall] in the last week.

@tmshort
Contributor Author

tmshort commented May 15, 2026

/test e2e-gcp-csi

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2026
… to SNO

The OLMv1 fixes in cluster-olm-operator are now in release-5.0:
- PR openshift#202: 2 replicas + PDB on HA topology prevents Available=False and
  spurious Progressing=True during rolling updates (OCPBUGS-62517, OCPBUGS-62635)

Remove the OCPBUGS-62517 exception for olm Available=False entirely.
The testUpgradeOperatorStateTransitions function already has a blanket
SNO exemption, so single-node is covered.

For the Progressing-related exceptions, add clientConfig to
testUpgradeOperatorProgressingStateTransitions so it can detect topology,
then scope both remaining exceptions to SNO only:

- OCPBUGS-62635: olm Progressing=True during MCO window. On HA the 2-replica
  PDB fix prevents this; on SNO there is still 1 replica and the node reboot
  restarts all pods simultaneously.
- OCPBUGS-63672: operator-lifecycle-manager-packageserver Progressing=True on
  empty reason. On HA, isAPIServiceBackendDisrupted() detects terminating pods
  and returns RetryableError. On SNO the OS-level reboot kills all pods at once
  so no terminating pod is observed and the detection does not fire.

The operator-lifecycle-manager exception (OCPBUGS-65583) is intentionally
kept; OLMv0 is in maintenance mode.

Assisted-by: claude
Signed-off-by: Todd Short <todd.short@me.com>
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2026
@tmshort
Contributor Author

tmshort commented May 15, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 10

@openshift-ci
Contributor

openshift-ci Bot commented May 15, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cb069f00-505f-11f1-853c-368c8b1d8d9a-0

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

2 participants