OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions to SNO #31172

Merged: openshift-merge-bot[bot] merged 1 commit into openshift:main from tmshort:fix-OPRUN-4569 on May 20, 2026

Conversation

@tmshort
Contributor

@tmshort tmshort commented May 13, 2026

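The OLMv1 fixes in cluster-olm-operator are now in release-5.0:

  • PR openshift#202: 2 replicas + PDB on HA topology prevents Available=False and spurious Progressing=True during rolling updates (OCPBUGS-62517, OCPBUGS-62635)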

Remove the OCPBUGS-62517 exception for olm Available=False entirely. The testUpgradeOperatorStateTransitions function already has a blanket SNO exemption, so single-node is covered.

For the Progressing-related exceptions, add clientConfig to testUpgradeOperatorProgressingStateTransitions so it can detect topology, then scope both remaining exceptions to SNO only:

  • OCPBUGS-62635: olm Progressing=True during the MCO window. On HA, the 2-replica PDB fix prevents this; on SNO there is still only 1 replica, and the node reboot restarts all pods simultaneously. This exception had been removed (in #31112), but is now restored because the issue still occurs on SNO.
  • OCPBUGS-63672: operator-lifecycle-manager-packageserver Progressing=True with an empty reason. On HA, isAPIServiceBackendDisrupted() detects terminating pods and returns RetryableError. On SNO, the OS-level reboot kills all pods at once, so no terminating pod is ever observed and the detection does not fire.

The operator-lifecycle-manager exception (OCPBUGS-65583) is intentionally kept; OLMv0 is in maintenance mode.

Assisted-by: claude

Summary by CodeRabbit

  • Bug Fixes
    • Operator state transition checks during upgrades now consider control-plane topology.
    • Progressing-state exceptions (including OLM and package server cases) are now limited to single-node deployments where appropriate, removing prior unconditional exceptions.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) on May 13, 2026
@openshift-ci-robot

openshift-ci-robot commented May 13, 2026

@tmshort: This pull request references OPRUN-4569 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

@tmshort changed the title from "OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions…" to "OPRUN-4569: test: remove OLMv1 OTE exceptions; scope OLMv0 exceptions to SNO" on May 13, 2026
@coderabbitai

coderabbitai Bot commented May 13, 2026

Walkthrough

Removes an olm exception in state-transition tests and updates progressing-state tests to detect control-plane topology, using an isSingleNode flag to restrict progressing exceptions for olm and operator-lifecycle-manager-packageserver to single-node clusters.

Changes

Operator test exception scoping

| Layer / File(s) | Summary |
|---|---|
| Remove olm exception from state transition test (pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go) | The olm operator exception for ControllerManager_Deploying reason (OCPBUGS-62517) is removed from testUpgradeOperatorStateTransitions. |
| Add topology detection in progressing test (pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go) | testUpgradeOperatorProgressingStateTransitions now calls getControlPlaneTopology with clientConfig to derive isSingleNode/isTwoNode and updates the warning message about topology exceptions. |
| Gate progressing exceptions to single-node (pkg/monitortests/clusterversionoperator/legacycvomonitortests/operators.go) | The progressing-state exceptions for olm (Reason suffix ControllerManager_Deploying → OCPBUGS-62635) and operator-lifecycle-manager-packageserver (empty reason → OCPBUGS-63672) are now returned only when isSingleNode is true. |
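
As a rough illustration of the topology step, the sketch below derives the two flags from a topology mode string. The topologyFlags helper is hypothetical, and the mode strings "SingleReplica" and "DualReplica" are assumed to match the control-plane topology values the cluster reports; the PR's getControlPlaneTopology reads the value from the cluster through clientConfig instead.

```go
package main

import "fmt"

// Topology mode strings. These are assumptions: they are expected to match
// the control-plane topology values reported by the cluster.
const (
	topologySingleReplica = "SingleReplica"
	topologyDualReplica   = "DualReplica"
)

// topologyFlags is a hypothetical helper that maps a topology mode string
// to the isSingleNode / isTwoNode flags mentioned in the summary above.
func topologyFlags(mode string) (isSingleNode, isTwoNode bool) {
	return mode == topologySingleReplica, mode == topologyDualReplica
}

func main() {
	for _, mode := range []string{"HighlyAvailable", "SingleReplica", "DualReplica"} {
		single, two := topologyFlags(mode)
		fmt.Printf("%-16s isSingleNode=%t isTwoNode=%t\n", mode, single, two)
	}
}
```

Any mode other than the two listed leaves both flags false, so in this sketch HA clusters never qualify for the SNO-scoped exceptions.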

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • openshift/origin#31112: Modifies the same operator exception mappings in the cluster version operator monitor tests.

Suggested labels

lgtm, verified

Suggested reviewers

  • sjenning
🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main changes: removing OLMv1 exceptions and scoping OLMv0 exceptions to single-node deployments, matching the core modifications in the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | All test names use static values from predefined operator lists and enums. No dynamic information like pod names, timestamps, UUIDs, nodes, or IPs are present. |
| Test Structure And Quality | ✅ Passed | Modified code contains helper functions generating JUnit test cases, not Ginkgo test code (no Describe/Context/It blocks). Check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | PR modifies helper functions in monitoring test framework that generate JUnit results, not Ginkgo e2e tests. No new Ginkgo test definitions (It(), Describe(), etc.) are added to the codebase. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo e2e tests added. Only modifications to existing monitoring test helper functions that generate JUnit test cases. No test declarations (It/Describe/Context/When) found. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies test monitoring code to add topology-awareness by scoping exceptions to SNO. No scheduling constraints (affinity, nodeSelectors, PDBs) are introduced. |
| Ote Binary Stdout Contract | ✅ Passed | PR changes only modify exception-handling logic within test functions. No fmt.Print/stdout writes found at process level; only logrus logging to stderr in function bodies. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR modifies utility helper functions only; no new Ginkgo e2e tests (It(), Describe(), etc.) added, so IPv6/disconnected network check does not apply. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@openshift-ci openshift-ci Bot requested review from p0lyn0mial and sjenning May 13, 2026 15:44
@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate aggregated-aws-ovn-single-node-upgrade-5.0-micro 10

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@openshift-merge-bot added the ready-for-human-review label (Indicates a PR has been reviewed by automated tools and is ready for human review) on May 13, 2026
@tmshort
Contributor Author

tmshort commented May 13, 2026

/test e2e-aws-ovn-single-node

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 10

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/804bfe50-4ee4-11f1-9c63-4ffc4aa22055-0

@tmshort
Contributor Author

tmshort commented May 13, 2026

/test e2e-aws-ovn-single-node

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-master-aggregated-aws-ovn-single-node-upgrade-5.0-micro 5

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Contributor Author

tmshort commented May 13, 2026

/payload-aggregate periodic-ci-openshift-release-main-aws-ovn-single-node-upgrade-5.0-micro 5

@openshift-ci
Contributor

openshift-ci Bot commented May 13, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-gcp-csi
/test e2e-metal-ip-ovn-ipv6

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-metal-ipi-ovn-ipv6

@tmshort
Contributor Author

tmshort commented May 14, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/804bfe50-4ee4-11f1-9c63-4ffc4aa22055-0

These tests passed!

@tmshort
Contributor Author

tmshort commented May 14, 2026

/test e2e-gcp-csi

@openshift-trt

openshift-trt Bot commented May 15, 2026

Job Failure Risk Analysis for sha: ddb8197

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-gcp-csi Medium
[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should not change condition/Available
This test has passed 97.81% of 6952 runs on release 5.0 [Overall] in the last week.
---
verify the cluster readiness and stability
This test has passed 95.17% of 6026 runs on release 5.0 [Overall] in the last week.
---
verify operator conditions insights
This test has passed 97.24% of 6021 runs on release 5.0 [Overall] in the last week.
---
[Monitor:legacy-cvo-invariants][bz-Insights Operator] clusteroperator/insights should not change condition/Degraded
This test has passed 97.81% of 6952 runs on release 5.0 [Overall] in the last week.

@tmshort
Contributor Author

tmshort commented May 15, 2026

/test e2e-gcp-csi

@openshift-ci added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) on May 15, 2026
@openshift-ci removed the needs-rebase label on May 15, 2026
@tmshort
Contributor Author

tmshort commented May 15, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 10

… to SNO

The OLMv1 fixes in cluster-olm-operator are now in release-5.0:
- PR openshift#202: 2 replicas + PDB on HA topology prevents Available=False and
  spurious Progressing=True during rolling updates (OCPBUGS-62517, OCPBUGS-62635)

Remove the OCPBUGS-62517 exception for olm Available=False entirely.
The testUpgradeOperatorStateTransitions function already has a blanket
SNO exemption, so single-node is covered.

For the Progressing-related exceptions, add clientConfig to
testUpgradeOperatorProgressingStateTransitions so it can detect topology,
then scope both remaining exceptions to SNO only:

- OCPBUGS-62635: olm Progressing=True during MCO window. On HA the 2-replica
  PDB fix prevents this; on SNO there is still 1 replica and the node reboot
  restarts all pods simultaneously. This had been removed, but is now restored
  since the issue still occurs under SNO.
- OCPBUGS-63672: operator-lifecycle-manager-packageserver Progressing=True on
  empty reason. On HA, isAPIServiceBackendDisrupted() detects terminating pods
  and returns RetryableError. On SNO the OS-level reboot kills all pods at once
  so no terminating pod is observed and the detection does not fire.

The operator-lifecycle-manager exception (OCPBUGS-65583) is intentionally
kept; OLMv0 is in maintenance mode.

Assisted-by: claude
Signed-off-by: Todd Short <todd.short@me.com>
@openshift-ci-robot removed the verified label (Signifies that the PR passed pre-merge verification criteria) on May 20, 2026
@openshift-ci removed the lgtm (Indicates that a PR is ready to be merged.) and needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) labels on May 20, 2026
@openshift-ci-robot

openshift-ci-robot commented May 20, 2026

@tmshort: This pull request references OPRUN-4569 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

@tmshort
Contributor Author

tmshort commented May 20, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 10

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/175c74f0-5453-11f1-817d-5d5b6fe83209-0

Comment on lines +773 to +780
case "olm":
	// CatalogdDeploymentCatalogdControllerManager_Deploying
	// OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying
	// On HA, cluster-olm-operator PR #202 (2 replicas + PDB) prevents this.
	// On SNO there is only one replica and the node reboot restarts all pods simultaneously.
	if strings.HasSuffix(reason, "ControllerManager_Deploying") && isSingleNode {
		return "https://issues.redhat.com/browse/OCPBUGS-62635"
	}
Contributor Author

#31112 removed this OLM exception, which needs to remain because of SNO, so this change restores it.
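
A standalone sketch of the condition in the snippet above shows how the suffix match covers both reason strings named in its comments while staying limited to SNO. The function name olmProgressingException is hypothetical; in the PR the check lives inside a larger switch statement.

```go
package main

import (
	"fmt"
	"strings"
)

// Bug URL returned by the reviewed snippet for the olm Progressing exception.
const olmProgressingBug = "https://issues.redhat.com/browse/OCPBUGS-62635"

// olmProgressingException is a hypothetical wrapper around the reviewed
// condition: the exception applies only when the reason ends in
// "ControllerManager_Deploying" and the cluster is single-node.
func olmProgressingException(reason string, isSingleNode bool) string {
	if strings.HasSuffix(reason, "ControllerManager_Deploying") && isSingleNode {
		return olmProgressingBug
	}
	return ""
}

func main() {
	reasons := []string{
		"CatalogdDeploymentCatalogdControllerManager_Deploying",
		"OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying",
	}
	for _, reason := range reasons {
		for _, sno := range []bool{true, false} {
			fmt.Printf("isSingleNode=%t reason=%s exception=%q\n",
				sno, reason, olmProgressingException(reason, sno))
		}
	}
}
```

With isSingleNode false the function returns an empty string for both reasons, so in this sketch the same Progressing transition on an HA cluster is no longer excused.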

@openshift-merge-bot
Contributor

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@tmshort tmshort requested a review from oceanc80 May 20, 2026 15:15
@bandrade
Contributor

/label qe-approved
/verified by @bandrade

@openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) on May 20, 2026
@openshift-ci-robot

@bandrade: This PR has been marked as verified by @bandrade.

@pedjak
Contributor

pedjak commented May 20, 2026

/lgtm

@openshift-ci added the lgtm label (Indicates that a PR is ready to be merged.) on May 20, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jogeo, oceanc80, pedjak, tmshort

@tmshort
Contributor Author

tmshort commented May 20, 2026

/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@tmshort: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@tmshort
Contributor Author

tmshort commented May 20, 2026

/payload-aggregate periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node 5

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/a3617a10-5474-11f1-8510-07ccf8d8c128-0

@tmshort
Contributor Author

tmshort commented May 20, 2026

This recent /payload-aggregate is a duplicate of #31172 (comment)

@openshift-merge-bot openshift-merge-bot Bot merged commit 5ba689e into openshift:main May 20, 2026
21 checks passed
@tmshort tmshort deleted the fix-OPRUN-4569 branch May 20, 2026 20:51
@tmshort
Contributor Author

tmshort commented May 20, 2026

@tmshort: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-main-ci-5.0-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/175c74f0-5453-11f1-817d-5d5b6fe83209-0

These tests passed.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • qe-approved: Signifies that QE has signed off on this PR
  • ready-for-human-review: Indicates a PR has been reviewed by automated tools and is ready for human review
  • verified: Signifies that the PR passed pre-merge verification criteria

6 participants