
TELCODOCS#2637: Adding custom MCP upgrade content #111682

Open
sr1kar99 wants to merge 1 commit into openshift:main from sr1kar99:2637-adding-custom-mcp-upgrade


@sr1kar99
Contributor

@sr1kar99 sr1kar99 commented May 14, 2026

Version(s):
4.21+

Issue:
https://redhat.atlassian.net/browse/TELCODOCS-2637

Link to docs preview:

QE review:

  • QE has approved this change.

@openshift-ci openshift-ci Bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 14, 2026
@ocpdocs-previewbot

ocpdocs-previewbot commented May 14, 2026

Comment thread modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc Outdated
@sr1kar99
Contributor Author

@r3v5
Could you please review this PR?
Thanks!


* *nodeSelector*: Define a label to identify the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`).

. Apply the `topology.kubernetes.io/zone` label to identify the KFD for the Kubernetes scheduler, and the custom node role label (for example, `worker-0`) to assign the node to the MCP by running the following command:

Apply the.... to identify each KFD for the Kubernetes scheduler.

As written, this reads as if only one topology label needs to be added, but a different one must be added for each MCP (worker-0, worker-1, worker-2, worker-3, ...).

Contributor Author


Updated as follows:

. Apply the `topology.kubernetes.io/zone` label to each node to identify its KFD for the scheduler. You must apply the corresponding custom node role label (for example, `worker-0`, `worker-1`, `worker-2`) to assign each node to its custom MCP by running the following command:



@alosadagrande alosadagrande May 15, 2026


I recommend applying the node role label to each node during installation so that you avoid disrupting the node after the cluster is installed. The label is then already in place when the cluster is ready.

It can also be done later on a cluster that is already installed, but the docs should include a warning that moving a node from one MCP to another can cause disruption.

Contributor Author


Updated the IMPORTANT NOTE to include this new info:

"Apply the topology.kubernetes.io/zone label and custom node role labels during cluster installation or node scaling whenever possible. Applying these labels before scheduling workloads ensures that the Kubernetes scheduler can distribute application replicas correctly across failure domains.

Applying or changing node role labels after installation can move nodes between MCPs, which might temporarily disrupt workloads on those nodes. If workloads are already running when you apply or modify these labels, you might need to reschedule workloads to achieve the intended HA distribution."
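For reference, the pool settings discussed in this PR (the `nodeSelector` role label, the `machineConfigSelector` values that appear in the diff, `maxUnavailable: 100%`, and `paused` left at `false` during normal operation, as a reviewer recommends) could be combined into one custom MCP roughly as follows. This is a minimal sketch that assumes a pool named `worker-0`; it is not the manifest from the PR itself.

```yaml
# Minimal sketch of one custom MCP (assumed name: worker-0).
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-0
spec:
  # Select both the base worker configs and any worker-0-specific configs.
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-0]
  # Nodes that carry this role label belong to the pool.
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-0: ""
  # Update every node in this pool at the same time (scope: this pool only).
  maxUnavailable: "100%"
  # Normal state; pause the pool only when an upgrade is about to start.
  paused: false
```

Repeating this manifest with `worker-1`, `worker-2`, and so on would give one pool per KFD.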

@sr1kar99 sr1kar99 force-pushed the 2637-adding-custom-mcp-upgrade branch from 93d93c5 to bb43ada Compare May 17, 2026 20:44
@openshift-ci

openshift-ci Bot commented May 17, 2026

@sr1kar99: all tests passed!


Comment on lines +102 to +105
worker-0.topology.kubernetes.io/zone=kfd0 Ready worker,worker-0 27h v1.31.13 kfd0
worker-1.topology.kubernetes.io/zone=kfd1 Ready worker,worker-1 27h v1.31.13 kfd1
worker-2.topology.kubernetes.io/zone=kfd2 Ready worker,worker-2 27h v1.31.13 kfd2
worker-3.topology.kubernetes.io/zone=kfd3 Ready worker,worker-3 27h v1.31.13 kfd3

As currently shown, this example has one worker in each MCP and therefore one worker in each zone. I think it would be more instructive to show two workers per MCP and zone, even if we use only two MCPs to keep this short.
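Along the lines of this suggestion, a layout with two custom MCPs and two nodes per MCP and KFD could use node labels like the following. The node names `node-a` through `node-d` are hypothetical, and each document shows only the `metadata` of one Node object.

```yaml
# Hypothetical layout: 2 custom MCPs x 2 nodes, one KFD per MCP.
---
metadata:
  name: node-a
  labels:
    node-role.kubernetes.io/worker: ""
    node-role.kubernetes.io/worker-0: ""   # custom MCP worker-0
    topology.kubernetes.io/zone: kfd0      # KFD seen by the scheduler
---
metadata:
  name: node-b
  labels:
    node-role.kubernetes.io/worker: ""
    node-role.kubernetes.io/worker-0: ""
    topology.kubernetes.io/zone: kfd0
---
metadata:
  name: node-c
  labels:
    node-role.kubernetes.io/worker: ""
    node-role.kubernetes.io/worker-1: ""   # custom MCP worker-1
    topology.kubernetes.io/zone: kfd1
---
metadata:
  name: node-d
  labels:
    node-role.kubernetes.io/worker: ""
    node-role.kubernetes.io/worker-1: ""
    topology.kubernetes.io/zone: kfd1
```

Keeping each MCP aligned with exactly one zone means that updating one pool takes exactly one KFD offline.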

----

. Schedule workloads on the cluster only after you verify that nodes are labeled correctly and distributed across Kubernetes failure domains.

We should advise that the consequence of scheduling workloads before zone labeling is that the scheduler won't distribute them across zones.

    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [ worker, worker-0 ]
  paused: true

The normal state of `paused` is `false`. Set it to `true` only when the upgrade process is starting, that is, before you start the control plane upgrade.
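Following this comment, the `paused` field would move between two states over an upgrade. A minimal sketch of the pool `spec` (all other fields omitted), assuming the sequence the comment describes:

```yaml
# Normal operation: the pool receives configuration updates as usual.
spec:
  paused: false
---
# Set before you start the control plane upgrade, so that the nodes in
# this pool are not updated until you unpause the pool.
spec:
  paused: true
```

Setting `paused` back to `false` then lets the Machine Config Operator roll the new configuration out to the nodes in that pool.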



. Keep each custom MCP paused until you are ready to upgrade the cluster.

As noted above, the normal state for `paused` should be `false`. The wording here should be: pause the MCPs when you are ready to upgrade the cluster.

Configure each custom MCP with `maxUnavailable: 100%` so that all nodes in that pool update at the same time. This setting applies only to the nodes in the selected MCP, not to the entire cluster.

Plan the number and size of custom MCPs based on your cluster topology, workload distribution, and application availability requirements, including Pod Disruption Budgets (PDBs).
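As a sketch of the PDB side of this planning, the following assumes a hypothetical workload labeled `app: example-app` that runs four replicas, one per KFD. With `minAvailable: 3`, the eviction API permits only one replica to be disrupted at a time, which matches taking a single KFD (one custom MCP) offline.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb       # hypothetical name
  namespace: example-ns       # hypothetical namespace
spec:
  # 4 replicas in total; keep at least 3 running, so at most one
  # replica (one KFD's worth) can be evicted at a time.
  minAvailable: 3
  selector:
    matchLabels:
      app: example-app
```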


I believe we need some additional context here. Things not covered are:

  • Use of zones is not a guarantee that the scheduler will spread replicas across zones. It is a soft constraint, but it is beneficial for the scheduler to take failure domains (and therefore upgrade domains) into account.
  • The ability to take an entire failure domain offline depends on several criteria:
    • The application/workload is designed to be highly available
    • The application/workload has PodDisruptionBudgets protecting the HA replicas
    • The application is able to meet minimum service level requirements while the failure domain is offline
  • The recommendation is to have sufficient spare capacity in the cluster to accommodate pods that are disrupted when a failure domain is taken offline.
  • If application-level constraints require that less than a full failure domain be taken offline concurrently, the maxUnavailable setting can be reduced, for example, to 50% of the failure domain, or to whatever capacity allows the application to meet service requirements.
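To illustrate the first point, a hypothetical Deployment can ask the scheduler to spread its replicas across KFDs with `topologySpreadConstraints` on the `topology.kubernetes.io/zone` key. With `whenUnsatisfiable: ScheduleAnyway` this remains a soft constraint, as the comment describes; `DoNotSchedule` would make it a hard one. The names, namespace, and image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app            # hypothetical
  namespace: example-ns        # hypothetical
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      topologySpreadConstraints:
        # Keep per-KFD replica counts within 1 of each other when possible.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway   # soft constraint
          labelSelector:
            matchLabels:
              app: example-app
      containers:
        - name: app
          image: registry.example.com/example-app:latest   # placeholder
```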


5 participants