TELCODOCS#2637: Adding custom MCP upgrade content #111682
🤖 Sun May 17 20:52:25 - Prow CI generated the docs preview:
@r3v5
| * *nodeSelector*: Define a label to identify the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`). | ||
| . Apply the `topology.kubernetes.io/zone` label to identify the KFD for the Kubernetes scheduler, and the custom node role label (for example, `worker-0`) to assign the node to the MCP by running the following command: |
Suggested wording: "Apply the.... to identify each KFD for the Kubernetes scheduler."
As written, it looks like only one topology label needs to be added, but a different one must be added for each MCP (worker-0, worker-1, worker-2, worker-3, and so on).
Updated as follows:
. Apply the topology.kubernetes.io/zone label to each node to identify its KFD for the scheduler. You must apply the corresponding custom node role label (for example, worker-0, worker-1, worker-2) to assign each node to its custom MCP by running the following command:
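For illustration, the labels on one node after this step might look like the following sketch. The node name `worker-0` and the zone value `kfd0` are placeholders borrowed from the examples in this thread, not output from a real cluster.

```yaml
# Illustrative fragment of a Node object after labeling (not a complete manifest).
apiVersion: v1
kind: Node
metadata:
  name: worker-0                          # hypothetical node name
  labels:
    topology.kubernetes.io/zone: kfd0     # failure domain (KFD) seen by the scheduler
    node-role.kubernetes.io/worker: ""    # default worker role
    node-role.kubernetes.io/worker-0: ""  # custom role that assigns the node to MCP worker-0
```

A node in a different KFD would carry a different zone value and the role label of its own MCP.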
| * *nodeSelector*: Define a label to identify the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`). | ||
| . Apply the `topology.kubernetes.io/zone` label to identify the KFD for the Kubernetes scheduler, and the custom node role label (for example, `worker-0`) to assign the node to the MCP by running the following command: |
I would recommend including the node role label on each node during installation to avoid any disruption on the node after the cluster is installed. The label is then already in place when the cluster is ready.
It can also be done later on an already installed cluster, but a warning should be included that moving a node from one MCP to another can cause disruption.
Updated the IMPORTANT NOTE to include this new info:
"Apply the topology.kubernetes.io/zone label and custom node role labels during cluster installation or node scaling whenever possible. Applying these labels before scheduling workloads ensures that the Kubernetes scheduler can distribute application replicas correctly across failure domains.
Applying or changing node role labels after installation can move nodes between MCPs, which might temporarily disrupt workloads on those nodes. If workloads are already running when you apply or modify these labels, you might need to reschedule workloads to achieve the intended HA distribution."
93d93c5 to bb43ada
@sr1kar99: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
| worker-0.topology.kubernetes.io/zone=kfd0 Ready worker,worker-0 27h v1.31.13 kfd0 | ||
| worker-1.topology.kubernetes.io/zone=kfd1 Ready worker,worker-1 27h v1.31.13 kfd1 | ||
| worker-2.topology.kubernetes.io/zone=kfd2 Ready worker,worker-2 27h v1.31.13 kfd2 | ||
| worker-3.topology.kubernetes.io/zone=kfd3 Ready worker,worker-3 27h v1.31.13 kfd3 |
As currently shown, this has one worker in each MCP and thus one worker in each zone. I think it would be more instructive to show two workers per MCP and zone, even if we use only two MCPs to keep this short.
| worker-3.topology.kubernetes.io/zone=kfd3 Ready worker,worker-3 27h v1.31.13 kfd3 | ||
| ---- | ||
| . Schedule workloads on the cluster only after you verify that nodes are labeled correctly and distributed across Kubernetes failure domains. |
This step should state the consequence of scheduling workloads before the zone labels are applied: the scheduler won't distribute replicas across zones.
| - key: machineconfiguration.openshift.io/role | ||
| operator: In | ||
| values: [ worker, worker-0 ] | ||
| paused: true |
The normal state of `paused` is false. Set it to true only when the upgrade process is starting, before you start the control plane upgrade.
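A minimal sketch of how the MCP from this hunk could look in its normal state, assuming the pool is named `worker-0`. The `metadata.name`, `nodeSelector`, and `maxUnavailable` values are filled in from other parts of this PR and are assumptions about the final manifest.

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-0            # assumed pool name
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-0]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-0: ""
  maxUnavailable: 100%      # applies only to nodes in this pool
  paused: false             # normal state; set to true only when the upgrade is starting
```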
| . Schedule workloads on the cluster only after you verify that nodes are labeled correctly and distributed across Kubernetes failure domains. | ||
| . Keep each custom MCP paused until you are ready to upgrade the cluster. |
As noted above, the normal state for `paused` should be false. The wording here should be "Pause the MCPs when you are ready to upgrade the cluster."
| Configure each custom MCP with `maxUnavailable: 100%` so that all nodes in that pool update at the same time. This setting applies only to the nodes in the selected MCP, not to the entire cluster. | ||
| Plan the number and size of custom MCPs based on your cluster topology, workload distribution, and application availability requirements, including Pod Disruption Budgets (PDBs). | ||
I believe we need some additional context here. Things not covered are:
- Use of zones does not guarantee that the scheduler spreads replicas across zones. It is a soft constraint, but it is beneficial to have the scheduler take failure domains (and thus upgrade domains) into account.
- The ability to take an entire failure domain offline depends on several criteria:
  - The application/workload is designed to be highly available.
  - The application/workload has PodDisruptionBudgets protecting the HA replicas.
  - The application is able to meet minimum service level requirements while the failure domain is offline.
- The recommendation is to have sufficient spare capacity in the cluster to accommodate pods disrupted when a failure domain is taken offline.
- If application-level constraints require that less than a full failure domain be taken offline concurrently, the maxUnavailable setting can be reduced, for example to 50% of the failure domain, or to whatever capacity allows the application to meet service requirements.
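To make these points concrete, here is a hedged sketch of a workload that asks the scheduler to spread replicas across zones as a soft constraint and protects them with a PDB. The name `example-app`, the image reference, the replica count, and `minAvailable: 2` are hypothetical values for illustration only, not part of this PR.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                      # hypothetical workload
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway   # soft constraint: spread is preferred, not guaranteed
          labelSelector:
            matchLabels:
              app: example-app
      containers:
        - name: app
          image: registry.example.com/example-app:latest   # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  minAvailable: 2                        # assumed minimum that still meets service requirements
  selector:
    matchLabels:
      app: example-app
```

Setting `whenUnsatisfiable: DoNotSchedule` instead would turn the spread into a hard constraint, at the cost of pods staying pending when a zone has no capacity.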
Version(s):
4.21+
Issue:
https://redhat.atlassian.net/browse/TELCODOCS-2637
Link to docs preview:
QE review: