Monitor MachinePool Health #1473
Closed
friegger wants to merge 8 commits into
machinepool-lifecycle-controller monitors MachinePools based on the proposal. Signed-off-by: Felix Riegger <felix.riegger@sap.com>
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
ed66ffe to
6833b9c
Compare
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
cdf84c1 to
19f5b40
Compare
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
Implement the MachinePool health heartbeat described in IEP-15.
The poollet now actively reports pool liveness so the lifecycle controller
can fail pools whose poollets have gone away.
It includes:
- config
  - Provision the ironcore-machinepool-lease namespace via the install
    kustomization so operators don't have to know the magic name.
  - Grant the poollet coordination.k8s.io/leases RBAC scoped to that
    namespace.
- heartbeat runnable
  - Add pure helpers ComputeReadyCondition (maps an IRI Status probe
    result to a MachinePool Ready condition) and ReadyConditionsDiffer
    (decides whether a patch is warranted, ignoring timestamps so we
    don't flap downstream watchers).
  - Add MachinePoolHeartbeat, a ticker-driven Runnable that probes the
    IRI runtime via Status, renews the pool's Lease in
    ironcore-machinepool-lease, and patches Ready only when its value or
    observedGeneration actually changes. Errors on either sub-step are
    logged and retried on the next tick; the lifecycle controller's
    grace period absorbs short blips. Lease takeover from a previous
    holder is logged at Info as required by IEP-15.
- app arguments that make the heartbeat intervals configurable,
  defaulting to the IEP-15 values.
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
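The two pure helpers named in the commit message above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `Condition` struct is a simplified stand-in for the real MachinePool condition type, the reason strings are hypothetical, and the function signatures are assumptions. It compares only status and observedGeneration, since the commit message does not say whether reason or message changes also count.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Condition is a simplified stand-in for a MachinePool status condition.
// The real API type in the PR may carry different fields.
type Condition struct {
	Type               string
	Status             string // "True", "False" or "Unknown"
	Reason             string
	ObservedGeneration int64
	LastTransitionTime time.Time
}

// errRuntimeUnreachable is a demo error used to simulate a failed probe.
var errRuntimeUnreachable = errors.New("runtime unreachable")

// ComputeReadyCondition maps the outcome of a runtime status probe to a
// Ready condition. The reason strings are hypothetical.
func ComputeReadyCondition(probeErr error, generation int64, now time.Time) Condition {
	c := Condition{Type: "Ready", ObservedGeneration: generation, LastTransitionTime: now}
	if probeErr != nil {
		c.Status = "False"
		c.Reason = "RuntimeProbeFailed"
		return c
	}
	c.Status = "True"
	c.Reason = "RuntimeReady"
	return c
}

// ReadyConditionsDiffer reports whether a patch is warranted. It ignores
// LastTransitionTime on purpose, so a fresh timestamp alone never
// triggers a write.
func ReadyConditionsDiffer(current, desired Condition) bool {
	return current.Status != desired.Status ||
		current.ObservedGeneration != desired.ObservedGeneration
}

func main() {
	t0 := time.Unix(0, 0)
	prev := ComputeReadyCondition(nil, 1, t0)

	// Same probe result 30 seconds later: only the timestamp moved.
	next := ComputeReadyCondition(nil, 1, t0.Add(30*time.Second))
	fmt.Println(ReadyConditionsDiffer(prev, next)) // false

	// Probe failed: the status flips, so a patch is warranted.
	failed := ComputeReadyCondition(errRuntimeUnreachable, 1, t0.Add(time.Minute))
	fmt.Println(ReadyConditionsDiffer(prev, failed)) // true
}
```

Keeping both helpers free of client calls means they can be unit-tested without a cluster or a fake IRI runtime.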
…manager Signed-off-by: Felix Riegger <felix.riegger@sap.com>
Splits each component install into a namespace-free `default/` base and a `standalone/` wrapper that ships Namespaces. Resolves the kustomize ID conflict introduced in 242aaf9 (where the parent namespace transformer renamed every Namespace in the bundle, including the new lease Namespace, to ironcore-system) and eliminates the remove-namespace.yaml patch dance the combined wrappers used to work around it.
Layer model:
- config/<component>/default/ - sets namespace+namePrefix, emits no Namespace (used as a base)
- config/<component>/standalone/ - wraps default/ and adds Namespaces (use this for single-component deploys)
- config/namespaces/ironcore-system/ - shared Namespace kustomization
- config/namespaces/machinepool-lease/ - shared Namespace + lease RBAC
- config/default/, config/etcdless/ - reference the bases and namespace kustomizations directly; no more remove-namespace.yaml patches
Closes two IEP-15 RBAC gaps:
- Adds the missing RoleBinding for poollet lease renewal; without it every poollet got 403 on its first renewal.
- Adds Role + cross-namespace RoleBinding granting the controller manager get/list/watch on coordination.k8s.io/leases in ironcore-machinepool-lease (the lifecycle controller was reading leases with no Role granting access).
Also moves the apiserver-side lease Role out of config/apiserver/rbac/, where the parent transformer was silently rewriting its metadata.namespace from ironcore-machinepool-lease to ironcore-system. It now lives alongside its Namespace in config/namespaces/machinepool-lease/.
Behavioral change for downstream consumers: users who previously ran `kustomize build config/controller/default` for a complete deploy must migrate to `config/controller/standalone`; same for `config/apiserver/default` -> `config/apiserver/standalone`. The combined config/default and config/etcdless paths produce output that is byte-identical to main for every non-lease document, plus the new ironcore-machinepool-lease Namespace and its RBAC.
The Makefile install/uninstall/deploy/undeploy targets are retargeted at the standalone variants accordingly. hack/validate-kustomize.sh is also made portable (GNU realpath --relative-to is unavailable on macOS).
Signed-off-by: Felix Riegger <felix.riegger@sap.com>
85560b8 to
fbec8f1
Compare
Superseded by #1476.
This change mostly consists of two parts:
- Implements `machinepool-lifecycle-controller` monitoring of MachinePool health, with corresponding API additions for MachinePool/VolumePool/BucketPool conditions and codegen updates.
- The poollet now actively reports pool liveness so the lifecycle controller can set pools whose poollets have gone away to Unknown. It also includes kustomizations adding the Lease namespace and RBAC adjustments.
MachinePoolHeartbeat is a ticker-driven Runnable that probes the IRI runtime via Status, renews the pool's Lease in ironcore-machinepool-lease, and patches Ready only when its value or observedGeneration actually changes. Errors on either sub-step are logged and retried on the next tick; the lifecycle controller's grace period absorbs short blips. Lease takeover from a previous holder is logged at Info. The change also adds app arguments that make the heartbeat intervals configurable, defaulting to the IEP-15 values.
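The tick-and-retry behaviour described above can be sketched with the standard library alone. This is an illustrative sketch, not the PR's implementation: the `Heartbeat` type, its `Probe`, `RenewLease` and `Logf` hooks, and the `runDemo` helper are hypothetical names standing in for the IRI Status call, the Lease renewal and the logger. It also assumes that a failure in one sub-step does not skip the other, which the description does not state. The `Start(ctx) error` shape mirrors the controller-runtime `Runnable` interface.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Heartbeat is a hypothetical, simplified stand-in for MachinePoolHeartbeat.
type Heartbeat struct {
	Interval   time.Duration
	Probe      func(ctx context.Context) error // stands in for the IRI Status probe
	RenewLease func(ctx context.Context) error // stands in for the Lease renewal
	Logf       func(format string, args ...any)
}

// Start runs one beat per tick and blocks until ctx is cancelled.
func (h *Heartbeat) Start(ctx context.Context) error {
	ticker := time.NewTicker(h.Interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return nil
		case <-ticker.C:
			// Guard against a tick that raced with cancellation.
			if ctx.Err() != nil {
				return nil
			}
			h.beat(ctx)
		}
	}
}

// beat runs both sub-steps. Errors are only logged; the next tick is the
// retry, so a short blip never stops the loop.
func (h *Heartbeat) beat(ctx context.Context) {
	if err := h.Probe(ctx); err != nil {
		h.Logf("status probe failed, retrying on next tick: %v", err)
	}
	if err := h.RenewLease(ctx); err != nil {
		h.Logf("lease renewal failed, retrying on next tick: %v", err)
	}
}

// runDemo drives three beats; the second lease renewal fails on purpose
// and the third one cancels the context to stop the loop.
func runDemo() (probes, renewals, logged int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	h := &Heartbeat{
		Interval: 5 * time.Millisecond,
		Probe:    func(context.Context) error { probes++; return nil },
		RenewLease: func(context.Context) error {
			renewals++
			switch renewals {
			case 2:
				return errors.New("lease API unavailable")
			case 3:
				cancel()
			}
			return nil
		},
		Logf: func(string, ...any) { logged++ },
	}
	_ = h.Start(ctx)
	return probes, renewals, logged
}

func main() {
	p, r, l := runDemo()
	fmt.Printf("probes=%d renewals=%d logged=%d\n", p, r, l) // probes=3 renewals=3 logged=1
}
```

The failed renewal on the second beat is logged once and the loop simply continues, which is the property the lifecycle controller's grace period relies on.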
Contributes to #1472