Add InstallKubePrometheusStack lib step #1237
Open
LeonardCareer wants to merge 1 commit into
wonderyl
reviewed
Jun 26, 2026
import lib.steps.azure

# Install kube-prometheus-stack via Helm. Caller must run azure.GetCredentials first and check in a values.yaml.
InstallKubePrometheusStack = lambda serviceConnection: str, valuesFile: str, namespace: str = "monitoring", releaseName: str = "prometheus", chartVersion: str = "", waitTimeout: str = "10m" -> steps.Step {
Collaborator
The name could be more succinct, e.g. InstallPrometheus. KubePrometheus is not a thing, and Stack bears no meaning.
wonderyl
reviewed
Jun 26, 2026
# Prometheus web service is svc/<releaseName>-kube-prometheus-prometheus on port 9090.
echo "==> Prometheus web service: svc/${releaseName}-kube-prometheus-prometheus in ${namespace} (port 9090)"
"""
azure.AzCli(serviceConnection, "Install kube-prometheus-stack", script)
wonderyl
reviewed
Jun 26, 2026
script = """
set -euo pipefail

# Install Helm v3 if missing
Collaborator
Consider extracting the Helm install into a separate step. Imagine there are 10 Helm charts, each with its own way of installing Helm 3.
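To illustrate the suggestion, here is a minimal sketch of the script body such a shared step might wrap. It is not taken from this PR: the function name `ensure_helm` and the `HELM_INSTALL_URL` override are hypothetical, and the default URL assumes the installer script published in the Helm repository is an acceptable source for the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: install Helm v3 only when no helm binary is on PATH.
# HELM_INSTALL_URL defaults to the installer script in the Helm repository;
# override it to point at a mirror or an internally vetted copy.
HELM_INSTALL_URL="${HELM_INSTALL_URL:-https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3}"

ensure_helm() {
  if command -v helm >/dev/null 2>&1; then
    # Idempotent path: nothing to do when helm is already available.
    echo "==> helm already present: $(command -v helm)"
    return 0
  fi
  echo "==> helm not found, installing Helm v3"
  curl -fsSL "$HELM_INSTALL_URL" | bash
}
```

Each chart-install step could then call `ensure_helm` first (or depend on a step that does), so the install logic lives in one place.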
Summary
Adds kcl/lib/steps/k8s/install_prometheus.k, providing one new step, InstallKubePrometheusStack. The step installs the kube-prometheus-stack Helm chart (Operator + Prometheus + kube-state-metrics + Grafana + Alertmanager) into the current kubectl context.
Why
Multiple benchmark scenarios need to provision an in-cluster Prometheus on a freshly created AKS cluster — currently each scenario hand-rolls its own helm/kubectl-apply step. This step covers both cases discussed so far:
What the caller must prepare
- azure.GetCredentials(...) in a prior step so kubectl works against the target cluster.
- values.yaml checked in to the caller's pipeline repo, e.g. kcl/<scenario>/prometheus-values.yaml. The path passed to valuesFile is repo-relative under $(Pipeline.Workspace)/s/. All workload-specific tuning (retention, storage, resource requests, scrape rules, PodMonitor/ServiceMonitor selectors, node selectors, tolerations, enabling/disabling Grafana/Alertmanager/KSM) lives in this file; see the chart's values.yaml for the full surface.
- (ghcr.io, quay.io, pkg-containers.githubusercontent.com, production.cloudflare.docker.com and *. of each): caller handles this in their cluster-create step.
- kubectl apply -f ... after this step. Not lib's concern.
What the step does
- Installs Helm v3 if helm is not on PATH.
- Adds the prometheus-community helm repo.
- Runs helm upgrade --install <releaseName> prometheus-community/kube-prometheus-stack with the caller's values file, --create-namespace, --wait, --atomic (rolls back cleanly on failure), and a configurable timeout.
- Prints the Prometheus web service (svc/<releaseName>-kube-prometheus-prometheus, port 9090) so the caller knows where to port-forward / scrape from next.
Parameters
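| Parameter | Default |
| --- | --- |
| serviceConnection | (required) |
| valuesFile | (required) |
| namespace | "monitoring" |
| releaseName | "prometheus" |
| chartVersion | "" (latest) |
| waitTimeout | "10m" |
Validation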
End-to-end-tested in the Telescope S5 lease benchmark pipeline (internal repo) against an AKS H8 hyperscale cluster in southeastasia:
- svc/prometheus-kube-prometheus-prometheus is port-forwardable and serves /-/ready, /api/v1/query, /api/v1/query_range
- additionalScrapeConfigs scrape config picks up the workload pods correctly
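For reviewers who want to see the shape of the helm invocation described under "What the step does", the following is a rough sketch of how the parameters might map onto helm flags. It is an illustration under assumptions, not the script in this PR: the function name `helm_upgrade_args` and its positional argument order are hypothetical, and the only logic shown is the defaulting and the optional `--version` flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not the PR's actual script: prints one helm argument
# per line for the upgrade described in this PR.
# Args: values_file [namespace] [release_name] [chart_version] [wait_timeout]
helm_upgrade_args() {
  local values_file="$1"
  local namespace="${2:-monitoring}"
  local release_name="${3:-prometheus}"
  local chart_version="${4:-}"
  local wait_timeout="${5:-10m}"

  local args=(
    upgrade --install "$release_name" prometheus-community/kube-prometheus-stack
    --namespace "$namespace" --create-namespace
    --values "$values_file"
    --wait --atomic --timeout "$wait_timeout"
  )
  # An empty chart_version means "latest": omit --version entirely.
  if [ -n "$chart_version" ]; then
    args+=(--version "$chart_version")
  fi
  printf '%s\n' "${args[@]}"
}

# Possible usage, after kubectl credentials are in place:
#   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
#   helm repo update
#   mapfile -t args < <(helm_upgrade_args "kcl/<scenario>/prometheus-values.yaml")
#   helm "${args[@]}"
```

Building the arguments as an array keeps values with spaces intact and makes the "latest versus pinned chart" branch easy to check without touching a cluster.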