HPK allows HPC users to run their own private Kubernetes "mini Cloud" on a typical HPC cluster and then issue commands to it using Kubernetes-native tools.
To deploy, copy the scripts/ folder contents to your HPC account under ~/hpk/ and run:
cd hpk
sbatch --nodes=3 hpk.slurm
Then configure and use kubectl:
export KUBECONFIG=${HOME}/.hpk/kubeconfig
kubectl get nodes
Users run a Slurm command to deploy one rootless container per cluster node, which we call a bubble (using the hpk-bubble image). One bubble acts as the Kubernetes control plane, while the others act as worker nodes; together they form the Kubernetes cluster. Each bubble runs an instance of K3s alongside the HPK-specific kubelet (hpk-kubelet), which is implemented using the Virtual Kubelet framework.
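Because sbatch returns as soon as the job is queued, kubectl may be invoked before the cluster is reachable. The following is a minimal sketch of a wait step, assuming the kubeconfig file is written only once the control plane bubble is up; the function name and the timeout default are illustrative and not part of HPK.

```shell
# Sketch: block until the kubeconfig exists, then point kubectl at it.
# Assumption: the file appears only after the control plane has started.
wait_for_kubeconfig() {
    local path="${1:-${HOME}/.hpk/kubeconfig}"   # path from the deploy steps
    local timeout="${2:-300}"                     # seconds; arbitrary default
    local waited=0

    # -s is true when the file exists and is non-empty
    while [ ! -s "${path}" ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "timed out waiting for ${path}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done

    export KUBECONFIG="${path}"
}

# Example:
#   wait_for_kubeconfig && kubectl get nodes
```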
For external networking, bubbles use slirp4netns, while for internal, overlay networking they run Flannel and communicate over VXLAN tunnels (each host forwards UDP port 8472 to the bubble).
Inside the bubble, an Apptainer wrapper (hpktainer) is used to spawn "pods" (using the hpk-pause image, derived from hpktainer-base); these are containers that are given unique network addresses in the corresponding Flannel subnet and host user application containers.
All pod containers are placed in a bridge (hpk-bridge) at the bubble level to talk to each other directly. With the proper routing rules, they can route traffic to pods running in other bubbles (via the Flannel interface) and the outside world.
The pod network stack is again implemented in userspace using a pair of TAP interfaces; one in the nested container and one in the bubble (the interface connected to the hpk-bridge). The pair is connected via two instances of the hpk-net-daemon that forward traffic over a UNIX socket created in a shared folder.
HPK implements a 4-level distributed architecture.
- Level 1: Host Node (Slurm Worker)
  - The physical node managed by Slurm.
  - Executes hpk.slurm, which launches the bubble.
- Level 2: Bubble (Node Overlay)
  - Implemented in the hpk-bubble container.
  - An Apptainer instance acting as a virtual node.
  - Runs K3s (the base Kubernetes distribution) and Flannel (overlay networking). The first bubble, which acts as the Kubernetes control plane, also runs etcd for supporting Flannel.
  - Runs the local hpk-kubelet, which registers itself as a node in the K3s cluster.
  - Connects to other bubbles via a VXLAN overlay network (Flannel).
- Level 3: Pod
  - Implemented in the hpk-pause container.
  - Spawned by hpk-kubelet via hpktainer.
  - Each Pod is an Apptainer container with its own network namespace connected to the Bubble's bridge (hpk-bridge).
  - The Pod's entrypoint is the hpk-pause binary, which acts as a "pause container" to hold the network namespace and capture application container signals.
- Level 4: Application Container
  - User application containers spawned by hpk-pause.
  - These run within the same network namespace as the Level 3 Pod.
  - They share the Pod's IP address and can communicate over localhost.
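From the Kubernetes side, these levels are exercised with ordinary kubectl commands. The sketch below starts a throwaway pod and shows where it was scheduled. It assumes the cluster is running and KUBECONFIG is set; the function name, pod name, and busybox image are hypothetical choices for illustration and have not been checked against HPK.

```shell
# Sketch: start a throwaway pod and show where it landed.
hpk_smoke_test() {
    local pod="${1:-hpk-demo}"                            # hypothetical pod name
    local image="${2:-docker.io/library/busybox:latest}"  # hypothetical image

    kubectl get nodes -o wide || return 1                 # list the registered nodes
    kubectl run "${pod}" --image="${image}" --restart=Never -- sleep 3600 || return 1
    kubectl wait --for=condition=Ready "pod/${pod}" --timeout=120s || return 1
    kubectl get pod "${pod}" -o wide                      # shows the pod IP and node
}

# Example:
#   hpk_smoke_test && kubectl delete pod hpk-demo
```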
All binaries are built and embedded in container images. The deployment script uses these images.
To build, run:
make
This uses docker buildx to build and push the images with multi-architecture support (amd64/arm64) to the configured registry (default: docker.io/chazapis). You can override the registry:
REGISTRY=myregistry.io/user make
Note for developers: You can also build the binaries locally for testing purposes using make binaries. These will be placed in bin/.
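To see what the registry override changes, the sketch below prints the references the images would have under a given registry. It assumes references follow the pattern registry/name:tag and that the tag is latest; the function names and the TAG variable are made up for this example, and the actual Makefile may compose names differently.

```shell
# Hypothetical helper: print the reference an image would have under a registry.
# Assumption: references look like <registry>/<name>:<tag>.
hpk_image_ref() {
    local name="$1"
    local registry="${REGISTRY:-docker.io/chazapis}"  # default registry from the text
    local tag="${TAG:-latest}"                         # TAG is a made-up variable
    printf '%s/%s:%s\n' "${registry}" "${name}" "${tag}"
}

# Print references for the images named in this document.
hpk_list_images() {
    local name
    for name in hpk-bubble hpk-pause hpktainer-base; do
        hpk_image_ref "${name}"
    done
}

# Example:
#   REGISTRY=myregistry.io/user hpk_list_images
```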
You can test the setup locally using the provided Vagrant environment, which simulates a multi-node cluster using VMs.
This creates a 2-node cluster (controller, node) running Ubuntu 24.04 with Slurm pre-installed.
cd vagrant
vagrant up
vagrant reload # Required to apply security settings (AppArmor disable)
The VMs use mDNS for networking and are accessible as controller.local and node.local.
Option A: For production testing (using published images)
Upload the project scripts to the controller node:
# From the repository root on your host
ssh -o StrictHostKeyChecking=no vagrant@controller.local "mkdir -p ~/hpk" # Password is 'vagrant'
scp -r -o StrictHostKeyChecking=no scripts/* vagrant@controller.local:~/hpk/ # Password is 'vagrant'
Option B: For development (using local images)
For rapid iteration during development, build and deploy images directly to the VMs:
make develop
This will:
- Build all images locally for your current architecture
- Export them as .tar files
- Copy them to both VMs at ~/.hpk/images/
- Copy the scripts/ directory to the controller at ~/hpk/
- Remove old .sif files to ensure fresh builds are used
To use the local images, set HPK_DEV=1 before running the cluster (see step 3).
Connect to the controller and submit the Slurm job:
ssh -o StrictHostKeyChecking=no vagrant@controller.local # Password is 'vagrant'
export HPK_DEV=1 # If using development mode/local images
cd ~/hpk
sbatch --nodes=2 hpk.slurm
This will launch one controller bubble and one node bubble on the Vagrant VMs.
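The job may sit in the queue for a moment before the bubbles start. A minimal sketch for waiting until Slurm reports the job as RUNNING follows; the function name and the polling interval are illustrative, and a RUNNING job does not by itself guarantee that K3s inside the bubbles has finished starting.

```shell
# Sketch: poll Slurm until a job reaches the RUNNING state.
wait_for_job_running() {
    local job_id="$1"
    local timeout="${2:-300}"   # seconds; arbitrary default
    local waited=0
    local state

    while [ "${waited}" -lt "${timeout}" ]; do
        # -h drops the header line, -o %T prints only the job state
        state="$(squeue -h -j "${job_id}" -o %T 2>/dev/null)"
        case "${state}" in
            RUNNING)
                return 0
                ;;
            "")
                echo "job ${job_id} is no longer queued" >&2
                return 1
                ;;
        esac
        sleep 2
        waited=$((waited + 2))
    done

    echo "timed out waiting for job ${job_id}" >&2
    return 1
}

# Example (--parsable makes sbatch print just the job ID):
#   job_id="$(sbatch --parsable --nodes=2 hpk.slurm)"
#   wait_for_job_running "${job_id}"
```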
Once running, you can connect to the controller bubble:
apptainer shell instance://bubble1
Inside the bubble, you can run nested containers:
hpktainer run docker://docker.io/chazapis/hpktainer-base:latest /bin/sh
And verify connectivity:
ip addr show tap0 # Should show Flannel IP
ping 8.8.8.8 # External access
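The two checks above can be wrapped into a single pass/fail step. This is a sketch only: the function name is hypothetical, and it assumes the ip and ping utilities are available inside the nested container.

```shell
# Sketch: run both connectivity checks and report each result.
check_pod_network() {
    local iface="${1:-tap0}"       # interface name from the commands above
    local target="${2:-8.8.8.8}"   # external address from the commands above
    local failed=0

    if ip addr show "${iface}" >/dev/null 2>&1; then
        echo "ok: ${iface} is present"
    else
        echo "FAIL: ${iface} not found"
        failed=1
    fi

    # -c 3 sends three probes and then exits
    if ping -c 3 "${target}" >/dev/null 2>&1; then
        echo "ok: ${target} is reachable"
    else
        echo "FAIL: ${target} is unreachable"
        failed=1
    fi

    return "${failed}"
}

# Example:
#   check_pod_network tap0 8.8.8.8
```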