fix(dra): support Deckhouse GPU attributes #130

Open
danilrwx wants to merge 2 commits into v1.6.2-virtualization from feat/gpu/add-deckhouse-dra-support
@danilrwx danilrwx commented Jun 22, 2026

What this PR does

Before this PR:

The KubeVirt DRA status controller only recognized the upstream Kubernetes GPU device attributes (resource.kubernetes.io/pcieRoot, resource.kubernetes.io/mDevUUID). A GPU published by the Deckhouse GPU DRA driver under its own namespaced attributes (gpu.deckhouse.io/pciAddress, gpu.deckhouse.io/deviceType, gpu.deckhouse.io/sharingStrategy) could not be resolved into a GPU status, so VM GPU passthrough based on the Deckhouse driver did not work.

After this PR:

The DRA status controller understands the Deckhouse GPU attributes in addition to the upstream ones, and validates that an allocated GPU is safe for VM passthrough:

  • reads gpu.deckhouse.io/pciAddress, gpu.deckhouse.io/deviceType (physical/mig), and gpu.deckhouse.io/sharingStrategy;
  • rejects GPUs that set allowMultipleAllocations, use any sharing strategy, are MIG devices without an mDevUUID, or expose a PCI address with a non-physical device type;
  • normalizes 8-hex-digit PCI domain prefixes to the 4-digit form KubeVirt expects.
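The PCI domain normalization in the last bullet can be illustrated with a minimal Go sketch. The helper name normalizePCIAddress is hypothetical and is not taken from the PR's diff. The PR does not say what happens when the upper four digits of an 8-digit domain are non-zero; this sketch assumes such an address is rejected with an error.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePCIAddress is a hypothetical helper, not the PR's actual code.
// It shortens an 8-hex-digit PCI domain ("00000000:3b:00.0") to the
// 4-digit form ("0000:3b:00.0") and leaves other addresses unchanged.
func normalizePCIAddress(addr string) (string, error) {
	domain, rest, ok := strings.Cut(addr, ":")
	if !ok {
		return "", fmt.Errorf("malformed PCI address %q", addr)
	}
	if len(domain) != 8 {
		// Not the long form; pass the address through untouched.
		return addr, nil
	}
	if !isHex(domain) {
		return "", fmt.Errorf("PCI domain %q is not hexadecimal", domain)
	}
	// Assumption: a domain that needs more than 4 hex digits is an error.
	if domain[:4] != "0000" {
		return "", fmt.Errorf("PCI domain %q does not fit in 4 hex digits", domain)
	}
	return domain[4:] + ":" + rest, nil
}

// isHex reports whether s consists only of hexadecimal digits.
func isHex(s string) bool {
	for _, r := range s {
		if !strings.ContainsRune("0123456789abcdefABCDEF", r) {
			return false
		}
	}
	return true
}

func main() {
	for _, addr := range []string{"00000000:3b:00.0", "0000:3b:00.0"} {
		out, err := normalizePCIAddress(addr)
		fmt.Println(out, err)
	}
}
```

Addresses already in the 4-digit form pass through unchanged, so the helper can be applied to every address regardless of which driver published it.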

References

Why we need it and why it was done in this way

The Deckhouse GPU DRA driver publishes device attributes under the gpu.deckhouse.io/ namespace rather than resource.kubernetes.io/. The virtualization module selects GPUs by productName and only accepts exclusive physical devices, so KubeVirt must read the Deckhouse attributes and enforce the same exclusivity constraints as a safety layer.

The following tradeoffs were made:

  • Exclusivity validation lives in KubeVirt as the last safety layer. The virtualization controller also constrains selection with a CEL selector, so the constraints are enforced in two places intentionally.
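As a rough illustration of this last safety layer, the rejection rules listed under "After this PR" could be sketched in Go as follows. The gpuDevice struct and validateForPassthrough function are hypothetical names, not the PR's code. The sketch assumes that any non-empty sharingStrategy value means the device is shared, and that a device with no deviceType attribute (for example, one described only by upstream attributes) is not subject to the physical-type check.

```go
package main

import (
	"errors"
	"fmt"
)

// gpuDevice is a hypothetical flattened view of the attributes the
// controller reads for one allocated device.
type gpuDevice struct {
	AllowMultipleAllocations bool
	SharingStrategy          string // gpu.deckhouse.io/sharingStrategy; "" when unset
	DeviceType               string // gpu.deckhouse.io/deviceType: "physical" or "mig"; "" when unset
	PCIAddress               string // gpu.deckhouse.io/pciAddress; "" when unset
	MDevUUID                 string // mDevUUID; "" when unset
}

// validateForPassthrough returns an error when the device must not be
// handed to a VM, following the four rejection rules in the description.
func validateForPassthrough(d gpuDevice) error {
	switch {
	case d.AllowMultipleAllocations:
		return errors.New("device allows multiple allocations")
	case d.SharingStrategy != "":
		// Assumption: any published strategy means the device is shared.
		return fmt.Errorf("device uses sharing strategy %q", d.SharingStrategy)
	case d.DeviceType == "mig" && d.MDevUUID == "":
		return errors.New("MIG device has no mDevUUID")
	case d.PCIAddress != "" && d.DeviceType != "" && d.DeviceType != "physical":
		return fmt.Errorf("PCI address set on non-physical device type %q", d.DeviceType)
	}
	return nil
}

func main() {
	ok := gpuDevice{DeviceType: "physical", PCIAddress: "0000:3b:00.0"}
	shared := gpuDevice{DeviceType: "physical", PCIAddress: "0000:3b:00.0", SharingStrategy: "time-slicing"}
	fmt.Println(validateForPassthrough(ok))
	fmt.Println(validateForPassthrough(shared))
}
```

Because the virtualization controller's CEL selector is expected to filter out such devices before allocation, a check like this should rarely fail in practice; it exists to catch a claim that bypassed or misconfigured the selector.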

The following alternatives were considered:

  • Rename the Deckhouse driver attributes to the upstream resource.kubernetes.io/ keys — rejected to keep driver-specific metadata namespaced and avoid collisions with other DRA drivers.

Links to places where the discussion took place: deckhouse/virtualization#2520

Special notes for your reviewer

Deckhouse-only downstream change. Tracked under the deckhouse/virtualization GPU DRA effort. The unrelated chore(lint): migrate golangci config to v2 commit is bundled on this branch.

Checklist

Release note

DRA: recognize Deckhouse GPU device attributes (gpu.deckhouse.io/pciAddress, deviceType, sharingStrategy) and reject GPUs unsuitable for VM passthrough (multi-allocation, shared, MIG without mDevUUID, non-physical with PCI address).
