
Harden virtio-blk and virtio-net #51

Merged
jserv merged 1 commit into master from virtio on Apr 29, 2026

Conversation


@jserv jserv commented Apr 29, 2026

Address ten correctness and safety issues across the virtqueue chain walking, descriptor validation, and back-pressure paths.

Chain walking. Both devices walked exactly one (virtio-net) or three (virtio-blk) descriptors per request, so a guest that split a packet or a >4 KiB block I/O across multiple descriptors got partial transfers and the next chain descriptor was reinterpreted as a new request head. Both device emulators now snapshot the entire chain via VRING_DESC_F_NEXT, sized to the VIRTQ_SIZE-bounded stack array, and route data through readv/writev (net) or per-segment diskimg I/O (blk).
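The walk described above can be sketched as follows. This is a hedged illustration with simplified types: `struct vq_desc`, `walk_chain`, and the `guest_base` parameter are assumptions for the example, not the project's actual definitions; real code must also translate guest physical addresses safely.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/uio.h>

#define VIRTQ_SIZE 128
#define VRING_DESC_F_NEXT 1

/* Simplified stand-in for a packed-ring descriptor (illustrative only). */
struct vq_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t id;
};

/* Walk one chain starting at `head`, snapshotting each descriptor and
 * building one iovec per segment. Returns the segment count, or -1 if
 * the chain exceeds the VIRTQ_SIZE-bounded arrays (malformed chain). */
static int walk_chain(const struct vq_desc *table, uint16_t head,
                      struct vq_desc *snap, struct iovec *iov,
                      void *guest_base)
{
    int n = 0;
    uint16_t i = head;
    for (;;) {
        if (n >= VIRTQ_SIZE)
            return -1;              /* longer than the ring can hold */
        snap[n] = table[i];
        iov[n].iov_base = (char *) guest_base + snap[n].addr;
        iov[n].iov_len = snap[n].len;
        n++;
        if (!(snap[n - 1].flags & VRING_DESC_F_NEXT))
            break;                  /* last descriptor of the chain */
        i = (uint16_t) ((i + 1) % VIRTQ_SIZE); /* packed ring: next slot */
    }
    return n;
}
```

The resulting `iov[0..n-1]` array feeds straight into `readv`/`writev` for the net path, or drives per-segment disk I/O for the blk path.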

Descriptor direction. Neither device validated VRING_DESC_F_WRITE before touching guest memory, so a buggy or malicious guest could hand a "readable-only" status descriptor and have the device write into memory it never offered for writes. Every consumer site now checks direction matches the request type and rejects mismatches with VIRTIO_BLK_S_IOERR or an empty USED publication.
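The direction rule reduces to a single predicate; a minimal sketch (the helper name is illustrative, not the project's actual function):

```c
#include <stdint.h>
#include <stdbool.h>

#define VRING_DESC_F_WRITE 2

/* A descriptor the device intends to WRITE into (a status byte, an RX
 * buffer) must carry VRING_DESC_F_WRITE; a descriptor the device only
 * READS (a request header, a TX payload) must not. Any mismatch means
 * the guest never offered that memory for the access the device is
 * about to perform. */
static bool desc_direction_ok(uint16_t flags, bool device_writes)
{
    return device_writes == ((flags & VRING_DESC_F_WRITE) != 0);
}
```

On a `false` result virtio-blk answers with VIRTIO_BLK_S_IOERR and virtio-net publishes an empty USED element, per the description above.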

NULL guards. virtq_get_avail can legitimately return NULL on a malformed chain; the prior code dereferenced the result on virtio-blk's data and status reads, which is a host segfault, not a queue stall. Every consumer site now NULL-checks. On a malformed chain the device stalls the queue rather than publishing USED with chain[0].id (the buffer ID lives on the chain's last descriptor in packed virtqueues, so publishing with a head id risks pointing the driver at an unrelated in-flight chain).

Always-publish-USED. Every error path in virtio-blk/virtio-net used to return without flipping USED on the head, leaking descriptors and eventually drifting the device's view of the ring out of sync with the driver's. Completed walks now always publish USED: IOERR with len=1 on validation failure, the real status byte plus the device-writable byte count on success.
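The publication rule can be sketched as a small pure helper (struct and function names are assumptions for illustration): the buffer id always comes from the chain's last descriptor, and the reported length always includes the trailing status byte.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative USED element (not the project's actual layout). */
struct used_elem {
    uint16_t id;
    uint32_t len;
};

/* Build the USED element for a completed walk: take the buffer id from
 * the chain's LAST descriptor (the packed-ring rule), and report 1 byte
 * (just the status byte) on validation failure, or the device-written
 * byte count plus the status byte on success. */
static struct used_elem make_used(uint16_t last_desc_id, bool ok,
                                  uint32_t written)
{
    struct used_elem e;
    e.id = last_desc_id;
    e.len = ok ? written + 1 : 1;
    return e;
}
```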

Stack-array bounds. Per-iteration desc_snap[VIRTQ_SIZE] / iov[VIRTQ_SIZE] arrays are sized to the host-advertised queue depth. virtio-pci now clamps guest writes to queue_size at VIRTQ_SIZE and the walkers cap at VIRTQ_SIZE, so a guest cannot make the walk overrun the stack arrays.
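The clamp itself is one line; a sketch under the assumption that VIRTQ_SIZE is the host's fixed array bound (the helper name is illustrative):

```c
#include <stdint.h>

#define VIRTQ_SIZE 128

/* Clamp a guest-programmed queue size so later chain walks can never
 * index past the host's fixed-size desc_snap[]/iov[] stack arrays. */
static uint16_t clamp_queue_size(uint16_t guest_value)
{
    return guest_value > VIRTQ_SIZE ? VIRTQ_SIZE : guest_value;
}
```

With both the virtio-pci register write and the walkers bounded by the same constant, no guest-controlled value can drive an out-of-bounds index.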

Atomic event flag stores. virtq_handle_avail reads guest_event->flags with __ATOMIC_ACQUIRE; the worker threads paired this with plain stores, so the compiler was free to tear or reorder against the surrounding completion writes. A new virtq_set_guest_event_flags helper replaces every plain store with __atomic_store_n / __ATOMIC_RELEASE.
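A minimal sketch of such a helper (the exact signature in the tree may differ): the release store pairs with the acquire load in the avail path, so any completion writes made before flipping the flags are visible to a reader that observes the new value.

```c
#include <stdint.h>

/* Release-store the guest event flags. Paired with an __ATOMIC_ACQUIRE
 * load on the reader side, this forbids the compiler and CPU from
 * tearing the store or reordering it before the preceding completion
 * writes. */
static void virtq_set_guest_event_flags(uint16_t *flags, uint16_t value)
{
    __atomic_store_n(flags, value, __ATOMIC_RELEASE);
}
```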

virtio-blk SG arithmetic. data_size widened from uint16_t to uint32_t so >64 KiB segments don't silently truncate before reaching diskimg I/O. Per-segment writable_total now accumulates in uint64_t with __builtin_add_overflow, capping at UINT32_MAX-1 so the reported used.len plus the trailing status byte fits the packed-ring uint32_t.
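The accumulation pattern, sketched with an illustrative helper name; the overflow check and the UINT32_MAX-1 cap follow the description above:

```c
#include <stdint.h>
#include <stdbool.h>

/* Accumulate per-segment writable byte counts in 64 bits, rejecting
 * wraparound, then cap at UINT32_MAX - 1 so the reported used.len plus
 * the trailing status byte still fits the ring's 32-bit length field. */
static bool accumulate_writable(uint64_t *total, uint32_t seg_len)
{
    if (__builtin_add_overflow(*total, (uint64_t) seg_len, total))
        return false;               /* 64-bit overflow: malformed chain */
    if (*total > (uint64_t) UINT32_MAX - 1)
        *total = (uint64_t) UINT32_MAX - 1;
    return true;
}
```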

VIRTIO_BLK_F_FLUSH. Without the negotiated FLUSH feature the Linux virtio-blk driver runs in writeback-without-barriers mode, so guest fsync(2) returns success while data sits in the host page cache. The device now advertises VIRTIO_BLK_F_FLUSH, dispatches T_FLUSH to a new diskimg_flush() (fdatasync), and fsyncs the backing file at exit.
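A minimal sketch of the flush helper (the real signature is an assumption): fdatasync pushes the backing file's data, plus any metadata needed to read it back, to stable storage, which is exactly what completing a guest T_FLUSH request demands.

```c
#include <stdlib.h>
#include <unistd.h>

/* Flush the disk image's dirty pages to stable storage. Returns 0 on
 * success, -1 with errno set otherwise; T_FLUSH completion status maps
 * from this result. */
static int diskimg_flush(int fd)
{
    return fdatasync(fd);
}
```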

TX back-pressure. The TX worker previously required tapfd POLLOUT in its poll predicate; with a level-triggered ioeventfd POLLIN that meant 100% CPU until the TAP drained. The poll set now drops the TAP predicate by default, drains the ioeventfd on a guest kick, and arms POLLOUT only after a transient writev() EAGAIN. On EAGAIN the chain's next_avail_idx and used_wrap_count are rolled back so the TAP-blocked chain is retried on POLLOUT rather than silently dropped.
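The rollback on EAGAIN can be sketched as follows; `struct tx_cursor` and the helper are illustrative stand-ins (the tree's actual fields `next_avail_idx`, `used_wrap_count`, and `tx_wait_for_tap` inspired the names, but the layout here is an assumption):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative TX-queue cursor. */
struct tx_cursor {
    uint16_t next_avail_idx;   /* next chain head to consume */
    bool used_wrap_count;      /* packed-ring wrap counter */
    bool wait_for_tap;         /* when true, poll() also arms POLLOUT */
};

/* On a transient EAGAIN from writev(), restore the pre-walk cursor so
 * the TAP-blocked chain is retried when POLLOUT fires, instead of
 * being consumed and silently dropped. */
static void tx_handle_eagain(struct tx_cursor *cur,
                             uint16_t saved_idx, bool saved_wrap)
{
    cur->next_avail_idx = saved_idx;
    cur->used_wrap_count = saved_wrap;
    cur->wait_for_tap = true;  /* arm POLLOUT until the TAP drains */
}
```

Once the retried `writev` succeeds, the worker clears `wait_for_tap` and drops POLLOUT from the poll set again, so the idle cost returns to zero.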


Summary by cubic

Hardened virtio-blk and virtio-net by walking full packed-virtqueue descriptor chains, enforcing descriptor direction, using atomic event-flag stores, and improving TX back-pressure. This prevents partial I/O, ring desyncs, and busy spins, and adds flush support so guest fsync is durable.

  • Bug Fixes

    • Walk entire chains; build iovecs and use readv/writev (net) or per-segment disk I/O (blk).
    • Validate VRING_DESC_F_WRITE and header sizes; NULL-check chain elements; stall on malformed chains.
    • Always publish USED with the last-descriptor buffer ID and correct used.len via virtq_publish_used.
    • Clamp queue_size and chain length to VIRTQ_SIZE; bound stack arrays; widen blk data_size to uint32_t and cap 64-bit totals.
    • Use release stores for guest event flags via virtq_set_guest_event_flags and for USED flips to fix ordering.
    • TX back-pressure (net): poll drains ioeventfd, arm POLLOUT only after EAGAIN, roll back indices to retry, and stop CPU spin; RX/TX handle the virtio header correctly.
  • New Features

    • Advertise VIRTIO_BLK_F_FLUSH; handle T_FLUSH via fdatasync and flush the backing file on device shutdown.

Written for commit b461e1a.


@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/virtio-net.c">

<violation number="1" location="src/virtio-net.c:372">
P2: `goto tx_publish` skips `dev->tx_wait_for_tap = false`, leaving stale back-pressure state if a previous iteration hit EAGAIN. Move the assignment above the label so it applies to all paths that publish USED.</violation>
</file>


Comment thread src/virtio-net.c
@jserv jserv merged commit 606e9ca into master Apr 29, 2026
10 checks passed
@jserv jserv deleted the virtio branch April 29, 2026 20:43