fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS by AztecBot · Pull Request #23469 · AztecProtocol/aztec-packages

AztecBot · 2026-05-21T13:07:27Z

Summary

aztec start --local-network reliably SIGBUSes a few blocks into a run on macOS arm64 (since v5.0.0-nightly.20260520, i.e. after #21625 shipped the shared_ptr use-after-free fix). This is a different fault from the one #21625 fixed: a stack-guard violation (stack overflow) on a nodejs_module.node worker thread running AVM-simulation code, not a use-after-free.

This pins an explicit, generous stack size on the ThreadedAsyncOperation worker thread.

Root cause

ThreadedAsyncOperation::Queue() (introduced in #21138) runs the AVM simulation (_fn) directly on a bare std::thread(...).detach(). A std::thread uses the OS default stack for non-main threads, which is 512 KB on macOS versus 8 MB on Linux. The AVM-simulation call chain is deep enough to overflow 512 KB, so on macOS arm64 the worker writes into its stack-guard page and the process aborts with:

EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
  #0 _platform_memmove
  #1.. nodejs_module.node  bb::nodejs (AVM simulation path)

Linux is unaffected because its 8 MB default is comfortably large. The previous AsyncOperation path never hit this either: it ran on the libuv threadpool, whose threads are sized from RLIMIT_STACK (an 8 MB soft limit by default on macOS), not from the 512 KB raw-thread default.
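A quick way to see the platform difference described above is to ask the pthreads implementation which stack size a new thread gets when none is requested. This sketch is illustrative and not part of the PR. `default_thread_stack_size` is a hypothetical helper. On macOS the value is expected to be 512 KB. On glibc it typically follows RLIMIT_STACK, which is commonly 8 MB.

```cpp
#include <cstddef>
#include <pthread.h>

// Hypothetical helper: the stack size a freshly initialised pthread attribute
// object carries. A std::thread in the common libstdc++/libc++ implementations
// is created with default attributes, so it gets this same size.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
}
```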

Fix

std::thread can't set a stack size, so the worker is launched via pthreads instead, with pthread_attr_setstacksize pinned to a generous WORKER_STACK_SIZE. The value is 32 MB: 4× the 8 MB that the libuv path proved sufficient, with headroom for deeper future call chains. It falls back to a default-stack std::thread only if pthreads is unavailable (_WIN32) or pthread_create fails.
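Here is a minimal sketch of that launch pattern, assuming a plain `std::function<void()>` work item. It is not the code merged in this PR, and it omits the real ThreadedAsyncOperation integration (NAPI, BlockingCall). `launch_detached_with_stack`, `worker_trampoline` and `run_on_big_stack_worker` are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#ifndef _WIN32
#include <pthread.h>
#endif

// 32 MB, matching the WORKER_STACK_SIZE value described in this PR.
constexpr std::size_t WORKER_STACK_SIZE = 32u * 1024 * 1024;

#ifndef _WIN32
// pthread entry point: takes ownership of a heap-allocated std::function and runs it.
// Exceptions escaping the work item are not handled in this sketch.
static void* worker_trampoline(void* arg)
{
    std::unique_ptr<std::function<void()>> fn(static_cast<std::function<void()>*>(arg));
    (*fn)();
    return nullptr;
}
#endif

// Hypothetical helper: run `work` on a detached thread with an explicit stack size.
// Returns true if the pthread path was used, false if it fell back to std::thread.
bool launch_detached_with_stack(std::function<void()> work, std::size_t stack_size = WORKER_STACK_SIZE)
{
#ifndef _WIN32
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) == 0) {
        if (pthread_attr_setstacksize(&attr, stack_size) == 0 &&
            pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0) {
            auto* heap_fn = new std::function<void()>(std::move(work));
            pthread_t tid;
            if (pthread_create(&tid, &attr, worker_trampoline, heap_fn) == 0) {
                pthread_attr_destroy(&attr);
                return true; // the trampoline now owns heap_fn
            }
            // pthread_create failed, so the trampoline never ran: take the work back.
            work = std::move(*heap_fn);
            delete heap_fn;
        }
        pthread_attr_destroy(&attr);
    }
#endif
    // Fallback: default-stack std::thread (only 512 KB on macOS).
    std::thread(std::move(work)).detach();
    return false;
}

// Usage sketch: the worker holds its own reference to the promise, so the shared
// state outlives this function even though the thread is detached.
int run_on_big_stack_worker()
{
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    launch_detached_with_stack([done] { done->set_value(6 * 7); });
    return result.get();
}
```

The heap-allocated `std::function` is needed because `pthread_create` passes only a single `void*` to the entry point. The trampoline converts that raw pointer back into owned C++ state.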

The shared_ptr lifetime model from #21625 is preserved exactly — both the worker lambda and the BlockingCall completion callback still capture self, so this does not reintroduce the use-after-free. Only the thread-launch mechanism changed.
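The self-capture idea can be shown in isolation with a simplified model. `ToyAsyncOperation` is a hypothetical stand-in, not the real ThreadedAsyncOperation. In the real code the completion callback hops back to the JS thread via BlockingCall. This sketch replaces that hop with a direct call inside the worker lambda, so a single `self` capture covers both steps.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical stand-in for ThreadedAsyncOperation: not the real class or its API.
class ToyAsyncOperation : public std::enable_shared_from_this<ToyAsyncOperation> {
  public:
    ToyAsyncOperation(std::function<int()> fn, std::function<void(int)> on_complete)
        : fn_(std::move(fn))
        , on_complete_(std::move(on_complete))
    {}

    // Must be called on an object owned by a std::shared_ptr.
    void queue()
    {
        // The worker captures `self`, a strong reference, so the operation stays alive
        // for as long as the detached thread runs, even if every caller drops theirs.
        std::thread([self = shared_from_this()] {
            int result = self->fn_();
            // Stand-in for the completion callback: it also reaches members through `self`.
            self->on_complete_(result);
        }).detach();
    }

  private:
    std::function<int()> fn_;
    std::function<void(int)> on_complete_;
};

// Usage sketch: the caller's pointer goes out of scope before the work finishes.
int queue_and_forget()
{
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    {
        auto op = std::make_shared<ToyAsyncOperation>([] { return 6 * 7; },
                                                      [done](int r) { done->set_value(r); });
        op->queue();
    } // `op` is released here; the worker's `self` keeps the object alive
    return result.get();
}
```

Swapping `std::thread` for a pthread launcher changes only how the lambda is started. The captured `self` still governs the object's lifetime.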

Testing

The full bb build is too heavy to run in this session, so this is not yet a local end-to-end repro/fix verification — it relies on CI for compilation and on a macOS arm64 aztec start --local-network run to confirm the crash is gone.
The pthread/std::function trampoline was compiled and run standalone under -std=c++20 -Wall -Wextra -Werror: the worker thread receives a 32 MB stack (pthread_get_stacksize_np reports 33554432), and the work runs and completes.
Requested: verify against tonight's nightly on macOS arm64 (M3) — the reporter's exact repro.
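One way to reproduce the stack-size check from the Testing notes is to spawn a joinable thread with an explicit stack size and have it report its own stack. On macOS this sketch uses pthread_get_stacksize_np, the function cited in the notes. On Linux it assumes glibc or musl and uses the GNU extension pthread_getattr_np instead. The helper names are hypothetical. The Linux figure may differ slightly from the requested size, for example because of rounding.

```cpp
#include <cstddef>
#include <pthread.h> // pthread_getattr_np needs _GNU_SOURCE, which g++/clang++ define by default

// Hypothetical helper: stack size of the calling thread, or 0 if this sketch can't query it.
std::size_t current_thread_stack_size()
{
#if defined(__APPLE__)
    return pthread_get_stacksize_np(pthread_self());
#elif defined(__linux__)
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    pthread_attr_getstacksize(&attr, &size);
    pthread_attr_destroy(&attr);
    return size;
#else
    return 0;
#endif
}

// Hypothetical helper: spawn a joinable thread with `requested` bytes of stack
// and return the stack size that thread observed for itself.
std::size_t observed_stack_size(std::size_t requested)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, requested);
    std::size_t observed = 0;
    pthread_t tid;
    auto body = [](void* out) -> void* {
        *static_cast<std::size_t*>(out) = current_thread_stack_size();
        return nullptr;
    };
    if (pthread_create(&tid, &attr, body, &observed) == 0) {
        pthread_join(tid, nullptr); // wait so `observed` is written before we read it
    }
    pthread_attr_destroy(&attr);
    return observed;
}
```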

Notes for reviewers

Targets next (not merge-train/barretenberg) to match the base of #21625 (fix: use shared_ptr in ThreadedAsyncOperation to prevent SIGBUS on macOS) and to make the nightly, since this is an urgent, release-affecting crash. Happy to retarget if you'd prefer it go through the merge train.
32 MB is a deliberate over-provision; if you'd rather mirror the libuv path precisely we could instead size from getrlimit(RLIMIT_STACK). The fixed constant is simpler, and the virtual reservation only commits pages as they are touched.
The longer-term fix is the NAPI→IPC migration (#21331, replacing NAPI with IPC for world state, AVM, and contracts DB; #23196, the AVM cutover that deletes the NAPI AVM and wires up the IPC simulator pool and CDB IPC server; #23238, kv-store over IPC with the aztec-kvdb binary and an LMDBStore NAPI scaffold), which removes this in-process worker entirely. This is a targeted stop-gap for the shipping NAPI path.
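If reviewers prefer the RLIMIT_STACK-based sizing mentioned in the notes, a minimal sketch might look like the following. It is loosely in the spirit of the libuv behaviour described under Root cause, but libuv's exact rules may differ. `stack_size_from_rlimit` and the 8 MB `FALLBACK_STACK_SIZE` used when the limit is unlimited are assumptions, not values from the PR. The clamp and rounding are there because pthread_attr_setstacksize rejects sizes below PTHREAD_STACK_MIN. Some platforms (macOS, at least historically) may also reject sizes that are not a multiple of the page size.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <pthread.h>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical fallback when RLIMIT_STACK is unlimited or unreadable (assumption: 8 MB).
constexpr std::size_t FALLBACK_STACK_SIZE = 8u * 1024 * 1024;

// Hypothetical helper: derive a worker stack size from the soft RLIMIT_STACK.
std::size_t stack_size_from_rlimit()
{
    std::size_t size = FALLBACK_STACK_SIZE;
    struct rlimit rl {};
    if (getrlimit(RLIMIT_STACK, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY) {
        size = static_cast<std::size_t>(rl.rlim_cur);
    }
    // pthread_attr_setstacksize rejects sizes below PTHREAD_STACK_MIN...
    size = std::max<std::size_t>(size, PTHREAD_STACK_MIN);
    // ...and some platforms also want a page-size multiple, so round up.
    const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return (size + page - 1) / page * page;
}
```

The trade-off matches the note above. A fixed 32 MB constant is simpler and behaves the same everywhere. Sizing from the rlimit tracks the user's environment but can also shrink the stack if someone lowers `ulimit -s`.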

Related: #21138 (introduced the threaded model), #21625 (use-after-free fix), #21629 (open alternative).

Created by claudebox · group: slackbot

AztecBot · 2026-05-22T14:22:34Z

Flakey Tests

🤖 says: This CI run detected 1 test that failed, but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/497c7a7dfbe368f3): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_mbps.parallel.test.ts "builds multiple blocks per slot with L2 to L1 messages" (316s) (code: 0) group:e2e-p2p-epoch-flakes

fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS

414dcc4

AztecBot added the ci-barretenberg-full (Run all barretenberg checks.) and claudebox (Owned by claudebox. It can push to this PR.) labels May 21, 2026

Thunkar marked this pull request as ready for review May 22, 2026 09:12

Thunkar requested a review from ludamad May 22, 2026 09:12

ludamad approved these changes May 22, 2026


ludamad added this pull request to the merge queue May 22, 2026

Merged via the queue into next with commit 24540c5 May 22, 2026
55 of 60 checks passed

ludamad deleted the cb/4bd36dc505c2 branch May 22, 2026 14:37

ludamad merged 1 commit into next from cb/4bd36dc505c2
