fix: pin AVM NAPI worker thread stack size to prevent SIGBUS on macOS #23469

Merged
ludamad merged 1 commit into next from cb/4bd36dc505c2
May 22, 2026

@AztecBot

Collaborator

Summary

aztec start --local-network reliably SIGBUSes a few blocks into a run on macOS arm64 (since v5.0.0-nightly.20260520, i.e. after #21625 shipped the shared_ptr use-after-free fix). This is a different fault from the one #21625 fixed: a stack-guard violation (stack overflow) on a nodejs_module.node worker thread running AVM-simulation code, not a use-after-free.

This pins an explicit, generous stack size on the ThreadedAsyncOperation worker thread.

Root cause

ThreadedAsyncOperation::Queue() (introduced in #21138) runs the AVM simulation (_fn) directly on a bare std::thread(...).detach(). A std::thread uses the OS default stack for non-main threads, which is 512 KB on macOS versus 8 MB on Linux. The AVM-simulation call chain is deep enough to overflow 512 KB, so on macOS arm64 the worker writes into its stack-guard page and the process aborts with:

EXC_BAD_ACCESS / SIGBUS, KERN_PROTECTION_FAILURE
"Could not determine thread index for stack guard region"
  #0 _platform_memmove
  #1.. nodejs_module.node  bb::nodejs (AVM simulation path)

Linux is unaffected because its 8 MB default is comfortably large. The previous AsyncOperation path never hit this either: it ran on the libuv threadpool, whose threads are sized from RLIMIT_STACK (8 MB soft on macOS), not the 512 KB raw-thread default.
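
The platform difference described above can be observed directly by asking pthreads what stack size a thread created with default attributes would receive. The following is a minimal sketch, assuming (as is the case for common standard library implementations) that std::thread creates its thread with default pthread attributes; default_thread_stack_size is a hypothetical helper name, not something from this PR.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical helper: returns the stack size (in bytes) that a pthread
// created with default attributes would get, or 0 if the query fails.
std::size_t default_thread_stack_size() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
}
```

Printing the returned value on each platform should make the gap visible; the exact number depends on the OS and, on Linux, typically on the process's stack rlimit.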

Fix

std::thread can't set a stack size, so launch the worker via pthreads with pthread_attr_setstacksize pinned to a generous WORKER_STACK_SIZE (32 MB — 4× the 8 MB that the libuv path proved sufficient, with headroom for deeper future call chains). Falls back to a default-stack std::thread only if pthreads is unavailable (_WIN32) or pthread_create fails.

The shared_ptr lifetime model from #21625 is preserved exactly — both the worker lambda and the BlockingCall completion callback still capture self, so this does not reintroduce the use-after-free. Only the thread-launch mechanism changed.

Testing

  • The full bb build is too heavy to run in this session, so this is not yet a local end-to-end repro/fix verification — it relies on CI for compilation and on a macOS arm64 aztec start --local-network run to confirm the crash is gone.
  • The pthread/std::function trampoline was compiled and run standalone under -std=c++20 -Wall -Wextra -Werror: the worker thread receives a 32 MB stack (pthread_get_stacksize_np reports 33554432), and the work runs and completes.
  • Requested: verify against tonight's nightly on macOS arm64 (M3) — the reporter's exact repro.
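
A standalone check along the lines of the second bullet might look like the sketch below. It is an assumption-laden reconstruction, not the test that was actually run: current_thread_stack_size, report_stack_size and measured_stack_size are hypothetical names. On macOS it uses pthread_get_stacksize_np as mentioned above; on Linux it falls back to the GNU extension pthread_getattr_np, and on other platforms it reports 0.

```cpp
#include <pthread.h>
#include <cstddef>

// Hypothetical helper: stack size of the calling thread, or 0 if unknown.
std::size_t current_thread_stack_size() {
#if defined(__APPLE__)
    return pthread_get_stacksize_np(pthread_self());
#elif defined(__linux__)
    // pthread_getattr_np is a GNU extension (g++ defines _GNU_SOURCE).
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
#else
    return 0;
#endif
}

// Thread entry point: writes the observed stack size into *out.
extern "C" void* report_stack_size(void* out) {
    *static_cast<std::size_t*>(out) = current_thread_stack_size();
    return nullptr;
}

// Hypothetical check: creates a joinable thread with `requested` bytes of
// stack and returns the size that thread observed (0 on failure).
std::size_t measured_stack_size(std::size_t requested) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t seen = 0;
    pthread_t tid;
    if (pthread_attr_setstacksize(&attr, requested) == 0 &&
        pthread_create(&tid, &attr, report_stack_size, &seen) == 0) {
        pthread_join(tid, nullptr);
    }
    pthread_attr_destroy(&attr);
    return seen;
}
```

Calling measured_stack_size(32u * 1024u * 1024u) and comparing the result with the requested size gives a quick sanity check that the attribute took effect on the platform under test.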

Notes for reviewers

Related: #21138 (introduced the threaded model), #21625 (use-after-free fix), #21629 (open alternative).


Created by claudebox · group: slackbot

@AztecBot added the labels ci-barretenberg-full (Run all barretenberg checks.) and claudebox (Owned by claudebox. it can push to this PR.) May 21, 2026
@Thunkar marked this pull request as ready for review May 22, 2026 09:12
@Thunkar requested a review from ludamad May 22, 2026 09:12
@ludamad added this pull request to the merge queue May 22, 2026
@AztecBot

Collaborator Author

Flakey Tests

🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/497c7a7dfbe368f3): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_epochs/epochs_mbps.parallel.test.ts "builds multiple blocks per slot with L2 to L1 messages" (316s) (code: 0) group:e2e-p2p-epoch-flakes
