feat: WebAssembly binding (full stack) #531
Open
gergelyvagujhelyi wants to merge 196 commits into
Adds a new workspace member nobodywho-wasm that mirrors the Python binding's async API surface (Model, Chat, TokenStream, Encoder) via wasm-bindgen.
Status: scaffold only. Compiles cleanly on native (rlib) so cargo check at the workspace root keeps working, but wasm-pack build --target web needs two prerequisites that aren't done yet:
1. The marek-hradil/llama-cpp-rs fork needs a wasm32 build path (Emscripten cmake). We'll carry our own fork as a patch carrier.
2. nobodywho/core needs cfg(target_arch = "wasm32") gating for std::thread, ureq downloads, and tokio rt-multi-thread, plus a get_model_from_bytes API for browser use.
Both blockers are tracked in nobodywho/wasm/README.md and as TODO comments in the binding source. The binding's shape is independent of both.
Step 2a of the WASM binding rollout. Splits a few core dependencies so that `cargo check --target wasm32-unknown-unknown -p nobodywho` no longer fails immediately on dep resolution. Native builds are unchanged.
- tokio: `rt-multi-thread` only on native; wasm gets plain `rt` (no OS threads on wasm32-unknown-unknown).
- ureq: native only (raw TCP/TLS not available in a browser sandbox).
- indicatif: native only (no terminal in a browser).
- dirs: native only AND non-android (Android already reads /proc/self/cmdline directly; wasm has no filesystem).
- `default_progress_callback` in llm.rs and the `indicatif::*` import: gated to non-wasm. Native callers unchanged.
Still blocking wasm-pack build (tracked in wasm/README.md):
- Step 1: llama-cpp-2 fork needs wasm32 build path
- Step 2b: Worker refactor (std::thread::spawn -> spawn_local)
- Step 2c: Model::load_bytes constructor
cargo check --workspace passes; only pre-existing deprecation warnings in uniffi/godot/flutter remain.
The wasm branch (https://github.com/nobodywho-ooo/llama-cpp-rs/tree/wasm) is forked from marek-hradil/main at 8550f04e, so it inherits the llguidance / EOS-fix / lark / with_logits_ith_mut patches that core depends on today. It adds one commit (a35e66a) that scaffolds wasm32 detection in the build script, which is a no-op on native targets.
Why this swap matters:
- Native builds are functionally unchanged (the new branch's only added commit is wasm32-gated, so native targets never see it).
- The wasm32 build path now lives in a repo nobodywho controls. Step 1 of the WASM rollout (Emscripten cmake support, feature gating, load_from_buffer wrapper) lands on this branch.
cargo check --workspace passes; only pre-existing deprecation warnings in uniffi/godot/flutter remain. When marek-hradil/main moves forward with more patches, we'll rebase the wasm branch on top.
cargo check --target wasm32-unknown-unknown -p nobodywho-wasm now succeeds (with dead-code warnings only). Linking will still fail because the llama-cpp-rs wasm branch doesn't produce a wasm artifact yet, but every Rust crate in the workspace now type-checks under the wasm32 target.
Changes:
* grammar/gbnf/Cargo.toml: disable jsonschema default features. The only use is jsonschema::meta::is_valid (a pure in-memory metaschema check); defaults pull in resolve-http/resolve-file -> reqwest -> hyper -> tokio with the 'net' feature -> mio, which doesn't build on wasm32. Drops ~12 transitive crates from the dependency graph.
* core/Cargo.toml: move monty (Python interpreter) and bashkit (virtual bash) into the not(target_arch = wasm32) deps block. Both have native-only requirements: monty has no wasm32 target; bashkit forces tokio/full which pulls mio + raw sockets.
* core/src/tool_calling/mod.rs: cfg-gate Tool::python and Tool::bash (along with their monty/bashkit imports) to non-wasm. Wasm consumers can still build custom tools via Tool::new.
* core/src/llm.rs: cfg-gate the model-loader infrastructure (get_model, get_model_async, parse_model_path, resolve_fancy_path_to_fs, the cache dir helpers, download_file, download_model_from_hf, download_model_from_url, ParsedModelPath enum) to non-wasm32. A future Model::load_from_bytes API will handle in-memory loading for wasm (see nobodywho/wasm/README.md Step 2c).
* wasm/src/lib.rs: fix init() to use tracing_wasm::try_set_as_global_default instead of constructing a Layer and passing it to set_global_default (which needs a Subscriber).
Update Cargo.lock to point at the rebased nobodywho-ooo/llama-cpp-rs wasm
branch (6c35d9a). That branch is now marek-hradil/main + Asbjorn's three
cherry-picked Emscripten commits + a fix for two non-exhaustive matches he
missed + WASM.md doc.
Update wasm/README.md to reflect the new world:
- Target is now wasm32-unknown-emscripten (not -unknown-unknown).
- Document the emsdk install path so contributors can actually link.
- Note that cargo check now panics with a clear sysroot message when emcc
isn't installed, which is the correct behavior.
- Drop the obsolete Step 1 block ("llama-cpp-2 needs a wasm32 build path");
that's now done.
- Expand Step 2a recap with the additional cfg-gates that landed in
0e2f913 (gbnf, monty/bashkit, model-loader infra).
- core/src/llm.rs: add get_model_from_bytes(bytes, gpu_layers) which calls the new LlamaModel::load_from_buffer in the llama-cpp-2 fork (commit 606c4759 on the wasm branch). Bypasses every filesystem and HTTP code path: pure 'bytes in, Model out'. Gated to non-windows-MSVC to match the upstream wrapper.
- wasm/src/lib.rs: Model.loadBytes(uint8Array) now delegates to get_model_from_bytes with gpu_layers=0 (wasm32 has no GPU). The placeholder JsError is gone; this is a real path now.
- Cargo.lock: pull llama-cpp-2 fork at 606c4759 (load_from_buffer).
cargo check --workspace passes on native. cargo check --target wasm32-unknown-emscripten -p nobodywho-wasm goes through the full toolchain and halts at the expected 'Could not detect Emscripten sysroot' message when emcc isn't installed.
…+ cfg
Step 2b: makes the background worker pattern (ChatHandleAsync, EncoderAsync, CrossEncoderAsync, plus the sync ChatHandle) compile and behave correctly on wasm32-unknown-emscripten while preserving native behavior bit-for-bit.
Changes:
* WorkerGuard: channel field changed from std::sync::mpsc::Sender to tokio::sync::mpsc::UnboundedSender. join_handle field cfg-gated to non-wasm. Two constructors: 3-arg on native, 2-arg on wasm.
* Worker loops: `std::sync::mpsc::channel()` -> `tokio::sync::mpsc::unbounded_channel()`. Native loops use `blocking_recv()` inside std::thread::spawn, which is semantically identical to the previous blocking recv. Wasm uses `wasm_bindgen_futures::spawn_local` + `recv().await`, a single-threaded cooperative pump on the JS event loop. Decode work blocks the event loop during each message; Web-Worker parallelism is a follow-up.
* core/Cargo.toml: add wasm-bindgen-futures to the wasm32 deps block.
* WorkerGuard Drop: native joins the thread (unchanged ordering wrt closing the sender so LLAMA_BACKEND stays alive during teardown). Wasm: dropping the sender causes recv().await to return None and the spawn_local future completes on its next poll, so there is no explicit join.
Files touched:
- core/Cargo.toml
- core/src/llm.rs (WorkerGuard struct, constructors, Drop)
- core/src/chat.rs (ChatHandle::new, ChatHandleAsync::new)
- core/src/encoder.rs (EncoderAsync::new)
- core/src/crossencoder.rs (CrossEncoderAsync::new)
cargo check --workspace (native) passes. cargo check --target wasm32-unknown-emscripten -p nobodywho-wasm passes through everything Rust and halts at the expected 'install emcc' Emscripten sysroot check.
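The native teardown described above (close the sender, let the worker loop drain and exit, then join) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it uses `std::sync::mpsc` in place of `tokio::sync::mpsc::UnboundedSender` so it runs without dependencies, the message type and the `spawn`/`send`/`shutdown` methods are hypothetical, and the close-then-join ordering is an assumption about what "unchanged ordering" means.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::JoinHandle;

/// Hypothetical stand-in for the PR's WorkerGuard: owns the sending half of
/// the worker channel plus the handle of the background thread.
struct WorkerGuard {
    sender: Option<Sender<String>>,
    join_handle: Option<JoinHandle<usize>>,
}

impl WorkerGuard {
    /// Spawns a worker that counts messages until the channel closes.
    fn spawn() -> Self {
        let (sender, receiver) = channel::<String>();
        let join_handle = std::thread::spawn(move || {
            let mut handled = 0;
            // recv() returns Err only once every Sender is dropped and the
            // queue is empty, which ends the loop.
            while let Ok(_msg) = receiver.recv() {
                handled += 1;
            }
            handled
        });
        WorkerGuard {
            sender: Some(sender),
            join_handle: Some(join_handle),
        }
    }

    fn send(&self, msg: &str) {
        if let Some(tx) = &self.sender {
            let _ = tx.send(msg.to_string());
        }
    }

    /// Closes the channel first, then joins the thread. Returns how many
    /// messages the worker handled, or None if already shut down.
    fn shutdown(&mut self) -> Option<usize> {
        drop(self.sender.take());
        self.join_handle.take().and_then(|h| h.join().ok())
    }
}

impl Drop for WorkerGuard {
    fn drop(&mut self) {
        // Same path as an explicit shutdown; a second call is a no-op.
        let _ = self.shutdown();
    }
}
```

Calling `WorkerGuard::spawn()`, sending two messages, and then calling `shutdown()` yields `Some(2)`; a later `shutdown()` (or the implicit one in `Drop`) returns `None`. On the wasm path the commit describes there is no join step at all: dropping the sender is enough for the `spawn_local` future to finish.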
cargo build --target wasm32-unknown-emscripten -p nobodywho-wasm produces a 113MB debug nobodywho_wasm.wasm artifact (~10-20MB after release strip). End-to-end: bindgen + cc + cmake + emscripten + wasm-bindgen + wasm-ld all in one pipeline.
Three changes here:
* wasm/src/lib.rs: rewrite the wasm-bindgen surface to return js_sys::Promise manually rather than `pub async fn`. The macro-generated futures captured non-UnwindSafe types (tokio::sync::Mutex, tokio mpsc receivers, TokenStreamAsync's interior) which made future_to_promise reject them at compile time. The new `promisify` helper wraps each future body in AssertUnwindSafe + catch_unwind, asserting the unwind-safety we get for free in a single-threaded JS environment and turning any panic into a rejected promise rather than tearing down the wasm instance.
* core/src/chat.rs: gate `ChatBuilder::build()` and the sync `ChatHandle` type/impl to non-wasm32. The sync variant uses `.completed()` which blocks; there's nothing to block on in a browser tab. Use `build_async` and await the returned future instead.
* Cargo.lock: pull the llama-cpp-rs fork at 85987657, which adds the -fPIC fixes (configure_emscripten_cc for the wrapper shims, and CMAKE_C_FLAGS=-fPIC + CMAKE_POSITION_INDEPENDENT_CODE=ON for llama.cpp's own cmake build). Without those, wasm-ld errors with 'relocation R_WASM_MEMORY_ADDR_LEB cannot be used against symbol …; recompile with -fPIC' for every static symbol in the C/C++ code.
Native cargo check --workspace still passes.
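The AssertUnwindSafe + catch_unwind idea behind `promisify` can be illustrated with a synchronous sketch. The real helper wraps futures and produces a js_sys::Promise; the version below is a simplified assumption that wraps a plain closure and maps a panic to `Err(String)`, which plays the role of the rejected promise. The name `run_guarded` is hypothetical.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Runs `body`, converting a panic into an `Err` carrying the panic message
/// instead of letting it unwind into the caller.
fn run_guarded<T>(body: impl FnOnce() -> T) -> Result<T, String> {
    // AssertUnwindSafe is the explicit promise that observing state after a
    // panic is acceptable here; the compiler cannot verify that for us.
    catch_unwind(AssertUnwindSafe(body)).map_err(|payload| {
        // Panic payloads are usually &str (literal message) or String
        // (formatted message); anything else gets a generic label.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

A closure that returns normally comes back as `Ok(value)`, while one that panics with a literal message comes back as `Err` holding that message. Note that `catch_unwind` only catches unwinding panics, so this pattern does nothing under `panic = "abort"`.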
…main
Three things, one commit:
- cargo fmt --all on the two sites the linting CI flagged
(core/src/llm.rs:185 get_model_from_bytes signature joined onto one
line; wasm/src/lib.rs:57 promisify .map() closure collapsed).
- cfg-gate the items that pr1's wasm-target build was warning about,
dropping warning count from 9 to 0:
* core/src/llm.rs — Read/Write, Duration, info_span imports.
* core/src/chat.rs — sync TokenStream + impl (only consumers are the
sync ChatHandle, already gated, and the Python binding which is
native-only). Doc-comment notes blocking_recv would deadlock the
JS event loop, so this is also a footgun guard.
* core/src/memory.rs — GgufModelInfo, read_gguf_model_info,
estimate_per_layer_bytes, plan_model_loading, plus the now-unused
std::path::Path import. plan_context / ContextPlan stay public:
they are called from chat-context setup which runs on wasm too.
- regenerate Cargo.nix and crate-hashes.json. The pre-existing files
pointed at marek-hradil/llama-cpp-rs and didn't know about the
nobodywho-wasm workspace member, so nix flake check failed.
Plus a small native error-path test for get_model_from_bytes in
core/src/llm.rs (gated off Windows MSVC, mirroring the function's own
gate). Tests the InvalidModel error path; the success path on Linux
hits an upstream llama.cpp issue with load_from_buffer returning NULL,
tracked separately.
Includes the rebase onto main (PR #527, watchOS Metal-skip): adopts
main's visionos/watchos exclusions on the Vulkan target cfg + the
updated comment, while keeping pr1's switch to the
nobodywho-ooo/llama-cpp-rs#wasm fork. The iOS Metal block stays
explicit (with wasm-fork URL) until the wasm fork rebases on top of
marek-hradil's main to pick up the upstream watchOS Metal-skip patch.
Cargo.lock incorporates the windows-sys 0.52 -> 0.61 transitive bump
from main.
Two nix bugs surfaced and worked around (worth knowing for downstream
PRs):
1. crate2nix's '-h crate-hashes.json' caches by name@version, not by
full URL+commit — URL changes silently reuse the old tree's hash.
2. pkgs.fetchgit defaults fetchSubmodules = true. nix-prefetch-git
defaults to --no-fetch-submodules. The llama-cpp-rs repo vendors
llama.cpp as a submodule, so the with-submodules tree hashes
differently. Manually putting the correct (with-submodules) hash
into crate-hashes.json before running crate2nix is the workaround.
Combined documentation pass spanning the Emscripten →
wasm32-unknown-unknown + wasi-sdk exploration:
- first README rewrite once the Emscripten build started working
- documented the wasm-bindgen + Emscripten incompatibility that
eventually drove the switch to wasm32-unknown-unknown
- rewrote the outstanding-work section after the Path B trial gave
a clearer picture of remaining tasks
- reflected Path B's C++ side completion with Rust mtmd gating in
core as the next step
Path B prep: introduce a `mtmd` feature on `nobodywho` (default on) that gates multimodal support. The wasm binding opts out via `default-features = false` so the underlying llama-cpp-2 doesn't enable its own `mtmd` feature for wasm builds. Native unchanged (feature is on by default, so every existing code path stays compiled). Wasm build no longer pulls in mtmd's miniaudio dependency (which uses pthread sched APIs wasi-libc lacks).
This commit gates the easy parts:
- core/Cargo.toml: mtmd feature definition, propagation to llama-cpp-2
- core/src/errors.rs: FailedReadingMediaEmbeddings variant
- core/src/llm.rs: MtmdInputChunks import + read_media_embeddings method
- core/src/template.rs: MtmdBitmap import + RenderedChat.bitmaps field
- wasm/Cargo.toml: default-features = false on nobodywho dep
Still pending for the full wasm build:
- core/src/chat.rs: ChatContext::bitmaps field + ~5 methods that use it
- core/src/tokenizer.rs: TokenizerChunk::Image/Audio variants (~30 sites)
The ChatContext refactor in particular is non-mechanical: bitmaps are the central state for multimodal chat, and gating them out cleanly means deciding whether to factor out a non-multimodal ChatContext or to live with cfg-attributes on every method. That's a design call I want to leave for review.
Reverts the premature `default-features = false` on the nobodywho dep in the wasm crate. The mtmd cargo feature is added on core, but until core/src/chat.rs and core/src/tokenizer.rs are fully gated for mtmd-less builds (the ChatContext bitmaps refactor + TokenizerChunk variant gating), opting out here breaks the Emscripten path that previously worked. Once those two files are gated, switch this back to default-features = false to enable wasm32-unknown-unknown builds. Also: pulled the fork's wasm branch to 2b592da, which tightens the mtmd gate in llama-cpp-2/src/lib.rs to allow Emscripten (was: any wasm32).
cargo build --target wasm32-unknown-unknown -p nobodywho-wasm +
wasm-bindgen --target web now produces a complete `pkg/` directory:
pkg/
├── nobodywho_wasm.d.ts 9.1K — TS typings for the public API
├── nobodywho_wasm.js 37K — JS loader / wasm-bindgen glue
├── nobodywho_wasm_bg.wasm 21M — debug; ~5-7M release-stripped
└── nobodywho_wasm_bg.wasm.d.ts
JS consumers can:
    import init, { Model, Chat } from './pkg/nobodywho_wasm.js';
    await init();
    const model = await Model.loadBytes(uint8Array);
    const chat = new Chat(model, { contextSize: 2048 });
    const stream = await chat.ask('Hello');
    // ... await stream.nextToken() in a loop
The whole pipeline that now runs:
bindgen + cc::Build (wasi-sdk clang)
+
cmake (llama.cpp -> wasm32-wasip1 + wasi-libc, with the fork's
source-level patches applied at build time)
v
wasm-bindgen post-processor
v
pkg/ ready for npm publish
What made it click on this iteration:
* core/src/chat.rs: gate the bitmap-construction block in ChatWorker::ask()
behind the mtmd feature, and the bitmap-extraction in
sync_context_with_render. ChatContext's bitmaps field + the four
methods that touch it (add_bitmaps, garbage_collect_bitmaps,
create_bitmap_id, remove_bitmaps) are also feature-gated.
* Cargo.lock: pulls fork branch at 360e169 — the design pivot that
unblocked everything. Earlier the fork gated mtmd off entirely on
wasm-unknown, which forced ~30 cfg sites in nobodywho/core/tokenizer.rs.
Cleaner approach: keep bindgen running for mtmd headers (FFI types
exist in bindings.rs) but skip compiling the C++. The Rust wrapper
module compiles normally; mtmd_* symbols become undefined imports in
the .wasm — the JS host can polyfill them, but in practice the wasm
binding doesn't expose multimodal so they're never called.
* wasm/README.md: rewritten to reflect the working pipeline. Emscripten
path documented as alternate/fallback (wasm-bindgen-cli doesn't accept
Emscripten output, so it's not useful for npm distribution).
Adds three smoke tests under wasm/examples/:
- smoke.html: in-browser test page (open via local HTTP server, requires
a bundler or manual env-stub for the unresolved imports).
- smoke.mjs: ESM test trying wasm-bindgen's web-target glue from Node.
Fails because wasm-bindgen-cli emits `import * as ... from 'env'`
for non-__wbindgen import groups, and 'env' isn't a real module.
Documented as a known wasm-bindgen-cli limitation; npm distribution
needs --target bundler + a bundler that aliases 'env'.
- smoke-manual.mjs: bypasses wasm-bindgen's auto-init and instantiates
the wasm directly with Proxy-based stub imports. THIS WORKS.
Output of smoke-manual.mjs after the -fexceptions fix:
Wasm size: 20.6 MB
Compiling…
Imports: 89
./nobodywho_wasm_bg.js: 53 entries
env: 22 entries
wasi_snapshot_preview1: 14 entries
Instantiating…
Exports: 74
Class-like: chat_ask, chat_new, chat_reset, chat_resetHistory,
encoder_encode, encoder_new, init, model_loadBytes,
tokenstream_completed, tokenstream_nextToken
Proves the wasm is well-formed: compiles under V8 (so passes the
wasm-validation that previously failed on exception-model mismatch),
instantiates with stubs, and exposes all 10 expected class methods.
Going further (actually calling chat_ask) requires real imports for
the wasi_snapshot_preview1 syscalls (fd_write etc.) and for the
__wbg_* glue from nobodywho_wasm_bg.js — which is what an npm package
would ship via --target bundler + a WASI polyfill like
@bjorn3/browser_wasi_shim.
Also: pkg-node/ added to .gitignore (build output).
✓ wasi.initialize ran _initialize
✓ wasm wired up
✓ init() ok
Loading model from /tmp/bge-small.gguf… (35 MB GGUF)
✓ model loaded
✓ encoder created
✓ embedding generated: 384 dimensions
first 8: [-0.6244, -0.5940, 0.5545, -0.6085, -0.1348, 0.1800, 0.6621, 0.3490]
This is actual llama.cpp embedding inference running inside V8 from a
real GGUF model loaded as Uint8Array, with the full pipeline end-to-end:
Uint8Array → Model.loadBytes → fmemopen → llama_model_load_from_file_ptr
→ wasi-libc syscalls (via node:wasi) → llama.cpp eval → Float32Array
Three fixes to get here:
1. wasm/src/lib.rs — override __cxa_atexit to a no-op.
rust-lld 22.1's wasm driver doesn't understand --mexec-model=reactor
(the flag is in upstream lld but not in rust-lld's option table). So
the wasm stays in 'command' exec model, where every export is wrapped
with __wasm_call_ctors + __wasm_call_dtors. The dtor walk iterates
atexit-registered handlers and trips on a signature mismatch:
RuntimeError: function signature mismatch
at __funcs_on_exit
at __wasm_call_dtors
at <any export>.command_export
Suppressing the atexit registrations entirely makes the dtor walk a
no-op. Global destructors don't run at module shutdown, which is
harmless for a wasm instance that lives for the lifetime of the
process.
2. core/src/llm.rs — hardcode n_threads=1 on wasm32.
std::thread::available_parallelism() returns Err on
wasm32-unknown-unknown ('the number of hardware threads is not known
for the target platform'). The Worker init unwrapped the error,
failing immediately after model load. cfg the line so wasm32 uses 1.
3. core/src/tokenizer.rs — inline the mtmd_default_marker literal on wasm.
tokenize_text calls llama_cpp_2::mtmd::mtmd_default_marker() to learn
the marker string to split text on. On wasm we don't compile mtmd's
C++, so that resolves to an unresolved env import. Inline the same
'<__media__>' literal llama.cpp returns — wasm consumers never have
media in the text anyway, so the split produces one chunk covering
the whole input.
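Fix 2 above (pinning the thread count on wasm32 instead of unwrapping `available_parallelism()`) can be sketched as two cfg-gated definitions of one function. The name `pick_n_threads` and the fallback to 1 on the native side are assumptions for illustration; the commit only says the wasm32 line uses 1.

```rust
/// On wasm32 the hardware thread count is not known to std, so use a
/// fixed single thread instead of asking for it.
#[cfg(target_arch = "wasm32")]
fn pick_n_threads() -> usize {
    1
}

/// On native targets ask the OS, but fall back to 1 rather than unwrapping,
/// so an Err from available_parallelism() cannot abort worker init.
#[cfg(not(target_arch = "wasm32"))]
fn pick_n_threads() -> usize {
    std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```

Either variant always returns at least 1, so callers can pass the result straight into a thread-count setting without further checks.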
Also adds wasm/examples/run.mjs — Node runner that:
- Reads the bundler-target .wasm bytes and bg.js glue
- Wires up node:wasi for wasi_snapshot_preview1 imports
- Provides Proxy-based stubs for the remaining env imports (mtmd_*, etc.)
- Calls wasi.initialize, __wbg_set_wasm, __wbindgen_start
- Loads a GGUF and runs Encoder.encode (--encode flag) or Chat.ask
Usage:
node wasm/examples/run.mjs # smoke
node wasm/examples/run.mjs --encode ./model.gguf 'text' # embedding
node wasm/examples/run.mjs ./model.gguf 'prompt' # chat
Chat path with a chat-style GGUF should also work once we have such a
model handy (the bge-small only supports the Encoder API).
- wasm/README.md: rewritten to reflect the working state. Documents the verified embedding-inference result (with first 8 dims of an actual BGE-small embedding), the full build pipeline, the runtime workarounds (and why each), and outstanding follow-ups.
- wasm/examples/browser.html: single-file browser demo. Loads @bjorn3/browser_wasi_shim from esm.sh, manually instantiates the wasm with WASI + bg.js glue + env stubs, lets the user upload a GGUF via <input type=file>, runs Encoder.encode, and prints the embedding. Mirrors the working run.mjs but in a browser context. Usage: cd nobodywho/wasm && python3 -m http.server 8000, then open http://localhost:8000/examples/browser.html.
- wasm/package.json.tpl: template for the npm package.json that would ship pkg-bundler/. Includes the right `files`, `main`, `types`, and `@bjorn3/browser_wasi_shim` as a peer dep. Becomes pkg-bundler/package.json in the publish step (separate commit will add the script).
Bumps llama-cpp-2 to 10d12300 (wasm fork merged with marek-hradil/main — brings in PRs #7/8/9 watchOS Metal-skip). Drops the explicit iOS Metal block now that the wasm fork has marek's auto-Metal-detect logic. Hash: 18pwz1r43dj6918dajlg61ak9zlhwazsblqj6hv9aj0qaks7rz4n (nix-prefetch-git --fetch-submodules).
Chat.ask path on wasm was failing at `<Prompt as Display>::fmt` because
that function called `llama_cpp_2::mtmd::mtmd_default_marker()` to get
the media-marker string for joining text + media parts — the same
unresolved env import that broke tokenize_text earlier. Two other call
sites (in ProjectionModel::from_path and ProjectionModel::tokenize) had
the same problem but are in unreachable mtmd-only code paths.
Replace all four call sites with a new `mtmd_marker_string()` helper
that's cfg-gated on `target_arch = "wasm32"`:
- native: calls `llama_cpp_2::mtmd::mtmd_default_marker` (unchanged).
- any wasm32 target (unknown-unknown, emscripten, wasip1, ...): returns
the literal "<__media__>". Necessary for wasm32-unknown-unknown which
can't resolve the mtmd C++ symbol; harmless for the other wasm32 OSes
because the literal is the same string llama.cpp's mtmd_default_marker
returns.
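A rough sketch of the `mtmd_marker_string()` helper described above. The native branch in the PR calls `llama_cpp_2::mtmd::mtmd_default_marker`; to keep this example dependency-free, that call is replaced by a hypothetical `native_default_marker()` stand-in that returns the same literal. `split_on_marker` is likewise only an illustration of why a media-free prompt ends up as a single chunk.

```rust
/// The literal the commit says llama.cpp's mtmd_default_marker returns.
const MEDIA_MARKER_LITERAL: &str = "<__media__>";

/// Any wasm32 target: never touch the (possibly unresolved) C++ symbol.
#[cfg(target_arch = "wasm32")]
fn mtmd_marker_string() -> String {
    MEDIA_MARKER_LITERAL.to_string()
}

/// Native: delegate to the library. `native_default_marker` is a stand-in
/// for the real FFI-backed call, used here so the sketch compiles alone.
#[cfg(not(target_arch = "wasm32"))]
fn mtmd_marker_string() -> String {
    native_default_marker().to_string()
}

#[cfg(not(target_arch = "wasm32"))]
fn native_default_marker() -> &'static str {
    MEDIA_MARKER_LITERAL
}

/// Splits prompt text on the media marker, as a tokenizer front end might.
fn split_on_marker(text: &str) -> Vec<String> {
    text.split(mtmd_marker_string().as_str())
        .map(str::to_string)
        .collect()
}
```

With no marker in the input, `split_on_marker("Hello")` yields one chunk covering the whole text; an input containing one marker yields two chunks.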
Real chat output now works end-to-end:
$ node wasm/examples/run.mjs /tmp/Qwen2.5-0.5B-Instruct-Q4_K_M.gguf 'Hello'
✓ wasi.initialize ran _initialize
✓ wasm wired up
✓ model loaded
✓ chat created
Asking: "Hello"
Response: Hello! How can I assist you today? If you have any questions
or need help with something, feel free to ask.
✓ produced 25 tokens
Done.
That's a real Qwen 2.5 0.5B Instruct response from a 379 MB GGUF loaded
as Uint8Array, running entirely inside V8.
…ction
The existing `//` block above `__cxa_atexit` explained the rust-lld 22.1 / command-exec-model / dtor-walk causal chain well, but it lived as a code comment rather than rustdoc. Two consequences:
1. `cargo clippy --no-deps --target wasm32-unknown-unknown -p nobodywho-wasm -- -D warnings` failed on the function because `clippy::missing_safety_doc` requires a `# Safety` section in rustdoc on `pub unsafe fn`.
2. The rationale didn't surface in generated docs or IDE hover.
Convert the block to `///` rustdoc verbatim (no rewording of the existing prose) and add a `# Safety` section explaining why this implementation is trivially safe to call: it ignores all three arguments and returns success, so there's no UB path regardless of what handlers libc++ tries to register. The cost (silently dropping every atexit registration) is acceptable for the wasm-instance-lifetime reason the body already documents.
Native target unaffected: the function is cfg-gated to wasm32. The PR-stated check (`cargo clippy --no-deps -- -D warnings` without `--target`) still passes either way; this fixes the wasm-target clippy gate so a future CI addition won't fail.
…delUrl
Replace modelBytes with modelPath in all Node examples and smoke tests.
Replace Model.loadBytes with Model.load({modelPath}) for Encoder and
CrossEncoder demos. Update browser HTML examples to use
Model.load({modelUrl}) instead of fetchModelBytes + Model.loadBytes.
Fix compilation error: map String errors to JsError in Model.load,
remove stale model_bytes/mmproj_bytes from ChatCreateParsed constructor.
All 14 smoke tests pass.
Update all code examples, status table, smoke test table, and
explanatory sections to reflect:
- Model.load({modelUrl | modelPath}) replacing Model.loadBytes
- Chat.create({modelUrl | modelPath}) replacing modelBytes
- NODEFS for Node disk access
- mmap syscall overrides (CPU_Mapped)
- Cache API tee'd streaming for modelUrl
- Removed Model caching helpers section (now internal)
- Removed >2 GiB readFileSync limitation (no longer applies)
Take main's Docusaurus button layout, add JS (wasm) link.
Replace plain-object sampler specs with a typed SamplerConfig
wasm-bindgen class. Sampler must now come from SamplerBuilder or
SamplerPresets — no more raw { temperature: 0.7 } objects.
- SamplerConfig wraps core's SamplerConfig, has toJSON/fromJSON
- SamplerBuilder terminal methods return SamplerConfig (not Object)
- SamplerPresets methods return SamplerConfig (not Object)
- Delete SamplerSpec, ConstraintSpec, build_sampler
- ChatOptions.sampler deserializes core SamplerConfig directly
- Main thread serializes SamplerConfig before postMessage to worker
- getSamplerConfig/setSamplerConfig use SamplerConfig class
Smoke tests need updating to use the typed API.
- constraint-smoke: use SamplerPresets.constrainWithRegex/JsonSchema
- sampler-smoke: use SamplerPresets.greedy(), SamplerBuilder chain
- sampler-ergo-smoke: assert.ok on opaque SamplerConfig instances
- sampler-extra-smoke: assert.ok on opaque instances, replace
deprecated SamplerPresets.json() with constrainWithJsonSchema({})
- setters-smoke: setSamplerConfig takes SamplerConfig instance
- build-pkg-emscripten.sh: add SamplerConfig.__wrap sed patch
- lib.rs: fix JsCast issue — use toJSON() for sampler serialization
across postMessage boundary
13/14 smoke tests pass. sampler-extra [6] needs a wasm rebuild to
verify the constrainWithJsonSchema({}) replacement.
Split into two jobs:
- `lint`: always runs; cargo test -p nobodywho-js (fast, no wasm)
- `build-and-test`: only on full_ci; wasm build + model smoke tests
Pass full_ci from main.yml (same pattern as swift_ci, python_ci, etc). Update to actions/checkout@v5 and actions/cache@v5.
- get_model_from_bytes + its test in core/llm.rs (nothing calls it after removing Model.loadBytes)
- ENOENT constant in syscall_imports.rs (was used by deleted newfstatat)
- lstat parameter on stat_into_buf (always false after removing lstat64)
- Cargo.toml: remove unused web-sys features (Request, RequestInit, Headers), update stale comment
- build.rs: remove redundant -lnodefs.js (already in post-link script)
- lib.rs: remove dead on_progress field from ChatCreateParsed and onDownloadProgress key filter (no caller reads the field)
- README: remove get_model_from_bytes reference
- core/llm.rs: update doc comment (modelBytes → modelUrl)
- tests/lint.rs: update comments to remove ConstraintSpec/modelBytes references, simplify test descriptions
Replace the single-threaded wasm worker wrapper (separate wasm instance per Web Worker) with Emscripten pthreads (SharedArrayBuffer, real threads via pthread_create). Inference now runs on a background pthread using the same std::thread::spawn code path as native.
Build changes:
- .cargo/config.toml: +atomics,+bulk-memory,+mutable-globals target features
- build.rs: -pthread and -sDEFAULT_PTHREAD_STACK_SIZE=2MB linker flags
- build-pkg-emscripten.sh: cargo +nightly -Zbuild-std=std,panic_abort, -pthread in post-link, sed patches for deferred _initialize and wasm-bindgen binding in pthread workers
Core unification (-190 lines):
- Remove spawn_local wasm path from chat/encoder/crossencoder
- Remove wasm-only WorkerGuard variant (unified with JoinHandle)
- Use available_parallelism() instead of hardcoded n_threads=1
JS binding rewrite (-1770 lines):
- Delete worker dispatcher (runInWorker, message protocol, ChatState, WorkerStreamState, worker-backed Chat/TokenStream)
- New Chat wraps ChatHandleAsync directly
- New TokenStream wraps TokenStreamAsync
- Delete __nbw_spawn_worker, __nbw_wrap_node_worker from pre.js
All 8 smoke tests pass. No API changes: Chat.create, chat.ask, for-await streaming, tool calling, getters/setters all unchanged.
Create a persistent ggml threadpool during Worker init and attach it to the llama context. This avoids ggml's disposable threadpool pattern (pthread_create mid-compute) which deadlocks on Emscripten because pthread_create is async. With the persistent pool, ggml worker threads are pre-created during init when the event loop is free, and reused across graph computes. Benchmark (Qwen3-0.6B-Q4_K_M, Node.js, Apple Silicon 10 cores): - Single-threaded (before): 18.5 tok/s, 1 active core - Multi-threaded (now): 57.0 tok/s, 22 active cores (3.1x)
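The difference between a persistent pool and spawning threads mid-compute can be shown with a generic, std-only sketch. This is not ggml's threadpool API: `PersistentPool`, `compute_sum`, and the job type are hypothetical names chosen for illustration. The point it demonstrates is that all threads are created once in `new()` and later compute calls only send jobs to threads that already exist.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::JoinHandle;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical pool whose threads are all created up front in `new`.
struct PersistentPool {
    job_tx: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl PersistentPool {
    /// Pre-creates `n_threads` workers (at least one) that wait for jobs.
    fn new(n_threads: usize) -> Self {
        let (job_tx, job_rx) = channel::<Job>();
        let job_rx: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(job_rx));
        let workers = (0..n_threads.max(1))
            .map(|_| {
                let rx = Arc::clone(&job_rx);
                std::thread::spawn(move || loop {
                    // The lock is released at the end of this statement,
                    // before the job itself runs.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: pool is shutting down
                    }
                })
            })
            .collect();
        PersistentPool { job_tx: Some(job_tx), workers }
    }

    fn worker_count(&self) -> usize {
        self.workers.len()
    }

    /// One "graph compute": sums each chunk on the existing workers and
    /// adds up the partial results. No thread is created here.
    fn compute_sum(&self, chunks: Vec<Vec<u64>>) -> u64 {
        let (res_tx, res_rx) = channel::<u64>();
        let n_jobs = chunks.len();
        let job_tx = self.job_tx.as_ref().expect("pool already shut down");
        for chunk in chunks {
            let res_tx = res_tx.clone();
            let job: Job = Box::new(move || {
                let _ = res_tx.send(chunk.iter().sum::<u64>());
            });
            job_tx.send(job).expect("workers exited early");
        }
        res_rx.iter().take(n_jobs).sum()
    }
}

impl Drop for PersistentPool {
    fn drop(&mut self) {
        // Closing the job channel lets every worker loop exit, then join.
        drop(self.job_tx.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

A pool built with `PersistentPool::new(4)` can serve any number of `compute_sum` calls with the same four threads, which mirrors the commit's rationale: thread creation happens once during init, when nothing is blocked waiting on it.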
Match the pre-created pthread pool to the actual CPU core count instead of a hardcoded 16. Avoids wasting memory on machines with fewer cores and under-provisioning on machines with more.
Remove js_to_serializable_parts, strip_keys, ChatOptions struct — all were only used by the old Web Worker message protocol. Fix stale comments referencing postMessage, worker dispatcher, RefCell<ChatState>.
ChatOptions struct was removed — Chat.create now parses options via js_sys::Reflect instead of serde deserialization. The deny_unknown_fields lint test is no longer applicable.
The pthreads build requires `cargo +nightly -Zbuild-std=std,panic_abort` to recompile std with atomics. Add nightly toolchain and rust-src component to the js_ci build-and-test job.
The action applies `components` to all toolchains in the list, so `rust-src --toolchain nightly` was mis-parsed as three components. Use plain `components: rust-src` — added to both stable and nightly.
CI's setup-rust-toolchain sets RUSTFLAGS=-D warnings, which overrides the [target.wasm32-unknown-emscripten] rustflags in .cargo/config.toml entirely — silently dropping the +atomics,+bulk-memory,+mutable-globals target features and breaking the shared-memory link (wasm-ld: error: --shared-memory is disallowed ... not compiled with 'atomics'). Set RUSTFLAGS directly in the build script so the build is self-contained and doesn't depend on config.toml being the active rustflags source. Verified by reproducing locally with RUSTFLAGS=-D warnings set.
/full-ci
Description
JavaScript binding for NobodyWho: runs local LLMs in a browser tab (or any wasm host) via llama.cpp compiled to wasm32 with Emscripten. The same core engine that powers the Python, Godot, Flutter, and Uniffi bindings, exposed to JS through wasm-bindgen.
This PR consolidates the entire stack that was previously staged across #526, #529, #530; those have been closed in favour of this single PR per maintainer request. All commits preserved.
What's delivered
Core surface
API status
- `Model.load({modelUrl | modelPath, mmprojUrl | mmprojPath})`
- `Encoder.encode(text)` → `Float32Array`
- `CrossEncoder.rank(query, docs)` / `rankAndSort(...)`
- `Chat.create({modelUrl | modelPath, ...})` → `Chat`
- `Chat.ask(prompt)` → `TokenStream`
- `for await (const tok of stream)` (async iteration)
- `TokenStream.next()` / `.completed()`
- `Chat.terminate()` → `Promise<void>`
- `SamplerConfig` / `SamplerBuilder` / `SamplerPresets`
- `constrainWithJsonSchema()` / `constrainWithRegex()` / `constrainWithGrammar()`
- `Tool.fromFn(...)` (sync + async callbacks)
- `Image.fromBytes` / `Audio.fromBytes`
- `Chat.create({modelUrl})`: streaming download + Cache API
- `Chat.create({modelPath})`: NODEFS disk access (Node-only)
- `CPU_Mapped`
Threading model: Emscripten pthreads
The wasm build uses Emscripten pthreads (`-pthread`, `+atomics,+bulk-memory,+mutable-globals` Rust target features). This enables `std::thread::spawn` on wasm, the same code path as native. `ChatHandleAsync::new()` spawns a real pthread for inference, and llama.cpp's ggml threadpool uses `available_parallelism()` to pick `n_threads` (maps to `navigator.hardwareConcurrency` in the browser, `os.cpus().length` in Node).
Browser requirement: the serving origin must set the `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` headers for `SharedArrayBuffer`.
Build requirement: Rust nightly with `-Zbuild-std=std,panic_abort` (the pre-compiled std for `wasm32-unknown-emscripten` lacks atomics).
The `Chat` class directly wraps the core `ChatHandleAsync`: no Web Worker wrapper, no message protocol, no RPC bridge. Token streaming works through tokio channels across pthreads, same as native.
Build target: `wasm32-unknown-emscripten`
Build pipeline:
Three temporary forks (all will go away when upstream merges):
- `nobodywho-ooo/wasm-bindgen`: descriptor-interpreter fixes + Emscripten output mode + pthreads compatibility (skip thread/multivalue transforms, synthesize stack pointer shim from emscripten exports)
- `nobodywho-ooo/llama-cpp-rs`: `CMAKE_SYSTEM_PROCESSOR=wasm32`, `-matomics -mbulk-memory` for shared-memory link compat, `MA_NO_*` defines for miniaudio, `-fexceptions` for mtmd
- `walkingeyerobot/emscripten`: the `-sWASM_BINDGEN` flag (PR "Add wasm-bindgen support", emscripten-core/emscripten#23493)
Multimodal (Path A: MEMFS-virtualized)
Vision and audio input work end-to-end through bytes. `Image.fromBytes(uint8)` / `Audio.fromBytes(uint8)` write to content-hashed MEMFS paths; llama.cpp reads them via strong syscall overrides in `js/src/syscall_imports.rs`.
Verified: Qwen3.5-0.8B vision, Qwen2-VL, Gemma 3 vision, Qwen3-ASR audio (WAV/MP3/FLAC).
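To illustrate what "content-hashed paths" means here, the following is a minimal sketch and not the code from this PR. It assumes a hypothetical helper `content_hashed_path`, a made-up `/media/` directory, and the standard library's 64-bit `DefaultHasher`; the real binding may use a different hash, directory, and naming scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical helper: derive a virtual file path from the media bytes.
/// The `/media/` directory and the 64-bit std hasher are illustrative
/// choices only, not what the binding necessarily does.
fn content_hashed_path(bytes: &[u8], extension: &str) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    // 16 zero-padded hex digits keep every path the same length.
    format!("/media/{:016x}.{}", hasher.finish(), extension)
}

fn main() {
    let image = [0x89u8, b'P', b'N', b'G'];
    let first = content_hashed_path(&image, "png");
    let second = content_hashed_path(&image, "png");
    // Identical bytes map to the same path, so writing them twice
    // targets a single virtual file rather than two.
    assert_eq!(first, second);
    println!("{first}");
}
```

Because the name depends only on the bytes, the caller never has to choose or track file names, and the path handed to llama.cpp is stable for a given input.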
Tool calling
`Tool.fromFn(name, description, jsonSchema, callback)`: the callback runs directly in the inference context (no RPC bridge needed, since pthreads share the same wasm instance). Both sync and async (Promise-returning) callbacks work.
Structured output
Constraints via `SamplerPresets.constrainWithJsonSchema()` / `constrainWithRegex()` / `constrainWithGrammar()`. llguidance works on Emscripten (its `clock_gettime` requirement is satisfied by Emscripten's libc).
Core changes that affect other bindings
- `tokio` features split (`rt-multi-thread` native vs `rt` wasm); native-only deps gated behind `cfg(not(target_family = "wasm"))`.
- `Worker` channel: `std::sync::mpsc` → `tokio::sync::mpsc::unbounded_channel`. Public API unchanged.
- `std::thread::spawn` used on all targets (including wasm, via Emscripten pthreads). No more `spawn_local` wasm path.
- `WorkerGuard` unified: single struct with `JoinHandle` on all targets.
- `n_threads` uses `available_parallelism()` on wasm (was hardcoded to 1).
- `mtmd` cargo feature (default-on), `get_model_from_path`, `get_model_from_bytes`, `Tool::new_async`, `mtmd_marker_string()`.
Native consumers (Python, Godot, Flutter, uniffi) see no API changes. `cargo check --workspace` passes cleanly.
v1 limitations
Tested
- `cargo check --workspace`: clean on native.
- `cargo test -p nobodywho-js`: lint tests pass.
- `bash js/scripts/build-pkg-emscripten.sh`: ~60s on M-series macOS.
Type of change