feat: WebAssembly binding (full stack) by gergelyvagujhelyi · Pull Request #531 · nobodywho-ooo/nobodywho

gergelyvagujhelyi · 2026-05-13T13:21:15Z

Description

JavaScript binding for NobodyWho — runs local LLMs in a browser tab (or any wasm host) via llama.cpp compiled to wasm32 with Emscripten. The same core engine that powers the Python, Godot, Flutter, and Uniffi bindings, exposed to JS through wasm-bindgen.

This PR consolidates the entire stack that was previously staged across #526, #529, #530 — those have been closed in favour of this single PR per maintainer request. All commits preserved.

What's delivered

Core surface

import createNobodyWhoModule from '@nobodywho/js';
const m = await createNobodyWhoModule();

// Chat — inference runs on a background pthread (Emscripten pthreads),
// same std::thread::spawn code path as native. Pass modelUrl (browser)
// or modelPath (Node).
const chat = await m.Chat.create({
  modelUrl: 'https://huggingface.co/.../model.gguf',
  systemPrompt: 'You are a helpful assistant.',
  templateVariables: { enable_thinking: false },
  sampler: new m.SamplerBuilder().temperature(0.7).topK(40).topP(0.95, 1).dist(),
  tools: [getWeather, lookupTime],
});

// Streaming: per-token async iteration
for await (const tok of chat.ask('What is the capital of Denmark?')) {
  process.stdout.write(tok);  // each token arrives as the model samples it
}

// Streaming: collect the full response
const reply = await chat.ask('Hi').completed();

// Both return a TokenStream with:
//   .next()       → Promise<{value: string, done: boolean}>  (async iterator)
//   .completed()  → Promise<string>  (buffered full response)
//   [Symbol.asyncIterator] support for `for await`

// Multimodal: pass an array of mixed string | Image | Audio parts
const img = m.Image.fromBytes(jpegBytes);
const desc = await chat.ask(['Describe this:', img]).completed();

// Tools — sync or Promise-returning callbacks both work
const tool = m.Tool.fromFn(
  'get_weather',
  'Get current weather for a city',
  { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
  async (args) => fetch(`https://api.example/weather?city=${args.city}`).then(r => r.text()),
);

// Structured output — constrain generation to a JSON schema
const jsonChat = await m.Chat.create({
  modelUrl: 'https://...',
  sampler: m.SamplerPresets.constrainWithJsonSchema({ type: 'object', properties: { city: { type: 'string' } } }),
});

// Embeddings + reranking — share the same Model instance
const model = await m.Model.load({ modelUrl: 'https://...' });
const encoder = new m.Encoder(model, 2048);
const vec = await encoder.encode('the quick brown fox');
const xenc = new m.CrossEncoder(model, 2048);
const ranked = await xenc.rankAndSort('query', ['doc1', 'doc2']);
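
The `TokenStream` contract listed in the snippet above can be illustrated with a small stand-in. This is only a sketch: `FakeTokenStream` and `collect` are hypothetical names, the class replays a fixed token list instead of sampling from a model, and its `completed()` returns only the tokens not yet consumed, which may differ from the real binding.

```javascript
// Hypothetical stand-in for the binding's TokenStream; it replays a fixed
// token list instead of sampling from a model.
class FakeTokenStream {
  constructor(tokens) {
    this.tokens = [...tokens];
    this.index = 0;
  }

  // Async-iterator protocol: resolves to { value, done }.
  async next() {
    if (this.index < this.tokens.length) {
      return { value: this.tokens[this.index++], done: false };
    }
    return { value: undefined, done: true };
  }

  // Drains whatever has not been consumed yet and returns it as one string.
  async completed() {
    let text = '';
    for (let r = await this.next(); !r.done; r = await this.next()) {
      text += r.value;
    }
    return text;
  }

  // Returning `this` is what makes `for await (const tok of stream)` work.
  [Symbol.asyncIterator]() {
    return this;
  }
}

// Collect every token through the async-iteration path.
async function collect(stream) {
  const parts = [];
  for await (const tok of stream) parts.push(tok);
  return parts;
}
```

Because the object is its own async iterator, per-token iteration and the buffered `completed()` call share one cursor in this sketch.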

API status

| Surface | Status |
| --- | --- |
| `Model.load({modelUrl \| modelPath, mmprojUrl \| mmprojPath})` | ✅ verified |
| `Encoder.encode(text)` → `Float32Array` | ✅ verified |
| `CrossEncoder.rank(query, docs)` / `rankAndSort(...)` | ✅ verified |
| `Chat.create({modelUrl \| modelPath, ...})` → `Chat` | ✅ verified |
| `Chat.ask(prompt)` → `TokenStream` | ✅ verified |
| `for await (const tok of stream)` (async iteration) | ✅ verified |
| `TokenStream.next()` / `.completed()` | ✅ verified |
| `Chat.terminate()` → `Promise<void>` | ✅ verified (no-op, kept for API compat) |
| `SamplerConfig` / `SamplerBuilder` / `SamplerPresets` | ✅ verified |
| Structured output via `constrainWithJsonSchema()` / `constrainWithRegex()` / `constrainWithGrammar()` | ✅ verified |
| Tool calling (`Tool.fromFn(...)`, sync + async callbacks) | ✅ verified |
| Multimodal vision/audio (`Image.fromBytes` / `Audio.fromBytes`) | ✅ verified |
| `Chat.create({modelUrl})` — streaming download + Cache API | ✅ verified |
| `Chat.create({modelPath})` — NODEFS disk access (Node-only) | ✅ verified |
| mmap-backed tensor loading (`CPU_Mapped`) | ✅ verified |

Threading model: Emscripten pthreads

The wasm build uses Emscripten pthreads (-pthread, +atomics,+bulk-memory,+mutable-globals Rust target features). This enables std::thread::spawn on wasm — the same code path as native. ChatHandleAsync::new() spawns a real pthread for inference, and llama.cpp's ggml threadpool uses available_parallelism() to pick n_threads (maps to navigator.hardwareConcurrency in browser, os.cpus().length in Node).

Browser requirement: serving origin must set Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers for SharedArrayBuffer.
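
A minimal sketch of the header requirement, assuming a Node-style response object that exposes `setHeader`. The helper names `applyIsolationHeaders` and `canUseSharedMemory` are illustrative and not part of the binding.

```javascript
// The two headers a serving origin needs so the page becomes cross-origin
// isolated and SharedArrayBuffer (needed by Emscripten pthreads) is available.
const ISOLATION_HEADERS = Object.freeze({
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
});

// Hypothetical helper: apply the headers to any Node-style response object.
function applyIsolationHeaders(res) {
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// Hypothetical helper: check whether shared memory is usable in a scope.
// `crossOriginIsolated` is a browser global; a scope that does not define it
// is treated here as not blocking shared memory.
function canUseSharedMemory(scope = globalThis) {
  const isolated = scope.crossOriginIsolated !== false;
  return typeof scope.SharedArrayBuffer === 'function' && isolated;
}
```

In a plain Node static server this would be called once per request, before the body is written, for example inside the callback passed to `http.createServer`.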

Build requirement: Rust nightly with -Zbuild-std=std,panic_abort (pre-compiled std for wasm32-unknown-emscripten lacks atomics).

The Chat class directly wraps the core ChatHandleAsync — no Web Worker wrapper, no message protocol, no RPC bridge. Token streaming works through tokio channels across pthreads, same as native.

Build target: `wasm32-unknown-emscripten`

Build pipeline:

   nobodywho/core (Rust)  +  llama-cpp-2 fork (wasm-emscripten branch)
        |
        | cargo +nightly -Zbuild-std=std,panic_abort
        | emcc for C/C++ side, rustc for Rust side
        v
   wasm32-unknown-emscripten .wasm  (with -pthread, SharedArrayBuffer)
        |
        | patched wasm-bindgen-cli (pthreads-compatible)
        | + post-link emcc with -pthread + --js-library
        v
   pkg-bundler/
     ├── nobodywho_js.js          (Emscripten loader with pthread runtime)
     ├── nobodywho_js_bg.wasm     (linked wasm, shared memory, ~10 MB)
     ├── library_bindgen.js       (kept for debugging)
     └── pre.js                   (HEAP_DATA_VIEW shim, inlined by emcc)

Three temporary forks (all will go away when upstream merges):

nobodywho-ooo/wasm-bindgen — descriptor-interpreter fixes + Emscripten output mode + pthreads compatibility (skip thread/multivalue transforms, synthesize stack pointer shim from emscripten exports)
nobodywho-ooo/llama-cpp-rs — CMAKE_SYSTEM_PROCESSOR=wasm32, -matomics -mbulk-memory for shared-memory link compat, MA_NO_* defines for miniaudio, -fexceptions for mtmd
walkingeyerobot/emscripten — the -sWASM_BINDGEN flag (upstream PR "Add wasm-bindgen support", emscripten-core/emscripten#23493)

Multimodal (Path A — MEMFS-virtualized)

Vision and audio input work end-to-end through bytes. Image.fromBytes(uint8) / Audio.fromBytes(uint8) write to content-hashed MEMFS paths; llama.cpp reads them via strong syscall overrides in js/src/syscall_imports.rs.
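
To illustrate what a content-hashed path buys, here is a sketch under stated assumptions: the PR does not say which hash or path layout the binding uses, so FNV-1a and the `/media/<hex>.<ext>` layout below are stand-ins chosen only for brevity.

```javascript
// FNV-1a (32-bit) over a byte array. Chosen only because it fits in a few
// lines; the hash the binding actually uses is not stated in the PR.
function fnv1a32(bytes) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of bytes) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Hypothetical path layout: /media/<8 hex digits>.<ext>
function contentHashedPath(bytes, ext) {
  const hex = fnv1a32(bytes).toString(16).padStart(8, '0');
  return `/media/${hex}.${ext}`;
}
```

Identical bytes always map to the same path, so passing the same image twice can reuse the file already written to the virtual filesystem.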

Verified: Qwen3.5-0.8B vision, Qwen2-VL, Gemma 3 vision, Qwen3-ASR audio (WAV/MP3/FLAC).

Tool calling

Tool.fromFn(name, description, jsonSchema, callback) — the callback runs directly in the inference context (no RPC bridge needed since pthreads share the same wasm instance). Both sync and async (Promise-returning) callbacks work.
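
One way a dispatcher can accept both callback styles is to `await` the return value, since `await` passes plain values through unchanged. The sketch below is hypothetical: `callTool`, the `{ name, callback }` shape, and the error-to-text handling are assumptions, not the binding's actual dispatch code.

```javascript
// Hypothetical dispatcher: `await` accepts both plain values and Promises,
// so sync and async callbacks go through one code path.
async function callTool(tool, rawArgs) {
  // Arguments may arrive as a JSON string or as an already-parsed object.
  const args = typeof rawArgs === 'string' ? JSON.parse(rawArgs) : rawArgs;
  try {
    const result = await tool.callback(args);
    return String(result);
  } catch (err) {
    // Assumption: failures are reported back as text rather than thrown.
    return `tool ${tool.name} failed: ${err.message}`;
  }
}
```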

Structured output

Constraints via SamplerPresets.constrainWithJsonSchema() / constrainWithRegex() / constrainWithGrammar(). llguidance works on Emscripten (its clock_gettime requirement is satisfied by Emscripten's libc).

Core changes that affect other bindings

tokio features split (rt-multi-thread native vs rt wasm); native-only deps gated behind cfg(not(target_family = "wasm")).
Worker channel: std::sync::mpsc → tokio::sync::mpsc::unbounded_channel. Public API unchanged.
std::thread::spawn used on all targets (including wasm, via Emscripten pthreads). No more spawn_local wasm path.
WorkerGuard unified — single struct with JoinHandle on all targets.
n_threads uses available_parallelism() on wasm (was hardcoded to 1).
New mtmd cargo feature (default-on), get_model_from_path, get_model_from_bytes, Tool::new_async, mtmd_marker_string().

Native consumers (Python, Godot, Flutter, uniffi) see no API changes. cargo check --workspace passes cleanly.

v1 limitations

4 GiB ceiling — wasm32 hard limit. Models + KV cache + compute must fit.
Browser COOP/COEP — pthreads require cross-origin isolation headers.
Audio: WAV/MP3/FLAC only — Ogg not supported under Emscripten.
OpenAI-typed-content chat templates (SmolVLM, some Phi-3-Vision) — not yet supported.
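
The 4 GiB ceiling can be checked with simple arithmetic before loading. The sketch below uses the standard transformer KV-cache size formula (one K and one V tensor per layer); the function names and the example numbers are illustrative and do not describe any specific model.

```javascript
const GIB = 1024 ** 3;
const WASM32_LIMIT = 4 * GIB; // 32-bit linear memory tops out at 4 GiB

// KV-cache size: 2 tensors (K and V) per layer, each nCtx x nKvHeads x headDim.
// bytesPerElem defaults to 2, i.e. 16-bit cache entries.
function kvCacheBytes({ nLayers, nCtx, nKvHeads, headDim, bytesPerElem = 2 }) {
  return 2 * nLayers * nCtx * nKvHeads * headDim * bytesPerElem;
}

// Sum the three budgets named in the limitation and compare to the ceiling.
function fitsInWasm32({ modelBytes, kvBytes, computeBytes }) {
  const total = modelBytes + kvBytes + computeBytes;
  return { total, fits: total <= WASM32_LIMIT };
}
```

For a made-up configuration of 24 layers, a 2048-token context, 2 KV heads, and a head dimension of 64, `kvCacheBytes` gives 25,165,824 bytes (24 MiB), small next to the weights; the cache grows linearly with context size.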

Tested

cargo check --workspace: clean on native.
cargo test -p nobodywho-js: lint tests pass.
bash js/scripts/build-pkg-emscripten.sh: ~60s on M-series macOS.
All 17 smoke tests pass under Node (forawait, sampler, tool, stop, history, setters, terminate, context-shift, constraint, audio, vision, modelpath, parity-extras, etc.)
Browser demos verified with COOP/COEP headers.

Type of change

New feature (non-breaking change which adds functionality)
Breaking change
Documentation update

Adds a new workspace member nobodywho-wasm that mirrors the Python binding's async API surface (Model, Chat, TokenStream, Encoder) via wasm-bindgen.
Status: scaffold only. Compiles cleanly on native (rlib) so cargo check at the workspace root keeps working, but wasm-pack build --target web needs two prerequisites that aren't done yet:
1. The marek-hradil/llama-cpp-rs fork needs a wasm32 build path (Emscripten cmake). We'll carry our own fork as a patch carrier.
2. nobodywho/core needs cfg(target_arch = "wasm32") gating for std::thread, ureq downloads, and tokio rt-multi-thread, plus a get_model_from_bytes API for browser use.
Both blockers are tracked in nobodywho/wasm/README.md and as TODO comments in the binding source. The binding's shape is independent of both.

Step 2a of the WASM binding rollout. Splits a few core dependencies so that `cargo check --target wasm32-unknown-unknown -p nobodywho` no longer fails immediately on dep resolution. Native builds are unchanged.
- tokio: `rt-multi-thread` only on native; wasm gets plain `rt` (no OS threads on wasm32-unknown-unknown).
- ureq: native only (raw TCP/TLS not available in a browser sandbox).
- indicatif: native only (no terminal in a browser).
- dirs: native only AND non-android (Android already reads /proc/self/cmdline directly; wasm has no filesystem).
- `default_progress_callback` in llm.rs and the `indicatif::*` import: gated to non-wasm. Native callers unchanged.
Still blocking wasm-pack build (tracked in wasm/README.md):
- Step 1: llama-cpp-2 fork needs wasm32 build path
- Step 2b: Worker refactor (std::thread::spawn -> spawn_local)
- Step 2c: Model::load_bytes constructor
cargo check --workspace passes; only pre-existing deprecation warnings in uniffi/godot/flutter remain.

The wasm branch (https://github.com/nobodywho-ooo/llama-cpp-rs/tree/wasm) is forked from marek-hradil/main at 8550f04e, so it inherits the llguidance / EOS-fix / lark / with_logits_ith_mut patches that core depends on today. It adds one commit (a35e66a) that scaffolds wasm32 detection in the build script — a no-op on native targets.
Why this swap matters:
- Native builds are functionally unchanged (the new branch's only added commit is wasm32-gated, native targets never see it).
- The wasm32 build path now lives in a repo nobodywho controls. Step 1 of the WASM rollout (Emscripten cmake support, feature gating, load_from_buffer wrapper) lands on this branch.
cargo check --workspace passes; only pre-existing deprecation warnings in uniffi/godot/flutter remain. When the marek-hradil/main moves forward with more patches, we'll rebase the wasm branch on top.

cargo check --target wasm32-unknown-unknown -p nobodywho-wasm now succeeds (with dead-code warnings only). Linking will still fail because the llama-cpp-rs wasm branch doesn't produce a wasm artifact yet, but every Rust crate in the workspace now type-checks under the wasm32 target.
Changes:
* grammar/gbnf/Cargo.toml — disable jsonschema default features. The only use is jsonschema::meta::is_valid (a pure in-memory metaschema check); defaults pull in resolve-http/resolve-file -> reqwest -> hyper -> tokio with the 'net' feature -> mio, which doesn't build on wasm32. Drops ~12 transitive crates from the dependency graph.
* core/Cargo.toml — move monty (Python interpreter) and bashkit (virtual bash) into the not(target_arch = wasm32) deps block. Both have native-only requirements: monty has no wasm32 target; bashkit forces tokio/full which pulls mio + raw sockets.
* core/src/tool_calling/mod.rs — cfg-gate Tool::python and Tool::bash (along with their monty/bashkit imports) to non-wasm. Wasm consumers can still build custom tools via Tool::new.
* core/src/llm.rs — cfg-gate the model-loader infrastructure (get_model, get_model_async, parse_model_path, resolve_fancy_path_to_fs, the cache dir helpers, download_file, download_model_from_hf, download_model_from_url, ParsedModelPath enum) to non-wasm32. A future Model::load_from_bytes API will handle in-memory loading for wasm (see nobodywho/wasm/README.md Step 2c).
* wasm/src/lib.rs — fix init() to use tracing_wasm::try_set_as_global_default instead of constructing a Layer and passing it to set_global_default (which needs a Subscriber).

Update Cargo.lock to point at the rebased nobodywho-ooo/llama-cpp-rs wasm branch (6c35d9a). That branch is now marek-hradil/main + Asbjorn's three cherry-picked Emscripten commits + a fix for two non-exhaustive matches he missed + WASM.md doc.
Update wasm/README.md to reflect the new world:
- Target is now wasm32-unknown-emscripten (not -unknown-unknown).
- Document the emsdk install path so contributors can actually link.
- Note that cargo check now panics with a clear sysroot message when emcc isn't installed, which is the correct behavior.
- Drop the obsolete Step 1 block ("llama-cpp-2 needs a wasm32 build path"); that's now done.
- Expand Step 2a recap with the additional cfg-gates that landed in 0e2f913 (gbnf, monty/bashkit, model-loader infra).

- core/src/llm.rs: add get_model_from_bytes(bytes, gpu_layers) which calls the new LlamaModel::load_from_buffer in the llama-cpp-2 fork (commit 606c4759 on the wasm branch). Bypasses every filesystem and HTTP code path — pure 'bytes in, Model out'. Gated to non-windows-MSVC to match the upstream wrapper.
- wasm/src/lib.rs: Model.loadBytes(uint8Array) now delegates to get_model_from_bytes with gpu_layers=0 (wasm32 has no GPU). The placeholder JsError is gone — this is a real path now.
- Cargo.lock: pull llama-cpp-2 fork at 606c4759 (load_from_buffer).
cargo check --workspace passes on native. cargo check --target wasm32-unknown-emscripten -p nobodywho-wasm goes through the full toolchain and halts at the expected 'Could not detect Emscripten sysroot' message when emcc isn't installed.

…+ cfg
Step 2b: makes the background worker pattern (ChatHandleAsync, EncoderAsync, CrossEncoderAsync, plus the sync ChatHandle) compile and behave correctly on wasm32-unknown-emscripten while preserving native behavior bit-for-bit.
Changes:
* WorkerGuard: channel field changed from std::sync::mpsc::Sender to tokio::sync::mpsc::UnboundedSender. join_handle field cfg-gated to non-wasm. Two constructors: 3-arg on native, 2-arg on wasm.
* Worker loops: `std::sync::mpsc::channel()` -> `tokio::sync::mpsc::unbounded_channel()`. Native loops use `blocking_recv()` inside std::thread::spawn — semantically identical to the previous blocking recv. Wasm uses `wasm_bindgen_futures::spawn_local` + `recv().await` — a single-threaded cooperative pump on the JS event loop. Decode work blocks the event loop during each message; Web-Worker parallelism is a follow-up.
* core/Cargo.toml: add wasm-bindgen-futures to the wasm32 deps block.
* WorkerGuard Drop: native joins the thread (unchanged ordering wrt closing the sender so LLAMA_BACKEND stays alive during teardown). Wasm: dropping the sender causes recv().await to return None and the spawn_local future completes on its next poll — no explicit join.
Files touched:
- core/Cargo.toml
- core/src/llm.rs (WorkerGuard struct, constructors, Drop)
- core/src/chat.rs (ChatHandle::new, ChatHandleAsync::new)
- core/src/encoder.rs (EncoderAsync::new)
- core/src/crossencoder.rs (CrossEncoderAsync::new)
cargo check --workspace (native) passes. cargo check --target wasm32-unknown-emscripten -p nobodywho-wasm passes through everything Rust and halts at the expected 'install emcc' Emscripten sysroot check.
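
The cooperative pump described for the wasm path (receive, handle, repeat, and exit once the sender goes away) can be modelled in plain JavaScript. This is only an analogy for the Rust code: `unboundedChannel` and `runWorker` are hypothetical names, and `close()` plays the role of dropping the Rust sender.

```javascript
// Hypothetical unbounded channel with close-on-drop behaviour.
function unboundedChannel() {
  const queue = [];   // messages sent while nobody is waiting
  const waiters = []; // pending recv() resolvers
  let closed = false;
  const sender = {
    send(msg) {
      if (closed) throw new Error('channel closed');
      const wake = waiters.shift();
      if (wake) wake(msg);
      else queue.push(msg);
    },
    // Plays the role of dropping the Rust sender.
    close() {
      closed = true;
      while (waiters.length) waiters.shift()(null);
    },
  };
  const receiver = {
    // Resolves to the next message, or null once closed and drained.
    recv() {
      if (queue.length) return Promise.resolve(queue.shift());
      if (closed) return Promise.resolve(null);
      return new Promise((resolve) => waiters.push(resolve));
    },
  };
  return { sender, receiver };
}

// Worker loop: exits on its own when the sender is closed, no explicit join.
async function runWorker(receiver, handle) {
  for (let msg = await receiver.recv(); msg !== null; msg = await receiver.recv()) {
    await handle(msg);
  }
}
```

As in the commit's description, any long-running `handle` call occupies the event loop for its duration, which is why this single-threaded design was flagged for a follow-up.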

cargo build --target wasm32-unknown-emscripten -p nobodywho-wasm produces a 113MB debug nobodywho_wasm.wasm artifact (~10-20MB after release strip). End-to-end: bindgen + cc + cmake + emscripten + wasm-bindgen + wasm-ld all in one pipeline.
Three changes here:
* wasm/src/lib.rs — rewrite the wasm-bindgen surface to return js_sys::Promise manually rather than `pub async fn`. The macro-generated futures captured non-UnwindSafe types (tokio::sync::Mutex, tokio mpsc receivers, TokenStreamAsync's interior) which made future_to_promise reject them at compile time. The new `promisify` helper wraps each future body in AssertUnwindSafe + catch_unwind, asserting the unwind-safety we get for free in a single-threaded JS environment and turning any panic into a rejected promise rather than tearing down the wasm instance.
* core/src/chat.rs — gate `ChatBuilder::build()` and the sync `ChatHandle` type/impl to non-wasm32. The sync variant uses `.completed()` which blocks; there's nothing to block on in a browser tab. Use `build_async` and await on the returned future instead.
* Cargo.lock — pull the llama-cpp-rs fork at 85987657, which adds the -fPIC fixes (configure_emscripten_cc for the wrapper shims, and CMAKE_C_FLAGS=-fPIC + CMAKE_POSITION_INDEPENDENT_CODE=ON for llama.cpp's own cmake build). Without those, wasm-ld errors with 'relocation R_WASM_MEMORY_ADDR_LEB cannot be used against symbol …; recompile with -fPIC' for every static symbol in the C/C++ code.
Native cargo check --workspace still passes.

…main
Three things, one commit:
- cargo fmt --all on the two sites the linting CI flagged (core/src/llm.rs:185 get_model_from_bytes signature joined onto one line; wasm/src/lib.rs:57 promisify .map() closure collapsed).
- cfg-gate the items that pr1's wasm-target build was warning about, dropping warning count from 9 to 0:
  * core/src/llm.rs — Read/Write, Duration, info_span imports.
  * core/src/chat.rs — sync TokenStream + impl (only consumers are the sync ChatHandle, already gated, and the Python binding which is native-only). Doc-comment notes blocking_recv would deadlock the JS event loop, so this is also a footgun guard.
  * core/src/memory.rs — GgufModelInfo, read_gguf_model_info, estimate_per_layer_bytes, plan_model_loading, plus the now-unused std::path::Path import. plan_context / ContextPlan stay public: they are called from chat-context setup which runs on wasm too.
- regenerate Cargo.nix and crate-hashes.json. The pre-existing files pointed at marek-hradil/llama-cpp-rs and didn't know about the nobodywho-wasm workspace member, so nix flake check failed.
Plus a small native error-path test for get_model_from_bytes in core/src/llm.rs (gated off Windows MSVC, mirroring the function's own gate). Tests the InvalidModel error path; the success path on Linux hits an upstream llama.cpp issue with load_from_buffer returning NULL, tracked separately.
Includes the rebase onto main (PR #527, watchOS Metal-skip): adopts main's visionos/watchos exclusions on the Vulkan target cfg + the updated comment, while keeping pr1's switch to the nobodywho-ooo/llama-cpp-rs#wasm fork. The iOS Metal block stays explicit (with wasm-fork URL) until the wasm fork rebases on top of marek-hradil's main to pick up the upstream watchOS Metal-skip patch. Cargo.lock incorporates the windows-sys 0.52 -> 0.61 transitive bump from main.
Two nix bugs surfaced and worked around (worth knowing for downstream PRs):
1. crate2nix's '-h crate-hashes.json' caches by name@version, not by full URL+commit — URL changes silently reuse the old tree's hash.
2. pkgs.fetchgit defaults fetchSubmodules = true. nix-prefetch-git defaults to --no-fetch-submodules. The llama-cpp-rs repo vendors llama.cpp as a submodule, so the with-submodules tree hashes differently. Manually putting the correct (with-submodules) hash into crate-hashes.json before running crate2nix is the workaround.

Combined documentation pass spanning the Emscripten → wasm32-unknown-unknown + wasi-sdk exploration:
- first README rewrite once the Emscripten build started working
- documented the wasm-bindgen + Emscripten incompatibility that eventually drove the switch to wasm32-unknown-unknown
- rewrote the outstanding-work section after the Path B trial gave a clearer picture of remaining tasks
- reflected Path B's C++ side completion with Rust mtmd gating in core as the next step

Path B prep: introduce a `mtmd` feature on `nobodywho` (default on) that gates multimodal support. The wasm binding opts out via `default-features = false` so the underlying llama-cpp-2 doesn't enable its own `mtmd` feature for wasm builds.
Native unchanged (feature is on by default, so every existing code path stays compiled). Wasm build no longer pulls in mtmd's miniaudio dependency (which uses pthread sched APIs wasi-libc lacks).
This commit gates the easy parts:
- core/Cargo.toml: mtmd feature definition, propagation to llama-cpp-2
- core/src/errors.rs: FailedReadingMediaEmbeddings variant
- core/src/llm.rs: MtmdInputChunks import + read_media_embeddings method
- core/src/template.rs: MtmdBitmap import + RenderedChat.bitmaps field
- wasm/Cargo.toml: default-features = false on nobodywho dep
Still pending for the full wasm build:
- core/src/chat.rs: ChatContext::bitmaps field + ~5 methods that use it
- core/src/tokenizer.rs: TokenizerChunk::Image/Audio variants (~30 sites)
The ChatContext refactor in particular is non-mechanical — bitmaps are the central state for multimodal chat, and gating them out cleanly means deciding whether to factor out a non-multimodal ChatContext or to live with cfg-attributes on every method. That's a design call I want to leave for review.

Reverts the premature `default-features = false` on the nobodywho dep in the wasm crate. The mtmd cargo feature is added on core, but until core/src/chat.rs and core/src/tokenizer.rs are fully gated for mtmd-less builds (the ChatContext bitmaps refactor + TokenizerChunk variant gating), opting out here breaks the Emscripten path that previously worked. Once those two files are gated, switch this back to default-features = false to enable wasm32-unknown-unknown builds. Also: pulled the fork's wasm branch to 2b592da, which tightens the mtmd gate in llama-cpp-2/src/lib.rs to allow Emscripten (was: any wasm32).

cargo build --target wasm32-unknown-unknown -p nobodywho-wasm + wasm-bindgen --target web now produces a complete `pkg/` directory:
    pkg/
    ├── nobodywho_wasm.d.ts          9.1K — TS typings for the public API
    ├── nobodywho_wasm.js            37K — JS loader / wasm-bindgen glue
    ├── nobodywho_wasm_bg.wasm       21M — debug; ~5-7M release-stripped
    └── nobodywho_wasm_bg.wasm.d.ts
JS consumers can:
    import init, { Model, Chat } from './pkg/nobodywho_wasm.js';
    await init();
    const model = await Model.loadBytes(uint8Array);
    const chat = new Chat(model, { contextSize: 2048 });
    const stream = await chat.ask('Hello');
    // ... await stream.nextToken() in a loop
The whole pipeline that now runs:
    bindgen + cc::Build (wasi-sdk clang) + cmake (llama.cpp -> wasm32-wasip1 + wasi-libc, with the fork's source-level patches applied at build time)
        v
    wasm-bindgen post-processor
        v
    pkg/ ready for npm publish
What made it click on this iteration:
* core/src/chat.rs: gate the bitmap-construction block in ChatWorker::ask() behind the mtmd feature, and the bitmap-extraction in sync_context_with_render. ChatContext's bitmaps field + the four methods that touch it (add_bitmaps, garbage_collect_bitmaps, create_bitmap_id, remove_bitmaps) are also feature-gated.
* Cargo.lock: pulls fork branch at 360e169 — the design pivot that unblocked everything. Earlier the fork gated mtmd off entirely on wasm-unknown, which forced ~30 cfg sites in nobodywho/core/tokenizer.rs. Cleaner approach: keep bindgen running for mtmd headers (FFI types exist in bindings.rs) but skip compiling the C++. The Rust wrapper module compiles normally; mtmd_* symbols become undefined imports in the .wasm — the JS host can polyfill them, but in practice the wasm binding doesn't expose multimodal so they're never called.
* wasm/README.md: rewritten to reflect the working pipeline. Emscripten path documented as alternate/fallback (wasm-bindgen-cli doesn't accept Emscripten output, so it's not useful for npm distribution).

Adds three smoke tests under wasm/examples/:
- smoke.html: in-browser test page (open via local HTTP server, requires a bundler or manual env-stub for the unresolved imports).
- smoke.mjs: ESM test trying wasm-bindgen's web-target glue from Node. Fails because wasm-bindgen-cli emits `import * as ... from 'env'` for non-__wbindgen import groups, and 'env' isn't a real module. Documented as a known wasm-bindgen-cli limitation; npm distribution needs --target bundler + a bundler that aliases 'env'.
- smoke-manual.mjs: bypasses wasm-bindgen's auto-init and instantiates the wasm directly with Proxy-based stub imports. THIS WORKS.
Output of smoke-manual.mjs after the -fexceptions fix:
    Wasm size: 20.6 MB
    Compiling…
    Imports: 89
      ./nobodywho_wasm_bg.js: 53 entries
      env: 22 entries
      wasi_snapshot_preview1: 14 entries
    Instantiating…
    Exports: 74
    Class-like: chat_ask, chat_new, chat_reset, chat_resetHistory, encoder_encode, encoder_new, init, model_loadBytes, tokenstream_completed, tokenstream_nextToken
Proves the wasm is well-formed: compiles under V8 (so passes the wasm-validation that previously failed on exception-model mismatch), instantiates with stubs, and exposes all 10 expected class methods.
Going further (actually calling chat_ask) requires real imports for the wasi_snapshot_preview1 syscalls (fd_write etc.) and for the __wbg_* glue from nobodywho_wasm_bg.js — which is what an npm package would ship via --target bundler + a WASI polyfill like @bjorn3/browser_wasi_shim.
Also: pkg-node/ added to .gitignore (build output).
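
The Proxy-based stubbing used by smoke-manual.mjs can be sketched as follows. This is not the script's actual code: `summarizeImports` and `makeStubImports` are hypothetical names, and the stubs cover function imports only (memory, table, and global imports would need real objects).

```javascript
// Group a compiled module's imports by module name, in the spirit of the
// "env: 22 entries" summary printed by the smoke test.
function summarizeImports(module) {
  const counts = {};
  for (const { module: from } of WebAssembly.Module.imports(module)) {
    counts[from] = (counts[from] ?? 0) + 1;
  }
  return counts;
}

// Import object whose every namespace and every member exists on demand.
// Each stub throws when called, so an accidental call is easy to spot.
function makeStubImports() {
  const namespace = (moduleName) =>
    new Proxy({}, {
      get: (_target, name) =>
        typeof name === 'symbol'
          ? undefined
          : () => {
              throw new Error(`unresolved import ${moduleName}.${name} was called`);
            },
    });
  return new Proxy({}, {
    get: (_target, moduleName) =>
      typeof moduleName === 'symbol' ? undefined : namespace(moduleName),
  });
}
```

Passing `makeStubImports()` as the second argument to `new WebAssembly.Instance(module, ...)` is enough to check that a module links, which is all the smoke test above claims; anything that actually calls an import still needs a real implementation.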

    ✓ wasi.initialize ran _initialize
    ✓ wasm wired up
    ✓ init() ok
    Loading model from /tmp/bge-small.gguf… (35 MB GGUF)
    ✓ model loaded
    ✓ encoder created
    ✓ embedding generated: 384 dimensions
      first 8: [-0.6244, -0.5940, 0.5545, -0.6085, -0.1348, 0.1800, 0.6621, 0.3490]
This is actual llama.cpp embedding inference running inside V8 from a real GGUF model loaded as Uint8Array, with the full pipeline end-to-end:
    Uint8Array → Model.loadBytes → fmemopen → llama_model_load_from_file_ptr → wasi-libc syscalls (via node:wasi) → llama.cpp eval → Float32Array
Three fixes to get here:
1. wasm/src/lib.rs — override __cxa_atexit to a no-op. rust-lld 22.1's wasm driver doesn't understand --mexec-model=reactor (the flag is in upstream lld but not in rust-lld's option table). So the wasm stays in 'command' exec model, where every export is wrapped with __wasm_call_ctors + __wasm_call_dtors. The dtor walk iterates atexit-registered handlers and trips on a signature mismatch:
       RuntimeError: function signature mismatch
           at __funcs_on_exit
           at __wasm_call_dtors
           at <any export>.command_export
   Suppressing the atexit registrations entirely makes the dtor walk a no-op. Global destructors don't run at module shutdown, which is harmless for a wasm instance that lives for the lifetime of the process.
2. core/src/llm.rs — hardcode n_threads=1 on wasm32. std::thread::available_parallelism() returns Err on wasm32-unknown-unknown ('the number of hardware threads is not known for the target platform'). The Worker init unwrapped the error, failing immediately after model load. cfg the line so wasm32 uses 1.
3. core/src/tokenizer.rs — inline the mtmd_default_marker literal on wasm. tokenize_text calls llama_cpp_2::mtmd::mtmd_default_marker() to learn the marker string to split text on. On wasm we don't compile mtmd's C++, so that resolves to an unresolved env import. Inline the same '<__media__>' literal llama.cpp returns — wasm consumers never have media in the text anyway, so the split produces one chunk covering the whole input.
Also adds wasm/examples/run.mjs — Node runner that:
- Reads the bundler-target .wasm bytes and bg.js glue
- Wires up node:wasi for wasi_snapshot_preview1 imports
- Provides Proxy-based stubs for the remaining env imports (mtmd_*, etc.)
- Calls wasi.initialize, __wbg_set_wasm, __wbindgen_start
- Loads a GGUF and runs Encoder.encode (--encode flag) or Chat.ask
Usage:
    node wasm/examples/run.mjs                                # smoke
    node wasm/examples/run.mjs --encode ./model.gguf 'text'   # embedding
    node wasm/examples/run.mjs ./model.gguf 'prompt'          # chat
Chat path with a chat-style GGUF should also work once we have such a model handy (the bge-small only supports the Encoder API).

- wasm/README.md: rewritten to reflect the working state. Documents the verified embedding-inference result (with first 8 dims of an actual BGE-small embedding), the full build pipeline, the runtime workarounds (and why each), and outstanding follow-ups.
- wasm/examples/browser.html: single-file browser demo. Loads @bjorn3/browser_wasi_shim from esm.sh, manually instantiates the wasm with WASI + bg.js glue + env stubs, lets the user upload a GGUF via <input type=file>, runs Encoder.encode, and prints the embedding. Mirrors the working run.mjs but in a browser context. Usage: cd nobodywho/wasm && python3 -m http.server 8000, then open http://localhost:8000/examples/browser.html.
- wasm/package.json.tpl: template for the npm package.json that would ship pkg-bundler/. Includes the right `files`, `main`, `types`, and `@bjorn3/browser_wasi_shim` as a peer dep. Becomes pkg-bundler/package.json in the publish step (separate commit will add the script).

Bumps llama-cpp-2 to 10d12300 (wasm fork merged with marek-hradil/main — brings in PRs #7/8/9 watchOS Metal-skip). Drops the explicit iOS Metal block now that the wasm fork has marek's auto-Metal-detect logic. Hash: 18pwz1r43dj6918dajlg61ak9zlhwazsblqj6hv9aj0qaks7rz4n (nix-prefetch-git --fetch-submodules).

Chat.ask path on wasm was failing at `<Prompt as Display>::fmt` because that function called `llama_cpp_2::mtmd::mtmd_default_marker()` to get the media-marker string for joining text + media parts — the same unresolved env import that broke tokenize_text earlier. Two other call sites (in ProjectionModel::from_path and ProjectionModel::tokenize) had the same problem but are in unreachable mtmd-only code paths.
Replace all four call sites with a new `mtmd_marker_string()` helper that's cfg-gated on `target_arch = "wasm32"`:
- native: calls `llama_cpp_2::mtmd::mtmd_default_marker` (unchanged).
- any wasm32 target (unknown-unknown, emscripten, wasip1, ...): returns the literal "<__media__>". Necessary for wasm32-unknown-unknown which can't resolve the mtmd C++ symbol; harmless for the other wasm32 OSes because the literal is the same string llama.cpp's mtmd_default_marker returns.
Real chat output now works end-to-end:
    $ node wasm/examples/run.mjs /tmp/Qwen2.5-0.5B-Instruct-Q4_K_M.gguf 'Hello'
    ✓ wasi.initialize ran _initialize
    ✓ wasm wired up
    ✓ model loaded
    ✓ chat created
    Asking: "Hello"
    Response: Hello! How can I assist you today? If you have any questions or need help with something, feel free to ask.
    ✓ produced 25 tokens
    Done.
That's a real Qwen 2.5 0.5B Instruct response from a 379 MB GGUF loaded as Uint8Array, running entirely inside V8.
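
The role of the marker string, joining text and media parts on one side and splitting on the other, can be shown in a few lines. This is a JavaScript illustration of the idea only; the real logic lives in the Rust core, and `renderPrompt` and `splitOnMarker` are hypothetical names.

```javascript
const MEDIA_MARKER = '<__media__>'; // literal quoted in the commit message above

// Hypothetical: render a mixed prompt, replacing each media part by the marker
// and keeping the media objects in order on the side.
function renderPrompt(parts) {
  const media = [];
  const text = parts
    .map((part) => {
      if (typeof part === 'string') return part;
      media.push(part);
      return MEDIA_MARKER;
    })
    .join('');
  return { text, media };
}

// Split rendered text back into text chunks and media slots.
function splitOnMarker(text) {
  const chunks = [];
  text.split(MEDIA_MARKER).forEach((piece, i) => {
    if (i > 0) chunks.push({ kind: 'media' }); // one slot per marker occurrence
    if (piece !== '') chunks.push({ kind: 'text', text: piece });
  });
  return chunks;
}
```

A text-only prompt contains no marker, so `splitOnMarker` returns a single text chunk covering the whole input, which is why inlining the literal is harmless on builds without multimodal support.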

…ction
The existing `//` block above `__cxa_atexit` explained the rust-lld 22.1 / command-exec-model / dtor-walk causal chain well, but it lived as a code comment rather than rustdoc. Two consequences:
1. `cargo clippy --no-deps --target wasm32-unknown-unknown -p nobodywho-wasm -- -D warnings` failed on the function because `clippy::missing_safety_doc` requires a `# Safety` section in rustdoc on `pub unsafe fn`.
2. The rationale didn't surface in generated docs or IDE hover.
Convert the block to `///` rustdoc verbatim (no rewording of the existing prose) and add a `# Safety` section explaining why this implementation is trivially safe to call: it ignores all three arguments and returns success, so there's no UB path regardless of what handlers libc++ tries to register. The cost (silently dropping every atexit registration) is acceptable for the wasm-instance-lifetime reason the body already documents.
Native target unaffected — function is cfg-gated to wasm32. The PR-stated check (`cargo clippy --no-deps -- -D warnings` without `--target`) still passes either way; this fixes the wasm-target clippy gate so a future CI addition won't fail.

…delUrl

Replace modelBytes with modelPath in all Node examples and smoke tests. Replace Model.loadBytes with Model.load({modelPath}) for Encoder and CrossEncoder demos. Update browser HTML examples to use Model.load({modelUrl}) instead of fetchModelBytes + Model.loadBytes.

Fix compilation error: map String errors to JsError in Model.load, remove stale model_bytes/mmproj_bytes from ChatCreateParsed constructor.

All 14 smoke tests pass.

Update all code examples, status table, smoke test table, and explanatory sections to reflect:

- Model.load({modelUrl | modelPath}) replacing Model.loadBytes
- Chat.create({modelUrl | modelPath}) replacing modelBytes
- NODEFS for Node disk access
- mmap syscall overrides (CPU_Mapped)
- Cache API tee'd streaming for modelUrl
- Removed Model caching helpers section (now internal)
- Removed >2 GiB readFileSync limitation (no longer applies)

Take main's Docusaurus button layout, add JS (wasm) link.

Replace plain-object sampler specs with a typed SamplerConfig wasm-bindgen class. Sampler must now come from SamplerBuilder or SamplerPresets: no more raw { temperature: 0.7 } objects.

- SamplerConfig wraps core's SamplerConfig, has toJSON/fromJSON
- SamplerBuilder terminal methods return SamplerConfig (not Object)
- SamplerPresets methods return SamplerConfig (not Object)
- Delete SamplerSpec, ConstraintSpec, build_sampler
- ChatOptions.sampler deserializes core SamplerConfig directly
- Main thread serializes SamplerConfig before postMessage to worker
- getSamplerConfig/setSamplerConfig use SamplerConfig class

Smoke tests need updating to use the typed API.

- constraint-smoke: use SamplerPresets.constrainWithRegex/JsonSchema
- sampler-smoke: use SamplerPresets.greedy(), SamplerBuilder chain
- sampler-ergo-smoke: assert.ok on opaque SamplerConfig instances
- sampler-extra-smoke: assert.ok on opaque instances, replace deprecated SamplerPresets.json() with constrainWithJsonSchema({})
- setters-smoke: setSamplerConfig takes SamplerConfig instance
- build-pkg-emscripten.sh: add SamplerConfig.__wrap sed patch
- lib.rs: fix JsCast issue (use toJSON() for sampler serialization across postMessage boundary)

13/14 smoke tests pass. sampler-extra [6] needs a wasm rebuild to verify the constrainWithJsonSchema({}) replacement.


Split into two jobs:

- `lint`: always runs; cargo test -p nobodywho-js (fast, no wasm)
- `build-and-test`: only on full_ci; wasm build + model smoke tests

Pass full_ci from main.yml (same pattern as swift_ci, python_ci, etc). Update to actions/checkout@v5 and actions/cache@v5.

- get_model_from_bytes + its test in core/llm.rs (nothing calls it after removing Model.loadBytes)
- ENOENT constant in syscall_imports.rs (was used by deleted newfstatat)
- lstat parameter on stat_into_buf (always false after removing lstat64)

- Cargo.toml: remove unused web-sys features (Request, RequestInit, Headers), update stale comment
- build.rs: remove redundant -lnodefs.js (already in post-link script)
- lib.rs: remove dead on_progress field from ChatCreateParsed and onDownloadProgress key filter (no caller reads the field)

- README: remove get_model_from_bytes reference
- core/llm.rs: update doc comment (modelBytes → modelUrl)
- tests/lint.rs: update comments to remove ConstraintSpec/modelBytes references, simplify test descriptions

Replace the single-threaded wasm worker wrapper (separate wasm instance per Web Worker) with Emscripten pthreads (SharedArrayBuffer, real threads via pthread_create). Inference now runs on a background pthread using the same std::thread::spawn code path as native.

Build changes:

- .cargo/config.toml: +atomics,+bulk-memory,+mutable-globals target features
- build.rs: -pthread and -sDEFAULT_PTHREAD_STACK_SIZE=2MB linker flags
- build-pkg-emscripten.sh: cargo +nightly -Zbuild-std=std,panic_abort, -pthread in post-link, sed patches for deferred _initialize and wasm-bindgen binding in pthread workers

Core unification (-190 lines):

- Remove spawn_local wasm path from chat/encoder/crossencoder
- Remove wasm-only WorkerGuard variant (unified with JoinHandle)
- Use available_parallelism() instead of hardcoded n_threads=1

JS binding rewrite (-1770 lines):

- Delete worker dispatcher (runInWorker, message protocol, ChatState, WorkerStreamState, worker-backed Chat/TokenStream)
- New Chat wraps ChatHandleAsync directly
- New TokenStream wraps TokenStreamAsync
- Delete __nbw_spawn_worker, __nbw_wrap_node_worker from pre.js

All 8 smoke tests pass. No API changes: Chat.create, chat.ask, for-await streaming, tool calling, getters/setters all unchanged.
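The unified code path can be pictured with a small std-only sketch: pick a thread count with `available_parallelism()` and run the work on a `std::thread::spawn` thread that streams results back over a channel. Both `default_n_threads` and `spawn_fake_inference` are hypothetical names for this example, and the "inference" just echoes the prompt's words; the real core crate's types (ChatHandleAsync, TokenStreamAsync) are not modelled here.

```rust
use std::sync::mpsc;
use std::thread;

/// Thread count for compute: the parallelism the host reports, falling
/// back to 1 when the query fails (the previously hardcoded value).
pub fn default_n_threads() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}

/// Hypothetical stand-in for inference: a background thread "generates"
/// one token per whitespace-separated word and sends it over a channel.
pub fn spawn_fake_inference(prompt: &str) -> (mpsc::Receiver<String>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel();
    let words: Vec<String> = prompt.split_whitespace().map(str::to_owned).collect();
    let handle = thread::spawn(move || {
        for word in words {
            // Stop early if the receiving side has been dropped.
            if tx.send(word).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });
    (rx, handle)
}
```

A caller would iterate the receiver to consume tokens as they arrive and then join the handle, which matches the single JoinHandle-based guard the commit message describes.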

Create a persistent ggml threadpool during Worker init and attach it to the llama context. This avoids ggml's disposable threadpool pattern (pthread_create mid-compute) which deadlocks on Emscripten because pthread_create is async. With the persistent pool, ggml worker threads are pre-created during init when the event loop is free, and reused across graph computes.

Benchmark (Qwen3-0.6B-Q4_K_M, Node.js, Apple Silicon 10 cores):

- Single-threaded (before): 18.5 tok/s, 1 active core
- Multi-threaded (now): 57.0 tok/s, 22 active cores (3.1x)

Match the pre-created pthread pool to the actual CPU core count instead of a hardcoded 16. Avoids wasting memory on machines with fewer cores and under-provisioning on machines with more.

Remove js_to_serializable_parts, strip_keys, ChatOptions struct — all were only used by the old Web Worker message protocol. Fix stale comments referencing postMessage, worker dispatcher, RefCell<ChatState>.


ChatOptions struct was removed — Chat.create now parses options via js_sys::Reflect instead of serde deserialization. The deny_unknown_fields lint test is no longer applicable.

The pthreads build requires `cargo +nightly -Zbuild-std=std,panic_abort` to recompile std with atomics. Add nightly toolchain and rust-src component to the js_ci build-and-test job.

The action applies `components` to all toolchains in the list, so `rust-src --toolchain nightly` was mis-parsed as three components. Use plain `components: rust-src` — added to both stable and nightly.

CI's setup-rust-toolchain sets RUSTFLAGS=-D warnings, which overrides the [target.wasm32-unknown-emscripten] rustflags in .cargo/config.toml entirely — silently dropping the +atomics,+bulk-memory,+mutable-globals target features and breaking the shared-memory link (wasm-ld: error: --shared-memory is disallowed ... not compiled with 'atomics'). Set RUSTFLAGS directly in the build script so the build is self-contained and doesn't depend on config.toml being the active rustflags source. Verified by reproducing locally with RUSTFLAGS=-D warnings set.

gergelyvagujhelyi · 2026-05-28T06:18:16Z

/full-ci

gergelyvagujhelyi added 17 commits May 13, 2026 13:27

gergelyvagujhelyi force-pushed the wasm-pr4-release branch from b86b598 to 9d120bb Compare May 13, 2026 13:47

gergelyvagujhelyi force-pushed the wasm-pr3-features branch 3 times, most recently from 8d59f88 to 699d318 Compare May 13, 2026 14:07

gergelyvagujhelyi force-pushed the wasm-pr4-release branch 2 times, most recently from 125ee55 to 38df4d2 Compare May 13, 2026 14:17

gergelyvagujhelyi force-pushed the wasm-pr3-features branch from 699d318 to 341ffa3 Compare May 13, 2026 14:18

gergelyvagujhelyi force-pushed the wasm-pr4-release branch 2 times, most recently from 28e76e6 to 86327c1 Compare May 15, 2026 09:16

gergelyvagujhelyi force-pushed the wasm-pr3-features branch from b021e3a to 000e0b8 Compare May 15, 2026 09:52

gergelyvagujhelyi force-pushed the wasm-pr4-release branch from 86327c1 to 5cc0384 Compare May 15, 2026 09:53

gergelyvagujhelyi added 2 commits May 15, 2026 12:06

gergelyvagujhelyi added 29 commits May 27, 2026 14:42

merge: resolve conflict with main in docs/docs/index.md

e08668b


chore(js): fix stale comment referencing deleted fetchModelBytes/preload

b5ea4c3

chore(js): rustfmt

4e2732d

chore(js): rustfmt syscall_imports.rs

b578746

fix(js): restore deny_unknown_fields on ChatOptions

0f97f9c

chore: remove wasm-bindgen-cli-emscripten.patch (already applied in f…

e327485

…ork)

docs(js): update README and PR desc for typed SamplerConfig API

382317d

fix(js): add 'Constraint' keyword to README status table for lint test

6faf8dd

chore(js): remove dead __nbw_node_file_to_memfs (replaced by NODEFS)

80bf9b4

chore: remove dead code added in this PR

03c22a2


chore: rustfmt

c844185

chore: fix stale references to deleted APIs in comments and tests

bd25e93


chore: replace hardcoded local paths with $HOME in build script

a088c9b

chore: rustfmt

83b2754

chore(js): set PTHREAD_POOL_SIZE to navigator.hardwareConcurrency

30b4637


chore: remove dead worker-pattern code and stale comments

533fc9e


fix: allow unused_mut on ctx (mut only needed on wasm for attach_thre…

ed55df2

…adpool)

fix(js): remove chat_options_denies_unknown_fields lint test

470c61b


fix(ci): install nightly + rust-src for wasm -Zbuild-std

b80d537


fix(ci): correct rust-src component syntax for setup-rust-toolchain

c7a5255



feat: WebAssembly binding (full stack) #531

gergelyvagujhelyi wants to merge 196 commits into main from wasm-pr4-release

gergelyvagujhelyi commented May 13, 2026 • edited


2 participants


Description

What's delivered

Core surface

API status

Threading model: Emscripten pthreads

Build target: wasm32-unknown-emscripten

Multimodal (Path A — MEMFS-virtualized)

Tool calling

Structured output

Core changes that affect other bindings

v1 limitations

Tested

Type of change


gergelyvagujhelyi commented May 13, 2026 • edited

Build target: `wasm32-unknown-emscripten`