Building Threaded Wasm in 2026: What Changed and What the Flags Mean

If you’re building threaded wasm with Rust nightly and your recipe stopped producing a threaded artifact sometime in the last year, the cause is a behavior change in how rustc and wasm-ld interpret target features. The recipe most people had in their build scripts (the one that’s been canonical since around 2023) no longer produces what it used to, and nothing in the toolchain tells you.

The old recipe was:

RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals" \
wasm-pack build --target web -- \
  --features threads -Z build-std=panic_abort,std

This used to produce a .wasm with shared memory and the TLS exports wasm-bindgen needs. Today, the same recipe produces a .wasm that compiles atomic instructions but uses non-shared memory, so the build succeeds while the artifact ships broken, and the fallback is single-threaded execution with nothing in your logs to indicate anything went wrong.

To get a real threaded build, you now need nine additional linker flags, and the reasoning behind each one is worth understanding because they’re not arbitrary additions. This post walks through what changed in the toolchain, what each new flag does, and how the different stages of the wasm build pipeline interact (or fail to) when any of them is missing.

The change

Through mid-2025, rustc with -C target-feature=+atomics caused wasm-ld to infer that the output should have shared memory: the linker would see atomics in the input objects and emit --shared-memory --import-memory defaults on your behalf. This was an implicit coupling between an instruction-set feature and a memory layout decision, and it worked because in practice almost nobody used atomics without wanting threading.

That coupling was removed in favor of cleaner separation of concerns. Atomic instructions are an instruction-set capability, while shared memory is a module-level structural property, and you can technically use atomic instructions on non-shared memory (the threads proposal explicitly permits this, so that, for example, an atomics-enabled build still validates and runs in a single-threaded context), so the linker no longer assumes you want shared memory just because you use atomics. The new contract is that every memory layout decision must be specified, and the old defaults are gone.

The downside is that every recipe written before the change is now silently producing the wrong artifact, and the toolchain has no way to know whether you meant to be threaded or not.

The flags, grouped by purpose

The full recipe for current nightly looks like this:

RUSTFLAGS="-C target-feature=+atomics,+bulk-memory,+mutable-globals \
  -C link-arg=--shared-memory \
  -C link-arg=--import-memory \
  -C link-arg=--max-memory=4294967296 \
  -C link-arg=--export=__heap_base \
  -C link-arg=--export=__data_end \
  -C link-arg=--export=__wasm_init_tls \
  -C link-arg=--export=__tls_size \
  -C link-arg=--export=__tls_align \
  -C link-arg=--export=__tls_base" \
wasm-pack build --target web -- \
  --features threads -Z build-std=panic_abort,std

These break into three groups: memory sharing, heap layout, and thread-local storage. Each group addresses a different layer of what threading requires, and missing any one of them produces a different failure mode.

Memory sharing

The first three flags are the ones that actually make the memory threadable. --shared-memory tells wasm-ld to set the shared bit on the memory’s limits byte in the wasm binary, which is the property a runtime checks when instantiating the module: with the bit set, the memory must be backed by a SharedArrayBuffer, and without it you get a regular ArrayBuffer that can’t be transferred to workers. The bit is a single byte in the binary, but it changes everything about how the host can use the memory.
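The bit layout of that limits flag byte is small enough to sketch directly; the two constants below come from the threads proposal's binary format:

```javascript
// Limits flag byte, per the wasm threads binary format:
//   bit 0 (0x01): a maximum follows the minimum
//   bit 1 (0x02): the memory is shared
// Shared memories must declare a maximum, so a threaded build's flag is 0x03.
const HAS_MAX = 0x01;
const IS_SHARED = 0x02;

function describeLimits(flag) {
  return {
    shared: (flag & IS_SHARED) !== 0,
    hasMax: (flag & HAS_MAX) !== 0,
  };
}

console.log(describeLimits(0x03)); // threaded build: { shared: true, hasMax: true }
console.log(describeLimits(0x01)); // ordinary build: { shared: false, hasMax: true }
```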

--import-memory makes the memory section into an import rather than a module-internal definition. This matters because the same SharedArrayBuffer needs to be visible across all workers, and the only way to share it is for the JS host to create it once and pass it into each worker’s instantiation. If the memory is internal to the module, every worker that loads the module creates its own private memory and threading is impossible regardless of whether the shared bit is set.
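A host-side sketch of what --import-memory implies, using the standard WebAssembly JS API (the env.memory import name is the common convention; check your generated glue to be sure):

```javascript
// The JS host creates one shared memory and passes it into every
// instantiation. Note the unit mismatch: --max-memory is in bytes,
// WebAssembly.Memory in 64 KiB pages, so --max-memory=4294967296
// corresponds to maximum: 65536. A smaller cap is used here so the
// sketch runs anywhere.
const memory = new WebAssembly.Memory({
  initial: 17,     // pages committed up front
  maximum: 16384,  // 1 GiB cap; must agree with your --max-memory value
  shared: true,    // backs the memory with a SharedArrayBuffer
});
console.log(memory.buffer instanceof SharedArrayBuffer); // true

// Main thread:  worker.postMessage({ module, memory });
// Each worker:  WebAssembly.instantiate(module, { env: { memory } });
```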

--max-memory=4294967296 is required by the wasm threads spec because shared memories must declare a maximum. The value 2^32 is just the wasm32 addressable limit (you can pick smaller values if you have a reason to), but the cap itself isn’t optional. Omit it and wasm-ld will reject the build with an error, which makes this the one flag in the group that fails loudly rather than silently.

Without these three together, you get a non-threaded wasm: atomic instructions still execute, but they coordinate nothing because there’s no shared memory to coordinate over.

Heap layout exports

--export=__heap_base and --export=__data_end expose where the static data section ends and where the heap begins, which the runtime allocator needs in order to know which address ranges are safe to allocate from. These were exported by default in older toolchain versions, but the defaults became inconsistent across linker versions and build configurations, so making them explicit avoids a class of bugs where the wasm builds, instantiates, runs basic code, and then crashes the first time the allocator touches the heap.

If your build only just started failing on the threading check but the rest of the code worked, your wasm probably had these exports by default. Adding them explicitly is cheap insurance against future linker changes.

Thread-local storage exports

The four TLS exports are the ones wasm-bindgen’s threads-xform pass needs to generate per-worker initialization glue, and they’re the most subtle group because their absence produces failures that look like memory corruption rather than missing functionality.

__wasm_init_tls is the function that allocates and initializes thread-local storage for a new thread. When a worker spawns, the JS glue code calls this with a freshly allocated TLS region pointer, and without exporting it, wasm-bindgen has no way to wire up the per-thread setup, so threads start without functioning thread-locals. This means #[thread_local] statics read whatever was in memory at the address the compiler chose, which is sometimes zero (and your code works) and sometimes garbage (and your code crashes in confusing ways).

__tls_size is the size of the TLS region in bytes, which the glue code needs in order to allocate the right amount of memory before calling the init function. __tls_align is the alignment requirement for that allocation. And __tls_base is a global holding the current thread's TLS base address, which all thread-local accesses go through after init, so each worker has its own __tls_base value pointing at its own TLS region within the shared memory.

These four together make #[thread_local] work across threads. Without them, code that uses thread-locals (including parts of std like the panic runtime and any crate that relies on per-thread state) will read garbage or fault when used from a worker.

What each pipeline stage does

The wasm threading pipeline has four stages, and each one can drop the threading property without flagging anything to the developer.

rustc with +atomics emits atomic instructions wherever the std rebuild calls for them, but it doesn’t mark the memory as shared because that’s the linker’s job. This is the change from the old behavior, where the implicit coupling meant rustc’s flag effectively decided memory layout through the linker’s defaults.

wasm-ld is where the memory layout actually gets decided. With the flags above, it produces a shared imported memory with the required exports. Without them, it produces a non-shared internal memory and atomic instructions that operate on it as if it were a private memory, which is a valid wasm configuration that simply isn’t useful for threading.

wasm-bindgen’s threads-xform pass requires the memory to be imported and asserts mem.import.is_some(). If your wasm has an internal shared memory (you passed --shared-memory but not --import-memory), threads-xform panics. If your wasm has a non-shared memory, imported or not, threads-xform processes it without complaint and emits a non-threaded glue layer, which is the case that’s hardest to catch because everything looks fine.

wasm-opt (binaryen) strips features it doesn’t recognize, and without --enable-threads it rewrites shared memory back to non-shared during optimization. The only visible change is that the file gets smaller, so if you’re not checking the artifact after optimization you’ll never notice.

wasm-pack orchestrates rustc, wasm-bindgen, and wasm-opt, and exposes a Cargo.toml metadata config for wasm-opt arguments. In v0.13 this config is honored locally but behaves differently in different runner environments, so if your local builds work and your CI builds don’t, the simplest fix is to disable wasm-pack’s wasm-opt step entirely and invoke wasm-opt yourself with the right flags.
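A minimal sketch of that split, assuming a standard wasm-pack project layout: disable the built-in pass in Cargo.toml, then run wasm-opt yourself with the feature flags that preserve the threaded properties.

```toml
# Cargo.toml: stop wasm-pack from running wasm-opt on your behalf
[package.metadata.wasm-pack.profile.release]
wasm-opt = false
```

Then, after wasm-pack finishes, something like `wasm-opt -O2 --enable-threads --enable-bulk-memory --enable-mutable-globals pkg/your_crate_bg.wasm -o pkg/your_crate_bg.wasm` (the file name here is illustrative, following wasm-pack's `<crate>_bg.wasm` convention).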

Verifying the artifact

None of these layers emit a clear error when the threading property is dropped, which means the only reliable check is reading the wasm bytes and asserting the properties you wanted. It’s about fifty lines of Node:

import { readFileSync } from "node:fs";

const wasmPath = process.argv[2]; // path to the .wasm under test
const buf = readFileSync(wasmPath);

function findSharedMemory(buf) {
  // Walk wasm sections, find imports section (id 2) and memory
  // section (id 5), inspect each memory's limits flag byte for
  // the shared bit (0x02).
  //   https://webassembly.github.io/threads/core/binary/types.html
  // ... about 50 lines of binary parsing ...
}

const found = findSharedMemory(buf);
if (!found) {
  console.error("FAIL — no shared memory; not a threaded build");
  process.exit(1);
}

You need three constants from the wasm spec and a section walker that reads ULEB128-encoded lengths. The script bypasses the toolchain and reads what’s in the bytes, which is the question the build itself doesn’t answer: does the artifact actually have the shared bit set? Extending the same approach to verify TLS exports are present, that the memory is imported rather than internal, or that the wasm features section advertises threads is a few more lines each.
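Those two primitives are compact; a sketch of the ULEB128 reader and the section walker, with section ids taken from the core spec (the full parser then inspects the sections this yields):

```javascript
// Decode a ULEB128-encoded unsigned integer starting at off.
function readUleb(buf, off) {
  let value = 0, shift = 0, byte;
  do {
    byte = buf[off++];
    value |= (byte & 0x7f) << shift;
    shift += 7;
  } while (byte & 0x80);
  return [value, off]; // decoded value and the offset just past it
}

// Walk the module's sections, yielding { id, start, size } for each.
function* sections(buf) {
  let off = 8; // skip the \0asm magic and version words
  while (off < buf.length) {
    const id = buf[off++];                 // section id: 2 = imports, 5 = memories
    const [size, body] = readUleb(buf, off);
    yield { id, start: body, size };
    off = body + size;                     // jump to the next section header
  }
}
```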

The general principle

Toolchain composition is fragile in a specific way: each tool takes partial input, makes defensible default decisions for what it wasn’t told, and produces output, and the intersection of those decisions is “an artifact got produced” rather than “the artifact has the properties you wanted.” The fix isn’t to trust the toolchain harder, it’s to read the artifact and check the properties yourself, which is something a fifty-line parser in a different language can do reliably regardless of what the build did or didn’t do.

For wasm threads specifically, write a verifier that opens the .wasm and checks for the shared bit and the required exports, then run it as a gate before publish. If you ship multiple variants (threaded and non-threaded), verify each one matches its name. The same logic applies to any compiled artifact where a build property matters: container images that claim to be slim, native binaries that claim to have stack canaries, signed packages that claim to be signed. Build logs describe what the build tried to do. The artifact is the only ground truth, and writing a parser that reads it directly is usually cheaper than the next incident report.