
[web] "Uncaught (in promise) 1991937888" or "RuntimeError: abort(undefined)" when loading multiple sessions of a large model #10957

Closed
josephrocca opened this issue Mar 20, 2022 · 6 comments
Labels: platform:web (issues related to ONNX Runtime web; typically submitted using template)

josephrocca commented Mar 20, 2022

Describe the bug
When trying to initialize several sessions of a large model, I get errors that aren't very helpful. I can create 4 sessions, but it errors when I try to create a 5th. The model is about 350 MB. I'm using the wasm backend, since the WebGL backend doesn't work for this model due to operator compatibility/support problems (#10031).
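
The failing loop is essentially the following (a minimal sketch; the hosted repro under "To Reproduce" below does roughly this, and modelUrl is a stand-in for the ~350 MB model):

const modelUrl = "model.onnx"; // stand-in for the ~350 MB .onnx model
const sessions = [];
for (let i = 1; i <= 8; i++) {
  try {
    sessions.push(await ort.InferenceSession.create(modelUrl));
    console.log(`created session ${i}`);
  } catch (e) {
    // Around the 5th session the rejection value is just a number
    // (e.g. 1991937888) or "RuntimeError: abort(undefined)".
    console.error(`session ${i} failed:`, e);
    break;
  }
}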

Urgency
Not super urgent.

System information

To Reproduce
Minimal reproduction: https://josephrocca.github.io/clip-image-sorter/debug-onnx-several-image-sessions-at-once.html

Expected behavior
I expect to be able to create as many sessions as the browser's memory limits allow. Memory usage according to Chrome's Task Manager is about ~2GB when the error is thrown. The browser allows much more than ~2GB when memory is allocated using ArrayBuffers (up to 16GB in Chrome according to this answer), so I'm pretty sure this isn't hitting a browser memory limit, and the wasm module memory limit is 4GB until we get wasm64/memory64, IIUC. But my knowledge here is limited.
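
A rough way to probe that non-wasm ArrayBuffer headroom (sketch only, not from the repro page; results vary by browser and machine, and some engines reserve ArrayBuffer pages lazily):

// Allocate 512 MB ArrayBuffers until the browser refuses.
const chunks = [];
try {
  while (chunks.length < 64) { // safety cap at 32 GB
    chunks.push(new ArrayBuffer(512 * 1024 * 1024));
  }
} catch (e) {
  console.log(`allocation failed after ${chunks.length / 2} GB:`, e);
}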

Screenshots
[screenshot: console output showing the "Uncaught (in promise)" / "RuntimeError: abort(undefined)" errors]

Additional context
The reason I'm trying to create several instances of the same model is that I'm using it to process a large folder of images, and I'd like to process several images at once. It seems like performance will scale better with several sessions of one thread each than with a single session using all the threads (see the sketch below).
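
A sketch of that setup with the current API (note that ort.env.wasm.numThreads is a global setting for the shared wasm module, so "one thread per session" can only be approximated today; images is an assumed array of inputs, and buildFeeds is a hypothetical helper that turns an image into the model's input tensors):

ort.env.wasm.numThreads = 1; // global: applies to every session on this page

const sessions = await Promise.all(
  Array.from({ length: 4 }, () => ort.InferenceSession.create(modelUrl))
);

// Dispatch one image per session concurrently.
const outputs = await Promise.all(
  images.slice(0, 4).map((img, i) => sessions[i].run(buildFeeds(img)))
);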

@RandySheriffH RandySheriffH added the core runtime issues related to core runtime label Mar 21, 2022
@hanbitmyths hanbitmyths added component:ort-web platform:web issues related to ONNX Runtime web; typically submitted using template and removed core runtime issues related to core runtime labels Mar 21, 2022
@hanbitmyths

Per the comment below (from Emscripten's settings.js), the maximum wasm memory is 2GB by default unless the module is built with the MAXIMUM_MEMORY option. ONNX Runtime Web is not built with that option, so it is limited to 2GB of memory.

// Set the maximum size of memory in the wasm module (in bytes). This is only
// relevant when ALLOW_MEMORY_GROWTH is set, as without growth, the size of
// INITIAL_MEMORY is the final size of memory anyhow.
//
// Note that the default value here is 2GB, which means that by default if you
// enable memory growth then we can grow up to 2GB but no higher. 2GB is a
// natural limit for several reasons:
//
//   * If the maximum heap size is over 2GB, then pointers must be unsigned in
//     JavaScript, which increases code size. We don't want memory growth builds
//     to be larger unless someone explicitly opts in to >2GB+ heaps.
//   * Historically no VM has supported more >2GB+, and only recently (Mar 2020)
//     has support started to appear. As support is limited, it's safer for
//     people to opt into >2GB+ heaps rather than get a build that may not
//     work on all VMs.
//
// To use more than 2GB, set this to something higher, like 4GB.
//
// (This option was formerly called WASM_MEM_MAX and BINARYEN_MEM_MAX.)
// [link]
var MAXIMUM_MEMORY = 2147483648;
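
For reference, opting into a larger heap would be a link-time Emscripten setting, along the lines of the flags below (illustrative only; the actual ORT Web build configures Emscripten through its own build scripts):

emcc ... -s ALLOW_MEMORY_GROWTH=1 -s MAXIMUM_MEMORY=4294967296 ...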

@josephrocca

@hanbitmyths Thanks for looking into this! A few questions:

  1. Is it possible to improve the error message here so it's not confusing for other/future users of the web runtime?
  2. Since the default 2GB choice is based on Emscripten glue code size, and since users of ONNX Runtime Web load vastly more data over the network for the .onnx model than for Emscripten's glue code, can this be switched to 4GB by default? (Note: there is some discussion here on setting the memory limit at load time, but in this case I think it would make more sense to just set 4GB as the max at build time.)
  3. Is it possible to load each session as a completely separate wasm module (with its own set of workers)? If not, how hard would this be to implement? This would be ideal because it would effectively remove the memory limit for most practical usage.
    • Would this also improve parallelization when running several inference sessions at once, each with only one thread? In my initial tests this doesn't seem to scale very well, and I suspect that's because all the inference sessions share the same wasm module.

@fs-eire fs-eire self-assigned this Mar 29, 2022

fs-eire commented Mar 29, 2022

1> Probably this can be improved by using a "debug" WebAssembly build (with -s ASSERTIONS=1). However, including another 4 .wasm files in the NPM package would significantly increase the package size, so this needs to be thought through carefully.

2> I see no harm in increasing max_size to 4GB.

3> This cannot be done with the current code, so code changes would be needed to make it happen. However, using a singleton wasm instance in one JS context is a design decision based on the goal of saving memory: there is no need to duplicate the memory used by ORT itself (and it also simplifies the implementation of state management). The only benefit seems to be bypassing the 4GB limit, which may be supported by wasm64 in the future.
> So far ort.min.js cannot be loaded in a web worker directly, so no parallelization happens across different models. It would need a change to allow the ORT Web JS to be loaded in a web worker.
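
For context, the worker-per-session pattern under discussion would look roughly like this once ort.min.js can run inside a worker (a sketch under that assumption; the importScripts step is exactly what doesn't work today):

// worker.js (hypothetical: assumes ort.min.js becomes loadable in a worker)
importScripts("./ort.min.js"); // each worker would get its own wasm module + heap
let session;
self.onmessage = async ({ data }) => {
  if (data.type === "init") {
    session = await ort.InferenceSession.create(data.modelUrl);
    self.postMessage({ type: "ready" });
  } else {
    // Note: feeds sent via postMessage would need to be rebuilt into
    // ort.Tensor objects after structured cloning.
    const results = await session.run(data.feeds);
    self.postMessage({ type: "results", results });
  }
};

// main.js: one worker (and thus one separate heap) per parallel session
const workers = Array.from({ length: 4 }, () => new Worker("worker.js"));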

josephrocca commented Mar 29, 2022

Thanks for looking into this! Would the changes required to support parallel execution of several sessions also allow those individual sessions to have their own separate 4 GB limits?

In this OpenAI CLIP demo I'm using ONNX Runtime Web to get the embeddings for a user-provided directory of images. If the user's machine has 16 GB of RAM and 16 threads, I'd love to be able to process images at up to ~16x the speed (the model takes ~400 MB of RAM IIRC, so there'd be RAM left over for the OS and other processes).

As an example, ideally it would be as simple as something like:

let session1 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "foo" }); // hypothetical option
let session2 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "foo" }); // shares "foo"'s workers/RAM
let session3 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "bar" }); // gets its own workers/RAM

So in this case session1 and session2 share workers and RAM, and session3 has its own workers and RAM. Each resource group would (if I understand correctly) only be able to execute a single model at a time, and would have up to 4GB of memory (until wasm64, at least). There'll likely be a better name than resourceGroup - this is just to illustrate.
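
Under that hypothetical API, runs in different resource groups could then execute truly in parallel, e.g. (feedsA/feedsB being model-specific input tensors, not shown):

// session1 and session3 are in different resource groups, so these two
// run() calls wouldn't contend for the same wasm module:
const [outA, outB] = await Promise.all([
  session1.run(feedsA),
  session3.run(feedsB),
]);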

@josephrocca

@fs-eire @hanbitmyths Wondering if there are any updates on this issue? The two main things are:

  • Raising Emscripten's MAXIMUM_MEMORY setting from 2GB to 4GB (here)
  • Allowing for multiple inference sessions to be created that run in parallel (i.e. each with their own web worker).

The first one seems like it just involves changing a single number, and it would unblock me and others on a few projects. Any chance a pull request would be accepted for that?

@guschmue

This should be fixed.
