
[web] "Uncaught (in promise) 1991937888" or "RuntimeError: abort(undefined)" when loading multiple sessions of a large model #10957

Closed
josephrocca opened this issue Mar 20, 2022 · 6 comments
Labels: platform:web (issues related to ONNX Runtime web; typically submitted using template)

josephrocca commented Mar 20, 2022

Describe the bug
When trying to initialize several sessions of a large model, I get errors that aren't very helpful. I can create 4 sessions, but it errors when I try to create a 5th. The model is about 350 MB. I'm using the wasm backend, since the WebGL backend doesn't work for this model due to operator compatibility/support problems (#10031).
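
The failing loop is essentially the following (a minimal sketch; the hosted repro under "To Reproduce" below does roughly this, and modelUrl is a stand-in for the ~350 MB model):

const modelUrl = "model.onnx"; // stand-in for the ~350 MB .onnx model
const sessions = [];
for (let i = 1; i <= 8; i++) {
  try {
    sessions.push(await ort.InferenceSession.create(modelUrl));
    console.log(`created session ${i}`);
  } catch (e) {
    // Around the 5th session the rejection value is just a number
    // (e.g. 1991937888) or "RuntimeError: abort(undefined)".
    console.error(`session ${i} failed:`, e);
    break;
  }
}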

Urgency
Not super urgent.

System information

To Reproduce
Minimal reproduction: https://josephrocca.github.io/clip-image-sorter/debug-onnx-several-image-sessions-at-once.html

Expected behavior
I expect to be able to create as many sessions as the browser's memory limits allow. Memory usage according to Chrome's Task Manager is about ~2GB when the error is thrown. The browser allows much more than ~2GB when memory is allocated using ArrayBuffers (up to 16GB in Chrome according to this answer), so I'm pretty sure this isn't hitting a browser memory limit, and the wasm module memory limit is 4GB until we get wasm64/memory64, IIUC. But my knowledge here is limited.
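
A rough way to probe that non-wasm ArrayBuffer headroom (sketch only, not from the repro page; results vary by browser and machine, and some engines reserve ArrayBuffer pages lazily):

// Allocate 512 MB ArrayBuffers until the browser refuses.
const chunks = [];
try {
  while (chunks.length < 64) { // safety cap at 32 GB
    chunks.push(new ArrayBuffer(512 * 1024 * 1024));
  }
} catch (e) {
  console.log(`allocation failed after ${chunks.length / 2} GB:`, e);
}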

Screenshots
[screenshot: console output showing the "Uncaught (in promise)" / "RuntimeError: abort(undefined)" errors]

Additional context
The reason I'm trying to create several instances of the same model is that I'm using it to process a large folder of images, and I'd like to process several images at once. It seems like performance will scale better with several sessions of one thread each than with a single session using all the threads (see the sketch below).
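
A sketch of that setup with the current API (note that ort.env.wasm.numThreads is a global setting for the shared wasm module, so "one thread per session" can only be approximated today; images is an assumed array of inputs, and buildFeeds is a hypothetical helper that turns an image into the model's input tensors):

ort.env.wasm.numThreads = 1; // global: applies to every session on this page

const sessions = await Promise.all(
  Array.from({ length: 4 }, () => ort.InferenceSession.create(modelUrl))
);

// Dispatch one image per session concurrently.
const outputs = await Promise.all(
  images.slice(0, 4).map((img, i) => sessions[i].run(buildFeeds(img)))
);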

@RandySheriffH RandySheriffH added the core runtime issues related to core runtime label Mar 21, 2022
@hanbitmyths hanbitmyths added component:ort-web platform:web issues related to ONNX Runtime web; typically submitted using template and removed core runtime issues related to core runtime labels Mar 21, 2022
@hanbitmyths

Per the comment below (from Emscripten's settings.js), the maximum wasm memory is 2GB by default unless the module is built with the MAXIMUM_MEMORY option. ONNX Runtime Web is not built with that option, so it is limited to 2GB of memory.

// Set the maximum size of memory in the wasm module (in bytes). This is only
// relevant when ALLOW_MEMORY_GROWTH is set, as without growth, the size of
// INITIAL_MEMORY is the final size of memory anyhow.
//
// Note that the default value here is 2GB, which means that by default if you
// enable memory growth then we can grow up to 2GB but no higher. 2GB is a
// natural limit for several reasons:
//
//   * If the maximum heap size is over 2GB, then pointers must be unsigned in
//     JavaScript, which increases code size. We don't want memory growth builds
//     to be larger unless someone explicitly opts in to >2GB+ heaps.
//   * Historically no VM has supported more >2GB+, and only recently (Mar 2020)
//     has support started to appear. As support is limited, it's safer for
//     people to opt into >2GB+ heaps rather than get a build that may not
//     work on all VMs.
//
// To use more than 2GB, set this to something higher, like 4GB.
//
// (This option was formerly called WASM_MEM_MAX and BINARYEN_MEM_MAX.)
// [link]
var MAXIMUM_MEMORY = 2147483648;
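
For reference, opting into a larger heap would be a link-time Emscripten setting, along the lines of the flags below (illustrative only; the actual ORT Web build configures Emscripten through its own build scripts):

emcc ... -s ALLOW_MEMORY_GROWTH=1 -s MAXIMUM_MEMORY=4294967296 ...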

@josephrocca

@hanbitmyths Thanks for looking into this! A few questions:

  1. Is it possible to improve the error message here so it's not confusing for other/future users of the web runtime?
  2. Since the default 2GB choice is based on Emscripten glue code size, and since users of ONNX Runtime Web load vastly more data over the network for the .onnx model than for Emscripten's glue code, can this be switched to 4GB by default? (Note: there is some discussion here on setting the memory limit at load time, but in this case I think it would make more sense to just set 4GB as the max at build time.)
  3. Is it possible to load each session as a completely separate wasm module (with its own set of workers)? If not, how hard would this be to implement? This would be ideal because it would effectively remove the memory limit for most practical usage.
    • Would this also improve parallelization when running several inference sessions at once, each with only one thread? In my initial tests this doesn't seem to scale very well, and I suspect that's because all the inference sessions share the same wasm module.

@fs-eire fs-eire self-assigned this Mar 29, 2022

fs-eire commented Mar 29, 2022

1> Probably this can be improved by using a "debug" WebAssembly build (with -s ASSERTIONS=1). However, including another 4 .wasm files in the NPM package would significantly increase the package size, so this needs to be thought through carefully.

2> I see no harm in increasing max_size to 4GB.

3> This cannot be done with the current code, so code changes would be needed to make it happen. However, using a singleton wasm instance in one JS context is a design decision based on the goal of saving memory: there is no need to duplicate the memory used by ORT itself (and it also simplifies the implementation of state management). The only benefit seems to be bypassing the 4GB limit, which may be supported by wasm64 in the future.
> So far ort.min.js cannot be loaded in a web worker directly, so no parallelization happens across different models. It would need a change to allow the ORT Web JS to be loaded in a web worker.
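
For context, the worker-per-session pattern under discussion would look roughly like this once ort.min.js can run inside a worker (a sketch under that assumption; the importScripts step is exactly what doesn't work today):

// worker.js (hypothetical: assumes ort.min.js becomes loadable in a worker)
importScripts("./ort.min.js"); // each worker would get its own wasm module + heap
let session;
self.onmessage = async ({ data }) => {
  if (data.type === "init") {
    session = await ort.InferenceSession.create(data.modelUrl);
    self.postMessage({ type: "ready" });
  } else {
    // Note: feeds sent via postMessage would need to be rebuilt into
    // ort.Tensor objects after structured cloning.
    const results = await session.run(data.feeds);
    self.postMessage({ type: "results", results });
  }
};

// main.js: one worker (and thus one separate heap) per parallel session
const workers = Array.from({ length: 4 }, () => new Worker("worker.js"));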

josephrocca commented Mar 29, 2022

Thanks for looking into this! Would the changes required to support parallel execution of several sessions also allow those individual sessions to have their own separate 4 GB limits?

In this OpenAI CLIP demo I'm using ONNX Runtime Web to get the embeddings for a user-provided directory of images. If the user's machine has 16 GB of RAM and 16 threads, I'd love to be able to process images at up to ~16x the speed (the model takes ~400 MB of RAM IIRC, so there'd be RAM left over for the OS and other processes).

As an example, ideally it would be as simple as something like:

let session1 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "foo" }); // hypothetical option
let session2 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "foo" }); // shares "foo"'s workers/RAM
let session3 = await ort.InferenceSession.create(imageModelUrl, { resourceGroup: "bar" }); // gets its own workers/RAM

So in this case session1 and session2 share workers and RAM, and session3 has its own workers and RAM. Each resource group would (if I understand correctly) only be able to execute a single model at a time, and would have up to 4GB of memory (until wasm64, at least). There'll likely be a better name than resourceGroup - this is just to illustrate.
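
Under that hypothetical API, runs in different resource groups could then execute truly in parallel, e.g. (feedsA/feedsB being model-specific input tensors, not shown):

// session1 and session3 are in different resource groups, so these two
// run() calls wouldn't contend for the same wasm module:
const [outA, outB] = await Promise.all([
  session1.run(feedsA),
  session3.run(feedsB),
]);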

@josephrocca

@fs-eire @hanbitmyths Wondering if there are any updates on this issue? The two main things are:

  • Raising Emscripten's MAXIMUM_MEMORY setting from 2GB to 4GB (here)
  • Allowing for multiple inference sessions to be created that run in parallel (i.e. each with their own web worker).

The first one seems like it just involves changing a single number, and it would unblock me and others on a few projects. Any chance a pull request would be accepted for that?

@guschmue

This should be fixed.
