Infinity embed crashes too easily #517
Comments
Same problem here.
Note:
40 GB does not make sense. However, 8192 tokens × 8 will cause decent utilization, which might be what you are seeing. The restriction would be to split batches with a max_num_tokens parameter. As this happens before tokenization, it would be a
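The batch splitting described above can be sketched client-side. This is a minimal illustration, not infinity's implementation: `count_tokens` here is a crude whitespace stand-in, and `max_num_tokens` is the budget parameter named in the comment above.

```python
def split_by_token_budget(texts, max_num_tokens,
                          count_tokens=lambda t: len(t.split())):
    """Greedily split texts into sub-batches whose total token count
    stays within max_num_tokens.

    count_tokens defaults to a whitespace split for illustration; the
    model's real tokenizer would give an accurate count.
    """
    batches, current, current_tokens = [], [], 0
    for text in texts:
        n = count_tokens(text)
        # Flush the current sub-batch if adding this text would exceed the budget.
        if current and current_tokens + n > max_num_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n
    if current:
        batches.append(current)
    return batches
```

A single document longer than the budget still forms its own sub-batch here, which is why the comment above also points out that a pre-tokenization restriction is only approximate.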
Same problem using bge-m3 for embedding and rerank: OOM. How do we deal with this? :(
I reduced the batch size and it stopped happening, but I don't think this is a good solution. I'm still looking for a better one.
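Reducing the batch size can also be done adaptively on the client side rather than fixing it globally. A minimal sketch, where `embed_fn` is a hypothetical stand-in for whatever client call sends a batch to the server; it halves the batch size whenever a request fails:

```python
def embed_with_backoff(texts, embed_fn, min_batch=1):
    """Embed texts, halving the batch size whenever embed_fn raises.

    embed_fn is a hypothetical stand-in for the actual client call
    (e.g. a POST to the server's embeddings endpoint).
    """
    results, batch_size = [], max(len(texts), min_batch)
    i = 0
    while i < len(texts):
        chunk = texts[i:i + batch_size]
        try:
            results.extend(embed_fn(chunk))
            i += len(chunk)  # advance only on success
        except Exception:
            if batch_size <= min_batch:
                raise  # even the smallest batch fails; give up
            batch_size = max(min_batch, batch_size // 2)
    return results
```

This keeps throughput high when the server is healthy while degrading gracefully under memory pressure, at the cost of retried work on failure.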
M3 should not have this issue at all. Can you send the logs here?
@michaelfeil Thanks for the reply. Could you share the options for
System Info
0.0.74
Information
Tasks
Reproduction
Initially, the GPU memory usage starts at just a few gigabytes. However, after running hundreds of calls, the memory consumption gradually increases to over 40GB, eventually resulting in an OOM (Out of Memory) error.
Expected behavior
The API should be robust enough to handle heavy usage without crashing or becoming unresponsive, as such issues hinder its usability and reliability. A potential solution could involve implementing a restriction, such as automatically truncating documents that exceed a specified size.
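Until such a restriction exists server-side, the proposed cap can be approximated before sending anything. A minimal sketch, assuming a purely character-based cap; `max_chars=8192` is an arbitrary illustrative value, not a limit taken from infinity, and a token-based cap using the model's tokenizer would track the real limit more closely:

```python
def truncate_documents(texts, max_chars=8192):
    """Cap each document at max_chars characters before embedding.

    max_chars=8192 is an illustrative default, not a limit defined
    by the server itself.
    """
    return [t[:max_chars] for t in texts]
```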