[Bug]: enable_prefix_caching leads to persistent illegal memory access error #6833
Comments
Can you share the exact prompts you are sending? This issue occurs sporadically, so detailed reproduction instructions would be very beneficial for us |
@robertgshaw2-neuralmagic thanks for the fast reply, here's the link to a file with 5000 prompts, generated as:

```python
with open('/Volumes/qa/tv_segmentation_bronze/misc/formatted_prompts.txt', 'w') as f:
    for item in formatted_prompts:
        f.write("%s\n" % item)
```

This is what went into the input:

```python
output = llm.generate(formatted_prompts, sampling_params)
```
 |
BTW @robertgshaw2-neuralmagic, if you have access to Databricks, one option to easily and fully reproduce the environment is running in a notebook on the 15.4 LTS ML Beta (15.4.x-gpu-ml-scala2.12) runtime, as that's where I ran it. |
@robertgshaw2-neuralmagic - regarding your comment about the prompt content above, any suggestions as to which properties of the prompts might be causing the error? I have rerun, re-using only the first prompt as an example:

```python
# other code as before
output = llm.generate([formatted_prompts[0]] * len(formatted_prompts), sampling_params)
```

and it completed fine. |
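A side note on that snippet (a minimal sketch in plain Python, with illustrative prompt strings): when `formatted_prompts` is a list of strings, multiplying the first element repeats the string itself into one long prompt, while wrapping it in a list first repeats the prompt as separate entries — the latter is what feeding n identical prompts to `llm.generate` requires.

```python
formatted_prompts = ["Tell me a joke.", "Summarize this text."]

# Indexing then multiplying concatenates the single string n times,
# producing ONE long prompt:
one_long_string = formatted_prompts[0] * 2

# Wrapping in a list first yields n separate, identical prompts:
repeated_prompts = [formatted_prompts[0]] * 2
```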
Marking; met the same problem |
Marking; met the same problem in v0.5.0.post1 |
Same |
Also seeing the same problem. I found that the issue arises when a cached prefill request is scheduled together with a non-cached request; the problem is gone if I force the scheduler to handle only one prefill request at a time. Still debugging. |
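Pending a fix in the scheduler, one client-side workaround consistent with the observation above is to submit prompts one at a time, so that a cached prefill is never batched with a non-cached one. A minimal sketch — `generate_one_at_a_time` is a hypothetical helper, and `FakeLLM` is a stand-in for `vllm.LLM` so the sketch runs without a GPU. This trades throughput for stability:

```python
def generate_one_at_a_time(llm, prompts, sampling_params):
    """Hypothetical workaround: submit each prompt as its own request so the
    engine never schedules two prefill requests in the same batch."""
    outputs = []
    for prompt in prompts:
        # llm.generate takes a list of prompts; pass a singleton list.
        outputs.extend(llm.generate([prompt], sampling_params))
    return outputs


class FakeLLM:
    """Stand-in for vllm.LLM, used only to keep this sketch self-contained."""
    def generate(self, prompts, sampling_params):
        return [f"completion for: {p}" for p in prompts]


results = generate_one_at_a_time(FakeLLM(), ["a", "b"], sampling_params=None)
```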
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you! |
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you! |
Your current environment
🐛 Describe the bug
After running the code I get an error. The error seems to happen randomly; sometimes I don't get an error running the same command in the same environment and versions.

I have done the following investigations and can confirm: setting `enable_prefix_caching=False` removes the error. The Python process exited with exit code 139 (SIGSEGV: Segmentation fault).

I have seen quite a few different issues with `enable_prefix_caching`; could anyone comment on whether the feature actually worked for them? We have a lot of 80-90% repetitive prompts in our use cases, so prefix caching provides a dramatic speed-up. Would be grateful for any suggestions!

Full error detail
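For context on why prefix caching helps so much with 80-90% repetitive prompts: vLLM's automatic prefix caching reuses KV-cache blocks whose entire token prefix matches an earlier request. A rough conceptual sketch of block-level prefix hashing follows — the function name, block handling, and hashing scheme are illustrative assumptions, not vLLM's actual implementation:

```python
import hashlib

def prefix_block_hashes(token_ids, block_size=16):
    """Illustrative only: hash each full block of a prompt, chaining in the
    previous block's hash so a block can be reused only when the whole
    prefix up to and including that block matches an earlier prompt."""
    hashes = []
    prev = ""
    n_full_blocks = len(token_ids) // block_size
    for i in range(n_full_blocks):
        block = token_ids[i * block_size:(i + 1) * block_size]
        prev = hashlib.sha256((prev + repr(block)).encode()).hexdigest()
        hashes.append(prev)
    return hashes

# Two prompts sharing their first 32 tokens share their first two block
# hashes, so the KV cache for those blocks could be reused; the block where
# they diverge hashes differently.
shared_prefix = list(range(32))
hashes_a = prefix_block_hashes(shared_prefix + [100] * 16)
hashes_b = prefix_block_hashes(shared_prefix + [200] * 16)
```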