[Performance] Support xgrammar for faster constrained decoding #1680
Nice work! Could you resolve the conflicts? Thanks!
Title changed: "outlines with xgrammar in constrained decoding" → "xgrammar for faster constrained decoding"
Thank you for your feedback! We’ve resolved the conflicts in the latest commits. By the way, we also have plans to implement a new version that will support both |
Isn't removing regex support a problem? I don't really mind for my own use cases, but JSON support was only added at the end of August with my PR #1125. Do you know whether some users still need regex-constrained decoding?
Wait, is there any link to the |
Hi! The |
Nice, looking forward to the release and to trying it out!
Hi! Thanks for your feedback.
You can make the import of outlines and xgrammar optional when they are not used.
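The optional-import suggestion could look roughly like the sketch below. This is not the PR's actual code: `backend_available` and `load_grammar_backend` are hypothetical helper names, and the sketch assumes both backends are importable as top-level modules named `outlines` and `xgrammar`.

```python
import importlib
import importlib.util

# Assumption: the two backends are top-level modules with these names.
SUPPORTED_BACKENDS = ("outlines", "xgrammar")


def backend_available(module_name: str) -> bool:
    """Return True if the module is installed, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def load_grammar_backend(name: str):
    """Import the requested backend lazily, only when it is selected.

    Hypothetical helper: the unselected backend is never imported,
    so it does not need to be installed.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    if not backend_available(name):
        raise ImportError(
            f"Backend {name!r} was selected but is not installed"
        )
    return importlib.import_module(name)
```

With this shape, a user who only selects one backend never triggers an import of the other, and a missing package fails at selection time with a clear message instead of at server start-up.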
Moved to #1752.
Motivation
We conducted experiments to compare the end-to-end performance of the outlines and xgrammar libraries in constrained decoding.
Experiment Setup
We ran the experiment with the following command:
For the dataset, we selected 389 out of 400 questions from bfcl_v3_simple.
Settings
Experiment Results
Latency refers to the end-to-end time for single requests and the average time for batch requests. Output tokens refer to the average number of tokens in the output.
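As a rough illustration of how these two metrics could be computed, here is a minimal sketch. It assumes "average time for batch requests" means the mean per-request end-to-end latency within the batch; `summarize` and its parameter names are hypothetical and not taken from the benchmark script.

```python
from statistics import mean


def summarize(latencies_s, output_token_counts):
    """Summarize one run: mean latency and mean output-token count.

    For a single request the lists hold one entry each, so the mean is
    just that request's end-to-end time. For a batch, the mean is taken
    over all requests (an assumption about the metric's definition).
    """
    if not latencies_s or len(latencies_s) != len(output_token_counts):
        raise ValueError("need one latency and one token count per request")
    return {
        "latency_s": mean(latencies_s),
        "output_tokens": mean(output_token_counts),
    }
```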
Modifications
We plan to support both xgrammar and outlines as the backend for constrained decoding in the future.
Checklist