
benchmark json schema #2030

Merged: 12 commits, Nov 15, 2024
Conversation

DarkSharpness
Contributor

Motivation

Modifications

Add a simple benchmark for JSON-schema-constrained decoding, based on the NousResearch/json-mode-eval dataset.
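The core loop of such a benchmark can be sketched as follows. This is a minimal stand-alone sketch, not the PR's actual `bench_sglang.py`: `generate_with_schema` is a hypothetical placeholder for the real constrained-decoding call, and the real script counts output tokens rather than characters.

```python
import json
import time

def generate_with_schema(prompt: str, schema: dict) -> str:
    # Hypothetical stand-in for the model call; the real benchmark issues
    # an sglang request whose output is constrained by the JSON schema.
    return json.dumps({k: "example" for k in schema.get("properties", {})})

def run_benchmark(samples):
    """Time schema-constrained generation over (prompt, schema) pairs."""
    start = time.perf_counter()
    total_output_chars = 0  # the real benchmark would count tokens instead
    for prompt, schema in samples:
        out = generate_with_schema(prompt, schema)
        json.loads(out)  # sanity check: output must be valid JSON
        total_output_chars += len(out)
    latency = time.perf_counter() - start
    return latency, total_output_chars

samples = [
    ("Describe a user.", {"properties": {"name": {}, "age": {}}}),
    ("Describe a city.", {"properties": {"city": {}}}),
]
latency, total = run_benchmark(samples)
print(f"latency={latency:.3f}s, output_chars={total}")
```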

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@merrymercy
Contributor

Please fix the lint error and paste some benchmark results here. Then we can merge this!

@DarkSharpness
Contributor Author

> Please fix the lint error and paste some benchmark results here. Then we can merge this!

Sure! Here are some experimental results from my machine:

Configuration: AMD EPYC 7302 16-Core Processor + NVIDIA A100 40G

| Settings | Latency (s) | Overall output tokens |
| --- | --- | --- |
| outlines + disk_cache + jump_forward | 59.788 | 5580 |
| outlines + disk_cache - jump_forward | 57.645 | 5081 |
| outlines - disk_cache + jump_forward | 456.94 | 5580 |
| outlines - disk_cache - jump_forward | 451.79 | 5081 |
  1. The overall output tokens should be identical in all cases, but due to this bug the actual outputs differ slightly.
  2. `disk_cache` means the compiled regex/JSON schema is already in the disk cache before the run, so it does not need to be recompiled at runtime (which yields a significant speedup).
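The disk-cache idea can be illustrated with a small self-contained sketch. This is not outlines' actual cache implementation; `compile_schema_to_regex` is a hypothetical placeholder for the expensive compilation step, and a fresh temp directory stands in for the persistent on-disk cache.

```python
import hashlib
import json
import os
import pickle
import tempfile

# Fresh cache directory for this demo; a real cache would use a fixed path
# so compiled artifacts survive across runs.
CACHE_DIR = tempfile.mkdtemp(prefix="schema_cache_demo_")

def disk_cached(fn):
    """Memoize fn's result on disk, keyed by a hash of its JSON-serializable argument."""
    def wrapper(arg):
        key = hashlib.sha256(json.dumps(arg, sort_keys=True).encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):        # cache hit: skip recompilation
            with open(path, "rb") as f:
                return pickle.load(f)
        result = fn(arg)                # cache miss: compile once...
        with open(path, "wb") as f:     # ...and persist for later calls
            pickle.dump(result, f)
        return result
    return wrapper

calls = []

@disk_cached
def compile_schema_to_regex(schema):
    calls.append(schema)                # track how often we actually "compile"
    return "<compiled regex for %d properties>" % len(schema.get("properties", {}))

schema = {"properties": {"name": {}, "age": {}}}
first = compile_schema_to_regex(schema)
second = compile_schema_to_regex(schema)  # served from disk, no recompilation
print(first == second, len(calls))        # → True 1
```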

@zhyncs
Member

zhyncs commented Nov 15, 2024

> disk_cache means that the compiled regex/json_schema is already in disk cache before running, so we don't need to recompile that when running (which results in a significant speed up).

This is basically done in all current implementations, including TensorRT-LLM, and disk_cache should be enabled by default.

@zhyncs
Member

zhyncs commented Nov 15, 2024

BTW, how about this: https://github.com/guidance-ai/llgtrt?
It can reduce the first-time compilation overhead.

@DarkSharpness
Contributor Author

> BTW how about this https://github.com/guidance-ai/llgtrt It can reduce the first time compilation overhead.

Thanks for the suggestion!

Currently, we plan to support xgrammar as a faster alternative grammar backend.

Maybe we can give llgtrt a try in future PRs.

@merrymercy
Contributor

@zhyncs disk cache is turned on by default. xgrammar will solve the preprocessing slowness.

@merrymercy merrymercy merged commit 954f4e6 into sgl-project:main Nov 15, 2024
1 check passed