
Add support for guided decoding (fixes #288) #2815

Closed
wants to merge 3 commits

Conversation

br3no (Contributor) commented Feb 8, 2024

This pull request extends api_server.py with optional guided decoding via regex and JSON schema, resolving issue #288.

The API is extended with the optional parameters regex and schema. The functionality is opt-in and must be activated via the CLI parameter --guided-decoding-engine, which defaults to None and accepts the value outlines.

The new module guided_decoding.py implements the integration of the outlines logits processors.

This pull request also changes the API of vLLM's LogitsProcessors to pass a sequence id for each sequence, in order to support stateful logits processors.
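To illustrate why a per-sequence id matters, here is a minimal, hypothetical sketch of a stateful logits processor (plain Python lists stand in for tensors, and none of these names come from vLLM's actual API): without a seq_id, a processor shared across sequences cannot track where each sequence is in its own constraint automaton.

```python
import math
from typing import Dict, List


class StatefulLogitsProcessor:
    """Illustrative sketch only: keeps per-sequence state keyed by
    seq_id, which is what the PR's API change makes possible."""

    def __init__(self, allowed_next: Dict[int, List[int]]):
        # allowed_next maps an automaton state to the token ids
        # permitted at that step; states missing from the map allow
        # every token.
        self.allowed_next = allowed_next
        self.state: Dict[int, int] = {}  # seq_id -> automaton state

    def __call__(self, seq_id: int, token_ids: List[int],
                 logits: List[float]) -> List[float]:
        state = self.state.get(seq_id, 0)
        allowed = set(self.allowed_next.get(state, range(len(logits))))
        # Mask disallowed tokens by setting their logits to -inf.
        masked = [x if i in allowed else -math.inf
                  for i, x in enumerate(logits)]
        self.state[seq_id] = state + 1  # advance this sequence only
        return masked
```

Because state is keyed by seq_id, two sequences decoded in the same batch (as with "n": 2 in the example below) each advance their own constraint state independently.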

Starting the api_server with the extra CLI argument --guided-decoding-engine outlines allows one to issue requests like these:

import requests

response = requests.post("http://hal9000:1984/generate", json={
    "prompt": "The best language for type-safe systems programming is ",
    "regex": "(Python|Java|C|C\\+\\+|C#|JavaScript|PHP|Swift|Go|Ruby|TypeScript|Kotlin|Rust)",
    "max_tokens": 10,
    "n": 2
})
response.json()

Producing

 {'text': ['The best language for type-safe systems programming is C++',
  'The best language for type-safe systems programming is Go']}

response = requests.post("http://hal9000:1984/generate", json={
    "prompt": """Return a json object for the following schema: {"type": "object", "properties": {"name": {"type": "string", "maxLength": 20}, "age": {"type": "integer"}}}""",
    "schema": {"type": "object", "properties": {"name": {"type": "string", "maxLength": 20}, "age": {"type": "integer"}}}
})
response.json()

Producing

{'text': ['Return a json object for the following schema: {"type": "object", "properties": {"name": {"type": "string", "maxLength" : 20}, "age": {"type": "integer"}}}: {"name": "John", "age": 30}']}
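Since the schema-constrained completion is guaranteed to be well-formed JSON, the generated object can be pulled out of the echoed text and parsed with the standard library alone. The backward brace scan below is just a simple heuristic for this example (it would be confused by braces inside string values), not part of the PR:

```python
import json


def last_json_object(text: str) -> dict:
    """Extract and parse the last balanced {...} object in a string."""
    end = text.rindex('}')  # final closing brace of the completion
    depth = 0
    for i in range(end, -1, -1):
        if text[i] == '}':
            depth += 1
        elif text[i] == '{':
            depth -= 1
            if depth == 0:
                return json.loads(text[i:end + 1])
    raise ValueError("no balanced JSON object found")


generated = ('Return a json object for the following schema: '
             '{"type": "object", "properties": {"name": {"type": "string", '
             '"maxLength" : 20}, "age": {"type": "integer"}}}: '
             '{"name": "John", "age": 30}')
obj = last_json_object(generated)  # -> {'name': 'John', 'age': 30}
```

With the regex and schema parameters enforced at decoding time, this parse cannot fail on the generated portion, which is the point of guided decoding.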

The changes here are partially inspired by the vLLM integration in outlines (https://github.com/outlines-dev/outlines/blob/main/outlines/serve/serve.py).

simon-mo self-assigned this Feb 8, 2024
esmeetu (Collaborator) commented Feb 9, 2024

Hi @br3no,
What's the difference between https://github.com/noamgat/lm-format-enforcer and outlines? As far as I know, both can produce JSON- and regex-structured output.

simon-mo (Collaborator) commented Feb 9, 2024

I think outlines has lower runtime overhead compared to lm-format-enforcer.

br3no (Contributor, Author) commented Feb 9, 2024

@esmeetu I actually have a branch with support for lm-format-enforcer. I chose not to include it in this PR because I haven't yet found a way to reach reasonable speed. You can follow the discussion here: noamgat/lm-format-enforcer#65 (comment)

br3no (Contributor, Author) commented Feb 13, 2024

@simon-mo let me know whether you feel this is going in the right direction, and whether there is anything I can do to help with the review process.

simon-mo (Collaborator) commented
Hi @br3no, thank you so much for this PR. However, I'm bummed to say that we are not going to add more functionality to the simple API server; instead, we're focusing complex features on the OpenAI-compatible server. I think we will end up merging #2819 (which is based on your commits, and you are one of the co-authors!). Any review on that PR is appreciated.

simon-mo closed this Feb 13, 2024
br3no (Contributor, Author) commented Feb 14, 2024

Hi @simon-mo, no worries. I'll chime in on PR #2819.
