
[Feature]: Support Guided Decoding in LLM entrypoint #3536

Closed
simon-mo opened this issue Mar 20, 2024 · 15 comments · Fixed by #6878

Comments

@simon-mo
Collaborator

🚀 The feature, motivation and pitch

Currently we support guided decoding (JSON, regex, choice, and grammar) in the OpenAI-compatible inference server. It would be great to expose the same functionality in the offline interface as well.

https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters

Concretely, this would mean adding support here as a new parameter to the generate call, using the methods introduced in #2819.

Do make sure to add tests and examples.
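For context, vLLM's `SamplingParams` already accepts per-request `logits_processors` callables with the signature `(past_token_ids, logits) -> logits`, which is the hook guided decoding builds on. The sketch below is a toy choice-style processor illustrating that mechanism in pure Python (the function name and the list-based logits are illustrative, not vLLM internals, which operate on tensors):

```python
import math

def make_choice_processor(allowed_token_ids):
    """Toy logits processor that restricts sampling to an allowed token set.

    Mirrors the callable shape vLLM accepts in
    SamplingParams(logits_processors=[...]); real processors work on
    torch tensors, but the masking idea is the same.
    """
    allowed = set(allowed_token_ids)

    def processor(past_token_ids, logits):
        # Mask every token outside the allowed set with -inf so the
        # sampler can never pick it.
        return [score if i in allowed else -math.inf
                for i, score in enumerate(logits)]

    return processor
```

A guided-decoding parameter on the offline `generate` call would essentially construct such processors (JSON/regex/grammar-aware ones) on the user's behalf.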

Alternatives

No response

Additional context

No response

@kevinbu233
Contributor

Will work on this.

@BedirT

BedirT commented Apr 9, 2024

Any update on this?

@kevinbu233
Contributor

Still working on this; sorry about the delay.

@maxdebayser
Contributor

maxdebayser commented Jun 13, 2024

I'm trying to understand why this support is considered missing if you can already do this:

from vllm import LLM, SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor
from pydantic import BaseModel, conlist
import datetime as dt

class Output(BaseModel):
    names: conlist(str, max_length=5)
    organizations: conlist(str, max_length=5)
    locations: conlist(str, max_length=5)
    miscellaneous: conlist(str, max_length=5)

llm = LLM('mistralai/Mistral-7B-v0.1', max_model_len=10_000, gpu_memory_utilization=0.9)
logits_processor = JSONLogitsProcessor(schema=Output, llm=llm.llm_engine)
logits_processor.fsm.vocabulary = list(logits_processor.fsm.vocabulary)
prompt = """
Locate all the names, organizations, locations and other miscellaneous entities in the following sentence: 
"Charles went and saw Anna at the coffee shop Starbucks, which was based in a small town in Germany called Essen."
"""
sampling_params = SamplingParams(max_tokens=128, temperature=0, logits_processors=[logits_processor])

t0 = dt.datetime.now()
llm.generate([prompt] * 256, sampling_params=sampling_params)
time_elapsed = (dt.datetime.now() - t0).total_seconds()
print(f"Generation took {time_elapsed:,} seconds.")

(Example taken from #3087).

Is the point of this issue to make the use of guided decoding more intuitive for the user?

@simon-mo
Collaborator Author

Yes, just for ease of use, and as a way to better reset the processor.
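The reset point matters because FSM-backed processors (like the outlines one in the example above) carry per-request decoding state. The toy class below is a stand-in, not the outlines API, showing why reusing one instance across requests without a reset leaks stale state:

```python
class StatefulProcessor:
    """Toy stand-in for an FSM-backed logits processor.

    It tracks per-sequence decoding state, so reusing one instance
    across requests without a reset carries stale state into the next
    request. Illustrative only; not the outlines/vLLM API.
    """

    def __init__(self):
        self.fsm_state = 0  # position in the constraint automaton

    def __call__(self, past_token_ids, logits):
        # Advance internal state once per generated token.
        self.fsm_state = len(past_token_ids)
        return logits

    def reset(self):
        # Must run between requests, or the next sequence starts
        # mid-automaton.
        self.fsm_state = 0
```

A first-class guided-decoding parameter lets the engine manage this lifecycle instead of the user.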

@datalee

datalee commented Nov 13, 2024

How do I use this in the OpenAI-compatible server API?

@DarkLight1337
Member

Please use the extra parameters listed here

@datalee

datalee commented Nov 13, 2024

> Please use the extra parameters listed here

Is there a version limit? Also, is there a complete call example? Thank you.

@DarkLight1337
Member

DarkLight1337 commented Nov 13, 2024

Sorry, I don't have time to come up with a full example. You can take a look at the tests (e.g. tests/entrypoints/openai/test_chat.py) for some inspiration.

As for the version limit, you can set the version of the docs via the bottom right menu and find the earliest one that documents these parameters.

@datalee

datalee commented Nov 14, 2024

> Sorry, I don't have time to come up with a full example. You can take a look at the tests (e.g. tests/entrypoints/openai/test_chat.py) for some inspiration.
>
> As for the version limit, you can set the version of the docs via the bottom right menu and find the earliest one that documents these parameters.

OK, thanks.
When using version 0.4.2 (https://docs.vllm.ai/en/v0.4.2/serving/openai_compatible_server.html), an error is reported:
{'object': 'error', 'message': "[{'type': 'extra_forbidden', 'loc': ('body', 'extra_body'), 'msg': 'Extra inputs are not permitted', 'input': {'guided_json': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'skills': {'type': 'array', 'items': {'type': 'string', 'maxLength': 10}, 'minItems': 3}, 'work_history': {'type': 'array', 'items': {'type': 'object', 'properties': {'company': {'type': 'string'}, 'duration': {'type': 'number'}, 'position': {'type': 'string'}}, 'required': ['company', 'position']}}}, 'required': ['name', 'age', 'skills', 'work_history']}, 'guided_decoding_backend': 'outlines'}}]", 'type': 'BadRequestError', 'param': None, 'code': 400}

input:
{'model': 'qwen1_5_7b', 'temperature': 0.3, 'top_p': 0.3, 'max_tokens': 1024, 'messages': [{'role': 'system', 'content': 'you are a helpful assistant'}, {'role': 'user', 'content': "Give an example JSON for an employee profile that fits this schema: {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'skills': {'type': 'array', 'items': {'type': 'string', 'maxLength': 10}, 'minItems': 3}, 'work_history': {'type': 'array', 'items': {'type': 'object', 'properties': {'company': {'type': 'string'}, 'duration': {'type': 'number'}, 'position': {'type': 'string'}}, 'required': ['company', 'position']}}}, 'required': ['name', 'age', 'skills', 'work_history']}"}], 'extra_body': {'guided_json': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'}, 'skills': {'type': 'array', 'items': {'type': 'string', 'maxLength': 10}, 'minItems': 3}, 'work_history': {'type': 'array', 'items': {'type': 'object', 'properties': {'company': {'type': 'string'}, 'duration': {'type': 'number'}, 'position': {'type': 'string'}}, 'required': ['company', 'position']}}}, 'required': ['name', 'age', 'skills', 'work_history']}, 'guided_decoding_backend': 'outlines'}}

@DarkLight1337
Member

How are you accessing the endpoint? extra_body is only for OpenAI client.
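This distinction is worth spelling out: the OpenAI Python client merges the keys of `extra_body` into the top level of the request JSON before sending, so a raw HTTP caller must place `guided_json` and `guided_decoding_backend` at the top level directly, never nested under an `"extra_body"` key (nesting them is exactly what produces the `extra_forbidden` error above). A minimal sketch of the wire format, with an abbreviated schema for illustration:

```python
import json

# Base chat-completion request, as any OpenAI-compatible client builds it.
base = {
    "model": "qwen1_5_7b",
    "messages": [{"role": "user", "content": "Give an employee profile as JSON"}],
}

# What you would pass as extra_body to the OpenAI Python client.
extra_body = {
    "guided_json": {"type": "object", "properties": {"name": {"type": "string"}}},
    "guided_decoding_backend": "outlines",
}

# The client flattens extra_body into the payload; a raw HTTP caller
# must POST this merged dict, with no nested "extra_body" key.
payload = {**base, **extra_body}
body = json.dumps(payload)
```

So with `curl` or `requests`, send `payload` as the JSON body; `extra_body` only exists as a client-side convenience.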

@datalee

datalee commented Nov 20, 2024

> How are you accessing the endpoint? extra_body is only for OpenAI client.

Yes, my endpoint is the vLLM OpenAI server, e.g. xxxx/v1/chat/completions

@DarkLight1337
Member

Can you show your code?

@2U1

2U1 commented Dec 6, 2024

@DarkLight1337

SQL_GRAMMER = """
    ?start: select_statement

    ?select_statement: "SELECT " column_list " FROM " table_name (join_clause)? (where_clause)? (order_by_clause)? (limit_clause)?

    ?column_list: column_name ("," column_name)*

    ?table_name: identifier

    ?join_clause: "JOIN " table_name " ON " condition

    ?where_clause: "WHERE " condition

    ?order_by_clause: "ORDER BY " column_name ("ASC" | "DESC")?

    ?limit_clause: "LIMIT " INT

    ?condition: identifier "=" value

    ?column_name: identifier
    ?value: STRING | INT

    ?identifier: /[a-zA-Z_][a-zA-Z0-9_]*/
    ?STRING: /'[^']*'/
    ?INT: /[0-9]+/
"""

chat_completion = client.chat.completions.create(
    messages=[
        {"role": "system", "content": system_prompt},
        {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": user_prompt
            }
        ],
        }
    ],
    model=model,
    temperature=0,
    top_p=0.1,
    extra_body={
        "guided_grammer": SQL_GRAMMER
    },
)

I'm having the same issue with it. I'm using v0.6.4.post1.

@DarkLight1337
Member

> I'm having the same issue with it. I'm using v0.6.4.post1.

You made a typo. The field is called "guided_grammar", not "guided_grammer".
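Typos like this fail server-side with an opaque 400, so a small client-side check can catch them before sending. This helper is hypothetical (not part of vLLM or the OpenAI client), and its allowed-name list mirrors a subset of vLLM's documented extra parameters:

```python
# Hypothetical sanity check for guided-decoding field names in extra_body.
# The set below covers a subset of the extra parameters vLLM documents.
ALLOWED = {
    "guided_json",
    "guided_regex",
    "guided_choice",
    "guided_grammar",
    "guided_decoding_backend",
}

def check_extra_body(extra_body):
    """Raise early on misspelled guided-decoding keys (e.g. 'guided_grammer')."""
    unknown = set(extra_body) - ALLOWED
    if unknown:
        raise ValueError(f"unknown guided-decoding fields: {sorted(unknown)}")
    return extra_body
```

With this, `extra_body={"guided_grammer": ...}` fails locally with a clear message instead of a server-side BadRequestError.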
