[Feature]: Support Guided Decoding in LLM entrypoint #3536
Comments
Will work on this. |
Any update on this? |
still working on this, sorry about the delay |
I'm trying to understand why this support is considered missing if you can already do this:

```python
from vllm import LLM, SamplingParams
from outlines.serve.vllm import JSONLogitsProcessor
from pydantic import BaseModel, conlist
import datetime as dt

class Output(BaseModel):
    names: conlist(str, max_length=5)
    organizations: conlist(str, max_length=5)
    locations: conlist(str, max_length=5)
    miscellanous: conlist(str, max_length=5)

llm = LLM('mistralai/Mistral-7B-v0.1', max_model_len=10_000, gpu_memory_utilization=0.9)
logits_processor = JSONLogitsProcessor(schema=Output, llm=llm.llm_engine)
logits_processor.fsm.vocabulary = list(logits_processor.fsm.vocabulary)

prompt = """
Locate all the names, organizations, locations and other miscellaneous entities in the following sentence:
"Charles went and saw Anna at the coffee shop Starbucks, which was based in a small town in Germany called Essen."
"""

sampling_params = SamplingParams(max_tokens=128, temperature=0, logits_processors=[logits_processor])

t0 = dt.datetime.now()
llm.generate([prompt] * 256, sampling_params=sampling_params)
time_elapsed = (dt.datetime.now() - t0).total_seconds()
print(f"Generation took {time_elapsed:,} seconds.")
```

(Example taken from #3087.) Is the point of this issue to make the use of guided decoding more intuitive for the user? |
Yes, just for ease of use, and to provide a better way to reset the processor. |
How do I use this in the OpenAI API ([OpenAI Compatible Server])? |
Please use the extra parameters listed here: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters |
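For anyone landing here later, a minimal sketch of what passing those extra parameters through the OpenAI Python client could look like; the base URL, model name, and schema below are placeholders assumed for illustration, not values from this thread:

```python
# Sketch only: server URL, model name, and schema are assumed placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    messages=[{"role": "user", "content": "Describe a fictional person as JSON."}],
    # vLLM-specific extra parameters go in extra_body; guided_regex,
    # guided_choice, and guided_grammar are passed the same way.
    extra_body={"guided_json": schema},
)
print(completion.choices[0].message.content)
```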
Is there a version requirement? Also, is there a complete call example? Thank you. |
Sorry, I don't have time to come up with a full example. You can take a look at the tests (e.g. …). As for the version limit, you can set the version of the docs via the bottom-right menu and find the earliest one that documents these parameters. |
OK, thanks. Input: |
How are you accessing the endpoint? |
Yes, my endpoint is the vLLM OpenAI server, like: xxxx/v1/chat/completions |
Can you show your code? |
I'm having the same issue with it. I'm using 0.6.4post1 |
You made a typo. The field is called |
🚀 The feature, motivation and pitch
Currently we support guided decoding (JSON, regex, choice, grammar, and arbitrary JSON) in the OpenAI-compatible inference server. It would be great to expose the same functionality in the offline interface as well.
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#extra-parameters
Concretely, this would mean adding the support here as a new parameter to the generate call, using the methods introduced in #2819.
Do make sure to add tests and examples.
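For illustration, a rough sketch of what the requested offline interface might eventually look like; the `GuidedDecodingParams` class and the `guided_decoding` field on `SamplingParams` are assumptions made for this example, not a design decided in this issue:

```python
# Hypothetical sketch of the requested offline API; the names below are
# assumptions, not a confirmed vLLM interface.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams  # assumed module/name

llm = LLM("mistralai/Mistral-7B-v0.1")

# Constrain output to one of a fixed set of choices, mirroring the
# guided_choice extra parameter of the OpenAI-compatible server.
params = SamplingParams(
    temperature=0,
    guided_decoding=GuidedDecodingParams(choice=["positive", "negative"]),
)

outputs = llm.generate(["The sentiment of 'I love this movie' is"], params)
print(outputs[0].outputs[0].text)
```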
Alternatives
No response
Additional context
No response