-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Frontend] API support for beam search for MQLLMEngine #9117
Changes from 7 commits
5add5d7
1375b59
b57bff2
8384451
f9843df
a33ea39
ac5520c
b3c5d05
b22e8bd
097479d
44334ee
4918f6a
e65ed31
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
|
||
from vllm.config import ModelConfig | ||
from vllm.engine.async_llm_engine import AsyncLLMEngine | ||
from vllm.engine.multiprocessing.client import MQLLMEngineClient | ||
from vllm.engine.protocol import EngineClient | ||
from vllm.entrypoints.logger import RequestLogger | ||
# yapf conflicts with isort for this block | ||
|
@@ -150,15 +151,16 @@ async def create_completion( | |
log_tracing_disabled_warning() | ||
|
||
if isinstance(sampling_params, BeamSearchParams): | ||
if not isinstance(self.engine_client, AsyncLLMEngine): | ||
raise ValueError( | ||
"Beam search in the API server is only supported" | ||
" with AsyncLLMEngine. please add " | ||
"`--disable-frontend-multiprocessing` to " | ||
"use beam search.") | ||
assert isinstance(self.engine_client, | ||
(AsyncLLMEngine, | ||
MQLLMEngineClient)), \ | ||
"Beam search is only supported with" \ | ||
"AsyncLLMEngine and MQLLMEngineClient." | ||
generator = self.engine_client.beam_search( | ||
prompt_inputs["prompt_token_ids"], request_id_item, | ||
sampling_params) | ||
prompt_inputs["prompt_token_ids"], | ||
request_id, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ahh yeah it's supposed to be |
||
sampling_params, | ||
) | ||
else: | ||
generator = self.engine_client.generate( | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, you don't actually need the
assert
You can
beam_search
to theEngineClientProtocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the base
EngineClientProtocol
has abeam_search
function. I can add it in though.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not. But the
EngineClientProtocol
defines the behavior ofAsyncLLMEngine
andMQLLMEngine
. So now that both support it, you can expandEngineClientProtocol
to include thebeam_search
apiThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot find
EngineClientProtocol
. does it exist now? @robertgshaw2-neuralmagicThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, it should be
EngineClient
class. butMQLLMEngineClient
does not inherit fromEngineClient
. we can make it a future step to absorb beam search implementation into theEngineClient
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EngineClient is a protocol. MQLLMEngine should inherit from this. If it doesn’t, I’ll submit a PR to make it (since we support the full API). On train so AFK
Either way, we are about to collapse MQLLMEngine and AsyncLLMEngine once we have PP working, so the concept of an EngineClient will be removed once this is done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good. I'll go ahead and merge this pr after it is ready. and after you make MQLLMEngine inherit from EngineClient, we can merge separate beam search implementation in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that it's currently a
Protocol
,MQLLMEngine
technically doesn't need to subclass it directly. But it would probably be good to anyhow and actually we could consider changing it to an ABC instead.I agree with @robertgshaw2-neuralmagic that this method should just be added to
EngineClient
though and we should not need these type assertions.Not directly related to this PR but I also think we should consider renaming it to something like
AsyncEngineClient
, and have a way to obtain an instance of anAsyncEngineClient
which doesn't involve explicit construction. And that would replace explicit use ofAsyncLLMEngine
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes are welcome on this part!