[Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace #6971
@@ -295,7 +295,10 @@ def _configure_model_sampler_for_spec_decode(self):
         """
         (self.scorer_worker.model_runner.model.sampler.include_gpu_probs_tensor
          ) = True
+        (self.scorer_worker.model_runner.model.sampler.
+         should_modify_greedy_probs_inplace) = True
Comment on lines +298 to +299

Can we put this behind an interface so refactors to the worker / model / sampler don't spread everywhere?

Not sure if there is a clean way to do this, as the line above sets include_gpu_probs_tensor on the sampler in the same way.

Oh right.. sorry, I must have misread.

         self.proposer_worker.set_include_gpu_probs_tensor()
+        self.proposer_worker.set_should_modify_greedy_probs_inplace()

     def determine_num_available_blocks(self) -> Tuple[int, int]:
         """Determine the number of cache blocks to use.
Confused why this is here, why does this layer need it?

I'm not sure, I just grepped for all the places that include_gpu_probs_tensor was used and added the change there to keep semantics identical. See the lines above.