[Feature]: obtain logits #11397
Comments
A bit of a hack, but right now you can initialize llm = LLM(..., task="embed", override_pooler_config=PoolerConfig(pooling_type="ALL")) and call llm.encode() to get the per-token hidden states. Note that this is not the intended usage of the API.
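A minimal sketch of that workaround follows. It assumes PoolerConfig is importable from vllm.config, that pooling_type="ALL" makes llm.encode() return one vector per prompt token, and uses an example model name; the output attribute (.outputs.data vs .outputs.embedding) varies between vLLM versions.

```python
# Hypothetical sketch of the "embed" workaround described above.
# The model name and the output attribute are assumptions, not from the thread.
from vllm import LLM
from vllm.config import PoolerConfig

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",  # example model
    task="embed",
    override_pooler_config=PoolerConfig(pooling_type="ALL"),
)

outputs = llm.encode(["The member's date of birth is"])
hidden_states = outputs[0].outputs.data  # per-token hidden states (version-dependent attribute)
print(getattr(hidden_states, "shape", None))  # roughly (num_prompt_tokens, hidden_size)
```

Note that these are hidden states, not logits; mapping them through the LM head is what the compute_logits discussion further down the thread covers.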
Thank you for your response! But I am wondering if there can be a way to both generate and return the logits. Since you already know all the logits during the generation process, obtaining them from another instance seems inefficient. Maybe there could be a field, like logprobs, that returns the top-k logit values. Still, thanks for the temporary workaround.
For those who have also come across this problem: I found the key logic in vllm/model_executor/layers/sampler.py, lines 265 to 275 at commit 5bfb30a.
You can do anything you want to the logits there, and if you want to return them, you can modify lines 313 to 322 of the same file.
For example, replace the logprobs with the logits. The returned objects are dicts of Logprob, so you can overwrite their fields in place:

```python
# after sample_logprobs has been built in sampler.py:
# stuff whatever you want to expose (e.g. raw logits) into the Logprob objects
for val, lst in zip(something, sample_logprobs):
    for d in lst:
        for k in d.keys():
            d[k].logprob = anything_you_want_to_obtain
```

This is a bit destructive, but it works more efficiently than running another model.
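If you go down this route, whatever you stuff into the Logprob objects comes back through the normal generate() output. A small sketch of reading it back, using an example model name (the RequestOutput/CompletionOutput/Logprob field names are the public API):

```python
# Sketch: reading back the values stored in the Logprob objects after the
# sampler hack above. With logprobs=k, each generated token carries a dict
# mapping token_id -> Logprob, whose .logprob field now holds the hacked-in value.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct")  # example model
params = SamplingParams(temperature=0.0, logprobs=5, max_tokens=16)

for request_output in llm.generate(["The member's date of birth is"], params):
    completion = request_output.outputs[0]
    for token_id, top_k in zip(completion.token_ids, completion.logprobs):
        print(token_id, {t: lp.logprob for t, lp in top_k.items()})
```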
Kindly help me on this! This is the outcome for logprobs=5 from the qwen2-vl-7b model. Every position has the same shape; the first one is:

```
{785: Logprob(logprob=0.0, rank=1, decoded_token='The'), 2: Logprob(logprob=-inf, rank=2, decoded_token='#'), 0: Logprob(logprob=-inf, rank=3, decoded_token='!'), 3: Logprob(logprob=-inf, rank=4, decoded_token='$'), 1: Logprob(logprob=-inf, rank=5, decoded_token='"')}
```

Every subsequent position follows the identical pattern: the rank-1 token has logprob=0.0, and ranks 2-5 are always '#', '!', '$', '"' with logprob=-inf. The rank-1 tokens in order are ' member', "'s", ' date', ' of', ' birth', ' (', 'DOB', ')', ' is', ' ', '2', '7', '/', '0', '8', '/', '1', '9', '6', '4', '.', and the end-of-turn token (id 151645, decoded as an empty string), i.e. the completion "The member's date of birth (DOB) is 27/08/1964."
Please follow the above comment to obtain the logits instead of the logprobs.
@DarkLight1337, are there any mistakes? This is the code, right, sir?
I am getting:

```
before logits: tensor([[ 50.8878, 80.5463, 81.7951, ..., -18.3415, -18.3415, -18.3415]])
after top p&k: tensor([[-inf, -inf, -inf, ..., -inf, -inf, -inf]], device='cuda:0')
probs:         tensor([[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
```
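Those -inf entries are exactly what top-k/top-p masking produces: filtered tokens get their logit set to -inf so they receive zero probability, and the truncated tensor printout happens to show only masked positions. A stand-alone PyTorch illustration (not vLLM code):

```python
# Stand-alone illustration (plain PyTorch, not vLLM code): top-k masking sets
# every logit outside the top k to -inf, so most of the vocabulary prints as
# -inf and as probability 0 after softmax.
import torch

logits = torch.tensor([[50.8878, 80.5463, 81.7951, -18.3415, -18.3415]])
k = 1
kth_value = logits.topk(k, dim=-1).values[..., -1, None]
masked = logits.masked_fill(logits < kth_value, float("-inf"))

print(masked)                         # tensor([[-inf, -inf, 81.7951, -inf, -inf]])
print(torch.softmax(masked, dim=-1))  # tensor([[0., 0., 1., 0., 0.]])
```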
I think you just need to insert your logit-capturing code before the top-p/top-k filtering is applied.
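One low-touch way to do that without editing the file is to wrap the sampler's forward() at runtime and stash a copy of the incoming logits, which have not yet been touched by temperature scaling or top-k/top-p. This is a sketch under the assumption that the sampler is invoked as forward(logits, sampling_metadata), as in the sampler.py referenced above; how you obtain the sampler instance is engine-internal (see the TP=1 comment further down).

```python
# Hypothetical monkey-patch sketch: capture the raw logits entering the
# sampler before any penalty / temperature / top-k / top-p masking is applied.
# Assumes the sampler is invoked as forward(logits, sampling_metadata).
import functools

captured_logits = []  # one tensor per sampler call

def install_logit_capture(sampler):
    original_forward = sampler.forward

    @functools.wraps(original_forward)
    def forward_with_capture(logits, sampling_metadata, *args, **kwargs):
        captured_logits.append(logits.detach().cpu())
        return original_forward(logits, sampling_metadata, *args, **kwargs)

    sampler.forward = forward_with_capture  # shadows the bound method
```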
I just commented out that piece of code. After that, this is the outcome. But these values are not in the proper format, and unwanted tokens are also present in the output.
The values in …
Apologies, Cyrus (@DarkLight1337) sir. But I don't see the logits for the token values of 27/08/1964. How do I get them? These were the sampling params:
Actually, maybe you can set …
This is more like what I want! I think this method can save computation if one only wants the logits of the prompt. However, there are two problems for which I wonder if @DarkLight1337 has any solutions; I restate them in my follow-up below.
This is by design since we don't currently support running both generate and pooling model workers at the same time.
For the TP=1 case, you can access the model instance directly via …
You can compute the logits by calling model.compute_logits(hidden_states, sampling_metadata).
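Putting the two hints together, here is a rough sketch for TP=1 on the V0 engine; the attribute chain into the engine internals, the example model name, and passing sampling_metadata=None are all assumptions that may not hold on every vLLM version:

```python
# Rough sketch, TP=1 / V0 engine only. Engine-internal attribute paths,
# the example model, and sampling_metadata=None are assumptions.
import torch
from vllm import LLM
from vllm.config import PoolerConfig

llm = LLM(
    model="Qwen/Qwen2-7B-Instruct",  # example model
    task="embed",
    override_pooler_config=PoolerConfig(pooling_type="ALL"),
)

# per-token hidden states for the prompt (output attribute is version-dependent)
hidden_states = torch.as_tensor(llm.encode(["The member's date of birth is"])[0].outputs.data)

# reach into the engine for the underlying model instance (TP=1 only)
model = llm.llm_engine.model_executor.driver_worker.model_runner.model
device = next(model.parameters()).device

with torch.inference_mode():
    # compute_logits normally expects SamplingMetadata; None is an assumption
    # that only works when the logits processor does not use it for pruning
    logits = model.compute_logits(hidden_states.to(device), None)

print(logits.shape)  # roughly (num_prompt_tokens, vocab_size)
```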
Thank you a lot @DarkLight1337! But the workflow is still not right. In order to call compute_logits, I need hidden_states, which I can get from llm.encode(), and sampling_metadata. However, sampling_metadata is not an attribute of vllm.worker.pooling_model_runner.PoolingModelRunner. This means that if I work with task="embed", I cannot get the sampling_metadata required by model.compute_logits. Can you help me out with this?
As for the first question: what I need is to start a single model with llm = LLM(...) so that only one copy occupies GPU memory, and then be able to both "generate" and "encode" with it. I suspect I may need to modify sampling_params.py, but I am not sure. Can you give me a hint on how to save GPU memory so that I can generate and obtain the logits of the prompts at the same time?
I see. In that case, you can just call …
You can apply the change as suggested in a previous comment: #11397 (comment)
Thank you a lot for staying with me here, @DarkLight1337! I have tested the code and realized that prompt_logprobs is a list of None's, while sample_logprobs, as suggested in #11397 (comment), is a list of lists of Logprob dicts, so sample_logprobs does indeed have values. But sample_logprobs is not what I want, because I want to obtain the logits of the prompts, and sample_logprobs has a variable length compared to the prompt token sequence.
I do notice that with sampling_params = SamplingParams(temperature=0., prompt_logprobs=1), llm.generate() returns prompt_logprobs=[None, PromptLogprobs, PromptLogprobs, ...]. Why does it start with a None? That is not the case for logprobs when I set sampling_params = SamplingParams(temperature=0., logprobs=1). At the same time, the prompt_logprobs inside sampler.py are still lists of None's instead of lists of PromptLogprobs, which makes me wonder whether the computation of prompt_logprobs is done somewhere else. This is also supported by the fact that the logits in sampler.py have shape (1, vocab_size), and those single-token logits are allocated to sample_logprobs rather than prompt_logprobs.
At this point, your guess is as good as mine (I'm not familiar with this part of the code). It is possible that the first element is None because no logprobs are generated for the first token (the logprobs outputted by the model from the first token are for the second token). I suggest looking into the code further to better understand the details. |
Alrighty. Let me dig deeper, and hopefully I can update the community with some neat solutions.
Overall, this is quite helpful as a starting point if you only want the prompt logits. You just need to modify the ModelRunner class a little bit.
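For readers who only need per-prompt-token scores through the public API (log-probabilities rather than raw logits), SamplingParams(prompt_logprobs=...) already gets most of the way there; as discussed above, the first entry is None because nothing predicts the very first prompt token. A sketch with an example model:

```python
# Public-API route for prompt logprobs (not raw logits). The first entry of
# prompt_logprobs is None because there is no prediction for the first token.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct")  # example model
params = SamplingParams(temperature=0.0, max_tokens=1, prompt_logprobs=1)

out = llm.generate(["The member's date of birth is 27/08/1964."], params)[0]
for token_id, entry in zip(out.prompt_token_ids, out.prompt_logprobs):
    if entry is None:  # first prompt token
        continue
    print(token_id, {t: lp.logprob for t, lp in entry.items()})
```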
🚀 The feature, motivation and pitch
Same as issue #185, which was closed without being solved.
Alternatives
No response
Additional context
No response