Inquiry about accept length results for EAGLE-Qwen2-7B-Instruct #143

Open
zhangtia16 opened this issue Oct 9, 2024 · 7 comments

@zhangtia16

Hi EAGLE Team,

Thank you for your contributions to the community!

I downloaded the released weights for EAGLE on Qwen2-7B-Instruct from https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct. However, while testing the weights on the MT-Bench dataset, I noticed that the accept length is relatively low as follows:

Model: Qwen2-7B-Instruct
Dataset: MT-Bench
EAGLE version: EAGLE-1

| accept length | tree draft | chain draft |
| --- | --- | --- |
| t=0.0 | 2.14 | 1.69 |
| t=1.0 | 1.71 | 1.50 |

For your information, I successfully reproduced the EAGLE-1 Vicuna-7B results, achieving an accept length of over 3. Additionally, I have used your newly released Qwen2-related code (modeling_qwen2_kv.py) from the EAGLE-2 code branch; however, I was unable to run it successfully with the EAGLE-2 branch, as mentioned in issue #136. Consequently, I adapted the Qwen2-related code to the EAGLE-1 code branch for testing.

I'm curious about the low accept length I'm experiencing with EAGLE-Qwen2. I see that only the weights for EAGLE-Qwen2 were released, without accompanying results. Could you please share the accept length or any other results for EAGLE-Qwen2 on MT-Bench?

Thank you!

@Liyuhui-12
Collaborator

Thank you for your interest. Could you please provide more detailed error information for Qwen on the main branch?

@zhangtia16
Author

As for the error on the main branch:
1. The function `initialize_tree` in utils_alpha.py returns 5 values, whereas the `forward` function in ea_model.py outputs only 3 (see the stub illustration below).
2. I noticed that the authors removed the `logits_processor` argument from the `forward` function in ea_model.py on the main branch, compared to the EAGLE-1 code branch. Could the authors please explain why this argument was deleted? I see that `logits_processor` is still being passed in the function call in evaluation/gen_ea_alpha_vicuna.py on the main branch.
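For illustration, a mismatch like this surfaces as an unpacking error at runtime. Here is a minimal stub reproduction; the function bodies, return values, and call structure are hypothetical, and only the 3-vs-5 arity is taken from the observation above:

    # Stub reproduction of the interface mismatch (hypothetical bodies).
    def forward():
        # "forward" in ea_model.py returns only 3 values on the main branch.
        return "logits", "hidden_state", "sample_token"

    def initialize_tree():
        # "initialize_tree" in utils_alpha.py is written against a 5-value forward.
        a, b, c, d, e = forward()  # raises ValueError: not enough values to unpack
        return a, b, c, d, e

    initialize_tree()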

@quanliu1991

@Liyuhui-12 @zhangtia16 Hello, could you provide the test benchmarks for EAGLE-Qwen2? The alpha values I measured on the EAGLE-Qwen2-72B-Instruct model are relatively low. My test procedure was:

1. Add the modeling_qwen2_kv.py model file on the v1 branch.
2. When loading the EAGLE-Qwen2-72B-Instruct model parameters, set torch_dtype=torch.bfloat16 (a loading sketch follows this list).
3. Use the gen_ea_alpha_llama2chat.py script to test on the mt_bench dataset.
4. Perform inference with a chain draft.
5. Obtain the alpha values via the alpha.py script.
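For reference, a minimal sketch of step 2. The `EaModel.from_pretrained` call follows the usage shown in the EAGLE repo's README, but the import path and checkpoint paths here are assumptions:

    import torch
    from eagle.model.ea_model import EaModel  # import path per the EAGLE repo (assumed)

    # Placeholder paths; substitute your local checkpoints.
    base_model_path = "Qwen/Qwen2-72B-Instruct"
    ea_model_path = "yuhuili/EAGLE-Qwen2-72B-Instruct"

    # Load the base and draft models in bfloat16 (step 2 above).
    model = EaModel.from_pretrained(
        base_model_path=base_model_path,
        ea_model_path=ea_model_path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        device_map="auto",
    )
    model.eval()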

The alpha of EAGLE-Qwen2-72B-Instruct is [0.5 0.34 0.32 0.33 0.47],
and under the same conditions, the alpha of EAGLE-Vicuna-7B-v1.3 is [0.79 0.74 0.72 0.73 0.72].
I don't know whether these EAGLE-Qwen2-72B-Instruct results are consistent with your internal ones. Thank you.

@zhangtia16
Author

I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

@quanliu1991

> I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

How is the accept length of 1.87 calculated?
I compute it as: (total number of accepted tokens + number of base-model inference steps) / number of base-model inference steps

    forward_numbers = alphas_num[0]  # inference steps of the base model
    accept_lengths = []
    for i in range(len(alphas)):
        # credit each draft rejected at depth i with i + 1 accepted tokens
        accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1))

    print((sum(accept_lengths) + forward_numbers) / forward_numbers)

`alphas_num` and `alphas` are obtained from alpha.py.

I did a test on EAGLE-Qwen2-7B-Instruct, and the results are as follows:

| draft | temperature | alpha | accept length |
| --- | --- | --- | --- |
| chain | 0 | [0.43 0.35 0.42 0.5 0.8] | 2.55 |
| chain | 1 | [0.36 0.28 0.31 0.31 0.45] | 2.47 |
| tree | 0 | [0.66 0.46 0.47 0.34 0.71] | 2.98 |
| tree | 1 | [0.4 0.3 0.27 0.19 0.37] | 2.54 |

@zhangtia16
Author

Since the authors did not directly output the acceptance length, I modified the code to calculate it; for details on the modification, please refer to issue #146. In summary, we record the number of accepted tokens at each step for every sample. The average number of accepted tokens (first averaged across the steps of a single sample, then averaged across all samples) gives the acceptance length of the dataset, as sketched below.
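For clarity, a minimal sketch of that two-level averaging; the variable name and list-of-lists structure are hypothetical:

    # accept_tokens[s][t] = number of tokens accepted at step t of sample s (hypothetical structure)
    def dataset_accept_length(accept_tokens):
        # First average across the steps of each sample...
        per_sample = [sum(steps) / len(steps) for steps in accept_tokens]
        # ...then average across all samples.
        return sum(per_sample) / len(per_sample)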

As for your implementation, I think the correct version should be accept_lengths.append((alphas_num[i] - alphas[i]) * i) rather than accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1)). Take a simple example of a [right, wrong, wrong, wrong, wrong] chain draft: the accept length should be 1, while your code produces 2, with alpha = [1, 0, 0, 0, 0] and alpha_num = [1, 1, 0, 0, 0] (see the check below).
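As a quick numeric check of that example (a sketch, treating alphas[i] / alphas_num[i] as the accepted / attempted draft counts at chain depth i, as in the snippets above):

    alphas = [1, 0, 0, 0, 0]      # drafts accepted at each depth
    alphas_num = [1, 1, 0, 0, 0]  # drafts attempted at each depth

    buggy = sum((alphas_num[i] - alphas[i]) * (i + 1) for i in range(len(alphas)))
    fixed = sum((alphas_num[i] - alphas[i]) * i for i in range(len(alphas)))
    print(buggy, fixed)  # 2 1 -- only one token is accepted before the first rejection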

Btw, did you use the released checkpoints on MT-Bench to get the alpha results?

@quanliu1991

quanliu1991 commented Nov 17, 2024

@zhangtia16 You are correct. I modified the accept length calculation and got new results for the EAGLE-Qwen2-7B-Instruct model, which are generally consistent with yours.

| draft | temperature | accept length |
| --- | --- | --- |
| chain | 0 | 1.70 |
| chain | 1 | 1.50 |
| tree | 0 | 2.19 |
| tree | 1 | 1.55 |

Here is the new calculation method for the accept length:

    forward_numbers = alphas_num[0]  # inference steps of the base model
    accept_lengths = []
    for i in range(len(alphas)):
        # drafts first rejected at depth i contribute i accepted tokens
        accept_lengths.append((alphas_num[i] - alphas[i]) * i)
    # drafts accepted through all 5 positions contribute 5 tokens each
    accept_lengths.append(alphas[4] * 5)

    print((sum(accept_lengths) + forward_numbers) / forward_numbers)

The checkpoints used are those published by the author at https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct.

If our testing results are correct, the model's performance does not appear to surpass that of the Vicuna and Llama models released by the author.
