Inquiry about accept length results for EAGLE-Qwen2-7B-Instruct #143

Open
zhangtia16 opened this issue Oct 9, 2024 · 7 comments

@zhangtia16

Hi EAGLE Team,

Thank you for your contributions to the community!

I downloaded the released weights for EAGLE on Qwen2-7B-Instruct from https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct. However, while testing the weights on the MT-Bench dataset, I noticed that the accept length is relatively low as follows:

Model: Qwen2-7B-Instruct
Dataset: MT-Bench
EAGLE version: EAGLE-1

| accept length | tree draft | chain draft |
| --- | --- | --- |
| t=0.0 | 2.14 | 1.69 |
| t=1.0 | 1.71 | 1.50 |

For your information, I successfully reproduced the EAGLE-1 Vicuna-7B results, achieving an accept length of over 3. Additionally, I have used your newly released Qwen2-related code (modeling_qwen2_kv.py) from the EAGLE-2 code branch; however, I was unable to run it successfully with the EAGLE-2 branch, as mentioned in issue #136. Consequently, I adapted the Qwen2-related code to the EAGLE-1 code branch for testing.

I'm curious about the low accept length I'm experiencing with EAGLE-Qwen2. I see that only the weights for EAGLE-Qwen2 were released, without accompanying results. Could you please share the accept length or any other results for EAGLE-Qwen2 on MT-Bench?

Thank you!

@Liyuhui-12
Collaborator

Thank you for your interest. Could you please provide more detailed error information for Qwen on the main branch?

@zhangtia16
Author

As for the error on the main branch:
1. The function `initialize_tree` in utils_alpha.py returns 5 values, whereas the `forward` function in ea_model.py outputs only 3 (see the stub illustration below).
2. I noticed that the authors removed the `logits_processor` argument from the `forward` function in ea_model.py on the main branch, compared to the EAGLE-1 code branch. Could the authors please explain why this argument was deleted? I see that `logits_processor` is still being passed in the function call in evaluation/gen_ea_alpha_vicuna.py on the main branch.
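For illustration, a mismatch like this surfaces as an unpacking error at runtime. Here is a minimal stub reproduction; the function bodies, return values, and call structure are hypothetical, and only the 3-vs-5 arity is taken from the observation above:

    # Stub reproduction of the interface mismatch (hypothetical bodies).
    def forward():
        # "forward" in ea_model.py returns only 3 values on the main branch.
        return "logits", "hidden_state", "sample_token"

    def initialize_tree():
        # "initialize_tree" in utils_alpha.py is written against a 5-value forward.
        a, b, c, d, e = forward()  # raises ValueError: not enough values to unpack
        return a, b, c, d, e

    initialize_tree()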

@quanliu1991

@Liyuhui-12 @zhangtia16 Hello, could you provide the test benchmarks for EAGLE-Qwen2? The alpha values I measured on the EAGLE-Qwen2-72B-Instruct model are relatively low. My test procedure was:

1. Add the modeling_qwen2_kv.py model file on the v1 branch.
2. When loading the EAGLE-Qwen2-72B-Instruct model parameters, set torch_dtype=torch.bfloat16 (a loading sketch follows this list).
3. Use the gen_ea_alpha_llama2chat.py script to test on the mt_bench dataset.
4. Perform inference with a chain draft.
5. Obtain the alpha values via the alpha.py script.
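For reference, a minimal sketch of step 2. The `EaModel.from_pretrained` call follows the usage shown in the EAGLE repo's README, but the import path and checkpoint paths here are assumptions:

    import torch
    from eagle.model.ea_model import EaModel  # import path per the EAGLE repo (assumed)

    # Placeholder paths; substitute your local checkpoints.
    base_model_path = "Qwen/Qwen2-72B-Instruct"
    ea_model_path = "yuhuili/EAGLE-Qwen2-72B-Instruct"

    # Load the base and draft models in bfloat16 (step 2 above).
    model = EaModel.from_pretrained(
        base_model_path=base_model_path,
        ea_model_path=ea_model_path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        device_map="auto",
    )
    model.eval()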

The alpha of EAGLE-Qwen2-72B-Instruct is [0.5 0.34 0.32 0.33 0.47],
and under the same conditions, the alpha of EAGLE-Vicuna-7B-v1.3 is [0.79 0.74 0.72 0.73 0.72].
I don't know whether these EAGLE-Qwen2-72B-Instruct results are consistent with your internal ones. Thank you.

@zhangtia16
Author

I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

@quanliu1991

> I have configured my setup similarly to your points 1-5 (modified v1 branch, bf16, MT-Bench, chain draft, temperature=0), with the only difference being that I am using the EAGLE-Qwen2-7B-Instruct checkpoints provided by the authors. Here are my alpha results: [0.31, 0.24, 0.25, 0.31, 0.31], corresponding to an accept length of 1.87 (already accounting for the +1 token issue).

How is the accept length of 1.87 calculated?
I compute it as: (total number of accepted tokens + number of base-model inference steps) / number of base-model inference steps

    forward_numbers = alphas_num[0]  # inference steps of the base model
    accept_lengths = []
    for i in range(len(alphas)):
        # credit each draft rejected at depth i with i + 1 accepted tokens
        accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1))

    print((sum(accept_lengths) + forward_numbers) / forward_numbers)

`alphas_num` and `alphas` are obtained from alpha.py.

I did a test on EAGLE-Qwen2-7B-Instruct, and the results are as follows:

| draft | temperature | alpha | accept length |
| --- | --- | --- | --- |
| chain | 0 | [0.43 0.35 0.42 0.5 0.8] | 2.55 |
| chain | 1 | [0.36 0.28 0.31 0.31 0.45] | 2.47 |
| tree | 0 | [0.66 0.46 0.47 0.34 0.71] | 2.98 |
| tree | 1 | [0.4 0.3 0.27 0.19 0.37] | 2.54 |

@zhangtia16
Author

Since the authors did not directly output the acceptance length, I modified the code to calculate it; for details on the modification, please refer to issue #146. In summary, we record the number of accepted tokens at each step for every sample. The average number of accepted tokens (first averaged across the steps of a single sample, then averaged across all samples) gives the acceptance length of the dataset, as sketched below.
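For clarity, a minimal sketch of that two-level averaging; the variable name and list-of-lists structure are hypothetical:

    # accept_tokens[s][t] = number of tokens accepted at step t of sample s (hypothetical structure)
    def dataset_accept_length(accept_tokens):
        # First average across the steps of each sample...
        per_sample = [sum(steps) / len(steps) for steps in accept_tokens]
        # ...then average across all samples.
        return sum(per_sample) / len(per_sample)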

As for your implementation, I think the correct version should be accept_lengths.append((alphas_num[i] - alphas[i]) * i) rather than accept_lengths.append((alphas_num[i] - alphas[i]) * (i + 1)). Take a simple example of a [right, wrong, wrong, wrong, wrong] chain draft: the accept length should be 1, while your code produces 2, with alpha = [1, 0, 0, 0, 0] and alpha_num = [1, 1, 0, 0, 0] (see the check below).
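As a quick numeric check of that example (a sketch, treating alphas[i] / alphas_num[i] as the accepted / attempted draft counts at chain depth i, as in the snippets above):

    alphas = [1, 0, 0, 0, 0]      # drafts accepted at each depth
    alphas_num = [1, 1, 0, 0, 0]  # drafts attempted at each depth

    buggy = sum((alphas_num[i] - alphas[i]) * (i + 1) for i in range(len(alphas)))
    fixed = sum((alphas_num[i] - alphas[i]) * i for i in range(len(alphas)))
    print(buggy, fixed)  # 2 1 -- only one token is accepted before the first rejection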

Btw, did you use the released checkpoints on MT-Bench to get the alpha results?

@quanliu1991

quanliu1991 commented Nov 17, 2024

@zhangtia16 You are correct. I modified the accept length calculation and got new results for the EAGLE-Qwen2-7B-Instruct model, which are generally consistent with yours.

| draft | temperature | accept length |
| --- | --- | --- |
| chain | 0 | 1.70 |
| chain | 1 | 1.50 |
| tree | 0 | 2.19 |
| tree | 1 | 1.55 |

Here is the new calculation method for the accept length:

    forward_numbers = alphas_num[0]  # inference steps of the base model
    accept_lengths = []
    for i in range(len(alphas)):
        # drafts first rejected at depth i contribute i accepted tokens
        accept_lengths.append((alphas_num[i] - alphas[i]) * i)
    # drafts accepted through all 5 positions contribute 5 tokens each
    accept_lengths.append(alphas[4] * 5)

    print((sum(accept_lengths) + forward_numbers) / forward_numbers)

The checkpoints used are those published by the author at https://huggingface.co/yuhuili/EAGLE-Qwen2-7B-Instruct.

If our testing results are correct, the model's performance does not appear to surpass that of the Vicuna and Llama models released by the author.
