I failed to reproduce the Llama2-7b-4k (w/o SFT) in the paper #17

Open
WNQzhu opened this issue Jul 31, 2024 · 1 comment
Comments

WNQzhu commented Jul 31, 2024

Hi, I failed to reproduce the Llama2-7b-4k (w/o SFT) results reported in the paper.

Here are our results:

| Methods | Tokens | Coursera | GSM | QuALITY | TOEFL | CodeU | SFiction | Avg |
|---|---|---|---|---|---|---|---|---|
| (L-Eval) Llama2-7b-4k (w/o SFT) | 4k | 20.05 | 2.0 | 28.71 | 24.53 | 0.00 | 40.62 | 19.31 |
| (Ours) Llama2-7b-4k (w/o SFT) | 4k | 15.26 | 19.0 | 30.69 | 13.01 | 3.33 | 35.93 | 19.54 |

Here is our experimental setting:
We modified the llama2-chat-test.py file, disabled the NTK parameters, and used Llama2-7b to run the evaluation, invoked like this:

```
python3 Baselines/llama2-chat-test.py \
    --scale 7b \
    --max_length 4k \
    --metric exam_eval
```
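For reference, a minimal sketch of what disabling NTK-style RoPE scaling might look like when loading the base model; this assumes the Hugging Face transformers API and is not the repo's actual code (the model path and kwargs are illustrative):

```python
# Hypothetical sketch (not the repo's code): load base Llama2-7b
# with no NTK-aware RoPE scaling, via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # base model, not -chat
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    rope_scaling=None,  # leave RoPE unscaled, i.e. NTK extension off
)
```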

What could be the reason for this? Should I adjust the prompt or other parameters?

@ChenxinAn-fdu
Collaborator

I did not use the chat format of Llama2-chat to test the base model. The prompt is very simple:
```
long ctx \nQ: instruction \nA:
```
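A minimal sketch of that prompt construction, assuming `context` and `instruction` come from an L-Eval example (the function and variable names here are illustrative, not from the repo):

```python
def build_prompt(context: str, instruction: str) -> str:
    # Plain completion-style prompt for the base model: the long
    # context, then the question, then "A:" for the model to complete.
    # No [INST]...[/INST] chat template as used for Llama2-chat.
    return f"{context}\nQ: {instruction}\nA:"
```

Evaluating a base (non-SFT) model through the chat template could plausibly shift exam-style scores, which may account for part of the gap you observed.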
