support phi 3 #27

Merged · 3 commits merged into main on Jun 20, 2024
Conversation

ZX-ModelCloud (Contributor)

No description provided.

@Qubitium (Collaborator)

@davidgxue We have validated this PR, with modifications, on a real model quant with pre/post PPL. Ready for merge. Thank you for the Phi3 code!

@Qubitium Qubitium merged commit fd27156 into main Jun 20, 2024
2 of 3 checks passed
@Qubitium Qubitium deleted the zx_support_phi_3 branch June 20, 2024 13:10
@Qubitium (Collaborator)
Test Result:

Phi-3-medium-4k-instruct
Native PPL: 4.189669132232666
Format gptq_v2, Quantized PPL: 4.834439754486084
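
For context, PPL figures like the ones above typically come from a fixed-window perplexity evaluation over a held-out corpus. Below is a minimal sketch of how such a number can be reproduced with transformers; the model id, dataset (wikitext-2), context length, and stride are illustrative assumptions, so the result will not match the PR's numbers exactly unless the same settings are used.

```python
# Hedged sketch: fixed-stride perplexity of a causal LM on wikitext-2.
# Swap model_id for the quantized checkpoint path to get the post-quant PPL.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-medium-4k-instruct"  # illustrative; use your own path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.float16
)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
input_ids = tokenizer(text, return_tensors="pt").input_ids

max_len, stride = 4096, 4096  # non-overlapping 4k windows
nlls, n_tokens = [], 0
for start in range(0, input_ids.size(1) - 1, stride):
    chunk = input_ids[:, start : start + max_len].to(model.device)
    if chunk.size(1) < 2:  # nothing left to predict in this window
        break
    with torch.no_grad():
        # labels == input_ids makes the model return the mean next-token NLL
        loss = model(chunk, labels=chunk).loss
    nlls.append(loss * (chunk.size(1) - 1))
    n_tokens += chunk.size(1) - 1

print("PPL:", torch.exp(torch.stack(nlls).sum() / n_tokens).item())
```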

@davidgxue (Contributor)

Awesome work, and thank you for adding me on here haha! The AutoGPTQ library still seems glitchy? Both LLAMA 3 and Phi3 quants from AutoGPTQ have gibberish output issues when used with transformers... Hopefully this repo is better maintained!

@Qubitium (Collaborator)

Qubitium commented Jun 21, 2024

@davidgxue Phi3 appears to contain fused modules and, in our experience, any fused module results in less-than-ideal calibration. That's why the Phi-3 post-quant PPL is much higher (diff vs pre-quant) than for other models. Somewhat worse output is therefore expected, but outright bad/gibberish output does point to a separate issue.
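
To make the "fused modules" point concrete, here is a small sketch that lists the projection layers in one Phi-3 decoder block; it assumes a transformers version with native Phi3 support and uses Phi-3-mini purely as an example. Phi-3 fuses the q/k/v projections into a single qkv_proj and the MLP gate/up projections into gate_up_proj, which is what makes per-projection GPTQ calibration coarser than on models with separate projections.

```python
# Hedged sketch: inspect Phi-3's fused projection modules without downloading
# weights (from_config builds the architecture with random parameters).
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
model = AutoModelForCausalLM.from_config(config)

for name, module in model.model.layers[0].named_modules():
    if name.endswith("_proj"):
        print(name, tuple(module.weight.shape))
# Expected names include self_attn.qkv_proj and mlp.gate_up_proj, the fused modules.
```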

Can you provide me with examples of the gibberish output (was it generated with the native or the gptq-quantized models)? Our team has deep expertise in gptq quants, so we may be able to track down the source of the bug.

I would highly recommend re-quanting them with GPTQModel: compared to AutoGPTQ, our code will in many cases generate higher-quality quants by default due to the optimizations we have made. We have also added lots of protection against setting toggles that produce bad quants.
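
For anyone re-quantizing along these lines, a minimal sketch of a GPTQModel run is below. The class and method names (QuantizeConfig, GPTQModel.load, quantize, save) follow the library's documented usage but may differ between versions, and the calibration texts shown are placeholders only; a real run should use a few hundred representative samples.

```python
# Hedged sketch: re-quantizing Phi-3 with GPTQModel; verify the API against the
# installed version before relying on it.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "microsoft/Phi-3-medium-4k-instruct"
quant_path = "Phi-3-medium-4k-instruct-gptq-4bit"  # illustrative output dir

# 4-bit with group size 128 is a common GPTQ configuration.
quant_config = QuantizeConfig(bits=4, group_size=128)

# Placeholder calibration data; replace with a proper calibration set
# (e.g. samples drawn from wikitext-2 or c4).
calibration_dataset = [
    "GPTQModel is a fork of AutoGPTQ focused on quant quality and maintenance.",
    "Phi-3-medium is an instruction-tuned model from Microsoft.",
]

model = GPTQModel.load(model_id, quant_config)  # load the fp16 model + quant config
model.quantize(calibration_dataset)             # run GPTQ calibration
model.save(quant_path)                          # write the quantized weights
```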

Hopefully this repo is better maintained!

This is our pledge, mission statement, and core reason why I pushed our team to fork AutoGPTQ.

@davidgxue (Contributor)

Oh yeah, the fused modules part is definitely expected.

What I meant is that LLAMA 3 and Phi 3 (well, I guess they are both llama-family architectures) literally do not work. I have had this issue open for a long time on the AutoGPTQ repo, and I'm not the only one: I released a few GPTQ quants on Hugging Face, and I keep getting DMs and Hugging Face issues from people experiencing the exact same thing.
AutoGPTQ/AutoGPTQ#657

This is our pledge, mission statement, and core reason why I pushed our team to fork AutoGPTQ.

Love it! Will follow you guys and see if there is anything I can help out with!

DeJoker pushed a commit to DeJoker/GPTQModel that referenced this pull request Jul 19, 2024
* initial attempt to add phi 3 (may need unfuse)

* fix Phi3GPTQ base_modules

---------

Co-authored-by: David Xue <xuegdxw@gmail.com>