Implement GPTQ quantization #467

Merged
merged 65 commits into from
Aug 9, 2024

EricLBuehler
Owner

This PR adds support for GPTQ quantization (paper here).

Refs: #418, #448.

github-actions bot commented Jun 23, 2024

Code Metrics Report
  ===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   11          102          101            0            1
 Python                 45         1993         1695           62          236
 TOML                   19          574          506           11           57
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               25         1842            0         1385          457
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 6          408          365           19           24
 |- TOML                 2           75           63            0           12
 (Total)                           2530          620         1404          506
-------------------------------------------------------------------------------
 Rust                  173        55983        50835         1000         4148
 |- Markdown            92          864           13          801           50
 (Total)                          56847        50848         1801         4198
===============================================================================
 Total                 282        61005        53559         2458         4988
===============================================================================
  

EricLBuehler added the "new feature" label (New feature or request) on Jun 24, 2024
EricLBuehler added the "backend" label (Backend work) on Aug 8, 2024
@EricLBuehler
Owner Author

cargo run --features cuda -- -i plain -m kaitchup/Phi-3-mini-4k-instruct-gptq-4bit -a phi3

EricLBuehler merged commit 1269bd8 into master on Aug 9, 2024
14 of 15 checks passed
EricLBuehler deleted the gptq branch on August 9, 2024 at 17:49
@BuildBackBuehler

It broke my heart when I went to try Mistral_Large 2-bit EQAT (AutoGPTQ) on my M1 and only then saw that there is no Mac support 😭. When might that come around? If adding MPS/Metal support doesn't require too much expert-level knowledge, I wouldn't mind taking a swing at it 😂

@EricLBuehler
Owner Author

@BuildBackBuehler If you could add this, it would be amazing!

I haven't seen GPTQ kernels for Mac, though. If you can find any, it shouldn't be too hard to add them, and I would appreciate it if you took a shot!
