
UQFF: The uniquely powerful quantized file format. #770

Status: Merged (25 commits, Sep 16, 2024)


@EricLBuehler (Owner) commented Sep 12, 2024

A major limitation of ISQ (in situ quantization) right now is that it is ephemeral: the tensors must be requantized on every run.

To solve this, this PR enables saving ISQ artifacts to safetensors files and loading them back.
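A minimal sketch of what saving and reloading an ISQ artifact involves, assuming the plain safetensors layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to its `dtype`, `shape`, and `data_offsets`, then the raw tensor bytes. The tensor names, the `TensorEntry` struct, and the `serialize`/`load_tensor` helpers are hypothetical; this is not UQFF's actual on-disk schema, and a real implementation would use a JSON parser (or the `safetensors` crate) instead of the string search below.

```rust
use std::collections::BTreeMap;

/// Hypothetical tensor record: safetensors dtype tag, shape, raw little-endian bytes.
struct TensorEntry {
    dtype: &'static str,
    shape: Vec<usize>,
    data: Vec<u8>,
}

/// Serialize tensors as [u64 LE header length][JSON header][tensor bytes].
/// Assumes tensor names contain no quotes or backslashes (no JSON escaping).
fn serialize(tensors: &BTreeMap<String, TensorEntry>) -> Vec<u8> {
    let mut header = String::from("{");
    let mut offset = 0usize;
    for (i, (name, t)) in tensors.iter().enumerate() {
        if i > 0 {
            header.push(',');
        }
        let shape: Vec<String> = t.shape.iter().map(|d| d.to_string()).collect();
        let end = offset + t.data.len();
        // data_offsets are relative to the start of the byte buffer after the header.
        header.push_str(&format!(
            "\"{}\":{{\"dtype\":\"{}\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            name, t.dtype, shape.join(","), offset, end
        ));
        offset = end;
    }
    header.push('}');
    // Pad with spaces so the byte buffer starts on an 8-byte boundary.
    while header.len() % 8 != 0 {
        header.push(' ');
    }
    let mut out = Vec::with_capacity(8 + header.len() + offset);
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(header.as_bytes());
    // BTreeMap iterates in the same sorted order as above, matching the offsets.
    for t in tensors.values() {
        out.extend_from_slice(&t.data);
    }
    out
}

/// Return a tensor's raw bytes by name. Only understands headers written by
/// `serialize` above; it locates `data_offsets` by plain string search.
fn load_tensor<'a>(file: &'a [u8], name: &str) -> Option<&'a [u8]> {
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(file.get(..8)?);
    let header_len = u64::from_le_bytes(len_bytes) as usize;
    let header = std::str::from_utf8(file.get(8..8 + header_len)?).ok()?;
    let data = &file[8 + header_len..];
    let entry = &header[header.find(&format!("\"{}\":{{", name))?..];
    let key = "\"data_offsets\":[";
    let start = entry.find(key)? + key.len();
    let stop = start + entry[start..].find(']')?;
    let mut nums = entry[start..stop].split(',').map(|s| s.trim().parse::<usize>());
    let (begin, end) = (nums.next()?.ok()?, nums.next()?.ok()?);
    data.get(begin..end)
}

fn main() {
    let mut tensors = BTreeMap::new();
    // Hypothetical names: packed 4-bit codes plus one f32 scale.
    tensors.insert(
        "layer0.weight.qdata".to_string(),
        TensorEntry { dtype: "U8", shape: vec![4], data: vec![0x21, 0x43, 0x65, 0x87] },
    );
    tensors.insert(
        "layer0.weight.scale".to_string(),
        TensorEntry { dtype: "F32", shape: vec![1], data: 0.5f32.to_le_bytes().to_vec() },
    );
    // Quantize once and save; a real program would write `bytes` with std::fs::write.
    let bytes = serialize(&tensors);
    // A later run reloads the packed data instead of requantizing.
    let qdata = load_tensor(&bytes, "layer0.weight.qdata").expect("tensor present");
    println!("file is {} bytes; reloaded qdata = {:02x?}", bytes.len(), qdata);
}
```

Persisting the quantized bytes this way is what removes the per-run cost: loading becomes a file read plus an offset lookup rather than a full quantization pass.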

Progress

  • Saving (serialization)
    • GGUF ISQ quantized
    • HQQ ISQ quantized
  • Loading (deserialization)
    • GGUF ISQ quantized
    • HQQ ISQ quantized
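Both GGUF and HQQ artifacts come down to packed low-bit codes plus per-block parameters, and those are what a serializer has to persist. As a simplified illustration only (a plain symmetric 4-bit block scheme, not the actual GGUF Q4K or HQQ formats, which use more elaborate layouts), the hypothetical `quantize_q4`/`dequantize_q4` helpers below produce that kind of data; `BLOCK = 32` is an assumed block size.

```rust
/// Number of weights that share one scale (an assumption for this sketch).
const BLOCK: usize = 32;

/// Symmetric 4-bit block quantization: per block, one f32 scale and
/// two 4-bit codes packed into each byte (low nibble first).
fn quantize_q4(weights: &[f32]) -> (Vec<u8>, Vec<f32>) {
    assert!(weights.len() % BLOCK == 0, "length must be a multiple of BLOCK");
    let mut packed = Vec::with_capacity(weights.len() / 2);
    let mut scales = Vec::with_capacity(weights.len() / BLOCK);
    for block in weights.chunks(BLOCK) {
        // Map the largest magnitude in the block to code +/-7.
        let amax = block.iter().fold(0.0f32, |m, &w| m.max(w.abs()));
        let scale = if amax == 0.0 { 1.0 } else { amax / 7.0 };
        scales.push(scale);
        // Round to the nearest step, clamp to -8..=7, and shift into 0..=15.
        let code = |w: f32| ((w / scale).round().clamp(-8.0, 7.0) as i8 + 8) as u8;
        for pair in block.chunks(2) {
            packed.push(code(pair[0]) | (code(pair[1]) << 4));
        }
    }
    (packed, scales)
}

/// Inverse of `quantize_q4`: unpack each nibble and multiply by its block's scale.
fn dequantize_q4(packed: &[u8], scales: &[f32]) -> Vec<f32> {
    let mut out = Vec::with_capacity(packed.len() * 2);
    for (i, &byte) in packed.iter().enumerate() {
        // Byte i holds weights 2i and 2i+1, which share block (2i) / BLOCK.
        let scale = scales[2 * i / BLOCK];
        for nibble in [byte & 0x0F, byte >> 4] {
            out.push((nibble as i8 - 8) as f32 * scale);
        }
    }
    out
}

fn main() {
    let weights: Vec<f32> = (0..64).map(|i| ((i as f32) * 0.37).sin()).collect();
    let (packed, scales) = quantize_q4(&weights);
    let restored = dequantize_q4(&packed, &scales);
    let max_err = weights
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!(
        "{} f32 values -> {} packed bytes + {} scales, max error {:.4}",
        weights.len(),
        packed.len(),
        scales.len(),
        max_err
    );
}
```

For 64 `f32` weights (256 bytes) this yields 32 bytes of codes plus two `f32` scales, and each reconstructed value lies within half a scale step of the original.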

@EricLBuehler EricLBuehler added new feature New feature or request backend Backend work labels Sep 12, 2024

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           34           25            0            9
 Happy                   1          442          369            0           73
 JSON                   12          105          104            0            1
 Python                 47         2043         1741           62          240
 TOML                   20          600          539            2           59
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          196          169            1           26
 (Total)                            273          201           32           40
-------------------------------------------------------------------------------
 Markdown               30         2085            0         1585          500
 |- BASH                 5          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               5           92           82            0           10
 |- Rust                 7          442          396           22           24
 |- TOML                 2           75           63            0           12
 (Total)                           2807          651         1607          549
-------------------------------------------------------------------------------
 Rust                  203        63009        57195         1157         4657
 |- Markdown           104          961           13          896           52
 (Total)                          63970        57208         2053         4709
===============================================================================
 Total                 323        68374        60020         2808         5546
===============================================================================
  

@EricLBuehler (Owner, Author) commented:

Loading previously saved ISQ artifacts:
RUST_BACKTRACE=1 cargo run --features cuda -- -i plain -m microsoft/Phi-3.5-mini-instruct --load-isq out.safetensors

Quantizing with ISQ (Q4K) and saving the artifacts:
RUST_BACKTRACE=1 cargo run --features cuda -- --isq Q4K -i plain -m microsoft/Phi-3.5-mini-instruct --isq-artifact out.safetensors

@EricLBuehler EricLBuehler changed the title Serializing and deserializing ISQ artifacts UQFF: The uniquely powerful quantized file format. Sep 13, 2024
@EricLBuehler EricLBuehler merged commit eadebac into master Sep 16, 2024
12 checks passed
@EricLBuehler EricLBuehler deleted the isq_serde branch September 16, 2024 15:37