Commit
* Add the kernels
* Remove include of aten or torch
* Add the ffi bindings
* Sketch the forward method
* Handle input and output reshapes
* Add some features
* Improve compat of build.rs
* Fix workspace dep
* Finish merge
* Fixes
* Finish gptq gemm and add trait
* Add the cuda gptq matmul stub
* Remove default feature
* Correct conditional comp
* Add gguf qmatmul quantized support
* Implement matmul with qmethod in qllama
* Update readme of mistralrs quant
* int* to int64* in q_gemm.cu
* Add model and pipeline
* Add gptq loader selector
* Rename quantized_config -> quantization_config
* Fix g_idx shape
* Ensure WNA16
* Broadcast add
* Format
* int64_t* -> int* rollback
* Prep for correct types
* Finish merge
* Complete merge
* Update cargo lock
* Integrate with new i32 type
* Fixes
* More progress
* It doesnt crash
* Oops
* It works!
* Remove some todos
* Testing isq support
* Add to all non adapter models
* Clippy
* Avoid reallocating
* Add support for gptq to adapter models
* Add docs and logging
* Update docs
* Clippy
* Remove a todo
1 parent 249299b, commit 1269bd8

Showing 77 changed files with 6,009 additions and 1,244 deletions.
@@ -0,0 +1,41 @@
# Quantization in mistral.rs

Mistral.rs supports the following quantization methods:
- GGUF/GGML
    - Q, K type
    - Supported in GGUF/GGML and GGUF/GGML adapter models
    - I quants coming!
    - CPU, CUDA, Metal (all supported devices)
- GPTQ
    - Supported in all plain and adapter models
    - CUDA only
- ISQ (in situ quantization)
    - Q, K type GGUF quants
    - Supported in all plain and adapter models
    - I quants coming!
    - GPTQ quants coming!
    - CPU, CUDA, Metal (all supported devices)

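The device support listed above can be written down as a small lookup table, which may help when deciding which method to try on a given machine. This is only an illustrative sketch: the `SUPPORT` table and the `methods_for` helper are hypothetical names, not part of mistral.rs, and the entries simply transcribe the list above.

```python
# Hypothetical lookup table transcribed from the support list above.
# Not part of mistral.rs; names are illustrative only.
SUPPORT = {
    "GGUF/GGML": {
        "devices": {"cpu", "cuda", "metal"},
        "models": "GGUF/GGML and GGUF/GGML adapter models",
    },
    "GPTQ": {
        "devices": {"cuda"},
        "models": "all plain and adapter models",
    },
    "ISQ": {
        "devices": {"cpu", "cuda", "metal"},
        "models": "all plain and adapter models",
    },
}


def methods_for(device):
    """Return the quantization methods listed as supported on `device`."""
    wanted = device.lower()
    return sorted(name for name, info in SUPPORT.items() if wanted in info["devices"])


if __name__ == "__main__":
    for dev in ("cpu", "cuda", "metal"):
        print(dev, "->", ", ".join(methods_for(dev)))
```

On a Metal or CPU machine this lookup leaves GGUF/GGML and ISQ as the options, since GPTQ is listed as CUDA only.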
## Using a GGUF quantized model
- Use the `gguf` (CLI) / `GGUF` (Python) model selector
- Provide the GGUF file

```
cargo run --features cuda -- -i gguf -f my-gguf-file.gguf
```

## Using ISQ
See the [docs](ISQ.md)

```
cargo run --features cuda -- -i --isq Q4K plain -m microsoft/Phi-3-mini-4k-instruct -a phi3
```

## Using a GPTQ quantized model
- Use the `plain` (CLI) / `Plain` (Python) model selector
- Provide the model ID for the GPTQ model
- Mistral.rs will automatically detect and use GPTQ quantization.

```
cargo run --features cuda -- -i plain -m kaitchup/Phi-3-mini-4k-instruct-gptq-4bit -a phi3
```
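
One way to picture the automatic detection is as a check on the model's `config.json`. The sketch below assumes the Hugging Face convention in which GPTQ checkpoints carry a `quantization_config` object whose `quant_method` is `gptq` (the commit message mentions renaming `quantized_config` to `quantization_config`). The helper name `detect_gptq` and the sample config are hypothetical, and this is not mistral.rs's actual loader code.

```python
import json


def detect_gptq(config_text):
    """Return (bits, group_size) if the config declares GPTQ, else None.

    Assumes the Hugging Face-style `quantization_config` layout; this is an
    illustrative check, not the logic mistral.rs itself uses.
    """
    config = json.loads(config_text)
    quant = config.get("quantization_config")
    if not isinstance(quant, dict):
        return None  # no quantization section: treat as an unquantized model
    if str(quant.get("quant_method", "")).lower() != "gptq":
        return None  # some other quantization scheme
    return quant.get("bits"), quant.get("group_size")


# Hypothetical, trimmed-down config.json contents for a 4-bit GPTQ model.
EXAMPLE_CONFIG = json.dumps(
    {
        "model_type": "phi3",
        "quantization_config": {
            "quant_method": "gptq",
            "bits": 4,
            "group_size": 128,
        },
    }
)

if __name__ == "__main__":
    print(detect_gptq(EXAMPLE_CONFIG))
    print(detect_gptq('{"model_type": "phi3"}'))
```

A config without a `quantization_config` section falls through to `None`, which corresponds to loading the model unquantized through the same `plain` selector.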