ggml : remove bit shuffling #1405
Merged
32 commits
5fa47bf | ggml : remove Q4_0 bit shufling (ARM NEON) | ggerganov
844d2af | ggml : remove Q4_1 bit shuffling (ARM NEON + reference) | ggerganov
fd2a137 | ggml : nibbles_from_floats() + bytes_from_nibbles() (ARM NEON) | ggerganov
9f3285f | ggml : remove Q4_2 bit shuffling (WIP, BROKEN) | ggerganov
aa78dfe | ggml : remove Q5_0 bit shuffling (ARM NEON) | ggerganov
b37a08f | ggml : 2x faster scalar implementations | ggerganov
292a778 | ggml : remove Q5_1 bit shuffling (ARM NEON + scalar) | ggerganov
caaacd5 | ggml : simplify scalar dot | ggerganov
0add640 | ggml : remove WASM SIMD bit shuffling + remove vzip for ARM 32-bit | ggerganov
9472d0e | ggml : fix Q4_1 quantization | ggerganov
cdc9607 | ggml : update cuBLAS + normalize variable names | ggerganov
4bf1c8a | ggml : remove Q4_2 mode | ggerganov
b08c39b | ggml : minor formatting | ggerganov
8367455 | ggml : fix Q5_0 quantization | ggerganov
928d2f3 | scripts : add script for measuring the time per token | ggerganov
9e49d20 | AVX implementations (#1370) | sw
489bd13 | ggml : uniform 5th bit extraction | ggerganov
d52172a | llama : produce error upon loading old model files | ggerganov
09032e0 | llama : fix model magic/version write | ggerganov
b7ad385 | ggml : speed-up Q5_0 + Q5_1 at 4 threads | ggerganov
695f396 | ggml : preserve old Q4 and Q5 formats | ggerganov
582a39f | ggml : simplify Q8_1 - no need for low / high sums anymore | ggerganov
6680244 | ggml : fix Q8_0 and Q8_1 rounding | ggerganov
bd5e373 | Revert "AVX implementations (#1370)" | ggerganov
5bc286a | ggml : fix AVX2 implementation | ggerganov
e038e01 | sha : update hashes for 7B and 13B | ggerganov
51c25fd | readme : update timings + remove warning banner | ggerganov
1c87847 | llama : update v2 PR number to 1405 | ggerganov
832c53f | ggml : fix WASM comments | ggerganov
ca7f069 | ggml : back to original bit order | ggerganov
b58b1f4 | readme : add note that Q4 and Q5 have been changed | ggerganov
cbb6a3a | llama : fix return for unknown version | ggerganov
@@ -44,5 +44,6 @@ zig-cache/

ppl-*.txt
qnt-*.txt
perf-*.txt

examples/jeopardy/results.txt
Is "qauntization" a typo? 🤔