Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Expose hqq through
int4_weight_only
API
Summary: att, this is a follow up for pytorch#605 to make hqq available in quantize_ API `quantize_(model, int4_weight_only(group_size, use_hqq=True)` Test Plan: python generate.py --compile --quantization int4wo-hqq-64 --precision bfloat16 Average tokens/sec: 195.24 Average Bandwidth: 729.40 GB/s Peak Memory Usage: 5.09 GB Model Size: 3.74 GB python eval.py --compile --quantization int4wo-hqq-64 --precision bfloat16 wikitext: {'word_perplexity,none': 12.823631773497512, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.611400903914048, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6883154699192412, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'} Reviewers: Subscribers: Tasks: Tags:
- Loading branch information