GPTQModel v0.9.11
What's Changed
Added LG EXAONE 3.0 model support. Added new dynamic per-layer/module quantization, where each layer/module may be assigned different bits/params. Added proper sharding support to backend.BITBLAS. Quantization errors caused by overly small damp values are now auto-healed by retrying with a larger damp.
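To illustrate the dynamic per-layer/module idea, here is a minimal, self-contained sketch of how per-module overrides could be resolved. The regex-keyed override dict and the helper `resolve_params` are illustrative assumptions, not GPTQModel's exact config schema.

```python
import re

# Base quantization params, plus hypothetical per-module overrides.
# Keys are regex patterns matched against module names (illustrative only).
BASE = {"bits": 4, "group_size": 128}
DYNAMIC = {
    r".*\.mlp\..*": {"bits": 8},                    # give MLP projections more bits
    r".*lm_head.*": {"bits": 8, "group_size": 64},  # and a finer group size here
}

def resolve_params(module_name: str) -> dict:
    """Return the effective quant params for one module: the base values
    updated by the first matching dynamic override, if any."""
    params = dict(BASE)
    for pattern, override in DYNAMIC.items():
        if re.fullmatch(pattern, module_name):
            params.update(override)
            break
    return params

print(resolve_params("model.layers.0.mlp.up_proj"))      # {'bits': 8, 'group_size': 128}
print(resolve_params("model.layers.0.self_attn.q_proj")) # {'bits': 4, 'group_size': 128}
```

Each quantized linear layer would then be packed with its own resolved bits/params rather than one global setting.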
- [CORE] add support for pack and shard to bitblas by @LRL-ModelCloud in #316
- Add dynamic bits by @PZS-ModelCloud in #311, #319, #321, #323, #327
- [MISC] Adjust the validate order of QuantLinear when BACKEND is AUTO by @ZX-ModelCloud in #318
- add save_quantized log model total size by @PZS-ModelCloud in #320
- Auto damp recovery by @CSY-ModelCloud in #326
- [FIX] add missing original_infeatures by @CSY-ModelCloud in #337
- Update Transformers to 4.44.0 by @Qubitium in #336
- [MODEL] add exaone model support by @LRL-ModelCloud in #340
- [CI] Upload wheel to local server by @CSY-ModelCloud in #339
- [MISC] Fix assert by @CSY-ModelCloud in #342
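The auto damp recovery mentioned above can be sketched as a retry loop: add a damp term to the Hessian diagonal, and if the factorization still fails, grow the damp and try again. This is a minimal illustration of the general idea under assumed names and an assumed doubling schedule, not GPTQModel's actual implementation.

```python
import math

def cholesky_ok(m):
    """Attempt a Cholesky factorization of a symmetric matrix (list of
    lists); return False if the matrix is not positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = m[i][i] - s
                if d <= 0:
                    return False  # factorization failed: not positive definite
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True

def recover_damp(hessian, damp=0.0001, max_tries=16):
    """Add damp * mean(diag) to the Hessian diagonal; if the factorization
    still fails, double damp and retry. Returns the damp that worked
    (hypothetical auto-heal loop, schedule chosen for illustration)."""
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    for _ in range(max_tries):
        damped = [row[:] for row in hessian]
        for i in range(n):
            damped[i][i] += damp * mean_diag
        if cholesky_ok(damped):
            return damp
        damp *= 2  # auto-heal: retry with a larger damp value
    raise RuntimeError("Hessian not positive definite even after damping")

# A Hessian with a negative eigenvalue fails at tiny damp values; the
# recovery loop grows damp until the factorization succeeds.
print(recover_damp([[1.0, 2.0], [2.0, 1.0]]))
```

The point of the auto-heal is that a quantization run no longer aborts on a numerically bad Hessian; the damp is grown until the solve goes through.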
Full Changelog: v0.9.10...v0.9.11