GPTQModel v1.1.0
What's Changed
Added IBM Granite model support. Full auto-buildless wheel install from PyPI. Reduced peak CPU memory usage by >20% during quantization. 100% CI model/feature coverage. Updated Hugging Face integration to track the latest Transformers.
Fully deprecated: liger-kernel support and the exllama v1 quant kernel.
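For context, below is a minimal sketch of quantizing an IBM Granite checkpoint with this release. The model id, calibration text, and output path are placeholders, and the calls assume the from_pretrained / quantize / save_quantized flow carried over from earlier GPTQModel releases; check the repo examples for the exact API in your installed version.

```python
# Minimal sketch, not a definitive recipe.
# Assumptions: the from_pretrained/quantize/save_quantized flow from earlier
# GPTQModel releases still applies; model id, calibration text, and output
# directory are placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "ibm-granite/granite-3.0-2b-instruct"   # placeholder Granite checkpoint
quant_path = "granite-3.0-2b-instruct-gptq-4bit"   # placeholder output directory

quant_config = QuantizeConfig(bits=4, group_size=128)

# Tiny placeholder calibration set; real runs should use a few hundred
# representative text samples.
calibration = [
    "GPTQModel quantizes large language models to low-bit weights.",
]

model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration)

# Quantized weights are written as safetensors; loading legacy unsafe .bin
# weights is no longer supported as of this release.
model.save_quantized(quant_path)
```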
- Fix deprecated by @CSY-ModelCloud in #447
- [COMPAT] [FIX] vllm params by @ZYC-ModelCloud in #448
- add estimate-vram by @PZS-ModelCloud in #452
- add field uri by @ZYC-ModelCloud in #449
- auto infer model base name from model files by @ZYC-ModelCloud in #451
- remove exllama v1 by @PZS-ModelCloud in #453
- [SECURITY] drop support of loading unsafe .bin weights by @ZYC-ModelCloud in #460
- [MODEL] add granite support by @LRL-ModelCloud in #466
- Split base.py file by @ZYC-ModelCloud in #465
- Move save_quantized function into saver.py by @ZYC-ModelCloud in #467
- remove deprecated exllama v1 code by @Qubitium in #473
- [MISC] move model def file to model_def folder by @PZS-ModelCloud in #479
- [FIX] Fix unit test by @PZS-ModelCloud in #480
- Download whl in setup.py by @CSY-ModelCloud in #481
- [Fix] cpu memory leak by @ZX-ModelCloud in #485
- [CI] set ninja threads to 4 by @CSY-ModelCloud in #487
- [FIX] sharded model loading error by @ZX-ModelCloud in #490
- add internlm test by @PZS-ModelCloud in #491
- remove needless function by @ZYC-ModelCloud in #494
- Fix unit test by @ZYC-ModelCloud in #495
- [FIX] fix test_integration by @PZS-ModelCloud in #497
- [Test] add codegen and xverse test by @PZS-ModelCloud in #496
Full Changelog: v1.0.9...v1.1.0