GPTQModel v1.1.0
What's Changed
Added IBM Granite model support. Full auto-buildless wheel install from PyPI. Reduced peak CPU memory usage by >20% during quantization. 100% CI model/feature coverage. Updated Hugging Face integration to track the latest Transformers.
Fully deprecated: liger-kernel support and the exllama v1 quant kernel.
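For context, below is a minimal sketch of quantizing an IBM Granite checkpoint with this release. The model id, calibration text, and output path are placeholders, and the calls assume the from_pretrained / quantize / save_quantized flow carried over from earlier GPTQModel releases; check the repo examples for the exact API in your installed version.

```python
# Minimal sketch, not a definitive recipe.
# Assumptions: the from_pretrained/quantize/save_quantized flow from earlier
# GPTQModel releases still applies; model id, calibration text, and output
# directory are placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "ibm-granite/granite-3.0-2b-instruct"   # placeholder Granite checkpoint
quant_path = "granite-3.0-2b-instruct-gptq-4bit"   # placeholder output directory

quant_config = QuantizeConfig(bits=4, group_size=128)

# Tiny placeholder calibration set; real runs should use a few hundred
# representative text samples.
calibration = [
    "GPTQModel quantizes large language models to low-bit weights.",
]

model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration)

# Quantized weights are written as safetensors; loading legacy unsafe .bin
# weights is no longer supported as of this release.
model.save_quantized(quant_path)
```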
- Fix deprecated by @CSY-ModelCloud in #447
- [COMPAT] [FIX] vllm params by @ZYC-ModelCloud in #448
- add estimate-vram by @PZS-ModelCloud in #452
- add field uri by @ZYC-ModelCloud in #449
- auto infer model base name from model files by @ZYC-ModelCloud in #451
- remove exllama v1 by @PZS-ModelCloud in #453
- [SECURITY] drop support of loading unsafe .bin weights by @ZYC-ModelCloud in #460
- [MODEL] add granite support by @LRL-ModelCloud in #466
- Split base.py file by @ZYC-ModelCloud in #465
- Move save_quantized function into saver.py by @ZYC-ModelCloud in #467
- remove deprecated exllama v1 code by @Qubitium in #473
- [MISC] move model def file to model_def folder by @PZS-ModelCloud in #479
- [FIX] Fix unit test by @PZS-ModelCloud in #480
- Download whl in setup.py by @CSY-ModelCloud in #481
- [Fix] cpu memory leak by @ZX-ModelCloud in #485
- [CI] set ninja threads to 4 by @CSY-ModelCloud in #487
- [FIX] sharded model loading error by @ZX-ModelCloud in #490
- add internlm test by @PZS-ModelCloud in #491
- remove needless function by @ZYC-ModelCloud in #494
- Fix unit test by @ZYC-ModelCloud in #495
- [FIX] fix test_integration by @PZS-ModelCloud in #497
- [Test] add codegen and xverse test by @PZS-ModelCloud in #496
Full Changelog: v1.0.9...v1.1.0