[SparseAutoModelForCausalLM Deprecation] Feature change #881

horheynm · 2024-10-31T14:02:38Z

SUMMARY:
Deprecate SparseAutoModelForCausalLM and use AutoModelForCausalLM

SparseAutoModelForCausalLM was responsible for adding custom load and save logic.
The load logic is no longer needed; it is handled in HF transformers by the HFQuantizer.
The save logic is kept, by wrapping save_pretrained.
There are two saving cases: an FSDP model and a non-FSDP model.
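Because loading is now handled by transformers itself, the deprecated class can in principle shrink to a thin alias that warns and then forwards to `AutoModelForCausalLM`. The sketch below assumes that shape and is not the llm-compressor implementation. To keep it runnable offline, `AutoModelForCausalLM` here is a hypothetical stand-in rather than the real transformers class, and the use of `DeprecationWarning` as the warning category is also an assumption.

```python
import warnings


class AutoModelForCausalLM:
    """Hypothetical stand-in for transformers.AutoModelForCausalLM (offline sketch)."""

    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        # The real class returns a loaded model; here we only echo the arguments.
        return {"loaded_from": name_or_path, **kwargs}


class SparseAutoModelForCausalLM:
    """Deprecated alias (sketch): warn, then delegate loading to AutoModelForCausalLM."""

    @classmethod
    def from_pretrained(cls, name_or_path, **kwargs):
        warnings.warn(
            "SparseAutoModelForCausalLM is deprecated; use AutoModelForCausalLM instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shim
        )
        return AutoModelForCausalLM.from_pretrained(name_or_path, **kwargs)
```

With this shape, existing scripts that call `SparseAutoModelForCausalLM.from_pretrained(...)` keep working and only gain a warning, which gives users time to migrate before the class is removed.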

Key changes:

The model's save_pretrained is wrapped if and only if the model goes through the optimization pipeline (train, oneshot, etc.). Otherwise nothing changes, and the vanilla save_pretrained is used.
If the model is passed as a string, it is saved automatically in oneshot; if it is passed as a pretrained model instance, users can choose to save it manually with save_pretrained.
SparseAutoModelForCausalLM and SparseAutoModel are now deprecated and trigger a warning when used. They will be removed in an appropriate future feature release.
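The saving rule above, together with the "save if model was provided as a string or custom output_dir was set" commit in this PR, can be sketched as a small predicate. This is an illustration only: the function name `should_save_after_oneshot` and the `./output` default are hypothetical placeholders, not the real oneshot arguments.

```python
from typing import Any, Optional

# Hypothetical default; the real pipeline's default output_dir may differ.
DEFAULT_OUTPUT_DIR = "./output"


def should_save_after_oneshot(
    model: Any,
    output_dir: Optional[str],
    default_output_dir: str = DEFAULT_OUTPUT_DIR,
) -> bool:
    """Decide whether oneshot should save the model itself (hypothetical helper)."""
    # Model passed as a name/path: the pipeline loaded it, so it also saves it.
    if isinstance(model, str):
        return True
    # Model passed as an instance: save only if the user set a custom output_dir;
    # otherwise the user is expected to call save_pretrained manually.
    return output_dir is not None and output_dir != default_output_dir
```

Under this rule oneshot writes the model at most once, and never behind the back of a user who passed in their own model object without asking for a custom location.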

TEST PLAN:
"please outline how the changes were tested"

src/llmcompressor/transformers/finetune/README.md

dsikka

Is the feature branch opened up in a PR so that testing can run?

dsikka

Couple of key points:

Why are we changing/commenting valid test cases?
From our offline discussion, keeping output_dir is fine; we just want to save only once, which these changes already seem to partly do (i.e. only save as part of the one_shot call if an output_dir is defined). We should still be able to provide a string for the model rather than requiring a model object.
I still need to pull these changes to test locally.

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py

src/llmcompressor/transformers/finetune/text_generation.py

horheynm · 2024-11-04T14:12:16Z

@dsikka
This PR will be merged into a feature branch where the tests will run, so the test runner there can validate it.


dsikka


tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py

tests/llmcompressor/transformers/obcq/test_mask_structure_preservation.py

tests/llmcompressor/transformers/gptq/test_oneshot.py





tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>


rahul-tuli

Lgtm pending @dsikka's comments!

kylesayrs · 2024-11-12T16:39:18Z

tests/llmcompressor/transformers/compression/test_run_compressed.py
ests/llmcompressor/transformers/finetune/test_finetune_no_recipe_custom_dataset.py

These tests are newly failing


horheynm · 2024-11-12T19:04:22Z

tests/llmcompressor/transformers/compression/test_run_compressed.py ests/llmcompressor/transformers/finetune/test_finetune_no_recipe_custom_dataset.py

These tests are newly failing

Good catch, thank you

dsikka · 2024-11-12T19:49:40Z

Also, it seems like the sparsification tests now have an issue loading some of the weights.

kylesayrs

Looks good to me, I made a few comments about places to investigate after this lands

tests/llmcompressor/transformers/compression/test_run_compressed.py

kylesayrs · 2024-11-14T21:31:07Z

src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py

@@ -75,50 +146,35 @@ def save_pretrained_wrapper(
            # https://github.com/huggingface/transformers/pull/30488
            transformers.modeling_utils.dtype_byte_size = new_dtype_byte_size

-            model = model_ref()


Note that we should investigate why we don't need model_ref anymore
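For context on the `model_ref` question: a common reason for such a wrapper to hold a weak reference is that the wrapper is stored on the model itself, so a strong reference would create a model -> wrapper -> model cycle that only the cyclic garbage collector can break. The sketch below illustrates that pattern under stated assumptions and is not the code from this PR; `TinyModel`, `wrap_save_pretrained`, and the `_overridden` marker are hypothetical, and only the name `save_pretrained_wrapper` comes from the diff.

```python
import functools
import weakref


class TinyModel:
    """Hypothetical stand-in for a transformers PreTrainedModel."""

    def __init__(self):
        self.saved_to = []

    def save_pretrained(self, save_directory, **kwargs):
        # Vanilla save: just record where we were asked to save.
        self.saved_to.append(save_directory)


def wrap_save_pretrained(model):
    """Hypothetical helper: override model.save_pretrained with extra save logic."""
    # Weak reference so the wrapper (stored on the model) does not keep the model alive.
    model_ref = weakref.ref(model)
    # Unbound function from the class; a bound method would hold the model strongly.
    original = type(model).save_pretrained

    @functools.wraps(original)
    def save_pretrained_wrapper(save_directory, **kwargs):
        model = model_ref()  # dereference; None if the model was already collected
        if model is None:
            raise RuntimeError("model was garbage-collected before saving")
        # Extra logic (e.g. compressing the state dict) would run here.
        original(model, save_directory, **kwargs)

    # Hypothetical marker so callers can detect an already-wrapped method.
    save_pretrained_wrapper._overridden = True
    model.save_pretrained = save_pretrained_wrapper
```

Capturing the unbound class function instead of the bound `model.save_pretrained` matters as much as the weak reference: a bound method holds its instance strongly and would reintroduce the cycle. If the current code no longer needs `model_ref`, one thing worth checking is whether it now captures the model some other way.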

src/llmcompressor/transformers/sparsification/compressed_tensors_utils.py

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py


dsikka

LGTM. Just last two comments/questions

src/llmcompressor/transformers/finetune/README.md

src/llmcompressor/transformers/finetune/runner.py

dsikka

Nice job on getting all the flows working.

* src and tests updates
* save model if output_dir is provided
* save model if provided as a string
* typo
* save if model was provided as a string or custom output_dir was set
* comments
* save tokenizer also if model passed as a string or custom outputdir provided
* revert to True
* merge main
* merge main
* fix transformers tests
* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py
  Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
* lint:
* fix bug
* fix bug
* comments
* comments
* fix saving bug on example script and comments
* fix test failure
* comments
* comments
* comments
* lint
* fix test_quantization.py
* fix bugs
* revert to default
* revert to default
* draft
* fix test
* logging output fix

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* set targets default earlier, remove QuantizationScheme.default_scheme Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * clearer warning Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix typo Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update docstring, use default factory for mutable default Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use Linear default Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * update accelerate version (#899) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [GPTQ] Iterative Parameter Updating (#863) * Implement iterative parameter updating Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in 
expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Typehint nits (#826) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [ DOC ] Remove version restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * 2of4 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * revert change to unrelated example Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * rename 
test file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * names Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * named args all around Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * in place function Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <andy@neuralmagic.com> * tickle Signed-off-by: 
andy-neuma <andy@neuralmagic.com> * let's give it a try Signed-off-by: andy-neuma <andy@neuralmagic.com> * whitespace Signed-off-by: andy-neuma <andy@neuralmagic.com> * delete unneeded workflow Signed-off-by: andy-neuma <andy@neuralmagic.com> * adjust trigger Signed-off-by: andy-neuma <andy@neuralmagic.com> --------- Signed-off-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * WIP, observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use minmax observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi 
<domenic@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use user-specified observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: 
andy-neuma <andy@neuralmagic.com> Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com> Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Small fixes for release (#901) * fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use smaller portion of dataset (#902) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Update example to not fail hessian inversion (#904) * update Signed-off-by: Dipika <dipikasikka1@gmail.com> * quality --------- Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * bump version (#907) Signed-off-by: Dipika <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add default mappings (#906) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [SparseAutoModelForCausalLM Deprecation] Feature change (#881) * src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or 
custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * correct typo (#888) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Signed-off-by: andy-neuma <andy@neuralmagic.com> Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com> Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: George <george@neuralmagic.com>

custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * correct typo (#888) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use default factory, since default does not trigger field validator Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Signed-off-by: andy-neuma <andy@neuralmagic.com> Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com> Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Co-authored-by: George <george@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* src and tests updates
* save model if output_dir is provided
* save model if provided as a string
* typo
* save if model was provided as a string or custom output_dir was set
* comments
* save tokenizer also if model passed as a string or custom outputdir provided
* revert to True
* merge main
* merge main
* fix transformers tests
* Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

  Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
* lint:
* fix bug
* fix bug
* comments
* comments
* fix saving bug on example script and comments
* fix test failure
* comments
* comments
* comments
* lint
* fix test_quantization.py
* fix bugs
* revert to default
* revert to default
* draft
* fix test
* logging output fix

---------

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

src and tests updates

5c8ff83

horheynm changed the title from "src and tests updates" to "[SparseAutoModelForCausalLM Deprecation] Feature change" Oct 31, 2024

kylesayrs reviewed Oct 31, 2024

src/llmcompressor/transformers/finetune/README.md (outdated)

horheynm added 4 commits November 1, 2024 14:07

save model if output_dir is provided

4d9f4df

save model if provided as a string

588ee7e

typo

17d4a9c

save if model was provided as a string or custom output_dir was set

bd98f6d
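The commit above states the rule for when the pipeline saves automatically: save if the model was provided as a string or a custom output_dir was set. A minimal Python sketch of that decision, assuming a hypothetical `should_save` helper and a hypothetical `DEFAULT_OUTPUT_DIR` placeholder (neither name is taken from the llm-compressor source):

```python
from typing import Optional, Union

# Hypothetical placeholder for the pipeline's default output directory;
# the real default in llm-compressor may differ.
DEFAULT_OUTPUT_DIR = "./output"


def should_save(
    model: Union[str, object],
    output_dir: Optional[str],
    default_output_dir: str = DEFAULT_OUTPUT_DIR,
) -> bool:
    """Hypothetical helper: decide whether to save the model automatically.

    Returns True when the model was given as a string (a local path or
    hub id) or when the caller set an output_dir other than the default.
    """
    model_given_as_string = isinstance(model, str)
    custom_output_dir = output_dir is not None and output_dir != default_output_dir
    return model_given_as_string or custom_output_dir


# A model passed as a string is saved automatically.
print(should_save("org/some-model", None))       # True
# A preloaded model object with no custom output_dir is not saved.
print(should_save(object(), None))               # False
# A preloaded model object with a custom output_dir is saved.
print(should_save(object(), "./my-compressed"))  # True
```

In this sketch a preloaded model object with no custom output_dir is left for the caller to save manually with `save_pretrained`, which matches the intent described in the PR summary.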

dsikka reviewed Nov 3, 2024

dsikka requested changes Nov 3, 2024

comments

51e1ada

save tokenizer also if model passed as a string or custom outputdir provided

7b8247b

dsikka reviewed Nov 4, 2024

dsikka requested a review from kylesayrs November 4, 2024 14:53

horheynm force-pushed the deprecate-SparseAutoModelForCausalLM/deprecation branch from ccd2d02 to 7b8247b November 4, 2024 15:45

horheynm added 2 commits November 4, 2024 16:10

revert to True

2f6d9ef

revert model to string

0425124

horheynm changed the base branch from deprecate-SparseAutoModelForCausalLM/feature to main November 4, 2024 20:04

horheynm added 7 commits November 4, 2024 15:05

Merge branch 'main' into deprecate-SparseAutoModelForCausalLM/deprecation

ce77540

merge main

1ec6974

merge main

ff55775

Merge branch 'main' of github.com:vllm-project/llm-compressor into deprecate-SparseAutoModelForCausalLM/deprecation

aae73e2

Merge branch 'main' into deprecate-SparseAutoModelForCausalLM/deprecation

a66147e

fix transformers tests

8d146ad

Merge branch 'deprecate-SparseAutoModelForCausalLM/deprecation' of github.com:vllm-project/llm-compressor into deprecate-SparseAutoModelForCausalLM/deprecation

d3073fe

kylesayrs requested changes Nov 4, 2024

tests/llmcompressor/transformers/obcq/test_consecutive_runs.py (outdated, resolved)

horheynm and others added 4 commits November 5, 2024 09:05

Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py

6d02fd5

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

Merge branch 'main' into deprecate-SparseAutoModelForCausalLM/deprecation

30123c3

lint:

9b869f8

fix bug

1d29417

rahul-tuli reviewed Nov 12, 2024

dsikka and others added 2 commits November 12, 2024 11:54

Merge branch 'main' into deprecate-SparseAutoModelForCausalLM/deprecation

0ce72a5

comments

2602648

horheynm added 8 commits November 13, 2024 14:42

comments

0284f0c

lint

c7c951e

fix test_quantization.py

5a1cc95

fix bugs

acc2776

revert to default

a2992ab

revert to default

4bcbe03

draft

5dbb911

fix test

9418de1

kylesayrs previously approved these changes Nov 14, 2024

logging output fix

5bc9a25

horheynm dismissed kylesayrs’s stale review via 5bc9a25 November 15, 2024 14:29

dsikka reviewed Nov 15, 2024

tests/llmcompressor/transformers/sparsification/test_compress_tensor_utils.py (resolved)

Merge branch 'main' into deprecate-SparseAutoModelForCausalLM/deprecation

a516950

dsikka reviewed Nov 17, 2024

src/llmcompressor/transformers/finetune/README.md (resolved)

src/llmcompressor/transformers/finetune/runner.py (resolved)

dsikka approved these changes Nov 18, 2024

rahul-tuli approved these changes Nov 18, 2024

dsikka merged commit 3d60221 into main Nov 18, 2024
6 of 7 checks passed

dsikka deleted the deprecate-SparseAutoModelForCausalLM/deprecation branch November 18, 2024 14:50

horheynm commented Oct 31, 2024 (edited)

dsikka left a comment

dsikka left a comment (edited)

horheynm commented Nov 4, 2024

dsikka left a comment

rahul-tuli left a comment

kylesayrs commented Nov 12, 2024

horheynm commented Nov 12, 2024

dsikka commented Nov 12, 2024

kylesayrs left a comment

kylesayrs Nov 14, 2024

dsikka left a comment

dsikka left a comment
