[Bugfix] DisableKVCache Context #834
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
test?
There are likely many ways to achieve this, including using the |
Pending testing with llava models

Works with llava models:

    from llmcompressor.utils import DisableKVCache
    from transformers import AutoProcessor, LlavaForConditionalGeneration
    model = LlavaForConditionalGeneration.from_pretrained("llava-hf/llava-1.5-7b-hf")
    with DisableKVCache(model):
        assert model.config.text_config.use_cache == False
Force-pushed from 9c70cf6 to efaa48e
* load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
… steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* load weight within onloading Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* Implement iterative parameter updating Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Use weight parameter of linear layer (#836) * use weight parameter of linear layer * add weight attribute check Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Rename files to remove colons (#846) * rename files to remove colons Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] Workaround tied tensors bug (#659) * load offload state dict * add test * remove merge duplication * prepare to fix tie_word_embeddings * add full tests * patch second bug * comment out failing tests, point to next pr * link to issue * accomodate offloaded models in test * add back passing test * WIP * add error if not in expected list * apply style * update passing failing list * add shared tensors tests * clean up * add comment with link * make failing tests a todo * Remove failing tests * explicitly set safe_serialization * separate out gpu tests, apply style --------- Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * only untie word embeddings (#839) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * check for config hidden size (#840) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use float32 for Hessian dtype (#847) * use float32 for hessian dtype * explicitly set inp dtype as well * float precision for obcq hessian Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * GPTQ: Depreciate non-sequential update option (#762) * remove from gptq, apply style * remove instances of sequential_update argument in GPTQ tests * update examples * update example tests * documentation, remove from example * apply style * revert back to auto type * apply style --------- Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Typehint nits (#826) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [ DOC ] Remove version 
restrictions in W8A8 exmaple (#849) The latest compressored-tensor 0.8.0 removed some API, https://github.com/neuralmagic/compressed-tensors/pull/156/files If installed the older llmcompressor from pip, it would throw the error like: ``` ImportError: cannot import name 'update_layer_weight_quant_params' from 'compressed_tensors.quantization' ``` Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix inconsistence (#80) Use group strategy with 128 group size instead of channel Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * 2of4 Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * revert change to unrelated example Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * rename test file Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * fix fwd func call (#845) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cover all 3.9-3.12 in commit testing (#864) Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Add marlin-24 recipe/configs for e2e testing (#866) * add marlin-24 recipe/configs for e2e testing * update Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Bugfix] onload during sparsity calculation (#862) * onload during sparsity calculation * fix sparsity --------- Co-authored-by: Dipika <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix HFTrainer overloads (#869) * add missing arguments Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * names Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * style Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * named 
args all around Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Support Model Offloading Tied Tensors Patch (#872) * update parameter of offloaded modules Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * in place function Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * add advice about dealing with non-invertable hessians (#875) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * seed commit workflow (#877) * seed commit workflow Signed-off-by: andy-neuma <andy@neuralmagic.com> * tickle Signed-off-by: andy-neuma <andy@neuralmagic.com> * let's give it a try Signed-off-by: andy-neuma <andy@neuralmagic.com> * whitespace Signed-off-by: andy-neuma <andy@neuralmagic.com> * delete unneeded workflow Signed-off-by: andy-neuma <andy@neuralmagic.com> * adjust trigger Signed-off-by: andy-neuma <andy@neuralmagic.com> --------- Signed-off-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` (#837) * update functioon * wip * clean-up; fix imports * clean-up * more clean-up * bug fix * update for kvcache * get kv_cache to work * docstring * fix comment * fix condition for dynamic * update * update tests * add observer tests * add flake8 skip * apply updated mse fixes * fix import * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * Update src/llmcompressor/modifiers/quantization/calibration.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * PR comments * clean-up * move hook check to observer call * update * separate out calibration step --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> 
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * WIP, observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use minmax observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Bugfix get observer from name (#883) Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> * BugFix: Fix Sparsity Reload Testing (#882) * fix * fix remaining test cases * add comments * fix Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Move config["testconfig_path"] assignment (#895) * Use custom unique test names for e2e tests (#892) * Include `testconfig_path` in parsed config data Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use custom unique names for e2e tests Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Revert "Use custom unique test names for e2e tests (#892)" (#893) This reverts commit 10facf2. 
Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Move config["testconfig_path"] assignment Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> * Use a function name generator for e2e test names Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> --------- Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * cap accelerate version to avoid bug (#897) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Fix observing offloaded weight (#896) * load weight within onloading Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * remove moving activation to execution device, since this is already done since activation calibration always happens within forward pass Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * Update image in README.md (#861) Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> * use user-specified observer Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> --------- Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Signed-off-by: andy-neuma <andy@neuralmagic.com> Signed-off-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Kyle Sayers <kylesayers@sophon-3.mynetworksettings.com> Co-authored-by: Kyle Sayers <kyle@neuralmagic.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Jincheng Miao <jincheng.miao@intel.com> Co-authored-by: 黄石 <yzlnew@gmail.com> Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com> Co-authored-by: dhuangnm <dhuang@MacBook-Pro-2.local> Co-authored-by: Andy Linfoot <78757007+andy-neuma@users.noreply.github.com> Co-authored-by: andy-neuma <andy@neuralmagic.com> Co-authored-by: Rahul Tuli 
<rahul@neuralmagic.com> Co-authored-by: Domenic Barbuzzi <domenic@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* fix device map * expose one gpu for finetune; update to use a better moodel and show generation for completeness * more fixes * typo fix * dont just run unit tests Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* update Signed-off-by: Dipika <dipikasikka1@gmail.com> * quality --------- Signed-off-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Rahul Tuli <rahul@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
* src and tests updates * save model if output_dir is provided * save model if provided as a string * typo * save if model was provided as a string or custom output_dir was set * comments * save tokenizer also if model passed as a string or custom outputdir provided * revert to True * merge main * merge main * fix transformers tests * Update tests/llmcompressor/transformers/obcq/test_consecutive_runs.py Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> * lint: * fix bug * fix bug * comments * comments * fix saving bug on example script and comments * fix test failure * comments * comments * comments * lint * fix test_quantization.py * fix bugs * revert to default * revert to default * draft * fix test * logging output fix --------- Co-authored-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Force-pushed from 1630cd6 to 2d6ad4a
DCO can't be fixed in the usual way for this branch; it'll have to be manually approved.
LGTM thanks for update - leaving to @dsikka for final pass
LGTM. A quick test would be nice.
@dsikka I’ve tested with a few models locally, and this is a precursor to another PR which I will add a test for.
Purpose
Changes
`use_cache` and `text_config.use_cache` in cases like `MllamaConfig`
Testing
`meta-llama/Llama-3.2-11B-Vision-Instruct` would lead to attribute error. Now runs normally with `model.config.text_config.use_cache == False`
`meta-llama/Meta-Llama-3-8B-Instruct`
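
For readers who want to see the idea under Changes in isolation, here is a minimal sketch of a context manager that disables the KV cache on whichever config carries the flag. It is not the llm-compressor implementation: `disable_kv_cache` and `_configs_with_use_cache` are hypothetical names, and the models are stand-ins built from `SimpleNamespace` so the example runs without downloading weights. It assumes the flag lives on `model.config`, on `model.config.text_config`, or on both.

```python
from contextlib import contextmanager
from types import SimpleNamespace


def _configs_with_use_cache(model):
    """Return every config object on the model that exposes `use_cache`.

    Hypothetical helper: checks the top-level config and, when present,
    the nested `text_config` used by some multimodal configs.
    """
    candidates = [model.config, getattr(model.config, "text_config", None)]
    return [c for c in candidates if c is not None and hasattr(c, "use_cache")]


@contextmanager
def disable_kv_cache(model):
    """Set `use_cache = False` for the duration of the block, then restore it."""
    configs = _configs_with_use_cache(model)
    # Remember the original values so they can be restored on exit.
    previous = [config.use_cache for config in configs]
    for config in configs:
        config.use_cache = False
    try:
        yield model
    finally:
        # Runs even if the body raises, so the model is never left modified.
        for config, value in zip(configs, previous):
            config.use_cache = value


# Stand-in for a text-only model: the flag is on the top-level config.
text_only = SimpleNamespace(config=SimpleNamespace(use_cache=True))

# Stand-in for a multimodal model: the flag is only on `text_config`.
multimodal = SimpleNamespace(
    config=SimpleNamespace(text_config=SimpleNamespace(use_cache=True))
)

with disable_kv_cache(text_only):
    assert text_only.config.use_cache is False

with disable_kv_cache(multimodal):
    assert multimodal.config.text_config.use_cache is False
```

Looking the attribute up with `hasattr` instead of reading `model.config.use_cache` directly is what avoids the attribute error on configs that only define the flag on `text_config`.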