
Support for targets and ignore in Sparsity Compressors #182

Closed
wants to merge 7 commits
Conversation

@rahul-tuli rahul-tuli commented Oct 6, 2024

This PR introduces support for using targets and ignore in sparsity compressors. It has been tested against the llm-compressor repository at commit a47137d8 (on main).

Changes Made

  • Cleaned up several utilities and added corresponding tests.
  • Updated the BaseSparsity.compress(...) methods to accept a new compression_targets argument.
  • Enhanced the ModelCompressor to directly populate the compression_targets argument.
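To make the change concrete, here is a minimal sketch of how a compressor might consume a `compression_targets` argument: only parameters whose owning module is targeted are compressed, everything else passes through unchanged. The filtering logic and key naming below are assumptions for illustration; only the name `compression_targets` comes from this PR description.

```python
def compress(state_dict, compression_targets=None):
    """Compress only weights whose owning module is in compression_targets.

    Hypothetical sketch: `compression_targets` is a set of module names
    resolved by the ModelCompressor; None means "compress everything".
    """
    compressed, passthrough = {}, {}
    for name, tensor in state_dict.items():
        module_name = name.rsplit(".", 1)[0]  # strip the ".weight" suffix
        if compression_targets is None or module_name in compression_targets:
            # Illustrative compressed-key naming, not the library's format
            compressed[f"{module_name}.weight.compressed"] = tensor
        else:
            passthrough[name] = tensor
    return {**compressed, **passthrough}
```

With `compression_targets={"model.layers.0.mlp"}`, a `model.layers.0.mlp.weight` entry would be rewritten to a compressed key while `lm_head.weight` would be kept verbatim.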

Verification

The functionality was verified using the following script:

Verification Script
"""
Usage: python verification.py
Tested against llm-compressor commit a47137d8
"""


from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata
from llmcompressor.transformers import oneshot
from safetensors import safe_open

MODEL_ID = "nm-testing/llama2.c-stories42M-pruned2.4"

def check_first_layer(save_dir, check_compressed=True):
    with safe_open(f"{save_dir}/model.safetensors", framework="pt", device=0) as f:
        layer_0_keys = [key for key in f.keys() if "model.layers.0" in key]
        if check_compressed:
            assert any("compressed" in key for key in layer_0_keys), "First layer is not compressed as expected."
        else:
            assert not any("compressed" in key for key in layer_0_keys), "First layer is compressed unexpectedly."

def main():
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Apply oneshot to wrap save_pretrained 
    oneshot(model=model)

    # Compress and save the model
    sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
    save_dir_compressed = f"{MODEL_ID.split('/')[1]}-2of4-compressed"
    model.save_pretrained(save_dir_compressed, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_compressed)

    # Verify the first layer is compressed
    check_first_layer(save_dir_compressed, check_compressed=True)

    # Ignore the first layer and save the model again
    sparsity_config.ignore.append("re:model.layers.0.*")
    save_dir_ignored = f"{MODEL_ID.split('/')[1]}-2of4-ignored-first-layer"
    model.save_pretrained(save_dir_ignored, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_ignored)

    # Verify the first layer is not compressed
    check_first_layer(save_dir_ignored, check_compressed=False)

if __name__ == "__main__":
    main()

The script runs to completion with no assertion failures.

Script Output
2024-11-27T10:18:45.295223+0000 | one_shot | INFO - *** One Shot ***
2024-11-27T10:18:45.295382+0000 | initialize | INFO - Compression lifecycle initialized for 0 modifiers
2024-11-27T10:18:45.295428+0000 | finalize | INFO - Compression lifecycle finalized for 0 modifiers
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1068.21it/s]
Checking whether model follows 2:4 sparsity structure: 100%|████████████████████████████████████████████████████████████████| 57/57 [00:00<00:00, 1572.75it/s]
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1562.54it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1205.22it/s]
2024-11-27T10:18:46.694477+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1892.06it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1701.64it/s]
2024-11-27T10:18:47.582677+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset
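The ignore entry appended in the script (`re:model.layers.0.*`) uses an `re:` prefix to mark a regex pattern. A minimal sketch of how such prefixed entries might be matched against module names (a hypothetical helper, not the library's actual matcher):

```python
import re

def is_ignored(module_name, ignore_patterns):
    """Return True if module_name matches any ignore entry.

    Entries prefixed with "re:" are treated as regular expressions,
    mirroring the "re:model.layers.0.*" entry used above; all other
    entries are exact-match module names. Illustrative only.
    """
    for pattern in ignore_patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False
```

Under this scheme, every submodule of the first decoder layer matches the pattern, which is why no `model.layers.0` key ends up compressed in the second save.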

@rahul-tuli rahul-tuli marked this pull request as ready for review October 7, 2024 13:59
@kylesayrs (Contributor) left a comment:
I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

@markurtz markurtz self-requested a review October 14, 2024 13:35
@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 400c6c3 to e5bfd8a Compare October 23, 2024 14:50
@rahul-tuli
Copy link
Member Author

rahul-tuli commented Oct 23, 2024

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

Point 1: Decompression takes care of that using COMPRESSION_PARAM_NAMES
Point 2: Fixed

It is listed as a dependency for #822 because without this change we cannot enable sparse compression combined with quantization compression; these changes are required for #822 to work correctly.
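On point 1, the idea is that each compressor already declares the parameter suffixes it emits, so decompression can recover targeted modules from the checkpoint keys alone. The suffix values and grouping logic below are assumptions for illustration; only the constant name `COMPRESSION_PARAM_NAMES` comes from the comment above.

```python
# Assumed suffix list for illustration; the library's actual values may differ.
COMPRESSION_PARAM_NAMES = ["compressed", "bitmask", "shape"]

def group_compressed_params(checkpoint_keys):
    """Map each parameter prefix to the compression params stored for it.

    Keys whose final dotted component is a known compression suffix are
    grouped by their prefix; all other keys are left for normal loading.
    """
    grouped = {}
    for key in checkpoint_keys:
        prefix, _, suffix = key.rpartition(".")
        if suffix in COMPRESSION_PARAM_NAMES:
            grouped.setdefault(prefix, {})[suffix] = key
    return grouped
```

Because grouping is driven by the suffixes, decompression does not need the original `compression_targets` list to be passed in again.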

@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 1a7cdba to a528334 Compare November 27, 2024 10:14
This was referenced Nov 27, 2024
kylesayrs
kylesayrs previously approved these changes Nov 27, 2024
Resolved (outdated) review threads:
  • src/compressed_tensors/utils/safetensors_load.py (2 threads)
  • tests/test_quantization/lifecycle/test_apply.py (2 threads)
kylesayrs
kylesayrs previously approved these changes Nov 27, 2024
@kylesayrs (Contributor) left a comment:
LGTM!

kylesayrs
kylesayrs previously approved these changes Dec 4, 2024
…ly.py

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Add: tests for get_nested_weight_mappings

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 9eeede7 to f80a45e Compare December 17, 2024 20:17
@rahul-tuli rahul-tuli closed this Dec 20, 2024