
Support for targets and ignore in Sparsity Compressors #182

Closed
wants to merge 7 commits
Conversation

@rahul-tuli rahul-tuli commented Oct 6, 2024

This PR introduces support for using targets and ignore in sparsity compressors. It has been tested against the llm-compressor repository at commit a47137d8 (on main).

Changes Made

  • Cleaned up several utilities and added corresponding tests.
  • Updated the BaseSparsity.compress(...) methods to accept a new compression_targets argument.
  • Enhanced the ModelCompressor to directly populate the compression_targets argument.
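To make the change concrete, here is a minimal sketch of how a compressor might consume a `compression_targets` argument: only parameters whose owning module is targeted are compressed, everything else passes through unchanged. The filtering logic and key naming below are assumptions for illustration; only the name `compression_targets` comes from this PR description.

```python
def compress(state_dict, compression_targets=None):
    """Compress only weights whose owning module is in compression_targets.

    Hypothetical sketch: `compression_targets` is a set of module names
    resolved by the ModelCompressor; None means "compress everything".
    """
    compressed, passthrough = {}, {}
    for name, tensor in state_dict.items():
        module_name = name.rsplit(".", 1)[0]  # strip the ".weight" suffix
        if compression_targets is None or module_name in compression_targets:
            # Illustrative compressed-key naming, not the library's format
            compressed[f"{module_name}.weight.compressed"] = tensor
        else:
            passthrough[name] = tensor
    return {**compressed, **passthrough}
```

With `compression_targets={"model.layers.0.mlp"}`, a `model.layers.0.mlp.weight` entry would be rewritten to a compressed key while `lm_head.weight` would be kept verbatim.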

Verification

The functionality was verified using the following script:

Verification Script
"""
Usage: python verification.py
Tested against llm-compressor commit a47137d8
"""


from transformers import AutoTokenizer, AutoModelForCausalLM
from llmcompressor.transformers.compression.sparsity_config import SparsityConfigMetadata
from llmcompressor.transformers import oneshot
from safetensors import safe_open

MODEL_ID = "nm-testing/llama2.c-stories42M-pruned2.4"

def check_first_layer(save_dir, check_compressed=True):
    with safe_open(f"{save_dir}/model.safetensors", framework="pt", device=0) as f:
        layer_0_keys = [key for key in f.keys() if "model.layers.0" in key]
        if check_compressed:
            assert any("compressed" in key for key in layer_0_keys), "First layer is not compressed as expected."
        else:
            assert not any("compressed" in key for key in layer_0_keys), "First layer is compressed unexpectedly."

def main():
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # Apply oneshot to wrap save_pretrained 
    oneshot(model=model)

    # Compress and save the model
    sparsity_config = SparsityConfigMetadata.from_pretrained(model, compress=True)
    save_dir_compressed = f"{MODEL_ID.split('/')[1]}-2of4-compressed"
    model.save_pretrained(save_dir_compressed, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_compressed)

    # Verify the first layer is compressed
    check_first_layer(save_dir_compressed, check_compressed=True)

    # Ignore the first layer and save the model again
    sparsity_config.ignore.append("re:model.layers.0.*")
    save_dir_ignored = f"{MODEL_ID.split('/')[1]}-2of4-ignored-first-layer"
    model.save_pretrained(save_dir_ignored, sparsity_config=sparsity_config)
    tokenizer.save_pretrained(save_dir_ignored)

    # Verify the first layer is not compressed
    check_first_layer(save_dir_ignored, check_compressed=False)

if __name__ == "__main__":
    main()

The script runs to completion with no assertion failures.

Script Output
2024-11-27T10:18:45.295223+0000 | one_shot | INFO - *** One Shot ***
2024-11-27T10:18:45.295382+0000 | initialize | INFO - Compression lifecycle initialized for 0 modifiers
2024-11-27T10:18:45.295428+0000 | finalize | INFO - Compression lifecycle finalized for 0 modifiers
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1068.21it/s]
Checking whether model follows 2:4 sparsity structure: 100%|████████████████████████████████████████████████████████████████| 57/57 [00:00<00:00, 1572.75it/s]
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1562.54it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1205.22it/s]
2024-11-27T10:18:46.694477+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset
Calculating model sparsity: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1892.06it/s]
Compressing model: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:00<00:00, 1701.64it/s]
2024-11-27T10:18:47.582677+0000 | get_serialized_recipe | WARNING - Recipe not found in session - it may have been reset
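The ignore entry appended in the script (`re:model.layers.0.*`) uses an `re:` prefix to mark a regex pattern. A minimal sketch of how such prefixed entries might be matched against module names (a hypothetical helper, not the library's actual matcher):

```python
import re

def is_ignored(module_name, ignore_patterns):
    """Return True if module_name matches any ignore entry.

    Entries prefixed with "re:" are treated as regular expressions,
    mirroring the "re:model.layers.0.*" entry used above; all other
    entries are exact-match module names. Illustrative only.
    """
    for pattern in ignore_patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False
```

Under this scheme, every submodule of the first decoder layer matches the pattern, which is why no `model.layers.0` key ends up compressed in the second save.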

@rahul-tuli rahul-tuli marked this pull request as ready for review October 7, 2024 13:59
@kylesayrs (Contributor) left a comment:
I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

@markurtz markurtz self-requested a review October 14, 2024 13:35
@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 400c6c3 to e5bfd8a Compare October 23, 2024 14:50
@rahul-tuli
Copy link
Member Author

rahul-tuli commented Oct 23, 2024

I'm not sure how/if this is related to #822 (it's listed as a dependency)

  1. Doesn't this list of targets need to be accounted for during decompression?
  2. Don't these changes throw away any weights which are not targeted for sparse compression?

Point 1: Decompression takes care of that using COMPRESSION_PARAM_NAMES
Point 2: Fixed

It is listed as a dependency for #822 because without this change we cannot enable sparse compression combined with quantization compression; these changes are required for #822 to work correctly.
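On point 1, the idea is that each compressor already declares the parameter suffixes it emits, so decompression can recover targeted modules from the checkpoint keys alone. The suffix values and grouping logic below are assumptions for illustration; only the constant name `COMPRESSION_PARAM_NAMES` comes from the comment above.

```python
# Assumed suffix list for illustration; the library's actual values may differ.
COMPRESSION_PARAM_NAMES = ["compressed", "bitmask", "shape"]

def group_compressed_params(checkpoint_keys):
    """Map each parameter prefix to the compression params stored for it.

    Keys whose final dotted component is a known compression suffix are
    grouped by their prefix; all other keys are left for normal loading.
    """
    grouped = {}
    for key in checkpoint_keys:
        prefix, _, suffix = key.rpartition(".")
        if suffix in COMPRESSION_PARAM_NAMES:
            grouped.setdefault(prefix, {})[suffix] = key
    return grouped
```

Because grouping is driven by the suffixes, decompression does not need the original `compression_targets` list to be passed in again.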

@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 1a7cdba to a528334 Compare November 27, 2024 10:14
This was referenced Nov 27, 2024
kylesayrs
kylesayrs previously approved these changes Nov 27, 2024
Resolved (outdated) review threads:
  • src/compressed_tensors/utils/safetensors_load.py (2 threads)
  • tests/test_quantization/lifecycle/test_apply.py (2 threads)
kylesayrs
kylesayrs previously approved these changes Nov 27, 2024
@kylesayrs (Contributor) left a comment:
LGTM!

kylesayrs
kylesayrs previously approved these changes Dec 4, 2024
…ly.py

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
Add: tests for get_nested_weight_mappings

Signed-off-by: Rahul Tuli <rahul@neuralmagic.com>
@rahul-tuli rahul-tuli force-pushed the add-targets-and-ignore-support branch from 9eeede7 to f80a45e Compare December 17, 2024 20:17
@rahul-tuli rahul-tuli closed this Dec 20, 2024