
Naive Run Compressed Support #109

Merged
merged 32 commits into from
Aug 30, 2024
Conversation

Satrat

@Satrat Satrat commented Jul 11, 2024

Summary of Changes

  • Refactor compressors to run on PyTorch modules and single-layer state_dicts; only the base class implements the loop over the full state dict
  • Add a new CompressedLinear layer type for running in compressed mode
  • Rather than decompressing on model load, decompress on each forward pass. Specific kernels can later replace this, with decompression as the fallback case
  • Allow a forward pass to run even if a zero point isn't loaded (it's assumed to be 0)
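To make the decompress-on-forward idea above concrete, here is a minimal sketch of a CompressedLinear-style module. The class name `NaiveCompressedLinear` and its int8-plus-scale storage format are illustrative assumptions, not the actual implementation in this PR; it only shows the fallback path where the weight is dequantized on every forward call and a missing zero point defaults to 0.

```python
from typing import Optional

import torch
import torch.nn as nn


class NaiveCompressedLinear(nn.Module):
    """Hypothetical sketch of a linear layer that keeps its weight compressed
    (int8 + per-tensor scale) and decompresses on each forward pass."""

    def __init__(
        self,
        weight_q: torch.Tensor,                 # int8 compressed weight
        scale: torch.Tensor,                    # dequantization scale
        zero_point: Optional[torch.Tensor] = None,
        bias: Optional[torch.Tensor] = None,
    ):
        super().__init__()
        self.register_buffer("weight_q", weight_q)
        self.register_buffer("scale", scale)
        # Per the PR, a zero point that was never loaded is assumed to be 0
        if zero_point is None:
            zero_point = torch.zeros_like(scale)
        self.register_buffer("zero_point", zero_point)
        self.bias = None if bias is None else nn.Parameter(bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decompress on every forward pass; a dedicated kernel could replace
        # this later, with this dequantize-then-matmul path as the fallback
        weight = (self.weight_q.to(x.dtype) - self.zero_point) * self.scale
        return nn.functional.linear(x, weight, self.bias)
```

The design trade-off is that each forward pass pays a dequantization cost, but memory stays at the compressed footprint and the module drops in wherever an `nn.Linear` is expected.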

Example

import torch
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM

model_dir = "nm-testing/Meta-Llama-3-8B-Instruct-fp8-compressed"

# run_compressed=True keeps the weights compressed in memory; they are
# decompressed on each forward pass rather than at load time
model = SparseAutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto", run_compressed=True
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
sample_input = ["I love 4 bit quantization because"]
inputs = tokenizer(sample_input, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_length=50)
outputs = tokenizer.batch_decode(generated_ids)
print(outputs)

Output: ["<|begin_of_text|>I love 4 bit quantization because it's so simple and yet so effective. It's a great example of how you can achieve a good balance between complexity and performance in a model.\n\nI think it's also worth noting that 4-bit"]

@Satrat Satrat changed the title Sa/naive run compressed [Don't Merge Yet] Naive Run Compressed Jul 11, 2024
@Satrat Satrat changed the title [Don't Merge Yet] Naive Run Compressed Naive Run Compressed Support Aug 7, 2024
@Satrat Satrat requested review from bfineran and dsikka and removed request for bfineran August 12, 2024 15:53
bfineran
bfineran previously approved these changes Aug 28, 2024
Contributor

@bfineran bfineran left a comment
LGTM pending unresolved comment

@Satrat Satrat merged commit fe01c5c into main Aug 30, 2024
1 check passed
@Satrat Satrat deleted the sa/naive_run_compressed branch August 30, 2024 04:44
3 participants