
Naive Run Compressed Support #109

Merged
merged 32 commits into from
Aug 30, 2024
Conversation

Satrat

@Satrat Satrat commented Jul 11, 2024

Summary of Changes

  • Refactor compressors to run on PyTorch modules and single-layer state_dicts; only the base class implements the loop over the full state dict
  • Add a new CompressedLinear layer type for running in compressed mode
  • Rather than decompressing on model load, decompress on each forward pass. Specific kernels can later replace this, with decompression as the fallback case
  • Allow a forward pass to run even if a zero point isn't loaded (it's assumed to be 0)
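To make the decompress-on-forward idea above concrete, here is a minimal sketch of a CompressedLinear-style module. The class name `NaiveCompressedLinear` and its int8-plus-scale storage format are illustrative assumptions, not the actual implementation in this PR; it only shows the fallback path where the weight is dequantized on every forward call and a missing zero point defaults to 0.

```python
from typing import Optional

import torch
import torch.nn as nn


class NaiveCompressedLinear(nn.Module):
    """Hypothetical sketch of a linear layer that keeps its weight compressed
    (int8 + per-tensor scale) and decompresses on each forward pass."""

    def __init__(
        self,
        weight_q: torch.Tensor,                 # int8 compressed weight
        scale: torch.Tensor,                    # dequantization scale
        zero_point: Optional[torch.Tensor] = None,
        bias: Optional[torch.Tensor] = None,
    ):
        super().__init__()
        self.register_buffer("weight_q", weight_q)
        self.register_buffer("scale", scale)
        # Per the PR, a zero point that was never loaded is assumed to be 0
        if zero_point is None:
            zero_point = torch.zeros_like(scale)
        self.register_buffer("zero_point", zero_point)
        self.bias = None if bias is None else nn.Parameter(bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Decompress on every forward pass; a dedicated kernel could replace
        # this later, with this dequantize-then-matmul path as the fallback
        weight = (self.weight_q.to(x.dtype) - self.zero_point) * self.scale
        return nn.functional.linear(x, weight, self.bias)
```

The design trade-off is that each forward pass pays a dequantization cost, but memory stays at the compressed footprint and the module drops in wherever an `nn.Linear` is expected.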

Example

import torch
from transformers import AutoTokenizer

from llmcompressor.transformers import SparseAutoModelForCausalLM

model_dir = "nm-testing/Meta-Llama-3-8B-Instruct-fp8-compressed"

# run_compressed=True keeps the weights compressed in memory; they are
# decompressed on each forward pass rather than at load time
model = SparseAutoModelForCausalLM.from_pretrained(
    model_dir, torch_dtype=torch.float16, device_map="auto", run_compressed=True
)

tokenizer = AutoTokenizer.from_pretrained(model_dir)
sample_input = ["I love 4 bit quantization because"]
inputs = tokenizer(sample_input, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_length=50)
outputs = tokenizer.batch_decode(generated_ids)
print(outputs)

Output: ["<|begin_of_text|>I love 4 bit quantization because it's so simple and yet so effective. It's a great example of how you can achieve a good balance between complexity and performance in a model.\n\nI think it's also worth noting that 4-bit"]

@Satrat Satrat changed the title Sa/naive run compressed [Don't Merge Yet] Naive Run Compressed Jul 11, 2024
@Satrat Satrat changed the title [Don't Merge Yet] Naive Run Compressed Naive Run Compressed Support Aug 7, 2024
@Satrat Satrat requested review from bfineran and dsikka and removed request for bfineran August 12, 2024 15:53
bfineran
bfineran previously approved these changes Aug 28, 2024
Contributor

@bfineran bfineran left a comment
LGTM pending unresolved comment

@Satrat Satrat merged commit fe01c5c into main Aug 30, 2024
1 check passed
@Satrat Satrat deleted the sa/naive_run_compressed branch August 30, 2024 04:44
3 participants