eval script fixes #414

HDCharles · 2024-06-21T20:48:49Z

Additional script fixes

Summary:

int4wo had an issue with the device swap after the quantization API (the device now needs to be set before quantize)
int4wo-gptq had an issue with the kv_cache model var not being set correctly (now set in the GPTQ code)
eval in general had an issue with lm_eval 0.4.2 (updates to tokenizer and eval harness)
   https://github.com/pytorch/ao/issues/404
[not eval] autoquant docs were not showing up (added __all__ to autoquant); made the autoquant low-level APIs private
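
The last item leans on two standard Python conventions: `__all__` lists the names a module treats as public, and a leading underscore marks a helper as private. The sketch below shows both on a throwaway in-memory module. The module name `fake_autoquant` and the function names are hypothetical and are not torchao's actual layout.

```python
import sys
import types

# Build a throwaway module in memory; every name in it is hypothetical.
source = '''
__all__ = ["autoquant"]

def _change_weights(model):
    """Low-level helper, private by convention (leading underscore)."""
    return f"low-level pass over {model}"

def autoquant(model):
    """Public entry point, listed in __all__."""
    return _change_weights(model)
'''
fake_autoquant = types.ModuleType("fake_autoquant")
exec(source, fake_autoquant.__dict__)
sys.modules["fake_autoquant"] = fake_autoquant

# A star-import only pulls in the names listed in __all__.
namespace = {}
exec("from fake_autoquant import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Documentation generators such as Sphinx autodoc can also consult `__all__` when choosing which members to list, which is a plausible reason the docs appeared once it was added; the PR itself does not spell out the mechanism.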

Test Plan:

python eval.py -q int4wo-64 --compile
wikitext: {'word_perplexity,none': 12.842987954345306, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.611855472207904, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6887223897240059, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
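
As a quick sanity check on the reported metrics, `bits_per_byte` should equal the base-2 logarithm of `byte_perplexity`. The sketch below only re-derives one reported value from the other; it does not rerun the evaluation.

```python
import math

# Values copied from the wikitext results in the test plan.
byte_perplexity = 1.611855472207904
reported_bits_per_byte = 0.6887223897240059

# bits per byte = log2(byte perplexity)
derived_bits_per_byte = math.log2(byte_perplexity)
print(derived_bits_per_byte)
print(math.isclose(derived_bits_per_byte, reported_bits_per_byte, rel_tol=1e-6))
```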

Reviewers:

Subscribers:

Tasks:

Tags:

pytorch-bot · 2024-06-21T20:48:52Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/414

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 79b0c1d with merge base ef1e745:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary: int4wo had an issue with device swap after quantization int4wo-gptq had an issue with.... Test Plan: python eval.py -q int4wo-64 --compile wikitext: {'word_perplexity,none': 12.842987954345306, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.611855472207904, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6887223897240059, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'} python eval.py -q int4wo-64-gptq --compile Reviewers: Subscribers: Tasks: Tags:


Summary: Test Plan: two python generate.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --compile --quantization autoquant --write_result benchmark_results.txt two python eval.py -q int4wo Reviewers: Subscribers: Tasks: Tags:

jerryzh168 · 2024-06-21T22:05:39Z

torchao/_models/llama/eval.py

@@ -60,17 +60,18 @@ def run_evaluation(

    if quantization:
        if "int8wo" in quantization:
-            quantize(model, int8wo())
+            quantize(model, int8_weight_only())


does this need to be compatible with torch 2.3 and below? if so we could define similar helpers:

ao/test/integration/test_integration.py

Lines 99 to 118 in bc8599f

    def _int8wo_api(mod):
        if TORCH_VERSION_AFTER_2_4:
            quantize(mod, int8_weight_only())
            unwrap_tensor_subclass(mod)
        else:
            change_linear_weights_to_int8_woqtensors(mod)

    def _int8da_int8w_api(mod):
        if TORCH_VERSION_AFTER_2_4:
            quantize(mod, int8_dynamic_activation_int8_weight())
            unwrap_tensor_subclass(mod)
        else:
            change_linear_weights_to_int8_dqtensors(mod)

    def _int4wo_api(mod):
        if TORCH_VERSION_AFTER_2_4:
            quantize(mod, int4_weight_only())
            unwrap_tensor_subclass(mod)
        else:
            change_linear_weights_to_int4_woqtensors(mod)

I think it's mostly for our own testing, not sure if that's needed
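
The helpers quoted in this thread all follow one pattern: check a version flag once, then route to the newer `quantize` path or to the older `change_linear_weights_to_*` path. The sketch below illustrates that dispatch without importing torch or torchao. `version_at_least`, `new_path`, and `old_path` are hypothetical stand-ins, and how the real `TORCH_VERSION_AFTER_2_4` flag is computed is not shown in this thread, so the version parsing here is an assumption.

```python
import re

def version_at_least(version: str, minimum: tuple) -> bool:
    """Compare the leading numeric parts of a version string to a minimum."""
    # Keep only the leading "X.Y[.Z]" digits; suffixes like ".dev..." or "+cu121" are ignored.
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.groups(default="0"))
    return parts >= minimum

def new_path(mod):
    # Stand-in for quantize(mod, ...) followed by unwrap_tensor_subclass(mod).
    return f"{mod}: quantize + unwrap"

def old_path(mod):
    # Stand-in for change_linear_weights_to_int8_woqtensors(mod).
    return f"{mod}: change_linear_weights"

def int8wo_api(mod, torch_version: str):
    """Pick the code path based on the (assumed) version string."""
    if version_at_least(torch_version, (2, 4, 0)):
        return new_path(mod)
    return old_path(mod)

print(int8wo_api("model", "2.3.1"))
print(int8wo_api("model", "2.4.0.dev20240621+cu121"))
```

Keeping the branch inside one helper per scheme means callers such as an eval script would not need their own version checks, at the cost of carrying both code paths.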

jerryzh168 · 2024-06-21T22:06:30Z

torchao/_models/llama/generate.py

@@ -189,21 +189,21 @@ def main(
    if quantization:
        from torchao.quantization.quant_api import (


can we dedup the quant code in eval and generate.py?

Only a bit; it's probably more trouble than it's worth given the differences and the need to handle autoquant vs gptq, etc.

jerryzh168 · 2024-06-21T22:43:47Z

torchao/_models/llama/generate.py

        if "int4wo" in quantization:
            groupsize=int(quantization.split("-")[-1])
            assert groupsize in [32,64,128,256], f"int4wo groupsize needs to be one of [32,64,128,256] but got {groupsize}"
-            quantize(model, int4wo(groupsize=groupsize))
+            quantize(model, int4_weight_only(groupsize=groupsize))


this is group_size since last update I think

cc @HDCharles

i'll fix it in another PR
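
For reference, the parsing in this hunk reads the group size from the trailing field of the option string (for example `int4wo-64`) and rejects unsupported values. A standalone sketch of that logic follows. `parse_int4wo_groupsize` is a hypothetical helper name, not a function from the scripts, and it raises `ValueError` where the original uses `assert`.

```python
ALLOWED_GROUPSIZES = (32, 64, 128, 256)

def parse_int4wo_groupsize(quantization: str) -> int:
    """Return the group size encoded as the last '-'-separated field."""
    # Same extraction as the diff: take whatever follows the final "-".
    suffix = quantization.split("-")[-1]
    if not suffix.isdigit():
        raise ValueError(f"expected a trailing group size in {quantization!r}")
    groupsize = int(suffix)
    if groupsize not in ALLOWED_GROUPSIZES:
        raise ValueError(
            f"int4wo groupsize needs to be one of {list(ALLOWED_GROUPSIZES)} but got {groupsize}"
        )
    return groupsize

print(parse_int4wo_groupsize("int4wo-64"))
```

This sketch rejects any string whose final field is not numeric, such as a bare `int4wo`; how the scripts themselves treat such forms is outside this hunk.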


facebook-github-bot added the CLA Signed label Jun 21, 2024

HDCharles requested review from msaroufim and jerryzh168 June 21, 2024 20:49

HDCharles added 4 commits June 21, 2024 13:50

fix use_index_put_for_kv_cache

4b70e56


final tests

cf8a362


updating quantize apis

79b0c1d


HDCharles force-pushed the 070_script_fixes branch from 98aeee5 to 79b0c1d Compare June 21, 2024 21:18

HDCharles requested a review from supriyar June 21, 2024 21:19

HDCharles mentioned this pull request Jun 21, 2024

torchao/_models/llama/eval.py does not work with latest lm_eval #404

Closed

jerryzh168 reviewed Jun 21, 2024


HDCharles requested a review from jerryzh168 June 21, 2024 22:40

jerryzh168 reviewed Jun 21, 2024


jerryzh168 approved these changes Jun 21, 2024


HDCharles merged commit 9dc2c11 into main Jun 21, 2024
13 checks passed


