Fixing linear_activation_tensor dynamic quant #622

HDCharles · 2024-08-07T00:05:08Z

Summary: dynamic quantization was broken for generate because no repr function was defined.
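The PR diff is not reproduced in this conversation, so the following is only a torch-free analogy of the failure mode the summary describes, not the actual change. It assumes a wrapper class that inherits a repr which reads storage the wrapper never creates. Every name here (`PlainTensor`, `WrapperWithoutRepr`, `WrapperWithRepr`, `fake_int8_quant`) is hypothetical.

```python
class PlainTensor:
    """Stand-in for an ordinary tensor that owns its data."""

    def __init__(self, data):
        self._data = list(data)

    def __repr__(self):
        # The inherited repr assumes self._data exists.
        return f"PlainTensor({self._data})"


def fake_int8_quant(x):
    """Hypothetical placeholder for a function that quantizes activations."""
    return x


class WrapperWithoutRepr(PlainTensor):
    """Hypothetical wrapper: holds a weight and a quant function, no storage."""

    def __init__(self, weight, input_quant_func):
        # Deliberately skips PlainTensor.__init__, so self._data is never set.
        self.weight = weight
        self.input_quant_func = input_quant_func


class WrapperWithRepr(WrapperWithoutRepr):
    """Same wrapper, with a repr that only touches fields it actually has."""

    def __repr__(self):
        return (
            f"{type(self).__name__}(weight={self.weight!r}, "
            f"input_quant_func={self.input_quant_func.__name__})"
        )


weight = PlainTensor([1.0, 2.0])

try:
    repr(WrapperWithoutRepr(weight, fake_int8_quant))
    broken_error = None
except AttributeError as exc:
    # Any code path that prints or logs the object fails here.
    broken_error = exc

fixed_repr = repr(WrapperWithRepr(weight, fake_int8_quant))
print(broken_error)
print(fixed_repr)
```

In this sketch the only difference between the two wrappers is the overridden `__repr__`, which is why anything that formats the object as a string can fail even though the computation itself is untouched.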

Test Plan: sh benchmarks.sh

20240806170037, tok/s= 9.54, mem/s= 63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB
quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda
repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8
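As a quick sanity check on the reported numbers, the mem/s figure looks consistent with model_size multiplied by tok/s, on the assumption that each generated token reads the full set of weights once. That relationship is an assumption about how the benchmark derives mem/s, not something stated in this PR. A small sketch:

```python
# Values copied from the benchmark line above.
tok_per_s = 9.54        # tokens generated per second
model_size_gb = 6.62    # size of the quantized model in GB
reported_mem_gb_s = 63.14

# Assumed relationship: every token streams all weights once,
# so bandwidth is roughly model size times token rate.
estimated_mem_gb_s = model_size_gb * tok_per_s

print(f"estimated mem/s = {estimated_mem_gb_s:.2f} GB/s")
print(f"difference from reported = {abs(estimated_mem_gb_s - reported_mem_gb_s):.2f} GB/s")
```

The small gap between the estimate and the reported value is what one would expect if the benchmark multiplies unrounded values before printing.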



pytorch-bot · 2024-08-07T00:05:11Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/622

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b8099aa with merge base 04e5a9e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


* add desktop.json * add fast * remove embedding

* executable README
* fix title of CI workflow
* markup commands in markdown
* extend the markup-markdown language
* Automatically identify cuda from nvidia-smi in install-requirements (pytorch#606)
* Automatically identify cuda from nvidia-smi in install-requirements
* Update README.md
Co-authored-by: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
* Unbreak zero-temperature sampling (pytorch#599) Fixes pytorch#581.
* Improve process README
* [retake] Add sentencepiece tokenizer (pytorch#626)
* Add sentencepiece tokenizer
* Add white space
* Handle white space:
* Handle control ids
* More cleanup
* Lint
* Use unique_ptr
* Use a larger runner
* Debug
* Debug
* Cleanup
* Update install_utils.sh to use python3 instead of python (pytorch#636) As titled. On some devices `python` and `python3` are pointing to different environments so good to unify them.
* Fix quantization doc to specify dytpe limitation on a8w4dq (pytorch#629)
Co-authored-by: Kimish Patel <kimishpatel@fb.com>
* add desktop.json (pytorch#622)
* add desktop.json
* add fast
* remove embedding
* improvements
* update readme from doc branch
* tab/spc
* fix errors in updown language
* fix errors in updown language, and [skip]: begin/end
* fix errors in updown language, and [skip]: begin/end
* a storied run
* stories run on readme instructions does not need HF token
* increase timeout
* check for hang un hf_login
* executable README improvements
* typo
* typo
Co-authored-by: Ian Barber <ian.barber@gmail.com>
Co-authored-by: Scott Wolchok <swolchok@meta.com>
Co-authored-by: Mengwei Liu <larryliu0820@users.noreply.github.com>
Co-authored-by: Kimish Patel <kimishpatel@fb.com>
Co-authored-by: Scott Roy <161522778+metascroy@users.noreply.github.com>

HDCharles requested a review from jerryzh168 August 7, 2024 00:05

facebook-github-bot added the CLA Signed label Aug 7, 2024

jerryzh168 approved these changes Aug 7, 2024


HDCharles merged commit c2f5399 into main Aug 7, 2024
13 checks passed

yanbing-j pushed a commit to yanbing-j/ao that referenced this pull request Dec 9, 2024

add desktop.json (pytorch#622)

6c0b25f

* add desktop.json * add fast * remove embedding
