
Partial Head Quantization

Preface

This is the repository for an undergraduate thesis by Jacob Yang at the University of British Columbia, supervised by Professor Prashant Nair and PhD student Muhammad Abdullah Adnan. The thesis PDF is in the root folder: ./CPEN499 Jacob's thesis.pdf.

How to set up environment

I encountered some configuration problems while setting up the SmoothQuant environment. Here is how I solved them; a consolidated command sketch follows the steps below.

CUDA driver 12.3

CUDA Toolkit 11.6. I also tried 12.3, 11.3, and 11.8; the CUDA toolkit cannot be easily downgraded, so if the wrong version is installed, the simplest fix is to reinstall the system.

Anaconda

Clone SmoothQuant

Install SmoothQuant following its README

Use PyTorch 1.12.1 with CUDA 11.6, then install torch-int following its README

Install CUTLASS, checked out to the feature/2.10 branch
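Here is a consolidated sketch of the commands for the steps above. The repository URLs are the upstream projects, and the PyTorch line assumes the 1.12.1 wheels built against CUDA 11.6; after cloning, install each project per its own README.

# PyTorch 1.12.1 built against CUDA 11.6
pip install torch==1.12.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116

# SmoothQuant and torch-int (follow each README for the actual install)
git clone https://github.com/mit-han-lab/smoothquant.git
git clone https://github.com/Guangxuan-Xiao/torch-int.git

# CUTLASS on the feature/2.10 branch
git clone https://github.com/NVIDIA/cutlass.git
cd cutlass && git checkout feature/2.10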

Some issue fixes

Convert the demo Jupyter notebook smoothquant_opt_real_int8_demo.ipynb into a Python script, then run it in the virtual environment with smoothquant and torch-int installed.
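One way to do the conversion is Jupyter's standard exporter, which writes a .py file next to the notebook:

jupyter nbconvert --to script smoothquant_opt_real_int8_demo.ipynb
python smoothquant_opt_real_int8_demo.py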

How to generate act scales

According to Issue 60 in the smoothquant repo, generate the activation scales with the LAMBADA dataset, i.e. dataset = load_dataset('lambada', split='validation[:1000]'), in get_act_scales in smoothquant/calibration.py.
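A minimal sketch of that change inside get_act_scales; only the load_dataset line comes from the issue, so treat the surrounding context as an assumption about the calibration code:

from datasets import load_dataset

# Inside smoothquant/calibration.py, get_act_scales():
# replace the dataset_path-based loading with the LAMBADA validation split
dataset = load_dataset('lambada', split='validation[:1000]')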

Disable the dataset_path argument for generate_act_scales.py and run with the default settings

Change the path to the newly generated activation scales in test_quant.py

Usage

To run the test, run

python ./examples/test_quant.py

There is an argument to modify in examples/test_quant.py: for example, how many less significant heads to quantize.
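The repo does not spell out how head significance is measured; as one illustrative interpretation (the function and names below are hypothetical, not the thesis code), heads could be ranked by the mean absolute magnitude of their outputs and the k lowest-ranked ones selected for quantization:

import torch

def pick_least_significant_heads(head_outputs: torch.Tensor, k: int) -> torch.Tensor:
    # head_outputs: (batch, num_heads, seq_len, head_dim), e.g. the tensor
    # exposed through attn_weights_reshaped in the patch described below
    scores = head_outputs.abs().mean(dim=(0, 2, 3))  # one significance score per head
    return torch.argsort(scores)[:k]  # indices of the k least significant heads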

Also, you need to add the per-head output matrices to one of the outputs of the attention layers for the Python code to run. Add the line

attn_weights_reshaped = attn_output

between the line

attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)

and

attn_output = attn_output.transpose(1, 2)
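Putting the three lines together, the patched section of the attention forward pass looks like this (the context lines are from the Hugging Face transformers OPT attention implementation that smoothquant builds on):

attn_output = attn_output.view(bsz, self.num_heads, tgt_len, self.head_dim)
# Expose the per-head outputs through the attention-weights return slot
attn_weights_reshaped = attn_output
attn_output = attn_output.transpose(1, 2)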
