LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment

Introduction

LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment [arXiv]

The Large Language Model Compression Benchmark (LLMCBench) is a rigorously designed benchmark that provides an in-depth analysis of LLM compression algorithms.

Installation

git clone https://github.com/AboveParadise/LLMCBench.git
cd LLMCBench

conda create -n llmcbench python=3.9
conda activate llmcbench
pip install -r requirements.txt

Usage

This repo contains code for evaluating on the MMLU, MNLI, QNLI, Wikitext2, advGLUE, and TruthfulQA datasets, as well as for measuring FLOPs.

Testing MMLU

bash scripts/run_mmlu.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --data_dir : Dataset location.
  • --ntrain : Number of few-shot examples.
  • --seqlen : Maximum input sequence length for the model.
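A sketch of what an invocation with these flags might look like (the Python entry point and all paths/values below are hypothetical placeholders; only the flag names come from this README):

# hypothetical example; see scripts/run_mmlu.sh for the actual invocation
python main_mmlu.py --path /path/to/model --data_dir data/mmlu --ntrain 5 --seqlen 2048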

Testing MNLI

bash scripts/run_mnli.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --data_dir : Dataset location.
  • --ntrain : Number of few-shot examples.
  • --seqlen : Maximum input sequence length for the model.
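An illustrative call with these flags (entry-point name and values are placeholders, not taken from the repo):

# hypothetical example; the real logic lives in scripts/run_mnli.sh
python main_mnli.py --path /path/to/model --data_dir data/mnli --ntrain 5 --seqlen 2048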

Testing QNLI

bash scripts/run_qnli.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --data_dir : Dataset location.
  • --ntrain : Number of few-shot examples.
  • --seqlen : Maximum input sequence length for the model.
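As with the runs above, an illustrative call might look like this (entry point and values are placeholders):

# hypothetical example; see scripts/run_qnli.sh for the actual invocation
python main_qnli.py --path /path/to/model --data_dir data/qnli --ntrain 5 --seqlen 2048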

Testing Wikitext2

bash scripts/run_wikitext2.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --device : Device to place the model on.
  • --seqlen : Maximum input sequence length for the model.
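A sketch of a perplexity run with these flags (entry point and values are placeholders):

# hypothetical example; see scripts/run_wikitext2.sh for the actual invocation
python main_wikitext2.py --path /path/to/model --device cuda:0 --seqlen 2048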

Testing advGLUE

bash scripts/run_advglue.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --data_file : Dataset file location.
  • --ntrain : Number of few-shot examples.
  • --test_origin : Whether to also test on the original GLUE data.
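An illustrative call (entry point and values are placeholders; check scripts/run_advglue.sh for whether --test_origin is a switch or takes a value):

# hypothetical example; --test_origin assumed to be a boolean switch here
python main_advglue.py --path /path/to/model --data_file data/advglue/dev.json --ntrain 5 --test_origin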

Testing TruthfulQA

bash scripts/run_tqa.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --presets : Preset to use for prompt generation. Please see tqa_presets.py for options.
  • --input_path : Dataset file location.
  • --device : Device to place the model on.
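A sketch with these flags (entry point, preset name, and paths are placeholders):

# hypothetical example; pick a real preset name from tqa_presets.py
python main_tqa.py --path /path/to/model --presets qa --input_path data/TruthfulQA.csv --device cuda:0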

Testing FLOPs (floating point operations)

bash scripts/run_flops.sh
Overview of Arguments:
  • --path : Model checkpoint location.
  • --seqlen : Input sequence length for the model.
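An illustrative call (entry point and values are placeholders):

# hypothetical example; see scripts/run_flops.sh for the actual invocation
python main_flops.py --path /path/to/model --seqlen 2048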

Acknowledgements

In addition to the code in this repo, we also use EleutherAI/lm-evaluation-harness (https://github.com/EleutherAI/lm-evaluation-harness), a framework for few-shot evaluation of language models, for evaluation.

Citation

If you find our project useful or relevant to your research, please cite our paper:

@inproceedings{yang2024llmcbench,
  title={LLMCBench: Benchmarking Large Language Model Compression for Efficient Deployment},
  author={Yang, Ge and He, Changyi and Guo, Jinyang and Wu, Jianyu and Ding, Yifu and Liu, Aishan and Qin, Haotong and Ji, Pengliang and Liu, Xianglong},
  booktitle={Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024}
}
