Evaluate LLMs with OpenCompass

The LLMs accelerated by lmdeploy can be evaluated with OpenCompass.

Setup

In this part, we set up the environment for evaluation.

Install lmdeploy

Install lmdeploy with pip (Python 3.8+). To install from source, refer to build.md.

pip install lmdeploy

Install OpenCompass

Install OpenCompass from source. Refer to installation for more information.

git clone https://github.com/open-compass/opencompass.git
cd opencompass
pip install -e .

Before continuing, see the Quick Start to learn the basic usage of OpenCompass.

Download datasets

Download the core datasets

# Run in the OpenCompass directory
cd opencompass
wget https://github.com/open-compass/opencompass/releases/download/0.1.8.rc1/OpenCompassData-core-20231110.zip
unzip OpenCompassData-core-20231110.zip

Prepare Evaluation Config

OpenCompass uses OpenMMLab-style configuration files: you define the evaluation in a Python config file and launch it from the command line. OpenCompass supports evaluating lmdeploy's TurboMind engine through its Python API.
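
As a rough illustration of the OpenMMLab config convention, the sketch below shows how a dict with a `type` key can be turned into an object: the class is taken from `type` and the remaining keys become constructor arguments. `DummyModel` and `build_from_dict` are hypothetical names used only for this example. They are not OpenCompass or mmengine APIs, and the real builder does more (for instance, it handles keys that are not constructor arguments).

```python
class DummyModel:
    """Hypothetical stand-in for a model class such as TurboMindModel."""

    def __init__(self, path, batch_size=1):
        self.path = path
        self.batch_size = batch_size


def build_from_dict(cfg):
    """Instantiate the class stored under 'type' using the remaining keys.

    Simplified illustration of the convention, not the framework's builder.
    """
    kwargs = dict(cfg)        # copy so the original config is left untouched
    cls = kwargs.pop('type')  # the class to instantiate
    return cls(**kwargs)


# A config entry written in the same dict(type=..., ...) style as this guide.
model_cfg = dict(type=DummyModel, path='some/model', batch_size=8)
model = build_from_dict(model_cfg)
print(type(model).__name__, model.path, model.batch_size)
```

Because the config is plain Python, a config file can import classes, reuse variables, and compose dicts before the framework instantiates anything.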

Dataset Config

In the OpenCompass root directory, create the config file $OPENCOMPASS_DIR/configs/eval_lmdeploy.py. It selects several predefined datasets by importing them from the OpenCompass base dataset configs and collects them in datasets.

from mmengine.config import read_base


with read_base():
    # choose a list of datasets
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets
    from .datasets.ceval.ceval_gen_5f30c7 import ceval_datasets
    from .datasets.SuperGLUE_WiC.SuperGLUE_WiC_gen_d06864 import WiC_datasets
    from .datasets.SuperGLUE_WSC.SuperGLUE_WSC_gen_7902a7 import WSC_datasets
    from .datasets.triviaqa.triviaqa_gen_2121ce import triviaqa_datasets
    from .datasets.gsm8k.gsm8k_gen_1d7fe4 import gsm8k_datasets
    from .datasets.race.race_gen_69ee4f import race_datasets
    from .datasets.crowspairs.crowspairs_gen_381af0 import crowspairs_datasets
    # and output the results in a chosen format
    from .summarizers.medium import summarizer

datasets = sum((v for k, v in locals().items() if k.endswith('_datasets')), [])
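
The last line of the config gathers every imported variable whose name ends with `_datasets` and concatenates those lists into one. The sketch below reproduces that idiom on an explicit dictionary instead of `locals()`, so it runs without OpenCompass installed. The dataset entries and the `namespace` dict are made-up placeholders, not real OpenCompass configs.

```python
# Hypothetical stand-in for the names that read_base() would import.
namespace = {
    'mmlu_datasets': [dict(abbr='mmlu_demo_a'), dict(abbr='mmlu_demo_b')],
    'gsm8k_datasets': [dict(abbr='gsm8k_demo')],
    # Skipped below because the key does not end with '_datasets'.
    'summarizer': dict(note='not a dataset list'),
}

# sum(..., []) starts from an empty list and appends each matching list in turn.
datasets = sum(
    (v for k, v in namespace.items() if k.endswith('_datasets')),
    [],
)

print([d['abbr'] for d in datasets])
```

One consequence of this pattern is that any other variable in the config whose name ends with `_datasets` is also picked up, so helper names should avoid that suffix.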

Model Config

This part shows how to set up the model config for LLMs. Here are some examples:

````{tab} internlm-20b

```python
from opencompass.models.turbomind import TurboMindModel

internlm_20b = dict(
    type=TurboMindModel,
    abbr='internlm-20b-turbomind',
    path='internlm/internlm-20b',  # should match the model path on Hugging Face
    engine_config=dict(session_len=2048,
                       max_batch_size=8,
                       rope_scaling_factor=1.0),
    gen_config=dict(top_k=1,
                    top_p=0.8,
                    temperature=1.0,
                    max_new_tokens=100),
    max_out_len=100,
    max_seq_len=2048,
    batch_size=8,
    concurrency=8,
    run_cfg=dict(num_gpus=1, num_procs=1),
)

models = [internlm_20b]
```

````

````{tab} internlm-chat-20b

For chat models, you have to pass `meta_template`. Different chat models may use different `meta_template` settings, and it is important
to keep the template the same as the one used in training. You can read [meta_template](https://opencompass.readthedocs.io/en/latest/prompt/meta_template.html) for more information.


```python
from opencompass.models.turbomind import TurboMindModel

internlm_meta_template = dict(
    round=[
        dict(role='HUMAN', begin='<|User|>:', end='\n'),
        dict(role='BOT', begin='<|Bot|>:', end='<eoa>\n', generate=True),
    ],
    eos_token_id=103028,
)

internlm_chat_20b = dict(
    type=TurboMindModel,
    abbr='internlm-chat-20b-turbomind',
    path='internlm/internlm-chat-20b',
    engine_config=dict(session_len=2048,
                       max_batch_size=8,
                       rope_scaling_factor=1.0),
    gen_config=dict(top_k=1,
                    top_p=0.8,
                    temperature=1.0,
                    max_new_tokens=100),
    max_out_len=100,
    max_seq_len=2048,
    batch_size=8,
    concurrency=8,
    meta_template=internlm_meta_template,
    run_cfg=dict(num_gpus=1, num_procs=1),
    end_str='<eoa>'
)

models = [internlm_chat_20b]

```

````
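
To give a feel for what a `meta_template` does, the sketch below wraps each dialogue turn in the `begin` and `end` markers of its role and stops right after the `begin` marker of the role flagged with `generate=True`, which is where the model would continue. `render_prompt` is a hypothetical helper written for this illustration under that assumption. It is a simplification, not OpenCompass's actual prompt-building code.

```python
meta_template = dict(round=[
    dict(role='HUMAN', begin='<|User|>:', end='\n'),
    dict(role='BOT', begin='<|Bot|>:', end='<eoa>\n', generate=True),
])


def render_prompt(template, turns):
    """Build a prompt string from (role, text) pairs.

    A text of None marks the turn the model should generate; it is only
    allowed for a role whose template entry has generate=True.
    """
    markers = {item['role']: item for item in template['round']}
    parts = []
    for role, text in turns:
        item = markers[role]
        parts.append(item['begin'])
        if text is None:
            if not item.get('generate'):
                raise ValueError(f'role {role!r} is not marked generate=True')
            break  # stop after the begin marker; the model continues from here
        parts.append(text + item['end'])
    return ''.join(parts)


prompt = render_prompt(meta_template, [('HUMAN', 'What is 1 + 1?'), ('BOT', None)])
print(repr(prompt))
```

If the markers differ from those seen in training, the model receives prompts in an unfamiliar format, which is why the template should match the training settings.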

Execute Evaluation Task

After defining the evaluation config, run the following command to start the evaluation. See Execution Task for more run.py arguments.

# in the root directory of opencompass
python3 run.py configs/eval_lmdeploy.py --work-dir ./workdir