mmlu_zh

MMLU Inference Script

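This project evaluates the relevant models on MMLU. Its validation and test sets contain 1.5K and 14.1K multiple-choice questions respectively, covering 57 subjects.

The sections below describe how to run prediction on the MMLU dataset.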

Data Preparation

Download the evaluation dataset from the location specified by the MMLU authors and extract it into the data folder:

wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
tar xf data.tar
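
As a quick sanity check after extraction, the sketch below counts the questions in each split. It assumes the archive unpacks into data/dev, data/val and data/test, each holding one headerless CSV file per subject with one question per row; that layout is an assumption about the archive and is not stated on this page. count_questions is a hypothetical helper, not part of the project.

```python
import csv
from pathlib import Path


def count_questions(data_dir, split):
    """Return (number of subject files, number of questions) for one split.

    Assumes data_dir/<split>/ holds one headerless CSV per subject,
    with one question per row.
    """
    split_dir = Path(data_dir) / split
    n_files = 0
    n_rows = 0
    for csv_path in sorted(split_dir.glob("*.csv")):
        n_files += 1
        with open(csv_path, newline="", encoding="utf-8") as f:
            # csv.reader handles quoted fields that span several lines.
            n_rows += sum(1 for _ in csv.reader(f))
    return n_files, n_rows


if __name__ == "__main__":
    for split in ("dev", "val", "test"):
        files, rows = count_questions("data", split)
        print(f"{split}: {files} subjects, {rows} questions")
```

If the layout assumption holds, the val and test counts should land close to the 1.5K and 14.1K figures quoted above.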

Running the Prediction Script

Run the following script:

model_path=path/to/chinese-mixtral
output_path=path/to/your_output_dir
data_path=path/to/mmlu-data

cd scripts/mmlu
python eval.py \
    --model_path ${model_path} \
    --data_dir ${data_path} \
    --save_dir ${output_path} \
    --load_in_4bit \
    --ntrain 5 \
    --use_flash_attention_2
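
To run both the 0-shot and 5-shot settings without editing the command by hand, the invocation above could be assembled from Python. This is a minimal sketch: build_eval_command is a hypothetical helper, it uses only the flags shown in the command above, and it assumes load_in_4bit and use_flash_attention_2 are bare switches as they appear there. The per-setting output subdirectories are illustrative.

```python
import shlex


def build_eval_command(model_path, data_dir, save_dir, ntrain=5,
                       load_in_4bit=True, use_flash_attention_2=True):
    """Assemble the argv list for eval.py from the flags shown above."""
    cmd = [
        "python", "eval.py",
        "--model_path", model_path,
        "--data_dir", data_dir,
        "--save_dir", save_dir,
        "--ntrain", str(ntrain),
    ]
    # Assumption: both options are on/off switches that take no value.
    if load_in_4bit:
        cmd.append("--load_in_4bit")
    if use_flash_attention_2:
        cmd.append("--use_flash_attention_2")
    return cmd


if __name__ == "__main__":
    for shots in (0, 5):
        cmd = build_eval_command(
            "path/to/chinese-mixtral",
            "path/to/mmlu-data",
            f"path/to/your_output_dir/{shots}shot",  # illustrative layout
            ntrain=shots,
        )
        # Print the command; to launch it, pass cmd to
        # subprocess.run(cmd, cwd="scripts/mmlu", check=True).
        print(shlex.join(cmd))
```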

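Parameter Description

model_path: directory of the model to evaluate (a full Chinese-Mixtral or Chinese-Mixtral-Instruct model, not a LoRA)
data_dir: directory containing the evaluation dataset
ntrain: number of few-shot examples (5-shot: ntrain=5, 0-shot: ntrain=0)
save_dir: output path for the evaluation results
do_test: whether to evaluate on the valid or test set; with do_test=False evaluation runs on the valid set, with do_test=True it runs on the test set
load_in_4bit: load the model with 4-bit quantization
use_flash_attention_2: use flash-attn2 to accelerate inference; otherwise SDPA is used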
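
To make the ntrain setting concrete, the sketch below builds a k-shot prompt in the style of the template commonly used for MMLU: a header naming the subject, then each example as a question, four lettered choices and an Answer: line. Whether this project's eval.py uses exactly this template is an assumption; format_example and build_prompt are illustrative names, and the rows are toy data.

```python
CHOICES = ["A", "B", "C", "D"]


def format_example(row, include_answer=True):
    """row = [question, choice_A, choice_B, choice_C, choice_D, answer_letter]."""
    question, options, answer = row[0], row[1:5], row[5]
    lines = [question]
    lines += [f"{letter}. {text}" for letter, text in zip(CHOICES, options)]
    prompt = "\n".join(lines) + "\nAnswer:"
    if include_answer:
        prompt += f" {answer}\n\n"
    return prompt


def build_prompt(dev_rows, test_row, subject, ntrain=5):
    """Prepend the first `ntrain` solved examples to the unanswered test question."""
    header = (
        "The following are multiple choice questions (with answers) "
        f"about {subject}.\n\n"
    )
    shots = "".join(format_example(r) for r in dev_rows[:ntrain])
    return header + shots + format_example(test_row, include_answer=False)


if __name__ == "__main__":
    # Toy rows for illustration only; real rows come from the dataset files.
    dev = [["What is 2 + 2?", "3", "4", "5", "6", "B"]]
    test = ["What is 3 + 3?", "5", "6", "7", "8", "B"]
    print(build_prompt(dev, test, "elementary mathematics", ntrain=1))
```

With ntrain=0 the prompt contains only the header and the test question, which corresponds to the 0-shot setting.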

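Evaluation Output

When prediction finishes, the last line of the output log displays the final score (Average accuracy: 0.651). The per-subject decoding results are stored in the save_dir/results directory.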
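
When comparing several runs, the final score can be pulled out of a saved log programmatically. The sketch below assumes the log was captured to a text file and that the score line has the form shown above; parse_average_accuracy is a hypothetical helper, not part of the project.

```python
import re

# Matches the final score line, e.g. "Average accuracy: 0.651".
ACCURACY_RE = re.compile(r"Average accuracy:\s*([0-9]*\.?[0-9]+)")


def parse_average_accuracy(log_text):
    """Return the last 'Average accuracy:' value in the log, or None if absent."""
    matches = ACCURACY_RE.findall(log_text)
    return float(matches[-1]) if matches else None


if __name__ == "__main__":
    sample_log = "loading model...\nAverage accuracy: 0.651\n"
    print(parse_average_accuracy(sample_log))
```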
