📃 Paper | 🤗 Huggingface | 📭 Contact
- This repository shares the code and models of our latest work on multilingual reasoning. In this work, we present a novel X-English question alignment finetuning step that performs targeted language alignment to make the best use of an LLM's English reasoning abilities.
- Using this library, you can fine-tune open-source LLMs into strong multilingual reasoning systems. For example, our fine-tuned LLaMA2-7B/13B models achieve superior multilingual performance, significantly outperforming baseline models of equivalent size.
- Overall, our method effectively reduces the performance disparity of LLMs between English and non-English languages, offering a new paradigm for unlocking LLMs' capabilities to accomplish multilingual tasks.
Below we present LLMs' average answer accuracy (zero-shot) on multilingual reasoning benchmarks. With question alignment, our fine-tuned LLM surpasses the unaligned counterpart and the translate-training baseline (MathOctopus) by a large margin.
Our models have been open-sourced on Huggingface; a minimal loading sketch follows the results tables below.
System (13B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
---|---|---|---|---|
QAlign (ours) | MetaMathQA | - | 57.1 | 62.6 |
MetaMath | MetaMathQA | - | 43.9 | 51.8 |
MathOctopus | - | GSM8KInstruct | 45.8 | 46.5 |
WizardMath | GSM8K & MATH | - | 28.3 | 35.7 |
MAmmoTH | MathInstruct | - | 28.9 | 38.6 |
RFT | GSM8k-ScRel | - | 29.5 | 37.1 |
SFT | GSM8K | - | 29.7 | 38.1 |
System (7B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
---|---|---|---|---|
QAlign (ours) | MetaMathQA | - | 49.6 | 57.2 |
MetaMath | MetaMathQA | - | 38.4 | 46.2 |
MathOctopus | - | GSM8KInstruct | 40.0 | 44.1 |
WizardMath | GSM8K & MATH | - | 23.0 | 32.5 |
MAmmoTH | MathInstruct | - | 21.3 | 26.3 |
RFT | GSM8k-ScRel | - | 20.6 | 31.3 |
SFT | GSM8K | - | 22.6 | 30.9 |
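The released checkpoints are standard causal language models, so they can be loaded with the Huggingface transformers library. The sketch below is illustrative only: the model identifier is a placeholder (check our Huggingface page for the actual IDs), and the prompt should follow the instruction template used during fine-tuning.

```python
# Illustrative loading sketch; the model ID below is a placeholder, not an actual released name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<huggingface-model-id>"  # replace with a released QAlign checkpoint from our Huggingface page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # device_map requires `accelerate`

# Note: format the question with the same instruction template used in training (see the training scripts).
prompt = "Janet has 3 apples and buys 5 more. How many apples does she have now?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```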
In the table below, we list the datasets used in this project. All datasets are available within this repository, with the exception of MetaMathQA. To use MetaMathQA, please download the file MetaMathQA-395K.json with the provided link and place it in the ./data/metamath directory (a quick sanity-check snippet follows the table).
Dataset | Usage | Size | Languages | Path |
---|---|---|---|---|
MetaMathQA | Training | 395,000 | En | ./data/metamath |
GSM8KInstruct | Training | 73,559 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./data/gsm8kinstruct |
mGSM | Evaluation | 2,500 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/mgsm |
mSVAMP | Evaluation | 10,000 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/msvamp |
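After downloading MetaMathQA, you can verify that the file sits in the expected location with the short check below (an illustrative sketch; it assumes the file is distributed as a single JSON array).

```python
# Illustrative sanity check for the MetaMathQA download (assumes a single JSON array on disk).
import json
import os

path = "./data/metamath/MetaMathQA-395K.json"
if not os.path.exists(path):
    raise FileNotFoundError("Download MetaMathQA-395K.json and place it in ./data/metamath first.")

with open(path, "r", encoding="utf-8") as f:
    records = json.load(f)
print(f"Loaded {len(records)} MetaMathQA examples")  # expected: roughly 395,000
```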
To install this repository, follow these steps:
git clone git@github.com:NJUNLP/QAlign.git
cd QAlign
pip install --editable ./
For detailed information about the conda environment, refer to the environment.yaml file.
We develop our training pipeline based on the stanford_alpaca repository.
To perform question alignment and response alignment on pre-trained LLMs, use the following commands. Please note that you must replace $PROJECT_PATH in finetune.sh or finetune_dp.sh with the appropriate path to make the scripts executable. When fine-tuning the 13B model, we utilize DeepSpeed to save memory; you can find our DeepSpeed configuration in the ./config/ds.json file.
- finetuning LLaMA2-7B
bash ./training_scripts/finetune_llama2_7B.sh
- finetuning LLaMA2-13B
bash ./training_scripts/finetune_llama2_13B.sh
We use the evaluation code provided by Chen et al., which measures answer accuracy by comparing the last number that appears in the LLM-generated response with the gold answer.
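The exact logic lives in the evaluation scripts; the snippet below is only a minimal sketch of the idea: extract the last number from the generated response and compare it with the gold answer.

```python
# Minimal sketch of last-number answer matching (not the exact evaluation code).
import re

def extract_last_number(text: str):
    """Return the last integer or decimal in the text, or None if there is none."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))  # drop thousands separators first
    return float(matches[-1]) if matches else None

def is_correct(response: str, gold_answer: str, tol: float = 1e-4) -> bool:
    pred = extract_last_number(response)
    gold = extract_last_number(gold_answer)
    return pred is not None and gold is not None and abs(pred - gold) < tol

print(is_correct("... so she has 3 + 5 = 8 apples.", "8"))  # True
```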
To evaluate the model on the mGSM and mSVAMP datasets, use the following commands. Please note that you must replace $PROJECT_PATH and $MODEL_PATH in the scripts with the appropriate paths to make them executable.
- evaluating with mGSM
cd evaluate/scripts
bash evaluate_mgsm.sh
- evaluating with mSVAMP
cd evaluate/scripts
bash evaluate_msvamp.sh
If you find this repository helpful, feel free to cite our paper:
@misc{zhu2024question,
title={Question Translation Training for Better Multilingual Reasoning},
author={Wenhao Zhu and Shujian Huang and Fei Yuan and Shuaijie She and Jiajun Chen and Alexandra Birch},
year={2024},
eprint={2401.07817},
archivePrefix={arXiv},
primaryClass={cs.CL}
}