
MindMerger

Code for MindMerger: Efficient Boosting LLM Reasoning in non-English Languages (NeurIPS 2024)

MindMerger is a new method for multilingual reasoning that merges LLMs with the external language-understanding capabilities of multilingual models to boost multilingual reasoning performance. A two-stage training scheme first trains the model to embed the external capabilities into the LLM, and then trains the collaborative use of the external capabilities and the capabilities built into the LLM.

(Figure: the MindMerger model architecture)
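
To make the merging idea concrete, below is a minimal conceptual sketch of how a frozen multilingual encoder, a trainable mapping layer, and a frozen LLM could be wired together. This is an illustration only, not the repository's implementation: the mapping-layer shape, the prepend-style concatenation, and all names here are assumptions.

```python
# Minimal conceptual sketch of the merging idea (NOT the repository's code):
# a frozen multilingual encoder feeds a small trainable mapping layer whose
# output is prepended to the LLM's own token embeddings. The mapping-layer
# shape, the concatenation order, and all names here are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer, MT5EncoderModel


class MappingLayer(nn.Module):
    """Projects multilingual encoder states into the LLM embedding space."""

    def __init__(self, enc_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(enc_dim, llm_dim), nn.ReLU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        return self.proj(enc_states)


# Frozen backbones; only the mapping layer would be trained.
mt_encoder = MT5EncoderModel.from_pretrained("google/mt5-xl")
llm = AutoModelForCausalLM.from_pretrained("meta-math/MetaMath-7B-V1.0")
mt_tok = AutoTokenizer.from_pretrained("google/mt5-xl")
llm_tok = AutoTokenizer.from_pretrained("meta-math/MetaMath-7B-V1.0")
mapping = MappingLayer(mt_encoder.config.d_model, llm.config.hidden_size)

query = "如果每盒有3个苹果，5盒一共有多少个苹果？"  # a non-English math question
with torch.no_grad():
    enc_states = mt_encoder(**mt_tok(query, return_tensors="pt")).last_hidden_state
    llm_embeds = llm.get_input_embeddings()(llm_tok(query, return_tensors="pt").input_ids)

# Mapped multilingual representation plus the LLM's own view of the query,
# decoded through inputs_embeds.
inputs_embeds = torch.cat([mapping(enc_states), llm_embeds], dim=1)
output_ids = llm.generate(inputs_embeds=inputs_embeds, max_new_tokens=64)
print(llm_tok.decode(output_ids[0], skip_special_tokens=True))
```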

Pip Installation

pip install -r requirements.txt

Data Preparation

Download the datasets and checkpoints here and put them under the current folder.

The folder provides the training data for both stages and the evaluation data for the math, x-csqa, and xnli tasks. We also provide MindMerger checkpoints for math (based on MetaMath-Llama-7B), x-csqa (based on LLaMAX-7B-X-CSQA), and xnli (based on LLaMAX-7B-X-XNLI). mT5-xl is used as the multilingual encoder.

Evaluation

Each checkpoint contains only the parameters of the mapping layer for a specific LLM and multilingual model pair. To evaluate the performance of MindMerger, run:

deepspeed run_evaluation.py --deepspeed \
    --llm_path meta-math/MetaMath-7B-V1.0 \
    --mt_path google/mt5-xl \
    --init_checkpoint outputs/MergeMinds/math/augmentation/pytorch_model.bin \
    --augmentation True
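
Since the checkpoint file holds only the mapping-layer weights, a quick sanity check after downloading is to list what is inside it. A minimal sketch, assuming the file is a plain PyTorch state dict (the actual parameter names are whatever the repository defines):

```python
import torch

# List the tensors stored in the released checkpoint; it should contain only
# mapping-layer parameters, not the LLM or mT5-xl weights.
state_dict = torch.load(
    "outputs/MergeMinds/math/augmentation/pytorch_model.bin", map_location="cpu"
)
for name, tensor in state_dict.items():
    print(f"{name}: {tuple(tensor.shape)}")
```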

Evaluation results on MGSM dataset:

| MGSM | Avg. | Te | Bn | Th | Sw | Ja | Zh | De | Fr | Ru | Es | En |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MindMerger (MetaMath-Llama-7B) | 57.6 | 52.8 | 52.0 | 59.2 | 56.8 | 51.2 | 55.2 | 61.2 | 55.2 | 61.6 | 62.4 | 66.0 |

Evaluation results on X-CSQA dataset:

| X-CSQA | Avg. | Sw | Ur | Hi | Ar | Vi | Ja | Pl | Zh | Nl | Ru | It | De | Pt | Fr | Es | En |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama2-7B-X-CSQA | 50.9 | 23.2 | 24.7 | 32.9 | 32.4 | 51.0 | 50.0 | 51.5 | 55.6 | 56.9 | 55.8 | 58.8 | 59.9 | 60.4 | 61.8 | 61.9 | 78.1 |
| MindMerger (Llama2-7B-X-CSQA) | 61.0 | 45.5 | 46.2 | 48.4 | 51.4 | 60.6 | 53.9 | 63.3 | 62.9 | 63.8 | 63.7 | 66.8 | 67.0 | 67.1 | 68.1 | 69.1 | 78.1 |
| LLaMAX-7B-X-CSQA | 55.1 | 43.5 | 39.0 | 44.1 | 45.1 | 54.0 | 49.9 | 54.6 | 58.2 | 58.9 | 57.1 | 59.1 | 59.0 | 60.9 | 61.6 | 62.7 | 74.0 |
| MindMerger (LLaMAX-7B-X-CSQA) | 61.2 | 51.2 | 50.7 | 50.8 | 54.4 | 60.4 | 55.9 | 63.8 | 64.4 | 64.3 | 61.5 | 64.2 | 64.1 | 65.3 | 64.6 | 67.7 | 75.4 |

Evaluation results on XNLI dataset:

| XNLI | Avg. | Sw | Ur | Hi | Th | Ar | Tr | El | Vi | Zh | Ru | Bg | De | Fr | Es | En |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Llama2-7B-X-XNLI | 70.6 | 44.6 | 55.1 | 62.2 | 58.4 | 64.7 | 64.9 | 65.6 | 75.4 | 75.9 | 78.9 | 78.6 | 80.7 | 81.7 | 83.1 | 89.5 |
| MindMerger (Llama2-7B-X-XNLI) | 78.4 | 66.6 | 69.4 | 74.7 | 71.8 | 76.2 | 75.7 | 78.5 | 80.3 | 80.0 | 80.7 | 82.4 | 83.5 | 83.9 | 84.4 | 88.7 |
| LLaMAX-7B-X-XNLI | 76.2 | 66.7 | 65.3 | 69.1 | 66.2 | 73.6 | 71.8 | 74.3 | 77.4 | 78.3 | 80.3 | 81.6 | 82.2 | 83.0 | 84.1 | 89.7 |
| MindMerger (LLaMAX-7B-X-XNLI) | 79.2 | 72.6 | 71.5 | 74.9 | 73.4 | 77.1 | 76.4 | 78.7 | 80.4 | 80.5 | 80.8 | 82.4 | 83.1 | 84.1 | 84.5 | 88.5 |

Training

We use a two-stage scheme to train MindMerger.

The mapping stage helps the LLM learn to use the capabilities of the multilingual model.

deepspeed run_training.py --deepspeed \
    --llm_path meta-math/MetaMath-7B-V1.0 \
    --mt_path google/mt5-xl \
    --task math \
    --stage_name mapping --train_num 100000 \
    --train_batch_size 128 \
    --train_micro_batch_size_per_gpu 8 \
    --augmentation False \
    --epoch_num 3 \
    --max_seq_len 200 \
    --max_gen_len 200 
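
As a side note on the batch-size settings above: DeepSpeed requires the global batch size to equal the per-GPU micro-batch size times the gradient-accumulation steps times the number of GPUs, so the accumulation steps follow from the hardware. A small worked example, assuming a single 8-GPU node (the GPU count is not specified here):

```python
# DeepSpeed constraint: train_batch_size = micro_batch_per_gpu * grad_accum * world_size
train_batch_size = 128    # --train_batch_size
micro_batch_per_gpu = 8   # --train_micro_batch_size_per_gpu
world_size = 8            # assumption: one node with 8 GPUs
grad_accum = train_batch_size // (micro_batch_per_gpu * world_size)
print(grad_accum)         # -> 2 accumulation steps per optimizer update
```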

The augmentation stage helps the LLM collaboratively utilize its own capabilities and those of the multilingual model.

deepspeed run_training.py --deepspeed \
    --llm_path meta-math/MetaMath-7B-V1.0 \
    --mt_path google/mt5-xl \
    --task math \
    --stage_name augmentation --train_num 30000 \
    --train_batch_size 128 \
    --train_micro_batch_size_per_gpu 2 \
    --augmentation True \
    --epoch_num 3 \
    --max_seq_len 512 \
    --max_gen_len 512

You can also run both training stages with the provided script:

bash scripts/training_math.sh

Reference

Please cite this paper in your publications if it helps your research:

@inproceedings{Huang2024MindMergerEB,
  title={MindMerger: Efficient Boosting LLM Reasoning in non-English Languages},
  author={Zixian Huang and Wenhao Zhu and Gong Cheng and Lei Li and Fei Yuan},
  year={2024},
  url={https://api.semanticscholar.org/CorpusID:270063337}
}
