This repository shares the code and models of our work "Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners". In this work, we first discover and comprehensively investigate the spontaneous multilingual alignment improvement of LLMs. We find that LLMs instruction-tuned on question translation data (i.e., without annotated answers) achieve better alignment between English and a wide range of languages, including languages unseen during instruction tuning. We further use different settings and mechanistic interpretability methods to comprehensively analyze LLM behavior in the multilingual scenario. Our work suggests that LLMs have great potential for improving multilingual alignment efficiently, with strong generalization across languages and tasks.
We provide the benchmarks and datasets used in our experiments in the `./data` directory. Detailed information is listed below:
| Dataset | Usage | Languages | Path |
|---|---|---|---|
| Amazon Reviews Polarity | Question Translation Alignment | \ | ./data/ap_emotion |
| SNLI | Question Translation Alignment | \ | ./data/snli |
| PAWS | Question Translation Alignment | \ | ./data/paws |
| Amazon Reviews Polarity | Evaluation | en, zh, de, fr, es, it, nl, ja, ru, sv, sl, pl, bg, no, ms, is, hi, th, sw, bn | ./data/ap_emotion |
| SNLI | Evaluation | en, zh, de, fr, es, it, nl, ja, ru, sv, sl, pl, bg, no, ms, is, hi, th, sw, bn | ./data/snli |
| PAWS | Evaluation | en, zh, de, fr, es, it, nl, ja, ru, sv, sl, pl, bg, no, ms, is, hi, th, sw, bn | ./data/paws |
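If you want to take a quick look at a split before training, here is a minimal sketch; the file layout and format under `./data` are assumptions on our part, so adjust it to what the directory actually contains:

```python
import json
from pathlib import Path

# Peek at the files shipped for one benchmark (directory taken from the table above).
data_root = Path("./data/snli")
for path in sorted(data_root.iterdir()):
    print(path.name)

# If a split is stored as JSON (an assumption, not guaranteed by this README),
# preview the first record to check the field names.
sample_file = next(iter(sorted(data_root.glob("*.json"))), None)
if sample_file is not None:
    with open(sample_file, encoding="utf-8") as f:
        records = json.load(f)
    print(records[0] if isinstance(records, list) else records)
```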
To install this repository, follow these steps:

```bash
git clone https://github.com/Shimao-Zhang/LLM-Multilingual-Learner.git
cd LLM-Multilingual-Learner
pip install -r requirements.txt
```
We train our models based on LLaMA-Factory. Replace the model and data paths in `./LLaMA-Factory/sft_question_single_lora.bash` with the appropriate paths, and use the template corresponding to your model.
- finetuning

```bash
bash ./LLaMA-Factory/sft_question_single_lora.bash
```
For finetuning, you can use the hyperparameters below:

```bash
#!/bin/bash
export HF_HOME=/home/huggingface_cache_path

CUDA_VISIBLE_DEVICES=0 python ./src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path model_name_or_path \
    --dataset dataset_name \
    --dataset_dir ./data \
    --template template_name \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir output_dir_path \
    --overwrite_cache \
    --overwrite_output_dir \
    --cutoff_len 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --warmup_steps total_step/10 \
    --save_steps 150000 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --load_best_model_at_end \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --max_samples 100000 \
    --val_size 0.05 \
    --plot_loss \
    --fp16
```
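In LLaMA-Factory, the value passed to `--dataset` is looked up in the `dataset_info.json` file under `--dataset_dir`. If you add your own question translation data, a hedged sketch of such a registration is below; the dataset name, file name, and column mapping are assumptions, so match them to your actual files:

```python
import json

# Hypothetical entry: register a question-translation alignment file with
# LLaMA-Factory by adding it to the dataset_info.json inside --dataset_dir.
entry = {
    "snli_question_translation": {                      # value to pass as --dataset
        "file_name": "snli_question_translation.json",  # assumed file under ./data
        "columns": {                                    # alpaca-style column mapping
            "prompt": "instruction",
            "query": "input",
            "response": "output",
        },
    }
}

# Assumes a dataset_info.json already exists in the --dataset_dir used above.
info_path = "./data/dataset_info.json"
with open(info_path, "r", encoding="utf-8") as f:
    info = json.load(f)
info.update(entry)
with open(info_path, "w", encoding="utf-8") as f:
    json.dump(info, f, ensure_ascii=False, indent=2)
```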
- merge

```bash
bash ./LLaMA-Factory/merge_lora_weights.bash
```
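The merge step folds the trained LoRA adapter into the base model so the result can be loaded as a standalone checkpoint. If you prefer to do this directly in Python instead of the provided script, here is a minimal sketch using PEFT (all paths are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_path = "model_name_or_path"   # same base model used for finetuning
adapter_path = "output_dir_path"   # LoRA adapter saved by the training run
export_path = "merged_model_path"  # where to write the merged checkpoint

# Load the base model, attach the LoRA adapter, and fold it into the weights.
base_model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, adapter_path)
merged = model.merge_and_unload()

merged.save_pretrained(export_path)
AutoTokenizer.from_pretrained(base_path).save_pretrained(export_path)
```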
We evaluate the model by constrained decoding over the candidate labels and computing accuracy. To evaluate model performance, run the commands below; note that before running the scripts, you should set the appropriate `model_size`, `target_lang`, and model path in the corresponding `.py` file.
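For reference, constrained decoding here means that the model only chooses among the benchmark's candidate label strings: the label with the highest log-likelihood is taken as the prediction, and accuracy is computed against the gold labels. Below is a minimal hedged sketch of that scoring logic; the function and variable names are illustrative, not the repository's actual API.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative constrained-decoding evaluation: score each candidate label by the
# log-likelihood the model assigns to it after the prompt, pick the best one, and
# compare it with the gold label. All names below are placeholders.
model_path = "merged_model_path"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
model.eval()

def label_logprob(prompt: str, label: str) -> float:
    """Sum of log-probabilities of the label tokens given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + label, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    # Only the positions that generate the label tokens contribute to the score.
    label_positions = range(prompt_ids.shape[1] - 1, full_ids.shape[1] - 1)
    return sum(log_probs[pos, full_ids[0, pos + 1]].item() for pos in label_positions)

def predict(prompt: str, candidates: list[str]) -> str:
    return max(candidates, key=lambda label: label_logprob(prompt, label))

# Toy accuracy computation over (prompt, gold_label) pairs.
examples = [("Review: I love it! Sentiment:", " positive")]
candidates = [" positive", " negative"]
accuracy = sum(predict(p, candidates) == g for p, g in examples) / len(examples)
print(f"accuracy = {accuracy:.3f}")
```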
- evaluating with Amazon Reviews Polarity

```bash
cd ./scripts
bash run_emotion_eval.bash
```

- evaluating with SNLI

```bash
cd ./scripts
bash run_snli_eval.bash
```

- evaluating with PAWS

```bash
cd ./scripts
bash run_paws_eval.bash
```

- logit lens

```bash
cd ./scripts
bash run_emotion.bash
```
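For context, the logit lens projects each layer's hidden state through the model's final norm and unembedding matrix, revealing which tokens (and which language) intermediate layers already favor. Here is a minimal hedged sketch for a LLaMA-style model; it is not the repository's actual script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Logit lens: decode every layer's hidden state with the final norm + LM head.
model_path = "merged_model_path"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
model.eval()

prompt = "Question: Is the sky blue? Answer:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (num_layers + 1) tensors of shape [1, seq_len, hidden]
for layer_idx, hidden in enumerate(outputs.hidden_states):
    last_token = hidden[0, -1]             # hidden state at the last position
    normed = model.model.norm(last_token)  # final RMSNorm (LLaMA-style naming)
    layer_logits = model.lm_head(normed)   # project onto the vocabulary
    top_token = tokenizer.decode([layer_logits.argmax().item()])
    print(f"layer {layer_idx:2d}: {top_token!r}")
```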
- Principal Component Analysis

Run the Jupyter notebook `knowledge_finding.ipynb`.
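The notebook applies Principal Component Analysis to hidden representations so that parallel inputs in different languages can be compared in a low-dimensional space. Below is a minimal hedged sketch of that kind of analysis (not the notebook's actual code):

```python
import torch
from sklearn.decomposition import PCA
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "merged_model_path"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")
model.eval()

# Parallel sentences in several languages (toy examples).
sentences = {
    "en": "The movie was fantastic.",
    "de": "Der Film war fantastisch.",
    "zh": "这部电影太棒了。",
}

# Use the last-token hidden state of the final layer as a sentence representation.
features = []
for lang, text in sentences.items():
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
    features.append(hidden[0, -1].float().numpy())

# Project onto the first two principal components for visual comparison.
projected = PCA(n_components=2).fit_transform(features)
for (lang, _), point in zip(sentences.items(), projected):
    print(lang, point)
```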
If you find this repository helpful, feel free to cite our paper. You can follow the citation information from the ACL Anthology or Google Scholar:
```bibtex
@inproceedings{zhang2024getting,
  title={Getting More from Less: Large Language Models are Good Spontaneous Multilingual Learners},
  author={Zhang, Shimao and Gao, Changjiang and Zhu, Wenhao and Chen, Jiajun and Huang, Xin and Han, Xue and Feng, Junlan and Deng, Chao and Huang, Shujian},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={8037--8051},
  year={2024}
}
```