
⚡ PLUG: Pivot Language Guided Generation

This is the repository for PLUG (Pivot Language gUided Generation), a simple yet effective method for cross-lingual instruction tuning of large language models (LLMs). PLUG uses a high-resource language as a pivot to enhance instruction tuning in lower-resource languages: the model is trained to first process the instruction and draft a response in the pivot language, and then produce the final response in the target language. PLUG is shown to significantly improve the instruction-following abilities of LLMs in multiple target languages (Chinese, Korean, Italian, Spanish) compared to responding directly in the target language. For more details, please refer to our paper "PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning".

(Figure: an illustration of the PLUG method)
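
To make the training format concrete, the sketch below assembles a PLUG-style training example: the completion contains a pivot-language (English) draft followed by the target-language response. The prompt template and the [English]/[Target] delimiters here are illustrative assumptions, not the exact format used in src/model:

# A minimal sketch of a PLUG-style training example. The template and
# delimiters are illustrative assumptions, not the repository's exact format.

def build_plug_example(instruction_tgt: str, draft_en: str, response_tgt: str) -> dict:
    """Pair a target-language instruction with a completion that first
    drafts in the pivot language (English), then answers in the target
    language."""
    prompt = (
        "Below is an instruction. First draft a response in English, "
        "then give the final response in the language of the instruction.\n\n"
        f"### Instruction:\n{instruction_tgt}\n\n### Response:\n"
    )
    completion = f"[English]\n{draft_en}\n\n[Target]\n{response_tgt}"
    return {"prompt": prompt, "completion": completion}

example = build_plug_example(
    instruction_tgt="请用三句话介绍一下大语言模型。",  # Chinese instruction
    draft_en="Large language models are neural networks trained on massive text corpora ...",
    response_tgt="大语言模型是在海量文本上训练的神经网络……",
)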

Environment

pip install torch==2.0.1
pip install transformers==4.31.0 deepspeed==0.9.5 accelerate==0.21.0
pip install openai tiktoken tqdm peft huggingface_hub datasets 
# Only for evaluating X-AlpacaEval
pip install shortuuid anthropic

Code

Code can be found in the src directory, which contains the following sub-directories:

  • ds_config: the configuration files for DeepSpeed.
  • model: code for LLM training (instruction tuning, a.k.a. SFT) and inference (a minimal inference sketch follows this section).
  • evaluate: code for evaluating instruction-tuned LLMs on X-AlpacaEval, X-TruthfulQA, and X-SVAMP.
  • translation: code for translating training data (instructions and responses) into the target languages.
  • utils: utility functions for OpenAI API calls.

Please refer to the corresponding directory for detailed information.
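
As a quick reference, here is a minimal inference sketch using transformers, assuming a locally saved PLUG-tuned checkpoint. The checkpoint path and prompt template are placeholders, not the exact ones used in src/model:

# Minimal inference sketch (placeholder checkpoint path and prompt template;
# see src/model for the actual training/inference scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/plug-tuned-model"  # hypothetical local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "### Instruction:\n请用三句话介绍一下大语言模型。\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# A PLUG-tuned model is expected to emit an English draft first,
# followed by the target-language response, in a single generation.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))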

Data

We provide the following data used in our experiments:

  • Training data: The training data used in the paper. We took the GPT4-Alpaca instruction tuning dataset and translated it into the 4 target languages with GPT-3.5-turbo.
  • Evaluation benchmarks:
    • X-AlpacaEval: The main benchmark we used to evaluate the open-ended instruction-following abilities of LLMs. The benchmark was built by hiring professional human translators to translate the original English AlpacaEval into the 4 target languages. We used GPT-4 as the judge to compare responses from two models, reusing the evaluation code from MT-bench with a small edit to the GPT-4 prompt (specified in Appendix B of our paper); a simplified judging sketch follows this list.
    • X-TruthfulQA: Used in auxiliary experiments in our paper that evaluate the truthfulness of multilingual LLMs. We evaluated LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with the question, collect its answer, and let GPT-4 compare the answer with the reference answers.
    • X-SVAMP: Used in auxiliary experiments in our paper that evaluate the reasoning abilities of multilingual LLMs. We evaluated LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with the question, collect its response (a chain-of-thought rationale), and let GPT-3.5-turbo extract the predicted answer from the response. The extracted answer is then compared with the reference answer to compute accuracy.
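
For orientation, the snippet below sketches the kind of pairwise GPT-4 judging used on X-AlpacaEval, written with the current openai Python client. The judge prompt is a simplified stand-in; the actual evaluation reuses the MT-bench code with the modified prompt from Appendix B of the paper:

# A simplified sketch of GPT-4 pairwise judging for X-AlpacaEval.
# The real evaluation uses the MT-bench code with a modified judge prompt;
# this prompt is an illustrative stand-in.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_pair(instruction: str, answer_a: str, answer_b: str) -> str:
    prompt = (
        "You are an impartial judge. Given an instruction and two answers, "
        "decide which answer is better. Reply with 'A', 'B', or 'tie'.\n\n"
        f"Instruction:\n{instruction}\n\n"
        f"Answer A:\n{answer_a}\n\nAnswer B:\n{answer_b}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

Pairwise judging of this kind is typically run in both answer orders to mitigate position bias.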

Citation

If you find our data or code useful, please cite our paper:

@article{zhang2023plug,
  title={PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning},
  author={Zhang, Zhihan and Lee, Dong-Ho and Fang, Yuwei and Yu, Wenhao and Jia, Mengzhao and Jiang, Meng and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2311.08711},
  year={2023}
}
