Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper presents ProxyLM, a scalable task- and language-agnostic framework designed to predict the performance of LMs using proxy models. These proxy models act as surrogates, approximating the performance of the LM of interest. By leveraging these proxy models, ProxyLM significantly reduces computational overhead in task evaluations, achieving up to a 37.08x speedup over traditional methods, even with our smallest proxy models. Our results across multiple multilingual NLP tasks and various robustness tests demonstrate that ProxyLM not only adapts well to previously unseen languages in pre-trained LMs, but also generalizes effectively across different datasets, outperforming the state-of-the-art by at least 1.78x in terms of root-mean-square error (RMSE).
If you are interested in more information, check out our full paper.
- Environment
- Setup Instruction
- Dataset Manual Download Links
- LMs Manual Download Links
- Example LM Finetuning Usages
- Example Regressor Usages
- Citation
Python 3.10 or higher. Details of dependencies are in `setup.py`.
- Run `pip install .`. This will install basic dependencies to reproduce ProxyLM's framework. Note that the experimental records are in `src/proxy_regressor/csv_datasets`. A sketch of a fresh setup follows.
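A minimal sketch of a fresh setup, assuming the repository has already been cloned and you are in its root:

```bash
# Install the basic dependencies to reproduce ProxyLM's framework.
pip install .

# The experimental records ship with the repository.
ls src/proxy_regressor/csv_datasets
```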
The following steps are OPTIONAL for finetuning the language models:
- [OPTIONAL] To finetune the language models and produce more experimental records, install additional dependencies through `pip install '.[fairseq, llama-factory]'` depending on your needs. We use fairseq to finetune and run inference for Machine Translation (MT), while we use LLaMA-Factory to finetune and run inference for intent classification and slot filling.
- [OPTIONAL] Specifically for MT, run `bash setup_mt_finetune.sh`, which will automatically download the selected models and our curated dataset for MT. If a model or dataset cannot be downloaded successfully, please refer to the sections Dataset Manual Download Links and LMs Manual Download Links.
- [OPTIONAL] Specifically for intent classification and slot filling, replace `dataset_info.json` in `data` of the installed LLaMA-Factory library with our version at `src/llama_factory_configs/dataset_info.json` to use the MASSIVE dataset. These optional steps are chained in the sketch below.
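This is a minimal sketch of the optional steps in one shell session; the location of the installed LLaMA-Factory `data` directory below is a placeholder and depends on how LLaMA-Factory was installed, so verify it before copying.

```bash
# Install the optional finetuning dependencies (fairseq + LLaMA-Factory).
pip install '.[fairseq, llama-factory]'

# MT only: download the selected models and our curated MT dataset.
bash setup_mt_finetune.sh

# Intent/slot only: swap in our dataset_info.json so the MASSIVE dataset is available.
LLAMA_FACTORY_DATA=/path/to/LLaMA-Factory/data   # placeholder: locate your install's data/ directory
cp src/llama_factory_configs/dataset_info.json "${LLAMA_FACTORY_DATA}/dataset_info.json"
```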
- Download our curated dataset for LM finetuning. You can also download the data from the original papers of the MT560 and NusaTranslation datasets, but we have compiled our dataset so that it runs smoothly within our pipeline.
- Unzip the dataset by running `tar -xzvf dataset.tar.gz dataset` and put the `dataset` folder in an `experiments` folder (which needs to be created) in the same directory as this `README.md`, as shown in the sketch below.
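A minimal sketch of these two steps, assuming `dataset.tar.gz` has been downloaded into the repository root:

```bash
# Create the experiments folder next to this README.md.
mkdir -p experiments

# Extract the dataset folder and move it into place.
tar -xzvf dataset.tar.gz dataset
mv dataset experiments/
# Expected layout: experiments/dataset/...
```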
If any of the download links have expired or become invalid, please use the links below to download the models manually.
- Start training/finetuning tasks by running:

  ```bash
  python -m src.mt_finetune.<lm_name>.main --src_lang ${src_lang} --tgt_lang ${tgt_lang} --finetune 1 --dataset ${dataset} --size ${size}
  ```

- Start generation tasks by running:

  ```bash
  python -m src.mt_finetune.<lm_name>.main --src_lang ${src_lang} --tgt_lang ${tgt_lang} --finetune 0
  ```

- Replace `<lm_name>` with an LM name such as `m2m100`, `nllb`, `small100`, or `transformer`.
- All the results will be displayed in the `experiments` folder. A filled-in example follows this list.
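As a concrete sketch, the values below (language pair, dataset, and size) are hypothetical placeholders; substitute the ones used in your own experiments:

```bash
# Finetune m2m100 on a hypothetical eng-ind pair from the mt560 dataset.
python -m src.mt_finetune.m2m100.main --src_lang eng --tgt_lang ind --finetune 1 --dataset mt560 --size 5000

# Then run generation with the finetuned model.
python -m src.mt_finetune.m2m100.main --src_lang eng --tgt_lang ind --finetune 0
```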
- Start training/finetuning tasks by running:

  ```bash
  llamafactory-cli train <lm_name>_finetune_<task_name>.yaml
  ```

- Start generation tasks by running:

  ```bash
  llamafactory-cli train <lm_name>_predict_<task_name>.yaml
  ```

- Replace `<lm_name>` with an LM name such as `aya`, `llama3`, `smollm_135m`, `smollm_360m`, or `bloomz_560m`.
- Replace `<task_name>` with the proper task: either `intent` or `slot`.
- The configs can be found in the `src/llama_factory_configs` folder. A filled-in example follows this list.
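For example, to finetune and then run prediction for `aya` on intent classification, the templates above can be filled in as follows (prefixing the path to the config folder is an assumption about your working directory; adjust as needed):

```bash
# Finetune aya for intent classification using the provided config.
llamafactory-cli train src/llama_factory_configs/aya_finetune_intent.yaml

# Generate predictions with the finetuned model.
llamafactory-cli train src/llama_factory_configs/aya_predict_intent.yaml
```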
- General script to run an experiment:

  ```bash
  proxylm-cli --config <config.yaml>
  ```

- Replace `<config.yaml>` with a proper YAML file; example YAMLs can be found in the `sample_yaml` folder, and a sketch follows this list.
- Fill `exp_mode` with either `random` or `lolo`. Specifically for the MT task, `unseen`, `cross_dataset`, and `incremental` are also valid.
- Fill `regressor` with either `xgb`, `lgbm`, `poly`, or `mf`.
- Fill `regressor_config` with a proper path to the regressor JSON.
- Fill `score` with the proper score to be used.
- Fill `dataset_name` with the proper dataset corresponding to the task (either `mt560`, `nusa`, `intent`, or `slot`).
- Fill `model` with the proper model corresponding to the task. For MT, use either `m2m100` or `nllb`; for intent or slot, use either `aya` or `llama3`.
- Specifically for `lolo` in `exp_mode`, you need to supply the language to be left out using the `lang` argument. If you'd like to run `lolo` for all languages, supply `lang` with `all`.
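As an illustration, the snippet below writes a hypothetical config and runs it. The field names come from the list above, but the exact schema and the values (paths, score name) are assumptions; copy a file from the `sample_yaml` folder as your actual starting point.

```bash
# Write a hypothetical MT experiment config. Field names follow the list above;
# verify the exact schema against the examples in the sample_yaml folder.
cat > my_experiment.yaml <<'EOF'
exp_mode: lolo
regressor: xgb
regressor_config: path/to/regressor_config.json  # placeholder path
score: spBLEU                                    # placeholder score name
dataset_name: mt560
model: m2m100
lang: all
EOF

# Run the experiment with the regressor CLI.
proxylm-cli --config my_experiment.yaml
```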
If you use this code for your research, please cite the following work:
```bibtex
@article{anugraha2024proxylm,
  title={ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models},
  author={Anugraha, David and Winata, Genta Indra and Li, Chenyue and Irawan, Patrick Amadeus and Lee, En-Shiun Annie},
  journal={arXiv preprint arXiv:2406.09334},
  year={2024}
}
```
If you have any questions, you can open a GitHub Issue or send us an email.