Performance prediction is a method to estimate the performance of Language Models (LMs) on various Natural Language Processing (NLP) tasks, mitigating computational costs associated with model capacity and data for fine-tuning. Our paper presents ProxyLM, a scalable task- and language-agnostic framework designed to predict the performance of LMs using proxy models. These proxy models act as surrogates, approximating the performance of the LM of interest. By leveraging these proxy models, ProxyLM significantly reduces computational overhead in task evaluations, achieving up to a 37.08x speedup over traditional methods, even with our smallest proxy models. Our results across multiple multilingual NLP tasks and various robustness tests demonstrate that ProxyLM not only adapts well to previously unseen languages in pre-trained LMs, but also generalizes effectively across different datasets, outperforming the state-of-the-art by at least 1.78x in terms of root-mean-square error (RMSE).
If you are interested in more information, check out our full paper.
- Environment
- Setup Instruction
- Dataset Manual Download Links
- LMs Manual Download Links
- Example LM Finetuning Usages
- Example Regressor Usages
- Citation
Python 3.10 or higher. Details of dependencies are in `setup.py`.
- Run `pip install .`. This will install basic dependencies to reproduce ProxyLM's framework. Note that the experimental records are in `src/proxy_regressor/csv_datasets`. A sketch of a fresh setup follows.
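A minimal sketch of a fresh setup, assuming the repository has already been cloned and you are in its root:

```bash
# Install the basic dependencies to reproduce ProxyLM's framework.
pip install .

# The experimental records ship with the repository.
ls src/proxy_regressor/csv_datasets
```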
The following steps are OPTIONAL for finetuning the language models:
- [OPTIONAL] To finetune the language models and produce more experimental records, install additional dependencies through `pip install '.[fairseq, llama-factory]'` depending on your needs. We use fairseq to finetune and run inference for Machine Translation (MT), while we use LLaMA-Factory to finetune and run inference for intent classification and slot filling.
- [OPTIONAL] Specifically for MT, run `bash setup_mt_finetune.sh`, which will automatically download the selected models and our curated dataset for MT. If a model or dataset cannot be downloaded successfully, please refer to the sections Dataset Manual Download Links and LMs Manual Download Links.
- [OPTIONAL] Specifically for intent classification and slot filling, replace `dataset_info.json` in `data` of the installed LLaMA-Factory library with our version at `src/llama_factory_configs/dataset_info.json` to use the MASSIVE dataset. These optional steps are chained in the sketch below.
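This is a minimal sketch of the optional steps in one shell session; the location of the installed LLaMA-Factory `data` directory below is a placeholder and depends on how LLaMA-Factory was installed, so verify it before copying.

```bash
# Install the optional finetuning dependencies (fairseq + LLaMA-Factory).
pip install '.[fairseq, llama-factory]'

# MT only: download the selected models and our curated MT dataset.
bash setup_mt_finetune.sh

# Intent/slot only: swap in our dataset_info.json so the MASSIVE dataset is available.
LLAMA_FACTORY_DATA=/path/to/LLaMA-Factory/data   # placeholder: locate your install's data/ directory
cp src/llama_factory_configs/dataset_info.json "${LLAMA_FACTORY_DATA}/dataset_info.json"
```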
- Download our curated dataset for LM finetuning. You can also download the data from the original papers of the MT560 and NusaTranslation datasets, but we have compiled our dataset so that it runs smoothly within our pipeline.
- Unzip the dataset by running `tar -xzvf dataset.tar.gz dataset` and put the `dataset` folder in an `experiments` folder (which needs to be created) in the same directory as this `README.md`, as shown in the sketch below.
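A minimal sketch of these two steps, assuming `dataset.tar.gz` has been downloaded into the repository root:

```bash
# Create the experiments folder next to this README.md.
mkdir -p experiments

# Extract the dataset folder and move it into place.
tar -xzvf dataset.tar.gz dataset
mv dataset experiments/
# Expected layout: experiments/dataset/...
```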
If any of the download links have expired or become invalid, please use the links below to download the models manually.
- Start training/finetuning tasks by running:

  ```bash
  python -m src.mt_finetune.<lm_name>.main --src_lang ${src_lang} --tgt_lang ${tgt_lang} --finetune 1 --dataset ${dataset} --size ${size}
  ```

- Start generation tasks by running:

  ```bash
  python -m src.mt_finetune.<lm_name>.main --src_lang ${src_lang} --tgt_lang ${tgt_lang} --finetune 0
  ```

- Replace `<lm_name>` with an LM name such as `m2m100`, `nllb`, `small100`, or `transformer`.
- All the results will be displayed in the `experiments` folder. A filled-in example follows this list.
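As a concrete sketch, the values below (language pair, dataset, and size) are hypothetical placeholders; substitute the ones used in your own experiments:

```bash
# Finetune m2m100 on a hypothetical eng-ind pair from the mt560 dataset.
python -m src.mt_finetune.m2m100.main --src_lang eng --tgt_lang ind --finetune 1 --dataset mt560 --size 5000

# Then run generation with the finetuned model.
python -m src.mt_finetune.m2m100.main --src_lang eng --tgt_lang ind --finetune 0
```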
- Start training/finetuning tasks by running:

  ```bash
  llamafactory-cli train <lm_name>_finetune_<task_name>.yaml
  ```

- Start generation tasks by running:

  ```bash
  llamafactory-cli train <lm_name>_predict_<task_name>.yaml
  ```

- Replace `<lm_name>` with an LM name such as `aya`, `llama3`, `smollm_135m`, `smollm_360m`, or `bloomz_560m`.
- Replace `<task_name>` with the proper task: either `intent` or `slot`.
- The configs can be found in the `src/llama_factory_configs` folder. A filled-in example follows this list.
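For example, to finetune and then run prediction for `aya` on intent classification, the templates above can be filled in as follows (prefixing the path to the config folder is an assumption about your working directory; adjust as needed):

```bash
# Finetune aya for intent classification using the provided config.
llamafactory-cli train src/llama_factory_configs/aya_finetune_intent.yaml

# Generate predictions with the finetuned model.
llamafactory-cli train src/llama_factory_configs/aya_predict_intent.yaml
```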
- General script to run an experiment:

  ```bash
  proxylm-cli --config <config.yaml>
  ```

- Replace `<config.yaml>` with a proper YAML file; example YAMLs can be found in the `sample_yaml` folder, and a sketch follows this list.
- Fill `exp_mode` with either `random` or `lolo`. Specifically for the MT task, `unseen`, `cross_dataset`, and `incremental` are also valid.
- Fill `regressor` with either `xgb`, `lgbm`, `poly`, or `mf`.
- Fill `regressor_config` with a proper path to the regressor JSON.
- Fill `score` with the proper score to be used.
- Fill `dataset_name` with the proper dataset corresponding to the task (either `mt560`, `nusa`, `intent`, or `slot`).
- Fill `model` with the proper model corresponding to the task. For MT, use either `m2m100` or `nllb`; for intent or slot, use either `aya` or `llama3`.
- Specifically for `lolo` in `exp_mode`, you need to supply the language to be left out using the `lang` argument. If you'd like to run `lolo` for all languages, supply `lang` with `all`.
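As an illustration, the snippet below writes a hypothetical config and runs it. The field names come from the list above, but the exact schema and the values (paths, score name) are assumptions; copy a file from the `sample_yaml` folder as your actual starting point.

```bash
# Write a hypothetical MT experiment config. Field names follow the list above;
# verify the exact schema against the examples in the sample_yaml folder.
cat > my_experiment.yaml <<'EOF'
exp_mode: lolo
regressor: xgb
regressor_config: path/to/regressor_config.json  # placeholder path
score: spBLEU                                    # placeholder score name
dataset_name: mt560
model: m2m100
lang: all
EOF

# Run the experiment with the regressor CLI.
proxylm-cli --config my_experiment.yaml
```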
If you use this code for your research, please cite the following work:
```bibtex
@article{anugraha2024proxylm,
  title={ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models},
  author={Anugraha, David and Winata, Genta Indra and Li, Chenyue and Irawan, Patrick Amadeus and Lee, En-Shiun Annie},
  journal={arXiv preprint arXiv:2406.09334},
  year={2024}
}
```
If you have any questions, you can open a GitHub Issue or send us an email.