| Installation | Datasets | Prompt Transfer | Task Selection |
This repository contains the implementation of "Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study", covering prompt tuning and prompt transfer.
- Python >= 3.8
Create a conda environment and activate it.
conda create -n intermediate-task-selection
conda activate intermediate-task-selection
Install dependencies first.
. install_dependencies.sh
We suggest installing the package in editable mode. This works for most cases.
pip install -e .
We train 23 tasks across NLI, paraphrase detection, semantic similarity, question answering, reading comprehension, grammatical acceptability, word sense disambiguation, sentiment analysis, and coreference resolution.
We split these datasets into target tasks (those with fewer than 1K training samples) and source tasks (those with more than 1K training samples).
In total, we select 13 source tasks and 10 target tasks. To run the following tasks, please use the provided names below:
Source tasks, listed below with their respective evaluation metrics, are the datasets with larger amounts of annotated data. We train the 13 source tasks with prompt tuning, then use the resulting prompt weights to initialize continued training on the target tasks. We apply a learning rate of 5e-1 for all source tasks.
Dataset | Metrics |
---|---|
mnli | accuracy |
mnli_mismatched | accuracy |
mnli_matched | accuracy |
qqp | accuracy, f1 |
qnli | accuracy |
superglue-record | f1, em |
cxc | pearson, spearmanr |
squad | f1, em |
drop | f1, em |
sst2 | accuracy |
winogrande | accuracy |
hellaswag | accuracy |
superglue-multirc | f1, em |
cosmosqa | accuracy |
race | accuracy |
We train on the 10 target tasks as baselines and transfer each of the 13 source prompts to every target task. We apply a learning rate of 2 for all target tasks.
Dataset | Metrics |
---|---|
boolq | accuracy |
cola | matthews_correlation |
stsb | pearson, spearmanr |
superglue-wic | accuracy |
cr | accuracy |
mrpc | accuracy, f1 |
rte | accuracy |
superglue-wsc | accuracy |
superglue-copa | accuracy |
cb | f1_multiclass, accuracy |
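For convenience when scripting sweeps over the tasks above, the names and learning rates can be collected in one place. A minimal sketch (the lists simply mirror the two tables; the variable names are illustrative):

```python
# Dataset names copied from the two tables above, collected for scripting sweeps.
# Note: the source table lists 15 names; mnli_matched / mnli_mismatched appear to
# be evaluation splits of mnli, which is how the README arrives at 13 source tasks.
SOURCE_TASKS = [
    "mnli", "mnli_mismatched", "mnli_matched", "qqp", "qnli",
    "superglue-record", "cxc", "squad", "drop", "sst2",
    "winogrande", "hellaswag", "superglue-multirc", "cosmosqa", "race",
]
TARGET_TASKS = [
    "boolq", "cola", "stsb", "superglue-wic", "cr",
    "mrpc", "rte", "superglue-wsc", "superglue-copa", "cb",
]
SOURCE_LR = 5e-1  # learning rate used for all source tasks
TARGET_LR = 2.0   # learning rate used for all target tasks
```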
This section performs intermediate-task transfer with prompt tuning. This involves:
- Prompt tuning initialized from sampled vocabulary token embeddings.
- Prompt transfer initialized from a pretrained prompt trained on a source task.
To reproduce the results of Table 5.2 (effect of prompt transfer), you need to run both the prompt tuning and the prompt transfer scripts.
We applied the same configuration for both prompt tuning and prompt transfer.
You can find our example scripts under seq2seq/scripts. These scripts demonstrate prompt tuning, prompt transfer, and task selection using the configuration files located in seq2seq/configs.
To execute the models, please first do:
cd intermediate-task-selection/seq2seq
When training a model with prompt tuning, all training code requires a configuration file defined in the configs folder. Our implementation organizes configuration files by fine-tuning method and model type, e.g., configs/prompt_tuning_tokens_config/t5-base. Feel free to create your own directory.
Note that our main Python script (run_seq2seq.py) also accepts a subset of arguments directly on the command line, enabling flexible configuration of hyperparameters. These command-line arguments make it easy to sweep over different values, overriding the arguments specified in the configuration files.
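For illustration, a hypothetical sweep that overrides a value from a configuration file on the command line. Whether run_seq2seq.py accepts a JSON config as its first argument together with override flags, and the exact flag names (--learning_rate, --output_dir), are assumptions here; check the script and the provided shell scripts for the actual interface.

```python
# Hypothetical sweep; assumes run_seq2seq.py takes a JSON config as its first
# argument and accepts --learning_rate / --output_dir as overrides.
import subprocess

CONFIG = "configs/prompt_tuning_tokens_config/t5-base/prompt_tuning_tokens_init_boolq.json"

for lr in ["2", "5e-1"]:
    subprocess.run(
        ["python", "run_seq2seq.py", CONFIG,
         "--learning_rate", lr,
         "--output_dir", f"outputs/boolq_lr{lr}"],
        check=True,
    )
```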
To perform prompt tuning on a single source or target task, execute the following command. This script trains prompt tuning with prompts initialized from the language model's vocabulary embeddings.
For target tasks we set the learning rate to 2, while for source tasks we use a learning rate of 5e-1.
To run prompt tuning, please run the command:
. script/prompt_tuning_tokens_init.sh
To get the average performance, please run:
python dev/get_prompt_scores.py
The following command trains prompt tuning initialized from pretrained prompt weights (prompt transfer). Please specify your prompt checkpoints via CKPTS.
. script/tf_prompt_tuning.sh
To get the relative performance, please run:
python dev/get_transfer_scores.py
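As a rough illustration, one common way to define a relative score is the gain of prompt transfer over the prompt-tuning baseline; the exact definition used by dev/get_transfer_scores.py may differ.

```python
# Sketch only: relative gain (in percent) of prompt transfer over the
# prompt-tuning baseline; see dev/get_transfer_scores.py for the actual definition.
def relative_performance(transfer_score: float, baseline_score: float) -> float:
    return 100.0 * (transfer_score - baseline_score) / baseline_score

print(relative_performance(transfer_score=72.5, baseline_score=70.0))  # ~3.57
```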
After training all models on the target tasks, you can create a ranking of the empirical prompt transfer results as the ground-truth reference for evaluating task embedding predictions. To evaluate all 13 source tasks transferring to RTE, run:
python dev/get_transfer_ranking.py \
--tgt_task=rte \
--output_dir=PATH_TO_YOUR_DIR/spot_eval/transfer_ranking
We save the result file in the --output_dir; it is required for evaluating rankings. Each file is named eval_tgt-TASKNAME_seed-VALUE and contains the prompt tuning and prompt transfer performances, along with the ranking of intermediate tasks sorted by prompt transfer performance.
You can also run the following script once all training jobs are done.
. scripts/run_transfer_ranking.sh
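Conceptually, the ground-truth ranking orders the source tasks by the performance their prompts reach after transfer to the target task. A minimal sketch with placeholder scores (real values come from the transfer runs above):

```python
# Placeholder transfer scores for one target task (e.g., rte); the real numbers
# come from the prompt transfer runs above.
transfer_scores = {"mnli": 74.0, "qnli": 71.5, "squad": 69.8}

# Ground-truth ranking: source tasks sorted by transferred performance.
ranking = sorted(transfer_scores, key=transfer_scores.get, reverse=True)
print(ranking)  # ['mnli', 'qnli', 'squad']
```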
Evaluating the transferability of a source task to a given target task empirically requires running prompt tuning and transfer on all task pairs. We provide code for estimating transferability via vocabulary similarity and via task embeddings built from prompt weights.
- vocab similarity
Vocabulary similarity estimates the overlap between the vocabularies of two tasks. Please run:
. scripts/run_vocab_sim.sh
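To give an intuition, vocabulary overlap can be measured with, for example, the Jaccard index over the tasks' token sets; the measure actually implemented in run_vocab_sim.sh may differ.

```python
# Sketch: Jaccard overlap between the token sets of two tasks' training texts.
# The measure used by scripts/run_vocab_sim.sh may differ in detail.
def vocab_similarity(texts_a, texts_b):
    vocab_a = {tok for text in texts_a for tok in text.split()}
    vocab_b = {tok for text in texts_b for tok in text.split()}
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)

print(vocab_similarity(["a premise and a hypothesis"], ["a question and a passage"]))
```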
- task embeddings
Our training scripts save the prompt weights in the file prefix_shared.bin. All task embedding experiments compute the task embeddings from these weights.
In addition to prompt similarity (feature_mean), we provide further constructions for task embeddings.
The following values are supported for the --task_embedding_type argument: feature_mean, flatten, unigram, bigram, max_pairwise.
. scripts/get_prompt_similarity.sh
We save the predicted ranking file eval_tgt-TASKNAME_seed-VALUE.json in the --output_dir.
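To illustrate the simplest construction, feature_mean averages the prompt token embeddings into a single vector, and source prompts can then be ranked by cosine similarity to the target prompt. A minimal sketch, assuming prefix_shared.bin holds a (num_prompt_tokens, embedding_dim) tensor (it may instead be a state dict; adapt the loading accordingly). The other constructions (flatten, unigram, bigram, max_pairwise) are defined in the repository code.

```python
# Sketch: feature_mean task embedding and cosine similarity between two prompts.
# Assumes prefix_shared.bin stores a (num_prompt_tokens, embedding_dim) tensor.
import torch
import torch.nn.functional as F

def feature_mean(prompt_path: str) -> torch.Tensor:
    prompt = torch.load(prompt_path, map_location="cpu")  # (tokens, dim)
    return prompt.mean(dim=0)                             # (dim,)

def prompt_similarity(source_path: str, target_path: str) -> float:
    src, tgt = feature_mean(source_path), feature_mean(target_path)
    return F.cosine_similarity(src, tgt, dim=0).item()
```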
We evaluate using metrics such as ndcg and regret_at_k, which are supported by the --method argument.
To assess the task selection methods (random, size-based, text embedding-based, or task embedding-based), both a prediction file and a reference file are necessary.
python dev/get_ndcg.py \
--pred_dir=PATH_TO_YOUR_DIR \
--target_dir=PATH_TO_YOUR_DIR \
--output_dir=PATH_TO_YOUR_DIR \
--method=ndcg
You can replace ndcg with regret_at_k or top_k_performance.
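For intuition, a sketch of both metrics under their standard definitions: NDCG compares the predicted ordering of source tasks against the ideal ordering of the empirical transfer scores, and regret@k measures how much performance is lost by taking the best source task among the top-k predicted ones instead of the overall best. The exact formulations in dev/get_ndcg.py may differ.

```python
import math

# Sketch only; standard definitions, which may differ in detail from dev/get_ndcg.py.
def ndcg(predicted_ranking, relevance):
    """predicted_ranking: source tasks ordered by a selection method.
    relevance: source task -> empirical transfer score (the ground-truth reference)."""
    dcg = sum(relevance[t] / math.log2(i + 2) for i, t in enumerate(predicted_ranking))
    ideal = sorted(relevance.values(), reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg

def regret_at_k(predicted_ranking, relevance, k):
    best = max(relevance.values())
    best_in_top_k = max(relevance[t] for t in predicted_ranking[:k])
    return (best - best_in_top_k) / best

scores = {"mnli": 74.0, "qnli": 71.5, "squad": 69.8}   # placeholder reference scores
predicted = ["qnli", "mnli", "squad"]                  # ranking from a selection method
print(ndcg(predicted, scores), regret_at_k(predicted, scores, k=1))
```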
To evaluate the predictions of the random and data-size approaches, you can directly pass boolean arguments. For the random baseline, please run:
python dev/get_ndcg.py --ranking_by_random
For the data-size method, please run:
python dev/get_ndcg.py --ranking_by_size
This repo is developed based on COMPACTER and contains the implementation of recent parameter-efficient fine-tuning methods. For full fine-tuning, please run:
. scripts/baseline.sh
Other parameter-efficient fine-tuning methods (Adapter, AdapterDrop, Low-Rank, BitFit, Compacter, Compacter++, PHM-Adapters, Intrinsic-SAID) can be found in the scripts folder. Please check scripts for details.
If you wish to add a new task, you will need to create a new dataset class in /data/{tasks,postprocessors}.py and its corresponding configuration file. For example, running a task requires a configuration file such as prompt_tuning_tokens_config/t5-base/prompt_tuning_tokens_init_boolq.json.
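For orientation, a hypothetical skeleton of a new task entry. Everything below is illustrative: mirror an existing class in data/tasks.py for the real base class, method names, metric registration, and postprocessing.

```python
# Hypothetical sketch of a new task; the real base class and method names follow
# the existing entries in seq2seq/data/tasks.py.
from datasets import load_dataset

class MyNewTask:                      # illustrative class name
    name = "my-new-task"              # task name used in configs and scripts
    metric_names = ["accuracy"]       # metrics reported for this task

    def load(self, split):
        # Replace with the HuggingFace dataset (or local files) you are adding.
        return load_dataset("glue", "rte", split=split)

    def preprocessor(self, example):
        # Map raw fields to the text-to-text format consumed by the T5 models.
        return {
            "source": f"premise: {example['premise']} hypothesis: {example['hypothesis']}",
            "target": str(example["label"]),
        }
```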
For help or issues using our code, please submit an issue or contact pjlin@lsv.uni-saarland.de.