---
layout: default
title: Home
nav_order: 0
image: Twitter_card.png
description: Model-recycling - the best model per architecture. Comparing finetuned models from HF, as base models for future finetuning.
---
Hardly anyone trains from scratch anymore; we all finetune over a pretrained model.
Research is slowly reaching a consensus that some finetuned models are better base models than the pretrained models themselves.
This site presents a dynamic view of the best models to choose as a base for a given model size and architecture. We follow the findings and methodology of our paper: for each architecture, we download finetuned models found on HuggingFace and efficiently rank them on a representative task. We then evaluate the top-ranked models by finetuning them over a large set of 36 target tasks, and report the average performance of each base model.
Models tested so far: 2685 (and counting)
| Pretrained | Best model | Avg. | Pretrained Avg. | Ranking |
|---|---|---|---|---|
| roberta-base | ibm/ColD-Fusion | 78.47 | 76.22 | link |
| bert-base-uncased | ibm/ColD-Fusion-bert-base-uncased-itr23-seed0 | 75.64 | 72.20 | link |
| bert-base-cased | skim945/bert-finetuned-squad | 74.43 | 72.43 | link |
| t5-base | adit94/nlpcharade | 78.23 | 75.45 | link |
| google/t5-v1_1-base | shaiman12/flan-t5-base-samsum | 78.18 | 68.82 | link |
| microsoft/deberta-v3-base | sileod/deberta-v3-base-tasksource-nli | 80.73 | 79.04 | link |
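Using one of the recommended base models is no different from finetuning the pretrained checkpoint itself: swap the model name when loading. The sketch below is a minimal illustration with the HuggingFace `transformers` Trainer, not the exact pipeline from the paper; the task (SST-2), label count, and hyperparameters are placeholder choices.

```python
# Minimal sketch: finetuning a recycled base model from the table above.
# Assumptions: transformers and datasets are installed; SST-2, num_labels=2,
# and the hyperparameters are illustrative, not the paper's setup.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "ibm/ColD-Fusion"  # recycled roberta-base checkpoint from the table

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=2,
    ignore_mismatched_sizes=True,  # a fresh classification head is expected here
)

# Tokenize one target task (SST-2 stands in for any downstream task).
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="recycled-sst2",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
)
trainer.train()
print(trainer.evaluate())
```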
To learn more, see our FAQ or read the paper. See detailed evaluation results for each architecture here. If you have any feedback or questions, please contact us.
This work was performed at IBM Research by Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim and Yoav Katz.