Multi-task learning aims to learn multiple tasks simultaneously, typically by sharing parameters across tasks, while maximizing performance on one or all of them.
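As a concrete illustration, the sketch below (assuming PyTorch; the model class, task names, and dimensions are hypothetical) shares a single encoder across two classification tasks and sums the per-task losses, so that one gradient step updates the shared parameters for all tasks jointly:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Hypothetical sketch: one shared encoder, one output head per task."""
    def __init__(self, input_dim, hidden_dim, task_num_classes):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
        )
        # Task-specific heads on top of the shared encoder.
        self.heads = nn.ModuleDict({
            name: nn.Linear(hidden_dim, n_classes)
            for name, n_classes in task_num_classes.items()
        })

    def forward(self, x, task):
        return self.heads[task](self.encoder(x))

model = MultiTaskModel(input_dim=16, hidden_dim=32,
                       task_num_classes={"sentiment": 2, "topic": 4})
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy batches, one per task (random data stands in for real inputs).
batches = {
    "sentiment": (torch.randn(8, 16), torch.randint(0, 2, (8,))),
    "topic": (torch.randn(8, 16), torch.randint(0, 4, (8,))),
}

# Summing the per-task losses lets a single backward pass train the
# shared encoder on all tasks at once.
total_loss = sum(loss_fn(model(x, task), y) for task, (x, y) in batches.items())
optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```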
The General Language Understanding Evaluation benchmark (GLUE) is a tool for evaluating and analyzing the performance of models across a diverse range of existing natural language understanding tasks. Each task is scored with its own metric (e.g. accuracy, F1, or Matthews correlation), and models are ranked by the unweighted average of their per-task scores.
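For instance, the overall benchmark score is the macro-average of the per-task scores, as in the following sketch (the scores shown are made up for illustration):

```python
# Hypothetical per-task scores; GLUE reports a task-specific metric for
# each task (e.g. Matthews correlation for CoLA, accuracy for MNLI).
task_scores = {"CoLA": 52.1, "SST-2": 93.5, "MNLI": 86.7, "QNLI": 91.1}

# Overall score: the unweighted (macro) average of the per-task scores.
overall = sum(task_scores.values()) / len(task_scores)
print(f"Overall score: {overall:.1f}")  # Overall score: 80.9
```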
State-of-the-art results can be found on the public [GLUE leaderboard](https://gluebenchmark.com/leaderboard).