Evaluate the performance of Transformer models in different scenarios. The library is built mainly on the work of the 🤗 team and is best suited to projects that already use their libraries.
Install using pip:

```bash
pip install benchmark-for-transformers
```
`benchmark-for-transformers` allows you to create a benchmark to evaluate and compare Transformer models in a given scenario.
A benchmark is composed of:
- one dataset
- one or more metrics
- one or more scenarios
To create a benchmark, you can either use the API or a `json` file.
```python
from benchmark_for_transformers import Benchmark
import torch

# Use a single thread for stable, reproducible CPU latency measurements
torch.set_num_threads(1)

# Set the dataset and the metric to use for the benchmark
benchmark = Benchmark.from_args(
    dataset_name="xsum",
    dataset_split="test[:10]",
    x_column_name=["document"],
    y_column_name="summary",
    metric_name="rouge",
    metric_values=["rouge1", "rouge2", "rougeL"],
    metric_run_kwargs={"rouge_types": ["rouge1", "rouge2", "rougeL"]},
)

# Add a scenario
benchmark.reset_scenarios()
benchmark.add_scenario(
    name="Bart Xsum on cuda",
    model_class="summarization",
    model_name="facebook/bart-large-xsum",
    tokenizer_name="facebook/bart-large",
    init_kwargs={
        "generation_parameters": {
            "num_beams": 4,
            "length_penalty": 0.5,
            "min_length": 11,
            "max_length": 62,
        }
    },
    batch_size=1,
    device="cuda",
)

df = benchmark.run()
print(df)
#                    # of parameters  latency (mean)  latency (90th percentile)  rouge_rouge1  rouge_rouge2  rouge_rougeL
# Bart Xsum on cuda        406290432        0.850256                   0.941304      0.376018      0.118984      0.274553
```
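The result is a pandas DataFrame with one row per scenario, reporting the model's parameter count, the mean and 90th-percentile latency, and one column per metric value.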
The benchmark `json` file takes the same arguments as the API. For example, `sst-2.json` is a benchmark file for the Sentiment Analysis dataset:
```json
{
    "scenarios": [
        {
            "name": "distilbert",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 1,
            "device": "cuda"
        },
        {
            "name": "albert-base",
            "model_class": "classification",
            "model_name": "textattack/albert-base-v2-SST-2",
            "tokenizer_name": "textattack/albert-base-v2-SST-2",
            "batch_size": 1,
            "device": "cuda"
        },
        {
            "name": "bert base",
            "model_class": "classification",
            "model_name": "textattack/bert-base-uncased-SST-2",
            "batch_size": 1,
            "device": "cuda"
        }
    ],
    "dataset": {
        "dataset_name": "glue",
        "split": "validation",
        "x_column_name": ["sentence"],
        "y_column_name": "label",
        "init_kwargs": {"name": "sst2"}
    },
    "metrics": [
        {
            "metric_name": "glue",
            "values": ["accuracy"],
            "init_kwargs": {"config_name": "sst2"}
        }
    ]
}
```
Once the benchmark file is ready, you can either load it with the API or run it directly from the CLI.
```python
from benchmark_for_transformers import Benchmark

benchmark = Benchmark.from_json("sst-2.json")
df = benchmark.run()
print(df)
#              # of parameters  latency (mean)  latency (90th percentile)  glue_accuracy
# distilbert          66955010        0.006111                   0.007480       0.910550
# albert-base         11685122        0.012642                   0.014657       0.925459
# bert base          109483778        0.010371                   0.012245       0.924312
```
```bash
benchmark-for-transformers-run --run_args_file "sst-2.json" --verbose --csv_file "results.csv"
#              # of parameters  latency (mean)  latency (90th percentile)  glue_accuracy
# distilbert          66955010        0.006111                   0.007480       0.910550
# albert-base         11685122        0.012642                   0.014657       0.925459
# bert base          109483778        0.010371                   0.012245       0.924312
```
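The `--csv_file` argument additionally saves the results table to the given CSV file (`results.csv` here).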
`benchmark-for-transformers` uses the 🤗 `datasets` library to load datasets and metrics, so you can use all the datasets and metrics available in that library. If you want to use a dataset or a metric that is not included in `datasets`, you can easily add it by creating a small script (see the documentation on how to add a dataset or a metric). For more information, see the `datasets` documentation.
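As an illustration, with older versions of 🤗 `datasets` (before metrics moved to the `evaluate` library), a custom metric script is a small Python file defining a `datasets.Metric` subclass. The sketch below is a hypothetical exact-match metric; the file name, class name, and metric logic are assumptions for illustration, not part of `benchmark-for-transformers`:

```python
# my_exact_match.py -- hypothetical custom metric script for the 🤗 datasets library
import datasets


class ExactMatch(datasets.Metric):
    def _info(self):
        # Declare the metric's inputs so `datasets` can validate them
        return datasets.MetricInfo(
            description="Fraction of predictions that exactly match the reference.",
            citation="",
            features=datasets.Features(
                {
                    "predictions": datasets.Value("string"),
                    "references": datasets.Value("string"),
                }
            ),
        )

    def _compute(self, predictions, references):
        # Return a dict with one entry per metric value
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match": matches / len(predictions)}
```

Such a script can then be loaded by path, e.g. `datasets.load_metric("path/to/my_exact_match.py")`.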
For the moment, `benchmark-for-transformers` only supports 4 tasks:

- Classification,
- NER,
- Summarization,
- Ordering (this cannot be used for the moment and is for internal use).

These classes are based on the main Model class and use HuggingFace `transformers` models.
You can add a new task by creating a task script and putting the path to this script in the `model_class` argument of the `Scenario`, as sketched below.
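For example, a scenario could reference a custom task script like this (the script path and model name are placeholders for illustration):

```json
{
    "name": "my custom task",
    "model_class": "path/to/my_task_script.py",
    "model_name": "distilbert-base-uncased",
    "batch_size": 1,
    "device": "cpu"
}
```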
You can define several optimization features in the scenario:
- batch size,
- quantization,
- ONNX support.
You can also define the device you want to use.
For example, let's try some optimization features on `distilbert` on the Sentiment Analysis dataset. First, we define a new benchmark `json` file:
```json
{
    "scenarios": [
        {
            "name": "distilbert on cpu",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 1,
            "device": "cpu"
        },
        {
            "name": "distilbert on cuda",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 1,
            "device": "cuda"
        },
        {
            "name": "distilbert on cpu bsz 8",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 8,
            "device": "cpu"
        },
        {
            "name": "distilbert on onnx cpu bsz 8",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 8,
            "device": "cpu",
            "onnx": true
        },
        {
            "name": "quantized distilbert on onnx cpu bsz 8",
            "model_class": "classification",
            "model_name": "distilbert-base-uncased-finetuned-sst-2-english",
            "tokenizer_name": "distilbert-base-uncased",
            "batch_size": 8,
            "device": "cpu",
            "onnx": true,
            "quantization": true
        }
    ],
    "dataset": {
        "dataset_name": "glue",
        "split": "validation",
        "x_column_name": ["sentence"],
        "y_column_name": "label",
        "init_kwargs": {"name": "sst2"}
    },
    "metrics": [
        {
            "metric_name": "glue",
            "values": ["accuracy"],
            "init_kwargs": {"config_name": "sst2"}
        }
    ]
}
```
Then, we run it using the API:
```python
from benchmark_for_transformers import Benchmark

benchmark = Benchmark.from_json("sst-2-optimization.json")
df = benchmark.run()
print(df)
#                                          # of parameters  latency (mean)  latency (90th percentile)  glue_accuracy
# distilbert on cpu                               66955010        0.061905                   0.074103       0.910550
# distilbert on cuda                              66955010        0.005782                   0.006732       0.910550
# distilbert on cpu bsz 8                         66955010        0.035685                   0.043952       0.910550
# distilbert on onnx cpu bsz 8                          -1        0.036746                   0.044342       0.910550
# quantized distilbert on onnx cpu bsz 8                -1        0.023608                   0.029647       0.902523
```
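Note that the parameter count is reported as -1 for the ONNX scenarios. In this run, quantization cuts mean CPU latency by roughly a third (0.0367 → 0.0236) at the cost of a small accuracy drop (0.9106 → 0.9025).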
Some example benchmark `json` files are in the `examples` folder. You can look at them to see how to use `benchmark-for-transformers`.
The `examples` folder also contains subfolders with examples of custom dataset and metric scripts.
You can find a description of the repository, a guide, and examples in the documentation.