diff --git a/.gitignore b/.gitignore index 5c719f5c..5f327c39 100644 --- a/.gitignore +++ b/.gitignore @@ -216,3 +216,4 @@ cython_debug/ .idea/ TensorRT/ triton_models/ +demo/roberta-*/ diff --git a/README.md b/README.md index c8c14879..8fe6666d 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# From 🤗 to 🤯, Hugging Face Transformer submillisecond inference️ and deployment to production +# Hugging Face Transformer submillisecond inference️ and deployment to production: 🤗 → 🤯 [![tests](https://github.com/ELS-RD/transformer-deploy/actions/workflows/python-app.yml/badge.svg)](https://github.com/ELS-RD/transformer-deploy/actions/workflows/python-app.yml) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENCE) [![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/) @@ -14,6 +14,7 @@ * [🐍 TensorRT usage in Python script](#tensorrt-usage-in-python-script) * [⏱ benchmarks](#benchmarks) * [🤗 end to end reproduction of Infinity Hugging Face demo](./demo/README.md) (to replay [Medium article](https://towardsdatascience.com/hugging-face-transformer-inference-under-1-millisecond-latency-e1be0057a51c?source=friends_link&sk=cd880e05c501c7880f2b9454830b8915)) +* [🏎️ end to end GPU quantization tutorial](./demo/quantization_end_to_end.ipynb) #### Why this tool? @@ -85,7 +86,16 @@ With the single command below, you will: * **generate** configuration files for Triton inference server ```shell -convert_model -m roberta-large-mnli --backend tensorrt onnx pytorch --seq-len 16 128 128 --batch-size 1 32 32 +convert_model -m roberta-large-mnli --backend tensorrt onnx --seq-len 16 128 128 --batch-size 1 32 32 +# ... +# Inference done on NVIDIA GeForce RTX 3090 +# latencies: +# [Pytorch (FP32)] mean=123.26ms, sd=3.35ms, min=117.84ms, max=136.12ms, median=122.09ms, 95p=129.50ms, 99p=131.24ms +# [Pytorch (FP16)] mean=78.41ms, sd=2.83ms, min=75.58ms, max=88.48ms, median=77.28ms, 95p=84.66ms, 99p=85.97ms +# [TensorRT (FP16)] mean=182.99ms, sd=3.15ms, min=175.75ms, max=191.58ms, median=182.32ms, 95p=188.37ms, 99p=190.80ms +# [ONNX Runtime (vanilla)] mean=119.03ms, sd=8.27ms, min=112.15ms, max=185.57ms, median=116.51ms, 95p=129.18ms, 99p=167.70ms +# [ONNX Runtime (optimized)] mean=53.82ms, sd=0.81ms, min=52.79ms, max=58.27ms, median=53.74ms, 95p=55.38ms, 99p=57.29ms + ``` > **16 128 128** -> minimum, optimal, maximum sequence length, to help TensorRT better optimize your model diff --git a/VERSION b/VERSION index 17e51c38..0ea3a944 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.1.1 +0.2.0 diff --git a/demo/README.md b/demo/README.md index 2a606e17..b97ae13d 100644 --- a/demo/README.md +++ b/demo/README.md @@ -40,7 +40,7 @@ docker run -it --rm --gpus all \ -v $PWD:/project ghcr.io/els-rd/transformer-deploy:0.1.1 \ bash -c "cd /project && \ convert_model -m \"philschmid/MiniLM-L6-H384-uncased-sst2\" \ - --backend tensorrt onnx pytorch \ + --backend tensorrt onnx \ --seq-len 16 128 128" ``` diff --git a/demo/quantization_end_to_end.ipynb b/demo/quantization_end_to_end.ipynb new file mode 100644 index 00000000..1d33edce --- /dev/null +++ b/demo/quantization_end_to_end.ipynb @@ -0,0 +1,5777 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Recipes to perform Nvidia GPU INT-8 quantization on most transformers model (encoder based)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Quantization is one of the most effective and generic approach to make model inference 
faster.\n",
+    "Basically, it replaces the high precision float numbers of model tensors, encoded on 32 or 16 bits, by lower precision ones encoded on 8 bits or less:\n",
+    "\n",
+    "* it takes less memory\n",
+    "* computation is easier / faster\n",
+    "\n",
+    "In theory it can be applied to any model and, if done well, it should not decrease its accuracy.\n",
+    "\n",
+    "The purpose of this tutorial is to show two processes to perform quantization on most `transformer` architectures.\n",
+    "\n",
+    "**TL;DR, inference is 5 times faster on a `Roberta-base` model** with a batch of size 32 / seq len 256, benchmarked on the MNLI dataset (bold -> **quantization**):\n",
+    "\n",
+    "| Framework                  | Precision | Latency (ms) | Accuracy | Speedup   | Hardware |\n",
+    "|:---------------------------|-----------|--------------|----------|:----------|:--------:|\n",
+    "| Pytorch                    | FP32      | 4407         | 86.8 %   | X 0.02    | CPU      |\n",
+    "| Pytorch                    | FP16      | 4255         | 86.8 %   | X 0.02    | CPU      |\n",
+    "| Pytorch                    | FP32      | 77           | 86.8 %   | X 1       | GPU      |\n",
+    "| Pytorch                    | FP16      | 58           | 86.8 %   | X 1.3     | GPU      |\n",
+    "| TensorRT                   | FP16      | 30           | 86.8 %   | X 2.6     | GPU      |\n",
+    "| TensorRT (transplantation) | **INT-8** | 15           | 84.8 %   | **X 5.1** | GPU      |\n",
+    "| TensorRT (custom QDQ code) | **INT-8** | 15           | 85.6 %   | **X 5.1** | GPU      |\n",
+    "\n",
+    "> measures done on a Nvidia RTX 3090 GPU + 12-core Intel i7 CPU  \n",
+    "> accuracy obtained after a single epoch, without LR search or any hyper parameter optimization  \n",
+    "> CPU measures are unfair (no attempt at all to optimize inference speed) but still indicative of the kind of performance to expect from a Pytorch+CPU deployment  \n",
+    "> the same kind of acceleration is observed for all seq len / batch sizes\n",
+    "\n",
+    "\n",
+    "## A (very) short intro to INT-8 quantization\n",
+    "\n",
+    "The basic idea behind model quantization is to replace tensors made of float numbers (usually encoded on 32 bits) by a lower precision representation (integers encoded on 8 bits for Nvidia GPUs).\n",
+    "Computation is therefore faster and the model memory footprint is lower. Making tensor storage smaller also makes memory transfers faster, which is another source of computation acceleration.\n",
+    "This technique is very interesting for its trade-off: you reduce inference time significantly, and when the dataset is large enough, it costs close to nothing in accuracy.\n",
+    "\n",
+    "Replacing float numbers by integers is done through a mapping.\n",
+    "This step is called `calibration`, and its purpose is to compute, for each tensor or each channel of a tensor (one of its dimensions), the range of possible values, and then define a scale and a distribution center to map float numbers to 8-bit integers.\n",
+    "The process is well described in this [Nvidia presentation](https://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf).\n",
+    "\n",
+    "There are several ways to perform quantization, depending on how and when the `calibration` is performed:\n",
+    "\n",
+    "* dynamically: the mapping is done during inference; there is some overhead, but it's easy to put in place and usually accuracy is preserved,\n",
+    "* statically, after training (`post training quantization` or `PTQ`): this way is efficient, but it may have a significant accuracy cost,\n",
+    "* statically, before training (`quantization aware training` or `QAT`): this way is efficient and has a low accuracy cost, as the weights learn to compensate for the quantization error\n",
+    "\n",
+    "In this guide we will focus on the third option: `QAT`.\n",
+    "\n",
+    "During the quantization aware *training*:\n",
+    "\n",
+    "* on the inside, Pytorch trains with high precision float numbers,\n",
+    "* on the outside, Pytorch simulates that quantization has already been applied and outputs results accordingly (for loss computation, for instance)\n",
+    "\n",
+    "The simulation is done by adding quantization / dequantization nodes, usually called `QDQ`, an abbreviation you will see often in the quantization world.\n",
+    "\n",
+    "You can check this [high quality blog post](https://leimao.github.io/article/Neural-Networks-Quantization/) for more information.\n",
+    "\n",
+    "## Why a dedicated tutorial?\n",
+    "\n",
+    "CPU quantization is supported out of the box by `Pytorch` and `ONNX Runtime`.\n",
+    "**GPU quantization, on the other hand, requires specific tools and a specific process**.\n",
+    "\n",
+    "In the specific case of `transformer` models, until recently (December 2021), the only way shown by Nvidia was to build the graph of our models manually in `TensorRT`. This is a low level approach which requires knowledge of GPU capabilities (which operators are supported, etc.). 
It's certainly out of reach of most NLP practitioners and is very time consuming to update/adapt to new architectures.\n", + "\n", + "Hopefully, Nvidia added to Hugging Face `transformer` library a new model called `QDQBert` few weeks ago.\n", + "Basically, it's a vanilla `Bert` architecture which supports INT-8 quantization.\n", + "It doesn't support any other architecture out of the box, like `Albert`, `Roberta`, or `Electra`.\n", + "Nvidia also provide a demo dedicated to the SQuaD task.\n", + "\n", + "This open the door to extension of the approach to other architectures.\n", + "\n", + "To be both simple and cover most use cases, in this tutorial we will see:\n", + "\n", + "* how to perform GPU quantization on **any** transformer model (not just Bert) using a simple trick, a `transplatation`\n", + "* how to perform GPU quantization on `QDQRoberta`, a custom model similar to `QDQBert` and supported by `transformer-deploy` library\n", + "* how to apply quantization to a common task like classification (which is easier to understand than question answering)\n", + "* measure performance gain (latency)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Project setup\n", + "\n", + "### Dependencies installation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We install `master` branch of `transfomers` library to use a new model: **QDQBert** and `transformer-deploy` to leverage `TensorRT` models (TensorRT API is not something simple to master, it's highly advised to use a wrapper). Your machine should have Nvidia CUDA 11.X, TensorRT 8.2.1 and cuBLAS installed. It's said to be tricky to install, in my experience, just follow Nvidia instructions **and nothing else**, it should work out of the box. Docker image with TensorRT 8.2.1 has not yet been released, this tuto will be updated when it's ready." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "MOsHUjgdIrIW" + }, + "outputs": [], + "source": [ + "#! pip install git+https://github.com/huggingface/transformers\n", + "#! pip install git+https://github.com/ELS-RD/transformer-deploy\n", + "#! pip install sklearn datasets\n", + "#! pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com\n", + "# or install pytorch-quantization from https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Check the GPU is enabled and usable." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "2OzrD4f-3ydk", + "outputId": "54cc2ea6-6969-4e01-f9f9-78c5fc91ff85" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Wed Dec 8 07:41:28 2021 \r\n", + "+-----------------------------------------------------------------------------+\r\n", + "| NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5 |\r\n", + "|-------------------------------+----------------------+----------------------+\r\n", + "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\r\n", + "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\r\n", + "| | | MIG M. |\r\n", + "|===============================+======================+======================|\r\n", + "| 0 NVIDIA GeForce ... 
On | 00000000:03:00.0 On | N/A |\r\n", + "| 30% 40C P8 37W / 350W | 499MiB / 24267MiB | 0% Default |\r\n", + "| | | N/A |\r\n", + "+-------------------------------+----------------------+----------------------+\r\n", + " \r\n", + "+-----------------------------------------------------------------------------+\r\n", + "| Processes: |\r\n", + "| GPU GI CI PID Type Process name GPU Memory |\r\n", + "| ID ID Usage |\r\n", + "|=============================================================================|\r\n", + "| 0 N/A N/A 1632 G /usr/lib/xorg/Xorg 119MiB |\r\n", + "| 0 N/A N/A 7547 G /usr/bin/gnome-shell 37MiB |\r\n", + "| 0 N/A N/A 23797 G ..._12759.log --shared-files 16MiB |\r\n", + "| 0 N/A N/A 23894 G ...AAAAAAAAA= --shared-files 69MiB |\r\n", + "| 0 N/A N/A 291688 C ...st_transformer/bin/python 251MiB |\r\n", + "+-----------------------------------------------------------------------------+\r\n" + ] + } + ], + "source": [ + "! nvidia-smi" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000, + "referenced_widgets": [ + "ac14ba24dcf3404db9fd303dbb24d7a5", + "4e91efae49b64f038fd3fbfcfd2be510", + "17b83e0d0fb947d7bf20319ff930e8fc", + "1da1d80871f545bbb21bf5a84d2120a0", + "c593f2e45e244637821cc5721788bf2c", + "cbbb20b5d01a4450bfb8dfbf8048d64f", + "854cfd13416543fba8221093b903658b", + "7ec6da801d0d45c4bb80eeab5518e124", + "8585eab4b3fe4992bd7e7c4596e2483b", + "990482eebca2424bb5ecbd114007e02c", + "c92a19dfa84142af91522bc22f21fca6", + "78601982b0e04b80adaa502db2ef685a", + "167874df55014291be95cd390b1e60d3", + "d6426fea2eda41dd9a31cb3f35b0877e", + "163146c2f23440bcbf782116a35b5684", + "0dab554959dc44b3b313ee8ae91ca88d", + "f651eecbb6d44c24820cf6fe5ab92e7b", + "a51b461c062f4636bfa4b48823d0709b", + "cced5f1cccc2400a8fbfd7a6eaedc666", + "cf9597523c024514b9b3e66bc77e3fa8", + "f01fdef82047471e8c1b780cae5379cc", + "e1f08cf954ae4aea818c90d893486c77" + ] + }, + "id": "KPMoLPBn_1vN", + "outputId": "58dca4e7-fc5c-4fd1-a8d4-755aa1e956cb" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from tqdm.notebook import tqdm\n", + "import transformers\n", + "import datasets\n", + "from typing import OrderedDict as OD, List, Dict, Union\n", + "import torch\n", + "from torch import Tensor\n", + "from transformers import (\n", + " AutoModelForSequenceClassification,\n", + " PreTrainedModel,\n", + " QDQBertForSequenceClassification,\n", + " BertForSequenceClassification,\n", + " TrainingArguments,\n", + " Trainer,\n", + " IntervalStrategy,\n", + " AutoTokenizer,\n", + " PreTrainedTokenizer,\n", + ")\n", + "from datasets import load_dataset, load_metric\n", + "from transformer_deploy.QDQModels.QDQRoberta import QDQRobertaForSequenceClassification\n", + "import pytorch_quantization.nn as quant_nn\n", + "from pytorch_quantization.tensor_quant import QuantDescriptor\n", + "from pytorch_quantization import calib\n", + "import logging\n", + "from datasets import DatasetDict\n", + "from transformer_deploy.backends.trt_utils import build_engine, get_binding_idxs, infer_tensorrt, load_engine\n", + "from transformer_deploy.backends.ort_utils import convert_to_onnx\n", + "from collections import OrderedDict\n", + "from transformer_deploy.benchmarks.utils import track_infer_time, print_timings\n", + "from pycuda._driver import Stream\n", + "import tensorrt as trt\n", + "from tensorrt.tensorrt import IExecutionContext, Logger, Runtime\n", + "import pycuda.autoinit" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, 
+ "source": [ + "Set logging to `error` to make the `notebook` more readable on Github." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "log_level = logging.ERROR\n", + "logging.getLogger().setLevel(log_level)\n", + "datasets.utils.logging.set_verbosity(log_level)\n", + "transformers.utils.logging.set_verbosity(log_level)\n", + "transformers.utils.logging.enable_default_handler()\n", + "transformers.utils.logging.enable_explicit_format()\n", + "trt_logger: Logger = trt.Logger(trt.Logger.ERROR)\n", + "transformers.logging.set_verbosity_error()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rEJBSTyZIrIb" + }, + "source": [ + "### Download data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This part is inspired from an [official Notebooks from Hugging Face](https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb)." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "zVvslsfMIrIh" + }, + "outputs": [], + "source": [ + "task = \"mnli\"\n", + "num_labels = 3\n", + "model_checkpoint = \"roberta-base\"\n", + "batch_size = 32\n", + "max_seq_len = 256\n", + "validation_key = \"validation_matched\"\n", + "timings: Dict[str, List[float]] = dict()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W7QYTpxXIrIl" + }, + "source": [ + "We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark)." + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "IreSlFmlIrIm" + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "157435bd8610413f83c3bf7bdff3fb5d", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/5 [00:00 PreTrainedModel:\n", + " # Find the TensorQuantizer and enable calibration\n", + " for name, module in model.named_modules():\n", + " if isinstance(module, quant_nn.TensorQuantizer):\n", + " if module._calibrator is not None:\n", + " module.disable_quant()\n", + " module.enable_calib()\n", + " else:\n", + " module.disable()\n", + "\n", + " with torch.no_grad():\n", + " for start_index in tqdm(range(0, nb_sample, batch_size)):\n", + " end_index = start_index + batch_size\n", + " data = encoded_dataset[\"train\"][start_index:end_index]\n", + " input_torch = {\n", + " k: torch.tensor(v, dtype=torch.long, device=\"cpu\")\n", + " for k, v in data.items()\n", + " if k in [\"input_ids\", \"attention_mask\", \"token_type_ids\"]\n", + " }\n", + " model(**input_torch)\n", + "\n", + " # Finalize calibration\n", + " for name, module in model.named_modules():\n", + " if isinstance(module, quant_nn.TensorQuantizer):\n", + " if module._calibrator is not None:\n", + " if isinstance(module._calibrator, calib.MaxCalibrator):\n", + " module.load_calib_amax()\n", + " else:\n", + " module.load_calib_amax(\"percentile\", percentile=99.99)\n", + " module.enable_quant()\n", + " module.disable_calib()\n", + " else:\n", + " module.enable()\n", + "\n", + " model.cuda()\n", + " return model\n", + "\n", + "\n", + "def convert_tensor(data: OD[str, List[List[int]]], output: str) -> OD[str, Union[np.ndarray, torch.Tensor]]:\n", + " input: OD[str, Union[np.ndarray, torch.Tensor]] = OrderedDict()\n", + " for k in [\"input_ids\", \"attention_mask\", \"token_type_ids\"]:\n", + " if k in data:\n", + " v = data[k]\n", + " 
if output == \"torch\":\n", + " value = torch.tensor(v, dtype=torch.long, device=\"cuda\")\n", + " elif output == \"np\":\n", + " value = np.asarray(v, dtype=np.int32)\n", + " else:\n", + " raise Exception(f\"unknown output type: {output}\")\n", + " input[k] = value\n", + " return input" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some `TensorRT` reused variables:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "runtime: Runtime = trt.Runtime(trt_logger)\n", + "profile_index = 0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Fine-tuning model\n", + "\n", + "Now that our data are ready, we can download the pretrained model and fine-tune it.\n", + "\n", + "Default parameters to be used for the training:" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "nb_step = 1000\n", + "strategy = IntervalStrategy.STEPS\n", + "args = TrainingArguments(\n", + " f\"{model_checkpoint}-{task}\",\n", + " evaluation_strategy=strategy,\n", + " eval_steps=nb_step,\n", + " logging_steps=nb_step,\n", + " save_steps=nb_step,\n", + " save_strategy=strategy,\n", + " learning_rate=1e-5,\n", + " per_device_train_batch_size=batch_size,\n", + " per_device_eval_batch_size=batch_size * 2,\n", + " num_train_epochs=1,\n", + " fp16=True,\n", + " group_by_length=True,\n", + " weight_decay=0.01,\n", + " load_best_model_at_end=True,\n", + " metric_for_best_model=\"accuracy\",\n", + " report_to=[],\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Method 1: `Transplantation` of weights from a source model to an optimized architecture\n", + "\n", + "Transplantation idea is to export weights from one model and use them in another one.\n", + "In our case, the source are `Roberta` weights and the target is `Bert` archtecture which is highly optimized on `TensorRT` for GPU quantization.\n", + "\n", + "Indeed, not all models are quantization compliant. The optimization engine (`TensorRT`) search for some patterns and will fail to opimize the model if it doesn't find them. It requires the Pytorch code to be written in a certain way and use certain operations. For that reason, it's a good idea to reuse an architecture highly optimized.\n", + "\n", + "We will leverage the fact that since `Bert` have been released, very few improvements have been brought to the transformer architecture (at least for encoder only models).\n", + "Better models appeared, and most of the work has been done to improve the pretraining step (aka the weights).\n", + "So the idea will be to take the weights from those new models and put them inside `Bert` architecture.\n", + "\n", + "The process described below should work for most architectures.\n", + "\n", + "**steps**:\n", + "\n", + "* load `Bert` model\n", + "* retrieve layer/weight names\n", + "* load target model (here `Roberta`)\n", + "* replace weight/layer names with those from `Roberta`\n", + "* override the architecture name in model configuration\n", + "\n", + "If there is no 1 to 1 correspondance (it happens), try to keep at least token embeddings and self attention. Of course, it's possible that if a model is very different, the transplant may cost some accuracy. 
In our experience, if your trainset is big enough it should not happen.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "model_bert: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(\n", + " \"bert-base-uncased\", num_labels=num_labels\n", + ")\n", + "bert_keys = list(model_bert.state_dict().keys())\n", + "del model_bert\n", + "\n", + "model_roberta: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(\n", + " model_checkpoint, num_labels=num_labels\n", + ")\n", + "model_roberta.save_pretrained(\"roberta-in-bert\")\n", + "del model_roberta\n", + "model_weights: OD[str, Tensor] = torch.load(\"roberta-in-bert/pytorch_model.bin\")\n", + "\n", + "# Roberta -> Bert, there is 1 to 1 correspondance, for other models, you may need to create your own mapping.\n", + "for bert_key in bert_keys:\n", + " # pop remove the first weights from the Ordered dict ...\n", + " _, weight = model_weights.popitem(last=False)\n", + " # ... and we re-insert them, in order, with a new key\n", + " model_weights[bert_key] = weight\n", + "\n", + "# we re-export the weights\n", + "torch.save(model_weights, \"roberta-in-bert/pytorch_model.bin\")\n", + "del model_weights" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We override the architecture name to make `transformers` believe it is `Bert`..." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "# =====> change architecture to bert base <======\n", + "import json\n", + "\n", + "with open(\"roberta-in-bert/config.json\") as f:\n", + " content = json.load(f)\n", + " content[\"architectures\"] = [\"bert\"]\n", + "\n", + "with open(\"roberta-in-bert/config.json\", mode=\"w\") as f:\n", + " json.dump(content, f)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Model training\n", + "\n", + "The goal is to update weights to the new architecture, not to get the best score.\n", + "For instance, position embeddings are not managed the same way on Bert and Roberta. We need to relearn those parts." 
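+    ,
+    "\n",
+    "\n",
+    "Before launching the training, it can be worth sanity-checking the transplantation. The small sketch below is not part of the original recipe and assumes the two cells above have just been run (so `bert_keys` is still in memory and the re-exported checkpoint is on disk): it simply verifies that the checkpoint now exposes exactly the parameter names the `Bert` architecture expects, in the same order.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "\n",
+    "# reload the re-exported checkpoint and compare its keys with the Bert ones\n",
+    "state_dict = torch.load(\"roberta-in-bert/pytorch_model.bin\")\n",
+    "assert list(state_dict.keys()) == bert_keys, \"transplantation produced unexpected parameter names\"\n",
+    "print(f\"{len(state_dict)} tensors transplanted, e.g. {bert_keys[0]}: {tuple(state_dict[bert_keys[0]].shape)}\")\n",
+    "```"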
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO|trainer.py:437] 2021-12-08 07:41:50,834 >> Using amp half precision backend\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'loss': 0.7658, 'learning_rate': 9.1875814863103e-06, 'epoch': 0.08}\n", + "{'eval_loss': 0.5338948369026184, 'eval_accuracy': 0.7948038716250637, 'eval_runtime': 18.3625, 'eval_samples_per_second': 534.514, 'eval_steps_per_second': 8.387, 'epoch': 0.08}\n", + "{'loss': 0.5566, 'learning_rate': 8.372718383311604e-06, 'epoch': 0.16}\n", + "{'eval_loss': 0.4757803678512573, 'eval_accuracy': 0.8167091186958737, 'eval_runtime': 18.39, 'eval_samples_per_second': 533.713, 'eval_steps_per_second': 8.374, 'epoch': 0.16}\n", + "{'loss': 0.5135, 'learning_rate': 7.557855280312908e-06, 'epoch': 0.24}\n", + "{'eval_loss': 0.46861791610717773, 'eval_accuracy': 0.8164034640855833, 'eval_runtime': 18.4131, 'eval_samples_per_second': 533.044, 'eval_steps_per_second': 8.364, 'epoch': 0.24}\n", + "{'loss': 0.4868, 'learning_rate': 6.743807040417211e-06, 'epoch': 0.33}\n", + "{'eval_loss': 0.4253948926925659, 'eval_accuracy': 0.8351502801833928, 'eval_runtime': 18.4305, 'eval_samples_per_second': 532.543, 'eval_steps_per_second': 8.356, 'epoch': 0.33}\n", + "{'loss': 0.4669, 'learning_rate': 5.9289439374185145e-06, 'epoch': 0.41}\n", + "{'eval_loss': 0.4190593957901001, 'eval_accuracy': 0.8383087111563933, 'eval_runtime': 18.4268, 'eval_samples_per_second': 532.649, 'eval_steps_per_second': 8.357, 'epoch': 0.41}\n", + "{'loss': 0.4544, 'learning_rate': 5.114080834419818e-06, 'epoch': 0.49}\n", + "{'eval_loss': 0.4306202828884125, 'eval_accuracy': 0.8335201222618441, 'eval_runtime': 18.4565, 'eval_samples_per_second': 531.792, 'eval_steps_per_second': 8.344, 'epoch': 0.49}\n", + "{'loss': 0.4542, 'learning_rate': 4.30003259452412e-06, 'epoch': 0.57}\n", + "{'eval_loss': 0.40120720863342285, 'eval_accuracy': 0.844727457972491, 'eval_runtime': 18.4367, 'eval_samples_per_second': 532.362, 'eval_steps_per_second': 8.353, 'epoch': 0.57}\n", + "{'loss': 0.4427, 'learning_rate': 3.4851694915254244e-06, 'epoch': 0.65}\n", + "{'eval_loss': 0.3936639130115509, 'eval_accuracy': 0.8454406520631687, 'eval_runtime': 18.4345, 'eval_samples_per_second': 532.425, 'eval_steps_per_second': 8.354, 'epoch': 0.65}\n", + "{'loss': 0.4369, 'learning_rate': 2.670306388526728e-06, 'epoch': 0.73}\n", + "{'eval_loss': 0.3961443305015564, 'eval_accuracy': 0.8489047376464595, 'eval_runtime': 18.4534, 'eval_samples_per_second': 531.879, 'eval_steps_per_second': 8.345, 'epoch': 0.73}\n", + "{'loss': 0.4257, 'learning_rate': 1.8554432855280313e-06, 'epoch': 0.81}\n", + "{'eval_loss': 0.39044129848480225, 'eval_accuracy': 0.8509424350483953, 'eval_runtime': 18.4536, 'eval_samples_per_second': 531.876, 'eval_steps_per_second': 8.345, 'epoch': 0.81}\n", + "{'loss': 0.4285, 'learning_rate': 1.0413950456323338e-06, 'epoch': 0.9}\n", + "{'eval_loss': 0.38357552886009216, 'eval_accuracy': 0.8525725929699439, 'eval_runtime': 18.4857, 'eval_samples_per_second': 530.952, 'eval_steps_per_second': 8.331, 'epoch': 0.9}\n", + "{'loss': 0.4278, 'learning_rate': 2.265319426336376e-07, 'epoch': 0.98}\n", + "{'eval_loss': 0.3847087025642395, 'eval_accuracy': 0.8522669383596536, 'eval_runtime': 18.4593, 'eval_samples_per_second': 531.711, 'eval_steps_per_second': 8.343, 'epoch': 0.98}\n", + "{'train_runtime': 2604.6513, 
'train_samples_per_second': 150.77, 'train_steps_per_second': 4.712, 'train_loss': 0.48698368594730385, 'epoch': 1.0}\n", + "{'eval_loss': 0.38357552886009216, 'eval_accuracy': 0.8525725929699439, 'eval_runtime': 18.4563, 'eval_samples_per_second': 531.796, 'eval_steps_per_second': 8.344, 'epoch': 1.0}\n", + "{'eval_loss': 0.38357552886009216, 'eval_accuracy': 0.8525725929699439, 'eval_runtime': 18.4563, 'eval_samples_per_second': 531.796, 'eval_steps_per_second': 8.344, 'epoch': 1.0}\n" + ] + } + ], + "source": [ + "transformers.logging.set_verbosity_error()\n", + "model_bert = BertForSequenceClassification.from_pretrained(\"roberta-in-bert\", num_labels=num_labels)\n", + "model_bert = model_bert.cuda()\n", + "\n", + "trainer = Trainer(\n", + " model_bert,\n", + " args,\n", + " train_dataset=encoded_dataset[\"train\"],\n", + " eval_dataset=encoded_dataset[validation_key],\n", + " tokenizer=tokenizer,\n", + " compute_metrics=compute_metrics,\n", + ")\n", + "transformers.logging.set_verbosity_error()\n", + "trainer.train()\n", + "print(trainer.evaluate())\n", + "model_bert.save_pretrained(\"roberta-in-bert-trained\")\n", + "del trainer\n", + "del model_bert" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Quantization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Below we will start the quantization process.\n", + "It follow those steps:\n", + "\n", + "* perform the calibration\n", + "* perform a quantization aware training\n", + "\n", + "By passing validation values to the model, we will calibrate it, meaning it will get the right range / scale to convert FP32 weights to int-8 ones." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Calibration" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Activate histogram calibration\n", + "\n", + "There are several kinds of calbrators, below we use the percentile one (99.99p) (`histogram`), basically, its purpose is to just remove the most extreme values before computing range / scale.\n", + "The other option is `max`, it's much faster but expect lower accuracy.\n", + "\n", + "Second calibration option, choose between calibration done at the tensor level or per channel (fine grained, slower)." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "# you can also use \"max\" instead of \"historgram\"\n", + "input_desc = QuantDescriptor(num_bits=8, calib_method=\"histogram\")\n", + "# below we do per-channel quantization for weights, set axis to None to get a per tensor calibration\n", + "weight_desc = QuantDescriptor(num_bits=8, axis=(0,))\n", + "quant_nn.QuantLinear.set_default_quant_desc_input(input_desc)\n", + "quant_nn.QuantLinear.set_default_quant_desc_weight(weight_desc)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Perform calibration\n", + "\n", + "During this step we will enable the calibration nodes, and pass some representative data to the model.\n", + "It will then be used to compute the scale/range.\n", + "\n", + "Official recommendations from Nvidia is to calibrate over thousands of examples from the validation set.\n", + "Here we use 128 examples because it's a slow process. It's enough to be close from the original accuracy." 
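+    ,
+    "\n",
+    "\n",
+    "Once the calibration cell below has been run, a quick way to check that calibration actually happened is to look at the `TensorQuantizer` nodes: each of them should now hold an `amax` range computed from the calibration batches. A minimal sketch, not part of the original recipe (`model_q` stands for whatever variable holds the calibrated model):\n",
+    "\n",
+    "```python\n",
+    "# print every quantizer node; its repr includes the calibrator used and the computed amax\n",
+    "for name, module in model_q.named_modules():\n",
+    "    if isinstance(module, quant_nn.TensorQuantizer):\n",
+    "        print(f\"{name:80} {module}\")\n",
+    "```"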
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "946664d7c0684b6e903745802c39fa17", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/4 [00:00> Using amp half precision backend\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'eval_loss': 0.43096092343330383, 'eval_accuracy': 0.8348446255731024, 'eval_runtime': 46.2449, 'eval_samples_per_second': 212.24, 'eval_steps_per_second': 3.33}\n", + "{'eval_loss': 0.43096092343330383, 'eval_accuracy': 0.8348446255731024, 'eval_runtime': 46.2449, 'eval_samples_per_second': 212.24, 'eval_steps_per_second': 3.33}\n", + "{'loss': 0.4542, 'learning_rate': 9.187581486310299e-07, 'epoch': 0.08}\n", + "{'eval_loss': 0.4320202171802521, 'eval_accuracy': 0.8392256749872644, 'eval_runtime': 47.5223, 'eval_samples_per_second': 206.535, 'eval_steps_per_second': 3.241, 'epoch': 0.08}\n", + "{'loss': 0.4439, 'learning_rate': 8.372718383311604e-07, 'epoch': 0.16}\n", + "{'eval_loss': 0.4244120717048645, 'eval_accuracy': 0.8415690269994905, 'eval_runtime': 46.9517, 'eval_samples_per_second': 209.045, 'eval_steps_per_second': 3.28, 'epoch': 0.16}\n", + "{'loss': 0.4323, 'learning_rate': 7.557855280312907e-07, 'epoch': 0.24}\n", + "{'eval_loss': 0.4180322289466858, 'eval_accuracy': 0.8435048395313296, 'eval_runtime': 46.8629, 'eval_samples_per_second': 209.441, 'eval_steps_per_second': 3.286, 'epoch': 0.24}\n", + "{'loss': 0.4254, 'learning_rate': 6.74380704041721e-07, 'epoch': 0.33}\n", + "{'eval_loss': 0.42280977964401245, 'eval_accuracy': 0.8436067244014264, 'eval_runtime': 46.8872, 'eval_samples_per_second': 209.332, 'eval_steps_per_second': 3.284, 'epoch': 0.33}\n", + "{'loss': 0.4285, 'learning_rate': 5.928943937418513e-07, 'epoch': 0.41}\n", + "{'eval_loss': 0.416576623916626, 'eval_accuracy': 0.8393275598573612, 'eval_runtime': 48.0341, 'eval_samples_per_second': 204.334, 'eval_steps_per_second': 3.206, 'epoch': 0.41}\n", + "{'loss': 0.427, 'learning_rate': 5.114080834419818e-07, 'epoch': 0.49}\n", + "{'eval_loss': 0.41878825426101685, 'eval_accuracy': 0.8414671421293938, 'eval_runtime': 48.3193, 'eval_samples_per_second': 203.128, 'eval_steps_per_second': 3.187, 'epoch': 0.49}\n", + "{'loss': 0.4207, 'learning_rate': 4.2992177314211206e-07, 'epoch': 0.57}\n", + "{'eval_loss': 0.42357301712036133, 'eval_accuracy': 0.8398369842078451, 'eval_runtime': 48.6821, 'eval_samples_per_second': 201.614, 'eval_steps_per_second': 3.163, 'epoch': 0.57}\n", + "{'loss': 0.425, 'learning_rate': 3.4859843546284223e-07, 'epoch': 0.65}\n", + "{'eval_loss': 0.41158831119537354, 'eval_accuracy': 0.8456444218033622, 'eval_runtime': 48.0513, 'eval_samples_per_second': 204.261, 'eval_steps_per_second': 3.205, 'epoch': 0.65}\n", + "{'loss': 0.4283, 'learning_rate': 2.6711212516297263e-07, 'epoch': 0.73}\n", + "{'eval_loss': 0.40967991948127747, 'eval_accuracy': 0.8455425369332654, 'eval_runtime': 47.2486, 'eval_samples_per_second': 207.731, 'eval_steps_per_second': 3.259, 'epoch': 0.73}\n", + "{'loss': 0.4162, 'learning_rate': 1.85625814863103e-07, 'epoch': 0.81}\n", + "{'eval_loss': 0.417491614818573, 'eval_accuracy': 0.844319918492104, 'eval_runtime': 46.8968, 'eval_samples_per_second': 209.289, 'eval_steps_per_second': 3.284, 'epoch': 0.81}\n", + "{'loss': 0.4179, 'learning_rate': 1.0422099087353324e-07, 'epoch': 0.9}\n", + "{'eval_loss': 0.4117409586906433, 'eval_accuracy': 
0.8449312277126847, 'eval_runtime': 50.0029, 'eval_samples_per_second': 196.289, 'eval_steps_per_second': 3.08, 'epoch': 0.9}\n", + "{'loss': 0.4201, 'learning_rate': 2.2734680573663624e-08, 'epoch': 0.98}\n", + "{'eval_loss': 0.4105292558670044, 'eval_accuracy': 0.8482934284258787, 'eval_runtime': 49.9141, 'eval_samples_per_second': 196.638, 'eval_steps_per_second': 3.085, 'epoch': 0.98}\n", + "{'train_runtime': 5124.3924, 'train_samples_per_second': 76.634, 'train_steps_per_second': 2.395, 'train_loss': 0.4281486333426783, 'epoch': 1.0}\n", + "{'eval_loss': 0.4105292558670044, 'eval_accuracy': 0.8482934284258787, 'eval_runtime': 51.6384, 'eval_samples_per_second': 190.072, 'eval_steps_per_second': 2.982, 'epoch': 1.0}\n", + "{'eval_loss': 0.4105292558670044, 'eval_accuracy': 0.8482934284258787, 'eval_runtime': 51.6384, 'eval_samples_per_second': 190.072, 'eval_steps_per_second': 2.982, 'epoch': 1.0}\n" + ] + } + ], + "source": [ + "model_q = QDQBertForSequenceClassification.from_pretrained(\"roberta-in-bert-trained-quantized\", num_labels=num_labels)\n", + "model_q = model_q.cuda()\n", + "\n", + "args.learning_rate = 1e-6\n", + "trainer = Trainer(\n", + " model_q,\n", + " args,\n", + " train_dataset=encoded_dataset[\"train\"],\n", + " eval_dataset=encoded_dataset[validation_key],\n", + " tokenizer=tokenizer,\n", + " compute_metrics=compute_metrics,\n", + ")\n", + "transformers.logging.set_verbosity_error()\n", + "print(trainer.evaluate())\n", + "trainer.train()\n", + "print(trainer.evaluate())\n", + "model_q.save_pretrained(\"roberta-in-bert-trained-quantized-bis\")\n", + "del model_q\n", + "del trainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Benchmark" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Export a `QDQ Pytorch` model on `ONNX`, we need to enable fake quantization mode from Pytorch." 
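+    ,
+    "\n",
+    "\n",
+    "The export below relies on the `convert_to_onnx` helper from `transformer-deploy`. If you prefer to stay with vanilla Pytorch, the export roughly boils down to a plain `torch.onnx.export` call like the sketch below (an illustration under assumptions, not the actual helper implementation; the dynamic axis names are arbitrary). In both cases, `TensorQuantizer.use_fb_fake_quant` has to be enabled during the export so that the QDQ nodes are serialized as ONNX `QuantizeLinear` / `DequantizeLinear` operators, and opset 13 is required for per-channel quantization.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "\n",
+    "# `model_q` and `input_torch` are created in the next cell\n",
+    "input_names = list(input_torch.keys())\n",
+    "dynamic_axes = {name: {0: \"batch\", 1: \"sequence\"} for name in input_names}\n",
+    "dynamic_axes[\"output\"] = {0: \"batch\"}\n",
+    "torch.onnx.export(\n",
+    "    model_q,\n",
+    "    args=tuple(input_torch.values()),\n",
+    "    f=\"model_q.onnx\",\n",
+    "    opset_version=13,  # opset >= 13 is needed for per-channel QDQ nodes\n",
+    "    do_constant_folding=True,\n",
+    "    input_names=input_names,\n",
+    "    output_names=[\"output\"],\n",
+    "    dynamic_axes=dynamic_axes,\n",
+    ")\n",
+    "```"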
+ ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [], + "source": [ + "data = encoded_dataset[\"train\"][0:3]\n", + "input_torch = convert_tensor(data, output=\"torch\")\n", + "\n", + "model_q = QDQBertForSequenceClassification.from_pretrained(\n", + " \"roberta-in-bert-trained-quantized-bis\", num_labels=num_labels\n", + ")\n", + "model_q = model_q.cuda()\n", + "from pytorch_quantization.nn import TensorQuantizer\n", + "\n", + "TensorQuantizer.use_fb_fake_quant = True\n", + "convert_to_onnx(model_q, output_path=\"model_q.onnx\", inputs_pytorch=input_torch, opset=13)\n", + "TensorQuantizer.use_fb_fake_quant = False\n", + "# del model_q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "#### Convert `ONNX` graph to `TensorRT` engine" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "engine = build_engine(\n", + " runtime=runtime,\n", + " onnx_file_path=\"model_q.onnx\",\n", + " logger=trt_logger,\n", + " min_shape=(batch_size, max_seq_len),\n", + " optimal_shape=(batch_size, max_seq_len),\n", + " max_shape=(batch_size, max_seq_len),\n", + " workspace_size=10000 * 1024 * 1024,\n", + " fp16=False,\n", + " int8=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "pycharm": { + "name": "#%%\n" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "# same thing from command line\n", + "# !/usr/src/tensorrt/bin/trtexec --onnx=model_q.onnx --shapes=input_ids:32x256,attention_mask:32x256 --int8 --workspace=10000 --saveEngine=\"test.plan\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "#### Prepare input and output buffer" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "stream: Stream = pycuda.driver.Stream()\n", + "context: IExecutionContext = engine.create_execution_context()\n", + "context.set_optimization_profile_async(profile_index=profile_index, stream_handle=stream.handle)\n", + "input_binding_idxs, output_binding_idxs = get_binding_idxs(engine, profile_index) # type: List[int], List[int]" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "data = encoded_dataset[\"train\"][0:batch_size]\n", + "input_np: Dict[str, np.ndarray] = convert_tensor(data, output=\"np\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "#### Inference on `TensorRT`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We first check that inference is working correctly:" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[array([[ 0.34206298, 1.5652132 , -2.3528326 ],\n", + " [ 2.5013878 , -0.81571996, -1.6251811 ],\n", + " [ 1.8918471 , -0.76798105, -1.0148249 ],\n", + " [ 2.0562491 , -0.22451262, -1.8686965 ],\n", + " [ 2.586117 , -0.09310705, -2.4128742 ],\n", + " [ 3.1871881 , -0.38016185, -2.5407064 ],\n", + " [-3.4681158 , 2.25822 , 0.37315404],\n", + " [ 3.5095093 , -0.8846639 , -2.5989952 ],\n", + " [-0.17400724, -1.6495969 , 1.7838944 ],\n", 
+ " [-2.966234 , -1.4364657 , 4.0166936 ],\n", + " [ 3.275045 , -0.9761375 , -2.1260378 ],\n", + " [-1.35331 , -0.42718923, 1.3907498 ],\n", + " [-2.6201942 , 2.9925148 , -1.0296444 ],\n", + " [-2.8947299 , 2.072019 , 0.1730565 ],\n", + " [ 0.10867599, -0.7385151 , 0.35388532],\n", + " [ 3.0392425 , -0.94136757, -1.9179116 ],\n", + " [ 3.5692515 , -0.6002568 , -2.7545912 ],\n", + " [-2.6759057 , -1.738315 , 4.1253285 ],\n", + " [-3.2203894 , -1.2297541 , 4.019567 ],\n", + " [-2.4096491 , 3.5356538 , -1.7411288 ],\n", + " [ 3.8419678 , -0.9140588 , -2.8194869 ],\n", + " [ 2.7242563 , 0.10581933, -2.7189605 ],\n", + " [-2.6767159 , 0.0738265 , 1.8019531 ],\n", + " [ 3.4024699 , -0.23903687, -3.2066634 ],\n", + " [ 3.2721906 , -1.4004866 , -1.7683858 ],\n", + " [-1.3776261 , 0.23932378, 0.65892386],\n", + " [-2.2985775 , 1.4366189 , 0.42702717],\n", + " [-2.0242352 , 2.6943915 , -1.1765195 ],\n", + " [-3.738225 , -0.40719697, 3.6082602 ],\n", + " [ 3.3571942 , -0.5865445 , -2.7262824 ],\n", + " [ 2.5306373 , -0.16031216, -2.4750497 ],\n", + " [ 2.9033797 , 0.02746576, -2.9880157 ]], dtype=float32)]\n" + ] + } + ], + "source": [ + "tensorrt_output = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + ")\n", + "print(tensorrt_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We warmup the GPU with few inferences and then start the measures:" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[TensorRT (INT-8)] mean=15.42ms, sd=1.35ms, min=14.16ms, max=18.86ms, median=14.58ms, 95p=17.79ms, 99p=18.25ms\n" + ] + } + ], + "source": [ + "for _ in range(30):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "time_buffer = list()\n", + "for _ in range(100):\n", + " with track_infer_time(time_buffer):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "\n", + "print_timings(name=\"TensorRT (INT-8)\", timings=time_buffer)\n", + "del engine, context # delete all tensorrt objects" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "## Method 2: use a dedicated QDQ model\n", + "\n", + "In method 2, the idea is to take the source code of a specific model and add manually in the source code `QDQ` nodes. That way, quantization will work out of the box for this architecture.\n", + "We have started with `QDQRoberta` a quantization compliant `Roberta` model.\n", + "\n", + "To adapt to another architecture, one need to:\n", + "\n", + "* replace linear layers with their quantized version\n", + "* replace operations not supported out of the box by `TensorRT` by a similar code supporting the operation.\n", + "\n", + "> concrete examples on `Roberta` architecture: in HF library, there is a `cumsum` in the position embedding generation. Something very simple. It takes as input an integer tensor and output an integer tensor. 
It happens that the `cumsum` operator from TensorRT supports float but not integer (https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md). It leads to a crash during the model conversion with a strange error message. Converting the input to float tensor fix the issue. Not complex, but requires some knowledge.\n", + "\n", + "The process below is a bit simpler than the method 1:\n", + "\n", + "* Calibrate\n", + "* Quantization Aware training (QAT)\n", + "\n", + "> there are many ways to get a QDQ model, you can modify Pytorch source code like here, patch ONNX graph (this approach is used at Microsoft for instance) or leverage the new FX Pytorch interface. Modifying the source code is the most straight forward so we choosed to do it that way.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### Calibration" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "data": { + "application/vnd.jupyter.widget-view+json": { + "model_id": "7558e2e658444b8a8814a6c14ce3966a", + "version_major": 2, + "version_minor": 0 + }, + "text/plain": [ + " 0%| | 0/4 [00:00> Using amp half precision backend\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'loss': 0.7745, 'learning_rate': 9.1875814863103e-06, 'epoch': 0.08}\n", + "{'eval_loss': 0.5123801827430725, 'eval_accuracy': 0.8002037697401936, 'eval_runtime': 46.8364, 'eval_samples_per_second': 209.559, 'eval_steps_per_second': 3.288, 'epoch': 0.08}\n", + "{'loss': 0.5453, 'learning_rate': 8.372718383311604e-06, 'epoch': 0.16}\n", + "{'eval_loss': 0.4548088014125824, 'eval_accuracy': 0.8248599083036169, 'eval_runtime': 50.0504, 'eval_samples_per_second': 196.102, 'eval_steps_per_second': 3.077, 'epoch': 0.16}\n", + "{'loss': 0.5076, 'learning_rate': 7.558670143415907e-06, 'epoch': 0.24}\n", + "{'eval_loss': 0.4582265615463257, 'eval_accuracy': 0.82190524707081, 'eval_runtime': 48.9017, 'eval_samples_per_second': 200.709, 'eval_steps_per_second': 3.149, 'epoch': 0.24}\n", + "{'loss': 0.4843, 'learning_rate': 6.743807040417211e-06, 'epoch': 0.33}\n", + "{'eval_loss': 0.41166964173316956, 'eval_accuracy': 0.8402445236882323, 'eval_runtime': 47.7718, 'eval_samples_per_second': 205.456, 'eval_steps_per_second': 3.224, 'epoch': 0.33}\n", + "{'loss': 0.4668, 'learning_rate': 5.9289439374185145e-06, 'epoch': 0.41}\n", + "{'eval_loss': 0.4195743799209595, 'eval_accuracy': 0.8379011716760061, 'eval_runtime': 51.3011, 'eval_samples_per_second': 191.321, 'eval_steps_per_second': 3.002, 'epoch': 0.41}\n", + "{'loss': 0.4558, 'learning_rate': 5.114080834419818e-06, 'epoch': 0.49}\n", + "{'eval_loss': 0.4104989171028137, 'eval_accuracy': 0.8442180336220071, 'eval_runtime': 48.8771, 'eval_samples_per_second': 200.81, 'eval_steps_per_second': 3.151, 'epoch': 0.49}\n", + "{'loss': 0.4504, 'learning_rate': 4.30003259452412e-06, 'epoch': 0.57}\n", + "{'eval_loss': 0.38803720474243164, 'eval_accuracy': 0.8504330106979113, 'eval_runtime': 49.1089, 'eval_samples_per_second': 199.862, 'eval_steps_per_second': 3.136, 'epoch': 0.57}\n", + "{'loss': 0.4401, 'learning_rate': 3.4851694915254244e-06, 'epoch': 0.65}\n", + "{'eval_loss': 0.3891218900680542, 'eval_accuracy': 0.8535914416709118, 'eval_runtime': 49.191, 'eval_samples_per_second': 199.528, 'eval_steps_per_second': 3.131, 'epoch': 0.65}\n", + "{'loss': 0.4329, 'learning_rate': 2.670306388526728e-06, 'epoch': 0.73}\n", + 
"{'eval_loss': 0.3848048150539398, 'eval_accuracy': 0.8504330106979113, 'eval_runtime': 47.4583, 'eval_samples_per_second': 206.813, 'eval_steps_per_second': 3.245, 'epoch': 0.73}\n", + "{'loss': 0.423, 'learning_rate': 1.8554432855280313e-06, 'epoch': 0.81}\n", + "{'eval_loss': 0.3859354257583618, 'eval_accuracy': 0.8538970962812023, 'eval_runtime': 47.4611, 'eval_samples_per_second': 206.801, 'eval_steps_per_second': 3.245, 'epoch': 0.81}\n", + "{'loss': 0.4266, 'learning_rate': 1.0413950456323338e-06, 'epoch': 0.9}\n", + "{'eval_loss': 0.3780878782272339, 'eval_accuracy': 0.8534895568008151, 'eval_runtime': 48.0721, 'eval_samples_per_second': 204.173, 'eval_steps_per_second': 3.204, 'epoch': 0.9}\n", + "{'loss': 0.4272, 'learning_rate': 2.265319426336376e-07, 'epoch': 0.98}\n", + "{'eval_loss': 0.37839093804359436, 'eval_accuracy': 0.8561385634233316, 'eval_runtime': 49.087, 'eval_samples_per_second': 199.951, 'eval_steps_per_second': 3.137, 'epoch': 0.98}\n", + "{'train_runtime': 5220.9785, 'train_samples_per_second': 75.216, 'train_steps_per_second': 2.351, 'train_loss': 0.4849226055120396, 'epoch': 1.0}\n", + "{'eval_loss': 0.37839093804359436, 'eval_accuracy': 0.8561385634233316, 'eval_runtime': 49.1273, 'eval_samples_per_second': 199.787, 'eval_steps_per_second': 3.135, 'epoch': 1.0}\n", + "{'eval_loss': 0.37839093804359436, 'eval_accuracy': 0.8561385634233316, 'eval_runtime': 49.1273, 'eval_samples_per_second': 199.787, 'eval_steps_per_second': 3.135, 'epoch': 1.0}\n" + ] + } + ], + "source": [ + "model_roberta_q: PreTrainedModel = QDQRobertaForSequenceClassification.from_pretrained(\n", + " \"roberta-untrained-quantized\", num_labels=num_labels\n", + ")\n", + "model_roberta_q = model_roberta_q.cuda()\n", + "\n", + "args.learning_rate = 1e-5\n", + "trainer = Trainer(\n", + " model_roberta_q,\n", + " args,\n", + " train_dataset=encoded_dataset[\"train\"],\n", + " eval_dataset=encoded_dataset[validation_key],\n", + " tokenizer=tokenizer,\n", + " compute_metrics=compute_metrics,\n", + ")\n", + "transformers.logging.set_verbosity_error()\n", + "trainer.train()\n", + "print(trainer.evaluate())\n", + "model_roberta_q.save_pretrained(\"roberta-trained-quantized\")\n", + "del model_roberta_q" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "### Benchmark" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Export a `QDQ Pytorch` model on `ONNX`, we need to enable fake quantization mode from Pytorch." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/pytorch_quantization/nn/modules/tensor_quantizer.py:285: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!\n", + " inputs, amax.item() / bound, 0,\n", + "/home/geantvert/.local/share/virtualenvs/fast_transformer/lib/python3.9/site-packages/pytorch_quantization/nn/modules/tensor_quantizer.py:291: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. 
This means that the trace might not generalize to other inputs!\n", + " quant_dim = list(amax.shape).index(list(amax_sequeeze.shape)[0])\n" + ] + } + ], + "source": [ + "model_roberta_q: PreTrainedModel = QDQRobertaForSequenceClassification.from_pretrained(\n", + " \"roberta-trained-quantized\", num_labels=num_labels\n", + ")\n", + "model_roberta_q = model_roberta_q.cuda()\n", + "\n", + "data = encoded_dataset[\"train\"][1:3]\n", + "input_torch = convert_tensor(data, output=\"torch\")\n", + "\n", + "from pytorch_quantization.nn import TensorQuantizer\n", + "\n", + "TensorQuantizer.use_fb_fake_quant = True\n", + "convert_to_onnx(model_pytorch=model_roberta_q, output_path=\"roberta_q.onnx\", inputs_pytorch=input_torch, opset=13)\n", + "TensorQuantizer.use_fb_fake_quant = False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Convert `ONNX` graph to `TensorRT` engine" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "engine = build_engine(\n", + " runtime=runtime,\n", + " onnx_file_path=\"roberta_q.onnx\",\n", + " logger=trt_logger,\n", + " min_shape=(batch_size, max_seq_len),\n", + " optimal_shape=(batch_size, max_seq_len),\n", + " max_shape=(batch_size, max_seq_len),\n", + " workspace_size=10000 * 1024 * 1024,\n", + " fp16=False,\n", + " int8=True,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": { + "pycharm": { + "name": "#%%\n" + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "# same conversion from the terminal\n", + "#!/usr/src/tensorrt/bin/trtexec --onnx=roberta_q.onnx --shapes=input_ids:32x256,attention_mask:32x256 --int8 --workspace=10000 --saveEngine=\"test.plan\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Prepare input and output buffer" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "stream: Stream = pycuda.driver.Stream()\n", + "context: IExecutionContext = engine.create_execution_context()\n", + "context.set_optimization_profile_async(profile_index=profile_index, stream_handle=stream.handle)\n", + "input_binding_idxs, output_binding_idxs = get_binding_idxs(engine, profile_index) # type: List[int], List[int]" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [], + "source": [ + "data = encoded_dataset[\"train\"][0:batch_size]\n", + "input_torch: OD[str, torch.Tensor] = convert_tensor(data=data, output=\"torch\")\n", + "input_np: OD[str, np.ndarray] = convert_tensor(data=data, output=\"np\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Inference on `TensorRT`\n", + "\n", + "We first check that inference is working correctly:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[array([[ 0.00858257, 1.5917815 , -1.8337398 ],\n", + " [ 2.432996 , -1.3068045 , -1.9821789 ],\n", + " [ 1.1561737 , -0.86323494, -1.0034285 ],\n", + " [ 1.5863879 , -0.49799222, -1.7219063 ],\n", + " [ 1.7697937 , -0.11104879, -2.3511643 ],\n", + " [ 3.5160832 , -1.3530374 , -3.0601408 ],\n", + " [-3.4769394 , 2.0265098 , 1.874698 ],\n", + " [ 3.3827643 , -1.2117878 , -2.8793433 ],\n", + " [-0.17693216, -1.1394652 , 
0.9083401 ],\n", + " [-2.8701797 , -0.7220555 , 4.0437098 ],\n", + " [ 3.2363806 , -1.5264729 , -2.39297 ],\n", + " [-2.4144251 , -0.68517655, 3.2756474 ],\n", + " [-2.5281413 , 2.697305 , -0.10096363],\n", + " [-2.4246836 , 2.7231753 , -0.41800928],\n", + " [ 0.01045033, -0.68109804, 0.3442644 ],\n", + " [ 2.307869 , -1.3556942 , -1.7211589 ],\n", + " [ 3.5693195 , -1.0019355 , -3.1455066 ],\n", + " [-2.253701 , -1.5583014 , 4.6081343 ],\n", + " [-2.986448 , -0.8324479 , 4.4171877 ],\n", + " [-2.3470848 , 3.5537364 , -1.2475395 ],\n", + " [ 3.5942395 , -1.2296011 , -3.0068402 ],\n", + " [ 3.0203044 , -0.39700866, -3.3843446 ],\n", + " [-2.5756757 , -0.686817 , 3.5764308 ],\n", + " [ 3.411901 , -1.0631186 , -3.2706409 ],\n", + " [ 3.393027 , -1.42746 , -2.8274863 ],\n", + " [-0.67953676, 0.03448357, 0.46320617],\n", + " [-2.6152198 , 0.57314056, 2.398291 ],\n", + " [-2.6590538 , 3.3507993 , -0.73685795],\n", + " [-2.5252337 , 0.72088015, 2.060882 ],\n", + " [ 2.9799984 , -0.9674468 , -2.915716 ],\n", + " [ 2.9330335 , -1.4430482 , -2.2108274 ],\n", + " [ 3.1044042 , -0.9246039 , -3.1474404 ]], dtype=float32)]\n" + ] + } + ], + "source": [ + "tensorrt_output = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + ")\n", + "print(tensorrt_output)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We warmup the GPU with few inferences and then start the measures:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[TensorRT (INT-8)] mean=15.77ms, sd=0.58ms, min=14.85ms, max=17.66ms, median=15.81ms, 95p=16.61ms, 99p=17.50ms\n" + ] + } + ], + "source": [ + "for _ in range(30):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "time_buffer = list()\n", + "for _ in range(100):\n", + " with track_infer_time(time_buffer):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "\n", + "print_timings(name=\"TensorRT (INT-8)\", timings=time_buffer)\n", + "del engine, context" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pytorch baseline\n", + "\n", + "### Finetuning" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "pycharm": { + "name": "#%%\n" + } + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "[INFO|trainer.py:437] 2021-12-08 13:17:01,492 >> Using amp half precision backend\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'loss': 0.65, 'learning_rate': 9.1875814863103e-06, 'epoch': 0.08}\n", + "{'eval_loss': 0.4644513428211212, 'eval_accuracy': 0.8212939378502292, 'eval_runtime': 18.8325, 'eval_samples_per_second': 521.174, 'eval_steps_per_second': 8.177, 'epoch': 0.08}\n", + "{'loss': 0.4912, 'learning_rate': 8.372718383311604e-06, 'epoch': 0.16}\n", + "{'eval_loss': 0.4196386933326721, 'eval_accuracy': 0.8379011716760061, 'eval_runtime': 19.1574, 'eval_samples_per_second': 512.335, 'eval_steps_per_second': 8.039, 'epoch': 0.16}\n", + "{'loss': 0.4631, 'learning_rate': 
7.558670143415907e-06, 'epoch': 0.24}\n", + "{'eval_loss': 0.42019498348236084, 'eval_accuracy': 0.8382068262862965, 'eval_runtime': 18.5971, 'eval_samples_per_second': 527.772, 'eval_steps_per_second': 8.281, 'epoch': 0.24}\n", + "{'loss': 0.4455, 'learning_rate': 6.743807040417211e-06, 'epoch': 0.33}\n", + "{'eval_loss': 0.3791417181491852, 'eval_accuracy': 0.8584819154355579, 'eval_runtime': 18.955, 'eval_samples_per_second': 517.804, 'eval_steps_per_second': 8.124, 'epoch': 0.33}\n", + "{'loss': 0.4264, 'learning_rate': 5.929758800521513e-06, 'epoch': 0.41}\n", + "{'eval_loss': 0.38219019770622253, 'eval_accuracy': 0.8525725929699439, 'eval_runtime': 19.5476, 'eval_samples_per_second': 502.107, 'eval_steps_per_second': 7.878, 'epoch': 0.41}\n", + "{'loss': 0.4194, 'learning_rate': 5.1148956975228174e-06, 'epoch': 0.49}\n", + "{'eval_loss': 0.38966989517211914, 'eval_accuracy': 0.8525725929699439, 'eval_runtime': 19.41, 'eval_samples_per_second': 505.666, 'eval_steps_per_second': 7.934, 'epoch': 0.49}\n", + "{'loss': 0.416, 'learning_rate': 4.30003259452412e-06, 'epoch': 0.57}\n", + "{'eval_loss': 0.363924115896225, 'eval_accuracy': 0.8604177279673968, 'eval_runtime': 19.6734, 'eval_samples_per_second': 498.896, 'eval_steps_per_second': 7.828, 'epoch': 0.57}\n", + "{'loss': 0.4099, 'learning_rate': 3.4859843546284226e-06, 'epoch': 0.65}\n", + "{'eval_loss': 0.3566216826438904, 'eval_accuracy': 0.8620478858889455, 'eval_runtime': 19.5395, 'eval_samples_per_second': 502.317, 'eval_steps_per_second': 7.881, 'epoch': 0.65}\n", + "{'loss': 0.3995, 'learning_rate': 2.6711212516297265e-06, 'epoch': 0.73}\n", + "{'eval_loss': 0.3582080602645874, 'eval_accuracy': 0.8640855832908813, 'eval_runtime': 19.5525, 'eval_samples_per_second': 501.981, 'eval_steps_per_second': 7.876, 'epoch': 0.73}\n", + "{'loss': 0.3932, 'learning_rate': 1.8562581486310302e-06, 'epoch': 0.81}\n", + "{'eval_loss': 0.35252732038497925, 'eval_accuracy': 0.8660213958227203, 'eval_runtime': 19.3648, 'eval_samples_per_second': 506.847, 'eval_steps_per_second': 7.953, 'epoch': 0.81}\n", + "{'loss': 0.3941, 'learning_rate': 1.0422099087353325e-06, 'epoch': 0.9}\n", + "{'eval_loss': 0.3504713773727417, 'eval_accuracy': 0.8664289353031075, 'eval_runtime': 19.7085, 'eval_samples_per_second': 498.009, 'eval_steps_per_second': 7.814, 'epoch': 0.9}\n", + "{'loss': 0.3965, 'learning_rate': 2.2734680573663624e-07, 'epoch': 0.98}\n", + "{'eval_loss': 0.34929943084716797, 'eval_accuracy': 0.8682628629648497, 'eval_runtime': 18.6964, 'eval_samples_per_second': 524.966, 'eval_steps_per_second': 8.237, 'epoch': 0.98}\n", + "{'train_runtime': 2756.3926, 'train_samples_per_second': 142.47, 'train_steps_per_second': 4.452, 'train_loss': 0.4411108956964883, 'epoch': 1.0}\n", + "{'eval_loss': 0.34929943084716797, 'eval_accuracy': 0.8682628629648497, 'eval_runtime': 18.8099, 'eval_samples_per_second': 521.801, 'eval_steps_per_second': 8.187, 'epoch': 1.0}\n", + "{'eval_loss': 0.34929943084716797, 'eval_accuracy': 0.8682628629648497, 'eval_runtime': 18.8099, 'eval_samples_per_second': 521.801, 'eval_steps_per_second': 8.187, 'epoch': 1.0}\n" + ] + } + ], + "source": [ + "model_roberta: PreTrainedModel = AutoModelForSequenceClassification.from_pretrained(\n", + " model_checkpoint, num_labels=num_labels\n", + ")\n", + "model_roberta = model_roberta.cuda()\n", + "\n", + "args.learning_rate = 1e-5\n", + "trainer = Trainer(\n", + " model_roberta,\n", + " args,\n", + " train_dataset=encoded_dataset[\"train\"],\n", + " 
eval_dataset=encoded_dataset[validation_key],\n", + " tokenizer=tokenizer,\n", + " compute_metrics=compute_metrics,\n", + ")\n", + "transformers.logging.set_verbosity_error()\n", + "trainer.train()\n", + "print(trainer.evaluate())\n", + "# {'eval_loss': 0.3559744358062744, 'eval_accuracy': 0.8655119714722364, 'eval_runtime': 19.6678, 'eval_samples_per_second': 499.04, 'eval_steps_per_second': 7.83, 'epoch': 0.98}\n", + "trainer.save_model(\"roberta-baseline\")\n", + "del model_roberta\n", + "del trainer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GPU execution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To finish, we will measure vanilla Pytorch inference on both FP32 and FP16 precision, it will be our baseline:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Pytorch (FP32)] mean=83.53ms, sd=3.69ms, min=79.18ms, max=91.07ms, median=84.09ms, 95p=89.34ms, 99p=90.44ms\n" + ] + } + ], + "source": [ + "baseline_model = AutoModelForSequenceClassification.from_pretrained(\"roberta-baseline\", num_labels=num_labels)\n", + "baseline_model = baseline_model.cuda()\n", + "baseline_model = baseline_model.eval()\n", + "\n", + "data = encoded_dataset[\"train\"][0:batch_size]\n", + "input_torch: OD[str, torch.Tensor] = convert_tensor(data=data, output=\"torch\")\n", + "\n", + "with torch.inference_mode():\n", + " for _ in range(30):\n", + " _ = baseline_model(**input_torch)\n", + " torch.cuda.synchronize()\n", + " time_buffer = list()\n", + " for _ in range(100):\n", + " with track_infer_time(time_buffer):\n", + " _ = baseline_model(**input_torch)\n", + " torch.cuda.synchronize()\n", + "print_timings(name=\"Pytorch (FP32)\", timings=time_buffer)" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Pytorch (FP16)] mean=58.78ms, sd=1.59ms, min=57.74ms, max=64.04ms, median=58.15ms, 95p=62.80ms, 99p=63.88ms\n" + ] + } + ], + "source": [ + "from torch.cuda.amp import autocast\n", + "\n", + "with torch.inference_mode():\n", + " with autocast():\n", + " for _ in range(30):\n", + " _ = baseline_model(**input_torch)\n", + " torch.cuda.synchronize()\n", + " time_buffer = []\n", + " for _ in range(100):\n", + " with track_infer_time(time_buffer):\n", + " _ = baseline_model(**input_torch)\n", + " torch.cuda.synchronize()\n", + "print_timings(name=\"Pytorch (FP16)\", timings=time_buffer)\n", + "del baseline_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### CPU execution" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Pytorch (FP32) - CPU] mean=4406.68ms, sd=290.44ms, min=3908.02ms, max=4794.74ms, median=4486.10ms, 95p=4725.07ms, 99p=4780.80ms\n" + ] + } + ], + "source": [ + "baseline_model = AutoModelForSequenceClassification.from_pretrained(\"roberta-baseline\", num_labels=num_labels)\n", + "baseline_model = baseline_model.eval()\n", + "input_torch_cpu = {k: v.to(\"cpu\") for k, v in input_torch.items()}\n", + "\n", + "\n", + "with torch.inference_mode():\n", + " for _ in range(3):\n", + " _ = baseline_model(**input_torch_cpu)\n", + " torch.cuda.synchronize()\n", + " time_buffer = list()\n", + " for _ in range(10):\n", + " with track_infer_time(time_buffer):\n", + " _ = 
baseline_model(**input_torch_cpu)\n", + " torch.cuda.synchronize()\n", + "print_timings(name=\"Pytorch (FP32) - CPU\", timings=time_buffer)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[Pytorch (FP16) - CPU] mean=4255.15ms, sd=123.93ms, min=4103.51ms, max=4527.69ms, median=4206.06ms, 95p=4469.24ms, 99p=4516.00ms\n" + ] + } + ], + "source": [ + "with torch.inference_mode():\n", + " with autocast():\n", + " for _ in range(3):\n", + " _ = baseline_model(**input_torch_cpu)\n", + " torch.cuda.synchronize()\n", + " time_buffer = []\n", + " for _ in range(10):\n", + " with track_infer_time(time_buffer):\n", + " _ = baseline_model(**input_torch_cpu)\n", + " torch.cuda.synchronize()\n", + "print_timings(name=\"Pytorch (FP16) - CPU\", timings=time_buffer)\n", + "del baseline_model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### TensorRT baseline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Below we export a randomly initialized `Roberta` model, the purpose is to only check the performance on mixed precision (FP16, no quantization)." + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "baseline_model = AutoModelForSequenceClassification.from_pretrained(\"roberta-baseline\", num_labels=num_labels)\n", + "baseline_model = baseline_model.cuda()\n", + "convert_to_onnx(baseline_model, output_path=\"baseline.onnx\", inputs_pytorch=input_torch, opset=12)\n", + "del baseline_model" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[TensorRT (FP16)] mean=30.23ms, sd=0.25ms, min=29.92ms, max=31.51ms, median=30.14ms, 95p=30.74ms, 99p=30.95ms\n" + ] + } + ], + "source": [ + "engine = build_engine(\n", + " runtime=runtime,\n", + " onnx_file_path=\"baseline.onnx\",\n", + " logger=trt_logger,\n", + " min_shape=(batch_size, max_seq_len),\n", + " optimal_shape=(batch_size, max_seq_len),\n", + " max_shape=(batch_size, max_seq_len),\n", + " workspace_size=10000 * 1024 * 1024,\n", + " fp16=True,\n", + " int8=False,\n", + ")\n", + "stream: Stream = pycuda.driver.Stream()\n", + "context: IExecutionContext = engine.create_execution_context()\n", + "context.set_optimization_profile_async(profile_index=profile_index, stream_handle=stream.handle)\n", + "input_binding_idxs, output_binding_idxs = get_binding_idxs(engine, profile_index) # type: List[int], List[int]\n", + "for _ in range(30):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "time_buffer = list()\n", + "for _ in range(100):\n", + " with track_infer_time(time_buffer):\n", + " _ = infer_tensorrt(\n", + " context=context,\n", + " host_inputs=input_np,\n", + " input_binding_idxs=input_binding_idxs,\n", + " output_binding_idxs=output_binding_idxs,\n", + " stream=stream,\n", + " )\n", + "\n", + "print_timings(name=\"TensorRT (FP16)\", timings=time_buffer)\n", + "del engine, context" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "collapsed_sections": [ + "whPRbBNbIrIl", + "n9qywopnIrJH", + 
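Both the INT-8 engine earlier and the FP16 baseline above are produced through the `build_engine` helper. For reference, a minimal sketch of what such a build roughly looks like with the raw TensorRT Python API (TensorRT 8.x assumed; function name, shapes handling and error reporting below are illustrative, not the helper's exact code):

```python
# Illustrative TensorRT 8.x engine build, approximating what `build_engine` does above.
import tensorrt as trt


def build_trt_engine(onnx_path, min_shape, opt_shape, max_shape, fp16, int8, workspace_size):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        assert parser.parse(f.read()), parser.get_error(0)

    config = builder.create_builder_config()
    config.max_workspace_size = workspace_size
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    if int8:
        # the ONNX graph already contains explicit Q/DQ nodes, so no calibrator is set here
        config.set_flag(trt.BuilderFlag.INT8)

    # one optimization profile covering every network input (input_ids, attention_mask)
    profile = builder.create_optimization_profile()
    for i in range(network.num_inputs):
        profile.set_shape(network.get_input(i).name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(logger)
    return runtime.deserialize_cuda_engine(serialized)
```

Because the quantized ONNX graph already carries explicit QuantizeLinear/DequantizeLinear nodes, enabling the INT8 flag is enough: no calibration cache or calibrator is required at build time.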
"7k8ge1L1IrJk" + ], + "name": "Copie de Text Classification on GLUE", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "0022faf286b44e858e638ccd5ded38b0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "023900ca566446eab5905b25b16a3de7": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "08286a6371584b4186014ecb5d5f164d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_2d3a08166846438db79b0f89314fe76a", + 
"placeholder": "​", + "style": "IPY_MODEL_d1ecc3d380fc4758b03190b23686a2f1", + "value": " 481/481 [00:00<00:00, 10.9kB/s]" + } + }, + "092db03992f24951b494fbb81da5b9d6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_994cf2338c7c4899952e25723445693c", + "IPY_MODEL_6aa2f5d46f1f454198d8e69517549ff1", + "IPY_MODEL_72b8f11065254e5ca488cd346b5add54" + ], + "layout": "IPY_MODEL_023900ca566446eab5905b25b16a3de7" + } + }, + "0dab554959dc44b3b313ee8ae91ca88d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_e1f08cf954ae4aea818c90d893486c77", + "placeholder": "​", + "style": "IPY_MODEL_f01fdef82047471e8c1b780cae5379cc", + "value": " 420M/420M [00:13<00:00, 33.6MB/s]" + } + }, + "10678736bd534c63aebda414da01b4db": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "14648b8262944f5faac134a7c0184e47": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "154200a8bc0b44fe8d0419fd56c6539d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + 
"_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "15aae23369674f82888ed9fbd99739f2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "163146c2f23440bcbf782116a35b5684": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_cf9597523c024514b9b3e66bc77e3fa8", + "max": 440473133, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_cced5f1cccc2400a8fbfd7a6eaedc666", + "value": 440473133 + } + }, + "167874df55014291be95cd390b1e60d3": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": 
null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "17b83e0d0fb947d7bf20319ff930e8fc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_854cfd13416543fba8221093b903658b", + "placeholder": "​", + "style": "IPY_MODEL_cbbb20b5d01a4450bfb8dfbf8048d64f", + "value": "Downloading: 100%" + } + }, + "17bd5357081d41c6b0161d63bd00820a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "196ffc99ad5a40109d9b1cfe12032b62": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "1bea379404df429b9852b62a938661ae": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a4c444f06c0847c09a44917084d3908d", + "max": 1, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_300f01e3547648f3983a83d3d3118c54", + "value": 1 + } + }, + "1da1d80871f545bbb21bf5a84d2120a0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8585eab4b3fe4992bd7e7c4596e2483b", + "max": 570, + "min": 0, + "orientation": "horizontal", + 
"style": "IPY_MODEL_7ec6da801d0d45c4bb80eeab5518e124", + "value": 570 + } + }, + "21ef195fa88f49c4a2c057f8028177a2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "26bc2038bed74279813ab5af09a2724c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_0022faf286b44e858e638ccd5ded38b0", + "max": 456318, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_7ff32d18c9f0473893a6a6b2941c54b0", + "value": 456318 + } + }, + "28b7346a9b8c4b198dd9dbea1be013b6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "2d3a08166846438db79b0f89314fe76a": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + 
"object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "2eac6b4817e14d7fae396e6458b940fa": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_927ad6ade85a402594074fa90ab558c2", + "placeholder": "​", + "style": "IPY_MODEL_cae29b9c6d45412fab70977fcd0f3234", + "value": "Downloading: 100%" + } + }, + "300f01e3547648f3983a83d3d3118c54": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "30646fa2c0dc494e9dbcbd4dc598410e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "360d6eb0e41543dba6d457912e32a77d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "37cda4cae81a4d94aa831fb40b5c3b26": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + 
"description_tooltip": null, + "layout": "IPY_MODEL_56fd7584b0844590936519ec3851922e", + "placeholder": "​", + "style": "IPY_MODEL_5b1ad9f5d02c4b298a02ce6041692057", + "value": " 4/4 [00:00<00:00, 5.97ba/s]" + } + }, + "3bfff454943b4b04a12ec29bbe28e0aa": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_cedca6e55b84443e82f3d01471d61048", + "IPY_MODEL_a7d355f456eb4d3995dd91c5917a72c1", + "IPY_MODEL_b264b220d9c444bd9da46a7e6c8fd5ed" + ], + "layout": "IPY_MODEL_154200a8bc0b44fe8d0419fd56c6539d" + } + }, + "3e7fbd1c0e534cb8abca18d1edfc9277": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4320b12de9d14c459cc88319e2d7622a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "4552ee8ca6bd4a0b956651cc23f4ff3c": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4b13c3b3435f4689b29d48e0a35bebd6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": 
null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4e91efae49b64f038fd3fbfcfd2be510": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "4fae966b76844c869cdea1e53891e26f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "54c0ad5ab737433190c4a824be128a48": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + 
"object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "561b1ede331a40c1a2bff9422e8eea0e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "56fd7584b0844590936519ec3851922e": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "59418bbeb20547e5b5e1a5728262c757": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + 
"justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "5b1ad9f5d02c4b298a02ce6041692057": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "5e2185bd6e4f4a10b89ac606868a43bd": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_2eac6b4817e14d7fae396e6458b940fa", + "IPY_MODEL_af16284f77594397a69ad0e322b5e736", + "IPY_MODEL_a20579a9e7364fb485d79bdc4feb54dc" + ], + "layout": "IPY_MODEL_f44d2beebfe44186b0ac8016e89e4b49" + } + }, + "5f032f56105f463a8680aa2482d0b162": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "65017db07d7f4e798ede741cc92488f0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_86cc326e574a4fada7224e6f0c209e9a", + "IPY_MODEL_af5b646f89024c139c695a1f058fb772", + "IPY_MODEL_37cda4cae81a4d94aa831fb40b5c3b26" + ], + "layout": "IPY_MODEL_6fa74604c68543a38392fa0e1587f707" + } + }, + "68c4c867096d41a78740fdee30edcadb": { + "model_module": 
"@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6aa2f5d46f1f454198d8e69517549ff1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7b483d17d1d14fdd922600f0c906fc2f", + "max": 1355863, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_4320b12de9d14c459cc88319e2d7622a", + "value": 1355863 + } + }, + "6d48e5ce9a854a3bb0506d774665f428": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_fbdb7c7250d846b2880005a9012c484b", + "placeholder": "​", + "style": "IPY_MODEL_17bd5357081d41c6b0161d63bd00820a", + "value": " 478M/478M [00:15<00:00, 34.7MB/s]" + } + }, + "6e54ce781ca54ad283911fa4774e3361": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "6e604307427a466cab51d50d363ee86d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + 
"6fa74604c68543a38392fa0e1587f707": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "728a9dcc79824e1eb2bfa49d915a8f08": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d314c0bb87e04893b96de0e18766d3ab", + "placeholder": "​", + "style": "IPY_MODEL_fa35b3acd9ce4cb098fcd69bb405db00", + "value": "Downloading: 100%" + } + }, + "72b8f11065254e5ca488cd346b5add54": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_10678736bd534c63aebda414da01b4db", + "placeholder": "​", + "style": "IPY_MODEL_14648b8262944f5faac134a7c0184e47", + "value": " 1.29M/1.29M [00:00<00:00, 2.22MB/s]" + } + }, + "7701ec898fd443f1b35b187aea3651e9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "78601982b0e04b80adaa502db2ef685a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + 
"IPY_MODEL_d6426fea2eda41dd9a31cb3f35b0877e", + "IPY_MODEL_163146c2f23440bcbf782116a35b5684", + "IPY_MODEL_0dab554959dc44b3b313ee8ae91ca88d" + ], + "layout": "IPY_MODEL_167874df55014291be95cd390b1e60d3" + } + }, + "788badadfd834f61926a39a43ef1d517": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7a75099f99054645bf3fc1b778dac7e6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7b483d17d1d14fdd922600f0c906fc2f": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7bb3b69a2f814e60b0cec253c759a16b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d731cfb34124448bbd8baab3d27b75db", + "placeholder": "​", + "style": "IPY_MODEL_cbb3e9bf5d07406d9768a98a6f0b5b64", + "value": "100%" + } + }, + "7c875ecd9cb54405a6c45969bcb4b4c6": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "7d520bdde27742abb42803843721d101": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "7ec6da801d0d45c4bb80eeab5518e124": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "7ff32d18c9f0473893a6a6b2941c54b0": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "8399339998564d21ba5db6f0514c02c6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "854cfd13416543fba8221093b903658b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "8585eab4b3fe4992bd7e7c4596e2483b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "86cc326e574a4fada7224e6f0c209e9a": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + 
"_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_561b1ede331a40c1a2bff9422e8eea0e", + "placeholder": "​", + "style": "IPY_MODEL_28b7346a9b8c4b198dd9dbea1be013b6", + "value": "100%" + } + }, + "87d85ac2d3104f68b99db880b1089638": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_728a9dcc79824e1eb2bfa49d915a8f08", + "IPY_MODEL_c815bfd265f4480298c39c76b9eaf770", + "IPY_MODEL_6d48e5ce9a854a3bb0506d774665f428" + ], + "layout": "IPY_MODEL_6e604307427a466cab51d50d363ee86d" + } + }, + "8a11c8fed672470b8335dc575a4a220e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_93dbcc6d23a743bab0da8af6ee5e2825", + "max": 481, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_f8a0053903c64e75ac25eab5b24d5871", + "value": 481 + } + }, + "8defdddee0e64a20b101e6c50bd7c60b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_7bb3b69a2f814e60b0cec253c759a16b", + "IPY_MODEL_d25cca081db3469b80163d6707f5a37d", + "IPY_MODEL_f8abc3e44ae3428885aafbea2b37384c" + ], + "layout": "IPY_MODEL_f485d2b19ffa4585a1da20986f28af29" + } + }, + "927ad6ade85a402594074fa90ab558c2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, 
+ "width": null + } + }, + "93dbcc6d23a743bab0da8af6ee5e2825": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "969b6fdac1d6418d89a683db1e6ec6b2": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "990482eebca2424bb5ecbd114007e02c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "994cf2338c7c4899952e25723445693c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d9a0852554284d36b6b121f579b06b41", + "placeholder": "​", + 
"style": "IPY_MODEL_c7bd52ef524c4d279dfcaa3aebe4a2c5", + "value": "Downloading: 100%" + } + }, + "99e94791043b4499b06601f7524f9b14": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_d5c8ff9e3bd849059fa7b30eab5fc940", + "placeholder": "​", + "style": "IPY_MODEL_196ffc99ad5a40109d9b1cfe12032b62", + "value": "Downloading: 100%" + } + }, + "9bc6e14b912249e3b7d02f31bcc74667": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_969b6fdac1d6418d89a683db1e6ec6b2", + "placeholder": "​", + "style": "IPY_MODEL_6e54ce781ca54ad283911fa4774e3361", + "value": " 446k/446k [00:00<00:00, 650kB/s]" + } + }, + "a02624219ee84f50b1a3032eaa030a39": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "a0a2918e9772475cac51124b3b83fcaf": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a20579a9e7364fb485d79bdc4feb54dc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + 
"_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4b13c3b3435f4689b29d48e0a35bebd6", + "placeholder": "​", + "style": "IPY_MODEL_d5d015711ae04d2f801577fc50af6c15", + "value": " 878k/878k [00:00<00:00, 1.33MB/s]" + } + }, + "a3e2c73d393d4e58a371f3da3dd80e6d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a4c444f06c0847c09a44917084d3908d": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a51b461c062f4636bfa4b48823d0709b": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": 
null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "a61d366d91c34697a55f62b754e1f3a5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a9b98fd93fcd4fc4a2b2aa88c82835d0", + "placeholder": "​", + "style": "IPY_MODEL_b8722dc10d4447fe9630cbf169260cc8", + "value": "100%" + } + }, + "a7d355f456eb4d3995dd91c5917a72c1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_5f032f56105f463a8680aa2482d0b162", + "max": 2, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_a02624219ee84f50b1a3032eaa030a39", + "value": 2 + } + }, + "a9b98fd93fcd4fc4a2b2aa88c82835d0": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "ac14ba24dcf3404db9fd303dbb24d7a5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": 
"", + "children": [ + "IPY_MODEL_17b83e0d0fb947d7bf20319ff930e8fc", + "IPY_MODEL_1da1d80871f545bbb21bf5a84d2120a0", + "IPY_MODEL_c593f2e45e244637821cc5721788bf2c" + ], + "layout": "IPY_MODEL_4e91efae49b64f038fd3fbfcfd2be510" + } + }, + "aecf7f063234416abf3f24766481cb89": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "af16284f77594397a69ad0e322b5e736": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7a75099f99054645bf3fc1b778dac7e6", + "max": 898823, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_30646fa2c0dc494e9dbcbd4dc598410e", + "value": 898823 + } + }, + "af5b646f89024c139c695a1f058fb772": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_21ef195fa88f49c4a2c057f8028177a2", + "max": 4, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_aecf7f063234416abf3f24766481cb89", + "value": 4 + } + }, + "b264b220d9c444bd9da46a7e6c8fd5ed": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_8399339998564d21ba5db6f0514c02c6", + "placeholder": "​", + "style": "IPY_MODEL_7701ec898fd443f1b35b187aea3651e9", + "value": " 2/2 [00:00<00:00, 6.46ba/s]" + } + }, + "b4d3f284fc4c4061b58d43a738f9bc78": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_7d520bdde27742abb42803843721d101", + "placeholder": "​", + "style": "IPY_MODEL_68c4c867096d41a78740fdee30edcadb", + "value": "Downloading: 100%" + } + }, + "b6be028de2ae4ff691538eedb33793af": { + 
"model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_b4d3f284fc4c4061b58d43a738f9bc78", + "IPY_MODEL_8a11c8fed672470b8335dc575a4a220e", + "IPY_MODEL_08286a6371584b4186014ecb5d5f164d" + ], + "layout": "IPY_MODEL_a3e2c73d393d4e58a371f3da3dd80e6d" + } + }, + "b8722dc10d4447fe9630cbf169260cc8": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "bbe3a471efb04ea8b5aabc4be819d585": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_a61d366d91c34697a55f62b754e1f3a5", + "IPY_MODEL_1bea379404df429b9852b62a938661ae", + "IPY_MODEL_c801e1727de44b67aa7cb1c3d970e1fe" + ], + "layout": "IPY_MODEL_59418bbeb20547e5b5e1a5728262c757" + } + }, + "be4affe852b348de8fe1362582b08da9": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HBoxModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HBoxModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HBoxView", + "box_style": "", + "children": [ + "IPY_MODEL_99e94791043b4499b06601f7524f9b14", + "IPY_MODEL_26bc2038bed74279813ab5af09a2724c", + "IPY_MODEL_9bc6e14b912249e3b7d02f31bcc74667" + ], + "layout": "IPY_MODEL_c6c100b71f26405fb960598feb5eee03" + } + }, + "c593f2e45e244637821cc5721788bf2c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_c92a19dfa84142af91522bc22f21fca6", + "placeholder": "​", + "style": "IPY_MODEL_990482eebca2424bb5ecbd114007e02c", + "value": " 570/570 [00:00<00:00, 13.1kB/s]" + } + }, + "c6c100b71f26405fb960598feb5eee03": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + 
"align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "c7bd52ef524c4d279dfcaa3aebe4a2c5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "c801e1727de44b67aa7cb1c3d970e1fe": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_4552ee8ca6bd4a0b956651cc23f4ff3c", + "placeholder": "​", + "style": "IPY_MODEL_7c875ecd9cb54405a6c45969bcb4b4c6", + "value": " 1/1 [00:00<00:00, 7.22ba/s]" + } + }, + "c815bfd265f4480298c39c76b9eaf770": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_15aae23369674f82888ed9fbd99739f2", + "max": 501200538, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_3e7fbd1c0e534cb8abca18d1edfc9277", + "value": 501200538 + } + }, + "c92a19dfa84142af91522bc22f21fca6": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + 
"height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "cae29b9c6d45412fab70977fcd0f3234": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cbb3e9bf5d07406d9768a98a6f0b5b64": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cbbb20b5d01a4450bfb8dfbf8048d64f": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "cced5f1cccc2400a8fbfd7a6eaedc666": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "cedca6e55b84443e82f3d01471d61048": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a0a2918e9772475cac51124b3b83fcaf", + "placeholder": "​", + "style": "IPY_MODEL_4fae966b76844c869cdea1e53891e26f", + "value": "100%" + } + }, + "cf9597523c024514b9b3e66bc77e3fa8": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + 
"grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d1ecc3d380fc4758b03190b23686a2f1": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d25cca081db3469b80163d6707f5a37d": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "FloatProgressModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "FloatProgressModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "ProgressView", + "bar_style": "success", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_360d6eb0e41543dba6d457912e32a77d", + "max": 3, + "min": 0, + "orientation": "horizontal", + "style": "IPY_MODEL_788badadfd834f61926a39a43ef1d517", + "value": 3 + } + }, + "d314c0bb87e04893b96de0e18766d3ab": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d5c8ff9e3bd849059fa7b30eab5fc940": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + 
"align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d5d015711ae04d2f801577fc50af6c15": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "d6426fea2eda41dd9a31cb3f35b0877e": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_a51b461c062f4636bfa4b48823d0709b", + "placeholder": "​", + "style": "IPY_MODEL_f651eecbb6d44c24820cf6fe5ab92e7b", + "value": "Downloading: 100%" + } + }, + "d731cfb34124448bbd8baab3d27b75db": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "d9a0852554284d36b6b121f579b06b41": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": 
"LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "e1f08cf954ae4aea818c90d893486c77": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f01fdef82047471e8c1b780cae5379cc": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f237ed04039945e9aa224d1b9d04e1b5": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f44d2beebfe44186b0ac8016e89e4b49": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": 
null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f485d2b19ffa4585a1da20986f28af29": { + "model_module": "@jupyter-widgets/base", + "model_module_version": "1.2.0", + "model_name": "LayoutModel", + "state": { + "_model_module": "@jupyter-widgets/base", + "_model_module_version": "1.2.0", + "_model_name": "LayoutModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "LayoutView", + "align_content": null, + "align_items": null, + "align_self": null, + "border": null, + "bottom": null, + "display": null, + "flex": null, + "flex_flow": null, + "grid_area": null, + "grid_auto_columns": null, + "grid_auto_flow": null, + "grid_auto_rows": null, + "grid_column": null, + "grid_gap": null, + "grid_row": null, + "grid_template_areas": null, + "grid_template_columns": null, + "grid_template_rows": null, + "height": null, + "justify_content": null, + "justify_items": null, + "left": null, + "margin": null, + "max_height": null, + "max_width": null, + "min_height": null, + "min_width": null, + "object_fit": null, + "object_position": null, + "order": null, + "overflow": null, + "overflow_x": null, + "overflow_y": null, + "padding": null, + "right": null, + "top": null, + "visibility": null, + "width": null + } + }, + "f651eecbb6d44c24820cf6fe5ab92e7b": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "DescriptionStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "DescriptionStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "description_width": "" + } + }, + "f8a0053903c64e75ac25eab5b24d5871": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "ProgressStyleModel", + "state": { + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "ProgressStyleModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/base", + "_view_module_version": "1.2.0", + "_view_name": "StyleView", + "bar_color": null, + "description_width": "" + } + }, + "f8abc3e44ae3428885aafbea2b37384c": { + "model_module": "@jupyter-widgets/controls", + "model_module_version": "1.5.0", + "model_name": "HTMLModel", + "state": { + "_dom_classes": [], + "_model_module": "@jupyter-widgets/controls", + "_model_module_version": "1.5.0", + "_model_name": "HTMLModel", + "_view_count": null, + "_view_module": "@jupyter-widgets/controls", + "_view_module_version": "1.5.0", + "_view_name": "HTMLView", + "description": "", + "description_tooltip": null, + "layout": "IPY_MODEL_54c0ad5ab737433190c4a824be128a48", + "placeholder": "​", + "style": "IPY_MODEL_f237ed04039945e9aa224d1b9d04e1b5", + "value": " 3/3 
+ } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file diff --git a/requirements.txt b/requirements.txt index 6d955a18..cebff0a5 100644 --- a/requirements.txt +++ b/requirements.txt @@ -9,6 +9,6 @@ sympy coloredlogs pytest colored -black +black[jupyter] isort flake8 diff --git a/requirements_gpu.txt b/requirements_gpu.txt index 65017090..2fad68e0 100644 --- a/requirements_gpu.txt +++ b/requirements_gpu.txt @@ -1,10 +1,9 @@ onnx -onnxruntime-gpu +onnxruntime-gpu==1.9.0 nvidia-pyindex tritonclient[all] pycuda torch==1.10.0+cu113 -nvidia-pyindex nvidia-tensorrt onnx_graphsurgeon polygraphy diff --git a/src/transformer_deploy/QDQModels/QDQRoberta.py b/src/transformer_deploy/QDQModels/QDQRoberta.py new file mode 100644 index 00000000..4d9ddc77 --- /dev/null +++ b/src/transformer_deploy/QDQModels/QDQRoberta.py @@ -0,0 +1,1631 @@ +# Copyright 2021, Lefebvre Sarrut Services +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# coding=utf-8 +# Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. +# Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# copied from Hugging Face transformers library +# modified parts (outside imports) are preceded by -> # QDQ change below + +"""PyTorch RoBERTa model. """ +import math + +import torch +import torch.utils.checkpoint +from packaging import version +from pytorch_quantization import nn as quant_nn +from pytorch_quantization.nn.modules.tensor_quantizer import TensorQuantizer +from torch import nn +from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss +from transformers import RobertaConfig +from transformers.activations import ACT2FN, gelu +from transformers.file_utils import ( + add_code_sample_docstrings, + add_start_docstrings, + add_start_docstrings_to_model_forward, + replace_return_docstrings, +) +from transformers.modeling_outputs import ( + BaseModelOutputWithPastAndCrossAttentions, + BaseModelOutputWithPoolingAndCrossAttentions, + CausalLMOutputWithCrossAttentions, + MaskedLMOutput, + MultipleChoiceModelOutput, + QuestionAnsweringModelOutput, + SequenceClassifierOutput, + TokenClassifierOutput, +) +from transformers.modeling_utils import ( + PreTrainedModel, + apply_chunking_to_forward, + find_pruneable_heads_and_indices, + prune_linear_layer, +) +from transformers.utils import logging + + +logger = logging.get_logger(__name__) + +_CHECKPOINT_FOR_DOC = "roberta-base" +_CONFIG_FOR_DOC = "RobertaConfig" +_TOKENIZER_FOR_DOC = "RobertaTokenizer" + +ROBERTA_PRETRAINED_MODEL_ARCHIVE_LIST = [ + "roberta-base", + "roberta-large", + "roberta-large-mnli", + "distilroberta-base", + "roberta-base-openai-detector", + "roberta-large-openai-detector", + # See all RoBERTa models at https://huggingface.co/models?filter=roberta +] + + +class RobertaEmbeddings(nn.Module): + """ + Same as BertEmbeddings with a tiny tweak for positional embeddings indexing. 
+ """ + + # Copied from transformers.models.bert.modeling_bert.BertEmbeddings.__init__ + def __init__(self, config): + super().__init__() + self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id) + self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.hidden_size) + self.token_type_embeddings = nn.Embedding(config.type_vocab_size, config.hidden_size) + + # self.LayerNorm is not snake-cased to stick with TensorFlow model variable name and be able to load + # any TensorFlow checkpoint file + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + # position_ids (1, len position emb) is contiguous in memory and exported when serialized + self.position_embedding_type = getattr(config, "position_embedding_type", "absolute") + self.register_buffer("position_ids", torch.arange(config.max_position_embeddings).expand((1, -1))) + if version.parse(torch.__version__) > version.parse("1.6.0"): + self.register_buffer( + "token_type_ids", + torch.zeros(self.position_ids.size(), dtype=torch.long), + persistent=False, + ) + + # End copy + self.padding_idx = config.pad_token_id + self.position_embeddings = nn.Embedding( + config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx + ) + + def forward( + self, input_ids=None, token_type_ids=None, position_ids=None, inputs_embeds=None, past_key_values_length=0 + ): + if position_ids is None: + # TODO here? + if input_ids is not None: + # Create the position ids from the input token ids. Any padded tokens remain padded. + position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length) + else: + position_ids = self.create_position_ids_from_inputs_embeds(inputs_embeds) + + if input_ids is not None: + input_shape = input_ids.size() + else: + input_shape = inputs_embeds.size()[:-1] + + seq_length = input_shape[1] + + # Setting the token_type_ids to the registered buffer in constructor where it is all zeros, which usually occurs + # when its auto-generated, registered buffer helps users when tracing the model without passing token_type_ids, solves # noqa: E501 + # issue #5664 + if token_type_ids is None: + if hasattr(self, "token_type_ids"): + buffered_token_type_ids = self.token_type_ids[:, :seq_length] + buffered_token_type_ids_expanded = buffered_token_type_ids.expand(input_shape[0], seq_length) + token_type_ids = buffered_token_type_ids_expanded + else: + token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=self.position_ids.device) + + if inputs_embeds is None: + inputs_embeds = self.word_embeddings(input_ids) + token_type_embeddings = self.token_type_embeddings(token_type_ids) + + embeddings = inputs_embeds + token_type_embeddings + if self.position_embedding_type == "absolute": + position_embeddings = self.position_embeddings(position_ids) + embeddings += position_embeddings + embeddings = self.LayerNorm(embeddings) + embeddings = self.dropout(embeddings) + return embeddings + + def create_position_ids_from_inputs_embeds(self, inputs_embeds): + """ + We are provided embeddings directly. We cannot infer which are padded so just generate sequential position ids. 
+ + Args: + inputs_embeds: torch.Tensor + + Returns: torch.Tensor + """ + input_shape = inputs_embeds.size()[:-1] + sequence_length = input_shape[1] + + position_ids = torch.arange( + self.padding_idx + 1, sequence_length + self.padding_idx + 1, dtype=torch.long, device=inputs_embeds.device + ) + return position_ids.unsqueeze(0).expand(input_shape) + + +# Copied from transformers.models.bert.modeling_bert.BertSelfAttention with Bert->Roberta +class RobertaSelfAttention(nn.Module): + def __init__(self, config, position_embedding_type=None): + super().__init__() + if config.hidden_size % config.num_attention_heads != 0 and not hasattr(config, "embedding_size"): + raise ValueError( + f"The hidden size ({config.hidden_size}) is not a multiple of the number of attention " + f"heads ({config.num_attention_heads})" + ) + + self.num_attention_heads = config.num_attention_heads + self.attention_head_size = int(config.hidden_size / config.num_attention_heads) + self.all_head_size = self.num_attention_heads * self.attention_head_size + # QDQ change below + self.query = quant_nn.QuantLinear(config.hidden_size, self.all_head_size) + self.key = quant_nn.QuantLinear(config.hidden_size, self.all_head_size) + self.value = quant_nn.QuantLinear(config.hidden_size, self.all_head_size) + + self.dropout = nn.Dropout(config.attention_probs_dropout_prob) + self.position_embedding_type = position_embedding_type or getattr(config, "position_embedding_type", "absolute") + if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query": + self.max_position_embeddings = config.max_position_embeddings + self.distance_embedding = nn.Embedding(2 * config.max_position_embeddings - 1, self.attention_head_size) + + self.is_decoder = config.is_decoder + # QDQ change below + self.matmul_q_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + self.matmul_k_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + self.matmul_v_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + self.matmul_a_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + + def transpose_for_scores(self, x): + new_x_shape = x.size()[:-1] + (self.num_attention_heads, self.attention_head_size) + x = x.view(*new_x_shape) + return x.permute(0, 2, 1, 3) + + def forward( + self, + hidden_states, + attention_mask=None, + head_mask=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + past_key_value=None, + output_attentions=False, + ): + mixed_query_layer = self.query(hidden_states) + + # If this is instantiated as a cross-attention module, the keys + # and values come from an encoder; the attention mask needs to be + # such that the encoder's padding tokens are not attended to. 
+ is_cross_attention = encoder_hidden_states is not None + + if is_cross_attention and past_key_value is not None: + # reuse k,v, cross_attentions + key_layer = past_key_value[0] + value_layer = past_key_value[1] + attention_mask = encoder_attention_mask + elif is_cross_attention: + key_layer = self.transpose_for_scores(self.key(encoder_hidden_states)) + value_layer = self.transpose_for_scores(self.value(encoder_hidden_states)) + attention_mask = encoder_attention_mask + elif past_key_value is not None: + key_layer = self.transpose_for_scores(self.key(hidden_states)) + value_layer = self.transpose_for_scores(self.value(hidden_states)) + key_layer = torch.cat([past_key_value[0], key_layer], dim=2) + value_layer = torch.cat([past_key_value[1], value_layer], dim=2) + else: + key_layer = self.transpose_for_scores(self.key(hidden_states)) + value_layer = self.transpose_for_scores(self.value(hidden_states)) + + query_layer = self.transpose_for_scores(mixed_query_layer) + + if self.is_decoder: + # if cross_attention save Tuple(torch.Tensor, torch.Tensor) of all cross attention key/value_states. + # Further calls to cross_attention layer can then reuse all cross-attention + # key/value_states (first "if" case) + # if uni-directional self-attention (decoder) save Tuple(torch.Tensor, torch.Tensor) of + # all previous decoder key/value_states. Further calls to uni-directional self-attention + # can concat previous decoder key/value_states to current projected key/value_states (third "elif" case) + # if encoder bi-directional self-attention `past_key_value` is always `None` + past_key_value = (key_layer, value_layer) + + # Take the dot product between "query" and "key" to get the raw attention scores. + # QDQ change below + attention_scores = torch.matmul( + self.matmul_q_input_quantizer(query_layer), self.matmul_k_input_quantizer(key_layer.transpose(-1, -2)) + ) + + if self.position_embedding_type == "relative_key" or self.position_embedding_type == "relative_key_query": + seq_length = hidden_states.size()[1] + position_ids_l = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(-1, 1) + position_ids_r = torch.arange(seq_length, dtype=torch.long, device=hidden_states.device).view(1, -1) + distance = position_ids_l - position_ids_r + positional_embedding = self.distance_embedding(distance + self.max_position_embeddings - 1) + positional_embedding = positional_embedding.to(dtype=query_layer.dtype) # fp16 compatibility + + if self.position_embedding_type == "relative_key": + relative_position_scores = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding) + attention_scores = attention_scores + relative_position_scores + elif self.position_embedding_type == "relative_key_query": + relative_position_scores_query = torch.einsum("bhld,lrd->bhlr", query_layer, positional_embedding) + relative_position_scores_key = torch.einsum("bhrd,lrd->bhlr", key_layer, positional_embedding) + attention_scores = attention_scores + relative_position_scores_query + relative_position_scores_key + + attention_scores = attention_scores / math.sqrt(self.attention_head_size) + if attention_mask is not None: + # Apply the attention mask is (precomputed for all layers in RobertaModel forward() function) + attention_scores = attention_scores + attention_mask + + # Normalize the attention scores to probabilities. 
+ attention_probs = nn.Softmax(dim=-1)(attention_scores) + + # This is actually dropping out entire tokens to attend to, which might + # seem a bit unusual, but is taken from the original Transformer paper. + attention_probs = self.dropout(attention_probs) + + # Mask heads if we want to + if head_mask is not None: + attention_probs = attention_probs * head_mask + # QDQ change below + context_layer = torch.matmul( + self.matmul_a_input_quantizer(attention_probs), self.matmul_v_input_quantizer(value_layer) + ) + + context_layer = context_layer.permute(0, 2, 1, 3).contiguous() + new_context_layer_shape = context_layer.size()[:-2] + (self.all_head_size,) + context_layer = context_layer.view(*new_context_layer_shape) + + outputs = (context_layer, attention_probs) if output_attentions else (context_layer,) + + if self.is_decoder: + outputs = outputs + (past_key_value,) + return outputs + + +# Copied from transformers.models.bert.modeling_bert.BertSelfOutput +class RobertaSelfOutput(nn.Module): + def __init__(self, config): + super().__init__() + # QDQ change below + # Quantize Linear layer + self.dense = quant_nn.QuantLinear(config.hidden_size, config.hidden_size) + + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + # QDQ change below + # Quantize the inputs to the residual add + self.add_local_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + self.add_residual_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + + def forward(self, hidden_states, input_tensor): + hidden_states = self.dense(hidden_states) + hidden_states = self.dropout(hidden_states) + # QDQ change below + # Quantize the inputs to the residual add + add_local = self.add_local_input_quantizer(hidden_states) + add_residual = self.add_residual_input_quantizer(input_tensor) + hidden_states = self.LayerNorm(add_local + add_residual) + return hidden_states + + +# Copied from transformers.models.bert.modeling_bert.BertAttention with Bert->Roberta +class RobertaAttention(nn.Module): + def __init__(self, config, position_embedding_type=None): + super().__init__() + self.self = RobertaSelfAttention(config, position_embedding_type=position_embedding_type) + self.output = RobertaSelfOutput(config) + self.pruned_heads = set() + + def prune_heads(self, heads): + if len(heads) == 0: + return + heads, index = find_pruneable_heads_and_indices( + heads, self.self.num_attention_heads, self.self.attention_head_size, self.pruned_heads + ) + + # Prune linear layers + self.self.query = prune_linear_layer(self.self.query, index) + self.self.key = prune_linear_layer(self.self.key, index) + self.self.value = prune_linear_layer(self.self.value, index) + self.output.dense = prune_linear_layer(self.output.dense, index, dim=1) + + # Update hyper params and store pruned heads + self.self.num_attention_heads = self.self.num_attention_heads - len(heads) + self.self.all_head_size = self.self.attention_head_size * self.self.num_attention_heads + self.pruned_heads = self.pruned_heads.union(heads) + + def forward( + self, + hidden_states, + attention_mask=None, + head_mask=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + past_key_value=None, + output_attentions=False, + ): + self_outputs = self.self( + hidden_states, + attention_mask, + head_mask, + encoder_hidden_states, + encoder_attention_mask, + past_key_value, + output_attentions, + ) + attention_output = self.output(self_outputs[0], 
hidden_states) + outputs = (attention_output,) + self_outputs[1:] # add attentions if we output them + return outputs + + +# Copied from transformers.models.bert.modeling_bert.BertIntermediate +class RobertaIntermediate(nn.Module): + def __init__(self, config): + super().__init__() + # QDQ change below + self.dense = quant_nn.QuantLinear(config.hidden_size, config.intermediate_size) + if isinstance(config.hidden_act, str): + self.intermediate_act_fn = ACT2FN[config.hidden_act] + else: + self.intermediate_act_fn = config.hidden_act + + def forward(self, hidden_states): + hidden_states = self.dense(hidden_states) + hidden_states = self.intermediate_act_fn(hidden_states) + return hidden_states + + +# Copied from transformers.models.bert.modeling_bert.BertOutput +class RobertaOutput(nn.Module): + def __init__(self, config): + super().__init__() + # QDQ change below + # Quantize Linear layer + self.dense = quant_nn.QuantLinear(config.intermediate_size, config.hidden_size) + self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + # QDQ change below + # Quantize the inputs to the residual add + self.add_local_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + self.add_residual_input_quantizer = TensorQuantizer(quant_nn.QuantLinear.default_quant_desc_input) + + def forward(self, hidden_states, input_tensor): + hidden_states = self.dense(hidden_states) + hidden_states = self.dropout(hidden_states) + # QDQ change below + # Quantize the inputs to the residual add + add_local = self.add_local_input_quantizer(hidden_states) + add_residual = self.add_residual_input_quantizer(input_tensor) + hidden_states = self.LayerNorm(add_local + add_residual) + return hidden_states + + +# Copied from transformers.models.bert.modeling_bert.BertLayer with Bert->Roberta +class RobertaLayer(nn.Module): + def __init__(self, config): + super().__init__() + self.chunk_size_feed_forward = config.chunk_size_feed_forward + self.seq_len_dim = 1 + self.attention = RobertaAttention(config) + self.is_decoder = config.is_decoder + self.add_cross_attention = config.add_cross_attention + if self.add_cross_attention: + if not self.is_decoder: + raise ValueError(f"{self} should be used as a decoder model if cross attention is added") + self.crossattention = RobertaAttention(config, position_embedding_type="absolute") + self.intermediate = RobertaIntermediate(config) + self.output = RobertaOutput(config) + + def forward( + self, + hidden_states, + attention_mask=None, + head_mask=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + past_key_value=None, + output_attentions=False, + ): + # decoder uni-directional self-attention cached key/values tuple is at positions 1,2 + self_attn_past_key_value = past_key_value[:2] if past_key_value is not None else None + self_attention_outputs = self.attention( + hidden_states, + attention_mask, + head_mask, + output_attentions=output_attentions, + past_key_value=self_attn_past_key_value, + ) + attention_output = self_attention_outputs[0] + + # if decoder, the last output is tuple of self-attn cache + if self.is_decoder: + outputs = self_attention_outputs[1:-1] + present_key_value = self_attention_outputs[-1] + else: + outputs = self_attention_outputs[1:] # add self attentions if we output attention weights + + cross_attn_present_key_value = None + if self.is_decoder and encoder_hidden_states is not None: + if not hasattr(self, "crossattention"): + raise ValueError( + f"If 
`encoder_hidden_states` are passed, {self} has to be instantiated with cross-attention layers by setting `config.add_cross_attention=True`" # noqa: E501 + ) + + # cross_attn cached key/values tuple is at positions 3,4 of past_key_value tuple + cross_attn_past_key_value = past_key_value[-2:] if past_key_value is not None else None + cross_attention_outputs = self.crossattention( + attention_output, + attention_mask, + head_mask, + encoder_hidden_states, + encoder_attention_mask, + cross_attn_past_key_value, + output_attentions, + ) + attention_output = cross_attention_outputs[0] + outputs = outputs + cross_attention_outputs[1:-1] # add cross attentions if we output attention weights + + # add cross-attn cache to positions 3,4 of present_key_value tuple + cross_attn_present_key_value = cross_attention_outputs[-1] + present_key_value = present_key_value + cross_attn_present_key_value + + layer_output = apply_chunking_to_forward( + self.feed_forward_chunk, self.chunk_size_feed_forward, self.seq_len_dim, attention_output + ) + outputs = (layer_output,) + outputs + + # if decoder, return the attn key/values as the last output + if self.is_decoder: + outputs = outputs + (present_key_value,) + + return outputs + + def feed_forward_chunk(self, attention_output): + intermediate_output = self.intermediate(attention_output) + layer_output = self.output(intermediate_output, attention_output) + return layer_output + + +# Copied from transformers.models.bert.modeling_bert.BertEncoder with Bert->Roberta +class RobertaEncoder(nn.Module): + def __init__(self, config): + super().__init__() + self.config = config + self.layer = nn.ModuleList([RobertaLayer(config) for _ in range(config.num_hidden_layers)]) + self.gradient_checkpointing = False + + def forward( + self, + hidden_states, + attention_mask=None, + head_mask=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + past_key_values=None, + use_cache=None, + output_attentions=False, + output_hidden_states=False, + return_dict=True, + ): + all_hidden_states = () if output_hidden_states else None + all_self_attentions = () if output_attentions else None + all_cross_attentions = () if output_attentions and self.config.add_cross_attention else None + + next_decoder_cache = () if use_cache else None + for i, layer_module in enumerate(self.layer): + if output_hidden_states: + all_hidden_states = all_hidden_states + (hidden_states,) + + layer_head_mask = head_mask[i] if head_mask is not None else None + past_key_value = past_key_values[i] if past_key_values is not None else None + + if self.gradient_checkpointing and self.training: + + if use_cache: + logger.warning( + "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..." 
+ ) + use_cache = False + + def create_custom_forward(module): + def custom_forward(*inputs): + return module(*inputs, past_key_value, output_attentions) + + return custom_forward + + layer_outputs = torch.utils.checkpoint.checkpoint( + create_custom_forward(layer_module), + hidden_states, + attention_mask, + layer_head_mask, + encoder_hidden_states, + encoder_attention_mask, + ) + else: + layer_outputs = layer_module( + hidden_states, + attention_mask, + layer_head_mask, + encoder_hidden_states, + encoder_attention_mask, + past_key_value, + output_attentions, + ) + + hidden_states = layer_outputs[0] + if use_cache: + next_decoder_cache += (layer_outputs[-1],) + if output_attentions: + all_self_attentions = all_self_attentions + (layer_outputs[1],) + if self.config.add_cross_attention: + all_cross_attentions = all_cross_attentions + (layer_outputs[2],) + + if output_hidden_states: + all_hidden_states = all_hidden_states + (hidden_states,) + + if not return_dict: + return tuple( + v + for v in [ + hidden_states, + next_decoder_cache, + all_hidden_states, + all_self_attentions, + all_cross_attentions, + ] + if v is not None + ) + return BaseModelOutputWithPastAndCrossAttentions( + last_hidden_state=hidden_states, + past_key_values=next_decoder_cache, + hidden_states=all_hidden_states, + attentions=all_self_attentions, + cross_attentions=all_cross_attentions, + ) + + +# Copied from transformers.models.bert.modeling_bert.BertPooler +class RobertaPooler(nn.Module): + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.hidden_size) + self.activation = nn.Tanh() + + def forward(self, hidden_states): + # We "pool" the model by simply taking the hidden state corresponding + # to the first token. + first_token_tensor = hidden_states[:, 0] + pooled_output = self.dense(first_token_tensor) + pooled_output = self.activation(pooled_output) + return pooled_output + + +class RobertaPreTrainedModel(PreTrainedModel): + """ + An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained + models. + """ + + config_class = RobertaConfig + base_model_prefix = "roberta" + supports_gradient_checkpointing = True + + # Copied from transformers.models.bert.modeling_bert.BertPreTrainedModel._init_weights + def _init_weights(self, module): + """Initialize the weights""" + if isinstance(module, nn.Linear): + # Slightly different from the TF version which uses truncated_normal for initialization + # cf https://github.com/pytorch/pytorch/pull/5617 + module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) + if module.bias is not None: + module.bias.data.zero_() + elif isinstance(module, nn.Embedding): + module.weight.data.normal_(mean=0.0, std=self.config.initializer_range) + if module.padding_idx is not None: + module.weight.data[module.padding_idx].zero_() + elif isinstance(module, nn.LayerNorm): + module.bias.data.zero_() + module.weight.data.fill_(1.0) + + def _set_gradient_checkpointing(self, module, value=False): + if isinstance(module, RobertaEncoder): + module.gradient_checkpointing = value + + def update_keys_to_ignore(self, config, del_keys_to_ignore): + """Remove some keys from ignore list""" + if not config.tie_word_embeddings: + # must make a new list, or the class variable gets modified! 
+ self._keys_to_ignore_on_save = [k for k in self._keys_to_ignore_on_save if k not in del_keys_to_ignore] + self._keys_to_ignore_on_load_missing = [ + k for k in self._keys_to_ignore_on_load_missing if k not in del_keys_to_ignore + ] + + +ROBERTA_START_DOCSTRING = r""" + + This model inherits from :class:`~transformers.PreTrainedModel`. Check the superclass documentation for the generic + methods the library implements for all its model (such as downloading or saving, resizing the input embeddings, + pruning heads etc.) + + This model is also a PyTorch `torch.nn.Module `__ + subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to + general usage and behavior. + + Parameters: + config (:class:`~transformers.RobertaConfig`): Model configuration class with all the parameters of the + model. Initializing with a config file does not load the weights associated with the model, only the + configuration. Check out the :meth:`~transformers.PreTrainedModel.from_pretrained` method to load the model + weights. +""" + +ROBERTA_INPUTS_DOCSTRING = r""" + Args: + input_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`): + Indices of input sequence tokens in the vocabulary. + + Indices can be obtained using :class:`~transformers.RobertaTokenizer`. See + :meth:`transformers.PreTrainedTokenizer.encode` and :meth:`transformers.PreTrainedTokenizer.__call__` for + details. + + `What are input IDs? <../glossary.html#input-ids>`__ + attention_mask (:obj:`torch.FloatTensor` of shape :obj:`({0})`, `optional`): + Mask to avoid performing attention on padding token indices. Mask values selected in ``[0, 1]``: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + `What are attention masks? <../glossary.html#attention-mask>`__ + token_type_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`): + Segment token indices to indicate first and second portions of the inputs. Indices are selected in ``[0, + 1]``: + + - 0 corresponds to a `sentence A` token, + - 1 corresponds to a `sentence B` token. + + `What are token type IDs? <../glossary.html#token-type-ids>`_ + position_ids (:obj:`torch.LongTensor` of shape :obj:`({0})`, `optional`): + Indices of positions of each input sequence tokens in the position embeddings. Selected in the range ``[0, + config.max_position_embeddings - 1]``. + + `What are position IDs? <../glossary.html#position-ids>`_ + head_mask (:obj:`torch.FloatTensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`): + Mask to nullify selected heads of the self-attention modules. Mask values selected in ``[0, 1]``: + + - 1 indicates the head is **not masked**, + - 0 indicates the head is **masked**. + + inputs_embeds (:obj:`torch.FloatTensor` of shape :obj:`({0}, hidden_size)`, `optional`): + Optionally, instead of passing :obj:`input_ids` you can choose to directly pass an embedded representation. + This is useful if you want more control over how to convert :obj:`input_ids` indices into associated + vectors than the model's internal embedding lookup matrix. + output_attentions (:obj:`bool`, `optional`): + Whether or not to return the attentions tensors of all attention layers. See ``attentions`` under returned + tensors for more detail. + output_hidden_states (:obj:`bool`, `optional`): + Whether or not to return the hidden states of all layers. See ``hidden_states`` under returned tensors for + more detail. 
+ return_dict (:obj:`bool`, `optional`): + Whether or not to return a :class:`~transformers.file_utils.ModelOutput` instead of a plain tuple. +""" + + +@add_start_docstrings( + "The bare RoBERTa Model transformer outputting raw hidden-states without any specific head on top.", + ROBERTA_START_DOCSTRING, +) +class RobertaModel(RobertaPreTrainedModel): + """ + + The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of + cross-attention is added between the self-attention layers, following the architecture described in `Attention is + all you need`_ by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz + Kaiser and Illia Polosukhin. + + To behave as an decoder the model needs to be initialized with the :obj:`is_decoder` argument of the configuration + set to :obj:`True`. To be used in a Seq2Seq model, the model needs to initialized with both :obj:`is_decoder` + argument and :obj:`add_cross_attention` set to :obj:`True`; an :obj:`encoder_hidden_states` is then expected as an + input to the forward pass. + + .. _`Attention is all you need`: https://arxiv.org/abs/1706.03762 + + """ + + _keys_to_ignore_on_load_missing = [r"position_ids"] + + # Copied from transformers.models.bert.modeling_bert.BertModel.__init__ with Bert->Roberta + def __init__(self, config, add_pooling_layer=True): + super().__init__(config) + self.config = config + + self.embeddings = RobertaEmbeddings(config) + self.encoder = RobertaEncoder(config) + + self.pooler = RobertaPooler(config) if add_pooling_layer else None + + # Initialize weights and apply final processing + self.post_init() + + def get_input_embeddings(self): + return self.embeddings.word_embeddings + + def set_input_embeddings(self, value): + self.embeddings.word_embeddings = value + + def _prune_heads(self, heads_to_prune): + """ + Prunes heads of the model. heads_to_prune: dict of {layer_num: list of heads to prune in this layer} See base + class PreTrainedModel + """ + for layer, heads in heads_to_prune.items(): + self.encoder.layer[layer].attention.prune_heads(heads) + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=BaseModelOutputWithPoolingAndCrossAttentions, + config_class=_CONFIG_FOR_DOC, + ) + # Copied from transformers.models.bert.modeling_bert.BertModel.forward + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + past_key_values=None, + use_cache=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + encoder_hidden_states (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`): + Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if + the model is configured as a decoder. + encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in + the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. 
+ past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): + Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. + + If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids` + (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)` + instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`. + use_cache (:obj:`bool`, `optional`): + If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up + decoding (see :obj:`past_key_values`). + """ # noqa: E501 + output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions + output_hidden_states = ( + output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states + ) + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + if self.config.is_decoder: + use_cache = use_cache if use_cache is not None else self.config.use_cache + else: + use_cache = False + + if input_ids is not None and inputs_embeds is not None: + raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time") + elif input_ids is not None: + input_shape = input_ids.size() + elif inputs_embeds is not None: + input_shape = inputs_embeds.size()[:-1] + else: + raise ValueError("You have to specify either input_ids or inputs_embeds") + + batch_size, seq_length = input_shape + device = input_ids.device if input_ids is not None else inputs_embeds.device + + # past_key_values_length + past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0 + + if attention_mask is None: + attention_mask = torch.ones(((batch_size, seq_length + past_key_values_length)), device=device) + + if token_type_ids is None: + if hasattr(self.embeddings, "token_type_ids"): + buffered_token_type_ids = self.embeddings.token_type_ids[:, :seq_length] + buffered_token_type_ids_expanded = buffered_token_type_ids.expand(batch_size, seq_length) + token_type_ids = buffered_token_type_ids_expanded + else: + token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device) + + # We can provide a self-attention mask of dimensions [batch_size, from_seq_length, to_seq_length] + # ourselves in which case we just need to make it broadcastable to all heads. 
+ extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device) + + # If a 2D or 3D attention mask is provided for the cross-attention + # we need to make broadcastable to [batch_size, num_heads, seq_length, seq_length] + if self.config.is_decoder and encoder_hidden_states is not None: + encoder_batch_size, encoder_sequence_length, _ = encoder_hidden_states.size() + encoder_hidden_shape = (encoder_batch_size, encoder_sequence_length) + if encoder_attention_mask is None: + encoder_attention_mask = torch.ones(encoder_hidden_shape, device=device) + encoder_extended_attention_mask = self.invert_attention_mask(encoder_attention_mask) + else: + encoder_extended_attention_mask = None + + # Prepare head mask if needed + # 1.0 in head_mask indicate we keep the head + # attention_probs has shape bsz x n_heads x N x N + # input head_mask has shape [num_heads] or [num_hidden_layers x num_heads] + # and head_mask is converted to shape [num_hidden_layers x batch x num_heads x seq_length x seq_length] + head_mask = self.get_head_mask(head_mask, self.config.num_hidden_layers) + + embedding_output = self.embeddings( + input_ids=input_ids, + position_ids=position_ids, + token_type_ids=token_type_ids, + inputs_embeds=inputs_embeds, + past_key_values_length=past_key_values_length, + ) + encoder_outputs = self.encoder( + embedding_output, + attention_mask=extended_attention_mask, + head_mask=head_mask, + encoder_hidden_states=encoder_hidden_states, + encoder_attention_mask=encoder_extended_attention_mask, + past_key_values=past_key_values, + use_cache=use_cache, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + sequence_output = encoder_outputs[0] + pooled_output = self.pooler(sequence_output) if self.pooler is not None else None + + if not return_dict: + return (sequence_output, pooled_output) + encoder_outputs[1:] + + return BaseModelOutputWithPoolingAndCrossAttentions( + last_hidden_state=sequence_output, + pooler_output=pooled_output, + past_key_values=encoder_outputs.past_key_values, + hidden_states=encoder_outputs.hidden_states, + attentions=encoder_outputs.attentions, + cross_attentions=encoder_outputs.cross_attentions, + ) + + +@add_start_docstrings( + """RoBERTa Model with a `language modeling` head on top for CLM fine-tuning. 
""", ROBERTA_START_DOCSTRING +) +class RobertaForCausalLM(RobertaPreTrainedModel): + _keys_to_ignore_on_save = [r"lm_head.decoder.weight", r"lm_head.decoder.bias"] + _keys_to_ignore_on_load_missing = [r"position_ids", r"lm_head.decoder.weight", r"lm_head.decoder.bias"] + _keys_to_ignore_on_load_unexpected = [r"pooler"] + + def __init__(self, config): + super().__init__(config) + + if not config.is_decoder: + logger.warning("If you want to use `RobertaLMHeadModel` as a standalone, add `is_decoder=True.`") + + self.roberta = RobertaModel(config, add_pooling_layer=False) + self.lm_head = RobertaLMHead(config) + + # The LM head weights require special treatment only when they are tied with the word embeddings + self.update_keys_to_ignore(config, ["lm_head.decoder.weight"]) + + # Initialize weights and apply final processing + self.post_init() + + def get_output_embeddings(self): + return self.lm_head.decoder + + def set_output_embeddings(self, new_embeddings): + self.lm_head.decoder = new_embeddings + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @replace_return_docstrings(output_type=CausalLMOutputWithCrossAttentions, config_class=_CONFIG_FOR_DOC) + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + labels=None, + past_key_values=None, + use_cache=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + encoder_hidden_states (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`, `optional`): + Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if + the model is configured as a decoder. + encoder_attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in + the cross-attention if the model is configured as a decoder. Mask values selected in ``[0, 1]``: + + - 1 for tokens that are **not masked**, + - 0 for tokens that are **masked**. + + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Labels for computing the left-to-right language modeling loss (next word prediction). Indices should be in + ``[-100, 0, ..., config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are + ignored (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]`` + past_key_values (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`): + Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding. + + If :obj:`past_key_values` are used, the user can optionally input only the last :obj:`decoder_input_ids` + (those that don't have their past key value states given to this model) of shape :obj:`(batch_size, 1)` + instead of all :obj:`decoder_input_ids` of shape :obj:`(batch_size, sequence_length)`. + use_cache (:obj:`bool`, `optional`): + If set to :obj:`True`, :obj:`past_key_values` key value states are returned and can be used to speed up + decoding (see :obj:`past_key_values`). 
+ + Returns: + + Example:: + + >>> from transformers import RobertaTokenizer, RobertaForCausalLM, RobertaConfig + >>> import torch + + >>> tokenizer = RobertaTokenizer.from_pretrained('roberta-base') + >>> config = RobertaConfig.from_pretrained("roberta-base") + >>> config.is_decoder = True + >>> model = RobertaForCausalLM.from_pretrained('roberta-base', config=config) + + >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") + >>> outputs = model(**inputs) + + >>> prediction_logits = outputs.logits + """ # noqa: E501 + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + if labels is not None: + use_cache = False + + outputs = self.roberta( + input_ids, + attention_mask=attention_mask, + token_type_ids=token_type_ids, + position_ids=position_ids, + head_mask=head_mask, + inputs_embeds=inputs_embeds, + encoder_hidden_states=encoder_hidden_states, + encoder_attention_mask=encoder_attention_mask, + past_key_values=past_key_values, + use_cache=use_cache, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + prediction_scores = self.lm_head(sequence_output) + + lm_loss = None + if labels is not None: + # we are doing next-token prediction; shift prediction scores and input ids by one + shifted_prediction_scores = prediction_scores[:, :-1, :].contiguous() + labels = labels[:, 1:].contiguous() + loss_fct = CrossEntropyLoss() + lm_loss = loss_fct(shifted_prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) + + if not return_dict: + output = (prediction_scores,) + outputs[2:] + return ((lm_loss,) + output) if lm_loss is not None else output + + return CausalLMOutputWithCrossAttentions( + loss=lm_loss, + logits=prediction_scores, + past_key_values=outputs.past_key_values, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + cross_attentions=outputs.cross_attentions, + ) + + def prepare_inputs_for_generation(self, input_ids, past=None, attention_mask=None, **model_kwargs): + input_shape = input_ids.shape + # if model is used as a decoder in encoder-decoder model, the decoder attention mask is created on the fly + if attention_mask is None: + attention_mask = input_ids.new_ones(input_shape) + + # cut decoder_input_ids if past is used + if past is not None: + input_ids = input_ids[:, -1:] + + return {"input_ids": input_ids, "attention_mask": attention_mask, "past_key_values": past} + + def _reorder_cache(self, past, beam_idx): + reordered_past = () + for layer_past in past: + reordered_past += (tuple(past_state.index_select(0, beam_idx) for past_state in layer_past),) + return reordered_past + + +@add_start_docstrings("""RoBERTa Model with a `language modeling` head on top. """, ROBERTA_START_DOCSTRING) +class RobertaForMaskedLM(RobertaPreTrainedModel): + _keys_to_ignore_on_save = [r"lm_head.decoder.weight", r"lm_head.decoder.bias"] + _keys_to_ignore_on_load_missing = [r"position_ids", r"lm_head.decoder.weight", r"lm_head.decoder.bias"] + _keys_to_ignore_on_load_unexpected = [r"pooler"] + + def __init__(self, config): + super().__init__(config) + + if config.is_decoder: + logger.warning( + "If you want to use `RobertaForMaskedLM` make sure `config.is_decoder=False` for " + "bi-directional self-attention." 
+ ) + + self.roberta = RobertaModel(config, add_pooling_layer=False) + self.lm_head = RobertaLMHead(config) + + # The LM head weights require special treatment only when they are tied with the word embeddings + self.update_keys_to_ignore(config, ["lm_head.decoder.weight"]) + + # Initialize weights and apply final processing + self.post_init() + + def get_output_embeddings(self): + return self.lm_head.decoder + + def set_output_embeddings(self, new_embeddings): + self.lm_head.decoder = new_embeddings + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=MaskedLMOutput, + config_class=_CONFIG_FOR_DOC, + mask="", + ) + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + encoder_hidden_states=None, + encoder_attention_mask=None, + labels=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Labels for computing the masked language modeling loss. Indices should be in ``[-100, 0, ..., + config.vocab_size]`` (see ``input_ids`` docstring) Tokens with indices set to ``-100`` are ignored + (masked), the loss is only computed for the tokens with labels in ``[0, ..., config.vocab_size]`` + kwargs (:obj:`Dict[str, any]`, optional, defaults to `{}`): + Used to hide legacy arguments that have been deprecated. + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.roberta( + input_ids, + attention_mask=attention_mask, + token_type_ids=token_type_ids, + position_ids=position_ids, + head_mask=head_mask, + inputs_embeds=inputs_embeds, + encoder_hidden_states=encoder_hidden_states, + encoder_attention_mask=encoder_attention_mask, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + sequence_output = outputs[0] + prediction_scores = self.lm_head(sequence_output) + + masked_lm_loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + masked_lm_loss = loss_fct(prediction_scores.view(-1, self.config.vocab_size), labels.view(-1)) + + if not return_dict: + output = (prediction_scores,) + outputs[2:] + return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output + + return MaskedLMOutput( + loss=masked_lm_loss, + logits=prediction_scores, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +class RobertaLMHead(nn.Module): + """Roberta Head for masked language modeling.""" + + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.hidden_size) + self.layer_norm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps) + + self.decoder = nn.Linear(config.hidden_size, config.vocab_size) + self.bias = nn.Parameter(torch.zeros(config.vocab_size)) + self.decoder.bias = self.bias + + def forward(self, features, **kwargs): + x = self.dense(features) + x = gelu(x) + x = self.layer_norm(x) + + # project back to size of vocabulary with bias + x = self.decoder(x) + + return x + + def _tie_weights(self): + # To tie those two weights if they get disconnected (on TPU or when the bias is resized) + self.bias = self.decoder.bias + + +@add_start_docstrings( + """ + RoBERTa Model transformer with a sequence 
classification/regression head on top (a linear layer on top of the + pooled output) e.g. for GLUE tasks. + """, + ROBERTA_START_DOCSTRING, +) +class QDQRobertaForSequenceClassification(RobertaPreTrainedModel): + _keys_to_ignore_on_load_missing = [r"position_ids"] + + def __init__(self, config): + super().__init__(config) + self.num_labels = config.num_labels + self.config = config + + self.roberta = RobertaModel(config, add_pooling_layer=False) + self.classifier = RobertaClassificationHead(config) + + # Initialize weights and apply final processing + self.post_init() + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=SequenceClassifierOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + labels=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ..., + config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss), + If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy). + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.roberta( + input_ids, + attention_mask=attention_mask, + token_type_ids=token_type_ids, + position_ids=position_ids, + head_mask=head_mask, + inputs_embeds=inputs_embeds, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + sequence_output = outputs[0] + logits = self.classifier(sequence_output) + + loss = None + if labels is not None: + if self.config.problem_type is None: + if self.num_labels == 1: + self.config.problem_type = "regression" + elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int): + self.config.problem_type = "single_label_classification" + else: + self.config.problem_type = "multi_label_classification" + + if self.config.problem_type == "regression": + loss_fct = MSELoss() + if self.num_labels == 1: + loss = loss_fct(logits.squeeze(), labels.squeeze()) + else: + loss = loss_fct(logits, labels) + elif self.config.problem_type == "single_label_classification": + loss_fct = CrossEntropyLoss() + loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) + elif self.config.problem_type == "multi_label_classification": + loss_fct = BCEWithLogitsLoss() + loss = loss_fct(logits, labels) + + if not return_dict: + output = (logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return SequenceClassifierOutput( + loss=loss, + logits=logits, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +@add_start_docstrings( + """ + Roberta Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a + softmax) e.g. for RocStories/SWAG tasks. 
+ """, + ROBERTA_START_DOCSTRING, +) +class RobertaForMultipleChoice(RobertaPreTrainedModel): + _keys_to_ignore_on_load_missing = [r"position_ids"] + + def __init__(self, config): + super().__init__(config) + + self.roberta = RobertaModel(config) + self.dropout = nn.Dropout(config.hidden_dropout_prob) + self.classifier = nn.Linear(config.hidden_size, 1) + + # Initialize weights and apply final processing + self.post_init() + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, num_choices, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=MultipleChoiceModelOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + token_type_ids=None, + attention_mask=None, + labels=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for computing the multiple choice classification loss. Indices should be in ``[0, ..., + num_choices-1]`` where :obj:`num_choices` is the size of the second dimension of the input tensors. (See + :obj:`input_ids` above) + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + num_choices = input_ids.shape[1] if input_ids is not None else inputs_embeds.shape[1] + + flat_input_ids = input_ids.view(-1, input_ids.size(-1)) if input_ids is not None else None + flat_position_ids = position_ids.view(-1, position_ids.size(-1)) if position_ids is not None else None + flat_token_type_ids = token_type_ids.view(-1, token_type_ids.size(-1)) if token_type_ids is not None else None + flat_attention_mask = attention_mask.view(-1, attention_mask.size(-1)) if attention_mask is not None else None + flat_inputs_embeds = ( + inputs_embeds.view(-1, inputs_embeds.size(-2), inputs_embeds.size(-1)) + if inputs_embeds is not None + else None + ) + + outputs = self.roberta( + flat_input_ids, + position_ids=flat_position_ids, + token_type_ids=flat_token_type_ids, + attention_mask=flat_attention_mask, + head_mask=head_mask, + inputs_embeds=flat_inputs_embeds, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + pooled_output = outputs[1] + + pooled_output = self.dropout(pooled_output) + logits = self.classifier(pooled_output) + reshaped_logits = logits.view(-1, num_choices) + + loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + loss = loss_fct(reshaped_logits, labels) + + if not return_dict: + output = (reshaped_logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return MultipleChoiceModelOutput( + loss=loss, + logits=reshaped_logits, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +@add_start_docstrings( + """ + Roberta Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for + Named-Entity-Recognition (NER) tasks. 
+ """, + ROBERTA_START_DOCSTRING, +) +class RobertaForTokenClassification(RobertaPreTrainedModel): + _keys_to_ignore_on_load_unexpected = [r"pooler"] + _keys_to_ignore_on_load_missing = [r"position_ids"] + + def __init__(self, config): + super().__init__(config) + self.num_labels = config.num_labels + + self.roberta = RobertaModel(config, add_pooling_layer=False) + classifier_dropout = ( + config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob + ) + self.dropout = nn.Dropout(classifier_dropout) + self.classifier = nn.Linear(config.hidden_size, config.num_labels) + + # Initialize weights and apply final processing + self.post_init() + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=TokenClassifierOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + labels=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`): + Labels for computing the token classification loss. Indices should be in ``[0, ..., config.num_labels - + 1]``. + """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.roberta( + input_ids, + attention_mask=attention_mask, + token_type_ids=token_type_ids, + position_ids=position_ids, + head_mask=head_mask, + inputs_embeds=inputs_embeds, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + + sequence_output = self.dropout(sequence_output) + logits = self.classifier(sequence_output) + + loss = None + if labels is not None: + loss_fct = CrossEntropyLoss() + # Only keep active parts of the loss + if attention_mask is not None: + active_loss = attention_mask.view(-1) == 1 + active_logits = logits.view(-1, self.num_labels) + active_labels = torch.where( + active_loss, labels.view(-1), torch.tensor(loss_fct.ignore_index).type_as(labels) + ) + loss = loss_fct(active_logits, active_labels) + else: + loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1)) + + if not return_dict: + output = (logits,) + outputs[2:] + return ((loss,) + output) if loss is not None else output + + return TokenClassifierOutput( + loss=loss, + logits=logits, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +class RobertaClassificationHead(nn.Module): + """Head for sentence-level classification tasks.""" + + def __init__(self, config): + super().__init__() + self.dense = nn.Linear(config.hidden_size, config.hidden_size) + classifier_dropout = ( + config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob + ) + self.dropout = nn.Dropout(classifier_dropout) + self.out_proj = nn.Linear(config.hidden_size, config.num_labels) + + def forward(self, features, **kwargs): + x = features[:, 0, :] # take token (equiv. 
to [CLS]) + x = self.dropout(x) + x = self.dense(x) + x = torch.tanh(x) + x = self.dropout(x) + x = self.out_proj(x) + return x + + +@add_start_docstrings( + """ + Roberta Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear + layers on top of the hidden-states output to compute `span start logits` and `span end logits`). + """, + ROBERTA_START_DOCSTRING, +) +class RobertaForQuestionAnswering(RobertaPreTrainedModel): + _keys_to_ignore_on_load_unexpected = [r"pooler"] + _keys_to_ignore_on_load_missing = [r"position_ids"] + + def __init__(self, config): + super().__init__(config) + self.num_labels = config.num_labels + + self.roberta = RobertaModel(config, add_pooling_layer=False) + self.qa_outputs = nn.Linear(config.hidden_size, config.num_labels) + + # Initialize weights and apply final processing + self.post_init() + + @add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length")) + @add_code_sample_docstrings( + processor_class=_TOKENIZER_FOR_DOC, + checkpoint=_CHECKPOINT_FOR_DOC, + output_type=QuestionAnsweringModelOutput, + config_class=_CONFIG_FOR_DOC, + ) + def forward( + self, + input_ids=None, + attention_mask=None, + token_type_ids=None, + position_ids=None, + head_mask=None, + inputs_embeds=None, + start_positions=None, + end_positions=None, + output_attentions=None, + output_hidden_states=None, + return_dict=None, + ): + r""" + start_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for position (index) of the start of the labelled span for computing the token classification loss. + Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the + sequence are not taken into account for computing the loss. + end_positions (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`): + Labels for position (index) of the end of the labelled span for computing the token classification loss. + Positions are clamped to the length of the sequence (:obj:`sequence_length`). Position outside of the + sequence are not taken into account for computing the loss. 
+ """ + return_dict = return_dict if return_dict is not None else self.config.use_return_dict + + outputs = self.roberta( + input_ids, + attention_mask=attention_mask, + token_type_ids=token_type_ids, + position_ids=position_ids, + head_mask=head_mask, + inputs_embeds=inputs_embeds, + output_attentions=output_attentions, + output_hidden_states=output_hidden_states, + return_dict=return_dict, + ) + + sequence_output = outputs[0] + + logits = self.qa_outputs(sequence_output) + start_logits, end_logits = logits.split(1, dim=-1) + start_logits = start_logits.squeeze(-1).contiguous() + end_logits = end_logits.squeeze(-1).contiguous() + + total_loss = None + if start_positions is not None and end_positions is not None: + # If we are on multi-GPU, split add a dimension + if len(start_positions.size()) > 1: + start_positions = start_positions.squeeze(-1) + if len(end_positions.size()) > 1: + end_positions = end_positions.squeeze(-1) + # sometimes the start/end positions are outside our model inputs, we ignore these terms + ignored_index = start_logits.size(1) + start_positions = start_positions.clamp(0, ignored_index) + end_positions = end_positions.clamp(0, ignored_index) + + loss_fct = CrossEntropyLoss(ignore_index=ignored_index) + start_loss = loss_fct(start_logits, start_positions) + end_loss = loss_fct(end_logits, end_positions) + total_loss = (start_loss + end_loss) / 2 + + if not return_dict: + output = (start_logits, end_logits) + outputs[2:] + return ((total_loss,) + output) if total_loss is not None else output + + return QuestionAnsweringModelOutput( + loss=total_loss, + start_logits=start_logits, + end_logits=end_logits, + hidden_states=outputs.hidden_states, + attentions=outputs.attentions, + ) + + +def create_position_ids_from_input_ids(input_ids, padding_idx, past_key_values_length=0): + """ + Replace non-padding symbols with their position numbers. Position numbers begin at padding_idx+1. Padding symbols + are ignored. This is modified from fairseq's `utils.make_positions`. + + Args: + x: torch.Tensor x: + + Returns: torch.Tensor + """ + # QDQ change below + # The series of casts and type-conversions here are carefully balanced to both work with ONNX export and XLA. + # int() -> float() because of a limitations in cumsum operator implementation in TensorRT + mask = input_ids.ne(padding_idx).float() + incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask + return incremental_indices.long() + padding_idx diff --git a/src/transformer_deploy/QDQModels/__init__.py b/src/transformer_deploy/QDQModels/__init__.py new file mode 100644 index 00000000..d754dd37 --- /dev/null +++ b/src/transformer_deploy/QDQModels/__init__.py @@ -0,0 +1,13 @@ +# Copyright 2021, Lefebvre Sarrut Services +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/src/transformer_deploy/backends/ort_utils.py b/src/transformer_deploy/backends/ort_utils.py
index 5cf738c4..1f83f320 100644
--- a/src/transformer_deploy/backends/ort_utils.py
+++ b/src/transformer_deploy/backends/ort_utils.py
@@ -36,7 +36,9 @@ def create_model_for_provider(path: str, provider_to_use: str) -> InferenceSessi
     return InferenceSession(path, options, providers=provider_to_use)
 
 
-def convert_to_onnx(model_pytorch: PreTrainedModel, output_path: str, inputs_pytorch: OD[str, torch.Tensor]) -> None:
+def convert_to_onnx(
+    model_pytorch: PreTrainedModel, output_path: str, inputs_pytorch: OD[str, torch.Tensor], opset: int = 12
+) -> None:
     # dynamic axis == variable length axis
     dynamic_axis = OrderedDict()
     for k in inputs_pytorch.keys():
@@ -47,7 +49,7 @@ def convert_to_onnx(model_pytorch: PreTrainedModel, output_path: str, inputs_pyt
         model_pytorch,  # model to optimize
         args=tuple(inputs_pytorch.values()),  # tuple of multiple inputs
         f=output_path,  # output path / file object
-        opset_version=12,  # the ONNX version to use
+        opset_version=opset,  # the ONNX opset to use: 13 for quantized models, 12 otherwise
         do_constant_folding=True,  # simplify model (replace constant expressions)
         input_names=list(inputs_pytorch.keys()),  # input names
         output_names=["output"],  # output axis name
@@ -65,7 +67,7 @@ def optimize_onnx(onnx_path: str, onnx_optim_fp16_path: str, use_cuda: bool) ->
         model_type="bert",
         use_gpu=use_cuda,
         opt_level=1,
-        num_heads=0,  # automatic detection
+        num_heads=0,  # automatic detection does not work with opset 13
         hidden_size=0,  # automatic detection
         optimization_options=optimization_options,
     )
diff --git a/src/transformer_deploy/backends/trt_utils.py b/src/transformer_deploy/backends/trt_utils.py
index 6034c601..6cd294b1 100644
--- a/src/transformer_deploy/backends/trt_utils.py
+++ b/src/transformer_deploy/backends/trt_utils.py
@@ -77,20 +77,20 @@ def setup_binding_shapes(
     host_inputs: List[np.ndarray],
     input_binding_idxs: List[int],
     output_binding_idxs: List[int],
-):
+) -> Tuple[List[np.ndarray], List[DeviceAllocation]]:
     # explicitly set dynamic input shapes, so dynamic output shapes can be computed internally
     for host_input, binding_index in zip(host_inputs, input_binding_idxs):
         context.set_binding_shape(binding_index, host_input.shape)
     assert context.all_binding_shapes_specified
-    host_outputs = []
-    device_outputs = []
+    host_outputs: List[np.ndarray] = []
+    device_outputs: List[DeviceAllocation] = []
     for binding_index in output_binding_idxs:
         output_shape = context.get_binding_shape(binding_index)
-        # allocate buffers to hold output results after copying back to host
-        buffer = np.empty(output_shape, dtype=np.float32)
-        host_outputs.append(buffer)
-        # allocate output buffers on device
-        device_outputs.append(cuda.mem_alloc(buffer.nbytes))
+        # allocate buffers to hold output results after copying back to host
+        buffer = np.empty(output_shape, dtype=np.float32)
+        host_outputs.append(buffer)
+        # allocate output buffers on device
+        device_outputs.append(cuda.mem_alloc(buffer.nbytes))
     return host_outputs, device_outputs
@@ -136,6 +136,8 @@ def build_engine(
     optimal_shape: Tuple[int, int],
     max_shape: Tuple[int, int],
     workspace_size: int,
+    fp16: bool,
+    int8: bool,
 ) -> ICudaEngine:
     with trt.Builder(logger) as builder:  # type: Builder
         with builder.create_network(
@@ -144,8 +146,6 @@
             with trt.OnnxParser(network_definition, logger) as parser:  # type: OnnxParser
                 builder.max_batch_size = max_shape[0]  # max batch size
                 config: IBuilderConfig = 
diff --git a/src/transformer_deploy/backends/trt_utils.py b/src/transformer_deploy/backends/trt_utils.py
index 6034c601..6cd294b1 100644
--- a/src/transformer_deploy/backends/trt_utils.py
+++ b/src/transformer_deploy/backends/trt_utils.py
@@ -77,20 +77,20 @@ def setup_binding_shapes(
     host_inputs: List[np.ndarray],
     input_binding_idxs: List[int],
     output_binding_idxs: List[int],
-):
+) -> Tuple[List[np.ndarray], List[DeviceAllocation]]:
     # explicitly set dynamic input shapes, so dynamic output shapes can be computed internally
     for host_input, binding_index in zip(host_inputs, input_binding_idxs):
         context.set_binding_shape(binding_index, host_input.shape)
     assert context.all_binding_shapes_specified
-    host_outputs = []
-    device_outputs = []
+    host_outputs: List[np.ndarray] = []
+    device_outputs: List[DeviceAllocation] = []
     for binding_index in output_binding_idxs:
         output_shape = context.get_binding_shape(binding_index)
-        # allocate buffers to hold output results after copying back to host
-        buffer = np.empty(output_shape, dtype=np.float32)
-        host_outputs.append(buffer)
-        # allocate output buffers on device
-        device_outputs.append(cuda.mem_alloc(buffer.nbytes))
+        # allocate buffers to hold output results after copying back to host
+        buffer = np.empty(output_shape, dtype=np.float32)
+        host_outputs.append(buffer)
+        # allocate output buffers on device
+        device_outputs.append(cuda.mem_alloc(buffer.nbytes))
     return host_outputs, device_outputs
@@ -136,6 +136,8 @@ def build_engine(
     optimal_shape: Tuple[int, int],
     max_shape: Tuple[int, int],
     workspace_size: int,
+    fp16: bool,
+    int8: bool,
 ) -> ICudaEngine:
     with trt.Builder(logger) as builder:  # type: Builder
         with builder.create_network(
@@ -144,8 +146,6 @@ def build_engine(
             with trt.OnnxParser(network_definition, logger) as parser:  # type: OnnxParser
                 builder.max_batch_size = max_shape[0]  # max batch size
                 config: IBuilderConfig = builder.create_builder_config()
-                # config.min_timing_iterations = 1
-                # config.avg_timing_iterations = 1
                 config.max_workspace_size = workspace_size
                 # to enable complete trt inspector debugging, only for TensorRT >= 8.2
                 # config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
@@ -153,13 +153,13 @@
                 config.set_tactic_sources(
                     tactic_sources=1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)
                 )
-                # config.set_flag(trt.BuilderFlag.INT8)
-                # config.set_quantization_flag(trt.QuantizationFlag.CALIBRATE_BEFORE_FUSION)
-                # config.int8_calibrator = Calibrator()
-                config.set_flag(trt.BuilderFlag.FP16)
+                if int8:
+                    config.set_flag(trt.BuilderFlag.INT8)
+                if fp16:
+                    config.set_flag(trt.BuilderFlag.FP16)
                 config.set_flag(trt.BuilderFlag.DISABLE_TIMING_CACHE)
                 # https://github.com/NVIDIA/TensorRT/issues/1196 (sometimes big diff in output when using FP16)
-                config.set_flag(trt.BuilderFlag.STRICT_TYPES)
+                config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
                 with open(onnx_file_path, "rb") as f:
                     parser.parse(f.read())
                 profile: IOptimizationProfile = builder.create_optimization_profile()
@@ -171,12 +171,8 @@ def build_engine(
                     max=max_shape,
                 )
                 config.add_optimization_profile(profile)
-                # for i in range(network.num_layers):
-                #     layer: ILayer = network.get_layer(i)
-                #     if "gemm" in str(layer.name).lower():
-                #         for g in range(layer.num_outputs):
-                #             layer.precision = trt.DataType.FLOAT
-                network_definition = fix_fp16_network(network_definition)
+                if fp16:
+                    network_definition = fix_fp16_network(network_definition)
                 trt_engine = builder.build_serialized_network(network_definition, config)
                 engine: ICudaEngine = runtime.deserialize_cuda_engine(trt_engine)
                 assert engine is not None, "error during engine generation, check error messages above :-("
@@ -200,16 +196,20 @@ def infer_tensorrt(
     output_binding_idxs: List[int],
     stream: Stream,
 ) -> np.ndarray:
-    # warning: small change in output if int64 is used instead of int32
-    input_list: List[ndarray] = [tensor.astype(np.int32) for tensor in host_inputs.values()]
-    # allocate GPU memory for input tensors
-    device_inputs = [cuda.mem_alloc(tensor.nbytes) for tensor in input_list]
-    for h_input, d_input in zip(input_list, device_inputs):
-        cuda.memcpy_htod_async(d_input, h_input)  # host to GPU
+    input_list: List[ndarray] = list()
+    device_inputs: List[DeviceAllocation] = list()
+    for tensor in host_inputs.values():
+        # warning: small change in output if int64 is used instead of int32
+        tensor_int32: np.ndarray = np.asarray(tensor, dtype=np.int32)
+        input_list.append(tensor_int32)
+        # allocate GPU memory for input tensors
+        device_input: DeviceAllocation = cuda.mem_alloc(tensor_int32.nbytes)
+        device_inputs.append(device_input)
+        cuda.memcpy_htod_async(device_input, tensor_int32.ravel(), stream)
     # calculate input shape, bind it, allocate GPU memory for the output
     host_outputs, device_outputs = setup_binding_shapes(context, input_list, input_binding_idxs, output_binding_idxs)
     bindings = device_inputs + device_outputs
-    context.execute_async_v2(bindings, stream.handle)
+    assert context.execute_async_v2(bindings, stream_handle=stream.handle), "failure during execution of inference"
     for h_output, d_output in zip(host_outputs, device_outputs):
         cuda.memcpy_dtoh_async(h_output, d_output)  # GPU to host
     stream.synchronize()  # sync all CUDA ops
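`build_engine` now receives the precision mode from the caller instead of hard-coding FP16. A rough sketch of how the two new booleans translate into TensorRT builder flags (the surrounding builder, network and config setup is as in the function above; the helper name is made up for illustration):

```python
import tensorrt as trt


def set_precision_flags(config: trt.IBuilderConfig, fp16: bool, int8: bool) -> None:
    # INT-8 engines built from a QDQ ONNX graph carry their own quantization scales,
    # hence no calibrator is attached here (assumption based on this repo's QDQ workflow)
    if int8:
        config.set_flag(trt.BuilderFlag.INT8)
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    # replaces STRICT_TYPES: honor per-layer precision constraints when possible (TensorRT >= 8.2)
    config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
```

In `convert.py` below, the CLI wires these as `fp16=not args.quantization` and `int8=args.quantization`, so each run builds a single engine in one precision mode.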
diff --git a/src/transformer_deploy/convert.py b/src/transformer_deploy/convert.py
index f2e950bf..5e115289 100644
--- a/src/transformer_deploy/convert.py
+++ b/src/transformer_deploy/convert.py
@@ -25,6 +25,7 @@
 import tensorrt as trt
 import torch
 from pycuda._driver import Stream
+from pytorch_quantization.nn import TensorQuantizer
 from tensorrt.tensorrt import IExecutionContext, Logger, Runtime
 from torch.cuda import get_device_name
 from torch.cuda.amp import autocast
@@ -47,12 +48,13 @@ def main():
     parser = argparse.ArgumentParser(
         description="optimize and deploy transformers", formatter_class=argparse.ArgumentDefaultsHelpFormatter
     )
-    parser.add_argument("-m", "--model", required=True, help="path to model or URL to Hugging Face Hub")
+    parser.add_argument("-m", "--model", required=True, help="path to model or URL to Hugging Face hub")
+    parser.add_argument("-t", "--tokenizer", help="path to tokenizer or URL to Hugging Face hub")
     parser.add_argument(
         "--auth-token",
         default=None,
         help=(
-            "HuggingFace Hub auth token. Set to `None` (default) for public models. "
+            "Hugging Face Hub auth token. Set to `None` (default) for public models. "
             "For private models, use `True` to use local cached token, or a string of your HF API token"
         ),
     )
@@ -72,6 +74,7 @@ def main():
         type=int,
         nargs=3,
     )
+    parser.add_argument("-q", "--quantization", action="store_true", help="int-8 GPU quantization support")
     parser.add_argument("-w", "--workspace-size", default=10000, help="workspace size in MiB (TensorRT)", type=int)
     parser.add_argument("-o", "--output", default="triton_models", help="name to be used for ")
     parser.add_argument("-n", "--name", default="transformer", help="model name to be used in triton server")
@@ -81,7 +84,7 @@ def main():
         default=["onnx"],
         help="backend to use. One of [onnx,tensorrt, pytorch] or all",
         nargs="*",
-        choices=["onnx", "tensorrt", "pytorch"],
+        choices=["onnx", "tensorrt"],
     )
     parser.add_argument("--nb-instances", default=1, help="# of model instances, may improve troughput", type=int)
     parser.add_argument("--warmup", default=100, help="# of inferences to warm each model", type=int)
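In short, the new CLI options are consumed further down in `main()` as follows (simplified from the hunks that follow):

```python
tokenizer_path = args.tokenizer if args.tokenizer else args.model  # -t falls back to the model path
opset = 13 if args.quantization else 12                            # -q switches the ONNX export to opset 13
fp16 = not args.quantization                                       # TensorRT engine precision follows -q
int8 = args.quantization
```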
@@ -107,7 +110,8 @@ def main():
     tensorrt_path = os.path.join(args.output, "model.plan")

     assert torch.cuda.is_available(), "CUDA is not available. Please check your CUDA installation"
-    tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained(args.model, use_auth_token=auth_token)
+    tokenizer_path = args.tokenizer if args.tokenizer else args.model
+    tokenizer: PreTrainedTokenizer = AutoTokenizer.from_pretrained(tokenizer_path, use_auth_token=auth_token)
     input_names: List[str] = tokenizer.model_input_names
     logging.info(f"axis: {input_names}")
     include_token_ids = "token_type_ids" in input_names
@@ -130,16 +134,45 @@ def main():
     logging.info(f"[Pytorch] input shape {inputs_pytorch['input_ids'].shape}")
     logging.info(f"[Pytorch] output shape: {output_pytorch.shape}")
     # create onnx model and compare results
-    convert_to_onnx(model_pytorch=model_pytorch, output_path=onnx_model_path, inputs_pytorch=inputs_pytorch)
+    opset = 12
+    if args.quantization:
+        TensorQuantizer.use_fb_fake_quant = True
+        opset = 13
+
+    convert_to_onnx(
+        model_pytorch=model_pytorch, output_path=onnx_model_path, inputs_pytorch=inputs_pytorch, opset=opset
+    )
+    if args.quantization:
+        TensorQuantizer.use_fb_fake_quant = False
     onnx_model = create_model_for_provider(path=onnx_model_path, provider_to_use="CUDAExecutionProvider")
     output_onnx = onnx_model.run(None, inputs_onnx)
     assert np.allclose(a=output_onnx, b=output_pytorch, atol=args.atol)
     del onnx_model
-    if "pytorch" not in args.backend:
-        del model_pytorch
     timings = {}
+    with torch.inference_mode():
+        for _ in range(args.warmup):
+            _ = model_pytorch(**inputs_pytorch)
+            torch.cuda.synchronize()
+        time_buffer = []
+        for _ in range(args.nb_measures):
+            with track_infer_time(time_buffer):
+                _ = model_pytorch(**inputs_pytorch)
+                torch.cuda.synchronize()
+        timings["Pytorch (FP32)"] = time_buffer
+        with autocast():
+            for _ in range(args.warmup):
+                _ = model_pytorch(**inputs_pytorch)
+                torch.cuda.synchronize()
+            time_buffer = []
+            for _ in range(args.nb_measures):
+                with track_infer_time(time_buffer):
+                    _ = model_pytorch(**inputs_pytorch)
+                    torch.cuda.synchronize()
+            timings["Pytorch (FP16)"] = time_buffer
+    del model_pytorch
+
     if "tensorrt" in args.backend:
         trt_logger: Logger = trt.Logger(trt.Logger.INFO if args.verbose else trt.Logger.WARNING)
         runtime: Runtime = trt.Runtime(trt_logger)
@@ -151,6 +184,8 @@ def main():
             optimal_shape=tensor_shapes[1],
             max_shape=tensor_shapes[2],
             workspace_size=args.workspace_size * 1024 * 1024,
+            fp16=not args.quantization,
+            int8=args.quantization,
         )
         save_engine(engine=engine, engine_file_path=tensorrt_path)
         # important to check the engine has been correctly serialized
@@ -242,28 +277,6 @@ def main():
         )
         conf.create_folders(tokenizer=tokenizer, model_path=onnx_optim_fp16_path)

-    if "pytorch" in args.backend:
-        with torch.inference_mode():
-            for _ in range(args.warmup):
-                _ = model_pytorch(**inputs_pytorch)
-                torch.cuda.synchronize()
-            time_buffer = []
-            for _ in range(args.nb_measures):
-                with track_infer_time(time_buffer):
-                    _ = model_pytorch(**inputs_pytorch)
-                    torch.cuda.synchronize()
-            timings["Pytorch (FP32)"] = time_buffer
-            with autocast():
-                for _ in range(args.warmup):
-                    _ = model_pytorch(**inputs_pytorch)
-                    torch.cuda.synchronize()
-                time_buffer = []
-                for _ in range(args.nb_measures):
-                    with track_infer_time(time_buffer):
-                        _ = model_pytorch(**inputs_pytorch)
-                        torch.cuda.synchronize()
-                timings["Pytorch (FP16)"] = time_buffer
-
     print(f"Inference done on {get_device_name(0)}")
     print("latencies:")
     for name, time_buffer in timings.items():