PyTorch BERT Base INFERENCE - NLP Azure

Description

This pipeline provides instructions on how to run inference with the BERT Base model on infrastructure provided by Azure Machine Learning, using `make` and `docker compose`.

Project Structure

```text
├── azureml @ v1.0.1
├── Makefile
├── README.md
└── docker-compose.yml
```

Makefile

```makefile
AZURE_CONFIG_FILE ?= $$(pwd)/config.json
FINAL_IMAGE_NAME ?= nlp-azure
FP32_TRAINED_MODEL ?= $$(pwd)/../training/azureml/notebooks/fp32_model_output

nlp-azure:
	mkdir -p ./azureml/notebooks/fp32_model_output && cp -r ${FP32_TRAINED_MODEL} ./azureml/notebooks/
	FINAL_IMAGE_NAME=${FINAL_IMAGE_NAME} \
	AZURE_CONFIG_FILE=${AZURE_CONFIG_FILE} \
	docker compose up nlp-azure --build

clean:
	docker compose down
	rm -rf ./azureml/notebooks/fp32_model_output
```
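The variables at the top of the Makefile use Make's `?=` conditional assignment, so each default can be overridden on the command line or via the environment. A minimal, self-contained sketch of that behavior (the `demo.mk` file below is hypothetical, not part of the repository):

```shell
# Write a throwaway Makefile that mimics the pipeline's ?= defaults.
printf 'FINAL_IMAGE_NAME ?= nlp-azure\nshow:\n\t@echo $(FINAL_IMAGE_NAME)\n' > demo.mk

make -f demo.mk show                            # prints the default: nlp-azure
make -f demo.mk show FINAL_IMAGE_NAME=my-image  # command-line value wins: my-image
```

The pipeline target can be invoked the same way, e.g. `make nlp-azure FINAL_IMAGE_NAME=my-image AZURE_CONFIG_FILE=$(pwd)/config.json`.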

docker-compose.yml

```yaml
services:
  nlp-azure:
    build:
      args:
        http_proxy: ${http_proxy}
        https_proxy: ${https_proxy}
        no_proxy: ${no_proxy}
      dockerfile: ./azureml/Dockerfile
    command: sh -c "jupyter nbconvert --to python 1.0-intel-azureml-inference.ipynb && python3 1.0-intel-azureml-inference.py"
    environment:
      - http_proxy=${http_proxy}
      - https_proxy=${https_proxy}
      - no_proxy=${no_proxy}
    image: ${FINAL_IMAGE_NAME}:inference-ubuntu-20.04
    network_mode: "host"
    privileged: true
    volumes:
      - ./azureml/notebooks:/root/notebooks
      - ./azureml/src:/root/src
      - /${AZURE_CONFIG_FILE}:/root/notebooks/config.json
    working_dir: /root/notebooks
```

Azure Machine Learning

End-to-end AI workflow that uses the Azure ML cloud infrastructure to execute inference with the BERT Base model. More information is available here. The pipeline runs the `1.0-intel-azureml-inference.ipynb` notebook of the Azure ML project.

Quick Start

  • Make sure that the environment setup prerequisites are satisfied per the document here.

  • Pull and configure the dependent repo submodule: `git submodule update --init --recursive`.

  • Install Pipeline Repository Dependencies.

  • Use the quickstart link to set up your Azure ML resources.

    • If required, create virtual networks and NAT gateway by following this link.
  • Download the config.json file from your Azure ML Studio Workspace.

  • This pipeline requires the pre-trained FP32 model. Run the training pipeline before running inference to obtain the model.

  • Other Variables:

| Variable Name | Default | Notes |
| --- | --- | --- |
| AZURE_CONFIG_FILE | `$$(pwd)/config.json` | Azure workspace configuration file |
| FINAL_IMAGE_NAME | `nlp-azure` | Final Docker image name |
| FP32_TRAINED_MODEL | `$$(pwd)/../training/azureml/notebooks/fp32_model_output` | FP32 model obtained from training |
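`AZURE_CONFIG_FILE` should point at the workspace configuration file downloaded from Azure ML Studio; it is a small JSON document with the standard `subscription_id`, `resource_group`, and `workspace_name` fields. A sketch of what that file looks like (the values below are placeholders, not real credentials):

```python
import json
from pathlib import Path

# Placeholder values -- the real file is downloaded from your Azure ML Studio
# workspace and mounted into the container as /root/notebooks/config.json.
sample = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "intel_azureml_resource",
    "workspace_name": "cloud_t7_i9",
}
Path("config.json").write_text(json.dumps(sample, indent=2))

# The inference notebook can then attach to the workspace, e.g. via
# Workspace.from_config() from the azureml-sdk, which reads this file.
loaded = json.loads(Path("config.json").read_text())
print(sorted(loaded))  # ['resource_group', 'subscription_id', 'workspace_name']
```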

Build and Run

Build and run with defaults:

```shell
make nlp-azure
```

Build and Run Example

```text
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 32B done
#1 DONE 0.0s

#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.0s

#3 [internal] load metadata for docker.io/library/ubuntu:20.04
#3 DONE 0.7s

#4 [1/3] FROM docker.io/library/ubuntu:20.04@sha256:9c2004872a3a9fcec8cc757ad65c042de1dad4da27de4c70739a6e36402213e3
#4 DONE 0.0s

#5 [2/3] RUN apt-get update &&     apt-get install --no-install-recommends curl=7.68.0-1ubuntu2.13 -y &&     apt-get install --no-install-recommends python3-pip=20.0.2-5ubuntu1.6 -y &&     rm -r /var/lib/apt/lists/*
#5 CACHED

#6 [3/3] RUN pip install --no-cache-dir azureml-sdk==1.45.0 && pip install --no-cache-dir notebook==6.4.12
#6 CACHED

#7 exporting to image
#7 exporting layers done
#7 writing image sha256:b4b0d17ff3f251644447a83a133d0d41a7f42129b05739ba4d843ecced862eeb done
#7 naming to docker.io/library/nlp-azure:inference-ubuntu-20.04 done
#7 DONE 0.0s
Attaching to inference-nlp-azure-1
inference-nlp-azure-1  | [NbConvertApp] Converting notebook 1.0-intel-azureml-inference.ipynb to python
inference-nlp-azure-1  | [NbConvertApp] Writing 9806 bytes to 1.0-intel-azureml-inference.py
inference-nlp-azure-1  | Failure while loading azureml_run_type_providers. Failed to load entrypoint hyperdrive = azureml.train.hyperdrive:HyperDriveRun._from_run_dto with exception (cryptography 37.0.4 (/usr/local/lib/python3.8/dist-packages), Requirement.parse('cryptography<39,>=38.0.0'), {'pyopenssl', 'PyOpenSSL'}).
inference-nlp-azure-1  | ... (similar azureml_run_type_providers warnings elided) ...
inference-nlp-azure-1  | Loaded existing workspace configuration
inference-nlp-azure-1  | Validating arguments.
inference-nlp-azure-1  | Arguments validated.
inference-nlp-azure-1  | Uploading file to /inc/ptq_config
inference-nlp-azure-1  | Uploading an estimated of 1 files
inference-nlp-azure-1  | Uploading ../src/inference_container/config/ptq.yaml
inference-nlp-azure-1  | Uploaded 1 files
inference-nlp-azure-1  | Creating new dataset
inference-nlp-azure-1  | Validating arguments.
inference-nlp-azure-1  | Arguments validated.
inference-nlp-azure-1  | Uploading file to /trained_fp32_hf_model
inference-nlp-azure-1  | Uploading an estimated of 11 files
inference-nlp-azure-1  | Uploading ./fp32_model_output/outputs/trained_model/training_args.bin
inference-nlp-azure-1  | ... (remaining model file uploads elided) ...
inference-nlp-azure-1  | Uploaded 11 files
inference-nlp-azure-1  | Creating new dataset
inference-nlp-azure-1  | Found existing cluster, use it.
inference-nlp-azure-1  |
inference-nlp-azure-1  | Running
inference-nlp-azure-1  | RunId: INC_PTQ_1666128985_788b95f3
inference-nlp-azure-1  | Web View: https://ml.azure.com/runs/INC_PTQ_1666128985_788b95f3?wsid=/subscriptions/0a5dbdd4-ee35-483f-b248-93e05a52cd9f/resourcegroups/intel_azureml_resource/workspaces/cloud_t7_i9&tid=46c98d88-e344-4ed4-8496-4ed7712e255d
inference-nlp-azure-1  |
inference-nlp-azure-1  | Streaming user_logs/std_log.txt
inference-nlp-azure-1  | ===============================
inference-nlp-azure-1  |
inference-nlp-azure-1  | Downloading builder script: 28.8kB [00:00, 12.3MB/s]
inference-nlp-azure-1  | Downloading metadata: 28.7kB [00:00, 14.7MB/s]
inference-nlp-azure-1  | Downloading and preparing dataset glue/mrpc (download: 1.43 MiB, generated: 1.43 MiB, post-processed: Unknown size, total: 2.85 MiB) to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad...
inference-nlp-azure-1  | Downloading data files: 100%|██████████| 3/3 [00:01<00:00,  2.43it/s]
inference-nlp-azure-1  | Generating train split:  78%|███████▊  | 2867/3668 [00:00<00:00, 14406.34 examples/s]
inference-nlp-azure-1  | Generating validation split:  96%|█████████▌| 391/408 [00:00<00:00, 3869.76 examples/s]
inference-nlp-azure-1  | Generating test split:   0%|          | 0/1725 [00:00<?, ? examples/s]
inference-nlp-azure-1  | Dataset glue downloaded and prepared to /root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad. Subsequent calls will reuse this data.
inference-nlp-azure-1  | Downloading tokenizer_config.json: 100%|██████████| 28.0/28.0 [00:00<00:00, 23.8kB/s]
inference-nlp-azure-1  | Downloading config.json: 100%|██████████| 570/570 [00:00<00:00, 457kB/s]
inference-nlp-azure-1  | Downloading vocab.txt: 100%|██████████| 226k/226k [00:00<00:00, 775kB/s]
inference-nlp-azure-1  | Downloading tokenizer.json: 100%|██████████| 455k/455k [00:00<00:00, 1.04MB/s]
inference-nlp-azure-1  | 100%|██████████| 4/4 [00:14<00:00,  3.58s/ba]
inference-nlp-azure-1  | 100%|██████████| 2/2 [00:05<00:00,  2.98s/ba]
inference-nlp-azure-1  | 2022-10-18 21:39:50 [INFO] Created a worker pool for first use
inference-nlp-azure-1  | 2022-10-18 21:39:50 [WARNING] Reusing dataset glue (/root/.cache/huggingface/datasets/glue/mrpc/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
inference-nlp-azure-1  | loading configuration file https://huggingface.co/bert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/3c61d016573b14f7f008c02c4e51a366c67ab274726fe2910691e2a761acf43e.37395cee442ab11005bcd270f3c34464dc1704b715b5d7d52b1a461abe3b9e4e
inference-nlp-azure-1  | Model config BertConfig {
inference-nlp-azure-1  |   "_name_or_path": "bert-base-uncased",
inference-nlp-azure-1  |   "architectures": [
inference-nlp-azure-1  |     "BertForMaskedLM"
inference-nlp-azure-1  |   ],
inference-nlp-azure-1  |   "attention_probs_dropout_prob": 0.1,
inference-nlp-azure-1  |   "classifier_dropout": null,
inference-nlp-azure-1  |   "gradient_checkpointing": false,
inference-nlp-azure-1  |   "hidden_act": "gelu",
inference-nlp-azure-1  |   "hidden_dropout_prob": 0.1,
inference-nlp-azure-1  |   "hidden_size": 768,
inference-nlp-azure-1  |   "initializer_range": 0.02,
inference-nlp-azure-1  |   "intermediate_size": 3072,
inference-nlp-azure-1  |   "layer_norm_eps": 1e-12,
inference-nlp-azure-1  |   "max_position_embeddings": 512,
inference-nlp-azure-1  |   "model_type": "bert",
inference-nlp-azure-1  |   "num_attention_heads": 12,
inference-nlp-azure-1  |   "num_hidden_layers": 12,
inference-nlp-azure-1  |   "pad_token_id": 0,
inference-nlp-azure-1  |   "position_embedding_type": "absolute",
inference-nlp-azure-1  |   "transformers_version": "4.21.1",
inference-nlp-azure-1  |   "type_vocab_size": 2,
inference-nlp-azure-1  |   "use_cache": true,
inference-nlp-azure-1  |   "vocab_size": 30522
inference-nlp-azure-1  | }
inference-nlp-azure-1  | 2022-10-18 21:40:03 [INFO] Pass query framework capability elapsed time: 554.0 ms
inference-nlp-azure-1  | 2022-10-18 21:40:03 [INFO] Get FP32 model baseline.
inference-nlp-azure-1  | The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence2, idx, sentence1. If sentence2, idx, sentence1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/intel_extension_for_pytorch/frontend.py:261: UserWarning: Conv BatchNorm folding failed during the optimize process.
inference-nlp-azure-1  |   warnings.warn("Conv BatchNorm folding failed during the optimize process.")
inference-nlp-azure-1  | ***** Running Evaluation *****
inference-nlp-azure-1  |   Num examples = 1725
inference-nlp-azure-1  |   Batch size = 8
inference-nlp-azure-1  |   0%|          | 0/216 [00:00<?, ?it/s]
inference-nlp-azure-1  | ... (per-batch evaluation progress elided) ...
inference-nlp-azure-1  |  94%|█████████▍| 203/216 [02:45<00:10,  1.21it/s]
```
inference-nlp-azure-1  |  94%|█████████▍| 204/216 [02:46<00:09,  1.20it/s]
inference-nlp-azure-1  |  95%|█████████▍| 205/216 [02:46<00:09,  1.21it/s]
inference-nlp-azure-1  |  95%|█████████▌| 206/216 [02:47<00:08,  1.22it/s]
inference-nlp-azure-1  |  96%|█████████▌| 207/216 [02:48<00:07,  1.23it/s]
inference-nlp-azure-1  |  96%|█████████▋| 208/216 [02:49<00:06,  1.24it/s]
inference-nlp-azure-1  |  97%|█████████▋| 209/216 [02:50<00:05,  1.24it/s]
inference-nlp-azure-1  |  97%|█████████▋| 210/216 [02:50<00:04,  1.24it/s]
inference-nlp-azure-1  |  98%|█████████▊| 211/216 [02:51<00:03,  1.25it/s]
inference-nlp-azure-1  |  98%|█████████▊| 212/216 [02:52<00:03,  1.25it/s]
inference-nlp-azure-1  |  99%|█████████▊| 213/216 [02:53<00:02,  1.25it/s]
inference-nlp-azure-1  |  99%|█████████▉| 214/216 [02:54<00:01,  1.23it/s]
inference-nlp-azure-1  | 100%|█████████▉| 215/216 [02:55<00:00,  1.21it/s]
inference-nlp-azure-1  | 100%|██████████| 216/216 [02:55<00:00,  1.36it/s]
inference-nlp-azure-1  | 
inference-nlp-azure-1  | Downloading builder script: 4.21kB [00:00, 3.90MB/s]
inference-nlp-azure-1  | 
inference-nlp-azure-1  | 100%|██████████| 216/216 [02:55<00:00,  1.23it/s]
inference-nlp-azure-1  | 2022-10-18 21:43:00 [INFO] Save tuning history to /mnt/azureml/cr/j/e4712a572fab403692800d480981321b/exe/wd/nc_workspace/2022-10-18_21-39-23/./history.snapshot.
inference-nlp-azure-1  | 2022-10-18 21:43:00 [INFO] FP32 baseline is: [Accuracy: 0.8394, Duration (seconds): 177.0837]
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/torch/ao/quantization/qconfig.py:92: UserWarning: QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead
inference-nlp-azure-1  |   warnings.warn("QConfigDynamic is going to be deprecated in PyTorch 1.12, please use QConfig instead")
inference-nlp-azure-1  | 2022-10-18 21:43:00 [INFO] Fx trace of the entire model failed, We will conduct auto quantization
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/torch/ao/quantization/observer.py:176: UserWarning: Please use quant_min and quant_max to specify the range for observers.                     reduce_range will be deprecated in a future release of PyTorch.
inference-nlp-azure-1  |   warnings.warn(
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/torch/nn/quantized/_reference/modules/utils.py:25: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
inference-nlp-azure-1  |   torch.tensor(weight_qparams["scale"], dtype=torch.float, device=device))
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/torch/nn/quantized/_reference/modules/utils.py:28: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
inference-nlp-azure-1  |   torch.tensor(weight_qparams["zero_point"], dtype=zero_point_dtype, device=device))
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |*********Mixed Precision Statistics********|
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] +---------------------+-------+------+------+
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |       Op Type       | Total | INT8 | FP32 |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] +---------------------+-------+------+------+
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |      Embedding      |   3   |  3   |  0   |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |      LayerNorm      |   25  |  0   |  25  |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] | quantize_per_tensor |   74  |  74  |  0   |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |        Linear       |   74  |  74  |  0   |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |      dequantize     |   74  |  74  |  0   |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |     input_tensor    |   24  |  24  |  0   |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] |       Dropout       |   24  |  0   |  24  |
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] +---------------------+-------+------+------+
inference-nlp-azure-1  | 2022-10-18 21:43:30 [INFO] Pass quantize model elapsed time: 30514.29 ms
inference-nlp-azure-1  | The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: sentence2, idx, sentence1. If sentence2, idx, sentence1 are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
inference-nlp-azure-1  | /opt/miniconda/lib/python3.8/site-packages/intel_extension_for_pytorch/frontend.py:261: UserWarning: Conv BatchNorm folding failed during the optimize process.
inference-nlp-azure-1  |   warnings.warn("Conv BatchNorm folding failed during the optimize process.")

...

inference-nlp-azure-1  | tokenizer config file saved in ./outputs/tokenizer_config.json
inference-nlp-azure-1  | Special tokens file saved in ./outputs/special_tokens_map.json
inference-nlp-azure-1  | Configuration saved in ./outputs/config.json
inference-nlp-azure-1  | Convertion complete!
inference-nlp-azure-1  | Cleaning up all outstanding Run operations, waiting 300.0 seconds
inference-nlp-azure-1  | 1 items cleaning up...
inference-nlp-azure-1  | Cleanup took 0.050061702728271484 seconds
inference-nlp-azure-1  | 
inference-nlp-azure-1  | Execution Summary
inference-nlp-azure-1  | =================
inference-nlp-azure-1  | RunId: INC_PTQ_1666128985_788b95f3
inference-nlp-azure-1  | Web View: https://ml.azure.com/runs/INC_PTQ_1666128985_788b95f3?wsid=/subscriptions/0a5dbdd4-ee35-483f-b248-93e05a52cd9f/resourcegroups/intel_azureml_resource/workspaces/cloud_t7_i9&tid=46c98d88-e344-4ed4-8496-4ed7712e255d
inference-nlp-azure-1  | 
inference-nlp-azure-1  | Registering model inc_ptq_bert_model_mrpc
inference-nlp-azure-1  | Found existing cluster, use it.
inference-nlp-azure-1  | Service hf-aks-1
inference-nlp-azure-1  | Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
inference-nlp-azure-1  | Running
inference-nlp-azure-1  | 2022-10-19 02:44:14+00:00 Creating Container Registry if not exists.
inference-nlp-azure-1  | 2022-10-19 02:44:14+00:00 Registering the environment.
inference-nlp-azure-1  | 2022-10-19 02:44:15+00:00 Use the existing image.
inference-nlp-azure-1  | 2022-10-19 02:44:17+00:00 Creating resources in AKS.
inference-nlp-azure-1  | 2022-10-19 02:44:18+00:00 Submitting deployment to compute.
inference-nlp-azure-1  | 2022-10-19 02:44:18+00:00 Checking the status of deployment hf-aks-1..
inference-nlp-azure-1  | 2022-10-19 02:45:01+00:00 Checking the status of inference endpoint hf-aks-1.
inference-nlp-azure-1  | Succeeded
inference-nlp-azure-1  | AKS service creation operation finished, operation "Succeeded"
inference-nlp-azure-1  | Healthy
inference-nlp-azure-1  | {'result': '0', 'sentence1': 'Shares of Genentech, a much larger company with several products on the market, rose more than 2 percent.', 'sentence2': 'Shares of Xoma fell 16 percent in early trade, while shares of Genentech, a much larger company with several products on the market, were up 2 percent.', 'logits': 'tensor([[ 2.3388, -2.3361]], grad_fn=<AddmmBackward0>)', 'probability': 'tensor([0.9908, 0.0092], grad_fn=<SoftmaxBackward0>)', 'input_data': "{'input_ids': tensor([[  101,  6661,  1997,  4962, 10111,  2818,  1010,  1037,  2172,  3469,\n          2194,  2007,  2195,  3688,  2006,  1996,  3006,  1010,  3123,  2062,\n          2084,  1016,  3867,  1012,   102,  6661,  1997,  1060,  9626,  3062,\n          2385,  3867,  1999,  2220,  3119,  1010,  2096,  6661,  1997,  4962,\n         10111,  2818,  1010,  1037,  2172,  3469,  2194,  2007,  2195,  3688,\n          2006,  1996,  3006,  1010,  2020,  2039,  1016,  3867,  1012,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n         0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}", 'model_path': '/var/azureml-app/azureml-models/inc_ptq_bert_model_mrpc/2/outputs'}
inference-nlp-azure-1  | Classification result: 0
inference-nlp-azure-1 exited with code 0
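The mixed-precision table in the log above shows the quantized model running its `Linear` and `Embedding` ops in INT8 while `LayerNorm` and `Dropout` stay in FP32. Under the hood, INT8 ops rely on an affine (scale/zero-point) mapping between float values and 8-bit integers. The sketch below illustrates that quantize/dequantize round trip in plain Python; the helper names are illustrative and are not part of the Neural Compressor or PyTorch APIs:

```python
def quant_params(values, qmin=-128, qmax=127):
    """Compute an affine scale/zero-point covering the value range (and 0.0)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped signed int8 values on the affine grid."""
    return [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]

def dequantize(q, scale, zero_point):
    """Map int8 values back to floats; the round trip loses at most ~scale/2."""
    return [(qi - zero_point) * scale for qi in q]

weights = [0.25, -1.3, 0.8, 2.0, -0.05]
scale, zp = quant_params(weights)
q = quantize(weights, scale, zp)
restored = dequantize(q, scale, zp)
# Per-element round-trip error is bounded by about half a quantization step
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

Accuracy-aware tuning, as logged above ("FP32 baseline is: [Accuracy: 0.8394 ...]"), works by keeping error-sensitive ops like `LayerNorm` in FP32 when full INT8 conversion would drop accuracy below the configured tolerance.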
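The scoring response near the end of the log reports logits `[2.3388, -2.3361]` alongside a probability of `0.9908` for class 0 (the two sentences being paraphrases is class-dependent on the MRPC label convention). The probability is simply the softmax of the logits, which can be checked in plain Python:

```python
import math

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits taken from the scoring response in the log above
probs = softmax([2.3388, -2.3361])
print([round(p, 4) for p in probs])  # → [0.9908, 0.0092]
```

This matches the `probability` tensor returned by the endpoint, confirming the deployed model's head output is interpreted with a standard softmax.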