description: learn how to efficiently deploy to GPUs using Triton Inference Server
The notebooks in this directory show how to take advantage of the interoperability between Azure Machine Learning and NVIDIA Triton Inference Server for cost-effective real-time inference on GPUs.
Open either of the sample notebooks in this directory to run Triton in Python.
You must have the latest version of the Azure Machine Learning CLI installed to run these commands. Follow the instructions here to download or upgrade the CLI.
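For example, on a machine with the Azure CLI already installed, the machine learning extension can be added or updated as follows (a minimal sketch, assuming these commands target the v1 `azure-cli-ml` extension):

```bash
# Install the Azure Machine Learning CLI extension (v1), or update it if present
az extension add -n azure-cli-ml
az extension update -n azure-cli-ml

# Confirm the ml commands are available
az ml -h
```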
Prepare the model files, register the model, and deploy it to the GPU-enabled AKS cluster:

```bash
# Prepare the sample Triton model repository
python ../../code/deployment/triton/model_utils.py

# Register the Triton model folder with Azure Machine Learning
az ml model register -p ../../models/triton -n bidaf-model --model-framework=Multi

# Deploy the registered model as a web service on the AKS GPU compute target
az ml model deploy -n triton-webservice -m bidaf-model:1 --dc deploymentconfig.json --compute-target aks-gpu-deploy
```
Once the service is deployed, retrieve the scoring URI and authentication keys, then check that the server is ready:
```bash
# Get the scoring URI
az ml service show --name triton-webservice

# Get the authentication keys
az ml service get-keys --name triton-webservice

# Check that Triton is ready to serve requests (v2 health endpoint)
curl -H "Authorization: Bearer <primaryKey>" -v <scoring-uri>/v2/health/ready
```
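To query the model metadata endpoint, the v2 protocol exposes `GET v2/models/<model-name>`. A sketch, assuming the model is served under the registered name `bidaf-model`:

```bash
# Fetch model metadata (platform, input and output tensors) via the v2 API
curl -H "Authorization: Bearer <primaryKey>" <scoring-uri>/v2/models/bidaf-model
```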
Read more about the KFServing predict API here.
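Predictions go through `POST v2/models/<model-name>/infer` in the same protocol. Below is a generic sketch of the request shape; the tensor name, datatype, shape, and data are placeholders, not the actual bidaf-model signature (take the real values from the model metadata response above):

```bash
# Generic v2 inference request; INPUT0 and its shape/datatype/data are
# placeholders to be replaced with values from the model metadata.
curl -X POST \
  -H "Authorization: Bearer <primaryKey>" \
  -H "Content-Type: application/json" \
  -d '{"inputs": [{"name": "INPUT0", "shape": [1, 4], "datatype": "FP32", "data": [1.0, 2.0, 3.0, 4.0]}]}' \
  <scoring-uri>/v2/models/bidaf-model/infer
```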