# Real-time inference on GPUs in Azure Machine Learning

description: Learn how to efficiently deploy to GPUs using NVIDIA Triton Inference Server.

The notebooks in this directory show how to take advantage of the interoperability between Azure Machine Learning and NVIDIA Triton Inference Server for cost-effective real-time inference on GPUs.

## Python instructions

Open either of the sample notebooks in this directory to run Triton in Python.
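
For orientation, here is a minimal sketch of the deployment flow using the Azure Machine Learning Python SDK (azureml-core, v1). It mirrors the CLI steps below; the `Workspace.from_config()` setup and variable names are illustrative assumptions, and the sample notebooks may differ in detail.

```python
# Minimal sketch (azureml-core SDK v1); assumes a workspace config.json
# and an existing GPU-enabled AKS cluster named "aks-gpu-deploy".
from azureml.core import Workspace, Model
from azureml.core.compute import AksCompute

ws = Workspace.from_config()

# Register the Triton model repository. Framework.MULTI enables
# no-code deployment onto Triton Inference Server.
model = Model.register(
    workspace=ws,
    model_path="../../models/triton",
    model_name="bidaf-model",
    model_framework=Model.Framework.MULTI,
)

# Deploy the registered model to the AKS GPU compute target.
aks_target = AksCompute(ws, "aks-gpu-deploy")
service = Model.deploy(
    workspace=ws,
    name="triton-webservice",
    models=[model],
    deployment_target=aks_target,
)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```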

## CLI instructions

You must have the latest version of the Azure Machine Learning CLI installed to run these commands. Follow the instructions here to download or upgrade the CLI.

```bash
# Prepare the local model files (see model_utils.py for details)
python ../../code/deployment/triton/model_utils.py
# Register the Triton model repository with the workspace
az ml model register -p ../../models/triton -n bidaf-model --model-framework=Multi
# Deploy the registered model to the AKS GPU compute target
az ml model deploy -n triton-webservice -m bidaf-model:1 --dc deploymentconfig.json --compute-target aks-gpu-deploy
```

Once the service is deployed, retrieve its scoring URI and authentication keys, then verify that the Triton server is ready:

```bash
# Get the scoring URI
az ml service show --name triton-webservice
# Get the keys
az ml service get-keys --name triton-webservice
# Check that the Triton server is ready to serve requests
curl -H "Authorization: Bearer <primaryKey>" -v <scoring-uri>v2/ready
```
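
The same endpoint also serves model metadata under the KFServing V2 route `v2/models/<name>`. Below is a hedged Python equivalent of the readiness check plus a metadata query; it assumes the scoring URI proxies the V2 paths as the curl command above suggests, and the URI and key values are placeholders taken from the commands above.

```python
# Sketch: query readiness and model metadata over the V2 REST routes.
import requests

scoring_uri = "<scoring-uri>"  # from `az ml service show`; ends with a slash
primary_key = "<primaryKey>"   # from `az ml service get-keys`
headers = {"Authorization": f"Bearer {primary_key}"}

# Server readiness (same call as the curl above)
ready = requests.get(scoring_uri + "v2/ready", headers=headers)
print(ready.status_code)

# Model metadata: input/output tensor names, shapes, and datatypes
meta = requests.get(scoring_uri + "v2/models/bidaf-model", headers=headers)
print(meta.json())
```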

Read more about the KFServing predict API here.
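
For reference, a V2 predict call is a POST to `v2/models/<name>/infer` with a JSON body listing input tensors. The sketch below shows the request shape only; the tensor name, shape, datatype, and data are placeholders, and BiDAF's real inputs are tokenized context/query arrays, so take the actual values from the model metadata response above.

```python
# Sketch of a KFServing V2 inference request. Tensor details below are
# placeholders; read the real ones from the model metadata response.
import requests

scoring_uri = "<scoring-uri>"
headers = {
    "Authorization": "Bearer <primaryKey>",
    "Content-Type": "application/json",
}

payload = {
    "inputs": [
        {
            "name": "<input-tensor-name>",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["example text"],
        }
    ]
}

resp = requests.post(
    scoring_uri + "v2/models/bidaf-model/infer",
    headers=headers,
    json=payload,
)
print(resp.json())  # response carries an "outputs" list of result tensors
```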