From c1369fcbf1b78d38b76933e0b0233a61ff20241e Mon Sep 17 00:00:00 2001 From: udaij12 Date: Tue, 2 Jul 2024 20:02:02 -0700 Subject: [PATCH 1/4] updating examples --- .../README.md | 22 ++++----- examples/Huggingface_Transformers/README.md | 14 +++--- examples/MMF-activity-recognition/README.md | 29 ++++++----- .../dog_breed_classification/README.md | 2 +- .../nmt_transformers_pipeline/README.md | 2 +- .../cloud_storage_stream_inference/README.md | 2 +- examples/cloudformation/README.md | 2 +- examples/dcgan_fashiongen/Readme.md | 12 ++--- examples/diffusers/Readme.md | 2 +- examples/image_classifier/alexnet/README.md | 8 ++-- .../image_classifier/densenet_161/README.md | 6 +-- examples/image_classifier/mnist/README.md | 18 +++---- .../near_real_time_video/README.md | 4 +- .../resnet_152_batch/README.md | 10 ++-- examples/image_classifier/resnet_18/README.md | 6 +-- .../resnet_18/ReactJSExample/README.md | 2 +- .../image_classifier/squeezenet/README.md | 4 +- examples/image_classifier/vgg_16/README.md | 10 ++-- examples/image_segmenter/deeplabv3/README.md | 4 +- examples/image_segmenter/fcn/README.md | 4 +- .../intel_extension_for_pytorch/README.md | 48 +++++++++---------- .../Huggingface_accelerate/Readme.md | 7 +-- .../Huggingface_accelerate/llama/Readme.md | 2 +- examples/large_models/deepspeed/Readme.md | 2 +- examples/large_models/deepspeed_mii/Readme.md | 2 +- .../large_models/diffusion_fast/README.md | 2 +- examples/large_models/gpt_fast/README.md | 2 +- .../gpt_fast_mixtral_moe/README.md | 2 +- .../segment_anything_fast/README.md | 2 +- examples/large_models/vllm/llama3/Readme.md | 2 +- examples/large_models/vllm/lora/Readme.md | 2 +- examples/large_models/vllm/mistral/Readme.md | 2 +- examples/micro_batching/README.md | 2 +- examples/nmt_transformer/README.md | 28 +++++------ examples/nvidia_dali/README.md | 4 +- examples/object_detector/fast-rcnn/README.md | 6 +-- examples/object_detector/maskrcnn/README.md | 4 +- .../object_detector/yolo/yolov8/README.md | 2 +- examples/pt2/torch_compile/README.md | 2 +- examples/pt2/torch_compile_hpu/README.md | 2 +- examples/pt2/torch_compile_openvino/README.md | 2 +- .../stable_diffusion/README.md | 2 +- .../pt2/torch_export_aot_compile/README.md | 2 +- examples/pt2/torch_inductor_caching/README.md | 4 +- examples/speech2text_wav2vec2/README.md | 2 +- examples/text_classification/README.md | 14 +++--- .../README.md | 2 +- .../SpeechT5/README.md | 2 +- .../WaveGlow/README.md | 2 +- .../torch_tensorrt/torchcompile/README.md | 3 +- examples/torch_tensorrt/torchscript/README.md | 2 +- examples/torchrec_dlrm/README.md | 4 +- examples/xgboost_classfication/README.md | 2 +- 53 files changed, 162 insertions(+), 169 deletions(-) diff --git a/examples/FasterTransformer_HuggingFace_Bert/README.md b/examples/FasterTransformer_HuggingFace_Bert/README.md index af66501b20..20af07bccb 100644 --- a/examples/FasterTransformer_HuggingFace_Bert/README.md +++ b/examples/FasterTransformer_HuggingFace_Bert/README.md @@ -1,8 +1,8 @@ -## Faster Transformer +## Faster Transformer Batch inferencing with Transformers faces two challenges -- Large batch sizes suffer from higher latency and small or medium-sized batches this will become kernel latency launch bound. +- Large batch sizes suffer from higher latency and small or medium-sized batches this will become kernel latency launch bound. 
- Padding wastes a lot of compute, (batchsize, seq_length) requires to pad the sequence to (batchsize, max_length) where difference between avg_length and max_length results in a considerable waste of computation, increasing the batch size worsen this situation. [Faster Transformers](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/bert/run_glue.py) (FT) from Nvidia along with [Efficient Transformers](https://github.com/bytedance/effective_transformer) (EFFT) that is built on top of FT address the above two challenges, by fusing the CUDA kernels and dynamically removing padding during computations. The current implementation from [Faster Transformers](https://github.com/NVIDIA/FasterTransformer/blob/main/examples/pytorch/bert/run_glue.py) support BERT like encoder and decoder layers. In this example, we show how to get a Torchscripted (traced) EFFT variant of Bert models from HuggingFace (HF) for sequence classification and question answering and serve it. @@ -10,7 +10,7 @@ Batch inferencing with Transformers faces two challenges ### How to get a Torchscripted (Traced) EFFT of HF Bert model and serving it -**Requirements** +**Requirements** Running Faster Transformer at this point is recommended through [NVIDIA docker and NGC container](https://github.com/NVIDIA/FasterTransformer#requirements), also it requires [Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/) or [Turing](https://www.nvidia.com/en-us/geforce/turing/) or [Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/) based GPU. For this example we have used a **g4dn.2xlarge** EC2 instance that has a T4 GPU. @@ -34,9 +34,9 @@ mkdir -p build cd build -cmake -DSM=75 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON .. # -DSM = 70 for V100 gpu ------- 60 (P40) or 61 (P4) or 70 (V100) or 75(T4) or 80 (A100), +cmake -DSM=75 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON .. 
# -DSM = 70 for V100 gpu ------- 60 (P40) or 61 (P4) or 70 (V100) or 75(T4) or 80 (A100), -make +make pip install transformers==2.5.1 @@ -45,8 +45,8 @@ cd /workspace # clone Torchserve to access examples git clone https://github.com/pytorch/serve.git -# install torchserve -cd serve +# install torchserve +cd serve pip install -r requirements/common.txt @@ -99,7 +99,7 @@ mkdir model_store mv BERTSeqClassification.mar model_store/ -torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs +torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --disable-token-auth --enable-model-api curl -X POST http://127.0.0.1:8080/predictions/my_tc -T ../Huggingface_Transformers/Seq_classification_artifacts/sample_text_captum_input.txt @@ -132,7 +132,7 @@ cd /workspace/FasterTransformer/build/ # --data_type can be fp16 or fp32 python pytorch/Bert_FT_trace.py --mode question_answering --model_name_or_path "/workspace/serve/Transformer_model" --tokenizer_name "bert-base-uncased" --batch_size 1 --data_type fp16 --model_type thsext -cd - +cd - # make sure to change the ../Huggingface_Transformers/setup_config.json "save_mode":"torchscript" @@ -142,10 +142,10 @@ mkdir model_store mv BERTQA.mar model_store/ -torchserve --start --model-store model_store --models my_tc=BERTQA.mar --ncs +torchserve --start --model-store model_store --models my_tc=BERTQA.mar --ncs --disable-token-auth --enable-model-api curl -X POST http://127.0.0.1:8080/predictions/my_tc -T ../Huggingface_Transformers/QA_artifacts/sample_text_captum_input.txt ``` -#### \ No newline at end of file +#### diff --git a/examples/Huggingface_Transformers/README.md b/examples/Huggingface_Transformers/README.md index c93c7eae95..a555190977 100644 --- a/examples/Huggingface_Transformers/README.md +++ b/examples/Huggingface_Transformers/README.md @@ -114,7 +114,7 @@ To register the model on TorchServe using the above model archive file, we run t ``` mkdir model_store mv BERTSeqClassification.mar model_store/ -torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --disable-token --ncs +torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --disable-token --ncs --disable-token-auth --enable-model-api ``` @@ -164,7 +164,7 @@ torch-model-archiver --model-name BERTTokenClassification --version 1.0 --serial ``` mkdir model_store mv BERTTokenClassification.mar model_store -torchserve --start --model-store model_store --models my_tc=BERTTokenClassification.mar --disable-token --ncs +torchserve --start --model-store model_store --models my_tc=BERTTokenClassification.mar --disable-token --ncs --disable-token-auth --enable-model-api ``` ### Run an inference @@ -208,7 +208,7 @@ torch-model-archiver --model-name BERTQA --version 1.0 --serialized-file Transfo ``` mkdir model_store mv BERTQA.mar model_store -torchserve --start --model-store model_store --models my_tc=BERTQA.mar --disable-token --ncs +torchserve --start --model-store model_store --models my_tc=BERTQA.mar --disable-token --ncs --disable-token-auth --enable-model-api ``` ### Run an inference To run an inference: `curl -X POST http://127.0.0.1:8080/predictions/my_tc -T QA_artifacts/sample_text_captum_input.txt` @@ -255,7 +255,7 @@ To register the model on TorchServe using the above model archive file, we run t ``` mkdir model_store mv Textgeneration.mar model_store/ -torchserve --start --model-store model_store --models my_tc=Textgeneration.mar --disable-token --ncs +torchserve 
--start --model-store model_store --models my_tc=Textgeneration.mar --disable-token --ncs --disable-token-auth --enable-model-api ``` ### Run an inference @@ -272,7 +272,7 @@ For batch inference the main difference is that you need set the batch size whil ``` mkdir model_store mv BERTSeqClassification.mar model_store/ - torchserve --start --model-store model_store --disable-token --ncs + torchserve --start --model-store model_store --disable-token --ncs --disable-token-auth --enable-model-api curl -X POST "localhost:8081/models?model_name=BERTSeqClassification&url=BERTSeqClassification.mar&batch_size=4&max_batch_delay=5000&initial_workers=3&synchronous=true" ``` @@ -297,7 +297,7 @@ For batch inference the main difference is that you need set the batch size whil ``` mkdir model_store mv BERTSeqClassification.mar model_store/ - torchserve --start --model-store model_store --ts-config config.properties --models BERTSeqClassification= BERTSeqClassification.mar + torchserve --start --model-store model_store --ts-config config.properties --models BERTSeqClassification= BERTSeqClassification.mar --disable-token-auth --enable-model-api ``` Now to run the batch inference following command can be used: @@ -377,7 +377,7 @@ To register the model on TorchServe using the above model archive file, we run t ``` mkdir model_store mv Textgeneration.mar model_store/ -torchserve --start --model-store model_store --disable-token +torchserve --start --model-store model_store --disable-token --disable-token-auth --enable-model-api curl -X POST "localhost:8081/models?model_name=Textgeneration&url=Textgeneration.mar&batch_size=1&max_batch_delay=5000&initial_workers=1&synchronous=true" ``` diff --git a/examples/MMF-activity-recognition/README.md b/examples/MMF-activity-recognition/README.md index b9965a0222..42bc5ff53c 100644 --- a/examples/MMF-activity-recognition/README.md +++ b/examples/MMF-activity-recognition/README.md @@ -1,16 +1,16 @@ ### MultiModal (MMF) Framework -Multi modality learning helps the AI solutions to get signals from different input sources such as language, video, audio and combine their results to improve the inferences. +Multi modality learning helps the AI solutions to get signals from different input sources such as language, video, audio and combine their results to improve the inferences. -[MultiModal (MMF) framework](https://ai.facebook.com/blog/announcing-mmf-a-framework-for-multimodal-ai-models/) is a modular deep learning framework for vision and language multimodal research. MMF provides starter code for several multimodal challenges, including the Hateful Memes, VQA, TextVQA, and TextCaps challenges. You can learn more about MMF from their [website](https://mmf.readthedocs.io/en/latest/?=IwAR3P8zccSXqNt1XCCUv4Ysq0qkD515T6K9JnhUwpNcz0zzRl75FNSio9REU) a [Github](https://github.com/facebookresearch/mmf?fbclid=IwAR2OZi-8rQaxO3uwLxwvvvr9cuY8J6h0JP_g6BBM-qM7wpnNYEZEmWOQ6mc). +[MultiModal (MMF) framework](https://ai.facebook.com/blog/announcing-mmf-a-framework-for-multimodal-ai-models/) is a modular deep learning framework for vision and language multimodal research. MMF provides starter code for several multimodal challenges, including the Hateful Memes, VQA, TextVQA, and TextCaps challenges. You can learn more about MMF from their [website](https://mmf.readthedocs.io/en/latest/?=IwAR3P8zccSXqNt1XCCUv4Ysq0qkD515T6K9JnhUwpNcz0zzRl75FNSio9REU) a [Github](https://github.com/facebookresearch/mmf?fbclid=IwAR2OZi-8rQaxO3uwLxwvvvr9cuY8J6h0JP_g6BBM-qM7wpnNYEZEmWOQ6mc). 
In the following, we first show how to serve the MMF model with Torchserve using a pre-trained MMF model for activity recognition, then, we will discuss the details of the custom handler and how to train your activity recognition model in MMF. ### Serving Activity Recognition MMF Model with Torchserve -This section, we have the trained MMF model for activity recognition, it can be served in production with [Torchserve](https://github.com/pytorch/serve). +This section, we have the trained MMF model for activity recognition, it can be served in production with [Torchserve](https://github.com/pytorch/serve). To serve a model using Torchserve, we need to bundle the model artifacts and a handler into a mar file which is an archive format that torchserve uses to serve our model, model_archiver package does this step. The mar file will get extracted in a temp directory and the Path will be added to the PYTHONPATH. @@ -41,7 +41,7 @@ After training the MMF model, the final checkpoints are saved in the mmf/save/ d `wget https://mmfartifacts.s3-us-west-2.amazonaws.com/MMF_activity_recognition.mar` - If mar file is downloaded then skip this and move to the next step. The other option is to download a pre-trained model, along with labels and config for this example and package them to a mar file. + If mar file is downloaded then skip this and move to the next step. The other option is to download a pre-trained model, along with labels and config for this example and package them to a mar file. ``` wget https://mmfartifacts.s3-us-west-2.amazonaws.com/mmf_transformer_Charades_final.pth @@ -55,7 +55,7 @@ torch-model-archiver --model-name MMF_activity_recognition --version 1.0 --seria Running the above commands will result in MMF_activity_recognition mar in the current directory. -Note as MMF uses torch.cuda.current_device() to decide if inputs are on correct device, we used device context manager in the handler. This means you won't be able to set the number_of_gpu to zero in this example, basically to serve this example on cpu, you will need to run on a cpu instance or masking the cuda devices using export CUDA_VISIBLE_DEVICES="". +Note as MMF uses torch.cuda.current_device() to decide if inputs are on correct device, we used device context manager in the handler. This means you won't be able to set the number_of_gpu to zero in this example, basically to serve this example on cpu, you will need to run on a cpu instance or masking the cuda devices using export CUDA_VISIBLE_DEVICES="". The **next step** is to make a model_store and move the .mar file to it: @@ -67,7 +67,7 @@ mv MMF_activity_recognition.mar model_store Now we can start serving our model: ``` -torchserve --start --model-store model_store +torchserve --start --model-store model_store --disable-token-auth --enable-model-api curl -X POST "localhost:8081/models?model_name=MMF_activity_recognition&url=MMF_activity_recognition.mar&batch_size=1&max_batch_delay=5000&initial_workers=1&synchronous=true" ``` @@ -85,14 +85,14 @@ This will write the results in response.txt in the current directory. #### Tochserve Custom Handler -For the activity recognition MMF model, we need to provide a custom handler. The handler generally extends the [Base handler](https://github.com/pytorch/serve/tree/master/ts/torch_handler/base_handler.py). 
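For reference, a handler that extends `BaseHandler` has roughly the shape sketched below; the class name and method bodies are illustrative placeholders rather than the actual MMF handler in this example, which does considerably more work in each of these steps.

```python
# Minimal sketch of a TorchServe custom handler built on BaseHandler.
# Illustrative only -- the MMF handler overrides the same methods but adds
# OmegaConf config loading and video/audio/text preprocessing.
import torch
from ts.torch_handler.base_handler import BaseHandler


class MyHandler(BaseHandler):  # hypothetical name
    def initialize(self, context):
        # Called once per worker: find artifacts in the extracted mar
        # directory and load the model onto the selected device.
        super().initialize(context)

    def preprocess(self, data):
        # Turn the raw request payload into model-ready tensors.
        return super().preprocess(data)

    def inference(self, data, *args, **kwargs):
        # Run the preprocessed batch through the loaded model.
        with torch.no_grad():
            return self.model(data)

    def postprocess(self, inference_output):
        # Return a JSON-serializable list with one entry per request.
        return inference_output.tolist()
```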
The [handler](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py) for the MMF model needs to load and initialize the model in the initialize method; then, in preprocess, it mimics the logic of the dataset processors to build a sample from the input video and its related text (it preprocesses the video and builds the related video, audio and text tensors). The inference method runs the preprocessed samples through the MMF model and sends the outputs to the postprocess method. To initialize the MMF model in the [initialization method](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py#L65), there are a few points to consider. We need to load the [config](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py#L68) using OmegaConf; then the [setup_very_basic_config](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py#L70) function from mmf.utils.logger and [setup_imports](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py#L71) from mmf.utils.env need to be called to set up the environment for loading the model. Finally, to load the [model](https://github.com/pytorch/serve/tree/master/examples/MMF-activity-recognition/handler.py#L72), we pass the model config to the [MMFTransformer](https://github.com/facebookresearch/mmf/tree/master/mmf/models/mmf_transformer.py) model. ### Activity Recognition from Videos using MMF We are going to present an example of activity recognition on the Charades video dataset. In this example, three modalities are used to classify the activity in the video: image, audio and text. Images are frames extracted from the video, audio is extracted from the video, and text is the captions related to the frames in the video.
In this case, embeddings for each of the modalities are captured, and then the [MMFTransformer](https://github.com/facebookresearch/mmf/tree/master/mmf/models/mmf_transformer.py) from the model zoo is used to fuse the embeddings. There are a number of steps, based on the concepts discussed on the MMF website: @@ -100,25 +100,25 @@ We are going to present an example of activity recognition on Charades video dat 2. Define a model for our training, which is the MMFTransformer model in this case. 3. Set up configs for the dataset, the model and the experiment (configs for the training job). In the following we discuss each of the steps in more detail. #### New Dataset In this example the Charades dataset has been used, which is a video dataset added to the [dataset zoo](https://github.com/facebookresearch/mmf/tree/master/mmf/datasets/builders/charades). We can define a new dataset in MMF by following this [guide](https://mmf.sh/docs/tutorials/dataset). **To add a new dataset**, we need to define a new dataset class which extends the BaseDataset class from mmf.datasets.base_dataset and overrides three methods: __init__, __getitem__ and __len__. These methods define how to initialize the dataset (set the path to the dataset), get each item from the dataset, and provide the length of the dataset. The Charades dataset class can be found [here](https://github.com/facebookresearch/mmf/tree/master/mmf/datasets/builders/charades/dataset.py#L16). Also, we are able to set the [processors](https://github.com/facebookresearch/mmf/tree/master/mmf/configs/datasets/charades/defaults.yaml#L22) in the dataset config file and initialize them in the dataset class. The **next step** is to define a dataset builder class which extends the "BaseDatasetBuilder" class from mmf.datasets.base_dataset_builder.
In this class we essentially need to override three methods: __init__, __build__ and __load__. In the __init__ method, the dataset class name is set (as we defined in the previous step); the __build__ method is responsible for downloading the dataset; and the __load__ method takes care of loading the dataset, builds an object of a class inheriting "BaseDataset" which contains your dataset logic, and returns it. The dataset builder code is also available [here](https://github.com/facebookresearch/mmf/tree/master/mmf/datasets/builders/charades/builder.py). The **final step** is to register the dataset builder with MMF, where we can use the registry function as a decorator, such as @registry.register_builder("charades"). #### Model Definition To train a multimodal model, we need to define a model that takes the features from our modalities (using modality encoders) as inputs and trains for the task at hand, which in this example is activity recognition. For activity recognition, the task is essentially classification over activity labels. In this example, the [MMFTransformer](https://github.com/facebookresearch/mmf/tree/master/mmf/models/mmf_transformer.py) is used, which extends the [base transformers](https://github.com/facebookresearch/mmf/tree/master/mmf/models/transformers/base.py) model from mmf.models.transformers.base available in the MMF model zoo. Generally, any MMF model class needs to extend BaseModel from mmf.models.base_model, where we need to pass the configs in the __init__ method and implement the build and forward methods. The __init__ method takes the related config, and the build method builds all the essential modules used in the model, including the encoders. #### Configurations As indicated in the [MMF docs](https://mmf.sh/docs/notes/configuration), there are separate config files for datasets, models and experiments; all the configs can be found in the config directory.
For this example, [dataset config](https://github.com/facebookresearch/mmf/tree/master/mmf/configs/datasets/charades/defaults.yaml) sets the path to different sets (train/val/test), and processors and their related parameters. Similarly, [model config](https://github.com/facebookresearch/mmf/tree/master/mmf/configs/models/mmf_transformer/defaults.yaml) can set the specifics for the model including in hierarchical mode , different modalities, the encoder used in each modality, the model head type, loss function and etc. [Experiment config](https://github.com/facebookresearch/mmf/tree/master/projects/mmf_transformer/configs/charades/direct.yaml) is where setting for experiment such as optimizer specifics, scheduler, evaluation metrics, training parameters such as batch-size, number of iterations and so on can be configured. #### Running the experiment @@ -129,4 +129,3 @@ mmf_run config=projects/mmf_transformer/configs/charades/direct.yaml run_type=t ``` Settings for each of the training parameters can be specified from command line as well. At the end of the training, the checkpoints are saved in mmf/save directory. We will use the saved checkpoints in the for serving the model. - diff --git a/examples/Workflows/dog_breed_classification/README.md b/examples/Workflows/dog_breed_classification/README.md index 2159b488bd..2babc69336 100644 --- a/examples/Workflows/dog_breed_classification/README.md +++ b/examples/Workflows/dog_breed_classification/README.md @@ -24,7 +24,7 @@ $ torch-workflow-archiver -f --workflow-name dog_breed_wf --spec-file workflow_d ## Serve the workflow ``` -$ torchserve --start --model-store model_store/ --workflow-store wf_store/ --ncs +$ torchserve --start --model-store model_store/ --workflow-store wf_store/ --ncs --disable-token-auth --enable-model-api $ curl -X POST "http://127.0.0.1:8081/workflows?url=dog_breed_wf.war" { "status": "Workflow dog_breed_wf has been registered and scaled successfully." 
diff --git a/examples/Workflows/nmt_transformers_pipeline/README.md b/examples/Workflows/nmt_transformers_pipeline/README.md index 02118d4168..5c2d733c0a 100644 --- a/examples/Workflows/nmt_transformers_pipeline/README.md +++ b/examples/Workflows/nmt_transformers_pipeline/README.md @@ -31,7 +31,7 @@ $ mkdir model_store wf_store $ mv $TORCH_SERVE_DIR/examples/nmt_transformer/model_store/*.mar model_store/ $ torch-workflow-archiver -f --workflow-name nmt_wf_dual --spec-file nmt_workflow_dualtranslation.yaml --handler nmt_workflow_handler_dualtranslation.py --export-path wf_store/ $ torch-workflow-archiver -f --workflow-name nmt_wf_re --spec-file nmt_workflow_retranslation.yaml --handler nmt_workflow_handler_retranslation.py --export-path wf_store/ -$ torchserve --start --model-store model_store/ --workflow-store wf_store/ --ncs --ts-config config.properties +$ torchserve --start --model-store model_store/ --workflow-store wf_store/ --ncs --ts-config config.properties --disable-token-auth --enable-model-api ``` ## Serve the workflow diff --git a/examples/cloud_storage_stream_inference/README.md b/examples/cloud_storage_stream_inference/README.md index 722ca6df16..8606b39861 100644 --- a/examples/cloud_storage_stream_inference/README.md +++ b/examples/cloud_storage_stream_inference/README.md @@ -65,7 +65,7 @@ torch-model-archiver --model-name BERTSeqClassification --version 1.0 --serializ ``` mkdir model_store mv BERTSeqClassification.mar model_store/ -torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ts-config=config.properties --ncs +torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ts-config=config.properties --ncs --disable-token-auth --enable-model-api ``` 15) To check if the model is running diff --git a/examples/cloudformation/README.md b/examples/cloudformation/README.md index 1150bdc427..a14f2a6719 100644 --- a/examples/cloudformation/README.md +++ b/examples/cloudformation/README.md @@ -176,7 +176,7 @@ conda init bash # IMPORTANT: You may need to close and restart your shell after running 'conda init'. conda activate torchserve torchserve --stop -torchserve --start --model-store ./model_store --ts-config /etc/torchserve/config.properties +torchserve --start --model-store ./model_store --ts-config /etc/torchserve/config.properties --disable-token-auth --enable-model-api ``` * To terminate the instance and delete the stack you can run `aws cloudformation delete-stack --stack-name ` diff --git a/examples/dcgan_fashiongen/Readme.md b/examples/dcgan_fashiongen/Readme.md index f2c595a0b7..3df48aa959 100644 --- a/examples/dcgan_fashiongen/Readme.md +++ b/examples/dcgan_fashiongen/Readme.md @@ -1,6 +1,6 @@ # GAN(Generative Adversarial Networks) models using TorchServe - In this example we will demonstrate how to serve a GAN model using TorchServe. -- We have used a pretrained DCGAN model from [facebookresearch/pytorch_GAN_zoo](https://github.com/facebookresearch/pytorch_GAN_zoo) +- We have used a pretrained DCGAN model from [facebookresearch/pytorch_GAN_zoo](https://github.com/facebookresearch/pytorch_GAN_zoo) (Introduction to [DCGAN on FashionGen](https://pytorch.org/hub/facebookresearch_pytorch-gan-zoo_dcgan/)) ### 1. Create a Torch Model Archive @@ -13,13 +13,13 @@ The [create_mar.sh](create_mar.sh) script does the following : - Download a checkpoint file [DCGAN_fashionGen-1d67302.pth](https://dl.fbaipublicfiles.com/gan_zoo/DCGAN_fashionGen-1d67302.pth). 
(`--serialized-file`) - Provide a custom handler - [dcgan_fashiongen_handler.py](dcgan_fashiongen_handler.py). (`--handler`) -Alternatively, you can directly [download the dcgan_fashiongen.mar](https://torchserve.s3.amazonaws.com/mar_files/dcgan_fashiongen.mar) +Alternatively, you can directly [download the dcgan_fashiongen.mar](https://torchserve.s3.amazonaws.com/mar_files/dcgan_fashiongen.mar) ### 2. Start TorchServe and Register Model ``` mkdir modelstore mv dcgan_fashiongen.mar modelstore/ -torchserve --start --ncs --model-store ./modelstore --models dcgan_fashiongen.mar +torchserve --start --ncs --model-store ./modelstore --models dcgan_fashiongen.mar --disable-token-auth --enable-model-api ``` ### 3. Generate Images @@ -34,19 +34,19 @@ Invoke the predictions API and pass following payload(JSON) ``` curl -X POST -d '{"number_of_images":1}' -H "Content-Type: application/json" http://localhost:8080/predictions/dcgan_fashiongen -o img1.jpg ``` - > Result image should be similar to the one below - + > Result image should be similar to the one below - > ![Sample Image 1](sample-output/img1.jpg) 2. **Create '64' images of 'Men' wearing 'Shirts' in 'id_gridfs_1' pose** ``` curl -X POST -d '{"number_of_images":64, "input_gender":"Men", "input_category":"SHIRTS", "input_pose":"id_gridfs_1"}' -H "Content-Type: application/json" http://localhost:8080/predictions/dcgan_fashiongen -o img2.jpg ``` - > Result image should be similar to the one below - + > Result image should be similar to the one below - > ![Sample Image 2](sample-output/img2.jpg) 3. **Create '32' images of 'Women' wearing 'Dresses' in 'id_gridfs_3' pose** ``` curl -X POST -d '{"number_of_images":32, "input_gender":"Women", "input_category":"DRESSES", "input_pose":"id_gridfs_3"}' -H "Content-Type: application/json" http://localhost:8080/predictions/dcgan_fashiongen -o img3.jpg ``` - > Result image should be similar to the one below - + > Result image should be similar to the one below - > ![Sample Image 3](sample-output/img3.jpg) diff --git a/examples/diffusers/Readme.md b/examples/diffusers/Readme.md index 56d1b908e7..97ee301ccb 100644 --- a/examples/diffusers/Readme.md +++ b/examples/diffusers/Readme.md @@ -39,7 +39,7 @@ torch-model-archiver --model-name stable-diffusion --version 1.0 --handler stabl Update config.properties and start torchserve ```bash -torchserve --start --ts-config config.properties +torchserve --start --ts-config config.properties --disable-token-auth --enable-model-api ``` ### Step 5: Run inference diff --git a/examples/image_classifier/alexnet/README.md b/examples/image_classifier/alexnet/README.md index e29adc9179..c5ffa69a8e 100644 --- a/examples/image_classifier/alexnet/README.md +++ b/examples/image_classifier/alexnet/README.md @@ -7,7 +7,7 @@ wget https://download.pytorch.org/models/alexnet-owt-7be5be79.pth torch-model-archiver --model-name alexnet --version 1.0 --model-file ./serve/examples/image_classifier/alexnet/model.py --serialized-file alexnet-owt-7be5be79.pth --handler image_classifier --extra-files ./serve/examples/image_classifier/index_to_name.json mkdir model_store mv alexnet.mar model_store/ -torchserve --start --model-store model_store --models alexnet=alexnet.mar +torchserve --start --model-store model_store --models alexnet=alexnet.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classifier/kitten.jpg ``` @@ -35,14 +35,14 @@ curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classif example_input = 
torch.rand(1, 3, 224, 224) traced_script_module = torch.jit.trace(model, example_input) traced_script_module.save("alexnet.pt") - ``` - + ``` + * Use following commands to register alexnet torchscript model on TorchServe and run image prediction ```bash torch-model-archiver --model-name alexnet --version 1.0 --serialized-file alexnet.pt --extra-files ./serve/examples/image_classifier/index_to_name.json --handler image_classifier mkdir model_store mv alexnet.mar model_store/ - torchserve --start --model-store model_store --models alexnet=alexnet.mar + torchserve --start --model-store model_store --models alexnet=alexnet.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/alexnet -T ./serve/examples/image_classifier/kitten.jpg ``` diff --git a/examples/image_classifier/densenet_161/README.md b/examples/image_classifier/densenet_161/README.md index 54c805970a..911db41dca 100644 --- a/examples/image_classifier/densenet_161/README.md +++ b/examples/image_classifier/densenet_161/README.md @@ -12,7 +12,7 @@ Sample command to start torchserve with torch.compile: wget https://download.pytorch.org/models/densenet161-8d451a50.pth mkdir model_store torch-model-archiver --model-name densenet161 --version 1.0 --model-file model.py --serialized-file densenet161-8d451a50.pth --export-path model_store --extra-files ../../image_classifier/index_to_name.json --handler image_classifier --config-file model-config.yaml -f -torchserve --start --ncs --model-store model_store --models densenet161.mar +torchserve --start --ncs --model-store model_store --models densenet161.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/densenet161 -T ../../image_classifier/kitten.jpg ``` @@ -38,7 +38,7 @@ wget https://download.pytorch.org/models/densenet161-8d451a50.pth torch-model-archiver --model-name densenet161 --version 1.0 --model-file examples/image_classifier/densenet_161/model.py --serialized-file densenet161-8d451a50.pth --handler image_classifier --extra-files examples/image_classifier/index_to_name.json mkdir model_store mv densenet161.mar model_store/ -torchserve --start --model-store model_store --models densenet161=densenet161.mar +torchserve --start --model-store model_store --models densenet161=densenet161.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg ``` @@ -74,6 +74,6 @@ traced_script_module.save("densenet161.pt") torch-model-archiver --model-name densenet161_ts --version 1.0 --serialized-file densenet161.pt --extra-files examples/image_classifier/index_to_name.json --handler image_classifier mkdir model_store mv densenet161_ts.mar model_store/ -torchserve --start --model-store model_store --models densenet161=densenet161_ts.mar +torchserve --start --model-store model_store --models densenet161=densenet161_ts.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/densenet161 -T examples/image_classifier/kitten.jpg ``` diff --git a/examples/image_classifier/mnist/README.md b/examples/image_classifier/mnist/README.md index 6f09e036ad..c9494842a8 100644 --- a/examples/image_classifier/mnist/README.md +++ b/examples/image_classifier/mnist/README.md @@ -20,31 +20,31 @@ Run the commands given in following steps from the parent directory of the root * Step - 2: Train a MNIST digit recognition model using https://github.com/pytorch/examples/blob/master/mnist/main.py and save the state dict of model. 
We have added the pre-created [state dict](mnist_cnn.pt) of this model. * Step - 3: Write a custom handler to run the inference on your model. In this example, we have added a [custom_handler](mnist_handler.py) which runs the inference on the input grayscale images using the above model and recognizes the digit in the image. * Step - 4: Create a torch model archive using the torch-model-archiver utility to archive the above files. - + ```bash torch-model-archiver --model-name mnist --version 1.0 --model-file examples/image_classifier/mnist/mnist.py --serialized-file examples/image_classifier/mnist/mnist_cnn.pt --handler examples/image_classifier/mnist/mnist_handler.py ``` - + Step 5 is optional. Perform this step to use pytorch profiler - + * Step - 5: To enable pytorch profiler, set the following environment variable. - + ``` export ENABLE_TORCH_PROFILER=true ``` - + * Step - 6: Register the model on TorchServe using the above model archive file and run digit recognition inference - + ```bash mkdir model_store mv mnist.mar model_store/ - torchserve --start --model-store model_store --models mnist=mnist.mar --ts-config config.properties + torchserve --start --model-store model_store --models mnist=mnist.mar --ts-config config.properties --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png ``` # Profiling inference output -The profiler information is printed in the torchserve logs / console +The profiler information is printed in the torchserve logs / console ![Profiler Stats](screenshots/mnist_profiler_stats.png) @@ -52,7 +52,7 @@ By default the pytorch profiler trace files are generated under "/tmp/pytorch_pr The path can be overridden by setting `on_trace_ready` parameter in `profiler_args` - [Example here](../../../test/pytest/profiler_utils/resnet_profiler_override.py) -And the trace files can be loaded in tensorboard using torch-tb-profiler. Check the following link for more information - https://github.com/pytorch/kineto/tree/main/tb_plugin +And the trace files can be loaded in tensorboard using torch-tb-profiler. 
Check the following link for more information - https://github.com/pytorch/kineto/tree/main/tb_plugin Install torch-tb-profiler and run the following command to view the results in UI diff --git a/examples/image_classifier/near_real_time_video/README.md b/examples/image_classifier/near_real_time_video/README.md index d7abde2321..3c97a535d9 100644 --- a/examples/image_classifier/near_real_time_video/README.md +++ b/examples/image_classifier/near_real_time_video/README.md @@ -41,7 +41,7 @@ Run the commands given in following steps from the parent directory of the root ```bash python examples/image_classifier/near_real_time_video/create_mar_file.py -torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --ts-config examples/image_classifier/near_real_time_video/config.properties +torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --ts-config examples/image_classifier/near_real_time_video/config.properties --disable-token-auth --enable-model-api python examples/image_classifier/near_real_time_video/request.py ``` @@ -95,7 +95,7 @@ Run the commands given in following steps from the parent directory of the root ```bash python examples/image_classifier/near_real_time_video/create_mar_file.py --client-batching -torchserve --start --model-store model_store --models resnet-18=resnet-18.mar +torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --disable-token-auth --enable-model-api python examples/image_classifier/near_real_time_video/request.py --client-batching ``` diff --git a/examples/image_classifier/resnet_152_batch/README.md b/examples/image_classifier/resnet_152_batch/README.md index d63a18d5f9..7f5e7341dd 100644 --- a/examples/image_classifier/resnet_152_batch/README.md +++ b/examples/image_classifier/resnet_152_batch/README.md @@ -6,7 +6,7 @@ wget https://download.pytorch.org/models/resnet152-394f9c45.pth torch-model-archiver --model-name resnet-152-batch --version 1.0 --model-file examples/image_classifier/resnet_152_batch/model.py --serialized-file resnet152-394f9c45.pth --handler image_classifier --extra-files examples/image_classifier/index_to_name.json mkdir model-store mv resnet-152-batch.mar model-store/ -torchserve --start --model-store model-store +torchserve --start --model-store model-store --disable-token-auth --enable-model-api curl -X POST "localhost:8081/models?model_name=resnet152&url=resnet-152-batch.mar&batch_size=4&max_batch_delay=5000&initial_workers=3&synchronous=true" ``` @@ -49,7 +49,7 @@ curl http://127.0.0.1:8080/predictions/resnet152 -T examples/image_classifier/re example_input = torch.rand(1, 3, 224, 224) traced_script_module = torch.jit.trace(model, example_input) traced_script_module.save("resnet-152-batch.pt") - ``` + ``` * For batch inference you need to set the batch size while registering the model. This can be done either through the management API or if using Torchserve 0.4.1 and above, it can be set through config.properties as well. Here is how to register Resnet152-batch torchscript with batch size setting with management API and through config.properties. You can read more on batch inference in Torchserve [here](https://github.com/pytorch/serve/tree/master/docs/batch_inference_with_ts.md). 
@@ -60,7 +60,7 @@ curl http://127.0.0.1:8080/predictions/resnet152 -T examples/image_classifier/re torch-model-archiver --model-name resnet-152-batch --version 1.0 --serialized-file resnet-152-batch.pt --extra-files examples/image_classifier/index_to_name.json --handler image_classifier mkdir model_store mv resnet-152-batch.mar model_store/ - torchserve --start --model-store model_store --models resnet_152=resnet-152-batch.mar + torchserve --start --model-store model_store --models resnet_152=resnet-152-batch.mar --disable-token-auth --enable-model-api curl -X POST "localhost:8081/models?model_name=resnet152&url=resnet-152-batch.mar&batch_size=4&max_batch_delay=5000&initial_workers=3&synchronous=true" ``` @@ -80,9 +80,9 @@ curl http://127.0.0.1:8080/predictions/resnet152 -T examples/image_classifier/re }\ }\ } - ``` + ``` ```bash - torchserve --start --model-store model_store --ts-config config.properties + torchserve --start --model-store model_store --ts-config config.properties --disable-token-auth --enable-model-api ``` * To test batch inference execute the following commands within the specified max_batch_delay time : diff --git a/examples/image_classifier/resnet_18/README.md b/examples/image_classifier/resnet_18/README.md index 60e9b6bed4..9799f072fb 100644 --- a/examples/image_classifier/resnet_18/README.md +++ b/examples/image_classifier/resnet_18/README.md @@ -7,7 +7,7 @@ wget https://download.pytorch.org/models/resnet18-f37072fd.pth torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json mkdir model_store mv resnet-18.mar model_store/ -torchserve --start --model-store model_store --models resnet-18=resnet-18.mar +torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/resnet-18 -T ./examples/image_classifier/kitten.jpg ``` @@ -37,7 +37,7 @@ wget https://download.pytorch.org/models/resnet18-f37072fd.pth torch-model-archiver --model-name resnet-18 --version 1.0 --model-file model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ../index_to_name.json --config-file model-config.yaml mkdir model_store mv resnet-18.mar model_store/ -torchserve --start --model-store model_store --models resnet-18=resnet-18.mar +torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/resnet-18 -T ../kitten.jpg ``` @@ -84,7 +84,7 @@ produces the output torch-model-archiver --model-name resnet-18 --version 1.0 --serialized-file resnet-18.pt --extra-files ./serve/examples/image_classifier/index_to_name.json --handler image_classifier mkdir model_store mv resnet-18.mar model_store/ - torchserve --start --model-store model_store --models resnet-18=resnet-18.mar + torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/resnet-18 -T ./serve/examples/image_classifier/kitten.jpg ``` diff --git a/examples/image_classifier/resnet_18/ReactJSExample/README.md b/examples/image_classifier/resnet_18/ReactJSExample/README.md index 865ab9694c..7e466c8a89 100644 --- a/examples/image_classifier/resnet_18/ReactJSExample/README.md +++ b/examples/image_classifier/resnet_18/ReactJSExample/README.md @@ -19,7 
+19,7 @@ wget https://download.pytorch.org/models/resnet18-f37072fd.pth torch-model-archiver --model-name resnet-18 --version 1.0 --model-file ./examples/image_classifier/resnet_18/model.py --serialized-file resnet18-f37072fd.pth --handler image_classifier --extra-files ./examples/image_classifier/index_to_name.json mkdir model_store mv resnet-18.mar model_store/ -torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --ts-config config.properties +torchserve --start --model-store model_store --models resnet-18=resnet-18.mar --ts-config config.properties --disable-token-auth --enable-model-api ``` diff --git a/examples/image_classifier/squeezenet/README.md b/examples/image_classifier/squeezenet/README.md index ec2292ce76..f0253b5ef8 100644 --- a/examples/image_classifier/squeezenet/README.md +++ b/examples/image_classifier/squeezenet/README.md @@ -7,7 +7,7 @@ wget https://download.pytorch.org/models/squeezenet1_1-b8a52dc0.pth torch-model-archiver --model-name squeezenet1_1 --version 1.0 --model-file examples/image_classifier/squeezenet/model.py --serialized-file squeezenet1_1-b8a52dc0.pth --handler image_classifier --extra-files examples/image_classifier/index_to_name.json mkdir model_store mv squeezenet1_1.mar model_store/ -torchserve --start --model-store model_store --models squeezenet1_1=squeezenet1_1.mar +torchserve --start --model-store model_store --models squeezenet1_1=squeezenet1_1.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/squeezenet1_1 -T examples/image_classifier/kitten.jpg ``` @@ -43,6 +43,6 @@ curl http://127.0.0.1:8080/predictions/squeezenet1_1 -T examples/image_classifie torch-model-archiver --model-name squeezenet1_1 --version 1.0 --serialized-file squeezenet1_1.pt --extra-files examples/image_classifier/index_to_name.json --handler image_classifier mkdir model_store mv squeezenet1_1.mar model_store/ -torchserve --start --model-store model_store --models squeezenet1_1=squeezenet1_1.mar +torchserve --start --model-store model_store --models squeezenet1_1=squeezenet1_1.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/squeezenet1_1 -T examples/image_classifier/kitten.jpg ``` diff --git a/examples/image_classifier/vgg_16/README.md b/examples/image_classifier/vgg_16/README.md index fa52c4ced0..aba5d1d37e 100644 --- a/examples/image_classifier/vgg_16/README.md +++ b/examples/image_classifier/vgg_16/README.md @@ -12,7 +12,7 @@ wget https://download.pytorch.org/models/vgg16-397923af.pth torch-model-archiver --model-name vgg16 --version 1.0 --model-file ./examples/image_classifier/vgg_16/model.py --serialized-file vgg16-397923af.pth --handler ./examples/image_classifier/vgg_16/vgg_handler.py --extra-files ./examples/image_classifier/index_to_name.json --config-file ./examples/image_classifier/vgg_16/model-config.yaml -f mkdir model_store mv vgg16.mar model_store/vgg16_compiled.mar -torchserve --start --model-store model_store --models vgg16=vgg16_compiled.mar +torchserve --start --model-store model_store --models vgg16=vgg16_compiled.mar --disable-token-auth --enable-model-api ``` Now in another terminal, run @@ -42,7 +42,7 @@ wget https://download.pytorch.org/models/vgg16-397923af.pth torch-model-archiver --model-name vgg16 --version 1.0 --model-file ./examples/image_classifier/vgg_16/model.py --serialized-file vgg16-397923af.pth --handler ./examples/image_classifier/vgg_16/vgg_handler.py --extra-files ./examples/image_classifier/index_to_name.json mkdir model_store mv vgg16.mar 
model_store/ -torchserve --start --model-store model_store --models vgg16=vgg16.mar +torchserve --start --model-store model_store --models vgg16=vgg16.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/vgg16 -T ./examples/image_classifier/kitten.jpg ``` @@ -70,14 +70,14 @@ curl http://127.0.0.1:8080/predictions/vgg16 -T ./examples/image_classifier/kitt example_input = torch.rand(1, 3, 224, 224) traced_script_module = torch.jit.trace(model, example_input) traced_script_module.save("vgg16.pt") - ``` - + ``` + * Use following commands to register vgg16 torchscript model on TorchServe and run image prediction ```bash torch-model-archiver --model-name vgg16 --version 1.0 --serialized-file vgg16.pt --extra-files ./examples/image_classifier/index_to_name.json --handler ./examples/image_classifier/vgg_16/vgg_handler.py mkdir model_store mv vgg16.mar model_store/ - torchserve --start --model-store model_store --models vgg16=vgg16.mar + torchserve --start --model-store model_store --models vgg16=vgg16.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/vgg16 -T ./serve/examples/image_classifier/kitten.jpg ``` diff --git a/examples/image_segmenter/deeplabv3/README.md b/examples/image_segmenter/deeplabv3/README.md index 88ac1c1ce3..4e9ff8db38 100644 --- a/examples/image_segmenter/deeplabv3/README.md +++ b/examples/image_segmenter/deeplabv3/README.md @@ -14,7 +14,7 @@ wget https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth torch-model-archiver --model-name deeplabv3_resnet_101 --version 1.0 --model-file examples/image_segmenter/deeplabv3/model.py --serialized-file deeplabv3_resnet101_coco-586e9e4e.pth --handler image_segmenter --extra-files examples/image_segmenter/deeplabv3/deeplabv3.py,examples/image_segmenter/deeplabv3/intermediate_layer_getter.py,examples/image_segmenter/deeplabv3/fcn.py mkdir model_store mv deeplabv3_resnet_101.mar model_store/ - torchserve --start --model-store model_store --models deeplabv3=deeplabv3_resnet_101.mar + torchserve --start --model-store model_store --models deeplabv3=deeplabv3_resnet_101.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/deeplabv3 -T examples/image_segmenter/persons.jpg ``` * Output @@ -22,4 +22,4 @@ An array of shape [Batch, Height, Width, 2] where the final dimensions are [clas ```json [[[0.0, 0.9988763332366943], [0.0, 0.9988763332366943], [0.0, 0.9988763332366943], [0.0, 0.9988763332366943], [0.0, 0.9988666772842407], [0.0, 0.9988440275192261], [0.0, 0.9988170862197876], [0.0, 0.9987859725952148] ... 
]] -``` \ No newline at end of file +``` diff --git a/examples/image_segmenter/fcn/README.md b/examples/image_segmenter/fcn/README.md index ddf8250a02..332483339f 100644 --- a/examples/image_segmenter/fcn/README.md +++ b/examples/image_segmenter/fcn/README.md @@ -14,7 +14,7 @@ wget https://download.pytorch.org/models/fcn_resnet101_coco-7ecb50ca.pth torch-model-archiver --model-name fcn_resnet_101 --version 1.0 --model-file examples/image_segmenter/fcn/model.py --serialized-file fcn_resnet101_coco-7ecb50ca.pth --handler image_segmenter --extra-files examples/image_segmenter/fcn/fcn.py,examples/image_segmenter/fcn/intermediate_layer_getter.py mkdir model_store mv fcn_resnet_101.mar model_store/ - torchserve --start --model-store model_store --models fcn=fcn_resnet_101.mar + torchserve --start --model-store model_store --models fcn=fcn_resnet_101.mar --disable-token-auth --enable-model-api curl http://127.0.0.1:8080/predictions/fcn -T examples/image_segmenter/persons.jpg ``` * Output @@ -22,4 +22,4 @@ An array of shape [Batch, Height, Width, 2] where the final dimensions are [clas ```json [[[0.0, 0.9993857145309448], [0.0, 0.9993857145309448], [0.0, 0.9993857145309448], [0.0, 0.9993857145309448], [0.0, 0.9993864297866821], [0.0, 0.999385416507721], [0.0, 0.9993811845779419], [0.0, 0.9993740320205688] ... ]] -``` \ No newline at end of file +``` diff --git a/examples/intel_extension_for_pytorch/README.md b/examples/intel_extension_for_pytorch/README.md index 46310e1cb3..5ff07b8563 100644 --- a/examples/intel_extension_for_pytorch/README.md +++ b/examples/intel_extension_for_pytorch/README.md @@ -55,7 +55,7 @@ Below is an example of passing multiple args to `cpu_launcher_args`. ``` ipex_enable=true cpu_launcher_enable=true -cpu_launcher_args=--use_logical_core --disable_numactl +cpu_launcher_args=--use_logical_core --disable_numactl ``` Below are some useful `cpu_launcher_args` to note. Italic values are default if applicable. @@ -68,8 +68,8 @@ Below are some useful `cpu_launcher_args` to note. Italic values are default if Refer to [Launch Script Usage Guide](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/launch_script.md) for a full list of tunable configuration of launcher. And refer to [Performance Tuning Guide](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/performance_tuning/tuning_guide.md) for more details. -### Launcher Core Pinning to Boost Performance of TorchServe Multi Worker Inference -When running [multi-worker inference](https://pytorch.org/serve/management_api.html#scale-workers) with Torchserve, launcher pin cores to workers to boost performance. Internally, launcher equally divides the number of cores by the number of workers such that each worker is pinned to assigned cores. Doing so avoids core overlap among workers which can significantly boost performance for TorchServe multi-worker inference. For example, assume running 4 workers on a machine with Intel(R) Xeon(R) Platinum 8180 CPU, 2 sockets, 28 cores per socket, 2 threads per core. Launcher will bind worker 0 to cores 0-13, worker 1 to cores 14-27, worker 2 to cores 28-41, and worker 3 to cores 42-55. +### Launcher Core Pinning to Boost Performance of TorchServe Multi Worker Inference +When running [multi-worker inference](https://pytorch.org/serve/management_api.html#scale-workers) with Torchserve, launcher pin cores to workers to boost performance. 
Internally, launcher equally divides the number of cores by the number of workers such that each worker is pinned to assigned cores. Doing so avoids core overlap among workers which can significantly boost performance for TorchServe multi-worker inference. For example, assume running 4 workers on a machine with Intel(R) Xeon(R) Platinum 8180 CPU, 2 sockets, 28 cores per socket, 2 threads per core. Launcher will bind worker 0 to cores 0-13, worker 1 to cores 14-27, worker 2 to cores 28-41, and worker 3 to cores 42-55. CPU usage is shown below. 4 main worker threads were launched, each launching 14 threads affinitized to the assigned physical cores. ![26](https://user-images.githubusercontent.com/93151422/170373651-fd8a0363-febf-4528-bbae-e1ddef119358.gif) @@ -77,9 +77,9 @@ CPU usage is shown below. 4 main worker threads were launched, each launching 14 #### Scaling workers -Additionally when dynamically [scaling the number of workers](https://pytorch.org/serve/management_api.html#scale-workers), cores that were pinned to killed workers by the launcher could be left unutilized. To address this problem, launcher internally restarts the workers to re-distribute cores that were pinned to killed workers to the remaining, alive workers. This is taken care internally, so users do not have to worry about this. +Additionally when dynamically [scaling the number of workers](https://pytorch.org/serve/management_api.html#scale-workers), cores that were pinned to killed workers by the launcher could be left unutilized. To address this problem, launcher internally restarts the workers to re-distribute cores that were pinned to killed workers to the remaining, alive workers. This is taken care internally, so users do not have to worry about this. -Continuing with the above example with 4 workers, assume killing workers 2 and 3. If cores were not re-distributed after the scale down, cores 28-55 would be left unutilized. Instead, launcher re-distributes cores 28-55 to workers 0 and 1 such that now worker 0 binds to cores 0-27 and worker 1 binds to cores 28-55.2 +Continuing with the above example with 4 workers, assume killing workers 2 and 3. If cores were not re-distributed after the scale down, cores 28-55 would be left unutilized. Instead, launcher re-distributes cores 28-55 to workers 0 and 1 such that now worker 0 binds to cores 0-27 and worker 1 binds to cores 28-55.2 CPU usage is shown below. 4 main worker threads were initially launched. Then after scaling down the number of workers from 4 to 2, 2 main worker threads were launched, each launching 28 threads affinitized to the assigned physical cores. ![worker_scaling](https://user-images.githubusercontent.com/93151422/170374697-7497c2d5-4c17-421b-9993-1434d1f722f6.gif) @@ -88,7 +88,7 @@ CPU usage is shown below. 4 main worker threads were initially launched. Then af Again, all it needs to use TorchServe with launcher core pinning for multiple workers as well as scaling workers is to set its configuration in `config.properties`. -Add the following lines in `config.properties` to use launcher with its default configuration. +Add the following lines in `config.properties` to use launcher with its default configuration. 
``` cpu_launcher_enable=true ``` @@ -99,13 +99,13 @@ TorchServe can also leverage Intel GPU for acceleration, providing additional pe ### Installation and Setup for Intel GPU Support -**Install Intel oneAPI Base Kit:** +**Install Intel oneAPI Base Kit:** Follow the installation instructions for your operating system from the [Intel oneAPI Base kit Installation](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.htm). -**Install the ipex GPU package to enable TorchServe to utilize Intel GPU for acceleration:** +**Install the ipex GPU package to enable TorchServe to utilize Intel GPU for acceleration:** Follow the installation instructions for your operating system from the [ Intel® Extension for PyTorch* XPU/GPU Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu). -**Activate the Intel oneAPI Base Kit:** +**Activate the Intel oneAPI Base Kit:** Activate the Intel oneAPI Base Kit using the following command: ```bash source /path/to/oneapi/setvars.sh @@ -114,7 +114,7 @@ Activate the Intel oneAPI Base Kit using the following command: **Install xpu-smi:** Install xpu-smi to let torchserve detect the number of Intel GPU devices present. xpu-smi provides information about the Intel GPU, including temperature, utilization, and other metrics.[xpu-smi Installation Guide](https://dgpu-docs.intel.com/driver/installation.html#ubuntu-package-repository) -**Enable Intel GPU Support in TorchServe:** +**Enable Intel GPU Support in TorchServe:** To enable TorchServe to use Intel GPUs, set the following configuration in `config.properties`: ``` ipex_enable=true @@ -183,8 +183,8 @@ Note: The optimal configuration will vary depending on the hardware used. ## Creating and Exporting INT8 model for Intel® Extension for PyTorch* Intel® Extension for PyTorch* supports both eager and torchscript mode. In this section, we show how to deploy INT8 model for Intel® Extension for PyTorch*. Refer to [here](https://github.com/intel/intel-extension-for-pytorch/blob/master/docs/tutorials/features/int8_overview.md) for more details on Intel® Extension for PyTorch* optimizations for quantization. -### 1. Creating a serialized file -First create `.pt` serialized file using Intel® Extension for PyTorch* INT8 inference. Here we show two examples with BERT and ResNet50. +### 1. Creating a serialized file +First create `.pt` serialized file using Intel® Extension for PyTorch* INT8 inference. Here we show two examples with BERT and ResNet50. #### BERT @@ -245,11 +245,11 @@ qconfig = ipex.quantization.default_static_qconfig # prepare and calibrate model = prepare(model, qconfig, example_inputs=dummy_tensor, inplace=False) - + n_iter = 100 for i in range(n_iter): model(dummy_tensor) - + # convert and deploy model = convert(model) @@ -260,23 +260,23 @@ with torch.no_grad(): torch.jit.save(model, 'rn50_int8_jit.pt') ``` -### 2. Creating a Model Archive +### 2. Creating a Model Archive Once the serialized file ( `.pt`) is created, it can be used with `torch-model-archiver` as usual. -Use the following command to package `rn50_int8_jit.pt` into `rn50_ipex_int8.mar`. +Use the following command to package `rn50_int8_jit.pt` into `rn50_ipex_int8.mar`. 
``` torch-model-archiver --model-name rn50_ipex_int8 --version 1.0 --serialized-file rn50_int8_jit.pt --handler image_classifier ``` -Similarly, use the following command in the [Huggingface_Transformers directory](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) to package `bert_int8_jit.pt` into `bert_ipex_int8.mar`. +Similarly, use the following command in the [Huggingface_Transformers directory](https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers) to package `bert_int8_jit.pt` into `bert_ipex_int8.mar`. ``` torch-model-archiver --model-name bert_ipex_int8 --version 1.0 --serialized-file bert_int8_jit.pt --handler ./Transformer_handler_generalized.py --extra-files "./setup_config.json,./Seq_classification_artifacts/index_to_name.json" ``` -### 3. Start TorchServe to serve the model -Make sure to set `ipex_enable=true` in `config.properties`. Use the following command to start TorchServe with Intel® Extension for PyTorch*. +### 3. Start TorchServe to serve the model +Make sure to set `ipex_enable=true` in `config.properties`. Use the following command to start TorchServe with Intel® Extension for PyTorch*. ``` -torchserve --start --ncs --model-store model_store --ts-config config.properties +torchserve --start --ncs --model-store model_store --ts-config config.properties --disable-token-auth --enable-model-api ``` ### 4. Registering and Deploying model @@ -329,11 +329,11 @@ $ cat logs/model_log.log ### Benchmarking with Launcher Core Pinning As described previously in [TorchServe with Launcher](#torchserve-with-launcher), launcher core pinning boosts performance of multi-worker inference. We'll demonstrate launcher core pinning with TorchServe benchmark, but keep in mind that launcher core pinning is a generic feature applicable to any TorchServe multi-worker inference use case. -For example, assume running 4 workers +For example, assume running 4 workers ``` python benchmark-ab.py --workers 4 ``` -on a machine with Intel(R) Xeon(R) Platinum 8180 CPU, 2 sockets, 28 cores per socket, 2 threads per core. Launcher will bind worker 0 to cores 0-13, worker 1 to cores 14-27, worker 2 to cores 28-41, and worker 3 to cores 42-55. +on a machine with Intel(R) Xeon(R) Platinum 8180 CPU, 2 sockets, 28 cores per socket, 2 threads per core. Launcher will bind worker 0 to cores 0-13, worker 1 to cores 14-27, worker 2 to cores 28-41, and worker 3 to cores 42-55. All it needs to use TorchServe with launcher's core pinning is to enable launcher in `config.properties`. @@ -345,7 +345,7 @@ cpu_launcher_enable=true CPU usage is shown as below: ![launcher_core_pinning](https://user-images.githubusercontent.com/93151422/159063975-e7e8d4b0-e083-4733-bdb6-4d92bdc10556.gif) -4 main worker threads were launched, then each launched a num_physical_cores/num_workers number (14) of threads affinitized to the assigned physical cores. +4 main worker threads were launched, then each launched a num_physical_cores/num_workers number (14) of threads affinitized to the assigned physical cores.

 $ cat logs/model_log.log
@@ -387,7 +387,7 @@ $ cat logs/model_log.log
 ![pdt_perf](https://user-images.githubusercontent.com/93151422/159067306-dfd604e3-8c66-4365-91ae-c99f68d972d5.png)
 
 
-Above shows performance improvement of Torchserve with Intel® Extension for PyTorch* and launcher on ResNet50 and BERT-base-uncased. Torchserve official [apache-bench benchmark](https://github.com/pytorch/serve/tree/master/benchmarks#benchmarking-with-apache-bench) on Amazon EC2 m6i.24xlarge was used to collect the results2. Add the following lines in ```config.properties``` to reproduce the results. Notice that launcher is configured such that a single instance uses all physical cores on a single socket to avoid cross socket communication and core overlap. 
+The chart above shows the performance improvement of Torchserve with Intel® Extension for PyTorch* and launcher on ResNet50 and BERT-base-uncased. The official Torchserve [apache-bench benchmark](https://github.com/pytorch/serve/tree/master/benchmarks#benchmarking-with-apache-bench) on Amazon EC2 m6i.24xlarge was used to collect the results2. Add the following lines in ```config.properties``` to reproduce the results. Notice that the launcher is configured such that a single instance uses all physical cores on a single socket to avoid cross-socket communication and core overlap.
 
 ```
 ipex_enable=true
diff --git a/examples/large_models/Huggingface_accelerate/Readme.md b/examples/large_models/Huggingface_accelerate/Readme.md
index 273c337a33..f2afd96cc1 100644
--- a/examples/large_models/Huggingface_accelerate/Readme.md
+++ b/examples/large_models/Huggingface_accelerate/Readme.md
@@ -58,7 +58,7 @@ mv bloom.mar model_store
 Update config.properties and start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config config.properties
+torchserve --start --ncs --ts-config config.properties --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
@@ -66,8 +66,3 @@ torchserve --start --ncs --ts-config config.properties
 ```bash
 curl -v "http://localhost:8080/predictions/bloom" -T sample_text.txt
 ```
-
-
-
-
-
diff --git a/examples/large_models/Huggingface_accelerate/llama/Readme.md b/examples/large_models/Huggingface_accelerate/llama/Readme.md
index 41941b1175..3b45640c82 100644
--- a/examples/large_models/Huggingface_accelerate/llama/Readme.md
+++ b/examples/large_models/Huggingface_accelerate/llama/Readme.md
@@ -45,7 +45,7 @@ mv model model_store/llama3-70b-instruct
 Update config.properties and start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config config.properties --model-store model_store --models llama3-70b-instruct
+torchserve --start --ncs --ts-config config.properties --model-store model_store --models llama3-70b-instruct --disable-token-auth  --enable-model-api
 ```
 
 ### Step 4: Run inference
diff --git a/examples/large_models/deepspeed/Readme.md b/examples/large_models/deepspeed/Readme.md
index 045e924a14..8906e36a43 100644
--- a/examples/large_models/deepspeed/Readme.md
+++ b/examples/large_models/deepspeed/Readme.md
@@ -53,7 +53,7 @@ mv opt.tar.gz model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --model-store model_store --models opt.tar.gz
+torchserve --start --ncs --model-store model_store --models opt.tar.gz --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/deepspeed_mii/Readme.md b/examples/large_models/deepspeed_mii/Readme.md
index 694ac6895d..443f849fdb 100644
--- a/examples/large_models/deepspeed_mii/Readme.md
+++ b/examples/large_models/deepspeed_mii/Readme.md
@@ -62,7 +62,7 @@ Increase `max_response_size` for image response.
 Refer: https://github.com/pytorch/serve/blob/master/docs/configuration.md#other-properties
 
 ```bash
-torchserve --start --ts-config config.properties
+torchserve --start --ts-config config.properties --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/diffusion_fast/README.md b/examples/large_models/diffusion_fast/README.md
index 335ca63ce5..222cf9a6d1 100644
--- a/examples/large_models/diffusion_fast/README.md
+++ b/examples/large_models/diffusion_fast/README.md
@@ -52,7 +52,7 @@ mv diffusion_fast model_store
 ### Step 3: Start torchserve
 
 ```
-torchserve --start --ts-config config.properties --model-store model_store --models diffusion_fast
+torchserve --start --ts-config config.properties --model-store model_store --models diffusion_fast --disable-token-auth  --enable-model-api
 ```
 
 ### Step 4: Run inference
diff --git a/examples/large_models/gpt_fast/README.md b/examples/large_models/gpt_fast/README.md
index 0a15179d85..4a30e27040 100644
--- a/examples/large_models/gpt_fast/README.md
+++ b/examples/large_models/gpt_fast/README.md
@@ -115,7 +115,7 @@ mv gpt_fast model_store
 ### Step 4: Start torchserve
 
 ```
-torchserve --start --ncs --model-store model_store --models gpt_fast
+torchserve --start --ncs --model-store model_store --models gpt_fast --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/gpt_fast_mixtral_moe/README.md b/examples/large_models/gpt_fast_mixtral_moe/README.md
index 27201334c6..155f862800 100644
--- a/examples/large_models/gpt_fast_mixtral_moe/README.md
+++ b/examples/large_models/gpt_fast_mixtral_moe/README.md
@@ -85,7 +85,7 @@ mv gpt_fast_mixtral_moe model_store
 ### Step 4: Start torchserve
 
 ```
-torchserve --start --ncs --model-store model_store --models gpt_fast_mixtral_moe
+torchserve --start --ncs --model-store model_store --models gpt_fast_mixtral_moe --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/segment_anything_fast/README.md b/examples/large_models/segment_anything_fast/README.md
index 2a4b75cb99..d7df0c2307 100644
--- a/examples/large_models/segment_anything_fast/README.md
+++ b/examples/large_models/segment_anything_fast/README.md
@@ -57,7 +57,7 @@ mv sam_vit_h_4b8939.pth model_store/sam-fast/
 ### Step 3: Start torchserve
 
 ```
-torchserve --start --ncs --model-store model_store --models sam-fast
+torchserve --start --ncs --model-store model_store --models sam-fast --disable-token-auth  --enable-model-api
 ```
 
 ### Step 4: Run inference
diff --git a/examples/large_models/vllm/llama3/Readme.md b/examples/large_models/vllm/llama3/Readme.md
index a182b1cdff..b7952a0493 100644
--- a/examples/large_models/vllm/llama3/Readme.md
+++ b/examples/large_models/vllm/llama3/Readme.md
@@ -35,7 +35,7 @@ mv llama3-8b model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama3-8b
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama3-8b --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/vllm/lora/Readme.md b/examples/large_models/vllm/lora/Readme.md
index bea8cd7c8b..8d447966c9 100644
--- a/examples/large_models/vllm/lora/Readme.md
+++ b/examples/large_models/vllm/lora/Readme.md
@@ -39,7 +39,7 @@ mv llama-7b-lora model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama-7b-lora
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama-7b-lora --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/vllm/mistral/Readme.md b/examples/large_models/vllm/mistral/Readme.md
index 78e8f91d71..a79f425c38 100644
--- a/examples/large_models/vllm/mistral/Readme.md
+++ b/examples/large_models/vllm/mistral/Readme.md
@@ -35,7 +35,7 @@ mv mistral model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models mistral
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models mistral --disable-token-auth  --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/micro_batching/README.md b/examples/micro_batching/README.md
index a9ee1ad3c6..7ead8066c9 100644
--- a/examples/micro_batching/README.md
+++ b/examples/micro_batching/README.md
@@ -61,7 +61,7 @@ Third, we move the MAR file to our model_store and start TorchServe.
 ```bash
 $ mkdir model_store
 $ mv resnet-18_mb.mar model_store/
-$ torchserve --start --ncs --model-store model_store --models resnet-18_mb.mar
+$ torchserve --start --ncs --model-store model_store --models resnet-18_mb.mar --disable-token-auth  --enable-model-api
 ```
 
 Finally, we test the registered model with a request:
diff --git a/examples/nmt_transformer/README.md b/examples/nmt_transformer/README.md
index b6d33c8efd..6cd816c138 100644
--- a/examples/nmt_transformer/README.md
+++ b/examples/nmt_transformer/README.md
@@ -20,7 +20,7 @@ _NOTE: This example currently works with Py36 only due to fairseq dependency on
     ./create_mar.sh en2fr_model
     ```
     The above command will create a "model_store" directory in the current working directory and generate TransformerEn2Fr.mar file.
-	
+
 * To generate the model archive (.mar) file for English-to-German translation model using following command
 
     ```bash
@@ -32,10 +32,10 @@ _NOTE: This example currently works with Py36 only due to fairseq dependency on
 * Start the TorchServe using the model archive (.mar) file created in above step
 
     ```bash
-    torchserve --start --model-store model_store --ts-config config.properties
+    torchserve --start --model-store model_store --ts-config config.properties --disable-token-auth  --enable-model-api
     ```
 
-* Use [Management API](https://github.com/pytorch/serve/blob/master/docs/management_api.md#management-api) to register the model with one initial worker   
+* Use [Management API](https://github.com/pytorch/serve/blob/master/docs/management_api.md#management-api) to register the model with one initial worker
 	For English-to-French model
     ```bash
     curl -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=TransformerEn2Fr.mar"
@@ -51,7 +51,7 @@ _NOTE: This example currently works with Py36 only due to fairseq dependency on
     }
 	```
 
-* To get the inference use the following curl command  
+* To get the inference, use the following curl command
 	For English-to-French model
     ```bash
     curl http://127.0.0.1:8080/predictions/TransformerEn2Fr -T model_input/sample.txt | json_pp
@@ -87,10 +87,10 @@ requests before this timer time's out, it sends what ever requests that were rec
 * Start the model server. In this example, we are starting the model server with config.properties file
 
     ```bash
-    torchserve --start --model-store model_store --ts-config config.properties
+    torchserve --start --model-store model_store --ts-config config.properties --disable-token-auth  --enable-model-api
     ```
 
-* Now let's launch English_to_French translation model, which we have built to handle batch inference. 
+* Now let's launch the English-to-French translation model, which we have built to handle batch inference.
 In this example, we are going to launch 1 worker which handles a `batch size` of 4 with a `max_batch_delay` of 10s.
 
     ```bash
@@ -100,9 +100,9 @@ In this example, we are going to launch 1 worker which handles a `batch size` of
 * Run batch inference command to test the model.
 
     ```bash
-    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample1.txt& 
-    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample2.txt& 
-    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample3.txt& 
+    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample1.txt&
+    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample2.txt&
+    curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample3.txt&
     curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2Fr -T ./model_input/sample4.txt&
     {
         "input" : "Hello World !!!\n",
@@ -127,10 +127,10 @@ In this example, we are going to launch 1 worker which handles a `batch size` of
 * Start the model server. In this example, we are starting the model server with config.properties file
 
     ```bash
-    torchserve --start --model-store model_store --ts-config config.properties
+    torchserve --start --model-store model_store --ts-config config.properties --disable-token-auth  --enable-model-api
     ```
 
-* Now let's launch English_to_French translation model, which we have built to handle batch inference. 
+* Now let's launch the English-to-German translation model, which we have built to handle batch inference.
 In this example, we are going to launch 1 worker which handles a `batch size` of 4 with a `max_batch_delay` of 10s.
 
     ```bash
@@ -140,9 +140,9 @@ In this example, we are going to launch 1 worker which handles a `batch size` of
 * Run batch inference command to test the model.
 
     ```bash
-	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample1.txt& 
-	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample2.txt& 
-	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample3.txt& 
+	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample1.txt&
+	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample2.txt&
+	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample3.txt&
 	curl -X POST http://127.0.0.1:8080/predictions/TransformerEn2De -T ./model_input/sample4.txt&
     {
         "input" : "Hello World !!!\n",
diff --git a/examples/nvidia_dali/README.md b/examples/nvidia_dali/README.md
index b429f5dfab..793e7f09b0 100644
--- a/examples/nvidia_dali/README.md
+++ b/examples/nvidia_dali/README.md
@@ -40,7 +40,7 @@ mv resnet-18.mar model_store/
 ### Start the torchserve
 
 ```bash
-torchserve --start --model-store model_store --models resnet=resnet-18.mar
+torchserve --start --model-store model_store --models resnet=resnet-18.mar --disable-token-auth  --enable-model-api
 ```
 
 ### Run Inference
@@ -101,7 +101,7 @@ mv mnist.mar model_store/
 ### Start the torchserve
 
 ```bash
-torchserve --start --model-store model_store --models mnist=mnist.mar
+torchserve --start --model-store model_store --models mnist=mnist.mar --disable-token-auth  --enable-model-api
 ```
 
 ### Run Inference
diff --git a/examples/object_detector/fast-rcnn/README.md b/examples/object_detector/fast-rcnn/README.md
index 07c8a1521e..2bc8f60ab1 100644
--- a/examples/object_detector/fast-rcnn/README.md
+++ b/examples/object_detector/fast-rcnn/README.md
@@ -14,10 +14,10 @@ wget https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.p
     torch-model-archiver --model-name fastrcnn --version 1.0 --model-file examples/object_detector/fast-rcnn/model.py --serialized-file fasterrcnn_resnet50_fpn_coco-258fb6c6.pth --handler object_detector --extra-files examples/object_detector/index_to_name.json
     mkdir model_store
     mv fastrcnn.mar model_store/
-    torchserve --start --model-store model_store --models fastrcnn=fastrcnn.mar
+    torchserve --start --model-store model_store --models fastrcnn=fastrcnn.mar --disable-token-auth  --enable-model-api
     curl http://127.0.0.1:8080/predictions/fastrcnn -T examples/object_detector/persons.jpg
     ```
-* Note : The objects detected have scores greater than "0.5". This threshold value is set in object_detector handler. 
+* Note: The objects detected have scores greater than "0.5". This threshold value is set in the object_detector handler.
 
 * Output
 
@@ -168,4 +168,4 @@ wget https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.p
     "score": 0.5939609408378601
   }
 ]
-```
\ No newline at end of file
+```
diff --git a/examples/object_detector/maskrcnn/README.md b/examples/object_detector/maskrcnn/README.md
index b4dd2443c1..62bb3a662d 100644
--- a/examples/object_detector/maskrcnn/README.md
+++ b/examples/object_detector/maskrcnn/README.md
@@ -14,7 +14,7 @@ wget https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
     torch-model-archiver --model-name maskrcnn --version 1.0 --model-file serve/examples/object_detector/maskrcnn/model.py --serialized-file maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth --handler object_detector --extra-files serve/examples/object_detector/index_to_name.json
     mkdir model_store
     mv maskrcnn.mar model_store/
-    torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar
+    torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar --disable-token-auth  --enable-model-api
     curl http://127.0.0.1:8080/predictions/maskrcnn -T serve/examples/object_detector/persons.jpg
     ```
 * Output
@@ -73,4 +73,4 @@ wget https://download.pytorch.org/models/maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth
     "bench": "[(444.64893, 204.42014), (627.00635, 359.8998)]"
   }
 ]
-```
\ No newline at end of file
+```
diff --git a/examples/object_detector/yolo/yolov8/README.md b/examples/object_detector/yolo/yolov8/README.md
index dcfe975c9c..eed9e1d51b 100644
--- a/examples/object_detector/yolo/yolov8/README.md
+++ b/examples/object_detector/yolo/yolov8/README.md
@@ -29,7 +29,7 @@ mv yolov8n.mar model_store/.
 
 
 ```
-torchserve --start --model-store model_store --ncs
+torchserve --start --model-store model_store --ncs --disable-token-auth  --enable-model-api
 curl -X POST "localhost:8081/models?model_name=yolov8n&url=yolov8n.mar&initial_workers=4&batch_size=2"
 ```
 
diff --git a/examples/pt2/torch_compile/README.md b/examples/pt2/torch_compile/README.md
index 5f15bb0b76..2cc1e3da80 100644
--- a/examples/pt2/torch_compile/README.md
+++ b/examples/pt2/torch_compile/README.md
@@ -34,7 +34,7 @@ torch-model-archiver --model-name densenet161 --version 1.0 --model-file model.p
 
 #### Start TorchServe
 ```
-torchserve --start --ncs --model-store model_store --models densenet161.mar
+torchserve --start --ncs --model-store model_store --models densenet161.mar --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
diff --git a/examples/pt2/torch_compile_hpu/README.md b/examples/pt2/torch_compile_hpu/README.md
index 063d4a8e7b..7aaf43f74a 100644
--- a/examples/pt2/torch_compile_hpu/README.md
+++ b/examples/pt2/torch_compile_hpu/README.md
@@ -74,7 +74,7 @@ PT_HPU_LAZY_MODE=0 torch-model-archiver --model-name resnet-50 --version 1.0 --m
 
 Start the TorchServe server using the following command:
 ```bash
-PT_HPU_LAZY_MODE=0 torchserve --start --ncs --disable-token --model-store model_store --models resnet-50.mar
+PT_HPU_LAZY_MODE=0 torchserve --start --ncs --disable-token --model-store model_store --models resnet-50.mar --disable-token-auth  --enable-model-api
 ```
 `--disable-token` - this is an option that disables token authorization. This option is used here only for example purposes. Please refer to the torchserve [documentation](https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md), which describes the process of serving the model using tokens.
 
diff --git a/examples/pt2/torch_compile_openvino/README.md b/examples/pt2/torch_compile_openvino/README.md
index 0ca87ba69e..9d70a052ae 100644
--- a/examples/pt2/torch_compile_openvino/README.md
+++ b/examples/pt2/torch_compile_openvino/README.md
@@ -71,7 +71,7 @@ torch-model-archiver --model-name resnet-50 --version 1.0 --model-file model.py
 
 Start the TorchServe server using the following command:
 ```bash
-torchserve --start --ncs --model-store model_store --models resnet-50.mar
+torchserve --start --ncs --model-store model_store --models resnet-50.mar --disable-token-auth  --enable-model-api
 ```
 
 ### 4. Run Inference
diff --git a/examples/pt2/torch_compile_openvino/stable_diffusion/README.md b/examples/pt2/torch_compile_openvino/stable_diffusion/README.md
index 6e810da739..29802e0d95 100644
--- a/examples/pt2/torch_compile_openvino/stable_diffusion/README.md
+++ b/examples/pt2/torch_compile_openvino/stable_diffusion/README.md
@@ -70,7 +70,7 @@ mv diffusion_fast model_store
 Start the TorchServe server using the following command:
 
 ```bash
-torchserve --start --ts-config config.properties --model-store model_store --models diffusion_fast
+torchserve --start --ts-config config.properties --model-store model_store --models diffusion_fast --disable-token-auth  --enable-model-api
 ```
 
 ### 4. Run inference
diff --git a/examples/pt2/torch_export_aot_compile/README.md b/examples/pt2/torch_export_aot_compile/README.md
index c14cc29b2e..82a26624b2 100644
--- a/examples/pt2/torch_export_aot_compile/README.md
+++ b/examples/pt2/torch_export_aot_compile/README.md
@@ -34,7 +34,7 @@ mv res18-pt2.mar model_store/.
 
 #### Start TorchServe
 ```
-torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs
+torchserve --start --model-store model_store --models res18-pt2=res18-pt2.mar --ncs --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
diff --git a/examples/pt2/torch_inductor_caching/README.md b/examples/pt2/torch_inductor_caching/README.md
index 84319b3306..eb28ac7e2e 100644
--- a/examples/pt2/torch_inductor_caching/README.md
+++ b/examples/pt2/torch_inductor_caching/README.md
@@ -58,7 +58,7 @@ torch-model-archiver --model-name densenet161 --version 1.0 --model-file ../../i
 
 #### Start TorchServe
 ```
-torchserve --start --ncs --model-store model_store --models densenet161.mar
+torchserve --start --ncs --model-store model_store --models densenet161.mar --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
@@ -147,7 +147,7 @@ torch-model-archiver --model-name densenet161 --version 1.0 --model-file ../../i
 
 #### Start TorchServe
 ```
-torchserve --start --ncs --model-store model_store --models densenet161.mar
+torchserve --start --ncs --model-store model_store --models densenet161.mar --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
diff --git a/examples/speech2text_wav2vec2/README.md b/examples/speech2text_wav2vec2/README.md
index 2546dfeaea..75f7c73d29 100644
--- a/examples/speech2text_wav2vec2/README.md
+++ b/examples/speech2text_wav2vec2/README.md
@@ -19,7 +19,7 @@ Next, we need to download our wav2vec2 model and archive it for use by torchserv
 
 Now let's start the server and try it out with our example file!
 ```bash
-torchserve --start --model-store model_store --models Wav2Vec2=Wav2Vec2.mar --ncs
+torchserve --start --model-store model_store --models Wav2Vec2=Wav2Vec2.mar --ncs --disable-token-auth  --enable-model-api
 # Once the server is running, let's try it with:
 curl -X POST http://127.0.0.1:8080/predictions/Wav2Vec2 --data-binary '@./sample.wav' -H "Content-Type: audio/basic"
 # Which will happily return:
diff --git a/examples/text_classification/README.md b/examples/text_classification/README.md
index 47a0e489e7..c53a94bde5 100644
--- a/examples/text_classification/README.md
+++ b/examples/text_classification/README.md
@@ -19,21 +19,21 @@ The above command generated the model's state dict as model.pt and the vocab use
 # Serve the text classification model on TorchServe
 
  * Create a torch model archive using the torch-model-archiver utility to archive the above files.
- 
+
     ```bash
     torch-model-archiver --model-name my_text_classifier --version 1.0 --model-file model.py --serialized-file model.pt  --handler text_classifier --extra-files "index_to_name.json,source_vocab.pt"
     ```
-    
-    NOTE - `run_script.sh` has generated `source_vocab.pt` and it is a mandatory file for this handler. 
+
+    NOTE - `run_script.sh` has generated `source_vocab.pt`, which is a mandatory file for this handler.
            If you are planning to override or use custom source vocab. then name it as `source_vocab.pt` and provide it as `--extra-files` as per above example.
            Other option is to extend `TextHandler` and override `get_source_vocab_path` function in your custom handler. Refer [custom handler](../../docs/custom_service.md) for detail
-   
+
  * Register the model on TorchServe using the above model archive file and run digit recognition inference
-   
+
     ```bash
     mkdir model_store
     mv my_text_classifier.mar model_store/
-    torchserve --start --model-store model_store --models my_tc=my_text_classifier.mar
+    torchserve --start --model-store model_store --models my_tc=my_text_classifier.mar --disable-token-auth  --enable-model-api
     curl http://127.0.0.1:8080/predictions/my_tc -T examples/text_classification/sample_text.txt
     ```
 To make a captum explanations request on the Torchserve side, use the below command:
@@ -62,7 +62,7 @@ Captum/Explain doesn't support batching.
 
 1. The handlers should initialize.
 ```python
-self.lig = LayerIntegratedGradients(captum_sequence_forward, self.model.bert.embeddings) 
+self.lig = LayerIntegratedGradients(captum_sequence_forward, self.model.bert.embeddings)
 ```
 in the initialize function for the captum to work.
 
diff --git a/examples/text_classification_with_scriptable_tokenizer/README.md b/examples/text_classification_with_scriptable_tokenizer/README.md
index 43c9241362..091ac2ae80 100644
--- a/examples/text_classification_with_scriptable_tokenizer/README.md
+++ b/examples/text_classification_with_scriptable_tokenizer/README.md
@@ -57,7 +57,7 @@ python script_tokenizer_and_model.py model.pt model_jit.pt
     ```bash
     mkdir model_store
     mv scriptable_tokenizer.mar model_store/
-    torchserve --start --model-store model_store --models my_tc=scriptable_tokenizer.mar
+    torchserve --start --model-store model_store --models my_tc=scriptable_tokenizer.mar --disable-token-auth  --enable-model-api
     curl http://127.0.0.1:8080/predictions/my_tc -T sample_text.txt
     ```
  * Expected Output:
diff --git a/examples/text_to_speech_synthesizer/SpeechT5/README.md b/examples/text_to_speech_synthesizer/SpeechT5/README.md
index e2182faf7f..6536dbc3e3 100644
--- a/examples/text_to_speech_synthesizer/SpeechT5/README.md
+++ b/examples/text_to_speech_synthesizer/SpeechT5/README.md
@@ -38,7 +38,7 @@ mv model_artifacts/* model_store/SpeechT5-TTS/
 ## Start TorchServe
 
 ```
-torchserve --start --ncs --model-store model_store --models SpeechT5-TTS
+torchserve --start --ncs --model-store model_store --models SpeechT5-TTS --disable-token-auth  --enable-model-api
 ```
 
 ## Send Inference request
diff --git a/examples/text_to_speech_synthesizer/WaveGlow/README.md b/examples/text_to_speech_synthesizer/WaveGlow/README.md
index 6e8f6e4868..8816495681 100644
--- a/examples/text_to_speech_synthesizer/WaveGlow/README.md
+++ b/examples/text_to_speech_synthesizer/WaveGlow/README.md
@@ -30,7 +30,7 @@ pip install librosa --user
     ```bash
     mkdir model_store
     mv waveglow_synthesizer.mar model_store/
-    torchserve --start --model-store model_store --models waveglow_synthesizer.mar
+    torchserve --start --model-store model_store --models waveglow_synthesizer.mar --disable-token-auth  --enable-model-api
     ```
   * Run inference and download audio output using curl command :
     ```bash
diff --git a/examples/torch_tensorrt/torchcompile/README.md b/examples/torch_tensorrt/torchcompile/README.md
index 754dc43a63..4584ec6f4f 100644
--- a/examples/torch_tensorrt/torchcompile/README.md
+++ b/examples/torch_tensorrt/torchcompile/README.md
@@ -35,7 +35,7 @@ torch-model-archiver --model-name res50-trt --handler image_classifier --version
 
 #### Start TorchServe
 ```
-torchserve --start --model-store model_store --models res50-trt=res50-trt.mar --disable-token --ncs
+torchserve --start --model-store model_store --models res50-trt=res50-trt.mar --disable-token --ncs --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
@@ -82,4 +82,3 @@ If we disable `torch.compile` and use PyTorch eager, we see the following
 
 We see that `torch.compile` with `tensorrt` backend reduces model inference from `5.56 ms` to `1.6 ms`.
 Please note that `torch.compile` is a JIT compiler and it takes a few iterations (1-3) to warmup before you see the speedup
-
diff --git a/examples/torch_tensorrt/torchscript/README.md b/examples/torch_tensorrt/torchscript/README.md
index 0117103c54..d73d023fca 100644
--- a/examples/torch_tensorrt/torchscript/README.md
+++ b/examples/torch_tensorrt/torchscript/README.md
@@ -30,7 +30,7 @@ mv res50-trt-fp16.mar model_store/.
 
 #### Start TorchServe
 ```
-torchserve --start --model-store model_store --models res50-trt-fp16=res50-trt-fp16.mar --ncs
+torchserve --start --model-store model_store --models res50-trt-fp16=res50-trt-fp16.mar --ncs --disable-token-auth  --enable-model-api
 ```
 
 #### Run Inference
diff --git a/examples/torchrec_dlrm/README.md b/examples/torchrec_dlrm/README.md
index 8e16fdab5b..a90717224b 100644
--- a/examples/torchrec_dlrm/README.md
+++ b/examples/torchrec_dlrm/README.md
@@ -34,7 +34,7 @@ mv dlrm.mar model_store
 Then we can start TorchServe with:
 
 ```
-torchserve --start --model-store model_store --models dlrm=dlrm.mar
+torchserve --start --model-store model_store --models dlrm=dlrm.mar --disable-token-auth  --enable-model-api
 ```
 
 To query the model we can then run:
@@ -61,7 +61,7 @@ The output should look like this:
 We start TorchServe with:
 
 ```
-torchserve --start --model-store model_store
+torchserve --start --model-store model_store --disable-token-auth  --enable-model-api
 curl -X POST "localhost:8081/models?model_name=dlrm&url=dlrm.mar&batch_size=4&max_batch_delay=5000&initial_workers=1&synchronous=true"
 ```
 
diff --git a/examples/xgboost_classfication/README.md b/examples/xgboost_classfication/README.md
index bd500527cf..153645611f 100644
--- a/examples/xgboost_classfication/README.md
+++ b/examples/xgboost_classfication/README.md
@@ -29,7 +29,7 @@ torch-model-archiver --model-name xgb_iris --version 1.0 --serialized-file iris_
 ## Start TorchServe
 
 ```
-torchserve --start --ncs --model-store model_store --models xgb_iris=xgb_iris.mar
+torchserve --start --ncs --model-store model_store --models xgb_iris=xgb_iris.mar --disable-token-auth  --enable-model-api
 ```
 
 ## Inference request
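The `--disable-token-auth --enable-model-api` pair added throughout these examples turns off token authorization and enables model registration over the management API. A minimal end-to-end sketch under those flags (the model name, archive, and input file are placeholders, not files from any specific example):

```bash
# Start TorchServe with token authorization disabled and the model API enabled
torchserve --start --ncs --model-store model_store --disable-token-auth --enable-model-api

# Register the archive with one initial worker via the management API (port 8081)
curl -X POST "http://localhost:8081/models?url=my_model.mar&initial_workers=1&synchronous=true"

# Scale the model to two workers
curl -X PUT "http://localhost:8081/models/my_model?min_worker=2&synchronous=true"

# Run an inference request against the inference API (port 8080)
curl http://127.0.0.1:8080/predictions/my_model -T sample_input.txt

# Stop the server when done
torchserve --stop
```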

From c8683f14f59dd8121e3982120e26be01d73dcf4a Mon Sep 17 00:00:00 2001
From: udaij12 
Date: Tue, 2 Jul 2024 20:33:44 -0700
Subject: [PATCH 2/4] examples added

---
 examples/LLM/llama/chat_app/torchserve_server_app.py          | 4 +++-
 examples/cloudformation/ec2-asg.yaml                          | 2 +-
 examples/cloudformation/ec2.yaml                              | 2 +-
 examples/dcgan_fashiongen/Readme.md                           | 2 +-
 examples/image_classifier/mnist/README.md                     | 2 +-
 .../deepspeed_mii/LLM/mii-deepspeed-fastgen.ipynb             | 2 +-
 examples/large_models/vllm/llama3/Readme.md                   | 2 +-
 examples/large_models/vllm/lora/Readme.md                     | 2 +-
 examples/large_models/vllm/mistral/Readme.md                  | 2 +-
 examples/micro_batching/README.md                             | 2 +-
 examples/nvidia_dali/README.md                                | 2 +-
 examples/torchrec_dlrm/README.md                              | 2 +-
 12 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/examples/LLM/llama/chat_app/torchserve_server_app.py b/examples/LLM/llama/chat_app/torchserve_server_app.py
index 74b1b2060d..de36080cb6 100644
--- a/examples/LLM/llama/chat_app/torchserve_server_app.py
+++ b/examples/LLM/llama/chat_app/torchserve_server_app.py
@@ -10,7 +10,9 @@
 
 
 def start_server():
-    os.system("torchserve --start --model-store model_store --ncs")
+    os.system(
+        "torchserve --start --model-store model_store --ncs --disable-token-auth --enable-model-api"
+    )
     st.session_state.started = True
     st.session_state.stopped = False
     st.session_state.registered = False
diff --git a/examples/cloudformation/ec2-asg.yaml b/examples/cloudformation/ec2-asg.yaml
index 59d46f5018..1f2aa12aeb 100644
--- a/examples/cloudformation/ec2-asg.yaml
+++ b/examples/cloudformation/ec2-asg.yaml
@@ -620,7 +620,7 @@ Resources:
           /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource TorchServeASG --region ${AWS::Region}
           # Start torchserve
           export LOG_LOCATION="/var/log/torchserve"
-          torchserve --start --ts-config /etc/torchserve/config.properties
+          torchserve --start --ts-config /etc/torchserve/config.properties --disable-token-auth --enable-model-api
     CreationPolicy:
       ResourceSignal:
         Timeout: PT30M
diff --git a/examples/cloudformation/ec2.yaml b/examples/cloudformation/ec2.yaml
index 9189604b09..7dde155bcf 100644
--- a/examples/cloudformation/ec2.yaml
+++ b/examples/cloudformation/ec2.yaml
@@ -419,7 +419,7 @@ Resources:
           # Start torchserve
           mkdir model_store
           export LOG_LOCATION="/var/log/torchserve"
-          torchserve --start --model-store ./model_store --ts-config /etc/torchserve/config.properties
+          torchserve --start --model-store ./model_store --ts-config /etc/torchserve/config.properties --disable-token-auth --enable-model-api
     CreationPolicy:
       ResourceSignal:
         Timeout: PT30M
diff --git a/examples/dcgan_fashiongen/Readme.md b/examples/dcgan_fashiongen/Readme.md
index 3df48aa959..c8898e1ea2 100644
--- a/examples/dcgan_fashiongen/Readme.md
+++ b/examples/dcgan_fashiongen/Readme.md
@@ -19,7 +19,7 @@ Alternatively, you can directly [download the dcgan_fashiongen.mar](https://torc
 ```
 mkdir modelstore
 mv dcgan_fashiongen.mar modelstore/
-torchserve --start --ncs --model-store ./modelstore --models dcgan_fashiongen.mar --disable-token-auth  --enable-model-api
+torchserve --start --ncs --model-store ./modelstore --models dcgan_fashiongen.mar --disable-token-auth --enable-model-api
 ```
 
 ### 3. Generate Images
diff --git a/examples/image_classifier/mnist/README.md b/examples/image_classifier/mnist/README.md
index c9494842a8..cac021fb9f 100644
--- a/examples/image_classifier/mnist/README.md
+++ b/examples/image_classifier/mnist/README.md
@@ -38,7 +38,7 @@ Run the commands given in following steps from the parent directory of the root
     ```bash
     mkdir model_store
     mv mnist.mar model_store/
-    torchserve --start --model-store model_store --models mnist=mnist.mar --ts-config config.properties --disable-token-auth  --enable-model-api
+    torchserve --start --model-store model_store --models mnist=mnist.mar --ts-config config.properties --disable-token-auth --enable-model-api
     curl http://127.0.0.1:8080/predictions/mnist -T examples/image_classifier/mnist/test_data/0.png
     ```
 
diff --git a/examples/large_models/deepspeed_mii/LLM/mii-deepspeed-fastgen.ipynb b/examples/large_models/deepspeed_mii/LLM/mii-deepspeed-fastgen.ipynb
index f42b75f95e..d7c79ecb96 100644
--- a/examples/large_models/deepspeed_mii/LLM/mii-deepspeed-fastgen.ipynb
+++ b/examples/large_models/deepspeed_mii/LLM/mii-deepspeed-fastgen.ipynb
@@ -111,7 +111,7 @@
    },
    "outputs": [],
    "source": [
-    "!torchserve --ncs --start --model-store model_store --models mii-llama--Llama-2-13b-hf --ts-config benchmarks/config.properties"
+    "!torchserve --ncs --start --model-store model_store --models mii-llama--Llama-2-13b-hf --ts-config benchmarks/config.properties --disable-token-auth --enable-model-api"
    ]
   },
   {
diff --git a/examples/large_models/vllm/llama3/Readme.md b/examples/large_models/vllm/llama3/Readme.md
index b7952a0493..a136495169 100644
--- a/examples/large_models/vllm/llama3/Readme.md
+++ b/examples/large_models/vllm/llama3/Readme.md
@@ -35,7 +35,7 @@ mv llama3-8b model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama3-8b --disable-token-auth  --enable-model-api
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama3-8b --disable-token-auth --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/vllm/lora/Readme.md b/examples/large_models/vllm/lora/Readme.md
index 8d447966c9..66a616ca8d 100644
--- a/examples/large_models/vllm/lora/Readme.md
+++ b/examples/large_models/vllm/lora/Readme.md
@@ -39,7 +39,7 @@ mv llama-7b-lora model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama-7b-lora --disable-token-auth  --enable-model-api
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models llama-7b-lora --disable-token-auth --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/large_models/vllm/mistral/Readme.md b/examples/large_models/vllm/mistral/Readme.md
index a79f425c38..18111845ea 100644
--- a/examples/large_models/vllm/mistral/Readme.md
+++ b/examples/large_models/vllm/mistral/Readme.md
@@ -35,7 +35,7 @@ mv mistral model_store
 ### Step 4: Start torchserve
 
 ```bash
-torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models mistral --disable-token-auth  --enable-model-api
+torchserve --start --ncs --ts-config ../config.properties --model-store model_store --models mistral --disable-token-auth --enable-model-api
 ```
 
 ### Step 5: Run inference
diff --git a/examples/micro_batching/README.md b/examples/micro_batching/README.md
index 7ead8066c9..0f8a3d7fee 100644
--- a/examples/micro_batching/README.md
+++ b/examples/micro_batching/README.md
@@ -61,7 +61,7 @@ Third, we move the MAR file to our model_store and start TorchServe.
 ```bash
 $ mkdir model_store
 $ mv resnet-18_mb.mar model_store/
-$ torchserve --start --ncs --model-store model_store --models resnet-18_mb.mar --disable-token-auth  --enable-model-api
+$ torchserve --start --ncs --model-store model_store --models resnet-18_mb.mar --disable-token-auth --enable-model-api
 ```
 
 Finally, we test the registered model with a request:
diff --git a/examples/nvidia_dali/README.md b/examples/nvidia_dali/README.md
index 793e7f09b0..0af11ee51b 100644
--- a/examples/nvidia_dali/README.md
+++ b/examples/nvidia_dali/README.md
@@ -40,7 +40,7 @@ mv resnet-18.mar model_store/
 ### Start the torchserve
 
 ```bash
-torchserve --start --model-store model_store --models resnet=resnet-18.mar --disable-token-auth  --enable-model-api
+torchserve --start --model-store model_store --models resnet=resnet-18.mar --disable-token-auth --enable-model-api
 ```
 
 ### Run Inference
diff --git a/examples/torchrec_dlrm/README.md b/examples/torchrec_dlrm/README.md
index a90717224b..ee5dbb710b 100644
--- a/examples/torchrec_dlrm/README.md
+++ b/examples/torchrec_dlrm/README.md
@@ -34,7 +34,7 @@ mv dlrm.mar model_store
 Then we can start TorchServe with:
 
 ```
-torchserve --start --model-store model_store --models dlrm=dlrm.mar --disable-token-auth  --enable-model-api
+torchserve --start --model-store model_store --models dlrm=dlrm.mar --disable-token-auth --enable-model-api
 ```
 
 To query the model we can then run:

From 2d2b5f06f69d8ed877157c17575868a9eeda5d1d Mon Sep 17 00:00:00 2001
From: udaij12 
Date: Wed, 3 Jul 2024 08:44:24 -0700
Subject: [PATCH 3/4] spellcheck addition

---
 .../object_detector/yolo/yolov8/README.md     |  2 +-
 ts_scripts/spellcheck_conf/wordlist.txt       | 32 +++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/examples/object_detector/yolo/yolov8/README.md b/examples/object_detector/yolo/yolov8/README.md
index eed9e1d51b..6e0915b1ea 100644
--- a/examples/object_detector/yolo/yolov8/README.md
+++ b/examples/object_detector/yolo/yolov8/README.md
@@ -6,7 +6,7 @@ Install `ultralytics` using
 python -m pip install -r requirements.txt
 ```
 
-In this example, we are using the YOLOv8 Nano model from ultralytics.Downlaod the pretrained weights from [Ultralytics](https://docs.ultralytics.com/models/yolov8/#supported-modes)
+In this example, we are using the YOLOv8 Nano model from ultralytics. Download the pretrained weights from [Ultralytics](https://docs.ultralytics.com/models/yolov8/#supported-modes)
 
 ```
 wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt
diff --git a/ts_scripts/spellcheck_conf/wordlist.txt b/ts_scripts/spellcheck_conf/wordlist.txt
index b626447bb3..780b82cc39 100644
--- a/ts_scripts/spellcheck_conf/wordlist.txt
+++ b/ts_scripts/spellcheck_conf/wordlist.txt
@@ -1260,3 +1260,35 @@ torchcompile
 HPU
 hpu
 llm
+OpenCV
+DeprecationWarning
+anyio
+envs
+mpl
+pluggy
+rootdir
+ruamel
+toolkits
+bcb
+ceea
+fae
+BPE
+GQA
+Mixtral
+instalation
+mixtral
+moe
+Nano
+Ultralytics
+Ultralytics's
+YOLOv
+Yolov
+ultralytics
+yolov
+Gaudi
+habana
+openvino
+StableDiffusionXL
+photorealistic
+miniconda
+torchaudio

From 380cf495320385ce1eb6902b00265f3eb8cfeb8b Mon Sep 17 00:00:00 2001
From: udaij12 
Date: Mon, 8 Jul 2024 14:55:53 -0700
Subject: [PATCH 4/4] removing disable-token reference

---
 examples/Huggingface_Transformers/README.md    | 2 +-
 examples/pt2/torch_compile_hpu/README.md       | 2 +-
 examples/torch_tensorrt/torchcompile/README.md | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/Huggingface_Transformers/README.md b/examples/Huggingface_Transformers/README.md
index a555190977..4fc3b34a96 100644
--- a/examples/Huggingface_Transformers/README.md
+++ b/examples/Huggingface_Transformers/README.md
@@ -114,7 +114,7 @@ To register the model on TorchServe using the above model archive file, we run t
 ```
 mkdir model_store
 mv BERTSeqClassification.mar model_store/
-torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --disable-token --ncs --disable-token-auth  --enable-model-api
+torchserve --start --model-store model_store --models my_tc=BERTSeqClassification.mar --ncs --disable-token-auth --enable-model-api
 
 ```
 
diff --git a/examples/pt2/torch_compile_hpu/README.md b/examples/pt2/torch_compile_hpu/README.md
index 7aaf43f74a..077bd4989f 100644
--- a/examples/pt2/torch_compile_hpu/README.md
+++ b/examples/pt2/torch_compile_hpu/README.md
@@ -74,7 +74,7 @@ PT_HPU_LAZY_MODE=0 torch-model-archiver --model-name resnet-50 --version 1.0 --m
 
 Start the TorchServe server using the following command:
 ```bash
-PT_HPU_LAZY_MODE=0 torchserve --start --ncs --disable-token --model-store model_store --models resnet-50.mar --disable-token-auth  --enable-model-api
+PT_HPU_LAZY_MODE=0 torchserve --start --ncs --model-store model_store --models resnet-50.mar --disable-token-auth --enable-model-api
 ```
 `--disable-token` - this is an option that disables token authorization. This option is used here only for example purposes. Please refer to the torchserve [documentation](https://github.com/pytorch/serve/blob/master/docs/token_authorization_api.md), which describes the process of serving the model using tokens.
 
diff --git a/examples/torch_tensorrt/torchcompile/README.md b/examples/torch_tensorrt/torchcompile/README.md
index 4584ec6f4f..bb1c9c5767 100644
--- a/examples/torch_tensorrt/torchcompile/README.md
+++ b/examples/torch_tensorrt/torchcompile/README.md
@@ -35,7 +35,7 @@ torch-model-archiver --model-name res50-trt --handler image_classifier --version
 
 #### Start TorchServe
 ```
-torchserve --start --model-store model_store --models res50-trt=res50-trt.mar --disable-token --ncs --disable-token-auth  --enable-model-api
+torchserve --start --model-store model_store --models res50-trt=res50-trt.mar --ncs --disable-token-auth --enable-model-api
 ```
 
 #### Run Inference