diff --git a/doc/source/data/batch_inference.rst b/doc/source/data/batch_inference.rst
index 6759daab3ae1..c26eba992203 100644
--- a/doc/source/data/batch_inference.rst
+++ b/doc/source/data/batch_inference.rst
@@ -5,7 +5,7 @@ End-to-end: Offline Batch Inference
 
 .. tip::
 
-    `Get in touch `_ to get help using Ray Data, the industry's fastest and cheapest solution for offline batch inference. 
+    `Get in touch `_ to get help using Ray Data, the industry's fastest and cheapest solution for offline batch inference.
 
 Offline batch inference is a process for generating model predictions on a fixed set of input data. Ray Data offers an efficient and scalable solution for batch inference, providing faster execution and cost-effectiveness for deep learning applications.
 
@@ -27,7 +27,7 @@ To start, install Ray Data:
 Using Ray Data for offline inference involves four basic steps:
 
 - **Step 1:** Load your data into a Ray Dataset. Ray Data supports many different data sources and formats. For more details, see :ref:`Loading Data `.
-- **Step 2:** Define a Python class to load the pre-trained model. 
+- **Step 2:** Define a Python class to load the pre-trained model.
 - **Step 3:** Transform your dataset using the pre-trained model by calling :meth:`ds.map_batches() `. For more details, see :ref:`Transforming Data `.
 - **Step 4:** Get the final predictions by either iterating through the output or saving the results. For more details, see the :ref:`Iterating over data ` and :ref:`Saving data ` user guides.
 
@@ -37,14 +37,14 @@ For how to configure batch inference, see :ref:`the configuration guide`
 
-- :doc:`Image Classification Batch Inference with PyTorch ResNet18 ` 
+- :doc:`Image Classification Batch Inference with PyTorch ResNet18 `
 - :doc:`Object Detection Batch Inference with PyTorch FasterRCNN_ResNet50 `
 - :doc:`Image Classification Batch Inference with Huggingface Vision Transformer `
 
@@ -200,21 +200,21 @@ To use GPUs for inference, make the following changes to your code:
 
 1. Update the class implementation to move the model and data to and from GPU.
 2. Specify `num_gpus=1` in the :meth:`ds.map_batches() ` call to indicate that each actor should use 1 GPU.
-3. Specify a `batch_size` for inference. For more details on how to configure the batch size, see `batch_inference_batch_size`_. 
+3. Specify a `batch_size` for inference. For more details on how to configure the batch size, see `batch_inference_batch_size`_.
 
 The rest is the same as the :ref:`Quickstart `.
 
 .. tabs::
 
     .. group-tab:: HuggingFace
-        
+
         .. testcode::
-            
+
             from typing import Dict
 
             import numpy as np
             import ray
-            
+
             ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))
 
             class HuggingFacePredictor:
@@ -230,21 +230,21 @@ The rest is the same as the :ref:`Quickstart `.
 
             # Use 2 actors, each actor using 1 GPU. 2 GPUs total.
             predictions = ds.map_batches(
-                HuggingFacePredictor, 
+                HuggingFacePredictor,
                 num_gpus=1,
-                # Specify the batch size for inference. 
+                # Specify the batch size for inference.
                 # Increase this for larger datasets.
-                batch_size=1, 
+                batch_size=1,
                 # Set the ActorPool size to the number of GPUs in your cluster.
-                compute=ray.data.ActorPoolStrategy(size=2), 
+                compute=ray.data.ActorPoolStrategy(size=2),
             )
             predictions.show(limit=1)
-        
+
         .. testoutput::
-            :skipif: True
+            :options: +MOCK
 
             {'data': 'Complete this', 'output': 'Complete this poll. Which one do you think holds the most promise for you?\n\nThank you'}
 
-    
+
     .. group-tab:: PyTorch
 
@@ -277,18 +277,18 @@ The rest is the same as the :ref:`Quickstart `.
 
             # Use 2 actors, each actor using 1 GPU. 2 GPUs total.
             predictions = ds.map_batches(
-                TorchPredictor, 
+                TorchPredictor,
                 num_gpus=1,
-                # Specify the batch size for inference. 
+                # Specify the batch size for inference.
                 # Increase this for larger datasets.
                 batch_size=1,
                 # Set the ActorPool size to the number of GPUs in your cluster.
-                compute=ray.data.ActorPoolStrategy(size=2) 
+                compute=ray.data.ActorPoolStrategy(size=2)
             )
             predictions.show(limit=1)
 
         .. testoutput::
-            :skipif: True
+            :options: +MOCK
 
             {'output': array([0.5590901], dtype=float32)}
 
@@ -302,7 +302,7 @@ The rest is the same as the :ref:`Quickstart `.
 
             from tensorflow import keras
             import ray
-            
+
             ds = ray.data.from_numpy(np.ones((1, 100)))
 
             class TFPredictor:
@@ -320,18 +320,18 @@ The rest is the same as the :ref:`Quickstart `.
 
             # Use 2 actors, each actor using 1 GPU. 2 GPUs total.
             predictions = ds.map_batches(
-                TFPredictor, 
+                TFPredictor,
                 num_gpus=1,
-                # Specify the batch size for inference. 
+                # Specify the batch size for inference.
                 # Increase this for larger datasets.
                 batch_size=1,
                 # Set the ActorPool size to the number of GPUs in your cluster.
-                compute=ray.data.ActorPoolStrategy(size=2) 
+                compute=ray.data.ActorPoolStrategy(size=2)
             )
             predictions.show(limit=1)
 
         .. testoutput::
-            :skipif: True
+            :options: +MOCK
 
             {'output': array([0.625576], dtype=float32)}
 
@@ -345,7 +345,7 @@ Configure the size of the input batch that is passed to ``__call__`` by setting
 Increasing batch size results in faster execution because inference is a vectorized operation. For GPU inference, increasing batch size increases GPU utilization. Set the batch size to as large as possible without running out of memory. If you encounter OOMs, decreasing ``batch_size`` may help.
 
 .. testcode::
-    
+
     import numpy as np
     import ray
 
@@ -355,7 +355,7 @@ Increasing batch size results in faster execution because inference is a vectori
     def assert_batch(batch: Dict[str, np.ndarray]):
         assert len(batch) == 2
         return batch
-    
+
     # Specify that each input batch should be of size 2.
     ds.map_batches(assert_batch, batch_size=2)
 
@@ -392,12 +392,12 @@ Suppose your cluster has 4 nodes, each with 16 CPUs. To limit to at most
 
 .. testcode::
     :skipif: True
-    
+
     from typing import Dict
 
     import numpy as np
     import ray
-    
+
     ds = ray.data.from_numpy(np.asarray(["Complete this", "for me"]))
 
     class HuggingFacePredictor:
@@ -411,10 +411,10 @@ Suppose your cluster has 4 nodes, each with 16 CPUs. To limit to at most
             return batch
 
     predictions = ds.map_batches(
-        HuggingFacePredictor, 
+        HuggingFacePredictor,
         # Require 5 CPUs per actor (so at most 3 can fit per 16 CPU node).
         num_cpus=5,
         # 3 actors per node, with 4 nodes in the cluster means ActorPool size of 12.
-        compute=ray.data.ActorPoolStrategy(size=12) 
+        compute=ray.data.ActorPoolStrategy(size=12)
     )
     predictions.show(limit=1)
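
A note on the ``:skipif: True`` → ``:options: +MOCK`` changes above: ``MOCK`` is not one of Python's built-in doctest option flags, so a ``testoutput`` block marked ``+MOCK`` only builds if the Sphinx doctest setup registers a custom flag and teaches the output checker to accept mocked output. A minimal sketch of that wiring, assuming the flag is registered in the project's Sphinx ``conf.py`` (the checker subclass below is illustrative, not the code Ray actually ships)::

    import doctest

    # Register the MOCK flag so ":options: +MOCK" parses in testoutput blocks.
    # The name must match the flag used in the .rst sources.
    MOCK = doctest.register_optionflag("MOCK")

    class MockAwareOutputChecker(doctest.OutputChecker):
        def check_output(self, want, got, optionflags):
            # With +MOCK set, accept any output: the expected output shown
            # in the docs is illustrative rather than checked verbatim.
            if optionflags & MOCK:
                return True
            return super().check_output(want, got, optionflags)

Unlike ``:skipif: True``, which skips the check entirely, a mocked ``testoutput`` keeps an expected result visible to readers while tolerating non-deterministic values such as the model predictions in the examples above.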