diff --git a/doc/source/data/api/datastream.rst b/doc/source/data/api/datastream.rst
index 1963ce142f2b..e4c789bca050 100644
--- a/doc/source/data/api/datastream.rst
+++ b/doc/source/data/api/datastream.rst
@@ -138,9 +138,10 @@ Execution
 ---------
 
 .. autosummary::
-    :toctree: doc/
+    :toctree: doc/
 
-    Datastream.materialize
+    Datastream.materialize
+    ActorPoolStrategy
 
 Serialization
 -------------
diff --git a/doc/source/data/batch_inference.rst b/doc/source/data/batch_inference.rst
new file mode 100644
index 000000000000..5ace6a8bb45f
--- /dev/null
+++ b/doc/source/data/batch_inference.rst
@@ -0,0 +1,751 @@
+.. _batch_inference_home:
+
+Running Batch Inference with Ray
+================================
+
+.. note::
+
+    In this tutorial you'll learn what batch inference is, why you might want to use
+    Ray for it, and how to use Ray effectively for this task.
+    If you're familiar with the basics of inference tasks, jump straight to the
+    code in the :ref:`quickstart section <batch_inference_quickstart>` or the
+    :ref:`advanced guide <batch_inference_advanced_pytorch_example>`.
+
+Batch inference refers to generating model predictions on a set of input data.
+The model can range from a simple Python function to a complex neural network.
+In batch inference, also known as offline inference, your model is run on a large
+batch of data on demand.
+This is in contrast to online inference, where the model is run immediately on a
+data point when it becomes available.
+
+Here's a simple schematic of batch inference, "mapping" batches to predictions
+via model inference:
+
+.. figure:: images/batch_inference.png
+
+    Evaluating a batch of input data with a model to get predictions.
+
+Batch inference is a foundational workload for many AI companies, especially as
+more and more pre-trained models become available.
+And while batch inference looks simple on the surface, it can be challenging to do right in production.
+For instance, your data batches can be excessively large, too slow to process sequentially,
+or might need custom preprocessing before being fed into your models.
+To run inference workloads effectively at scale, you need to:
+
+- manage your compute infrastructure and cloud clusters
+- parallelize data processing and utilize all your cluster resources (CPUs and GPUs)
+- efficiently transfer data between cloud storage, CPUs for preprocessing, and GPUs for model inference
+
+Here's a realistic view of batch inference for modern AI applications:
+
+.. figure:: images/batch_inference_overview.png
+
+    Batch inference in practice: loading data from storage, preprocessing it on CPUs,
+    and running model inference on GPUs.
+
+Why use Ray for batch inference?
+--------------------------------
+
+There are several reasons to use Ray for batch inference, even if your current
+use case doesn't require scaling yet:
+
+1. **Faster and cheaper for modern deep learning applications**:
+   Ray is built for complex workloads and supports loading and preprocessing
+   data with CPUs and model inference on GPUs.
+2. **Cloud, framework, and data format agnostic**:
+   Ray Data works with any cloud provider and any ML framework (like PyTorch
+   and TensorFlow), and doesn't require a particular file format.
+3. **Out-of-the-box scaling**:
+   The same code that works on one machine also runs on a
+   large cluster without any changes.
+4. **Python first**:
+   You can express your inference job directly in Python instead of
+   YAML files or other formats.
+
+.. _batch_inference_quickstart:
+
+Quick Start
+-----------
+
+Install Ray with the data processing library, Ray Data:
+
+.. code-block:: bash
+
+    pip install "ray[data]"
+
+Running batch inference is conceptually easy and requires three steps:
+
+1. Load your data into a Ray datastream and optionally apply any preprocessing you need.
+2. Define your model for inference.
+3. Run inference on your data by using the :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+   method from Ray Data.
+
+The last step also defines how your batch processing job gets distributed across your (local) cluster.
+We start with very simple use cases here and build up to more complex ones in other guides and tutorials.
+
+.. note::
+
+    All advanced use cases ultimately boil down to extensions of the above three steps,
+    like loading and storing data from cloud storage, using complex preprocessing functions,
+    demanding model setups, additional postprocessing, or other customizations.
+    We'll cover these advanced use cases in the next sections.
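+
+To make the three steps concrete before diving into framework specifics, here's a
+minimal, self-contained sketch that uses a plain Python function as a stand-in for
+a real model (the quickstart tabs below use actual models):
+
+.. code-block:: python
+
+    import numpy as np
+    import ray
+
+    # 1. Load (toy) data into a Ray datastream.
+    ds = ray.data.from_numpy(np.ones((32, 100)))
+
+    # 2. Define the "model" -- here just a function, for illustration.
+    def predict(batch: np.ndarray) -> np.ndarray:
+        return batch.sum(axis=1)
+
+    # 3. Run inference with map_batches and inspect one prediction.
+    predictions = ds.map_batches(predict)
+    predictions.show(1)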
+
+1. Loading and preprocessing data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For this quick start guide we use very small, in-memory datasets by
+leveraging common Python libraries like NumPy and Pandas.
+In general, once you load your datasets using Ray Data, you also want to apply some preprocessing steps.
+We skip this step here for simplicity.
+In any case, the result of this step is a Ray Datastream ``ds`` that we can use to run inference on.
+
+.. margin::
+
+    For larger datasets, you can use Ray Data to load data from cloud storage like S3 or GCS.
+    We'll cover this later on.
+
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        Create a Pandas DataFrame with text data to run a GPT-2 model on.
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_load_start__
+            :end-before: __hf_quickstart_load_end__
+
+    .. group-tab:: PyTorch
+
+        Create a NumPy array with 100 entries, which represents the input to a
+        feed-forward neural network.
+
+        .. literalinclude:: ./doc_code/pytorch_quick_start.py
+            :language: python
+            :start-after: __pt_quickstart_load_start__
+            :end-before: __pt_quickstart_load_end__
+
+    .. group-tab:: TensorFlow
+
+        Create a NumPy array with 100 entries, which represents the input to a
+        feed-forward neural network.
+
+        .. literalinclude:: ./doc_code/tf_quick_start.py
+            :language: python
+            :start-after: __tf_quickstart_load_start__
+            :end-before: __tf_quickstart_load_end__
+
+2. Setting up your model
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Next, you want to set up your model for inference by defining a predictor.
+The core idea is to define a class that loads your model in its ``__init__`` method
+and implements a ``__call__`` method that takes a batch of data and returns a batch of predictions.
+Below you'll find examples for PyTorch, TensorFlow, and HuggingFace.
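+
+Schematically, every predictor follows the same pattern, regardless of the
+framework. The sketch below spells it out; ``load_model`` is a placeholder for
+whatever loading code your framework requires:
+
+.. code-block:: python
+
+    class MyPredictor:
+        def __init__(self):
+            # Runs once per worker: load the model and keep it in memory.
+            self.model = load_model()  # placeholder for your framework's loader
+
+        def __call__(self, batch):
+            # Runs once per batch: return a batch of predictions.
+            return self.model(batch)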
+
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/hf_quick_start.py
+                :language: python
+                :start-after: __hf_quickstart_model_start__
+                :end-before: __hf_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+    .. group-tab:: PyTorch
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/pytorch_quick_start.py
+                :language: python
+                :start-after: __pt_quickstart_model_start__
+                :end-before: __pt_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+    .. group-tab:: TensorFlow
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/tf_quick_start.py
+                :language: python
+                :start-after: __tf_quickstart_model_start__
+                :end-before: __tf_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+3. Getting predictions with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once you have your Ray Datastream ``ds`` and your predictor class, you can use
+:meth:`ds.map_batches() <ray.data.Dataset.map_batches>` to get predictions.
+``map_batches`` takes your predictor class as an argument and allows you to specify
+``compute`` resources by defining an :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>`.
+In the example below, we use two CPUs to run inference in parallel and then print the results.
+We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
+
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_prediction_start__
+            :end-before: __hf_quickstart_prediction_end__
+
+    .. group-tab:: PyTorch
+
+        .. literalinclude:: ./doc_code/pytorch_quick_start.py
+            :language: python
+            :start-after: __pt_quickstart_prediction_start__
+            :end-before: __pt_quickstart_prediction_end__
+
+    .. group-tab:: TensorFlow
+
+        .. literalinclude:: ./doc_code/tf_quick_start.py
+            :language: python
+            :start-after: __tf_quickstart_prediction_start__
+            :end-before: __tf_quickstart_prediction_end__
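+
+The result of :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` is itself a
+datastream, so you can inspect it or write it back to storage. A quick sketch,
+assuming a local output path:
+
+.. code-block:: python
+
+    # Peek at a few predictions without materializing everything.
+    print(predictions.take(5))
+
+    # Persist predictions, e.g. as Parquet files (the path is illustrative).
+    predictions.write_parquet("local:///tmp/predictions")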
+
+.. _batch_inference_advanced_pytorch_example:
+
+Advanced batch inference guide
+------------------------------
+
+Let's use batch inference on a pre-trained PyTorch model for image classification
+to illustrate advanced concepts of batch processing with Ray.
+
+.. important::
+
+    If you want to dive right into example use cases, consider reading the following
+    tutorials next:
+
+    .. panels::
+        :container: container pb-3
+        :column: col-md-3 px-1 py-1
+        :img-top-cls: p-2 w-75 d-block mx-auto fixed-height-img
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/ocr_example
+            :type: ref
+            :text: Batch OCR processing using Ray Data
+            :classes: btn-link btn-block stretched-link
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/torch_detection
+            :type: ref
+            :text: Fine-tuning an Object Detection Model and using it for Batch Inference
+            :classes: btn-link btn-block stretched-link
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/torch_image_example
+            :type: ref
+            :text: Training an Image Classifier and using it for Batch Inference
+            :classes: btn-link btn-block stretched-link
+
+Loading data with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the quick start guide we glossed over the details of loading data with Ray Data.
+Your data might be stored in a variety of formats, and you might want to load it from different sources.
+Ray Data supports multiple formats and sources out of the box.
+The :ref:`guide to creating datasets <creating_datasets>` is the ultimate resource
+to learn more about loading data with Ray Data, but we'll cover the basics here, too.
+
+.. hint::
+
+    With Ray Data, you can :ref:`create synthetic data in Python <dataset_generate_data>` and
+    :ref:`load data from various storage solutions <dataset_reading_from_storage>` such as S3,
+    HDFS, or GCS, using common formats such as CSV, JSON, Text, Images, Binary,
+    TFRecords, Parquet, and more. Ray Data also supports reading from common SQL and NoSQL
+    databases, and allows you to define your own custom data sources.
+
+    You can also read :ref:`common Python library formats <dataset_from_in_memory_data_single_node>`
+    such as Pandas, NumPy, Arrow, or plain Python objects, as well as from
+    :ref:`distributed data processing frameworks <dataset_from_in_memory_data_distributed>`
+    such as Spark, Dask, Modin, or Mars.
+
+    Of course, Ray Data also supports :ref:`reading data from common ML frameworks <dataset_from_torch_tf>`
+    like PyTorch, TensorFlow, or HuggingFace.
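+
+For instance, each of the following one-liners gives you a datastream you can run
+inference on (a sketch; apart from the CSV example, the paths are illustrative):
+
+.. code-block:: python
+
+    import ray
+
+    ds_csv = ray.data.read_csv("example://iris.csv")
+    ds_parquet = ray.data.read_parquet("s3://my-bucket/my-data/")
+    ds_text = ray.data.read_text("s3://my-bucket/my-corpus/")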
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_load_start__
+        :end-before: __pt_load_end__
+
+    .. annotations::
+        <1> We use one gigabyte of image data from the ImageNet dataset, hosted on S3.
+
+        <2> We use ``read_images`` from Ray Data and limit the number of images to 1000.
+
+The process of loading data with Ray Data is as diverse as the data you have.
+For instance, in the example above we didn't load the text labels for our images,
+which would require a different data source and loading function.
+For any advanced use cases, we recommend you read the
+:ref:`guide to creating datasets <creating_datasets>`.
+
+Preprocessing with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After loading your data, it often needs to be preprocessed prior to inference.
+This may include cropping or resizing images, or tokenizing raw text.
+
+To introduce common terminology, with :ref:`Ray Data <datasets>` you can define
+:term:`user-defined functions <User-defined function (UDF)>` (UDFs) that transform batches of your data.
+As you've seen before, applying these UDFs via
+:meth:`ds.map_batches() <ray.data.Dataset.map_batches>` outputs a new, transformed dataset.
+
+.. note::
+
+    The way we do preprocessing here is conceptually close to how we do batch
+    inference, and we use the same :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    call from Ray Data to run this task.
+    The main difference is that we don't use a machine learning model to transform our data,
+    which has some practical consequences. For instance, in the example below we simply
+    define a map function that we pass into ``map_batches``, and not a class.
+
+To transform our raw images loaded from S3 in the last step, we use functionality from
+the ``torchvision`` package to define a UDF called ``preprocess_images``.
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_preprocess_start__
+        :end-before: __pt_preprocess_end__
+
+    .. annotations::
+        <1> We compose PyTorch tensor creation with image preprocessing, so that our processed images "fit" into a ``ResNet18`` PyTorch model.
+
+        <2> We then define a simple UDF to transform batches of raw data accordingly. Note that these batches come as dictionaries of NumPy images stored in the ``"image"`` key.
+
+        <3> Finally, we apply the UDF to our dataset using ``map_batches``.
+
+.. tip::
+
+    For the full suite of transformations available in Ray Data, read
+    :ref:`the data transformation guide <transforming_datasets>`.
+
+.. caution::
+
+    Depending on how you load your data and what input data format you use, the dataset
+    loaded with :ref:`Ray Data <datasets>` will have a different *batch format*.
+    For instance, image data might be naturally stored in NumPy format, while tabular
+    data makes much more sense as a Pandas DataFrame.
+    What (default) batch format your data has and how to deal with it is explained in
+    detail in :ref:`the batch format section <batch_inference_formats>`.
+
+Defining predictors as stateful UDFs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One of the key advantages of Ray over other distributed systems is its support for
+distributed stateful operations. These stateful operations are especially useful
+for inference, since the model only needs to be initialized once per worker, instead
+of once per batch.
+
+.. margin::
+
+    In short, running model inference means applying
+    :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    to a dataset with a trained model as a UDF.
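+
+The difference is easiest to see side by side. With a plain function, any setup
+work runs again for every batch; with a class, it runs once when the worker starts
+(a sketch; ``load_model`` is again a placeholder):
+
+.. code-block:: python
+
+    # Stateless UDF: load_model() would run for *every* batch.
+    def predict(batch):
+        model = load_model()
+        return model(batch)
+
+    # Stateful UDF: load_model() runs once per worker, then is reused.
+    class Predictor:
+        def __init__(self):
+            self.model = load_model()
+
+        def __call__(self, batch):
+            return self.model(batch)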
+
+You've already seen how to do this in the quickstart section of this guide, but now
+that you're equipped with more knowledge, let's have a look at how to define a
+stateful UDF with Ray for our pretrained ResNet model:
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_model_start__
+        :end-before: __pt_model_end__
+
+    .. annotations::
+        <1> The ``__init__`` method is used to initialize the model once. Ray takes care of distributing and managing this state for our batch processing task.
+
+        <2> The ``__call__`` method is used to apply the model to a batch of data.
+
+        <3> We're free to use any custom code in a stateful UDF, and here we prepare the data to run on GPUs.
+
+        <4> Finally, we return the ``"class"`` key of the model predictions as a NumPy array.
+
+Scalable inference with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get predictions, we call :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`,
+making sure to specify an :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>`,
+which defines how many workers to use for inference.
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_prediction_start__
+        :end-before: __pt_prediction_end__
+
+    .. annotations::
+        <1> In this example we use a total of four Ray actors to run inference on our dataset.
+
+        <2> Each actor should use one GPU.
+
+To summarize, mapping a UDF over batches is the simplest transform for Ray Datastreams.
+The UDF defines the logic for transforming individual batches of data of the dataset.
+Performing operations over batches of data is more performant than single-element
+operations, as it can leverage the underlying vectorization capabilities of Pandas or NumPy.
+
+.. note::
+
+    You can use :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` on functions, too.
+    This is mostly useful for quick transformations of your data that don't require
+    an ML model or other stateful objects.
+    To handle state, using classes like we did above is the recommended way.
+    In the dropdown below you'll find an example of mapping data with a simple Python
+    function.
+
+    .. dropdown:: Example using ``map_batches`` with functions
+
+        This example transforms example data using a simple Python function.
+        The ``map_function`` uses the fact that our ``data`` batches in this particular
+        example are Pandas dataframes.
+        Note that by using a map function instead of a class, we don't have to define
+        an :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>` to specify compute resources.
+
+        .. literalinclude:: ./doc_code/batch_formats.py
+            :language: python
+            :start-after: __simple_map_function_start__
+            :end-before: __simple_map_function_end__
+
+.. _batch_inference_formats:
+
+Working with batch formats
+--------------------------
+
+Now that you've seen examples of batch inference with Ray, let's have a closer look
+at how to deal with different data formats.
+First of all, you need to distinguish between two types of batch formats:
+
+- Input batch formats: the format of the input to your UDFs. You will often have to
+  refer to the right format name to run batch inference on your data.
+- Output batch formats: the format your UDFs return.
+
+In many standard cases, the input batch format is the same as the output batch format,
+but it's good to be aware of the differences.
+
+.. margin::
+
+    We refer to batch formats by name in Ray Data (using strings).
+    For instance, the batch format used to represent Pandas dataframes is called ``"pandas"``.
+    We often use batch format names and the libraries they represent interchangeably.
+
+Let's focus on the three available input batch formats first, namely Pandas, NumPy,
+and Arrow, and how they're used in Ray Data:
+
+.. tabbed:: Pandas
+
+    The ``"pandas"`` batch format presents batches in
+    `pandas.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__
+    format. If converting a simple dataset to Pandas DataFrame batches, a single-column
+    dataframe with the column ``"__value__"`` will be created.
+
+    .. literalinclude:: ./doc_code/batch_formats.py
+        :language: python
+        :start-after: __simple_pandas_start__
+        :end-before: __simple_pandas_end__
+
+.. tabbed:: NumPy
+
+    The ``"numpy"`` batch format presents batches in
+    `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+    format as follows:
+
+    * **Tabular datasets**: Each batch will be a dictionary of NumPy
+      ndarrays (``Dict[str, np.ndarray]``), with each key-value pair representing a column
+      in the table.
+
+    * **Tensor datasets** (single-column): Each batch will be a single
+      `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+      containing the single tensor column for this batch.
+
+    * **Simple datasets**: Each batch will be a single NumPy ndarray, where Ray Data will
+      attempt to convert each list-batch to an ndarray.
+
+    .. literalinclude:: ./doc_code/batch_formats.py
+        :language: python
+        :start-after: __simple_numpy_start__
+        :end-before: __simple_numpy_end__
+
+.. tabbed:: Arrow
+
+    The ``"pyarrow"`` batch format presents batches in ``pyarrow.Table`` format.
+    If converting a simple dataset to Arrow Table batches, a single-column table
+    with the column ``"__value__"`` will be created.
+
+    .. literalinclude:: ./doc_code/batch_formats.py
+        :language: python
+        :start-after: __simple_pyarrow_start__
+        :end-before: __simple_pyarrow_end__
+
+When defining the return value of your UDF, you can choose between
+Pandas dataframes (``pandas.DataFrame``), NumPy arrays (``numpy.ndarray``), Arrow tables
+(``pyarrow.Table``), dictionaries of NumPy arrays (``Dict[str, np.ndarray]``), or simple
+Python lists (``list``).
+You can learn more about output formats in :ref:`the output format guide <transform_datasets_batch_output_types>`.
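+
+For example, to request NumPy input batches for a tabular dataset and return a
+dictionary of NumPy arrays, you can pass ``batch_format="numpy"`` explicitly
+(a sketch using the IRIS dataset from above):
+
+.. code-block:: python
+
+    import ray
+
+    ds = ray.data.read_csv("example://iris.csv")
+
+    def add_column(batch):
+        # Tabular data in "numpy" format arrives as Dict[str, np.ndarray].
+        batch["sepal.sum"] = batch["sepal.length"] + batch["sepal.width"]
+        return batch
+
+    ds.map_batches(add_column, batch_format="numpy").show(1)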
+
+.. important::
+
+    No matter which batch format you use, you will always have to be familiar with
+    the underlying APIs used to represent your data. For instance, if you use the
+    ``"pandas"`` batch format, you will need to know the basics of interacting with
+    dataframes to make your batch inference jobs work.
+
+Default data formats
+~~~~~~~~~~~~~~~~~~~~
+
+In all the examples we've seen so far, we didn't have to specify the batch format.
+In fact, the format is inferred from the input dataset, which is often straightforward.
+For instance, when loading a NumPy array with :meth:`ray.data.from_numpy() <ray.data.from_numpy>`,
+the batch format will be ``"numpy"``. But it's not always that easy.
+
+In any case, Ray Data has a ``"default"`` batch format that is computed per data type
+as follows:
+
+.. tabbed:: Tabular data
+
+    Each batch will be a
+    `pandas.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__.
+    This may incur a conversion cost if the underlying Datastream block is not
+    zero-copy convertible from an Arrow table.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_tabular_begin__
+        :end-before: __writing_default_udfs_tabular_end__
+
+.. tabbed:: Tensor data (single-column)
+
+    Each batch will be a single
+    `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+    containing the single tensor column for this batch.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_tensor_begin__
+        :end-before: __writing_default_udfs_tensor_end__
+
+.. tabbed:: Simple data
+
+    Each batch will be a Python list.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_list_begin__
+        :end-before: __writing_default_udfs_list_end__
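+
+If you're ever unsure which default applies, you can check it directly on the
+dataset (a quick sketch):
+
+.. code-block:: python
+
+    import ray
+
+    ds = ray.data.read_csv("example://iris.csv")
+    print(ds.default_batch_format())  # <class 'pandas.core.frame.DataFrame'>
+
+    ds = ray.data.range_tensor(8, shape=(2, 2))
+    print(ds.default_batch_format())  # <class 'numpy.ndarray'>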
+
+.. seealso::
+
+    As we've discussed in this guide, using :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    on a class defining your model
+    should be your default choice for running inference with Ray.
+    However, if you're already using the Ray AIR framework for your ML workflows,
+    you may want to use the
+    :ref:`framework-specific batch predictor implementations <air_framework_predictors>` instead.
+
+    To see an extension of the quick start example using an AIR
+    ``HuggingFacePredictor``, see the following example:
+
+    .. dropdown:: Batch inference example with HuggingFace and Ray AIR
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_air_start__
+            :end-before: __hf_quickstart_air_end__
+
+.. _batch_inference_config:
+
+Configuration & Troubleshooting
+-------------------------------
+
+Configuring Batch Size
+~~~~~~~~~~~~~~~~~~~~~~
+
+An important parameter to set for :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+is ``batch_size``, which controls the size of the batches provided to the UDF.
+Here's a simple example of loading the IRIS dataset (which has Pandas format by default)
+and processing it with a batch size of ``10``:
+
+.. literalinclude:: ./doc_code/batch_formats.py
+    :language: python
+    :start-after: __simple_map_function_start__
+    :end-before: __simple_map_function_end__
+
+Increasing ``batch_size`` can result in faster execution by better leveraging vectorized
+operations and hardware, reducing batch slicing and concatenation overhead, and improving
+the overall saturation of CPUs or GPUs.
+On the other hand, it also results in higher memory utilization, which can
+lead to out-of-memory (OOM) failures.
+If you encounter OOMs, decreasing your ``batch_size`` may help.
+
+.. caution::
+
+    The default ``batch_size`` of ``4096`` may be too large for datasets with large rows
+    (e.g. tables with many columns or a collection of large images).
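+
+For example, to cap memory usage in the quickstart example, you could lower the
+batch size like this (a sketch; ``256`` is an arbitrary value to tune for your
+workload):
+
+.. code-block:: python
+
+    predictions = ds.map_batches(
+        HuggingFacePredictor,
+        compute=ray.data.ActorPoolStrategy(2),
+        batch_size=256,  # down from the default of 4096
+    )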
+
+Using GPUs in batch inference
+-----------------------------
+
+To use GPUs for inference, first update the callable class implementation to
+move the model and data to and from the CUDA device.
+Here's a quick example for a PyTorch model:
+
+.. code-block:: diff
+
+     from torchvision.models import resnet18
+
+     class TorchModel:
+         def __init__(self):
+             self.model = resnet18(pretrained=True)
+    +        self.model = self.model.cuda()
+             self.model.eval()
+
+         def __call__(self, batch: List[torch.Tensor]):
+             torch_batch = torch.stack(batch)
+    +        torch_batch = torch_batch.cuda()
+             with torch.inference_mode():
+                 prediction = self.model(torch_batch)
+    -            return {"class": prediction.argmax(dim=1).detach().numpy()}
+    +            return {"class": prediction.argmax(dim=1).detach().cpu().numpy()}
+
+Next, specify ``num_gpus=N`` in :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+to indicate that each inference worker should use ``N`` GPUs.
+
+.. code-block:: diff
+
+     predictions = dataset.map_batches(
+         TorchModel,
+         compute=ray.data.ActorPoolStrategy(size=2),
+    +    num_gpus=1
+     )
+
+**How should I configure num_cpus and num_gpus for my model?**
+
+By default, Ray will assign one CPU per task or actor. For example, on a machine
+with 16 CPUs, this will result in 16 tasks or actors running concurrently for inference.
+To change this, you can specify ``num_cpus=N``, which will tell Ray to reserve more CPUs
+for the task or actor, or ``num_gpus=N``, which will tell Ray to reserve or assign GPUs
+(GPUs will be assigned via the ``CUDA_VISIBLE_DEVICES`` environment variable).
+
+.. code-block:: python
+
+    # Use 16 actors, each of which is assigned 1 GPU (16 GPUs total).
+    ds = ds.map_batches(
+        MyFn,
+        compute=ActorPoolStrategy(size=16),
+        num_gpus=1
+    )
+
+    # Use 16 actors, each of which is reserved 8 CPUs (128 CPUs total).
+    ds = ds.map_batches(
+        MyFn,
+        compute=ActorPoolStrategy(size=16),
+        num_cpus=8)
+
+**How should I deal with OOM errors due to heavy model memory usage?**
+
+It's common for models to consume a large amount of heap memory. For example, if a model
+uses 5GB of RAM when created or run, and a machine has 16GB of RAM total, then no more
+than three of these models can be run at the same time. The default resource assignment
+of one CPU per task/actor will likely lead to OutOfMemoryErrors from Ray in this situation.
+
+Let's suppose our machine has 16 CPUs and 16GiB of RAM. To tell Ray to construct at most
+3 of these actors per node, we can override the CPU or memory resource requirements:
+
+.. code-block:: python
+
+    # Require 5 CPUs per actor (so at most 3 can fit per 16-CPU node).
+    ds = ds.map_batches(MyFn,
+                        compute=ActorPoolStrategy(size=16), num_cpus=5)
+
+Learn more
+----------
+
+Batch inference is just one small part of the machine learning workflow, and only
+a fraction of what Ray can do.
+
+.. figure:: images/train_predict_pipeline.png
+
+    How batch inference fits into the bigger picture of training and prediction of AI models.
+
+To learn more about Ray and batch inference, check out the following
+tutorials and examples:
+
+.. panels::
+    :container: container pb-3
+    :column: col-md-3 px-1 py-1
+    :img-top-cls: p-2 w-75 d-block mx-auto fixed-height-img
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
+        :type: url
+        :text: Scalable Batch Inference with Ray for Semantic Segmentation
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/nyc_taxi_basic_processing
+        :type: ref
+        :text: Batch Inference on NYC taxi data using Ray Data
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/ocr_example
+        :type: ref
+        :text: Batch OCR processing using Ray Data
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/torch_detection
+        :type: ref
+        :text: Fine-tuning an Object Detection Model and using it for Batch Inference
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/torch_image_example
+        :type: ref
+        :text: Training an Image Classifier and using it for Batch Inference
+        :classes: btn-link btn-block stretched-link
diff --git a/doc/source/data/doc_code/batch_formats.py b/doc/source/data/doc_code/batch_formats.py
new file mode 100644
index 000000000000..8dc1136e6124
--- /dev/null
+++ b/doc/source/data/doc_code/batch_formats.py
@@ -0,0 +1,66 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __simple_map_function_start__
+import ray
+
+ds = ray.data.read_csv("example://iris.csv")
+
+def map_function(data):
+    # The batch arrives as a pandas.DataFrame; keep rows with short sepals.
+    return data[data["sepal.length"] < 5]
+
+transformed = ds.map_batches(map_function, batch_size=10)
+# __simple_map_function_end__
+
+# __simple_pandas_start__
+import ray
+import pandas as pd
+
+ds = ray.data.read_csv("example://iris.csv")
+ds.show(1)
+# -> {'sepal.length': 5.1, ..., 'petal.width': 0.2, 'variety': 'Setosa'}
+
+ds.default_batch_format()
+# pandas.core.frame.DataFrame
+
+def transform_pandas(df_batch: pd.DataFrame) -> pd.DataFrame:
+    df_batch = df_batch[df_batch["variety"] == "Versicolor"]
+    df_batch.loc[:, "normalized.sepal.length"] = df_batch["sepal.length"] / df_batch["sepal.length"].max()
+    df_batch = df_batch.drop(columns=["sepal.length"])
+    return df_batch
+
+ds.map_batches(transform_pandas).show(1)
+# -> {..., 'variety': 'Versicolor', 'normalized.sepal.length': 1.0}
+# __simple_pandas_end__
+
+# __simple_numpy_start__
+import ray
+import numpy as np
+
+ds = ray.data.range_tensor(1000, shape=(2, 2))
+ds.default_batch_format()
+# 'numpy.ndarray'
+
+def transform_numpy(arr: np.ndarray) -> np.ndarray:
+    return arr * 2
+
+ds.map_batches(transform_numpy)
+# __simple_numpy_end__
+
+
+# __simple_pyarrow_start__
+import ray
+import pyarrow as pa
+import pyarrow.compute as pac
+
+ds = ray.data.read_csv("example://iris.csv")
+
+def transform_pyarrow(batch: pa.Table) -> pa.Table:
+    batch = batch.filter(pac.equal(batch["variety"], "Versicolor"))
+    return batch.drop(["sepal.length"])
+
+ds.map_batches(transform_pyarrow, batch_format="pyarrow").show(1)
+# -> {'sepal.width': 3.2, ..., 'variety': 'Versicolor'}
+# __simple_pyarrow_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/hf_quick_start.py b/doc/source/data/doc_code/hf_quick_start.py
new file mode 100644
index 000000000000..c9de271ad4ec
--- /dev/null
+++ b/doc/source/data/doc_code/hf_quick_start.py
@@ -0,0 +1,51 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __hf_quickstart_load_start__
+import ray
+import pandas as pd
+
+
+prompts = pd.DataFrame(["Complete these sentences", "for me"], columns=["text"])
+ds = ray.data.from_pandas(prompts)
+# __hf_quickstart_load_end__
+
+
+# __hf_quickstart_model_start__
+class HuggingFacePredictor:
+    def __init__(self):  # <1>
+        from transformers import pipeline
+        self.model = pipeline("text-generation", model="gpt2")
+
+    def __call__(self, batch):  # <2>
+        return self.model(list(batch["text"]), max_length=20)
+# __hf_quickstart_model_end__
+
+
+# __hf_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
+
+predictions.show(limit=1)
+# [{'generated_text': 'Complete these sentences until you understand them.'}]
+# __hf_quickstart_prediction_end__
+
+# __hf_quickstart_air_start__
+import pandas as pd
+from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
+from transformers.pipelines import pipeline
+from ray.train.huggingface import HuggingFacePredictor
+
+
+tokenizer = AutoTokenizer.from_pretrained("sgugger/gpt2-like-tokenizer")
+model_config = AutoConfig.from_pretrained("gpt2")
+model = AutoModelForCausalLM.from_config(model_config)
+pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+predictor = HuggingFacePredictor(pipeline=pipeline)
+
+prompts = pd.DataFrame(["Complete these sentences", "for me"], columns=["sentences"])
+predictions = predictor.predict(prompts)
+# __hf_quickstart_air_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/pytorch_quick_start.py b/doc/source/data/doc_code/pytorch_quick_start.py
new file mode 100644
index 000000000000..39bcdc4f9bdc
--- /dev/null
+++ b/doc/source/data/doc_code/pytorch_quick_start.py
@@ -0,0 +1,40 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __pt_quickstart_load_start__
+import ray
+import numpy as np
+
+
+dataset = ray.data.from_numpy(np.ones((1, 100)))
+# __pt_quickstart_load_end__
+
+
+# __pt_quickstart_model_start__
+import torch
+import torch.nn as nn
+
+class TorchPredictor:
+
+    def __init__(self):  # <1>
+        self.model = nn.Sequential(
+            nn.Linear(in_features=100, out_features=1),
+            nn.Sigmoid(),
+        )
+        self.model.eval()
+
+    def __call__(self, batch):  # <2>
+        tensor = torch.as_tensor(batch, dtype=torch.float32)
+        with torch.inference_mode():
+            return self.model(tensor).detach().numpy()
+# __pt_quickstart_model_end__
+
+
+# __pt_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+predictions = dataset.map_batches(TorchPredictor, compute=scale)
+predictions.show(limit=1)
+# [0.45092654]
+# __pt_quickstart_prediction_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/tf_quick_start.py b/doc/source/data/doc_code/tf_quick_start.py
new file mode 100644
index 000000000000..92885b619a89
--- /dev/null
+++ b/doc/source/data/doc_code/tf_quick_start.py
@@ -0,0 +1,35 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __tf_quickstart_load_start__
+import ray
+import numpy as np
+
+
+dataset = ray.data.from_numpy(np.ones((1, 100)))
+# __tf_quickstart_load_end__
+
+
+# __tf_quickstart_model_start__
+class TFPredictor:
+    def __init__(self):  # <1>
+        from tensorflow import keras
+
+        input_layer = keras.Input(shape=(100,))
+        output_layer = keras.layers.Dense(1, activation="sigmoid")
+        self.model = keras.Sequential([input_layer, output_layer])
+
+    def __call__(self, batch: np.ndarray):  # <2>
+        return self.model(batch).numpy()
+# __tf_quickstart_model_end__
+
+
+# __tf_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+
+predicted_probabilities = dataset.map_batches(TFPredictor, compute=scale)
+predicted_probabilities.show(limit=1)
+# [0.45119727]
+# __tf_quickstart_prediction_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/torch_image_batch_trained.py b/doc/source/data/doc_code/torch_image_batch_trained.py
new file mode 100644
index 000000000000..feb99e0f5d5a
--- /dev/null
+++ b/doc/source/data/doc_code/torch_image_batch_trained.py
@@ -0,0 +1,58 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __pt_load_start__
+import ray
+
+data_url = "s3://anonymous@air-example-data-2/1G-image-data-synthetic-raw"  # <1>
+dataset = ray.data.read_images(data_url).limit(1000)  # <2>
+# __pt_load_end__
+
+# __pt_preprocess_start__
+from typing import Dict
+import numpy as np
+from torchvision import transforms
+from torchvision.models import ResNet18_Weights
+
+resnet_transforms = ResNet18_Weights.DEFAULT.transforms
+transform = transforms.Compose([transforms.ToTensor(), resnet_transforms()])  # <1>
+
+def preprocess_images(batch: Dict[str, np.ndarray]):  # <2>
+    transformed_images = [transform(image) for image in batch["image"]]
+    return transformed_images
+
+dataset = dataset.map_batches(preprocess_images)  # <3>
+# __pt_preprocess_end__
+
+
+# __pt_model_start__
+from typing import List
+import torch
+from torchvision.models import resnet18
+
+
+class TorchPredictor:
+    def __init__(self):  # <1>
+        self.model = resnet18(pretrained=True).cuda()
+        self.model.eval()
+
+    def __call__(self, batch: List[torch.Tensor]):  # <2>
+        torch_batch = torch.stack(batch).cuda()  # <3>
+        with torch.inference_mode():
+            prediction = self.model(torch_batch)
+            return {"class": prediction.argmax(dim=1).detach().cpu().numpy()}  # <4>
+# __pt_model_end__
+
+
+# __pt_prediction_start__
+predictions = dataset.map_batches(
+    TorchPredictor,
+    compute=ray.data.ActorPoolStrategy(4),  # <1>
+    num_gpus=1,  # <2>
+)
+
+predictions.show(limit=1)
+# {'class': 258}
+# __pt_prediction_end__
+# fmt: on
diff --git a/doc/source/data/images/actor_batch_prediction.png b/doc/source/data/images/actor_batch_prediction.png
new file mode 100644
index 000000000000..5922dde5893b
Binary files /dev/null and b/doc/source/data/images/actor_batch_prediction.png differ
diff --git a/doc/source/data/images/actor_pool_batch_prediction.png b/doc/source/data/images/actor_pool_batch_prediction.png
new file mode 100644
index 000000000000..bc8999aad58d
Binary files /dev/null and b/doc/source/data/images/actor_pool_batch_prediction.png differ
diff --git a/doc/source/data/images/air_batch_prediction.png b/doc/source/data/images/air_batch_prediction.png
new file mode 100644
index 000000000000..7741431af463
Binary files /dev/null and b/doc/source/data/images/air_batch_prediction.png differ
diff --git a/doc/source/data/images/batch_inference.png b/doc/source/data/images/batch_inference.png
new file mode 100644
index 000000000000..c2aba39e55c6
Binary files /dev/null and b/doc/source/data/images/batch_inference.png differ
diff --git a/doc/source/data/images/batch_inference_overview.png b/doc/source/data/images/batch_inference_overview.png
new file mode 100644
index 000000000000..5dd8536700f8
Binary files /dev/null and b/doc/source/data/images/batch_inference_overview.png differ
diff --git a/doc/source/data/images/task_batch_prediction.png b/doc/source/data/images/task_batch_prediction.png
new file mode 100644
index 000000000000..72328062a938
Binary files /dev/null and b/doc/source/data/images/task_batch_prediction.png differ
diff --git a/doc/source/data/images/train_predict_pipeline.png b/doc/source/data/images/train_predict_pipeline.png
new file mode 100644
index 000000000000..890b7346b7bf
Binary files /dev/null and b/doc/source/data/images/train_predict_pipeline.png differ
diff --git a/doc/source/data/user-guide.rst b/doc/source/data/user-guide.rst
index 4888c02c3598..7acb35968480 100644
--- a/doc/source/data/user-guide.rst
+++ b/doc/source/data/user-guide.rst
@@ -4,8 +4,10 @@
 User Guides
 ===========
 
-If you’re new to Ray Data, we recommend starting with the :ref:`Ray Data Quick Start <data_getting_started>`.
-This user guide will help you navigate the Ray Data project and show you how achieve several tasks.
+If you’re new to Ray Datasets, we recommend starting with the
+:ref:`Ray Datasets Quick Start <datasets_getting_started>`.
+This user guide will help you navigate the Ray Datasets project and
+show you how to achieve several tasks.
 
 .. toctree::
     :maxdepth: 2
@@ -13,7 +15,8 @@ This user guide will help you navigate the Ray Data project and show you how ach
     creating-datastreams
     transforming-datastreams
     consuming-datastreams
-    data-tensor-support
+    batch_inference
+    dataset-tensor-support
     custom-datasource
     data-internals
     performance-tips
diff --git a/doc/source/ray-air/api/predictor.rst b/doc/source/ray-air/api/predictor.rst
index 1c438fbbd54c..92a4c818f720 100644
--- a/doc/source/ray-air/api/predictor.rst
+++ b/doc/source/ray-air/api/predictor.rst
@@ -80,6 +80,7 @@ Batch Prediction API
     batch_predictor.BatchPredictor.predict
     batch_predictor.BatchPredictor.predict_pipelined
 
+.. _air_framework_predictors:
 
 Built-in Predictors for Library Integrations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/ray-overview/use-cases.rst b/doc/source/ray-overview/use-cases.rst
index 5a9c22c40952..bbb73e0889d6 100644
--- a/doc/source/ray-overview/use-cases.rst
+++ b/doc/source/ray-overview/use-cases.rst
@@ -3,7 +3,9 @@
 Ray Use Cases
 =============
 
-This page indexes common Ray use cases for scaling ML. It contains highlighted references to blogs, examples, and tutorials also located elsewhere in the Ray documentation.
+This page indexes common Ray use cases for scaling ML.
+It contains highlighted references to blogs, examples, and tutorials also located
+elsewhere in the Ray documentation.
 
 .. _ref-use-cases-llm:
 
@@ -98,15 +100,15 @@ Learn more about how Ray scales LLMs and generative AI with the following resour
 Batch Inference
 ---------------
 
-Batch inference refers to generating model predictions over a set of input observations. The model could be a regression model, neural network, or simply a Python function. Ray can scale batch inference from single GPU machines to large clusters.
+Batch inference is the process of generating model predictions on a large "batch" of input data.
+Ray for batch inference works with any cloud provider and ML framework,
+and is fast and cheap for modern deep learning applications.
+It scales from single machines to large clusters with minimal code changes.
+As a Python-first framework, you can easily express and interactively develop your inference workloads in Ray.
+To learn more about running batch inference with Ray, see the :ref:`batch inference guide <batch_inference_home>`.
 
-Performing inference on incoming batches of data can be parallelized by exporting the architecture and weights of a trained model to the shared object store. Using these model replicas, Ray AIR's :ref:`Batch Predictor <air-predictors>` scales predictions on batches across workers.
+.. figure:: batch_inference/images/batch_inference.png
 
-.. figure:: /images/batch_inference.png
-
-   Using Ray AIR's ``BatchPredictor`` for batch inference.
-
-Learn more about batch inference with the following resources.
 
 .. panels::
     :container: container pb-3
@@ -116,24 +118,17 @@ Learn more about batch inference with the following resources.
     ---
     :img-top: /images/ray_logo.png
 
-    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
-        :type: url
-        :text: [Tutorial] Architectures for Scalable Batch Inference with Ray
-        :classes: btn-link btn-block stretched-link scalableBatchInference
+    .. link-button:: /data/batch-inference
+        :type: ref
+        :text: [User Guide] Batch Inference with Ray Data
+        :classes: btn-link btn-block stretched-link
     ---
     :img-top: /images/ray_logo.png
 
-    .. link-button:: https://www.anyscale.com/blog/model-batch-inference-in-ray-actors-actorpool-and-datasets
+    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
         :type: url
-        :text: [Blog] Batch Inference in Ray: Actors, ActorPool, and Datasets
-        :classes: btn-link btn-block stretched-link batchActorPool
-    ---
-    :img-top: /images/ray_logo.png
-
-    .. link-button:: /ray-core/examples/batch_prediction
-        :type: ref
-        :text: [Example] Batch Prediction using Ray Core
-        :classes: btn-link btn-block stretched-link batchCore
+        :text: [Tutorial] Architectures for Scalable Batch Inference with Ray
+        :classes: btn-link btn-block stretched-link scalableBatchInference
     ---
    :img-top: /images/ray_logo.png
 
@@ -150,6 +145,7 @@ Learn more about batch inference with the following resources.
         :text: [Example] Batch OCR processing using Ray Data
         :classes: btn-link btn-block stretched-link
 
+
 .. _ref-use-cases-mmt:
 
 Many Model Training