diff --git a/doc/source/data/api/datastream.rst b/doc/source/data/api/datastream.rst
index 1963ce142f2b..e4c789bca050 100644
--- a/doc/source/data/api/datastream.rst
+++ b/doc/source/data/api/datastream.rst
@@ -138,9 +138,10 @@ Execution
 ---------
 
 .. autosummary::
-   :toctree: doc/
+    :toctree: doc/
 
-   Datastream.materialize
+    Datastream.materialize
+    ActorPoolStrategy
 
 Serialization
 -------------
diff --git a/doc/source/data/batch_inference.rst b/doc/source/data/batch_inference.rst
new file mode 100644
index 000000000000..5ace6a8bb45f
--- /dev/null
+++ b/doc/source/data/batch_inference.rst
@@ -0,0 +1,751 @@
+.. _batch_inference_home:
+
+Running Batch Inference with Ray
+================================
+
+.. note::
+
+    In this tutorial you'll learn what batch inference is, why you might want to use
+    Ray for it, and how to use Ray effectively for this task.
+    If you are familiar with the basics of inference tasks, jump straight to the code
+    in the :ref:`quickstart section <batch_inference_quickstart>` or the
+    :ref:`advanced guide<batch_inference_advanced_pytorch_example>`.
+
+Batch inference refers to generating model predictions on a set of input data.
+The model can range from a simple Python function to a complex neural network.
+In batch inference, also known as offline inference, your model is run on a large
+batch of data on demand.
+This is in contrast to online inference, where the model is run immediately on a
+data point when it becomes available.
+
+Here's a simple schematic of batch inference, "mapping" batches to predictions
+via model inference:
+
+.. figure:: images/batch_inference.png
+
+  Evaluating a batch of input data with a model to get predictions.
+
+Batch inference is a foundational workload for many AI companies, especially as
+more and more pre-trained models become available.
+And while batch inference looks simple on the surface, it can be challenging to do right in production.
+For instance, your data batches can be excessively large, too slow to process sequentially,
+or might need custom preprocessing before being fed into your models.
+To run inference workloads effectively at scale, you need to:
+
+- manage your compute infrastructure and cloud clusters
+- parallelize data processing and utilize all your cluster resources (CPUs and GPUs)
+- efficiently transfer data between cloud storage, CPUs for preprocessing, and GPUs for model inference
+
+Here's a realistic view of batch inference for modern AI applications:
+
+.. figure:: images/batch_inference_overview.png
+
+  A realistic view of batch inference for modern AI applications: loading data from
+  storage, preprocessing on CPUs, and running model inference on GPUs.
+
+Why use Ray for batch inference?
+---------------------------------
+
+There are several reasons to use Ray for batch inference, even if your current
+use case does not require scaling yet:
+
+1. **Faster and cheaper for modern deep learning applications**:
+    Ray is built for
+    complex workloads and supports loading and preprocessing data with CPUs and model inference on GPUs.
+2. **Cloud, framework, and data format agnostic**:
+    Ray Data works with any cloud provider and
+    any ML framework (like PyTorch and TensorFlow) and does not require a particular file format.
+3. **Out of the box scaling**:
+    The same code that works on one machine also runs on a
+    large cluster without any changes.
+4. **Python first**:
+    You can express your inference job directly in Python instead of
+    YAML files or other formats.
+
+.. _batch_inference_quickstart:
+
+Quick Start
+-----------
+
+Install Ray with the data processing library, Ray Data:
+
+.. code-block:: bash
+
+    pip install ray[data]
+
+Running batch inference is conceptually easy and requires three steps:
+
+1. Load your data into a Ray dataset and optionally apply any preprocessing you need.
+2. Define your model for inference.
+3. Run inference on your data by using the :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+   method from Ray Data.
+
+The last step also defines how your batch processing job gets distributed across your (local) cluster.
+We start with very simple use cases here and build up to more complex ones in other guides and tutorials.
+
+.. note::
+
+    All advanced use cases ultimately boil down to extensions of the above three steps,
+    like loading and storing data from cloud storage, using complex preprocessing functions,
+    demanding model setups, additional postprocessing, or other customizations.
+    We'll cover these advanced use cases in the next sections.
+
+1. Loading and preprocessing data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For this quick start guide we use very small, in-memory datasets by
+leveraging common Python libraries like NumPy and Pandas.
+In general, once you load your datasets using Ray Data, you also want to apply some preprocessing steps.
+We skip this step here for simplicity.
+In any case, the result of this step is a Ray Datastream ``ds`` that we can run inference on.
+
+.. margin::
+
+    For larger data sets, you can use Ray Data to load data from cloud storage like S3 or GCS.
+    We'll cover this later on.
+
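+As a quick preview, loading Parquet data from cloud storage looks like this
+(a sketch; the bucket name is hypothetical):
+
+.. code-block:: python
+
+    import ray
+
+    # Replace the bucket and path with your own data.
+    ds = ray.data.read_parquet("s3://my-bucket/my-data/")
+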
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        Create a Pandas
+        DataFrame with text data to run a GPT-2 model on.
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_load_start__
+            :end-before: __hf_quickstart_load_end__
+
+    .. group-tab:: PyTorch
+
+        Create a NumPy array of shape ``(1, 100)``,
+        representing a single input to a feed-forward neural network.
+
+        .. literalinclude:: ./doc_code/pytorch_quick_start.py
+            :language: python
+            :start-after: __pt_quickstart_load_start__
+            :end-before: __pt_quickstart_load_end__
+
+    .. group-tab:: TensorFlow
+
+        Create a NumPy array of shape ``(1, 100)``,
+        representing a single input to a feed-forward neural network.
+
+        .. literalinclude:: ./doc_code/tf_quick_start.py
+            :language: python
+            :start-after: __tf_quickstart_load_start__
+            :end-before: __tf_quickstart_load_end__
+
+2. Setting up your model
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Next, you want to set up your model for inference by defining a predictor.
+The core idea is to define a class that loads your model in its ``__init__`` method
+and implements a ``__call__`` method that takes a batch of data and returns a batch of predictions.
+Below you'll find examples for PyTorch, TensorFlow, and HuggingFace.
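+
+In skeleton form, such a predictor looks like the following sketch, where the model
+loading and prediction calls are placeholders for your framework of choice:
+
+.. code-block:: python
+
+    class MyPredictor:
+        def __init__(self):
+            # Runs once per worker: load the model into memory.
+            self.model = load_my_model()  # hypothetical helper
+
+        def __call__(self, batch):
+            # Runs once per batch: return a batch of predictions.
+            return self.model(batch)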
+
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/hf_quick_start.py
+                :language: python
+                :start-after: __hf_quickstart_model_start__
+                :end-before: __hf_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+    .. group-tab:: PyTorch
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/pytorch_quick_start.py
+                :language: python
+                :start-after: __pt_quickstart_model_start__
+                :end-before: __pt_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+
+    .. group-tab:: TensorFlow
+
+        .. callout::
+
+            .. literalinclude:: ./doc_code/tf_quick_start.py
+                :language: python
+                :start-after: __tf_quickstart_model_start__
+                :end-before: __tf_quickstart_model_end__
+
+            .. annotations::
+                <1> Use the constructor (``__init__``) to initialize your model.
+
+                <2> The ``__call__`` method runs inference on a batch of data.
+
+
+3. Getting predictions with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once you have your Ray Datastream ``ds`` and your predictor class, you can use
+:meth:`ds.map_batches() <ray.data.Dataset.map_batches>` to get predictions.
+``map_batches`` takes your predictor class as an argument and lets you specify
+compute resources by passing an :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>` to its ``compute`` argument.
+In the example below, we use two CPUs to run inference in parallel and then print the results.
+We cover resource allocation in more detail in :ref:`the configuration section of this guide <batch_inference_config>`.
+
+.. tabs::
+
+    .. group-tab:: HuggingFace
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_prediction_start__
+            :end-before: __hf_quickstart_prediction_end__
+
+    .. group-tab:: PyTorch
+
+        .. literalinclude:: ./doc_code/pytorch_quick_start.py
+            :language: python
+            :start-after: __pt_quickstart_prediction_start__
+            :end-before: __pt_quickstart_prediction_end__
+
+    .. group-tab:: TensorFlow
+
+        .. literalinclude:: ./doc_code/tf_quick_start.py
+            :language: python
+            :start-after: __tf_quickstart_prediction_start__
+            :end-before: __tf_quickstart_prediction_end__
+
+.. _batch_inference_advanced_pytorch_example:
+
+Advanced batch inference guide
+------------------------------
+
+Let's use batch inference on a pre-trained PyTorch model for image classification
+to illustrate advanced concepts of batch processing with Ray.
+
+.. important::
+
+    If you want to dive right into example use cases, consider reading the following
+    tutorials next:
+
+    .. panels::
+        :container: container pb-3
+        :column: col-md-3 px-1 py-1
+        :img-top-cls: p-2 w-75 d-block mx-auto fixed-height-img
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/ocr_example
+            :type: ref
+            :text: Batch OCR processing using Ray Data
+            :classes: btn-link btn-block stretched-link
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/torch_detection
+            :type: ref
+            :text: Fine-tuning an Object Detection Model and using it for Batch Inference
+            :classes: btn-link btn-block stretched-link
+
+        ---
+        :img-top: /images/ray_logo.png
+
+        .. link-button:: /data/examples/torch_image_example
+            :type: ref
+            :text: Training an Image Classifier and using it for Batch Inference
+            :classes: btn-link btn-block stretched-link
+
+
+Loading data with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In the quick start guide we glossed over the details of loading data with Ray Data.
+Your data might be stored in a variety of formats, and you might want to load it from different sources.
+Ray Data supports multiple formats and sources out of the box.
+The :ref:`guide to creating datasets <creating_datasets>` is the ultimate resource
+to learn more about loading data with Ray Data, but we'll cover the basics here, too.
+
+.. hint::
+
+    With Ray Data, you can :ref:`create synthetic data in Python<dataset_generate_data>`,
+    :ref:`load data from various storage solutions<dataset_reading_from_storage>` such as S3,
+    HDFS, or GCS, using common formats such as CSV, JSON, Text, Images, Binary,
+    TFRecords, Parquet, and more. Ray Data also supports reading from common SQL and NoSQL
+    databases, and allows you to define your own custom data sources.
+
+    You can also read :ref:`common Python library formats <dataset_from_in_memory_data_single_node>`
+    such as Pandas, NumPy, Arrow, or plain Python objects, as well as from
+    :ref:`distributed data processing frameworks <dataset_from_in_memory_data_distributed>`
+    such as Spark, Dask, Modin, or Mars.
+
+    Of course, Ray Data also supports :ref:`reading data from common ML frameworks <dataset_from_torch_tf>`
+    like PyTorch, TensorFlow or HuggingFace.
+
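+As a quick illustration, here's a sketch of two of these entry points, separate from
+our running example:
+
+.. code-block:: python
+
+    import pandas as pd
+    import ray
+
+    # From an in-memory Pandas DataFrame.
+    ds = ray.data.from_pandas(pd.DataFrame({"x": [1, 2, 3]}))
+
+    # Synthetic data, handy for testing a pipeline.
+    ds = ray.data.range(1000)
+
+For our running example, we load one gigabyte of raw image data from S3:
+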
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_load_start__
+        :end-before: __pt_load_end__
+
+    .. annotations::
+        <1> We use one gigabyte of image data from the Imagenet dataset from S3.
+
+        <2> We use ``read_images`` from Ray Data and limit the number of images to 1000.
+
+The process of loading data with Ray Data is as diverse as the data you have.
+For instance, in the example above we didn't load the text labels for our images,
+which would require a different data source and loading function.
+For any advanced use cases, we recommend you read the
+:ref:`guide to creating datasets <creating_datasets>`.
+
+Preprocessing with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After loading your data, it often needs to be preprocessed prior to inference.
+This may include cropping or resizing images, or tokenizing raw text.
+
+To introduce common terminology, with :ref:`Ray Data <datasets>` you can define
+:term:`user-defined functions<User-defined function (UDF)>` (UDFs) that transform batches of your data.
+As you've seen before, applying these UDFs via
+:meth:`ds.map_batches() <ray.data.Dataset.map_batches>` outputs a new, transformed dataset.
+
+.. note::
+
+    The way we do preprocessing here is conceptually close to how we do batch
+    inference, and we use the same :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    call from Ray Data to run this task.
+    The main difference is that we don't use a machine learning model to transform our data,
+    which has some practical consequences. For instance, in the example below we simply
+    pass a function into ``map_batches``, rather than a class.
+
+To transform our raw images loaded from S3 in the last step, we use functionality from
+the ``torchvision`` package to define a UDF called ``preprocess_images``.
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_preprocess_start__
+        :end-before: __pt_preprocess_end__
+
+    .. annotations::
+        <1> We compose PyTorch tensor creation with image preprocessing, so that our processed images "fit" into a ``ResNet18`` PyTorch model.
+
+        <2> We then define a simple UDF to transform batches of raw data accordingly. Note that these batches come as dictionaries of NumPy images stored in the ``"image"`` key.
+
+        <3> Finally, we apply the UDF to our dataset using ``map_batches``.
+
+.. tip::
+
+    For the full suite of transformations available in Ray Data, read
+    :ref:`the data transformation guide <transforming_datasets>`.
+
+.. caution::
+
+    Depending on how you load your data and what input data format you use, the dataset
+    loaded with :ref:`Ray Data <datasets>` will have different *batch formats*.
+    For instance, image data might be naturally stored in NumPy format, while tabular
+    data makes much more sense as a Pandas DataFrame.
+    What (default) batch format your data has and how to deal with it is explained in
+    detail in :ref:`the batch format section <batch_inference_formats>`.
+
+Defining predictors as stateful UDFs
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One of the key value adds of Ray over other distributed systems is the support for
+distributed stateful operations. These stateful operations are especially useful
+for inference since the model only needs to be initialized once, instead of per batch.
+
+.. margin::
+
+    In short, running model inference means applying
+    :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    to a dataset with a trained model as a UDF.
+
+You've already seen how to do this in the quickstart section of this guide, but now
+that you're equipped with more knowledge, let's have a look at how to define a
+stateful UDF with Ray for our pretrained ResNet model:
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_model_start__
+        :end-before: __pt_model_end__
+
+    .. annotations::
+        <1> The ``__init__`` method is used to initialize the model once. Ray takes care of distributing and managing this state for our batch processing task.
+
+        <2> The ``__call__`` method is used to apply the model to a batch of data.
+
+        <3> We're free to use any custom code in a stateful UDF, and here we prepare the data to run on GPUs.
+
+        <4> Finally, we return the ``"class"`` key of the model predictions as a NumPy array.
+
+
+Scalable inference with Ray Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get predictions, we call :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`,
+making sure to specify an :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>`,
+which defines how many workers to use for inference.
+
+.. callout::
+
+    .. literalinclude:: ./doc_code/torch_image_batch_trained.py
+        :language: python
+        :start-after: __pt_prediction_start__
+        :end-before: __pt_prediction_end__
+
+    .. annotations::
+        <1> In this example we use a total of four Ray Actors to run inference on our dataset.
+
+        <2> Each actor should use one GPU.
+
+To summarize, mapping a UDF over batches is the simplest transform for Ray Datastreams.
+The UDF defines the logic for transforming individual batches of the dataset.
+Performing operations over batches of data is more performant than single-element
+operations, as it can leverage the underlying vectorization capabilities of Pandas or NumPy.
+
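+To make this concrete, here's a toy sketch, separate from the example above, comparing a
+per-element ``map`` with a vectorized ``map_batches`` over the same data:
+
+.. code-block:: python
+
+    import ray
+
+    ds = ray.data.range_tensor(10_000, shape=(2,))
+
+    # One Python function call per element.
+    per_element = ds.map(lambda row: row * 2)
+
+    # One vectorized NumPy operation per batch of elements.
+    per_batch = ds.map_batches(lambda batch: batch * 2)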
+
+.. note::
+
+    You can use :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` on functions, too.
+    This is mostly useful for quick transformations of your data that don't require
+    an ML model or other stateful objects.
+    To handle state, using classes as we did above is the recommended approach.
+    In the dropdown below you find an example of mapping data with a simple Python
+    function.
+
+    .. dropdown:: Example using ``map_batches`` with functions
+
+        This example transforms example data using a simple Python function.
+        The ``map_function`` uses the fact that our ``data`` batches in this particular
+        example are Pandas dataframes.
+        Note that by using a map function instead of a class, we don't have to define
+        :class:`ActorPoolStrategy <ray.data.ActorPoolStrategy>` to specify compute resources.
+
+        .. literalinclude:: ./doc_code/batch_formats.py
+            :language: python
+            :start-after: __simple_map_function_start__
+            :end-before: __simple_map_function_end__
+
+.. _batch_inference_formats:
+
+Working with batch formats
+--------------------------
+
+Now that you've seen examples of batch inference with Ray, let's have a closer look
+at how to deal with different data formats.
+First of all, you need to distinguish between two types of batch formats:
+
+- Input batch formats: This is the format of the input to your UDFs. You will often have to
+  refer to the right format name to run batch inference on your data.
+- Output batch formats: This is the format your UDFs return.
+
+In many standard cases, the input batch format is the same as the output batch format,
+but it's good to be aware of the differences.
+
+.. margin::
+    We refer to batch formats by name in Ray Data (using strings).
+    For instance, the batch format used to represent Pandas dataframes is called ``"pandas"``.
+    We often use batch format names and the libraries they represent interchangeably.
+
+Let's focus on the three available input batch formats first,
+namely Pandas, NumPy, and Arrow, and how they're used in Ray Data:
+
+.. tabbed:: Pandas
+
+  The ``"pandas"`` batch format presents batches in
+  `pandas.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__
+  format. If converting a simple dataset to Pandas DataFrame batches, a single-column
+  dataframe with the column ``"__value__"`` will be created.
+
+  .. literalinclude:: ./doc_code/batch_formats.py
+    :language: python
+    :start-after: __simple_pandas_start__
+    :end-before: __simple_pandas_end__
+
+.. tabbed:: NumPy
+
+  The ``"numpy"`` batch format presents batches in
+  `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+  format as follows:
+
+  * **Tabular datasets**: Each batch will be a dictionary of NumPy
+    ndarrays (``Dict[str, np.ndarray]``), with each key-value pair representing a column
+    in the table.
+
+  * **Tensor datasets** (single-column): Each batch will be a single
+    `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+    containing the single tensor column for this batch.
+
+  * **Simple datasets**: Each batch will be a single NumPy ndarray, where Ray Data will
+    attempt to convert each list-batch to an ndarray.
+
+  .. literalinclude:: ./doc_code/batch_formats.py
+    :language: python
+    :start-after: __simple_numpy_start__
+    :end-before: __simple_numpy_end__
+
+.. tabbed:: Arrow
+
+    The ``"pyarrow"`` batch format presents batches in ``pyarrow.Table`` format.
+    If converting a simple dataset to Arrow Table batches, a single-column table
+    with the column ``"__value__"`` will be created.
+
+    .. literalinclude:: ./doc_code/batch_formats.py
+        :language: python
+        :start-after: __simple_pyarrow_start__
+        :end-before: __simple_pyarrow_end__
+
+When defining the return value of your UDF, you can choose between
+Pandas dataframes (``pandas.DataFrame``), NumPy arrays (``numpy.ndarray``), Arrow tables
+(``pyarrow.Table``), dictionaries of NumPy arrays (``Dict[str, np.ndarray]``), or simple
+Python lists (``list``).
+You can learn more about output formats in :ref:`the output format guide<transform_datasets_batch_output_types>`.
+
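+For example, a UDF may accept batches in one format and return them in another.
+Here's a sketch, separate from the examples above, that takes Pandas DataFrame batches
+and returns dictionaries of NumPy arrays:
+
+.. code-block:: python
+
+    from typing import Dict
+
+    import numpy as np
+    import pandas as pd
+    import ray
+
+    ds = ray.data.read_csv("example://iris.csv")
+
+    def to_numpy_columns(df: pd.DataFrame) -> Dict[str, np.ndarray]:
+        # Input batch format: Pandas DataFrame. Output: dict of NumPy arrays.
+        return {"sepal.length": df["sepal.length"].to_numpy()}
+
+    ds.map_batches(to_numpy_columns)
+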
+.. important::
+
+    No matter which batch format you use, you will always have to be familiar with
+    the underlying APIs used to represent your data. For instance, if you use the
+    ``"pandas"`` batch format, you will need to know the basics of interacting with
+    dataframes to make your batch inference jobs work.
+
+Default data formats
+~~~~~~~~~~~~~~~~~~~~
+
+In all the examples we've seen so far, we didn't have to specify the batch format.
+That's because the format is inferred from the input dataset.
+For instance, when loading a NumPy array with :meth:`ray.data.from_numpy() <ray.data.from_numpy>`,
+the batch format will be ``"numpy"``. Inferring the format isn't always that easy, though.
+
+In any case, Ray Data has a ``"default"`` batch format that is computed per data type
+as follows:
+
+.. tabbed:: Tabular data
+
+    Each batch will be a
+    `pandas.DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html>`__.
+    This may incur a conversion cost if the underlying Datastream block is not
+    zero-copy convertible from an Arrow table.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_tabular_begin__
+        :end-before: __writing_default_udfs_tabular_end__
+
+.. tabbed:: Tensor data (single-column)
+
+    Each batch will be a single
+    `numpy.ndarray <https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html>`__
+    containing the single tensor column for this batch.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_tensor_begin__
+        :end-before: __writing_default_udfs_tensor_end__
+
+.. tabbed:: Simple data
+
+    Each batch will be a Python list.
+
+    .. literalinclude:: ./doc_code/transforming_datastreams.py
+        :language: python
+        :start-after: __writing_default_udfs_list_begin__
+        :end-before: __writing_default_udfs_list_end__
+
+
+.. seealso::
+
+    As we've discussed in this guide, using :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+    on a class defining your model
+    should be your default choice for running inference with Ray.
+    However, if you're already using the Ray AIR framework for running your ML workflows,
+    you may want to use the
+    :ref:`framework-specific batch predictor implementations<air_framework_predictors>`.
+
+    To see an extension of the quick start example using an AIR
+    ``HuggingFacePredictor``, see the following example:
+
+    .. dropdown:: Batch inference example with HuggingFace and Ray AIR
+
+        .. literalinclude:: ./doc_code/hf_quick_start.py
+            :language: python
+            :start-after: __hf_quickstart_air_start__
+            :end-before: __hf_quickstart_air_end__
+
+
+.. _batch_inference_config:
+
+Configuration & Troubleshooting
+-------------------------------
+
+Configuring Batch Size
+~~~~~~~~~~~~~~~~~~~~~~
+
+An important parameter to set for :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+is ``batch_size``, which controls the size of the batches provided to the UDF.
+Here's a simple example of loading the Iris dataset (which has Pandas batch format by default)
+and processing it with a batch size of ``10``:
+
+.. literalinclude:: ./doc_code/batch_formats.py
+  :language: python
+  :start-after: __simple_map_function_start__
+  :end-before: __simple_map_function_end__
+
+Increasing ``batch_size`` can result in faster execution by better leveraging vectorized
+operations and hardware, reducing batch slicing and concatenation overhead, and better
+saturating your CPUs or GPUs.
+On the other hand, larger batches also result in higher memory utilization, which can
+lead to out-of-memory (OOM) failures.
+If you encounter OOMs, decreasing your ``batch_size`` may help.
+
+.. caution::
+  The default ``batch_size`` of ``4096`` may be too large for datasets with large rows
+  (e.g. tables with many columns or a collection of large images).
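+
+For instance, here's a sketch of passing a smaller ``batch_size`` explicitly, reusing the
+predictor and compute strategy from the quick start:
+
+.. code-block:: python
+
+    predictions = ds.map_batches(
+        HuggingFacePredictor,
+        compute=ray.data.ActorPoolStrategy(2),
+        batch_size=256,  # Smaller batches reduce peak memory usage.
+    )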
+
+Using GPUs in batch inference
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use GPUs for inference, first update the callable class implementation to
+move the model and data to and from the CUDA device.
+Here's a quick example for a PyTorch model:
+
+.. code-block:: diff
+
+    from torchvision.models import resnet18
+
+    class TorchModel:
+        def __init__(self):
+            self.model = resnet18(pretrained=True)
+    +       self.model = self.model.cuda()
+            self.model.eval()
+
+        def __call__(self, batch: List[torch.Tensor]):
+            torch_batch = torch.stack(batch)
+    +       torch_batch = torch_batch.cuda()
+            with torch.inference_mode():
+                prediction = self.model(torch_batch)
+    -           return {"class": prediction.argmax(dim=1).detach().numpy()}
+    +           return {"class": prediction.argmax(dim=1).detach().cpu().numpy()}
+
+
+Next, specify ``num_gpus=N`` in :meth:`ds.map_batches() <ray.data.Dataset.map_batches>`
+to indicate that each inference worker should use ``N`` GPUs.
+
+.. code-block:: diff
+
+    predictions = dataset.map_batches(
+        TorchModel,
+        compute=ray.data.ActorPoolStrategy(size=2),
+    +   num_gpus=1
+    )
+
+**How should I configure num_cpus and num_gpus for my model?**
+
+By default, Ray will assign 1 CPU per task or actor. For example, on a machine
+with 16 CPUs, this will result in 16 tasks or actors running concurrently for inference.
+To change this, you can specify ``num_cpus=N``, which will tell Ray to reserve more CPUs
+for the task or actor, or ``num_gpus=N``, which will tell Ray to reserve/assign GPUs
+(GPUs will be assigned via the ``CUDA_VISIBLE_DEVICES`` environment variable).
+
+.. code-block:: python
+
+    # Use 16 actors, each of which is assigned 1 GPU (16 GPUs total).
+    ds = ds.map_batches(
+        MyFn,
+        compute=ActorPoolStrategy(size=16),
+        num_gpus=1,
+    )
+
+    # Use 16 actors, each of which is reserved 8 CPUs (128 CPUs total).
+    ds = ds.map_batches(
+        MyFn,
+        compute=ActorPoolStrategy(size=16),
+        num_cpus=8,
+    )
+
+
+**How should I deal with OOM errors due to heavy model memory usage?**
+
+It's common for models to consume a large amount of heap memory. For example, if a model
+uses 5GB of RAM when created and run, and a machine has 16GB of RAM total, then no more
+than three of these models can be run at the same time. The default resource assignments
+of one CPU per task/actor will likely lead to OutOfMemoryErrors from Ray in this situation.
+
+Let's suppose our machine has 16 CPUs and 16GiB of RAM. To tell Ray to construct at most
+3 of these actors per node, we can over-provision the CPU resource per actor:
+
+.. code-block:: python
+
+    # Require 5 CPUs per actor (so at most 3 can fit per 16-CPU node).
+    ds = ds.map_batches(
+        MyFn,
+        compute=ActorPoolStrategy(size=16),
+        num_cpus=5,
+    )
+
+Learn more
+----------
+
+
+Batch inference is just one small part of the machine learning workflow, and only
+a fraction of what Ray can do.
+
+.. figure:: images/train_predict_pipeline.png
+
+  How batch inference fits into the bigger picture of training AI models and using them for prediction.
+
+To learn more about Ray and batch inference, check out the following
+tutorials and examples:
+
+.. panels::
+    :container: container pb-3
+    :column: col-md-3 px-1 py-1
+    :img-top-cls: p-2 w-75 d-block mx-auto fixed-height-img
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
+        :type: url
+        :text: Scalable Batch Inference with Ray for Semantic Segmentation
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/nyc_taxi_basic_processing
+        :type: ref
+        :text: Batch Inference on NYC taxi data using Ray Data
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/ocr_example
+        :type: ref
+        :text: Batch OCR processing using Ray Data
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/torch_detection
+        :type: ref
+        :text: Fine-tuning an Object Detection Model and using it for Batch Inference
+        :classes: btn-link btn-block stretched-link
+
+    ---
+    :img-top: /images/ray_logo.png
+
+    .. link-button:: /data/examples/torch_image_example
+        :type: ref
+        :text: Training an Image Classifier and using it for Batch Inference
+        :classes: btn-link btn-block stretched-link
diff --git a/doc/source/data/doc_code/batch_formats.py b/doc/source/data/doc_code/batch_formats.py
new file mode 100644
index 000000000000..8dc1136e6124
--- /dev/null
+++ b/doc/source/data/doc_code/batch_formats.py
@@ -0,0 +1,66 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __simple_map_function_start__
+import ray
+
+ds = ray.data.read_csv("example://iris.csv")
+
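+# Batches of this CSV-backed dataset arrive as Pandas DataFrames by default.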
+def map_function(data):
+    return data[data["sepal.length"] < 5]
+
+transformed = ds.map_batches(map_function, batch_size=10)
+# __simple_map_function_end__
+
+# __simple_pandas_start__
+import ray
+import pandas as pd
+
+ds = ray.data.read_csv("example://iris.csv")
+ds.show(1)
+# -> {'sepal.length': 5.1, ..., 'petal.width': 0.2, 'variety': 'Setosa'}
+
+ds.default_batch_format()
+# pandas.core.frame.DataFrame
+
+def transform_pandas(df_batch: pd.DataFrame) -> pd.DataFrame:
+    df_batch = df_batch[df_batch["variety"] == "Versicolor"]
+    df_batch.loc[:, "normalized.sepal.length"] = df_batch["sepal.length"] / df_batch["sepal.length"].max()
+    df_batch = df_batch.drop(columns=["sepal.length"])
+    return df_batch
+
+ds.map_batches(transform_pandas).show(1)
+# -> {..., 'variety': 'Versicolor', 'normalized.sepal.length': 1.0}
+# __simple_pandas_end__
+
+# __simple_numpy_start__
+import ray
+import numpy as np
+
+ds = ray.data.range_tensor(1000, shape=(2, 2))
+ds.default_batch_format()
+# 'numpy.ndarray'
+
+def transform_numpy(arr: np.ndarray) -> np.ndarray:
+    return arr * 2
+
+ds.map_batches(transform_numpy)
+# __simple_numpy_end__
+
+
+# __simple_pyarrow_start__
+import ray
+import pyarrow as pa
+import pyarrow.compute as pac
+
+ds = ray.data.read_csv("example://iris.csv")
+
+def transform_pyarrow(batch: pa.Table) -> pa.Table:
+    batch = batch.filter(pac.equal(batch["variety"], "Versicolor"))
+    return batch.drop(["sepal.length"])
+
+ds.map_batches(transform_pyarrow, batch_format="pyarrow").show(1)
+# -> {'sepal.width': 3.2, ..., 'variety': 'Versicolor'}
+# __simple_pyarrow_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/hf_quick_start.py b/doc/source/data/doc_code/hf_quick_start.py
new file mode 100644
index 000000000000..c9de271ad4ec
--- /dev/null
+++ b/doc/source/data/doc_code/hf_quick_start.py
@@ -0,0 +1,51 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __hf_quickstart_load_start__
+import ray
+import pandas as pd
+
+
+prompts = pd.DataFrame(["Complete these sentences", "for me"], columns=["text"])
+ds = ray.data.from_pandas(prompts)
+# __hf_quickstart_load_end__
+
+
+# __hf_quickstart_model_start__
+class HuggingFacePredictor:
+    def __init__(self):  # <1>
+        from transformers import pipeline
+        self.model = pipeline("text-generation", model="gpt2")
+
+    def __call__(self, batch):  # <2>
+        return self.model(list(batch["text"]), max_length=20)
+# __hf_quickstart_model_end__
+
+
+# __hf_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+predictions = ds.map_batches(HuggingFacePredictor, compute=scale)
+
+predictions.show(limit=1)
+# [{'generated_text': 'Complete these sentences until you understand them.'}]
+# __hf_quickstart_prediction_end__
+
+# __hf_quickstart_air_start__
+import pandas as pd
+from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
+from transformers.pipelines import pipeline
+from ray.train.huggingface import HuggingFacePredictor
+
+
+tokenizer = AutoTokenizer.from_pretrained("sgugger/gpt2-like-tokenizer")
+model_config = AutoConfig.from_pretrained("gpt2")
+model = AutoModelForCausalLM.from_config(model_config)
+pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
+
+predictor = HuggingFacePredictor(pipeline=pipe)
+
+prompts = pd.DataFrame(["Complete these sentences", "for me"], columns=["sentences"])
+predictions = predictor.predict(prompts)
+# __hf_quickstart_air_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/pytorch_quick_start.py b/doc/source/data/doc_code/pytorch_quick_start.py
new file mode 100644
index 000000000000..39bcdc4f9bdc
--- /dev/null
+++ b/doc/source/data/doc_code/pytorch_quick_start.py
@@ -0,0 +1,40 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __pt_quickstart_load_start__
+import ray
+import numpy as np
+
+
+dataset = ray.data.from_numpy(np.ones((1, 100)))
+# __pt_quickstart_load_end__
+
+
+# __pt_quickstart_model_start__
+import torch
+import torch.nn as nn
+
+class TorchPredictor:
+
+    def __init__(self):  # <1>
+        self.model = nn.Sequential(
+            nn.Linear(in_features=100, out_features=1),
+            nn.Sigmoid(),
+        )
+        self.model.eval()
+
+    def __call__(self, batch):  # <2>
+        tensor = torch.as_tensor(batch, dtype=torch.float32)
+        with torch.inference_mode():
+            return self.model(tensor).detach().numpy()
+# __pt_quickstart_model_end__
+
+
+# __pt_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+predictions = dataset.map_batches(TorchPredictor, compute=scale)
+predictions.show(limit=1)
+# [0.45092654]
+# __pt_quickstart_prediction_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/tf_quick_start.py b/doc/source/data/doc_code/tf_quick_start.py
new file mode 100644
index 000000000000..92885b619a89
--- /dev/null
+++ b/doc/source/data/doc_code/tf_quick_start.py
@@ -0,0 +1,35 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __tf_quickstart_load_start__
+import ray
+import numpy as np
+
+
+dataset = ray.data.from_numpy(np.ones((1, 100)))
+# __tf_quickstart_load_end__
+
+
+# __tf_quickstart_model_start__
+class TFPredictor:
+    def __init__(self):  # <1>
+        from tensorflow import keras
+
+        input_layer = keras.Input(shape=(100,))
+        output_layer = keras.layers.Dense(1, activation="sigmoid")
+        self.model = keras.Sequential([input_layer, output_layer])
+
+    def __call__(self, batch: np.ndarray):  # <2>
+        return self.model(batch).numpy()
+# __tf_quickstart_model_end__
+
+
+# __tf_quickstart_prediction_start__
+scale = ray.data.ActorPoolStrategy(2)
+
+predicted_probabilities = dataset.map_batches(TFPredictor, compute=scale)
+predicted_probabilities.show(limit=1)
+# [0.45119727]
+# __tf_quickstart_prediction_end__
+# fmt: on
diff --git a/doc/source/data/doc_code/torch_image_batch_trained.py b/doc/source/data/doc_code/torch_image_batch_trained.py
new file mode 100644
index 000000000000..feb99e0f5d5a
--- /dev/null
+++ b/doc/source/data/doc_code/torch_image_batch_trained.py
@@ -0,0 +1,58 @@
+# flake8: noqa
+# isort: skip_file
+# fmt: off
+
+# __pt_load_start__
+import ray
+
+data_url = "s3://anonymous@air-example-data-2/1G-image-data-synthetic-raw"  # <1>
+dataset = ray.data.read_images(data_url).limit(1000)  # <2>
+# __pt_load_end__
+
+# __pt_preprocess_start__
+from typing import Dict
+import numpy as np
+from torchvision import transforms
+from torchvision.models import ResNet18_Weights
+
+resnet_transforms = ResNet18_Weights.DEFAULT.transforms
+transform = transforms.Compose([transforms.ToTensor(), resnet_transforms()])  # <1>
+
+def preprocess_images(batch: Dict[str, np.ndarray]):  # <2>
+    transformed_images = [transform(image) for image in batch["image"]]
+    return transformed_images
+
+dataset = dataset.map_batches(preprocess_images)  # <3>
+# __pt_preprocess_end__
+
+
+# __pt_model_start__
+from typing import List
+import torch
+from torchvision.models import resnet18
+
+
+class TorchPredictor:
+    def __init__(self):  # <1>
+        self.model = resnet18(pretrained=True).cuda()
+        self.model.eval()
+
+    def __call__(self, batch: List[torch.Tensor]):  # <2>
+        torch_batch = torch.stack(batch).cuda()  # <3>
+        with torch.inference_mode():
+            prediction = self.model(torch_batch)
+            return {"class": prediction.argmax(dim=1).detach().cpu().numpy()}  # <4>
+# __pt_model_end__
+
+
+# __pt_prediction_start__
+predictions = dataset.map_batches(
+    TorchPredictor,
+    compute=ray.data.ActorPoolStrategy(4),  # <1>
+    num_gpus=1,  # <2>
+)
+
+predictions.show(limit=1)
+# {'class': 258}
+# __pt_prediction_end__
+# fmt: on
diff --git a/doc/source/data/images/actor_batch_prediction.png b/doc/source/data/images/actor_batch_prediction.png
new file mode 100644
index 000000000000..5922dde5893b
Binary files /dev/null and b/doc/source/data/images/actor_batch_prediction.png differ
diff --git a/doc/source/data/images/actor_pool_batch_prediction.png b/doc/source/data/images/actor_pool_batch_prediction.png
new file mode 100644
index 000000000000..bc8999aad58d
Binary files /dev/null and b/doc/source/data/images/actor_pool_batch_prediction.png differ
diff --git a/doc/source/data/images/air_batch_prediction.png b/doc/source/data/images/air_batch_prediction.png
new file mode 100644
index 000000000000..7741431af463
Binary files /dev/null and b/doc/source/data/images/air_batch_prediction.png differ
diff --git a/doc/source/data/images/batch_inference.png b/doc/source/data/images/batch_inference.png
new file mode 100644
index 000000000000..c2aba39e55c6
Binary files /dev/null and b/doc/source/data/images/batch_inference.png differ
diff --git a/doc/source/data/images/batch_inference_overview.png b/doc/source/data/images/batch_inference_overview.png
new file mode 100644
index 000000000000..5dd8536700f8
Binary files /dev/null and b/doc/source/data/images/batch_inference_overview.png differ
diff --git a/doc/source/data/images/task_batch_prediction.png b/doc/source/data/images/task_batch_prediction.png
new file mode 100644
index 000000000000..72328062a938
Binary files /dev/null and b/doc/source/data/images/task_batch_prediction.png differ
diff --git a/doc/source/data/images/train_predict_pipeline.png b/doc/source/data/images/train_predict_pipeline.png
new file mode 100644
index 000000000000..890b7346b7bf
Binary files /dev/null and b/doc/source/data/images/train_predict_pipeline.png differ
diff --git a/doc/source/data/user-guide.rst b/doc/source/data/user-guide.rst
index 4888c02c3598..7acb35968480 100644
--- a/doc/source/data/user-guide.rst
+++ b/doc/source/data/user-guide.rst
@@ -4,8 +4,10 @@
 User Guides
 ===========
 
-If you’re new to Ray Data, we recommend starting with the :ref:`Ray Data Quick Start <data_getting_started>`.
-This user guide will help you navigate the Ray Data project and show you how achieve several tasks.
+If you’re new to Ray Datasets, we recommend starting with the
+:ref:`Ray Datasets Quick Start <datasets_getting_started>`.
+This user guide will help you navigate the Ray Datasets project and
+show you how to achieve several tasks.
 
 .. toctree::
     :maxdepth: 2
@@ -13,7 +15,8 @@ This user guide will help you navigate the Ray Data project and show you how ach
     creating-datastreams
     transforming-datastreams
     consuming-datastreams
-    data-tensor-support
+    batch_inference
+    dataset-tensor-support
     custom-datasource
     data-internals
     performance-tips
diff --git a/doc/source/ray-air/api/predictor.rst b/doc/source/ray-air/api/predictor.rst
index 1c438fbbd54c..92a4c818f720 100644
--- a/doc/source/ray-air/api/predictor.rst
+++ b/doc/source/ray-air/api/predictor.rst
@@ -80,6 +80,7 @@ Batch Prediction API
     batch_predictor.BatchPredictor.predict
     batch_predictor.BatchPredictor.predict_pipelined
 
+.. _air_framework_predictors:
 
 Built-in Predictors for Library Integrations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/ray-overview/use-cases.rst b/doc/source/ray-overview/use-cases.rst
index 5a9c22c40952..bbb73e0889d6 100644
--- a/doc/source/ray-overview/use-cases.rst
+++ b/doc/source/ray-overview/use-cases.rst
@@ -3,7 +3,9 @@
 Ray Use Cases
 =============
 
-This page indexes common Ray use cases for scaling ML. It contains highlighted references to blogs, examples, and tutorials also located elsewhere in the Ray documentation.
+This page indexes common Ray use cases for scaling ML.
+It contains highlighted references to blogs, examples, and tutorials also located
+elsewhere in the Ray documentation.
 
 .. _ref-use-cases-llm:
 
@@ -98,15 +100,15 @@ Learn more about how Ray scales LLMs and generative AI with the following resour
 Batch Inference
 ---------------
 
-Batch inference refers to generating model predictions over a set of input observations. The model could be a regression model, neural network, or simply a Python function. Ray can scale batch inference from single GPU machines to large clusters.
+Batch inference is the process of generating model predictions on a large "batch" of input data.
+Ray for batch inference works with any cloud provider and ML framework,
+and is fast and cheap for modern deep learning applications.
+It scales from single machines to large clusters with minimal code changes.
+Since Ray is Python-first, you can easily express and interactively develop your inference workloads.
+To learn more about running batch inference with Ray, see the :ref:`batch inference guide<batch_inference_home>`.
 
-Performing inference on incoming batches of data can be parallelized by exporting the architecture and weights of a trained model to the shared object store. Using these model replicas, Ray AIR's :ref:`Batch Predictor <air-predictors>` scales predictions on batches across workers.
+.. figure:: /data/images/batch_inference.png
 
-.. figure:: /images/batch_inference.png
-  
-  Using Ray AIR's ``BatchPredictor`` for batch inference.
-
-Learn more about batch inference with the following resources.
 
 .. panels::
     :container: container pb-3
@@ -116,24 +118,17 @@ Learn more about batch inference with the following resources.
     ---
     :img-top: /images/ray_logo.png
 
-    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
-        :type: url
-        :text: [Tutorial] Architectures for Scalable Batch Inference with Ray
-        :classes: btn-link btn-block stretched-link scalableBatchInference
+    .. link-button:: /data/batch_inference
+        :type: ref
+        :text: [User Guide] Batch Inference with Ray Data
+        :classes: btn-link btn-block stretched-link
     ---
     :img-top: /images/ray_logo.png
 
-    .. link-button:: https://www.anyscale.com/blog/model-batch-inference-in-ray-actors-actorpool-and-datasets
+    .. link-button:: https://github.com/ray-project/ray-educational-materials/blob/main/Computer_vision_workloads/Semantic_segmentation/Scaling_batch_inference.ipynb
         :type: url
-        :text: [Blog] Batch Inference in Ray: Actors, ActorPool, and Datasets
-        :classes: btn-link btn-block stretched-link batchActorPool
-    ---
-    :img-top: /images/ray_logo.png
-
-    .. link-button:: /ray-core/examples/batch_prediction
-        :type: ref
-        :text: [Example] Batch Prediction using Ray Core
-        :classes: btn-link btn-block stretched-link batchCore
+        :text: [Tutorial] Architectures for Scalable Batch Inference with Ray
+        :classes: btn-link btn-block stretched-link scalableBatchInference
     ---
     :img-top: /images/ray_logo.png
 
@@ -150,6 +145,7 @@ Learn more about batch inference with the following resources.
         :text: [Example] Batch OCR processing using Ray Data
         :classes: btn-link btn-block stretched-link batchOcr
 
+
 .. _ref-use-cases-mmt:
 
 Many Model Training