
[Docs] [Draft] First pass of batch inference docs #34185

Closed
wants to merge 4 commits

Conversation

@amogkam (Contributor) commented Apr 8, 2023

Initial pass of batch inference docs.

Open questions:

  1. How should this interact with Ray Data docs?
  2. What to do with the existing Predictor/preprocessor/batch predictor docs in Ray AIR?

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

amogkam added 2 commits April 6, 2023 20:49
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
@waleedkadous (Contributor) left a comment


Not an approver, but some suggestions.

dataset = dataset.map_batches(concatenate, batch_format="pandas")

# Define the model class for prediction.
class TorchModel:
Contributor

Is calling this a model the right thing? Should we call it a pipeline?

Contributor Author

We can rename to TorchPredictor. But I don't think pipeline is very intuitive.
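
For illustration, a minimal sketch of the rename under discussion: a stateful TorchPredictor class applied with map_batches(). The placeholder model, column name, and pool sizes are hypothetical, not the PR's actual code.

import numpy as np
import ray
import torch

class TorchPredictor:
    def __init__(self):
        # Placeholder model; the real docs would load pretrained weights.
        self.model = torch.nn.Linear(4, 1)
        self.model.eval()

    def __call__(self, batch):
        # With batch_format="numpy", batch is a Dict[str, np.ndarray].
        inputs = torch.as_tensor(batch["data"], dtype=torch.float32)
        with torch.no_grad():
            batch["output"] = self.model(inputs).numpy()
        return batch

# from_numpy() stores the array under the "data" column.
ds = ray.data.from_numpy(np.random.rand(32, 4).astype("float32"))
predictions = ds.map_batches(
    TorchPredictor,
    compute=ray.data.ActorPoolStrategy(min_size=1, max_size=2),
    batch_format="numpy",
)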

from tensorflow import keras # this is needed for tf<2.9
from keras import layers

self.model = keras.Sequential(
Contributor

Shouldn't model have a leading dunder?
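
For reference, the convention being hinted at would look something like this (a sketch; a single leading underscore marks the attribute as internal, while a true dunder prefix would additionally trigger Python's name mangling):

from tensorflow import keras  # this is needed for tf<2.9
from keras import layers

class KerasPredictor:  # hypothetical class name
    def __init__(self):
        # "_model" signals an internal attribute; "__model" (a dunder
        # prefix) would be name-mangled to _KerasPredictor__model.
        self._model = keras.Sequential([layers.Dense(1)])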



Writing batch UDFs
------------------
Contributor

Why are we calling them UDFs? That seems like Spark terminology. Why don't we call them stateful model classes or something?

Contributor

I think this is common lingo in this context. My worry is rather that "UDF" and other acronyms might look intimidating to new users.
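
To ground the terminology, the two flavors of batch UDF under discussion look roughly like this (a sketch; the dataset and column name are made up):

import ray

ds = ray.data.from_items([{"value": i} for i in range(8)])

# A function UDF: stateless, applied independently to each batch.
def add_one(batch):
    batch["value"] = batch["value"] + 1
    return batch

# A class UDF: stateful, constructed once per actor and reused across batches.
class AddConstant:
    def __init__(self):
        self.constant = 10  # expensive setup (e.g. model loading) would go here

    def __call__(self, batch):
        batch["value"] = batch["value"] + self.constant
        return batch

ds = ds.map_batches(add_one, batch_format="numpy")
ds = ds.map_batches(AddConstant, compute=ray.data.ActorPoolStrategy(), batch_format="numpy")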

The following is an example of using these transformation APIs to process
the Iris dataset.

.. literalinclude:: ../data/doc_code/transforming_datasets.py
Contributor

Should also give an example of how to pass a custom constructor arg / custom call arg. It comes up often in feedback.
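
A sketch of what such an example could show, assuming the fn_constructor_kwargs and fn_kwargs parameters of map_batches(); the Multiplier class and column name are hypothetical:

import ray

class Multiplier:
    def __init__(self, factor):
        self.factor = factor  # received via fn_constructor_kwargs

    def __call__(self, batch, column):
        batch[column] = batch[column] * self.factor  # column comes via fn_kwargs
        return batch

ds = ray.data.from_items([{"value": i} for i in range(8)])
ds = ds.map_batches(
    Multiplier,
    batch_format="numpy",
    compute=ray.data.ActorPoolStrategy(),
    fn_constructor_kwargs={"factor": 2},  # forwarded to Multiplier.__init__
    fn_kwargs={"column": "value"},        # forwarded to Multiplier.__call__
)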

Model Inferencing
=================

Model inference applies :meth:`ds.map_batches() <ray.data.Dataset.map_batches>` to our transformed dataset, with a pre-trained model as the UDF.
Contributor

For this page, how about using phrasing more like custom model / Python class instead of stateful operations / stateful UDFs?

I think it's OK to use UDF terminology for Data docs more generally, since it's more accurate, but for this workflow custom models are probably what users are looking for.
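
A rough sketch of the pattern this section describes: applying a custom model class to the transformed dataset. TorchPredictor is the hypothetical class sketched earlier in this thread, and the pool size and GPU count are illustrative:

predictions = dataset.map_batches(
    TorchPredictor,  # stateful model class, one instance per actor
    compute=ray.data.ActorPoolStrategy(min_size=1, max_size=4),
    num_gpus=1,      # reserve one GPU per actor; omit for CPU-only inference
    batch_format="numpy",
)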

num_cpus=8)


**How should I deal with OOM errors due to heavy model memory usage?**
Contributor

Shall we also add an FAQ on how to configure batch size? (It may affect both performance and memory usage.)
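
Illustrative only: batch_size is the main knob for such an FAQ, since smaller batches lower peak memory per task (helping with OOMs) while larger batches amortize per-batch overhead, subject to GPU memory. Continuing the hypothetical example above:

predictions = dataset.map_batches(
    TorchPredictor,
    compute=ray.data.ActorPoolStrategy(min_size=1, max_size=2),
    batch_size=64,  # lower this first if inference runs out of memory
)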



Configuring Batch Size
~~~~~~~~~~~~~~~~~~~~~~
Contributor

Ah I see you have it here. IMO, this page should be combined with the former, since preprocessing + inference are so tied together. Also, batch size applies for both preprocessing and inference.

amogkam added 2 commits April 10, 2023 14:01
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
Signed-off-by: amogkam <amogkamsetty@yahoo.com>
@ericl added the 'author-action-required' label Apr 10, 2023
@@ -14,6 +14,13 @@ parts:
- file: ray-overview/ray-libraries
title: "Ecosystem"

- file: batch-inference/getting-started
Contributor

A good way to extend this to other CUJs would be to make our "Use Cases" section expandable and then have several cases such as this one there.

@@ -0,0 +1,44 @@
Scalable Offline Batch Inference
================================
:ref:`Ray Data <datasets>` offers a highly performant and scalable solution for offline batch inference and processing on large amounts of data.
Contributor

Quick paragraph on what batch inference is maybe? Also, we should somehow mention that there are many ways to run batch inference with Ray, but we're starting with the most convenient here (or so).


Why should I use Ray for offline batch inference?
-------------------------------------------------
1. **Faster and cheaper for modern deep learning applications**: Ray Data is built for hybrid CPU+GPU workloads. Through streaming-based execution, CPU tasks like reading and preprocessing run concurrently with GPU inference.
Contributor

just "Ray" here? We don't know about "Data" at this point

stale bot commented May 18, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stale bot added the 'stale' label May 18, 2023
@zhe-thoughts (Collaborator)

This can be closed because of #34567

Labels
author-action-required: The PR author is responsible for the next step. Remove tag to send back to the reviewer.
stale: The issue is stale. It will be closed within 7 days unless there is further conversation.
7 participants