[Datasets] Remove "Example: Large-scale ML Ingest" #33067

Merged
merged 25 commits into from
Mar 15, 2023
Commits
3db9875
Initial commit
bveeramani Mar 2, 2023
94e12bb
Remove big_data_ingestion.ipynb
bveeramani Mar 6, 2023
c2c5f46
Merge branch 'ml-preprocessing-revisions' into remove-example
bveeramani Mar 6, 2023
32abf7b
Update doc/source/data/transforming-datasets.rst
bveeramani Mar 6, 2023
833c2a1
Update doc/source/data/transforming-datasets.rst
bveeramani Mar 6, 2023
4edad33
Update doc/source/data/transforming-datasets.rst
bveeramani Mar 6, 2023
fc10610
Update doc/source/data/dataset.rst
bveeramani Mar 6, 2023
393fc40
Update doc/source/data/dataset.rst
bveeramani Mar 6, 2023
520b0df
Update doc/source/data/dataset.rst
bveeramani Mar 6, 2023
3849ab3
Update doc/source/data/dataset.rst
bveeramani Mar 6, 2023
68bbfa1
Update doc/source/data/transforming-datasets.rst
bveeramani Mar 6, 2023
781819f
Update dataset.rst
bveeramani Mar 6, 2023
1d0d62a
Update transforming-datasets.rst
bveeramani Mar 6, 2023
d81b0be
Merge branch 'ml-preprocessing-revisions' of https://github.com/bveer…
bveeramani Mar 6, 2023
5d31afb
Update doc/source/ray-air/check-ingest.rst
bveeramani Mar 10, 2023
f8c191c
Update doc/source/data/dataset.rst
bveeramani Mar 10, 2023
f621d65
Update doc/source/data/transforming-datasets.rst
bveeramani Mar 10, 2023
91187a5
Address review comments
bveeramani Mar 10, 2023
8d1ddbd
Merge remote-tracking branch 'upstream/master' into ml-preprocessing-…
bveeramani Mar 11, 2023
12e21b8
Update faq.rst
bveeramani Mar 11, 2023
421d3d4
Update nyc_taxi_basic_processing.ipynb
bveeramani Mar 11, 2023
789e10a
Update transforming-datasets.rst
bveeramani Mar 11, 2023
b32ecc2
Merge branch 'ml-preprocessing-revisions' into remove-example
bveeramani Mar 13, 2023
8bca667
Merge remote-tracking branch 'upstream/master' into remove-example
bveeramani Mar 13, 2023
c61acc1
Fix changes
bveeramani Mar 13, 2023
9 changes: 0 additions & 9 deletions doc/BUILD
@@ -34,15 +34,6 @@ py_test(
tags = ["exclusive", "team:ml"]
)

-py_test(
-name = "big_data_ingestion",
-size = "small",
-main = "test_myst_doc.py",
-srcs = ["test_myst_doc.py"],
-args = ["--path", "doc/source/data/examples/big_data_ingestion.ipynb"],
-data = ["//doc/source/data/examples:data_examples"],
-tags = ["exclusive", "team:core", "py37"]
-)

py_test(
name = "datasets_train",
2 changes: 0 additions & 2 deletions doc/source/_toc.yml
@@ -94,8 +94,6 @@ parts:
title: Processing the NYC taxi dataset
- file: data/examples/batch_training
title: Batch Training with Ray Datasets
-  - file: data/examples/big_data_ingestion
-    title: Large-scale ML Ingest
- file: data/examples/ocr_example
title: Scaling OCR with Ray Datasets
- file: data/examples/advanced-pipelines
54 changes: 0 additions & 54 deletions doc/source/data/big_data_ingestion.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion doc/source/data/dataset.rst
@@ -34,7 +34,7 @@ Data Loading and Preprocessing for ML Training
----------------------------------------------

Use Ray Datasets to load and preprocess data for distributed :ref:`ML training pipelines <train-docs>`.
-Compared to other loading solutions, Datasets are more flexible (e.g., can express higher-quality `per-epoch global shuffles <examples/big_data_ingestion.html>`__) and provides `higher overall performance <https://www.anyscale.com/blog/why-third-generation-ml-platforms-are-more-performant>`__.
+Compared to other loading solutions, Datasets are more flexible (e.g., can express higher-quality per-epoch global shuffles) and provides `higher overall performance <https://www.anyscale.com/blog/why-third-generation-ml-platforms-are-more-performant>`__.

Use Datasets as a last-mile bridge from storage or ETL pipeline outputs to distributed
applications and libraries in Ray. Don't use it as a replacement for more general data
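Note on "per-epoch global shuffles" in the hunk above: with the link to the removed example gone, the term may be unclear. The following is a minimal plain-Python sketch of the idea, not Ray's implementation or API. The function names `global_shuffle_epochs` and `local_shuffle_epochs` and the two toy blocks are hypothetical and exist only for illustration. A global shuffle lets any record land anywhere in the epoch, while a per-block (local) shuffle keeps records inside their original block.

```python
import random


def global_shuffle_epochs(blocks, num_epochs, seed=0):
    """Yield one fully reshuffled list of records per epoch (hypothetical helper)."""
    # Flatten every block so a record can move to any position.
    records = [record for block in blocks for record in block]
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch = records[:]  # copy so each epoch is shuffled independently
        rng.shuffle(epoch)
        yield epoch


def local_shuffle_epochs(blocks, num_epochs, seed=0):
    """Yield epochs where records are shuffled only within their own block."""
    rng = random.Random(seed)
    for _ in range(num_epochs):
        epoch = []
        for block in blocks:
            part = list(block)
            rng.shuffle(part)  # records never cross a block boundary
            epoch.extend(part)
        yield epoch


# Toy data: two blocks of four records each (illustrative only).
blocks = [[0, 1, 2, 3], [4, 5, 6, 7]]
global_epochs = list(global_shuffle_epochs(blocks, num_epochs=2))
local_epochs = list(local_shuffle_epochs(blocks, num_epochs=2))
```

In the local variant the first four positions of every epoch always hold records 0 to 3, which is the kind of ordering bias a global shuffle avoids.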
17 changes: 0 additions & 17 deletions doc/source/data/examples/BUILD
@@ -5,20 +5,3 @@ filegroup(
srcs = glob(["*.ipynb"]),
visibility = ["//doc:__subpackages__"]
)
-
-# --------------------------------------------------------------------
-# Test all doc/source/data/examples notebooks.
-# --------------------------------------------------------------------
-
-# big_data_ingestion.ipynb is not tested right now due to large resource requirements
-# and a need of a general overhaul.
-
-py_test_run_all_notebooks(
-size = "large",
-include = ["*.ipynb"],
-exclude = [
-"big_data_ingestion.ipynb",
-],
-data = ["//doc/source/data/examples:data_examples"],
-tags = ["exclusive", "team:ml"]
-)
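Note on the removed rule above: its `include` and `exclude` arguments read as glob-style filters over the notebooks in the directory. The sketch below shows that selection logic in plain Python under that assumption. It is not the actual `py_test_run_all_notebooks` macro, whose implementation is not shown in this diff. The helper `select_notebooks` and the file list are hypothetical.

```python
from fnmatch import fnmatchcase


def select_notebooks(filenames, include, exclude):
    """Return names matching any include pattern and no exclude pattern."""
    selected = []
    for name in filenames:
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(name)
    return sorted(selected)


# Illustrative directory listing (assumed, not taken from the repository).
notebooks = [
    "batch_training.ipynb",
    "big_data_ingestion.ipynb",
    "ocr_example.ipynb",
    "README.md",
]

to_test = select_notebooks(
    notebooks,
    include=["*.ipynb"],
    exclude=["big_data_ingestion.ipynb"],
)
```

Under this reading, deleting `big_data_ingestion.ipynb` from the repository makes the `exclude` entry unnecessary.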