[Datasets] [Operator Fusion - 5/N] Add metrics collection to data layer. #32749

clarkzinzow · 2023-02-22T21:04:51Z

This PR adds basic metrics collection to the Datasets data layer, focusing on batching and block building. This allows us to capture metrics around data slicing, concatenation, copying, etc.

This PR is partially stacked on changes in #32744. This PR is an e2e instrumentation successor to #33831.

TODOs

- Add copy instrumentation of zero-copy slicing; currently num_copies isn't incremented for slicing, even though SimpleBlocks always copy on .slice(), even if copy=False.
- Add metrics test coverage at the batcher/builder level around expected number of slices, concatenations, copies, format conversions, etc. This already exists in the adapters PR, but only at the adapters level, and only for simple blocks.
- Move to row-based metrics for slicing and concatenation.
- Add unit tests of metrics abstractions.

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

python/ray/data/_internal/execution/operators/map_operator.py

ericl · 2023-02-28T21:15:52Z

python/ray/data/_internal/block_batching.py

    for block in block_iter:
        with stats.iter_format_batch_s.timer() if stats else nullcontext():
-            batch = BlockAccessor.for_block(block).to_batch_format(batch_format)
+            acc = BlockAccessor.for_block(block)


You may want to record the metrics from within BlockAccessor, e.g., BlockAccessor(block, metrics_sink) to avoid duplication of code recording metrics and make the code look nicer.
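
To make the suggestion concrete, here is a minimal sketch of an accessor that owns its metrics recording, so call sites no longer repeat it. `MetricsSink` and `ListBlockAccessor` are hypothetical names for illustration only; they are not Ray's actual classes, and the list-backed block stands in for a real block type.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MetricsSink:
    """Hypothetical metrics sink (an assumption, not Ray's API)."""

    num_slices: int = 0
    num_copies: int = 0
    num_format_conversions: int = 0


class ListBlockAccessor:
    """Toy accessor over a list-backed block that records its own metrics."""

    def __init__(self, block: List[Any], metrics_sink: Optional[MetricsSink] = None):
        self._block = block
        # Fall back to a throwaway sink so methods never need a None check.
        self._metrics = metrics_sink if metrics_sink is not None else MetricsSink()

    def slice(self, start: int, end: int) -> List[Any]:
        self._metrics.num_slices += 1
        # Python list slicing always copies, so the copy is counted as well.
        self._metrics.num_copies += 1
        return self._block[start:end]

    def to_batch_format(self, batch_format: str) -> Any:
        if batch_format == "list":
            # Already in the requested format; no conversion is recorded.
            return self._block
        if batch_format == "tuple":
            self._metrics.num_format_conversions += 1
            return tuple(self._block)
        raise ValueError(f"Unsupported batch format: {batch_format!r}")


sink = MetricsSink()
acc = ListBlockAccessor([1, 2, 3, 4], sink)
first_two = acc.slice(0, 2)
as_tuple = acc.to_batch_format("tuple")
```

With this shape, the batching loop only constructs the accessor with a sink; all counting lives in one place.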

ericl · 2023-02-28T21:17:31Z

python/ray/data/_internal/planner/filter.py

@@ -28,5 +32,7 @@ def fn(
            # This causes different behavior between filter and other map-like
            # functions. We should revisit and try to get rid of this logic.
            yield builder.build()
+            if metrics_collector is not None:


Shall we always make this non-None?

Sure! Initially this was going to allow the legacy path to not be ported, but we can always collect the metrics and then drop them.
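
A rough sketch of the "always collect, then drop" approach described above. The names `MetricsCollector`, `filter_rows`, and `legacy_filter` are hypothetical and only illustrate the pattern; they do not mirror the actual code in this PR.

```python
from typing import Callable, Iterable, Iterator, List


class MetricsCollector:
    """Hypothetical collector (an assumption, not the PR's class)."""

    def __init__(self) -> None:
        self.num_blocks_built = 0

    def record_block_built(self) -> None:
        self.num_blocks_built += 1


def filter_rows(
    rows: Iterable[int],
    predicate: Callable[[int], bool],
    metrics_collector: MetricsCollector,
) -> Iterator[List[int]]:
    """Yield one filtered block; the collector is required, so no None check."""
    kept = [row for row in rows if predicate(row)]
    yield kept
    # Runs when the consumer resumes the generator after the yield.
    metrics_collector.record_block_built()


def legacy_filter(rows: Iterable[int], predicate: Callable[[int], bool]) -> List[int]:
    """Legacy-style caller: metrics are still collected, then simply dropped."""
    throwaway = MetricsCollector()
    return [row for block in filter_rows(rows, predicate, throwaway) for row in block]


collector = MetricsCollector()
blocks = list(filter_rows(range(6), lambda r: r % 2 == 0, collector))
legacy_rows = legacy_filter(range(6), lambda r: r % 2 == 0)
```

The instrumented function stays free of conditionals, and callers that don't care about metrics pay only for a short-lived object.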

ericl

We should also add some unit tests for the metrics collector (this is getting large... should we merge metrics util code in a dedicated PR?)

Couple other thoughts on the metrics:

- Should we record only rows copied/sliced/concatenated? It seems more directly related to performance than the number of slices.
- Maybe start with fewer metrics
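
For illustration, a minimal sketch of what row-based metrics could look like, counting rows touched rather than the number of operations. `RowMetrics`, `slice_block`, and `concat_blocks` are hypothetical helpers over plain Python lists, not part of this PR or of Ray.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class RowMetrics:
    """Hypothetical row-based counters (an assumption for illustration)."""

    rows_sliced: int = 0
    rows_concatenated: int = 0
    rows_copied: int = 0


def slice_block(block: List[Any], start: int, end: int, metrics: RowMetrics) -> List[Any]:
    out = block[start:end]
    metrics.rows_sliced += len(out)
    # List slicing copies, so every sliced row is also a copied row.
    metrics.rows_copied += len(out)
    return out


def concat_blocks(blocks: List[List[Any]], metrics: RowMetrics) -> List[Any]:
    out = [row for block in blocks for row in block]
    metrics.rows_concatenated += len(out)
    # Building the merged list copies every row reference once more.
    metrics.rows_copied += len(out)
    return out


metrics = RowMetrics()
source = list(range(10))
a = slice_block(source, 0, 4, metrics)
b = slice_block(source, 4, 6, metrics)
merged = concat_blocks([a, b], metrics)
```

Here two slices and one concatenation show up as 6 rows sliced, 6 rows concatenated, and 12 rows copied, which tracks the work done more closely than operation counts would.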

clarkzinzow requested review from ericl, scv119, jjyao, jianoaix and c21 as code owners February 22, 2023 21:04

clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch 2 times, most recently from 65e4e9d to 1da6e5e Compare February 27, 2023 21:48

clarkzinzow assigned ericl, c21 and jianoaix Feb 27, 2023

clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch from 1da6e5e to 7b9c78c Compare February 28, 2023 19:01

clarkzinzow commented Feb 28, 2023

ericl reviewed Feb 28, 2023

ericl added the @author-action-required label (the PR author is responsible for the next step; remove the tag to send it back to the reviewer) Feb 28, 2023

clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch 5 times, most recently from f9fa7f0 to e157ad9 Compare March 28, 2023 23:31

Add metrics collection to data layer and MapOperator.

1959282

clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch from e157ad9 to 1959282 Compare March 28, 2023 23:57

clarkzinzow mentioned this pull request Mar 29, 2023

[Datasets] [Operator Fusion - 4/N] Add metrics collection abstractions, without instrumentation. #33831

Closed

8 tasks

clarkzinzow changed the title ~~[Datasets] [Operator Fusion - 4/N] Add metrics collection to data layer.~~ [Datasets] [Operator Fusion - 5/N] Add metrics collection to data layer. Mar 29, 2023

clarkzinzow mentioned this pull request Mar 31, 2023

[Datasets] [Operator Fusion - 5/N] Add metrics instrumentation to data layer. #33983

Closed

8 tasks

clarkzinzow closed this Mar 31, 2023
