[Datasets] [Operator Fusion - 5/N] Add metrics collection to data layer. #32749

@clarkzinzow (Contributor) commented Feb 22, 2023

This PR adds basic metrics collection to the Datasets data layer, focusing on batching and block building. This allows us to capture metrics around data slicing, concatenation, copying, etc.
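To illustrate what collecting these metrics could involve, here is a minimal sketch of a per-task collector that counts slices, concatenations, and copies, and accumulates the time spent in each operation. The class name `DataLayerMetrics` and its fields are hypothetical. They are not the API added in this PR, and plain Python lists stand in for blocks.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class DataLayerMetrics:
    """Hypothetical collector for data-layer metrics (not Ray's actual API)."""

    # Number of times each operation ran, e.g. "slice", "concat", "copy".
    counts: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    # Cumulative wall-clock seconds spent in each timed operation.
    seconds: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def incr(self, name: str, n: int = 1) -> None:
        self.counts[name] += n

    @contextmanager
    def timer(self, name: str) -> Iterator[None]:
        # Time the wrapped code and add the elapsed time to the running total.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.seconds[name] += time.perf_counter() - start


metrics = DataLayerMetrics()
rows = list(range(10))
with metrics.timer("slice"):
    head, tail = rows[:4], rows[4:]
    metrics.incr("slice", 2)
    metrics.incr("copy", 2)  # Python list slicing always allocates new lists.
with metrics.timer("concat"):
    combined = head + tail
    metrics.incr("concat")
```

Keeping counters and timers in one object means a single collector can be threaded through the batcher and builder and read out once per task.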

This PR is partially stacked on changes in #32744. It is an end-to-end instrumentation successor to #33831.

TODOs

  • Instrument copies during slicing: num_copies is currently not incremented for slicing, even though SimpleBlocks always copy on .slice(), even when copy=False.
  • Add metrics test coverage at the batcher/builder level around expected number of slices, concatenations, copies, format conversions, etc. This already exists in the adapters PR, but only at the adapters level, and only for simple blocks.
  • Move to row-based metrics for slicing and concatenation.
  • Add unit tests of metrics abstractions.
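For the first TODO, one option is to have each block type declare whether its slice really copies, and to increment the copy counter whenever it does, regardless of the `copy` argument. The sketch below assumes a list-backed toy accessor standing in for simple blocks, since Python list slicing always copies. `SliceMetrics`, `SimpleBlockAccessor`, and `SLICE_ALWAYS_COPIES` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class SliceMetrics:
    """Hypothetical counters; not the names used in the PR."""

    num_slices: int = 0
    num_copies: int = 0


class SimpleBlockAccessor:
    """Toy accessor over a list-backed block (stand-in for a simple block)."""

    # Python list slicing always allocates a new list, so slicing is never zero-copy.
    SLICE_ALWAYS_COPIES = True

    def __init__(self, block: List[Any], metrics: SliceMetrics):
        self._block = block
        self._metrics = metrics

    def slice(self, start: int, end: int, copy: bool = False) -> List[Any]:
        # The list slice is already a copy, so no extra copy is made for copy=True.
        out = self._block[start:end]
        self._metrics.num_slices += 1
        # Count a copy if the caller asked for one OR the block type copies anyway.
        if copy or self.SLICE_ALWAYS_COPIES:
            self._metrics.num_copies += 1
        return out


m = SliceMetrics()
acc = SimpleBlockAccessor([1, 2, 3, 4, 5], m)
first = acc.slice(0, 2)               # copy=False, but still counted as a copy
second = acc.slice(2, 5, copy=True)
```

A zero-copy block type (for example, one backed by Arrow) would set the flag to `False`, so only explicitly requested copies are counted.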

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch 2 times, most recently from 65e4e9d to 1da6e5e, February 27, 2023 21:48
@clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch from 1da6e5e to 7b9c78c, February 28, 2023 19:01
for block in block_iter:
    with stats.iter_format_batch_s.timer() if stats else nullcontext():
        batch = BlockAccessor.for_block(block).to_batch_format(batch_format)
        acc = BlockAccessor.for_block(block)
Contributor

You may want to record the metrics from within BlockAccessor, e.g., BlockAccessor(block, metrics_sink), to avoid duplicating the metrics-recording code at every call site and to make the code cleaner.
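The reviewer's suggestion could look roughly like the sketch below: the accessor receives an optional sink once and times its own format conversion, so the loop no longer needs a `with stats.timer() if stats else nullcontext()` wrapper. `MetricsSink`, `InstrumentedAccessor`, and `to_columnar` are hypothetical stand-ins for Ray's `BlockAccessor` and `to_batch_format`, and blocks are modeled as lists of row dicts.

```python
import time
from collections import defaultdict
from typing import Any, Dict, List, Optional


class MetricsSink:
    """Hypothetical sink that accumulates named timings and counters."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)


class InstrumentedAccessor:
    """Toy accessor that records its own metrics, so call sites need no timer code."""

    def __init__(self, block: List[Dict[str, Any]], sink: Optional[MetricsSink] = None):
        self._block = block
        self._sink = sink

    def to_columnar(self) -> Dict[str, List[Any]]:
        # Convert a list of row dicts into a dict of columns (a format conversion).
        start = time.perf_counter()
        columns: Dict[str, List[Any]] = defaultdict(list)
        for row in self._block:
            for key, value in row.items():
                columns[key].append(value)
        # The None check lives here once, instead of at every call site.
        if self._sink is not None:
            self._sink.seconds["format_conversion"] += time.perf_counter() - start
            self._sink.counts["format_conversion"] += 1
        return dict(columns)


sink = MetricsSink()
blocks = [[{"a": 1}, {"a": 2}], [{"a": 3}]]
batches = [InstrumentedAccessor(b, sink).to_columnar() for b in blocks]
```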

@@ -28,5 +32,7 @@ def fn(
# This causes different behavior between filter and other map-like
# functions. We should revisit and try to get rid of this logic.
yield builder.build()
if metrics_collector is not None:
Contributor

Shall we always make this non-None?

Contributor Author

Sure! Initially this was meant to let us avoid porting the legacy path, but we can always collect the metrics and then drop them.
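The "always collect, then drop" approach might look like the sketch below. The collector is a required argument, so the map function needs no `if metrics_collector is not None` branches, and a legacy caller passes a fresh collector that it simply discards. `MetricsCollector`, `map_blocks`, and `legacy_map` are hypothetical names, and blocks are modeled as lists of ints.

```python
from typing import Callable, Iterable, Iterator, List


class MetricsCollector:
    """Hypothetical collector; always passed in, never None."""

    def __init__(self) -> None:
        self.rows_output = 0
        self.blocks_output = 0


def map_blocks(
    blocks: Iterable[List[int]],
    fn: Callable[[int], int],
    metrics: MetricsCollector,  # required, so no None checks are needed below
) -> Iterator[List[int]]:
    for block in blocks:
        out = [fn(x) for x in block]
        metrics.rows_output += len(out)
        metrics.blocks_output += 1
        yield out


def legacy_map(blocks: Iterable[List[int]], fn: Callable[[int], int]) -> List[List[int]]:
    # The legacy path doesn't consume metrics: collect them anyway and let the
    # collector be garbage-collected when this function returns.
    return list(map_blocks(blocks, fn, MetricsCollector()))


new_metrics = MetricsCollector()
result = list(map_blocks([[1, 2], [3]], lambda x: x * 10, new_metrics))
```

The cost of the unused collector on the legacy path is a few integer increments per block, which buys a single code path without optional-argument branching.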

@ericl (Contributor) left a comment

We should also add some unit tests for the metrics collector (this is getting large... should we merge metrics util code in a dedicated PR?)
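Unit tests for a metrics abstraction can be small. Below is a sketch written as pytest-style test functions against a hypothetical `Counters` dataclass with a `merge` method for combining per-task metrics. Both the class and its fields are assumptions, not the collector in this PR.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical metrics collector under test."""

    num_slices: int = 0
    num_concats: int = 0
    num_copies: int = 0

    def merge(self, other: "Counters") -> "Counters":
        # Combine per-task counters into a new object, e.g. across blocks.
        return Counters(
            self.num_slices + other.num_slices,
            self.num_concats + other.num_concats,
            self.num_copies + other.num_copies,
        )


def test_defaults_are_zero():
    assert Counters() == Counters(0, 0, 0)


def test_merge_sums_fields():
    merged = Counters(1, 2, 3).merge(Counters(4, 5, 6))
    assert merged == Counters(5, 7, 9)


def test_merge_does_not_mutate_inputs():
    a, b = Counters(1, 0, 0), Counters(0, 1, 0)
    a.merge(b)
    assert a == Counters(1, 0, 0)
    assert b == Counters(0, 1, 0)
```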

Couple other thoughts on the metrics:

  • Should we record only rows copied/sliced/concatenated? It seems more directly related to performance than the number of slices.
  • Maybe start with fewer metrics
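The first point can be illustrated with row-based accounting: two slices count the same by number of operations, yet can differ in cost by orders of magnitude. The sketch below charges slices and concatenations by the rows they touch. `RowMetrics`, `slice_rows`, and `concat_rows` are hypothetical helpers operating on plain Python sequences.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class RowMetrics:
    """Hypothetical row-based counters, charged per row touched."""

    rows_sliced: int = 0
    rows_concatenated: int = 0


def slice_rows(block: Sequence[Any], start: int, end: int, m: RowMetrics) -> List[Any]:
    out = list(block[start:end])
    m.rows_sliced += len(out)
    return out


def concat_rows(blocks: Sequence[Sequence[Any]], m: RowMetrics) -> List[Any]:
    out: List[Any] = []
    for b in blocks:
        out.extend(b)
    m.rows_concatenated += len(out)
    return out


m = RowMetrics()
big = list(range(1_000))
# Two slices either way, but very different amounts of work:
slice_rows(big, 0, 2, m)      # 2 rows
slice_rows(big, 0, 998, m)    # 998 rows
merged = concat_rows([[1, 2], [3]], m)
```

A slice counter would report 2 in both cases above, while `rows_sliced` reflects the 1,000 rows actually moved.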

@ericl added the @author-action-required label (the PR author is responsible for the next step; remove the tag to send back to the reviewer) Feb 28, 2023
@clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch 5 times, most recently from f9fa7f0 to e157ad9, March 28, 2023 23:31
@clarkzinzow force-pushed the datasets/feat/data-layer-metrics branch from e157ad9 to 1959282, March 28, 2023 23:57
@clarkzinzow changed the title [Datasets] [Operator Fusion - 4/N] Add metrics collection to data layer. to [Datasets] [Operator Fusion - 5/N] Add metrics collection to data layer. Mar 29, 2023
Labels: @author-action-required
4 participants