-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] cleanup: use SortKey instead of mixed typing in aggregation #48697
Changes from all commits
a3301e6
d5597f1
6fa628a
dd356cf
87dc7c5
eeb8e79
f1e3774
6fc6763
54417ba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
from typing import List, Optional, Tuple | ||
from typing import List, Optional, Tuple, Union | ||
|
||
from ray.data._internal.execution.interfaces import ( | ||
AllToAllTransformFn, | ||
|
@@ -22,7 +22,7 @@ | |
|
||
|
||
def generate_aggregate_fn( | ||
key: Optional[str], | ||
key: Optional[Union[str, List[str]]], | ||
aggs: List[AggregateFn], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not making this API accept SortKey as well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Technically there is no need for the aggregate function to take a sortkey; we just happen to use it as an implementation detail (our aggregations are sort-based). |
||
_debug_limit_shuffle_execution_to_num_blocks: Optional[int] = None, | ||
) -> AllToAllTransformFn: | ||
|
@@ -49,6 +49,8 @@ def fn( | |
|
||
num_mappers = len(blocks) | ||
|
||
sort_key = SortKey(key) | ||
|
||
if key is None: | ||
num_outputs = 1 | ||
boundaries = [] | ||
|
@@ -60,12 +62,12 @@ def fn( | |
] | ||
# Sample boundaries for aggregate key. | ||
boundaries = SortTaskSpec.sample_boundaries( | ||
blocks, SortKey(key), num_outputs, sample_bar | ||
blocks, sort_key, num_outputs, sample_bar | ||
) | ||
|
||
agg_spec = SortAggregateTaskSpec( | ||
boundaries=boundaries, | ||
key=key, | ||
key=sort_key, | ||
aggs=aggs, | ||
) | ||
if DataContext.get_current().use_push_based_shuffle: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: docstring contains
key
instead ofsort_key
. same with other methodsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SortKey type is kind of a misnomer. It's just the key(s) on which we happen to do things like groupby, sort, join, windowing etc.