-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use pairwise order for mapreduce on arbitrary iterators #52397
Draft
stevengj
wants to merge
22
commits into
master
Choose a base branch
from
sgj/mapreduce_pairwise
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+84
−11
Draft
Changes from all commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
5b2b294
use pairwise order for mapreduce on arbitrary iterators
stevengj 5cdca76
tweak
stevengj 92e0028
fallback to mapfoldl if pairwise_blocksize < 3
stevengj ba7ade9
comment
stevengj 52d7c75
whoops
stevengj bc94925
handle pairwise_blocksize < 3
stevengj ee24cb2
whoops
stevengj 106563f
length may not be O(1)
stevengj 1f8d60b
unify error messages
stevengj 5e06e36
don't change the exception type (make sure to call reduce_empty / map…
stevengj c783ea3
simplification
stevengj f96f76a
updated sums in doctests (more accurate!)
stevengj f679885
more transducer style implementation
stevengj 42105bf
use mapreduce_impl instead of mapfoldl in mapreduce fallbacks
stevengj 1e972a1
pairwise is not safe for arbitrary init value
stevengj c074cb9
make mapreduce_impl argument order consistent with mapfoldl_impl, fix…
stevengj 808d61c
pairwise reduction seems incompatible with the transducer style
stevengj d4e9420
rm unnecessary specialization
stevengj 43da644
cleanup/unify mapreduce_empty_iter
stevengj a8cf82e
performance optimization by unrolling
stevengj 52b2f0b
Merge branch 'master' into sgj/mapreduce_pairwise
stevengj 20a6dea
Merge remote-tracking branch 'origin/master' into sgj/mapreduce_pairwise
mbauman File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -613,6 +613,7 @@ test18695(r) = sum( t^2 for t in r ) | |||||||||||||||||||||||||||||||||||||||||||||||||
@test_throws str -> ( occursin("reducing over an empty", str) && | ||||||||||||||||||||||||||||||||||||||||||||||||||
occursin("consider supplying `init`", str) && | ||||||||||||||||||||||||||||||||||||||||||||||||||
!occursin("or defining", str)) test18695(Any[]) | ||||||||||||||||||||||||||||||||||||||||||||||||||
@test_throws MethodError test18695(Any[]) | ||||||||||||||||||||||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I blindly went to add some tests here as I started reviewing this, and I wrote the following...
Suggested change
But that fails, because supplying init pushes you back into foldl land. I see you've considered this... but it still feels pretty unexpected. |
||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||
# For Core.IntrinsicFunction | ||||||||||||||||||||||||||||||||||||||||||||||||||
@test_throws str -> ( occursin("reducing over an empty", str) && | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose it's not clear to me why this is necessarily a problem. Isn't it also true that
op(op(itr[i], itr[j]), itr[k])
might have a different type thanop(itr[i], op(itr[j], itr[k])
? Why is having a separately-passed initial value different fromitr[i]
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that re-association can introduce the initial value multiple times. e.g. if I do
then the result depends on how many times I split the pairwise reduction. i.e. if that is just
you get
56
. But if you do a pairwise split in the middle you getthen we have a result of
57
. This is similar to but distinct from the problems you encounter whenop
is not associative. In this case, it turns out to be important thatinit
must be an identity element underop
, i.e. you must haveop(init, x) ≈ x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question: why couldn't we do
foldl(+, 1:5, init=1) + foldl(+, 5:10)
? Isn't this already doingfoldl
for the first 14 operations anyway?...
Oh. Right, I suppose this is where type comes into play. Because it'd be very weird and problematic if$2^8$ and half in arbitrary precision. Yeah, ok.
reduce(+, Int8.(1:100), init=big(0))
did half of its calculation modThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But on the other hand, isn't that what you sign up for with
reduce
? Since we very explicitly do not guarantee an evaluation order, I don't think you should be able to rely upon this sort of "cascading-promotion-from-the-init-value".Is it really meaningfully different than the status quo:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I infer from the docstring is that
init
must be the identity element of the monoid defined byop
andeltype(itr)
so I might expect it to "work" when
typeof(op(init, x)) <: eltype(itr)
for allx in itr
but in the example
reduce(+, Int8.(1:100), init=big(0))
since the promoted return type of+(::BigInt, ::Int8)
does not subtypeInt8
, I do not think that can reasonably be expected to always produce a sane resultThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, we even go further than that and say:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the docstring seems to give us a lot of leeway to either drop the
init
or use it each time and give a 'surprising' result if it's not the identity element.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue this conversation in #53871, e.g. see my comment at #53871 (comment)