-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#2013: Improved merge_asof implementation #2178
Conversation
…oach. Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Codecov Report
@@ Coverage Diff @@
## master #2178 +/- ##
===========================================
- Coverage 83.99% 64.67% -19.32%
===========================================
Files 119 113 -6
Lines 13063 12844 -219
===========================================
- Hits 10972 8307 -2665
- Misses 2091 4537 +2446
Continue to review full report at Codecov.
|
@devin-petersohn ping—if you have a chance, would love some feedback on this sketch before proceeding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in response, I took some vacation to rest. I'm still getting caught up.
The implementation here, while not super efficient, will be better than the current default behavior. We should still probably throw the warning for defaulting to pandas because we are collecting whole column(s) to the driver to compute the merge.
Otherwise it looks good until we can have a better internal way of joins with conditionals.
…r' into merge_asof/2013 Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
1. Move test to correct location. 2. Test with more interesting indexes. 3. Start setting up infrastructure for testing different on variations. Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Will re-open when it's ready for review. |
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
@devin-petersohn probably still needs testing of some of the fallback edge cases, but this is a lot closer to done, and so could use another passthrough. So many edge cases... and I feel like I'm handling them in super-hacky ways. |
@devin-petersohn any thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good approach given the current internals. Eventually, I would like to implement a non-equi join internally, which would fit this operator better.
Looks like:
|
…in Pandas. Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
@devin-petersohn I guess this is ready for review now? Main caveat I have is that example dataframes for tests were taken from Pandas docs, so potentially licensing issue that would need to fixed, not sure whether that's a problem or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a couple of minor points. Would you be able to fix the failing commitlint test?
Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org> Co-authored-by: Devin Petersohn <devin-petersohn@users.noreply.github.com>
af2cffd
to
69c1031
Compare
I can fix the commit by squash merging, my attempts at rebases have failed due to limits of my Git knowledge. So I can do that, or if you know you can do that. Perhaps the 70 character limit could be removed? It doesn't seem lack it adds much... 😁 |
Part of the 70 limit is a github thing, messages longer than that are chopped. I increased it to 88 in #2489. |
…r' into merge_asof/2013 Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
OK, merged forward, hopefully everything will pass (once GitHub Actions gets unbroken for the day, I guess? Seems like they're having issues). |
…r' into merge_asof/2013 Signed-off-by: Itamar Turner-Trauring <itamar@itamarst.org>
@devin-petersohn another frustrating git commit lint failure. I'm going to just turn this into a patch, and create a completely new PR with a new branch :( |
What do these changes do?
This is a sketch of an approach to
merge_asof
that is more efficient than the current approach, hopefully.@devin-petersohn would appreciate some feedback on e.g. which bits are potentially inefficient.
flake8 modin
black --check modin
git commit -s