-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX-#4138, FIX-#4009: optimize internal '.mask()' flow #4140
Conversation
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
34d6bc9
to
7270664
Compare
Codecov Report
@@ Coverage Diff @@
## master #4140 +/- ##
==========================================
+ Coverage 85.69% 88.47% +2.77%
==========================================
Files 201 201
Lines 16866 16878 +12
==========================================
+ Hits 14454 14932 +478
+ Misses 2412 1946 -466
Continue to review full report at Codecov.
|
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
a753361
to
4d2ea95
Compare
@prutskov @amyskov | ||
@prutskov | ||
@amyskov | ||
@dchigarev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we add contributors in a new line, so we won't constantly get conflicts trying to modify the same line in multiple PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes 😄
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchigarev thanks for the changes! Do you have an idea of how much faster mask
is with these changes?
I have ASV results for loc/iloc at the header of the PR, they implicitly show the speedup of the |
@devin-petersohn, added benchmarks for the |
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@@ -626,9 +638,9 @@ def mask( | |||
new_index = self.index[ | |||
# pandas Index is more likely to preserve its metadata if the indexer is slice | |||
slice(row_positions.start, row_positions.stop, row_positions.step) | |||
# TODO: Fast range processing of non-1-step ranges is not yet supported | |||
# TODO: Fast range processing of non-positive-step ranges is not yet supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing a typo [1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dchigarev, LGTM!
@dchigarev Please clear the conflict on release notes. |
ba2f9b2
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
@devin-petersohn the PR has no conflicts and the CI is green |
Signed-off-by: Dmitry Chigarev <dmitry.chigarev@intel.com>
Signed-off-by: Dmitry Chigarev dmitry.chigarev@intel.com
What do these changes do?
This PR eliminates multiple sorting of the same array in the
.mask()
flow, and rewrites negative-to-positive indices conversion into a more optimal way.flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
.mask()
#4138added andpassingdocs/development/architecture.rst
is up-to-datePerformance changes according to ASV benchmarks
Benchmarking for the
.mask()
method itself:NPartitions = 112:
NPartitions = 16:
Script to measure