Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#5702: Fix passing RangeIndex to loc. #5719

Merged

Conversation

mvashishtha
Copy link
Collaborator

What do these changes do?

FIX-#5702: Fix passing RangeIndex to loc.

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: RangeIndex behaves differently than in Pandas #5702
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha mvashishtha requested a review from a team as a code owner February 28, 2023 23:50
Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually a full fix, see this:

import pandas
import modin.pandas as pd

df = pd.DataFrame({'a': range(20)})
pdf = pandas.DataFrame({'a': range(20)})
print(len(df.loc[range(10)])) # shows "11"
print(len(pdf.loc[range(10)])) # shows "10"

Comment on lines +3394 to +3395
if isinstance(axis_loc, pandas.RangeIndex):
axis_lookup = axis_loc
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the proper fix... I mean, this if should certainly be there as a fastpath solution! But...

For me it feels like there's a bug somewhere else that this fastpath change just masks.

And I think I know where that is, just a few lines below! Note how axis_labels.slice_indexer() expects that its both start and stop arguments are labels which means that they signify closed interval, while RangeIndex/range signify semi-open interval!

Signed-off-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
@vnlitvinov
Copy link
Collaborator

@mvashishtha I've pushed 0c8d268 which should fix both the original issue and issue with passing range(N) objects (but I still think that your original fix should be retained as a fastpath).

@vnlitvinov
Copy link
Collaborator

History digging showed that this was introduced in #3694, cc @dchigarev

Also we're using .slice_indexer() function which is deprecated since 1.4.0, might need to update the code.

@dchigarev
Copy link
Collaborator

Also we're using .slice_indexer() function which is deprecated since 1.4.0, might need to update the code.

it seems that only the kind parameter being deprecated, not the whole method:

>>> idx.slice_indexer(start="a", end="c", step=None)
slice(0, 3, None)
>>> idx.slice_indexer(start="a", end="c", step=None, kind=None)
<stdin>:1: FutureWarning: 'kind' argument in slice_indexer is deprecated and will be removed in a future version.  Do not pass it.
slice(0, 3, None)

@mvashishtha
Copy link
Collaborator Author

I've pushed 0c8d268 which should fix both the original issue and issue with passing range(N) objects (but I still think that your original fix should be retained as a fastpath).

@vnlitvinov with your commit, modin/pandas/test/dataframe/test_binary.py::test_mismatched_row_partitions[idx_aligned-df_df-False] on --execution=BaseOnPython fails because indexing with a slice is off by 1:

import modin.pandas as pd

df = pd.DataFrame([1, 2, 3])
print(df.loc[:2])
print(df._to_pandas().loc[:2])

seems the bounds have to be different for slice. Should we special case slice?

Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha
Copy link
Collaborator Author

This is not actually a full fix, see this

@vnlitvinov I added test cases for loc with a range() as well.

@vnlitvinov
Copy link
Collaborator

Should we special case slice?

Maybe we should special-case loc.step being None (right now it's treated as being equal to 1, maybe we should treat it as equal to 0 instead?..)

Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha
Copy link
Collaborator Author

Maybe we should special-case loc.step being None (right now it's treated as being equal to 1, maybe we should treat it as equal to 0 instead?..)

Treating it as 0 seems to work. Done.

@mvashishtha
Copy link
Collaborator Author

@vnlitvinov @modin-project/modin-core can anyone reproduce the test failure here? https://github.com/modin-project/modin/actions/runs/4307371046/jobs/7512428160

python -m pytest -n 2 modin/pandas/test/dataframe/test_indexing.py --execution=BaseOnPython passes for me locally and my github branch seems to be in sync with my local one.

@mvashishtha
Copy link
Collaborator Author

python -m pytest -n 2 modin/pandas/test/dataframe/test_indexing.py --execution=BaseOnPython passes for me locally and my github branch seems to be in sync with my local one.

never mind! I needed MODIN_TEST_DATASET_SIZE=SMALL. By the way, this was not easy to find. I wonder whether there's a way to use the small data in a more obvious way in the CI.

Signed-off-by: mvashishtha <mahesh@ponder.io>
Co-authored-by: Vasily Litvinov <fam1ly.n4me@yandex.ru>
vnlitvinov
vnlitvinov previously approved these changes Mar 3, 2023
Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

This reverts commit 55bbb7b.

@vnlitvinov pointed out "you've checked .stop, not .step for being None"
Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@anmyachev anmyachev merged commit a2ef47c into modin-project:master Mar 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: RangeIndex behaves differently than in Pandas
4 participants