Fix dataframe sort_values with multiple ascendings bug in pandas < 1.4 #3234

Merged

@fyrestone fyrestone commented Aug 25, 2022

What do these changes do?

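A NumPy array obtained from shared memory is read-only. However, pandas < 1.4 has an item-setting bug (pandas-dev/pandas#43406): assigning to a column of a DataFrame backed by such an array tries to write into the read-only buffer in place.

pandas >= 1.4 fixes this bug, so this PR makes the Ray backend work with pandas < 1.4. The following snippet reproduces the problem: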

import pandas as pd
import numpy as np

a = np.arange(2)
a.setflags(write=False)

df = pd.DataFrame(a, columns=["a"])
df["a"] = -df["a"]

assert a[1] > 0  # The original read-only array is left unchanged.
assert df.iloc[1]["a"] < 0  # The DataFrame column holds a negated copy.

This code raises a ValueError in pandas < 1.4 but works in pandas >= 1.4:

Traceback (most recent call last):
  File "/home/admin/Work/mars/t2.py", line 9, in <module>
    df["a"] = -df["a"]
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/frame.py", line 3612, in __setitem__
    self._set_item(key, value)
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/frame.py", line 3797, in _set_item
    self._set_item_mgr(key, value)
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/frame.py", line 3756, in _set_item_mgr
    self._iset_item_mgr(loc, value)
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/frame.py", line 3746, in _iset_item_mgr
    self._mgr.iset(loc, value)
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1078, in iset
    blk.set_inplace(blk_locs, value_getitem(val_locs))
  File "/home/admin/.pyenv/versions/3.8.13/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 360, in set_inplace
    self.values[locs] = values
ValueError: assignment destination is read-only
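
One way to sidestep the error on older pandas is to hand pandas a writable copy whenever the source array is read-only. This is a sketch under that assumption, not necessarily the change this PR makes. The helper name `ensure_writeable` is hypothetical and not part of Mars or pandas.

```python
import numpy as np
import pandas as pd


def ensure_writeable(arr: np.ndarray) -> np.ndarray:
    """Return `arr` unchanged if it is writable, otherwise a writable copy.

    Hypothetical helper for illustration only.
    """
    return arr if arr.flags.writeable else arr.copy()


# Simulate an array that came from shared memory by marking it read-only.
a = np.arange(2)
a.setflags(write=False)

# Build the DataFrame from a writable buffer so that column assignment
# never has to write into the read-only array.
df = pd.DataFrame(ensure_writeable(a), columns=["a"])
df["a"] = -df["a"]
```

The copy costs extra memory for each read-only input, so it only makes sense on code paths that go on to assign into the DataFrame.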

Related issue number

Fixes #3215
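
For context, the linked issue concerns `sort_values` called with a list of `ascending` flags. A common way to implement a mixed-order sort on numeric columns is to negate the descending columns and then sort everything in ascending order, which is the kind of column assignment the snippet above exercises. Whether Mars does exactly this internally is an assumption. The example below uses plain pandas and made-up data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [3, 4, 3, 4]})

# Reference result: "a" ascending, "b" descending.
expected = df.sort_values(["a", "b"], ascending=[True, False])

# Equivalent for numeric columns: negate the descending column on a copy,
# sort all keys in ascending order, then reorder the original rows.
keys = df.copy()
keys["b"] = -keys["b"]
order = keys.sort_values(["a", "b"]).index
result = df.loc[order]
```

Negation only works for numeric keys, and rows with equal keys may come back in a different relative order. The sample data avoids ties for that reason.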

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@fyrestone fyrestone self-assigned this Aug 25, 2022
@fyrestone fyrestone changed the title from "Ray backend dataframe PSRS compatible with pandas 1.3" to "Fix dataframe sort_values with multiple ascendings bug" Aug 26, 2022
@fyrestone fyrestone changed the title from "Fix dataframe sort_values with multiple ascendings bug" to "Fix dataframe sort_values with multiple ascendings bug in pandas < 1.4" Aug 26, 2022
@fyrestone fyrestone marked this pull request as ready for review August 26, 2022 09:34
@fyrestone fyrestone requested a review from a team as a code owner August 26, 2022 09:34

@chaokunyang chaokunyang left a comment

LGTM

@zhongchun zhongchun left a comment

LGTM

@fyrestone fyrestone merged commit c8c4291 into mars-project:master Aug 30, 2022
aresnow1 pushed a commit to aresnow1/mars that referenced this pull request Sep 7, 2022
mars-project#3234)

* Ray backend dataframe PSRS compatible with pandas 1.3

* Remove print

* Fix assignment bug

Co-authored-by: 刘宝 <po.lb@antgroup.com>
(cherry picked from commit c8c4291)
qianduoduo0904 pushed a commit to qianduoduo0904/mars that referenced this pull request Sep 22, 2022
mars-project#3234)

* Ray backend dataframe PSRS compatible with pandas 1.3

* Remove print

* Fix assignment bug

Co-authored-by: 刘宝 <po.lb@antgroup.com>
(cherry picked from commit c8c4291)
UranusSeven pushed a commit to xorbitsai/mars that referenced this pull request Sep 22, 2022
mars-project#3234)

* Ray backend dataframe PSRS compatible with pandas 1.3

* Remove print

* Fix assignment bug

Co-authored-by: 刘宝 <po.lb@antgroup.com>
(cherry picked from commit c8c4291)

Successfully merging this pull request may close these issues.

[BUG] Mars dataframe sort_values with multiple ascendings returns incorrect result on pandas<1.4