pd.DataFrame.replace doesn't work after converting to new dtypes in 1.0.0 #31517

Closed
scottboston opened this issue Jan 31, 2020 · 3 comments · Fixed by #31545
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays
scottboston commented Jan 31, 2020

Code Sample, a copy-pastable example if possible

import pandas as pd
pd.show_versions()
df = pd.DataFrame({'grp':[1,2,3,4,5]})
df = df.convert_dtypes()
df.replace(1,10)

Problem description

pd.DataFrame.replace is not working after using pd.DataFrame.convert_dtypes.

Stack trace:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-fb2a3a449007> in <module>
      3 df = pd.DataFrame({'grp':[1,2,3,4,5]})
      4 df = df.convert_dtypes()
----> 5 df.replace(1,10)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in replace(self, to_replace, value, inplace, limit, regex, method)
   4167             limit=limit,
   4168             regex=regex,
-> 4169             method=method,
   4170         )
   4171 

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6735                 elif not is_list_like(value):  # NA -> 0
   6736                     new_data = self._data.replace(
-> 6737                         to_replace=to_replace, value=value, inplace=inplace, regex=regex
   6738                     )
   6739                 else:

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in replace(self, value, **kwargs)
    587     def replace(self, value, **kwargs):
    588         assert np.ndim(value) == 0, value
--> 589         return self.apply("replace", value=value, **kwargs)
    590 
    591     def replace_list(self, src_list, dest_list, inplace=False, regex=False):

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals/managers.py in apply(self, f, filter, **kwargs)
    440                 applied = b.apply(f, **kwargs)
    441             else:
--> 442                 applied = getattr(b, f)(**kwargs)
    443             result_blocks = _extend_blocks(applied, result_blocks)
    444 

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in replace(self, to_replace, value, inplace, filter, regex, convert)
    769 
    770         try:
--> 771             blocks = self.putmask(mask, value, inplace=inplace)
    772             # Note: it is _not_ the case that self._can_hold_element(value)
    773             #  is always true at this point.  In particular, that can fail

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals/blocks.py in putmask(self, mask, new, align, inplace, axis, transpose)
   1671         mask = _safe_reshape(mask, new_values.shape)
   1672 
-> 1673         new_values[mask] = new
   1674         return [self.make_block(values=new_values)]
   1675 

~/anaconda3/lib/python3.6/site-packages/pandas/core/arrays/integer.py in __setitem__(self, key, value)
    415             mask = mask[0]
    416 
--> 417         self._data[key] = value
    418         self._mask[key] = mask
    419 

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Expected Output

Should work just like it does before using pd.DataFrame.convert_dtypes.

#pd.show_versions()
df = pd.DataFrame({'grp':[1,2,3,4,5]})
#df = df.convert_dtypes()
df.replace(1,10)

Output:

   grp
0   10
1    2
2    3
3    4
4    5

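The expectation above can be written as a small check that compares `replace` on the plain frame with `replace` on the converted frame. This is only a sketch: the helper name `replace_survives_convert_dtypes` is made up for illustration, and it treats the `IndexError` from the stack trace as the failing case.

```python
import pandas as pd


def replace_survives_convert_dtypes(to_replace, value):
    """Return True if replace() yields the same values before and after convert_dtypes()."""
    df = pd.DataFrame({"grp": [1, 2, 3, 4, 5]})
    # Baseline: replace on the default int64 column, which works as reported.
    expected = df.replace(to_replace, value)["grp"].tolist()
    try:
        result = df.convert_dtypes().replace(to_replace, value)["grp"].tolist()
    except IndexError:
        # This is the failure shown in the stack trace on pandas 1.0.0.
        return False
    return result == expected


print(pd.__version__, replace_survives_convert_dtypes(1, 10))
```

On an affected version the helper should return False; on a version containing the fix it should return True.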
#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.6.9.final.0
python-bits      : 64
OS               : Linux
OS-release       : 4.15.0-76-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.0.0
numpy            : 1.16.4
pytz             : 2019.2
dateutil         : 2.8.0
pip              : 20.0.1
setuptools       : 45.1.0
Cython           : 0.29
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.4.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.8.0
pandas_datareader: None
bs4              : 4.8.1
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 2.2.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : None
pyxlsb           : None
s3fs             : None
scipy            : 1.2.1
sqlalchemy       : 1.3.9
tables           : None
tabulate         : 0.8.6
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

</details>
@charlesdong1991 (Member)

Thanks for posting your issue, @scottboston.
This is indeed a regression. It seems to be fixed by #31484.
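For anyone stuck on 1.0.0 in the meantime, one possible workaround (a sketch based only on the report that `replace` works before `convert_dtypes`, not something confirmed by the maintainers) is to do the replacement first and convert afterwards:

```python
import pandas as pd

df = pd.DataFrame({"grp": [1, 2, 3, 4, 5]})

# Replace while the column is still plain int64, then switch to the nullable dtypes.
out = df.replace(1, 10).convert_dtypes()
print(out)
print(out.dtypes)
```

This only helps when the replacement can happen before the conversion; it does not address frames that already hold nullable dtypes.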

@charlesdong1991 charlesdong1991 added the Regression Functionality that used to work in a prior pandas version label Jan 31, 2020
@jorisvandenbossche jorisvandenbossche added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays and removed Regression Functionality that used to work in a prior pandas version labels Feb 1, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.1 milestone Feb 1, 2020
@michelkluger

I am experiencing a similar error with this line: df["name"].replace(str(np.nan), np.nan, inplace=True)

@michelkluger

Items: RangeIndex(start=0, stop=109277, step=1)
ObjectBlock: 109277 dtype: object, to_replace = 'nan'
value = <module 'numpy' from 'p:\miniconda3_64bit\ibp2\lib\site-packages\numpy\__init__.py'>, inplace = True, regex = False

def replace(self, to_replace, value, inplace: bool, regex: bool) -> "BlockManager":
  assert np.ndim(value) == 0, value

E AssertionError: <module 'numpy' from 'p:\miniconda3_64bit\ibp2\lib\site-packages\numpy\__init__.py'>

p:\miniconda3_64bit\ibp2\lib\site-packages\pandas\core\internals\managers.py:649: AssertionError
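For comparison, here is a minimal sketch of the operation described in the comment above, written with plain assignment instead of `inplace=True`. The column contents are made up for illustration, and nothing here is claimed to reproduce the assertion; note that the traceback reports `value` as the `numpy` module rather than `np.nan`, so the failing call may differ from the line quoted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a string column where missing values were stored as the text "nan".
df = pd.DataFrame({"name": ["alice", "nan", "bob"]})

# str(np.nan) is the string "nan"; replace it with a real missing value.
df["name"] = df["name"].replace(str(np.nan), np.nan)

print(df["name"].isna().tolist())
```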
