[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

jelther · 2019-02-12T19:47:57Z

Code Sample

import pandas as pd

sample_data = []

sample_data.append({"integer_column":None})
sample_data.append({"integer_column":1})
sample_data.append({"integer_column":2})

df = pd.DataFrame(sample_data)

# Previous type is object
# df.dtypes

df.loc[:,'integer_column'] = df.loc[:,'integer_column'].astype('Int64')

# Check new type is Int64, nullable
# df.dtypes

df.fillna('null_string')

Problem description

Using the new nullable type Int64, it is not possible to fill "NaN" values with other value.

Error raised

TypeError: <U11 cannot be converted to an IntegerDtype

Expected Output

The new dataframe should have replaced it's NaN values with the desired input of .fillna() method.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.24.1
pytest: 3.3.2
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.12
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml.etree: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.1.18
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

The text was updated successfully, but these errors were encountered:

WillAyd · 2019-02-13T04:35:34Z

This works fine if you use an actual integer value to fill, so there's not really much of a point to using Int64 in this case since you're still asking for an object.

In any case I suppose it should still coerce to object for you like using float here would. Investigation and PRs are always welcome

TomAugspurger · 2019-02-13T12:02:23Z

I vaguely recall some discussion on whether ExtensionArray.fillna should allow coercing the array to the dtype of the `fill_value`. I don't recall if we reached a final conclusion. It's somewhat inconvenient to have to manual `.astype` before filling with a different dtype, but the type stability ensured by `ExtensionArray[T].fillna -> ExtensionArray[T]` is nice.

…

On Tue, Feb 12, 2019 at 10:35 PM William Ayd ***@***.***> wrote: This works fine if you use an actual integer value to fill, so there's not really much of a point to using Int64 in this case since you're still asking for an object. In any case I suppose it should still coerce to object for you like using float here would. Investigation and PRs are always welcome — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#25288 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIi1zyiheZ19Ef7M62IDXIE5hr0Psks5vM5YbgaJpZM4a37EH> .

jelther · 2019-02-14T00:19:59Z

In fact, since I'm using pandas for an ETL tool, this doesn't look nice to me.
Having to change the type to "object" inevitable adds ".0" after the integer number and breaks my code.

The alternative I used is to remove the ".0" part after "astype(object)" and fill it with NaNs values.

WillAyd · 2019-02-14T00:23:57Z

@jelther getting slightly off topic but if you are getting zeros appended to your integers its because they are getting cast to float at some point. Explicitly constructing your data frame with dtype=object as an argument would allow you to mix / match the None value with integers without implicit cast to float

jelther · 2019-02-14T00:27:06Z

@jelther getting slightly off topic but if you are getting zeros appended to your integers its because they are getting cast to float at some point. Explicitly constructing your data frame with dtype=object as an argument would allow you to mix / match the None value with integers without implicit cast to float

@WillAyd , I think this is the best alternative but I'm not able to specify since I'm extracting the data from a SQL Server database with "pandas.read_sql". I don't see on the documentation how I would be able to specify the "dtypes" when selecting the data.

jorisvandenbossche · 2019-02-14T22:59:30Z

but the type stability ensured by ExtensionArray[T].fillna -> ExtensionArray[T] is nice.

+1. For eg DatetimeArray, the fill value also needs to be a datetime-like.

gosuto-inzasheru · 2020-02-27T21:23:20Z

I would assume that .fillna() would coerce the series into being of type object when I am trying to fill it with an object. Just like for example when adding a float to it coerces it into being of type float:

>>> pd.Series([1, 2, None], dtype='Int64') + 0.5
0    1.5
1    2.5
2    NaN
dtype: float64

However;

>>> pd.Series([1, 2, None], dtype='Int64').fillna('')
TypeError: <U1 cannot be converted to an IntegerDtype

gosuto-inzasheru · 2020-02-27T21:40:40Z

This behaviour is also displayed when using .fillna() on a series of floats using a string:

>>> pd.Series([1, 2, None], dtype='float64').fillna('')
0    1
1    2
2     
dtype: object

TomAugspurger · 2020-02-29T00:37:31Z

would coerce the series into being of type object when I am trying to fill it with an object

Why do you prefer coercing the series to the dtype of the fill value, rather than the other way around? It's not clear to me that one is preferable to the other.

gosuto-inzasheru · 2020-02-29T07:57:40Z

Because I need to fill the <NA> values with something else. Just like when adding a float to an integer it becomes a float. And like I said, this is already .fillna()'s current behaviour on floats.

But I also understand there is something to say for not doing so. Maybe a boolean argument such as coerce=True would be a solution?

jbrockmendel · 2020-09-22T04:17:42Z

Definitely on the IntegerArray the fillna should raise instead of cast, but in ExtensionBlock.fillna we would expect it to fall back by casting to object

Resolves pandas-dev#25472, resolves pandas-dev#25288.

Resolves pandas-dev#25472, pandas-dev#25288.

alexreg · 2021-10-11T20:47:44Z

@jbrockmendel The docs for ExtensionArray (I presume you meant ExtensionArray rather than the internal ExtensionBlock) now explicitly state: "pandas will recognize instances of this class as proper arrays with a custom type and will not attempt to coerce them to objects". Does that negate your above comment? And if so, should this issue therefore be closed?

jbrockmendel · 2021-10-11T22:26:59Z

@alexreg the methods on ExtensionArray subclasses are in general stricter about what the allow than the Series/DataFrame methods. This less-strict behavior is implemented on the Block subclasses.

alexreg · 2021-10-11T22:33:56Z

@jbrockmendel Oh, I see. That makes more sense now. This seems like a slightly tricky matter to get exactly right, and I'd rather not take it on in my existing PR (or another even). Do you have any inclination to have a go at it yourself?

jbrockmendel · 2021-10-11T22:38:07Z

I'll get to it eventually if no one else does, but it isn't a priority for me ATM.

alexreg · 2021-10-11T23:27:59Z

@jbrockmendel If you don't quickly pointing me to what you think are the right functions/methods to look at, I may have a go (fairly) soon.

jorisvandenbossche · 2021-10-12T06:15:15Z

BTW, I don't think we should change this for the nullable dtypes. I personally think the no-casting behaviour is better (although inconsistent with other dtypes), and unless we have a discussion about which long-term behaviour we want and decide we want casting, IMO we should keep the strict behaviour of fillna for nullable dtypes for now (also on the block level).
This discussion is probably #39584

gosuto-inzasheru · 2021-10-12T09:21:12Z

are you then also saying there is no room for a flag (eg coerce=True) to make it more consistent with other dtypes if needed?

jbrockmendel · 2021-10-12T18:24:20Z

@alexreg notwithstanding Joris's (reasonable) objection, the place where you would change the behavior is in ExtensionBlock.fillna. either a try/except around values = self.values.fillna or a better implementation of _can_hold_element

jorisvandenbossche · 2021-10-12T22:02:26Z

are you then also saying there is no room for a flag (eg coerce=True) to make it more consistent with other dtypes if needed?

Not necessarily, as that has not yet been brought up in the discussion up to now, as far as I know.
Personally, I still don't really see the advantage of that compared to explicitly doing a dtype conversion if you want it (eg if you want to fill the missing integers with a float fill value, you can first cast to float and then fill, eg s.astype("Float64").fillna(some_float))

gosuto-inzasheru · 2021-10-12T22:13:13Z

i brought it up here: #25288 (comment)

i still think it makes more sense to be able to just use .fillna() without explicitly casting first. as i stated earlier in this thread, this is default behaviour for .fillna() on other dtypes, and it also is a logical step when for example adding a float to an Int64.

what you are suggesting from a user perspective, is that now sometimes i can .fillna() directly, and sometimes i will have to cast + fill. as a user i would feel more for consistent behaviour of .fillna().

alexreg · 2021-10-13T00:00:48Z

@jbrockmendel Thanks

For what it's worth, while I certainly see @jorisvandenbossche's point, and feel that sort of behaviour would be appropriate under other circumstances, I agree with @jorijnsmit here. At the end of the day, Pandas has set a precedent for virtually ubiquitous implicit coercion, and to not do so here would indeed seem inconsistent. If we don't want to have implicit coercion here, then we probably shouldn't in lots of other places too, but that design decision has already been made, long ago — and it was probably the right one, given Panda's emphasis on ergonomics over things like strong typing.

WillAyd added Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Feb 13, 2019

WillAyd changed the title ~~[0.24.1] New nullable integer can't fill NaN~~ [0.24.1] New nullable integer fillna with non-int doesn't coerce to object Feb 13, 2019

gosuto-inzasheru mentioned this issue Feb 27, 2020

prevent error message when trying to fill strings into an Int64 column nithinmurali/pygsheets#409

Merged

ldsalomone mentioned this issue Mar 6, 2020

import dataframe using set_dataframe, TypeError when column dtype: 'Int64' (with <NA> and integer) nithinmurali/pygsheets#405

Closed

mroeschke added the Bug label Jun 26, 2021

alexreg added a commit to alexreg/pandas that referenced this issue Oct 9, 2021

BUG: permit str dtype -> IntegerDtype conversions

a80118e

Resolves pandas-dev#25472, resolves pandas-dev#25288.

alexreg mentioned this issue Oct 9, 2021

Fix str dtype -> IntegerDtype conversions #43949

Merged

4 tasks

alexreg added a commit to alexreg/pandas that referenced this issue Oct 9, 2021

BUG: permit str dtype -> IntegerDtype conversions

9e19ece

Resolves pandas-dev#25472, resolves pandas-dev#25288.

alexreg added a commit to alexreg/pandas that referenced this issue Oct 10, 2021

BUG: permit str dtype -> IntegerDtype conversions

90c00c3

Resolves pandas-dev#25472, resolves pandas-dev#25288.

alexreg added a commit to alexreg/pandas that referenced this issue Oct 10, 2021

BUG: permit str dtype -> IntegerDtype conversions

1d5f7aa

Resolves pandas-dev#25472, resolves pandas-dev#25288.

alexreg added a commit to alexreg/pandas that referenced this issue Oct 11, 2021

BUG: permit str dtype -> IntegerDtype conversions

ef7888c

Resolves pandas-dev#25472, pandas-dev#25288.

alexreg added a commit to alexreg/pandas that referenced this issue Oct 11, 2021

BUG: permit str dtype -> IntegerDtype conversions

e00d1cc

Resolves pandas-dev#25472, pandas-dev#25288.

jbrockmendel added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Dec 21, 2021

jbrockmendel mentioned this issue Jan 31, 2022

BUG: TypeError when attempting to replace Nullable integer data type with a float value #45729

Closed

3 tasks

roib20 mentioned this issue Feb 1, 2022

BUG: Replacing pd.NA by None has no effect #45601

Closed

3 tasks

phofl mentioned this issue Jan 15, 2023

BUG: behaviour of fillna(NaN) for nullable dtype (or, how to fill nullable array with NaN) #42751

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

jelther commented Feb 12, 2019 •

edited

Loading

INSTALLED VERSIONS

WillAyd commented Feb 13, 2019

TomAugspurger commented Feb 13, 2019 via email

jelther commented Feb 14, 2019

WillAyd commented Feb 14, 2019

jelther commented Feb 14, 2019

jorisvandenbossche commented Feb 14, 2019

gosuto-inzasheru commented Feb 27, 2020 •

edited

Loading

gosuto-inzasheru commented Feb 27, 2020

TomAugspurger commented Feb 29, 2020

gosuto-inzasheru commented Feb 29, 2020

jbrockmendel commented Sep 22, 2020

alexreg commented Oct 11, 2021 •

edited

Loading

jbrockmendel commented Oct 11, 2021

alexreg commented Oct 11, 2021

jbrockmendel commented Oct 11, 2021

alexreg commented Oct 11, 2021

jorisvandenbossche commented Oct 12, 2021

gosuto-inzasheru commented Oct 12, 2021

jbrockmendel commented Oct 12, 2021

jorisvandenbossche commented Oct 12, 2021

gosuto-inzasheru commented Oct 12, 2021

alexreg commented Oct 13, 2021

[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

[0.24.1] New nullable integer fillna with non-int doesn't coerce to object #25288

Comments

jelther commented Feb 12, 2019 • edited Loading

Code Sample

Problem description

Error raised

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

WillAyd commented Feb 13, 2019

TomAugspurger commented Feb 13, 2019 via email

jelther commented Feb 14, 2019

WillAyd commented Feb 14, 2019

jelther commented Feb 14, 2019

jorisvandenbossche commented Feb 14, 2019

gosuto-inzasheru commented Feb 27, 2020 • edited Loading

gosuto-inzasheru commented Feb 27, 2020

TomAugspurger commented Feb 29, 2020

gosuto-inzasheru commented Feb 29, 2020

jbrockmendel commented Sep 22, 2020

alexreg commented Oct 11, 2021 • edited Loading

jbrockmendel commented Oct 11, 2021

alexreg commented Oct 11, 2021

jbrockmendel commented Oct 11, 2021

alexreg commented Oct 11, 2021

jorisvandenbossche commented Oct 12, 2021

gosuto-inzasheru commented Oct 12, 2021

jbrockmendel commented Oct 12, 2021

jorisvandenbossche commented Oct 12, 2021

gosuto-inzasheru commented Oct 12, 2021

alexreg commented Oct 13, 2021

jelther commented Feb 12, 2019 •

edited

Loading

Output of `pd.show_versions()`

gosuto-inzasheru commented Feb 27, 2020 •

edited

Loading

alexreg commented Oct 11, 2021 •

edited

Loading