ENH: column-wise DataFrame.fillna with Series/Dict value #38352

Closed
arw2019 wants to merge 13 commits

arw2019
Member

@arw2019 arw2019 commented Dec 8, 2020

Picking up #30922
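
For context, a minimal sketch of the behavior at issue (not code from this PR): filling NA cells from a Series keyed by row label, i.e. `axis=1`. It assumes an installed pandas in which `DataFrame.fillna` with a Series and `axis=1` may raise `NotImplementedError`, and in that case falls back to a per-column `Series.fillna`, which aligns the fill values on the index. `fill_by_row_label` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_by_row_label(df: pd.DataFrame, values: pd.Series) -> pd.DataFrame:
    """Fill NA cells in row i with values[i] (hypothetical helper, not pandas API)."""
    try:
        # The spelling this PR aims to support; it may not be implemented
        # in the installed pandas version.
        return df.fillna(values, axis=1)
    except NotImplementedError:
        # Fallback: each column is a Series indexed by row label, and
        # Series.fillna aligns a Series fill value on that index.
        return df.apply(lambda col: col.fillna(values))


df = pd.DataFrame([[np.nan, 2], [3, np.nan], [np.nan, np.nan]], columns=list("AB"))
s = pd.Series([100, 200, 300])

result = fill_by_row_label(df, s)
print(result)
```

With this frame, missing cells in row 0 are filled with 100, in row 1 with 200, and in row 2 with 300; cells that already hold a value are left alone.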

arw2019 added the Enhancement, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Needs Review labels on Dec 8, 2020

@@ -260,6 +260,7 @@ Other enhancements
- Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a Series or DataFrame (:issue:`28394`)
- :meth:`DataFrame.applymap` now supports ``na_action`` (:issue:`23803`)
- :class:`Index` with object dtype supports division and multiplication (:issue:`34160`)
- :meth:`DataFrame.fillna` can fill NA values column-wise with a dictionary or :class:`Series` (:issue:`4514`)
Contributor

move to 1.3

Member Author

Done

df = DataFrame([[np.nan, 2], [3, np.nan], [np.nan, np.nan]], columns=list("AB"))
s = Series([100, 200, 300])

expected = DataFrame(
Contributor

what happens if we have a datetime column mixed in here?

Member Author

We end up with object dtype:

In [5]: df = pd.DataFrame([[np.nan, 2], [3, np.nan], [np.nan, np.nan]], columns=list("AB"))
   ...: s = pd.Series(pd.to_datetime([100, 200, 300], unit="ns"))
   ...: 
   ...: result = df.fillna(s, axis=1, downcast="infer")
   ...: result
Out[5]: 
                               A                              B
0  1970-01-01 00:00:00.000000100                            2.0
1                            3.0  1970-01-01 00:00:00.000000200
2  1970-01-01 00:00:00.000000300  1970-01-01 00:00:00.000000300

In [6]: result.dtypes
Out[6]: 
A    object
B    object
dtype: object

In [8]: df = pd.DataFrame({"A": pd.to_datetime([np.nan, 2, np.nan]), "B": pd.to_datetime([3, np.nan, np.nan])})
   ...: s = pd.Series([100, 200, 300])
   ...: 
   ...: result = df.fillna(s, axis=1, downcast="infer")
   ...: result
Out[8]: 
                               A                              B
0                            100  1970-01-01 00:00:00.000000003
1  1970-01-01 00:00:00.000000002                            200
2                            300                            300

In [9]: result.dtypes
Out[9]: 
A    object
B    object
dtype: object

Member Author

is this what we want?

@arw2019
Member Author

arw2019 commented Dec 15, 2020

Green + addressed comments

@mroeschke
Member

Thanks for the PR, but it appears this has gone stale. It looks fairly close, so let us know if you'd like to continue by merging master and targeting 1.4.

mroeschke closed this on Jul 11, 2021
Successfully merging this pull request may close these issues.

column-wise fillna with Series/dict NotImplemented
5 participants