BUG: Dataframe fill_na with series #27197

Closed
VibhuJawa opened this issue Jul 2, 2019 · 7 comments

Comments

@VibhuJawa

Code Sample, a copy-pastable example if possible

import pandas as pd

pdf = pd.DataFrame({'a': [1, 2, None], 'b': [None, None, 5]})

fill_value_ser = pd.Series([3,4,5])
filled_df = pdf.fillna(fill_value_ser)
print(filled_df)

print("\n")
filled_ser = pdf['a'].fillna(fill_value_ser)
print(filled_ser)

Output:

     a    b
0  1.0  NaN
1  2.0  NaN
2  NaN  5.0

0    1.0
1    2.0
2    5.0
Name: a, dtype: float64

Problem description

Filling a DataFrame with a pandas Series does not seem to work: no values are filled.

Filling a single Series with the same fill Series does work, though.

Expected Output

     a    b
0  1.0  3.0
1  2.0  4.0
2  5.0  5.0

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-50-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 5.0.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.29.11
numpy: 1.16.4
scipy: None
pyarrow: 0.12.1
xarray: None
IPython: 7.6.0
sphinx: 2.1.2
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@TomAugspurger
Contributor

This may be a Series being dict-like

In [4]: pdf.fillna(pd.Series([0, 1], index=['a', 'c']))
Out[4]:
     a    b
0  1.0  NaN
1  2.0  NaN
2  0.0  5.0

So the Series index supplies the keys (matched against column names) and the Series values supply the fill values. The cell at row 2, column 'a' is filled there because series['a'] is 0.
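To make the dict-like behavior concrete, here is a minimal sketch that reuses `pdf` from the original report. The fill values 10 and 20 and the variable names `by_position` and `by_column` are made up for illustration.

```python
import pandas as pd

pdf = pd.DataFrame({'a': [1, 2, None], 'b': [None, None, 5]})

# The default index labels 0, 1, 2 match no column name ('a', 'b'),
# so no column has a fill value and nothing is filled.
by_position = pdf.fillna(pd.Series([3, 4, 5]))
print(by_position)

# Index labels 'a' and 'b' match the column names, so every NaN in
# column 'a' becomes 10 and every NaN in column 'b' becomes 20.
by_column = pdf.fillna(pd.Series([10, 20], index=['a', 'b']))
print(by_column)
```

This is the same lookup a plain dict such as `{'a': 10, 'b': 20}` would give.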

@VibhuJawa
Author

Aah, got it.

Thanks for replying.

I think that, to be consistent with the documentation as well as with Series.fillna, DataFrame.fillna should
support a Series where the replacement happens at each index for all columns.

From the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

value : scalar, dict, Series, or DataFrame

Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to
use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame
will not be filled). This value cannot be a list.

@TomAugspurger
Contributor

> I think that, to be consistent with the documentation as well as with Series.fillna, DataFrame.fillna should
> support a Series where the replacement happens at each index for all columns.

Can you give an example?

@VibhuJawa
Author

> > I think that, to be consistent with the documentation as well as with Series.fillna, DataFrame.fillna should
> > support a Series where the replacement happens at each index for all columns.
>
> Can you give an example?

I would like the behavior to be as follows:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, None], 'b': [None, None, 6]})

fill_value_ser = pd.Series([3,4,5])
filled_df = df.fillna(fill_value_ser)

print(filled_df)
     a    b
0  1.0  3.0
1  2.0  4.0
2  5.0  6.0

Essentially equal to the below logic:

# Fill each column separately so the fill Series aligns on the row index.
filled_df = pd.DataFrame()
for col in df:
    filled_df[col] = df[col].fillna(fill_value_ser)
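The same row-aligned fill can be written without an explicit loop. The following is a sketch of one possible workaround, not an officially recommended pattern: `DataFrame.apply` passes each column to the function as a Series, and `Series.fillna` then aligns the fill Series on the row index.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, None], 'b': [None, None, 6]})
fill_value_ser = pd.Series([3, 4, 5])

# Each column is handed to the lambda as a Series, so fillna matches
# fill_value_ser against the row index (0, 1, 2), not the column names.
filled_df = df.apply(lambda col: col.fillna(fill_value_ser))
print(filled_df)
# Column 'a' becomes [1.0, 2.0, 5.0]; column 'b' becomes [3.0, 4.0, 6.0].
```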

@TomAugspurger
Contributor

TomAugspurger commented Jul 3, 2019 via email

@VibhuJawa
Author

Aah, got it. Sorry for the confusion.

Yeah, the logic with df.fillna(series, axis=1) is what I need.
I still feel the documentation is misleading, though.

From the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

value : scalar, dict, Series, or DataFrame

Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to
use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame
will not be filled). This value cannot be a list.

@TomAugspurger
Contributor

I'm going to close this as a duplicate of #4514.

We always welcome improvements to the docs if you want to make a PR for that.
