Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty dataframe with groupby when nan column #16217

Closed
atbd opened this issue May 3, 2017 · 1 comment
Closed

Empty dataframe with groupby when nan column #16217

atbd opened this issue May 3, 2017 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request Error Reporting Incorrect or improved errors from pandas Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@atbd
Copy link
Contributor

atbd commented May 3, 2017

Hi,
I encountered a 'problem' when my program tried to groupby a dataframe with an empty (ie full of nan) column. Here a code sample with pandas 0.19.2 :

In [4]: a = pd.DataFrame({'a': [2,2,3,4,5], 
                          'b': [5,5,6,6,7], 
                          'c': [np.nan]*5, 
                          'sum': [1]*5})

In [5]: a
Out[5]: 
   a  b   c  sum
0  2  5 NaN    1
1  2  5 NaN    1
2  3  6 NaN    1
3  4  6 NaN    1
4  5  7 NaN    1

In [6]: a.groupby(['a','b','c']).agg({'sum':'sum'})
Out[6]: 
Empty DataFrame
Columns: [sum]
Index: []

In [7]: a.groupby(['a','b']).agg({'sum':'sum'})
Out[7]: 
     sum
a b     
2 5    2
3 6    1
4 6    1
5 7    1

I think we could throw a warning (or maybe an error) when this case is encountered in addition to return an empty dataframe.

INSTALLED VERSIONS ------------------ commit: None python: 3.6.1.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-75-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@chris-b1
Copy link
Contributor

chris-b1 commented May 3, 2017

this is a duplicate of #3729. Some progress in #12607, but in the meantime would also probably take a PR for a better error/warning too.

@chris-b1 chris-b1 closed this as completed May 3, 2017
@chris-b1 chris-b1 added Duplicate Report Duplicate issue or pull request Error Reporting Incorrect or improved errors from pandas Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels May 3, 2017
@chris-b1 chris-b1 added this to the No action milestone May 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request Error Reporting Incorrect or improved errors from pandas Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

No branches or pull requests

2 participants