Pivot_table drop rows whose entries are all NaN #21969

noob-saibot · 2018-07-18T12:04:19Z

>>> dataframe = pd.DataFrame([['a', 1, 10], ['b', 10, 100], ['c', None, None], ['a', 2, 4]], columns=['lit', 'num1', 'num2'])
>>> dataframe.pivot_table(index='lit', columns='num1', values='num2', aggfunc='max')
num1  1.0   2.0    10.0
lit
a     10.0   4.0    NaN
b      NaN   NaN  100.0

pivot_table silently drops rows whose entries are all NaN. The documentation mentions only columns: "dropna : boolean, default True; Do not include columns whose entries are all NaN".

It works as expected in versions 0.21.1 and 0.22.0.

I found only an old bug report from 2013.

Expected Output

>>> dataframe.pivot_table(index='lit', columns='num1', values='num2', aggfunc='max')
num1    1    2      10
lit
a     10.0  4.0    NaN
b      NaN  NaN  100.0
c      NaN  NaN    NaN
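
Until the behaviour is settled, one possible workaround is to reindex the pivoted result against every `lit` value present in the input. This is only a sketch under that assumption, not a fix in pandas itself; the names `all_lits` and `restored` are introduced here for illustration and are not part of the original report.

```python
import pandas as pd

dataframe = pd.DataFrame(
    [['a', 1, 10], ['b', 10, 100], ['c', None, None], ['a', 2, 4]],
    columns=['lit', 'num1', 'num2'],
)
pivoted = dataframe.pivot_table(index='lit', columns='num1',
                                values='num2', aggfunc='max')

# Assumed workaround: rebuild the row index from all 'lit' values in the
# input, so any row that pivot_table dropped comes back filled with NaN.
all_lits = pd.Index(sorted(dataframe['lit'].unique()), name='lit')
restored = pivoted.reindex(all_lits)
print(restored)
```

Because `reindex` only adds missing labels and leaves existing rows untouched, it should be harmless on versions where the row is not dropped.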

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: None
pip: 9.0.1
setuptools: 28.8.0
Cython: 0.27
numpy: 1.14.5
scipy: None
pyarrow: 0.9.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

gfyoung · 2018-07-18T23:59:05Z

@noob-saibot : Can you add a reference to the old issue?

cc @jreback : Is it intended to drop NA from rows as well? Or is this a regression?

noob-saibot · 2018-07-19T07:25:15Z

@gfyoung The old issue is #3729.

neomis · 2018-11-20T15:56:07Z

I'm also experiencing this issue in 0.23.4, with an additional snag: if you set dropna=False, the output table gains rows for index combinations that do not exist in the input (x=1, y=2 in the example below).

import pandas as pd
from numpy import nan
data_set = [
    {'x': 0, 'y': 0, 'parameter_id': 'a', 'result_value': 3.0},
    {'x': 0, 'y': 0, 'parameter_id': 'b', 'result_value': 1.0},
    {'x': 0, 'y': 0, 'parameter_id': 'c', 'result_value': 3.0},
    {'x': 0, 'y': 0, 'parameter_id': 'd', 'result_value': nan},
    {'x': 0, 'y': 1, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 0, 'y': 1, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 0, 'y': 1, 'parameter_id': 'c', 'result_value': nan},
    {'x': 0, 'y': 1, 'parameter_id': 'd', 'result_value': nan},
    {'x': 0, 'y': 2, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 0, 'y': 2, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 0, 'y': 2, 'parameter_id': 'c', 'result_value': nan},
    {'x': 0, 'y': 2, 'parameter_id': 'd', 'result_value': nan},
    {'x': 1, 'y': 0, 'parameter_id': 'a', 'result_value': 1.0},
    {'x': 1, 'y': 0, 'parameter_id': 'b', 'result_value': 3.0},
    {'x': 1, 'y': 0, 'parameter_id': 'c', 'result_value': nan},
    {'x': 1, 'y': 0, 'parameter_id': 'd', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'a', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'b', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'c', 'result_value': nan},
    {'x': 1, 'y': 1, 'parameter_id': 'd', 'result_value': nan}]

df = pd.DataFrame(data_set)
df.pivot_table(index=['x', 'y'], columns='parameter_id',
               values='result_value', dropna=False).reset_index()

Output

parameter_id  x  y    a    b    c   d
0             0  0  3.0  1.0  3.0 NaN
1             0  1  1.0  3.0  NaN NaN
2             0  2  1.0  3.0  NaN NaN
3             1  0  1.0  3.0  NaN NaN
4             1  1  NaN  NaN  NaN NaN
5             1  2  NaN  NaN  NaN NaN

Expected Output

parameter_id  x  y    a    b    c   d
0             0  0  3.0  1.0  3.0 NaN
1             0  1  1.0  3.0  NaN NaN
2             0  2  1.0  3.0  NaN NaN
3             1  0  1.0  3.0  NaN NaN
4             1  1  NaN  NaN  NaN NaN
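
A possible way to get the expected table in the meantime is to reindex the pivoted result against the (x, y) pairs that actually occur in the input. The sketch below uses a shortened version of the data above and assumes that restricting rows to observed pairs is what is wanted; `observed_pairs` and `trimmed` are illustrative names, not pandas API.

```python
import numpy as np
import pandas as pd

# Shortened data with the same shape as above: the pair (x=1, y=2) never occurs.
df = pd.DataFrame({
    'x': [0, 0, 0, 1, 1, 0],
    'y': [0, 1, 2, 0, 1, 0],
    'parameter_id': ['a', 'a', 'a', 'a', 'a', 'b'],
    'result_value': [3.0, 1.0, 1.0, 1.0, np.nan, 1.0],
})

wide = df.pivot_table(index=['x', 'y'], columns='parameter_id',
                      values='result_value', dropna=False)

# Assumed workaround: keep exactly the (x, y) pairs seen in the input.
# Pairs missing from `wide` come back as NaN rows; extra pairs are discarded.
observed_pairs = df.drop_duplicates(['x', 'y']).set_index(['x', 'y']).index
trimmed = wide.reindex(observed_pairs)
print(trimmed.reset_index())
```

Since the target index is built from the input rather than from the pivoted table, the result does not depend on whether a given pandas version adds or drops rows.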

gfyoung added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label Jul 18, 2018

tompollard mentioned this issue Aug 1, 2018

pivot_table drops columns for aggregate functions that return all None, even if dropna=False #22159

Closed

john-bodley mentioned this issue Aug 13, 2019

[viz] Revert dropna logic for pivot tables apache/superset#8040

Merged

mroeschke added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 21, 2021
