Pivot table drops column/index names=nan when dropna=false #14246

OXPHOS · 2016-09-18T17:53:08Z

closes pivot_table margins bottom-left total does not correspond to other content when dropna=False #14072
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

OXPHOS · 2016-09-18T17:56:36Z

Input:

import pandas as pd
M = pd.DataFrame([
    [1, 'a', 'A'],
    [1, 'b', 'B'],
    [1, 'c', None]],
    columns=['x', 'y', 'z'])
P = M.pivot_table(values='x', index='y', columns='z', aggfunc='sum',
                  fill_value=0, margins=True, dropna=False)
print(P)

Output was:

z	A	B	All
y
a	1.0	0.0	1.0
b	0.0	1.0	1.0
All	1.0	1.0	3.0

Output after fix:

z	NaN	A	B	All
y
a	0.0	0.0	1.0	1.0
b	0.0	1.0	0.0	1.0
c	1.0	0.0	0.0	1.0
All	1.0	1.0	1.0	3.0
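To make the numbers in the fixed output easy to check by hand, here is a minimal pure-Python reference for a sum pivot with margins that keeps a missing key as its own group. `pivot_sum_with_margins` is a hypothetical helper, not part of pandas, and missing keys are modelled as None; it only describes what `dropna=False` is expected to produce, not how pandas implements it.

```python
from collections import defaultdict


def _ordered(keys):
    # Deterministic order with the missing key (None) first, matching the
    # NaN-first column order of the fixed output (a hypothetical choice).
    return sorted(keys, key=lambda k: (k is not None, "" if k is None else k))


def pivot_sum_with_margins(records, values, index, columns, dropna=True):
    """Hypothetical reference for pivot_table(aggfunc='sum', fill_value=0,
    margins=True); missing keys are represented as None."""
    cells = defaultdict(int)
    for rec in records:
        r, c = rec[index], rec[columns]
        if dropna and (r is None or c is None):
            continue  # dropna=True: skip records that have a missing key
        cells[(r, c)] += rec[values]

    row_keys = _ordered({r for r, _ in cells})
    col_keys = _ordered({c for _, c in cells})

    table = {}
    for r in row_keys:
        row = {c: cells.get((r, c), 0) for c in col_keys}  # fill_value=0
        row["All"] = sum(row.values())  # right-hand margin
        table[r] = row
    # Bottom margin: column totals. The corner is computed from the same
    # cells, so it always agrees with both margins (cf. issue #14072).
    bottom = {c: sum(table[r][c] for r in row_keys) for c in col_keys}
    bottom["All"] = sum(cells.values())
    table["All"] = bottom
    return table


records = [
    {"x": 1, "y": "a", "z": "A"},
    {"x": 1, "y": "b", "z": "B"},
    {"x": 1, "y": "c", "z": None},
]
kept = pivot_sum_with_margins(records, "x", "y", "z", dropna=False)
dropped = pivot_sum_with_margins(records, "x", "y", "z", dropna=True)
print(kept["All"])     # {None: 1, 'A': 1, 'B': 1, 'All': 3}
print(dropped["All"])  # {'A': 1, 'B': 1, 'All': 2}
```

In this model the bottom-right total always equals the sum of the visible cells, which is exactly the property the original output (corner 3.0 with only two visible 1.0 cells) violated.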

I think the rearrangement of the columns is caused by a hash function problem similar to the one in #12679 and will need extra work to solve.
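One general reason NaN is awkward as a key in hash-based grouping is that it is not equal to itself, so separate NaN objects can end up as separate entries. The sketch below shows that property and a common workaround, mapping every missing value to one shared marker before hashing. Whether this is exactly the mechanism behind #12679 is not confirmed here; `canonical` and `_MISSING` are hypothetical names.

```python
import math

_MISSING = object()  # hypothetical canonical marker for any missing key


def canonical(key):
    """Map None and every float NaN to one shared marker before hashing."""
    if key is None or (isinstance(key, float) and math.isnan(key)):
        return _MISSING
    return key


raw = [float("nan"), "A", float("nan"), "B"]

print(float("nan") == float("nan"))      # False: NaN is not equal to itself
print(len(set(raw)))                     # 4: the two NaN objects stay separate
print(len({canonical(k) for k in raw}))  # 3: one missing group plus "A" and "B"
```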

jreback · 2016-09-18T18:10:15Z

pandas/core/categorical.py

@@ -230,7 +230,7 @@ class Categorical(PandasObject):
    _typ = 'categorical'

    def __init__(self, values, categories=None, ordered=False,
-                 name=None, fastpath=False):


not a good idea to do this

@jreback The crux of the issue is that dropna has to reach core/algorithms.factorize(), and factorize() is called directly in Categorical.__init__(). Without the dropna here, the dropna value is lost in m = MultiIndex.from_arrays(cartesian_product(table.columns.levels), names=table.columns.names) and here, which produces the wrong result:

z NaN A B All
y
a 0 1 1
b 0 0 1 1
c 0 0 0 1
All NaN 1 1 3

While the correct result should be:

z NaN A B All
y
a 0 1 1
b 0 0 1 1
c 1 0 0 1
All 1 1 1 3
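The argument above is that the dropna choice has to reach the factorization step, because that is where a missing key either receives a real code or is discarded. A minimal pure-Python sketch of a factorizer with such a flag follows. `factorize_keys` is a hypothetical name and not the signature of pandas' `factorize`, although the -1 code for missing values mirrors the sentinel pandas uses in factorize output by default.

```python
import math


def is_missing(v):
    # Treat None and float NaN as missing (an assumption of this sketch).
    return v is None or (isinstance(v, float) and math.isnan(v))


def factorize_keys(values, dropna=True):
    """Hypothetical helper: map each value to an integer code.

    dropna=True  -> missing values get the sentinel code -1 and are left
                    out of `uniques`, so they fall out of any grouping.
    dropna=False -> all missing values share one code of their own.
    """
    codes, uniques, seen = [], [], {}
    missing_code = None
    for v in values:
        if is_missing(v):
            if dropna:
                codes.append(-1)
                continue
            if missing_code is None:
                missing_code = len(uniques)
                uniques.append(None)
            codes.append(missing_code)
        else:
            if v not in seen:
                seen[v] = len(uniques)
                uniques.append(v)
            codes.append(seen[v])
    return codes, uniques


print(factorize_keys(["A", "B", None], dropna=True))   # ([0, 1, -1], ['A', 'B'])
print(factorize_keys(["A", "B", None], dropna=False))  # ([0, 1, 2], ['A', 'B', None])
```

With dropna=True the record whose z is missing gets code -1 and has no column to land in, which is consistent with the NaN column vanishing from the original output.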

jreback · 2016-09-18T18:10:33Z

pandas/core/frame.py

@@ -4323,7 +4323,8 @@ def infer(x):
    # ----------------------------------------------------------------------
    # Merging / joining methods

-    def append(self, other, ignore_index=False, verify_integrity=False):
+    def append(self, other, ignore_index=False, verify_integrity=False,
+               dropna=True):


again not a good idea

I'll see what I can do about this one

this can be removed

Sorry, I was wrong about this. Removing the dropna makes the test pandas.tools.tests.test_pivot.test_margin_dropna fail: when the margins are appended, the missing dropna causes the NaN level to be dropped. This could also be solved by adding a dropna parameter to the MultiIndex constructor.
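The failure described here can be modelled in a few lines: if the step that appends the margin label does not know about dropna, it falls back to discarding the missing level. `append_levels` is a hypothetical helper that treats None as the missing level; it does not reproduce `Index.append`.

```python
def append_levels(left, right, dropna=True):
    """Hypothetical: concatenate two lists of level values, keeping the
    first occurrence of each and, when dropna=True, discarding None."""
    out = []
    for v in list(left) + list(right):
        if dropna and v is None:
            continue  # the missing level silently disappears here
        if v not in out:
            out.append(v)
    return out


table_levels = [None, "A", "B"]  # column level that contains a missing key
margin_levels = ["All"]          # label added by margins=True
print(append_levels(table_levels, margin_levels))                # ['A', 'B', 'All']
print(append_levels(table_levels, margin_levels, dropna=False))  # [None, 'A', 'B', 'All']
```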

jreback · 2016-09-18T18:11:07Z

pandas/indexes/base.py

@@ -1392,7 +1392,7 @@ def __getitem__(self, key):
        else:
            return result

-    def append(self, other):
+    def append(self, other, dropna=True):


The problem here is that I need to call the derived append() in multi.py, and the dropna in MultiIndex.append() cannot be omitted. One way to remove both this dropna and the one here is to add a dropna parameter to MultiIndex.__init__(). I am not sure how you feel about this idea; I would appreciate your thoughts on it. Thanks!
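The alternative proposed here, storing the flag when the index is constructed so that append() keeps its existing signature, could look roughly like this toy class. `KeyIndex` is hypothetical and far simpler than `MultiIndex`; it only shows where the flag would live.

```python
class KeyIndex:
    """Hypothetical toy index that remembers its dropna policy."""

    def __init__(self, labels, dropna=True):
        self.dropna = dropna
        # Apply the policy once, at construction time.
        self.labels = [v for v in labels if not (dropna and v is None)]

    def append(self, other):
        # append() takes no dropna argument: the receiver's stored policy
        # is reused for the combined index.
        return KeyIndex(self.labels + other.labels, dropna=self.dropna)


columns = KeyIndex([None, "A", "B"], dropna=False)
margin = KeyIndex(["All"])
print(columns.append(margin).labels)  # [None, 'A', 'B', 'All']
print(KeyIndex([None, "A"]).labels)   # ['A']
```

The trade-off is that the policy becomes state carried by every index object instead of an explicit argument at each call site.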

jreback · 2016-09-18T18:11:13Z

pandas/indexes/multi.py


    @classmethod
-    def from_tuples(cls, tuples, sortorder=None, names=None):
+    def from_tuples(cls, tuples, sortorder=None, names=None, dropna=True):
        """


This is removed

And this is the same as here

jreback · 2016-09-18T18:11:28Z

pandas/tools/merge.py

@@ -1269,7 +1269,7 @@ def _get_join_keys(llab, rlab, shape, sort):

 def concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False,
           keys=None, levels=None, names=None, verify_integrity=False,
-           copy=True):
+           copy=True, dropna=True):


jreback · 2016-09-18T18:12:05Z

test_margin_dropna.py

+import numpy as np
+
+
+a = np.array(['foo', 'foo', 'foo', 'bar',


pls put tests with current tests

Yeah this should not be here.

OXPHOS · 2016-09-18T18:26:30Z

@jreback I realized that this issue overlaps a lot with Issue #3729 and PR #12607.
The problem is that pivot_table does not pass dropna on at all, and algorithms.factorize simply sets dropna=True.
I don't see any option other than passing the parameter on, and adding a parameter to the classes seems to be the easiest solution.
I ran the nose tests locally and 2 out of 10000+ fail with odd type errors. I think these can be fixed.

codecov-io · 2016-09-22T03:45:56Z

Current coverage is 84.77% (diff: 100%)

Merging #14246 into master will increase coverage by <.01%

@@             master     #14246   diff @@
==========================================
  Files           145        145          
  Lines         51129      51154    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43343      43368    +25   
  Misses         7786       7786          
  Partials          0          0

Powered by Codecov. Last update 74e20a0...c27d3d3

OXPHOS · 2016-09-22T03:57:31Z

@jreback All tests pass now. I can move dropna out of __init__ if you think the modification is too aggressive, but that will be quite verbose. [EDIT: I was thinking about this wrongly. I need to work out how to restrict the use of dropna in groupby.py.]
Also, the pivot_table tests are a bit messy. I will clean them up and add a test for dropna if you think we can continue working on this PR.

OXPHOS · 2016-09-22T04:36:44Z

Sorry, I was thinking about this wrongly. I will figure out how to limit the use of dropna in groupby.py.

jreback · 2016-11-25T14:38:10Z

can you rebase

OXPHOS · 2016-12-03T14:58:47Z

@jreback It seems that AppVeyor exceeded its time limit.

Also, I ran pep8radius master --diff and it suggested changes to 13 files I haven't touched. I'm not sure whether I should update those as well.

jreback · 2016-12-30T21:23:26Z

can you rebase and we can see where this is

OXPHOS · 2017-01-03T05:28:23Z

@jreback I made some major changes, and the changes are now confined to pivot.py. Also, some of the existing tests with dropna=False were not accurate.

jreback · 2017-01-03T11:31:03Z

pandas/tools/pivot.py

+    keys = index + columns
+
+    if not dropna:
+        key_data = np.array(data[keys], dtype='object')


what the heck is this?

@jreback This converts NaN values in the keys to special strings to avoid passing dropna around.

That is not acceptable; we use masks if needed. Converting things like that will just lead to confusion for future readers and to bugs.
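A small sketch of the reviewer's point: turning missing values into strings can collide with genuine labels, while a boolean mask keeps missingness separate from the data. The sentinel here is simply what `str()` gives for a float NaN ("nan"); that choice is an assumption for illustration, since the actual strings used in the PR are not shown.

```python
import math


def is_missing(v):
    # Treat None and float NaN as missing (an assumption of this sketch).
    return v is None or (isinstance(v, float) and math.isnan(v))


labels = ["nan", float("nan"), "A", "nan"]  # a real "nan" label plus a real NaN

# Rejected style: stringify so NaN becomes an ordinary hashable label.
as_strings = [str(v) for v in labels]
print(sorted(set(as_strings)))  # ['A', 'nan']: the missing value merged with "nan"

# Mask style: keep the data unchanged and track missingness separately.
mask = [is_missing(v) for v in labels]
present = sorted({v for v, m in zip(labels, mask) if not m})
print(mask)       # [False, True, False, False]
print(present)    # ['A', 'nan']
print(sum(mask))  # 1 missing entry, still distinct from the label "nan"
```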

OXPHOS · 2017-01-23T06:12:56Z

Update: There is a conflict between a recently merged change (#14944) and the fix I am trying to make. I need check_null here to be dropna. I am still working on resolving the conflict.

jreback · 2017-03-20T13:58:29Z

@OXPHOS can you rebase / update

OXPHOS · 2017-04-26T05:57:13Z

I started over and I'm now halfway through. I'll open a new PR once I've cleaned it up.

jreback requested changes Sep 18, 2016


jreback added Bug and Reshaping labels Sep 19, 2016

OXPHOS force-pushed the pivot_table_dropna branch 2 times, most recently from 73c42bc to 2e679ea, September 28, 2016 21:46

OXPHOS force-pushed the pivot_table_dropna branch from 2e679ea to 72f98e6, December 3, 2016 05:59

OXPHOS added 3 commits January 2, 2017 19:20

fix dropna=false tests · 8d75c55
fix MultiIndex initiation from np.nan · 0f38f43
fix indexes dropna=false · 2e3f8e0

OXPHOS force-pushed the pivot_table_dropna branch from 72f98e6 to 2e3f8e0, January 3, 2017 02:40

fix style · c27d3d3

jreback reviewed Jan 3, 2017


OXPHOS closed this Apr 26, 2017

OXPHOS mentioned this pull request Apr 26, 2017

BUG:Pivot table drops column/index names=nan when dropna=false #16142 (Closed)
