BUG:Pivot table drops column/index names=nan when dropna=false #16142

Closed
wants to merge 2 commits

Contributor

@OXPHOS OXPHOS commented Apr 26, 2017

@@ -548,10 +548,6 @@ def _validate_categories(cls, categories, fastpath=False):

        if not fastpath:

            # Categories cannot contain NaN.
Contributor

Do you have some unintentional changes in here? This shouldn't be removed.

Contributor Author

@OXPHOS OXPHOS Apr 27, 2017

I think the Index, Index([None, u'A', u'B'], dtype='object'), needs to be passed to Categorical when building the MultiIndex, because with dropna=False, None can also be an index/column name. Or am I misunderstanding this?

@@ -159,15 +159,15 @@ def pivot_table(data, values=None, index=None, columns=None, aggfunc='mean',
    if isinstance(table, DataFrame):
        table = table.sort_index(axis=1)

    if fill_value is not None:
        table = table.fillna(value=fill_value, downcast='infer')

    if margins:
        if dropna:
Contributor

If I remove this if dropna check, most of the tests pass (including a fix for this one), except for the following:

        import numpy as np
        import pandas as pd
        from pandas import Index

        # Existing crosstab test that still fails once the dropna check is removed
        df = pd.DataFrame({'a': [1, 2, 2, 2, 2, np.nan],
                           'b': [3, 3, 4, 4, 4, 4]})
        actual = pd.crosstab(df.a, df.b, margins=True, dropna=False)
        expected = pd.DataFrame([[1, 0, 1], [1, 3, 4], [2, 4, 6]])
        expected.index = Index([1.0, 2.0, 'All'], name='a')
        expected.columns = Index([3, 4, 'All'], name='b')

Here are the actual and expected results:

(Pdb) pp actual
b    3  4  All
a
1.0  1  0    1
2.0  1  3    4
All  2  3    5
(Pdb) pp expected
b    3  4  All
a
1.0  1  0    1
2.0  1  3    4
All  2  4    6

You have more experience with this section of the code than I do, but the margins on the expected look incorrect to me.
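The inconsistency can be checked by hand. The following is a minimal plain-Python sketch, not pandas code: it treats each table as a list of rows whose last row and last column are the "All" margins, and the helper name margins_consistent is hypothetical. It only tests whether the margins equal the sums of the cells that are actually displayed.

```python
def margins_consistent(table):
    """Return True if the last row/column equal the sums of the displayed cells.

    `table` is a list of rows; the last row and the last column hold the
    "All" margins. (Hypothetical helper, for illustration only.)
    """
    body = [row[:-1] for row in table[:-1]]
    row_totals = [row[-1] for row in table[:-1]]
    col_totals = table[-1][:-1]
    grand_total = table[-1][-1]

    rows_ok = all(sum(r) == t for r, t in zip(body, row_totals))
    cols_ok = all(sum(c) == t for c, t in zip(zip(*body), col_totals))
    total_ok = sum(row_totals) == grand_total == sum(col_totals)
    return rows_ok and cols_ok and total_ok


# Values copied from the (Pdb) output above.
actual = [[1, 0, 1], [1, 3, 4], [2, 3, 5]]
expected = [[1, 0, 1], [1, 3, 4], [2, 4, 6]]

print(margins_consistent(actual))    # True
print(margins_consistent(expected))  # False: column 4 shows 0 + 3 but a total of 4
```

The expected margins only add up if the row with a = NaN is counted, even though that row is not displayed.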

Contributor Author

You're definitely right. I think it should be (if dropna=False):

b       3  4  All
a
1.0     1  0    1
2.0     1  3    4
np.nan  0  1    1
All     2  4    6

  • I need to fix the np.nan case, as it is still being ignored when dropna=False even with the current fix (i.e. the fix only works for None)
  • I don't yet understand why removing the dropna check here would help. I'll take a closer look.
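
For reference, the proposed table can be reproduced by counting by hand. This is a plain-Python sketch under stated assumptions, not how pandas implements crosstab: the names label and crosstab_with_margins are hypothetical, NaN is mapped to the string 'nan' so it can be used as a dictionary key, and dropna is assumed to simply exclude rows whose a value is NaN before counting.

```python
import math
from collections import Counter

# Same data as the failing test above.
a = [1.0, 2.0, 2.0, 2.0, 2.0, float("nan")]
b = [3, 3, 4, 4, 4, 4]


def label(value):
    """Map NaN to the string 'nan' so it can serve as a stable dict key."""
    if isinstance(value, float) and math.isnan(value):
        return "nan"
    return value


def crosstab_with_margins(rows, cols, dropna):
    """Count (row, col) pairs and append 'All' margins (illustrative only)."""
    pairs = [(label(r), c) for r, c in zip(rows, cols)]
    if dropna:
        # Assumption: dropna=True discards pairs whose row key is NaN.
        pairs = [(r, c) for r, c in pairs if r != "nan"]
    counts = Counter(pairs)
    row_keys = list(dict.fromkeys(r for r, _ in pairs))  # first-seen order
    col_keys = sorted({c for _, c in pairs})

    table = {}
    for r in row_keys:
        cells = [counts[(r, c)] for c in col_keys]
        table[r] = cells + [sum(cells)]
    col_totals = [sum(table[r][i] for r in row_keys)
                  for i in range(len(col_keys))]
    table["All"] = col_totals + [sum(col_totals)]
    return table


for key, cells in crosstab_with_margins(a, b, dropna=False).items():
    print(key, cells)
```

With dropna=False the NaN row appears as [0, 1, 1] and the margins are [2, 4, 6]; with dropna=True the NaN row is gone and the margins become [2, 3, 5].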

Contributor

jreback commented Apr 27, 2017

this is doing similar things to changes in #12607

@TomAugspurger
Contributor

> this is doing similar things to changes in #12607

Ah I see. That is a much larger change than the original issue I was looking at :)

@OXPHOS OXPHOS changed the title from "Fix 14072 pivot_table dropna" to "BUG:Pivot table drops column/index names=nan when dropna=false" on Apr 27, 2017

Contributor Author

OXPHOS commented Apr 27, 2017

I think the change in Cython is definitely required. The problem is how to pass dropna to it without disturbing too many existing structures.
I just reset my development environment and am trying to use Anaconda Python 2.7. Interestingly, numerous tests failed for me even on the master branch, so I only ran the pivot and groupby tests locally. I'll do more research over the weekend.

@jreback jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label on Apr 27, 2017

Contributor Author

OXPHOS commented May 1, 2017

Some tests will fail, and many of the failures are actually separate problems. I have already located several and will update soon.

Contributor

jreback commented Jun 10, 2017

can you rebase and update?

Contributor

jreback commented Jul 26, 2017

needs a rebase. if you'd like to continue, pls comment.

@jreback jreback closed this Jul 26, 2017
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Development

Successfully merging this pull request may close these issues.

pivot_table margins bottom-left total does not correspond to other content when dropna=False
3 participants