Regression: to_csv and multiindex columns with header kw #5539

Closed
jankatins opened this issue Nov 17, 2013 · 5 comments
Labels
API Design; Bug; IO CSV (read_csv, to_csv); MultiIndex; Output-Formatting (__repr__ of pandas objects, to_string); Regression (functionality that used to work in a prior pandas version)

@jankatins
Contributor

This used to work (October 2012), but doesn't anymore:

from pandas import DataFrame
import numpy as np
import StringIO
a = ["a","b","a","b","a","b","a","b","a","b","a","b"]
b = ["c","d","e","c","d","e","c","d","e","c","d","e"]
c = [1,2,3,4,5,6,7,8,9,10,11,12]
d = list(reversed(c))
df = DataFrame({"a":a, "b":b, "c":c, "d":d})
_agg_funs = [np.mean, np.std, np.min, np.max]
groupby_variables = ["a","b"]
df_grouped = df.groupby(groupby_variables, as_index=True).agg(_agg_funs)
output = StringIO.StringIO()
df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns])
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + [var + "_" + agg for (var, agg) in df_grouped.columns]
print(index == expected_index) # This was true in October 2012!
print(index)
print(expected_index) 

False
['', '', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']

Probably related to #3575

@jankatins
Contributor Author

It seems that "header" is simply ignored :-(

This also does not work:

df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns], index_label=df_grouped.index.names, index=False)

@jankatins
Contributor Author

This finally worked:

[...]
cols = [var + "_" + agg for (var, agg) in df_grouped.columns]
df_grouped.columns = cols
output = StringIO.StringIO()
df_grouped.to_csv(output)
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + cols
print(index == expected_index)
print(index)
print(expected_index) # And this worked in October 2012!

True
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
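
For readers on Python 3, the workaround above can be sketched roughly as follows. This is a minimal sketch, not the original code: it assumes a current pandas, uses io.StringIO in place of the Python 2 StringIO module, and passes aggregation names as strings, so the flattened columns come out as c_min/c_max rather than c_amin/c_amax.

```python
import io

import pandas as pd

# Same shape of data as in the report above.
df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})

# Aggregating with several functions yields MultiIndex columns: (var, agg).
grouped = df.groupby(["a", "b"]).agg(["mean", "std", "min", "max"])

# Flatten the two column levels into plain strings before writing,
# instead of relying on the header keyword.
flat = grouped.copy()
flat.columns = [f"{var}_{agg}" for var, agg in grouped.columns]

buf = io.StringIO()
flat.to_csv(buf)

# First CSV line: index names followed by the flattened column names.
header_row = buf.getvalue().splitlines()[0].split(",")
print(header_row)
```

Flattening before writing does not depend on how to_csv treats header for MultiIndex columns, so it should behave the same across pandas versions.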

@jreback
Contributor

jreback commented Nov 18, 2013

header only applies to read_csv.
There is an option, tupleize_cols, which you can set to get the prior (0.12) behavior of writing tuples for the column MultiIndex if you want (though as of 0.13 it's not necessary and is turned off by default).

This is your first example on master:

In [31]: output = StringIO.StringIO()

In [32]: df_grouped.to_csv(output)

In [33]: print output.getvalue()
,,c,c,c,c,d,d,d,d
,,mean,std,amin,amax,mean,std,amin,amax
a,b,,,,,,,,
a,c,4,4.242640687119285,1,7,9,4.242640687119285,6,12
a,d,8,4.242640687119285,5,11,5,4.242640687119285,2,8
a,e,6,4.242640687119285,3,9,7,4.242640687119285,4,10
b,c,7,4.242640687119285,4,10,6,4.242640687119285,3,9
b,d,5,4.242640687119285,2,8,8,4.242640687119285,5,11
b,e,9,4.242640687119285,6,12,4,4.242640687119285,1,7


In [34]: pd.read_csv(StringIO.StringIO(output.getvalue()),header=[0,1],index_col=[0,1])
Out[34]: 
        c                           d                      
     mean       std  amin  amax  mean       std  amin  amax
a b                                                        
a c     4  4.242641     1     7     9  4.242641     6    12
  d     8  4.242641     5    11     5  4.242641     2     8
  e     6  4.242641     3     9     7  4.242641     4    10
b c     7  4.242641     4    10     6  4.242641     3     9
  d     5  4.242641     2     8     8  4.242641     5    11
  e     9  4.242641     6    12     4  4.242641     1     7
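
The same round trip can be sketched on Python 3 with a smaller frame. This is an illustrative sketch under assumptions: the data is made up, a current pandas is assumed, and io.StringIO stands in for the Python 2 StringIO module used in the session above.

```python
import io

import pandas as pd

# Small frame with a two-level row index and two-level columns
# (illustrative values, not the data from the report).
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=pd.MultiIndex.from_tuples([("a", "c"), ("a", "d")], names=["a", "b"]),
    columns=pd.MultiIndex.from_product([["c", "d"], ["mean", "std"]]),
)

# With no path given, to_csv returns the CSV text; each column level
# is written as its own header row.
text = df.to_csv()

# Read it back: two header rows, two index columns.
restored = pd.read_csv(io.StringIO(text), header=[0, 1], index_col=[0, 1])
print(restored)
```

Passing header=[0, 1] tells read_csv how many rows make up the column MultiIndex, and index_col=[0, 1] rebuilds the row index from the first two fields.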

@jankatins
Contributor Author

I did use master (or something a few days old). I found that very surprising, as my "half a year old code" broke with the newer pandas due to this (I wanted to import that into R).

Also:

String Form:<unbound method DataFrame.to_csv>
[...]
    def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
               cols=None, header=True, index=True, index_label=None,
               mode='w', nanRep=None, encoding=None, quoting=None,
               line_terminator='\n', chunksize=None,
               tupleize_cols=False, date_format=None, **kwds):
        r"""Write DataFrame to a comma-separated values (csv) file

        Parameters
        ----------
[...]
        header : boolean or list of string, default True
            Write out column names. If a list of string is given it is assumed
            to be aliases for the column names

Collab Edit: #4797
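
To illustrate what the docstring means by aliases, here is a minimal sketch with ordinary (single-level) columns. The frame and the alias names c_total and d_total are made up for illustration; only the header and index keywords come from the signature quoted above.

```python
import pandas as pd

# Plain single-level columns (illustrative data).
df = pd.DataFrame({"c": [1, 2], "d": [3, 4]})

# A list passed as header replaces the column names in the output only;
# the DataFrame itself keeps its original column names.
csv_text = df.to_csv(header=["c_total", "d_total"], index=False)
lines = csv_text.splitlines()
print(lines)
```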

@jreback
Contributor

jreback commented Nov 18, 2013

header might not be well tested with a column MultiIndex, and is actually very odd in that case anyhow

marking as a bug/API issue for 0.14

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 4, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes pandas-devgh-5539
@jreback jreback modified the milestones: Next Major Release, 0.21.1 Nov 4, 2017
gfyoung added a commit that referenced this issue Nov 5, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539
1kastner pushed a commit to 1kastner/pandas that referenced this issue Nov 5, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes pandas-devgh-5539
No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes pandas-devgh-5539
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes pandas-devgh-5539

(cherry picked from commit e1f3a70)
TomAugspurger pushed a commit that referenced this issue Dec 11, 2017
Previously, MultiIndex columns weren't being
overwritten when header was passed in for to_csv.

Closes gh-5539

(cherry picked from commit e1f3a70)
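
The behavior described in the commit message can be checked with a sketch like the one below. It assumes a pandas release that includes the fix (the milestone above points to 0.21.1); the frame and alias names are illustrative and not taken from the original report.

```python
import pandas as pd

# Two-level columns, default integer row index (illustrative data).
columns = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# One alias per column; with the fix these replace the MultiIndex
# header rows instead of being ignored.
aliases = ["a", "b", "c", "d"]
lines = df.to_csv(header=aliases).splitlines()
print(lines[0])
```

On versions without the fix, the aliases are expected to be ignored, as in the original report, and the column levels are written as separate header rows.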