Regression: to_csv and multiindex columns with header kw #5539

jankatins · 2013-11-17T16:35:39Z

This used to work (October 2012), but doesn't anymore:

from pandas import DataFrame
import numpy as np
import StringIO
a = ["a","b","a","b","a","b","a","b","a","b","a","b"]
b = ["c","d","e","c","d","e","c","d","e","c","d","e"]
c = [1,2,3,4,5,6,7,8,9,10,11,12]
d = list(reversed(c))
df = DataFrame({"a":a, "b":b, "c":c, "d":d})
_agg_funs = [np.mean, np.std, np.min, np.max]
groupby_variables = ["a","b"]
df_grouped = df.groupby(groupby_variables, as_index=True).agg(_agg_funs)
output = StringIO.StringIO()
df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns])
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + [var + "_" + agg for (var, agg) in df_grouped.columns]
print(index == expected_index) # This was true in October 2012!
print(index)
print(expected_index) 

False
['', '', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']

Probably related to #3575
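
For anyone re-running this on Python 3, here is a sketch of the same reproduction. It assumes a pandas release that includes the fix later merged in #18110 (the milestone on this issue is 0.21.1). It also uses string aggregation names rather than the NumPy functions, which is a choice made here and not part of the original report, so the suffixes come out as _min/_max instead of the _amin/_amax shown above.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})
groupby_variables = ["a", "b"]
# String names are an assumption made here, to avoid depending on how a given
# NumPy/pandas version labels np.min / np.max.
df_grouped = df.groupby(groupby_variables).agg(["mean", "std", "min", "max"])

# Flatten each (variable, aggregation) column tuple into a single alias.
aliases = [f"{var}_{agg}" for (var, agg) in df_grouped.columns]

output = io.StringIO()
df_grouped.to_csv(output, header=aliases)

# Compare the first CSV line against the index names plus the aliases.
header_row = output.getvalue().splitlines()[0].split(",")
expected_header = groupby_variables + aliases
print(header_row == expected_header)
print(header_row)
```

On a release containing the fix, the first print should report that the header row matches; on the versions discussed in this thread it would not.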

jankatins · 2013-11-17T16:46:18Z

It seems that "header" is simply ignored :-(

This also does not work:

df_grouped.to_csv(output, header=[var + "_" + agg for (var, agg) in df_grouped.columns], index_label=df_grouped.index.names, index=False)

jankatins · 2013-11-17T16:51:36Z

This finally worked:

[...]
cols = [var + "_" + agg for (var, agg) in df_grouped.columns]
df_grouped.columns = cols
output = StringIO.StringIO()
df_grouped.to_csv(output)
index = output.getvalue().split("\n")[0].split(",")
expected_index = groupby_variables + cols
print(index == expected_index)
print(index)
print(expected_index) # And worked in October 2012!

True
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
['a', 'b', 'c_mean', 'c_std', 'c_amin', 'c_amax', 'd_mean', 'd_std', 'd_amin', 'd_amax']
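
The workaround above does not depend on the header keyword at all, so it should behave the same across pandas versions. A minimal sketch of it as a reusable helper follows; flatten_columns and its sep parameter are hypothetical names introduced for illustration, and the small frame is made-up sample data.

```python
import io

import pandas as pd


def flatten_columns(frame, sep="_"):
    """Return a copy of `frame` with each column tuple joined into one string.

    Hypothetical helper, not part of pandas.
    """
    flat = frame.copy()
    flat.columns = [sep.join(str(level) for level in col) for col in frame.columns]
    return flat


df = pd.DataFrame({"a": ["x", "x", "y"], "c": [1, 2, 3], "d": [3, 2, 1]})
df_grouped = df.groupby("a").agg(["min", "max"])

# Write the flattened copy; df_grouped itself keeps its MultiIndex columns.
output = io.StringIO()
flatten_columns(df_grouped).to_csv(output)
header_row = output.getvalue().splitlines()[0].split(",")
print(header_row)
```

Returning a copy rather than assigning to df_grouped.columns in place keeps the MultiIndex available for further processing after the export.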

jreback · 2013-11-18T20:05:35Z

header only applies to read_csv.
There is an option, tupleize_cols, which you can set to get the prior (0.12) behavior of writing tuples for the column multi-index if you want (though as of 0.13 it's not necessary and is turned off).

This is your first example on master

In [31]: output = StringIO.StringIO()

In [32]: df_grouped.to_csv(output)

In [33]: print output.getvalue()
,,c,c,c,c,d,d,d,d
,,mean,std,amin,amax,mean,std,amin,amax
a,b,,,,,,,,
a,c,4,4.242640687119285,1,7,9,4.242640687119285,6,12
a,d,8,4.242640687119285,5,11,5,4.242640687119285,2,8
a,e,6,4.242640687119285,3,9,7,4.242640687119285,4,10
b,c,7,4.242640687119285,4,10,6,4.242640687119285,3,9
b,d,5,4.242640687119285,2,8,8,4.242640687119285,5,11
b,e,9,4.242640687119285,6,12,4,4.242640687119285,1,7


In [34]: pd.read_csv(StringIO.StringIO(output.getvalue()),header=[0,1],index_col=[0,1])
Out[34]: 
        c                           d                      
     mean       std  amin  amax  mean       std  amin  amax
a b                                                        
a c     4  4.242641     1     7     9  4.242641     6    12
  d     8  4.242641     5    11     5  4.242641     2     8
  e     6  4.242641     3     9     7  4.242641     4    10
b c     7  4.242641     4    10     6  4.242641     3     9
  d     5  4.242641     2     8     8  4.242641     5    11
  e     9  4.242641     6    12     4  4.242641     1     7
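
A Python 3 sketch of the round trip shown in this comment, for readers who want to check it on a current install. It rebuilds the grouped frame from the first comment (with string aggregation names, an assumption made here) and compares labels and values after reading the CSV back; values are compared with a tolerance, not exactly, since float parsing may differ in the last digit.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["a", "b"] * 6,
    "b": ["c", "d", "e"] * 4,
    "c": list(range(1, 13)),
    "d": list(range(12, 0, -1)),
})
df_grouped = df.groupby(["a", "b"]).agg(["mean", "std", "min", "max"])

# Write with the default multi-row header, then rewind the buffer to read it.
buffer = io.StringIO()
df_grouped.to_csv(buffer)
buffer.seek(0)

# Two header rows become the column MultiIndex; two leading columns become the index.
roundtrip = pd.read_csv(buffer, header=[0, 1], index_col=[0, 1])

same_columns = list(roundtrip.columns) == list(df_grouped.columns)
same_index = list(roundtrip.index) == list(df_grouped.index)
same_values = np.allclose(
    roundtrip.to_numpy(dtype=float), df_grouped.to_numpy(dtype=float)
)
print(same_columns, same_index, same_values)
```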

jankatins · 2013-11-18T22:00:47Z

I did use master (or something a few days old). I found that very surprising, as my "half a year old code" broke with the newer pandas due to this (I wanted to import that into R).

Also:

String Form:<unbound method DataFrame.to_csv>
[...]
    def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
               cols=None, header=True, index=True, index_label=None,
               mode='w', nanRep=None, encoding=None, quoting=None,
               line_terminator='\n', chunksize=None,
               tupleize_cols=False, date_format=None, **kwds):
        r"""Write DataFrame to a comma-separated values (csv) file

        Parameters
        ----------
[...]
        header : boolean or list of string, default True
            Write out column names. If a list of string is given it is assumed
            to be aliases for the column names

Collab Edit: #4797
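
To make the quoted docstring concrete, here is a small sketch of the three forms of header on a frame with ordinary single-level columns, where the keyword was not in dispute. The frame and the csv_lines helper are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})


def csv_lines(**to_csv_kwargs):
    """Hypothetical helper: write df without its index and return the CSV lines."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False, **to_csv_kwargs)
    return buffer.getvalue().splitlines()


default_header = csv_lines()                             # header=True: column names
aliased_header = csv_lines(header=["first", "second"])   # list: aliases replace names
no_header = csv_lines(header=False)                      # False: data rows only

print(default_header[0])
print(aliased_header[0])
print(no_header[0])
```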

jreback · 2013-11-18T22:05:01Z

might not be well tested with a column multi index - and is actually very odd in that case anyhow

marking as a bug/API issue for 0.14

jreback modified the milestones: 0.15.0, 0.14.0 Mar 28, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 4, 2017

BUG: Override mi-columns in to_csv if requested

c9c39c5

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes pandas-devgh-5539

gfyoung mentioned this issue Nov 4, 2017

BUG: Override mi-columns in to_csv if requested #18110

Merged

jreback modified the milestones: Next Major Release, 0.21.1 Nov 4, 2017

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 4, 2017

BUG: Override mi-columns in to_csv if requested

ccd5098

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes pandas-devgh-5539

gfyoung closed this as completed in #18110 Nov 5, 2017

gfyoung added a commit that referenced this issue Nov 5, 2017

BUG: Override mi-columns in to_csv if requested (#18110)

e1f3a70

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes gh-5539

1kastner pushed a commit to 1kastner/pandas that referenced this issue Nov 5, 2017

BUG: Override mi-columns in to_csv if requested (pandas-dev#18110)

8587a3d

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes pandas-devgh-5539

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

BUG: Override mi-columns in to_csv if requested (pandas-dev#18110)

3d824fb

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes pandas-devgh-5539

TomAugspurger pushed a commit that referenced this issue Dec 11, 2017

BUG: Override mi-columns in to_csv if requested (#18110)

283bba9

Previously, MultiIndex columns weren't being overwritten when header was passed in for to_csv. Closes gh-5539 (cherry picked from commit e1f3a70)
