Defaulting to_csv to infer compression #22004

dhimmel · 2018-07-20T20:40:12Z

This issue follows up on #17900 (thanks to @Dobatymo and @gfyoung, with review from @jreback), which added an 'infer' option to compression in _get_handle. The main user-facing benefit is that df.to_csv will be able to infer compression just like pandas.read_csv. However, unlike read_csv, the default value for compression in to_csv is None rather than 'infer'.

Unfortunately, much of the convenience of compression='infer' is lost if you have to specify it explicitly. In short, I think there is major convenience in having the following command work and automatically perform gzip compression:

df.to_csv('path.csv.gz')
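For illustration, extension-based inference boils down to mapping the path's final suffix to a compression name. The sketch below is a hypothetical stand-in, not pandas' internal code; the extension table is assumed to match what pandas recognized around 0.24 (newer versions recognize more).

```python
from pathlib import Path
from typing import Optional

# Hypothetical table, assumed to mirror the extensions pandas recognized
# around 0.24; later pandas versions recognize additional ones.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression(path: str) -> Optional[str]:
    """Return a compression name based on the path's final suffix, or None."""
    suffix = Path(path).suffix.lower()  # "path.csv.gz" -> ".gz"
    return EXTENSION_TO_COMPRESSION.get(suffix)


print(infer_compression("path.csv.gz"))  # gzip
print(infer_compression("path.csv"))     # None
```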

Compatibility assessment

Defaulting to 'infer' would only affect users who currently write to paths with compression extensions (such as .gz) without actually compressing. That's pretty bad practice IMO. Hence, I'm in favor of breaking backwards compatibility and changing the default for compression to 'infer'. It looks like this would go into the next major release, 0.24?
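To make the compatibility trade-off concrete, here is a minimal stdlib sketch of a writer whose default is compression="infer", with compression=None as the explicit opt-out that an affected user would pass to keep writing plain text to a ".gz" path. The function write_csv and its opener table are hypothetical illustrations, not the pandas implementation; zip is left out to keep it short.

```python
import bz2
import csv
import gzip
import lzma
import tempfile
from pathlib import Path

# Hypothetical opener table; zip is omitted, so a ".zip" path would fall
# back to plain text here (a real implementation would handle it).
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def write_csv(rows, path, compression="infer"):
    """Write rows as CSV.

    compression="infer" picks an opener from the path suffix;
    compression=None always writes plain text, even for "data.csv.gz".
    """
    if compression == "infer":
        opener = OPENERS.get(Path(path).suffix.lower(), open)
    elif compression is None:
        opener = open
    else:
        raise ValueError(f"sketch supports only 'infer' or None, got {compression!r}")
    # open, gzip.open, bz2.open and lzma.open all accept text mode + newline.
    with opener(path, "wt", newline="") as fh:
        csv.writer(fh).writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    inferred = Path(tmp) / "data.csv.gz"
    write_csv([["a", "b"], [1, 2]], inferred)  # new default: gzip by inference
    legacy = Path(tmp) / "legacy.csv.gz"
    write_csv([["a", "b"], [1, 2]], legacy, compression=None)  # opt-out: plain text
    inferred_head = inferred.read_bytes()[:2]
    legacy_head = legacy.read_bytes()[:3]

print(inferred_head == b"\x1f\x8b")  # True (gzip magic number)
print(legacy_head)                   # b'a,b'
```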


WillAyd · 2018-07-20T21:57:19Z

I agree conceptually. Probably need to handle cases where this would potentially conflict with the compression argument. PRs welcome
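One plausible reading of the conflict concern is precedence: what happens when the extension says one thing and the compression argument says another. The hypothetical sketch below assumes the rule that an explicit argument always wins and only the 'infer' sentinel consults the extension; resolve_compression and its table are illustrative names, not pandas API.

```python
from pathlib import Path
from typing import Optional

# Assumed extension table (roughly what pandas recognized around 0.24).
EXTENSION_TO_COMPRESSION = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}
VALID = set(EXTENSION_TO_COMPRESSION.values())


def resolve_compression(path: str, compression: Optional[str] = "infer") -> Optional[str]:
    """Pick the compression for `path`.

    An explicit argument (including None) overrides the extension;
    only the 'infer' sentinel looks at the file suffix.
    """
    if compression == "infer":
        return EXTENSION_TO_COMPRESSION.get(Path(path).suffix.lower())
    if compression is None or compression in VALID:
        return compression
    raise ValueError(f"Unrecognized compression type: {compression!r}")


print(resolve_compression("out.csv.gz"))         # gzip
print(resolve_compression("out.csv.gz", None))   # None
print(resolve_compression("out.csv.gz", "bz2"))  # bz2
```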

dhimmel · 2018-07-21T15:03:50Z

I am happy to open a PR. I think the solution will be as simple as changing the compression default to infer in:

pandas/pandas/core/frame.py, lines 1714 to 1716 in 322dbf4:

    def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
               columns=None, header=True, index=True, index_label=None,
               mode='w', encoding=None, compression=None, quoting=None,

Looks like to_pickle already defaults to infer:

pandas/pandas/io/pickle.py, line 11 in 322dbf4:

    def to_pickle(obj, path, compression='infer', protocol=pkl.HIGHEST_PROTOCOL):

to_json should also probably be switched to default to infer:

pandas/pandas/io/json/json.py, lines 29 to 32 in 322dbf4:

    def to_json(path_or_buf, obj, orient=None, date_format='epoch',
                double_precision=10, force_ascii=True, date_unit='ms',
                default_handler=None, lines=False, compression=None,
                index=True):

I don't think the other to_* methods have a compression argument, but I should double-check.
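One way to do that double-check is to introspect signatures rather than read each method. The helper below is a hypothetical sketch built on the standard inspect module; the Frame class is a stand-in for pandas.DataFrame so the example runs without pandas installed.

```python
import inspect


def methods_with_parameter(cls, param, prefix="to_"):
    """List callables on `cls` whose name starts with `prefix` and accept `param`."""
    found = []
    for name in dir(cls):  # dir() returns names in sorted order
        if not name.startswith(prefix):
            continue
        attr = getattr(cls, name)
        if not callable(attr):
            continue  # skip properties and plain attributes
        try:
            params = inspect.signature(attr).parameters
        except (TypeError, ValueError):
            continue  # some builtins expose no signature
        if param in params:
            found.append(name)
    return found


class Frame:  # hypothetical stand-in for pandas.DataFrame
    def to_csv(self, path, compression="infer"): ...
    def to_html(self, buf=None): ...
    def to_json(self, path, compression="infer"): ...


print(methods_with_parameter(Frame, "compression"))  # ['to_csv', 'to_json']
```

Calling methods_with_parameter(pandas.DataFrame, "compression") against an installed pandas would answer the question for that version; the result depends on the version, so no output is shown for it here.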


dhimmel mentioned this issue Jul 20, 2018: ENH: Add 'infer' option to compression in _get_handle() #17900 (Merged)

WillAyd added the IO CSV (read_csv, to_csv) and Enhancement labels Jul 20, 2018

WillAyd added this to the Contributions Welcome milestone Jul 21, 2018

dhimmel mentioned this issue Jul 21, 2018: Default to_* methods to compression='infer' #22011 (Merged)

jreback modified the milestones: Contributions Welcome, 0.24.0 Jul 31, 2018

gfyoung closed this as completed in 93f154c Aug 1, 2018

dberenbaum pushed a commit to dberenbaum/pandas that referenced this issue Aug 3, 2018

API: Default to_* methods to compression='infer' (pandas-dev#22011)

c872e40

Closes pandas-devgh-22004.

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this issue Oct 1, 2018

API: Default to_* methods to compression='infer' (pandas-dev#22011)

bc40588

Closes pandas-devgh-22004.
