Default to_* methods to compression='infer' #22011

dhimmel · 2018-07-21T15:14:44Z

closes Defaulting to_csv to infer compression #22004
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

This PR does the following:

Updates the default compression for the to_csv, to_json, and to_pickle methods to 'infer'.
Adds test_compression_defaults_to_infer to test that compression='infer' is default for the relevant to_* methods.
Fixes a bug in CSVFormatter where setting compression='infer' with a file object would produce a RuntimeWarning.
Adds documentation to test_compression_warning which can fail due to a pytest bug.
Cleans up how the encoding argument in CSVFormatter is processed.
Moves compression tests from pandas/tests/test_common.py to pandas/tests/io/test_common.py
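
To make the behavior change concrete, here is a minimal sketch of extension-based inference. The helper name `infer_compression_sketch` and its lookup table are assumptions for illustration, built from the extensions named in the docstrings touched by this PR ('.gz', '.bz2', '.zip', '.xz'). It is not pandas' internal `_infer_compression`.

```python
# Hypothetical helper for illustration; not the pandas implementation.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression_sketch(path_or_buf, compression="infer"):
    """Resolve 'infer' to a concrete protocol name, or None."""
    if compression != "infer":
        # Explicit values (including None) are passed through unchanged.
        return compression
    if not isinstance(path_or_buf, str):
        # File objects carry no extension, so nothing can be inferred.
        return None
    for extension, name in _EXTENSION_TO_COMPRESSION.items():
        if path_or_buf.endswith(extension):
            return name
    return None
```

With a default of 'infer', a call that only passes a path such as `data.csv.gz` would resolve to gzip, while a plain `data.csv` path would stay uncompressed.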

dhimmel · 2018-07-21T15:15:21Z

pandas/io/json/json.py

@@ -28,7 +28,7 @@
 # interface to/from
 def to_json(path_or_buf, obj, orient=None, date_format='epoch',
            double_precision=10, force_ascii=True, date_unit='ms',
-            default_handler=None, lines=False, compression=None,
+            default_handler=None, lines=False, compression='infer',


Not sure where to update the to_json docs... didn't see a docstring in this function.

pandas/pandas/core/generic.py

Line 1905 in 322dbf4

Convert the object to a JSON string.

WillAyd

Needs tests

dhimmel · 2018-07-21T16:01:22Z

The following test is failing in Python 2.7 (see travis log):

pandas/pandas/tests/test_common.py

Lines 254 to 264 in 1033e8b

    
# GH 21227
def test_compression_warning(compression_only):
    df = DataFrame(100 * [[0.123456, 0.234567, 0.567567],
                          [12.32112, 123123.2, 321321.2]],
                   columns=['X', 'Y', 'Z'])
    with tm.ensure_clean() as filename:
        f, _handles = _get_handle(filename, 'w', compression=compression_only)
        with tm.assert_produces_warning(RuntimeWarning,
                                        check_stacklevel=False):
            with f:
                df.to_csv(f, compression=compression_only)

Let me look into #21227 as to what this test is for.

Update: this test was added in 91451cb / #21478 (not #21227). @minggli can you explain the purpose of test_compression_warning? I'm not sure why switching the compression default is causing this to fail. It seems like in Python 2, the test expects a RuntimeWarning that is not occurring.

Here is the relevant pytest fixture:

pandas/pandas/conftest.py

Lines 131 to 138 in 55cbd7d

    
@pytest.fixture(params=['gzip', 'bz2', 'zip',
                        pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression_only(request):
    """
    Fixture for trying common compression types in compression tests excluding
    uncompressed case
    """
    return request.param

codecov · 2018-07-21T18:43:39Z

Codecov Report

Merging #22011 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22011      +/-   ##
==========================================
- Coverage   92.07%   92.06%   -0.01%     
==========================================
  Files         170      170              
  Lines       50690    50704      +14     
==========================================
+ Hits        46672    46680       +8     
- Misses       4018     4024       +6

Flag	Coverage Δ
#multiple	`90.47% <100%> (-0.01%)`	⬇️
#single	`42.3% <100%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/series.py	`94.11% <ø> (ø)`	⬆️
pandas/core/frame.py	`97.26% <ø> (ø)`	⬆️
pandas/io/json/json.py	`92.47% <ø> (ø)`	⬆️
pandas/core/generic.py	`96.47% <ø> (ø)`	⬆️
pandas/io/formats/csvs.py	`98.21% <100%> (+0.55%)`	⬆️
pandas/core/arrays/datetimelike.py	`94.02% <0%> (-1.04%)`	⬇️
pandas/core/dtypes/common.py	`94.87% <0%> (-0.34%)`	⬇️
pandas/util/testing.py	`85.69% <0%> (-0.21%)`	⬇️
pandas/core/indexes/datetimes.py	`95.54% <0%> (-0.14%)`	⬇️
pandas/core/indexing.py	`93.79% <0%> (-0.03%)`	⬇️
... and 11 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d30c4a0...cf5b62e. Read the comment docs.

dhimmel · 2018-07-21T19:08:37Z

Needs tests

@WillAyd I believe the different values for compression are already being tested, such that we don't need to test that compression='infer' works. Are you saying that I should be testing that changing the default actually changes the default? It seems like that could create an excessive amount of tests were we to test every default argument, but am happy to proceed as you see fit.

minggli · 2018-07-21T19:13:00Z

@dhimmel saw your comment, thanks for contributing! test_compression_warning expects a RuntimeWarning when a file handle is passed to the to_csv method and the compression kwarg is also supplied.

This is because it's not supported as stated in:

pandas/pandas/io/formats/csvs.py

Line 134 in 1033e8b

if self.compression and hasattr(self.path_or_buf, 'write'):

I don't see why it would fail your work on to_json.

Having checked out your PR, I saw the issue but can't replicate it on master. Are you working off the latest master branch?

WillAyd · 2018-07-21T19:21:55Z

Are you saying that I should be testing that changing the default actually changes the default

Yep. Don't need to overthink it with all the various parameter combinations but at least need a test to ensure this now defaults to infer (since that is what you are changing)
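
A check of that kind can be very small. The sketch below uses `inspect.signature` to assert a keyword default. `to_csv_like` and `default_of` are stand-in names defined here for illustration, not pandas functions, and the tests discussed later in this thread exercise actual file output rather than inspecting signatures.

```python
import inspect


def to_csv_like(path_or_buf=None, compression="infer"):
    """Stand-in writer used only to illustrate the check."""
    return compression


def default_of(func, parameter):
    """Return the declared default of `parameter` in `func`'s signature."""
    return inspect.signature(func).parameters[parameter].default


# Fails if the default is ever switched back to None.
assert default_of(to_csv_like, "compression") == "infer"
```

A signature check is cheap, but it only proves what the declaration says. It does not prove that a file written with the default is compressed on disk.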

dhimmel · 2018-07-21T19:30:43Z

Are you working off the latest master branch?

@minggli, this PR currently branches off at 322dbf4, which is now two commits behind. I can rebase, but don't think that's the issue.

I don't see why it would fail your work on to_json.

I was thinking the change of the to_csv compression default may have caused this issue, but it doesn't make sense to me. The test explicitly specifies compression in df.to_csv(f, compression=compression_only), so I don't see how my PR would affect test_compression_warning.

minggli · 2018-07-21T21:42:01Z

@dhimmel,

I think the change of default in to_csv did change things. There is no infer_compression step in to_csv, and I think that is what caused the test to fail.

Adding _infer_compression in to_csv should solve this problem.

By the way, it appears that #17900 removed zip from the docstrings of to_csv and to_json based on the Oct 2017 discussion, but zip compression for writing was added in 0.23, Jun 2018:

pandas/doc/source/whatsnew/v0.23.0.txt

Line 543 in 67b6277

    
- zip compression is supported via ``compression=zip`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)

could you add 'zip' back in the docstrings of to_csv and to_json?

dhimmel · 2018-07-21T23:24:05Z

there is no infer_compression procedure in to_csv

Hmm. I thought to_csv now supports compression='infer' as of #17900 by @Dobatymo and @gfyoung. Basically, compression should be inferred by _get_handle but it appears the zip logic operates outside of this delegation.

could you add 'zip' back in the docstrings of to_csv and to_json?

Is there consensus on this change? IIRC from reading past issues, it was considered poor practice to save a single file to a zip archive. Thus pandas would be lenient on reading, but stricter on writing. I don't have a strong opinion, but don't want to hold up this PR for another controversial issue. So let's diagnose the failing test and see if it's somehow related to the zip issue.

minggli · 2018-07-22T00:00:30Z

#17900 added infer in _get_handle, but to_csv uses the compression argument before calling _get_handle; it therefore raises a RuntimeWarning at https://github.com/dhimmel/pandas/blob/648bf4d1810a2c2b9cbff1d4b941ab7cb7bc0b35/pandas/io/formats/csvs.py#L133 because compression='infer' instead of None as it was before. In itself that shouldn't be a problem, but pytest-dev/pytest#2917 exists, so test_compression_warning fails because the warning has already been raised in an earlier test before this case. Simply changing the order of the tests is not thread-safe, so I think the best way is:

inside to_csv:
compression = _infer_compression(path_or_buf, compression)
or inside csvs
self.compression = _infer_compression(path_or_buf, compression)
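
The ordering problem described above can be sketched with the standard library alone. All names below (`resolve`, `check_before_resolving`, `check_after_resolving`, `count_warnings`) and the warning message are assumptions for illustration, not the code in pandas/io/formats/csvs.py. The point is only that the string 'infer' is truthy, so a check of the form `if compression and hasattr(buf, 'write')` fires for a file object unless 'infer' is resolved first.

```python
import io
import warnings

EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def resolve(path_or_buf, compression):
    """Turn 'infer' into a protocol name (paths) or None (file objects)."""
    if compression != "infer":
        return compression
    if isinstance(path_or_buf, str):
        for extension, name in EXTENSIONS.items():
            if path_or_buf.endswith(extension):
                return name
    return None


def check_before_resolving(path_or_buf, compression="infer"):
    # 'infer' is a non-empty string, so this branch is taken for buffers.
    if compression and hasattr(path_or_buf, "write"):
        warnings.warn("compression has no effect on file objects",
                      RuntimeWarning)


def check_after_resolving(path_or_buf, compression="infer"):
    compression = resolve(path_or_buf, compression)
    # For a buffer, 'infer' has become None, so no warning is raised.
    if compression and hasattr(path_or_buf, "write"):
        warnings.warn("compression has no effect on file objects",
                      RuntimeWarning)


def count_warnings(check, *args, **kwargs):
    """Run `check` and return how many warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(*args, **kwargs)
    return len(caught)
```

Under these assumptions, passing an `io.StringIO()` with the default emits one warning in the first variant and none in the second, while an explicit `compression="gzip"` still warns in both.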

Regarding zip in the docstring: it is what the released version shows right now, and it is already supported.

import os
import pandas as pd

a = pd.DataFrame(10000 * [[123, 234, 435]], columns=['A', 'B', 'C'])
a.to_csv('test_compressed', index=False, compression='zip')

b = pd.read_csv('test_compressed', compression='zip')

assert a.equals(b)
os.remove('test_compressed')

dhimmel · 2018-07-22T00:12:24Z

So I'm a bit concerned with what has happened since I last worked on compression & file IO. My impression was that we should start to use a unified API for inferring compression and opening files: see #15008. _get_handle now delegates to _infer_compression . It seems however that CSVFormatter.save has now gotten further away from using the unified API. In be724fa I replace CSVFormatter.save with the simplified workflow. If writing to zip is a feature that we want to support, then shouldn't this go in _get_handle?

Happy to revert be724fa or put it in another PR, just wanted to see the CI test output to know what this would break.

jreback · 2018-07-23T11:31:09Z

_get_handle now delegates to _infer_compression .

why is this a problem? this is clear separation of concerns. I don't think the goal has drifted from when you last worked on this. A number of bugs / consolidations have happened in the interim. Happy to take further cleanup.

dhimmel · 2018-07-23T15:29:55Z

why is this a problem?

Having _get_handle call _infer_compression is good. It reduces the number of functions that individual to_* methods have to call. However, the issue with CSVFormatter.save is that it performs a custom workflow for handling zip files, which has made the code quite complicated, with several lines calling _get_handle. The issue came up because the failing test is part of this custom compression workflow that CSVFormatter now contains. However, I apologize for getting a bit sidetracked. I should keep this PR focused on changing the default for compression to infer.

Attempt to diagnose testing failure of Python 2 test_compression_warning https://travis-ci.org/pandas-dev/pandas/jobs/407300547#L3853

pep8speaks · 2018-07-23T21:08:43Z

Hello @dhimmel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 01, 2018 at 14:41 Hours UTC

dhimmel · 2018-07-24T16:17:40Z

Why is `test_compression_warning` failing?

There is a testing failure that has completely stumped me. It is only occurring in some Python 2 builds. The failing test is:

pandas/pandas/tests/test_common.py

Lines 254 to 264 in 1033e8b

    
# GH 21227
def test_compression_warning(compression_only):
    df = DataFrame(100 * [[0.123456, 0.234567, 0.567567],
                          [12.32112, 123123.2, 321321.2]],
                   columns=['X', 'Y', 'Z'])
    with tm.ensure_clean() as filename:
        f, _handles = _get_handle(filename, 'w', compression=compression_only)
        with tm.assert_produces_warning(RuntimeWarning,
                                        check_stacklevel=False):
            with f:
                df.to_csv(f, compression=compression_only)

The error is:

E               AssertionError: Did not see expected warning of class 'RuntimeWarning'.

It doesn't make sense to me how the changes in this PR would affect whether the RuntimeWarning is raised. This PR only changes the default compression value. The test specifies compression, so the default should not matter.

In cebc0d9 I added some debugging statements to help diagnose the testing failure

In CSVFormatter.save:

In test_compression_warning:

Here's the output:

------------------------------ Captured log call -------------------------------
test_common.py             261 WARNING  debug_1: gzip
test_common.py             266 WARNING  debug_2: gzip
csvs.py                    138 WARNING  debug_3: gzip
csvs.py                    139 WARNING  debug_4: <gzip open file '/tmp/tmpQYAr5L', mode 'wb' at 0x7f7c02e56db0 0x7f7c04698b10>
csvs.py                    141 WARNING  debug_5: True
csvs.py                    143 WARNING  debug_6: in loop, should RuntimeWarn

So the logging shows in the failing test that the code enters the if statement that triggers the RuntimeWarning. So I don't understand how the warning could not be raised or where this issue could be coming from.

@WillAyd other than this issue how do things look? I added a test for to_csv defaulting to inference.

WillAyd

Generally looks OK, though I think the test can be improved. Will have to take a look again after some of the logging things get cleaned up

WillAyd · 2018-07-24T16:33:29Z

pandas/tests/io/formats/test_to_csv.py

+        Test that to_csv defaults to inferring compression from paths.
+        https://github.com/pandas-dev/pandas/pull/22011
+        """
+        df = DataFrame({"A": [1]})


While I suppose this "works" it is definitely focused on gzip and doesn't make any assertions about how the behavior works across a combination of infer, None and an explicit compression.

We have a fixture called compression in conftest.py that covers the various compression options - is there a way to parametrize that as part of this test and make more comprehensive assertions about the potential combinations of behavior?

The possible values that compression can take for to_csv are already tested. The purpose of this test is simply to test that compression is defaulting to 'infer'. I kept the test simple so that it ONLY tests that the default is infer and won't break or malfunction should other aspects of the API change. The test will break if the default for compression is put back to None (and hopefully not for other reasons).

I think it may make sense to make a similar test for to_json, but don't see how testing compression='infer' for all possible compression extensions is within scope of this PR as this PR just changes the default and is not creating the infer option. Is the worry that there is currently inadequate testing?

My point was that this test only makes sure that gzip compression works by default with infer, but how does it guarantee that other compression types play nice with infer? I'll admit it's a subtle distinction but nuances like that do pop up and it doesn't seem like it should be that much more work to leverage the existing compression fixture for this test to increase coverage

it should be that much more work to leverage the existing compression fixture for this test to increase coverage

In abd19e3, I modified an existing parametrized test to look for compression by default for paths where inference should occur. This actually caught an issue (we hadn't switched default for Series.to_csv -- fixed in 2f670fe).

Should I delete the gzip test? Note the parametrized test doesn't test that the right compression is occurring, just that a compression is occurring.

Instead of going about it in this fashion can you not do a round trip with to_csv and read_csv using infer with a file extension for the former and the parametrized value as an argument for compression in the latter? So in pseudo-code:

with tm.ensure_clean('compressed.csv.{}'.format(compression_only)) as path:
    df.to_csv(path)
    result = pd.read_csv(path, compression=compression_only)
    tm.assert_frame_equal(result, df)

I'm not a big fan of the roundtrip approach here, since it never actually tests that the file is compressed on disk. Given that the to_* and read_* methods rely on much of the same compression infrastructure, I think it's possible to modify the code such that all compression gets disabled and the roundtrip works perfectly. Now hopefully there are enough other tests to catch such a situation.
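
One way to check the on-disk state directly, rather than relying on a round trip, is to look at the leading magic bytes of the written file. The sketch below uses only the standard library. `detect_compression` and `write_and_detect` are hypothetical helpers for illustration and are not the tests added in this PR.

```python
import bz2
import gzip
import os
import tempfile

# Leading bytes that identify each container format.
MAGIC_NUMBERS = {
    "gzip": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
    "zip": b"PK\x03\x04",
}


def detect_compression(path):
    """Return the protocol whose magic number starts the file, else None."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for name, magic in MAGIC_NUMBERS.items():
        if head.startswith(magic):
            return name
    return None


def write_and_detect(opener, suffix, payload=b"A,B\n1,2\n"):
    """Write `payload` through `opener` and report what is on disk."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "frame.csv" + suffix)
        with opener(path, "wb") as handle:
            handle.write(payload)
        return detect_compression(path)
```

A check like this fails if compression is silently disabled on both the write and read side, which is the scenario a pure round trip would miss.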

WillAyd · 2018-07-24T16:35:26Z

@minggli any thoughts on the issue @dhimmel is describing above?

minggli · 2018-07-24T18:26:52Z

@dhimmel

#17900 added infer in _get_handle, but to_csv uses the compression argument before calling _get_handle; it therefore raises a RuntimeWarning at https://github.com/dhimmel/pandas/blob/648bf4d1810a2c2b9cbff1d4b941ab7cb7bc0b35/pandas/io/formats/csvs.py#L133 because compression='infer' instead of None as it was before. In itself that shouldn't be a problem, but pytest-dev/pytest#2917 exists, so test_compression_warning fails because the warning has already been raised in an earlier test before this case. Simply changing the order of the tests is not thread-safe, so I think the best way is:

inside to_csv:
compression = _infer_compression(path_or_buf, compression)
or inside csvs
self.compression = _infer_compression(path_or_buf, compression)

codecov · 2018-07-30T17:13:58Z

Codecov Report

Merging #22011 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #22011      +/-   ##
==========================================
+ Coverage   92.07%   92.07%   +<.01%     
==========================================
  Files         170      170              
  Lines       50690    50685       -5     
==========================================
- Hits        46672    46669       -3     
+ Misses       4018     4016       -2

Flag	Coverage Δ
#multiple	`90.48% <100%> (ø)`	⬆️
#single	`42.31% <100%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`96.47% <ø> (ø)`	⬆️
pandas/io/json/json.py	`92.47% <ø> (ø)`	⬆️
pandas/core/frame.py	`97.26% <ø> (ø)`	⬆️
pandas/core/series.py	`94.11% <ø> (ø)`	⬆️
pandas/io/formats/csvs.py	`98.21% <100%> (+0.55%)`	⬆️
pandas/core/indexing.py	`93.79% <0%> (-0.03%)`	⬇️
pandas/core/groupby/generic.py	`86.79% <0%> (-0.02%)`	⬇️
pandas/core/dtypes/dtypes.py	`96.03% <0%> (-0.02%)`	⬇️
pandas/core/sparse/series.py	`95.22% <0%> (ø)`	⬆️
pandas/core/arrays/interval.py	`92.6% <0%> (+0.27%)`	⬆️
... and 1 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d30c4a0...12f14e2. Read the comment docs.

dhimmel · 2018-07-30T21:55:32Z

Tests passing as of e3a0f56. @WillAyd and @jreback ready for re-review.

jreback

lgtm. ping on green

jreback · 2018-07-31T13:19:32Z

pandas/core/frame.py

            If 'infer' and `path_or_buf` is path-like, then detect compression
            from the following extensions: '.gz', '.bz2', '.zip' or '.xz'
            (otherwise no compression).
+            .. versionchanged:: 0.24.0


I think you need to have a blank after this or it has a warning, @TomAugspurger @datapythonista ?

Done in f8829a6, but would be good to hear from @TomAugspurger and @datapythonista, since we have complex situations such as:

DOCLINE
DOCLINE
.. versionchanged:: 0.23.0
   here is what was added
.. versionchanged:: 0.24.0
   here is what changed
DOCLINE

For example, is the above OKAY or do we need additional blanks?

@dhimmel I think you need the additional blank lines (before, and not sure if after).

The reason is not so much a standard in this case, but whether sphinx understands the directive. What we expect in the documentation is that it's rendered like the validate parameter in the merge docstring: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

But if you don't leave the right blank lines, sphinx doesn't detect it's a directive, and the text is rendered as it is. See this case: https://pandas.pydata.org/pandas-docs/version/0.23.1/generated/pandas.IntervalIndex.from_tuples.html

So, the best is if you can build the documentation, and check that it's rendered all right. This can be done by ./doc/make.py html (or ./doc/make.py html --single pandas.DataFrame.read_csv)

Let me know if you have any issue.

Updated in eadf68e and built the docs locally to confirm they're rendering properly.

Turns out the blank line before is required. After is not required. In between multiple statements is not required.
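
As an illustration of the placement reported to render correctly, here is a sketch of a docstring with a blank line before the directive, plus a small text checker. Both `to_csv_like` and `directives_preceded_by_blank` are hypothetical names used only for this example. The checker inspects text and does not invoke Sphinx, so building the docs remains the real verification.

```python
import inspect


def to_csv_like(path_or_buf=None, compression="infer"):
    """
    Write the object to a file (illustrative stand-in).

    Parameters
    ----------
    compression : str, default 'infer'
        Compression to use in the output file.

        .. versionchanged:: 0.24.0
           'infer' option added and set to default
    """


def directives_preceded_by_blank(text, directive=".. versionchanged::"):
    """True if every `directive` line in `text` follows a blank line."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.strip().startswith(directive):
            # The first line has no predecessor, so it cannot qualify.
            if index == 0 or lines[index - 1].strip():
                return False
    return True
```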

jreback · 2018-07-31T13:19:42Z

pandas/core/generic.py

            A string representing the compression to use in the output file,
            only used when the first argument is a filename.

            .. versionadded:: 0.21.0
+            .. versionchanged:: 0.24.0
+               'infer' option added and set to default



e.g. like this is good

jreback · 2018-07-31T13:19:50Z

pandas/core/series.py

+            Allowed values are None, 'gzip', 'bz2', 'zip', 'xz', and 'infer'.
+            This input is only used when the first argument is a filename.
+            .. versionchanged:: 0.24.0
+               'infer' option added and set to default
        date_format: string, default None


jreback · 2018-07-31T13:20:23Z

pandas/tests/io/test_common.py


 import pandas as pd
-import pandas.util.testing as tm
+import pandas.io.common as cmn


can you rename to icom

Done in af8c137

jreback · 2018-07-31T13:21:19Z

pandas/tests/io/test_common.py

@@ -285,4 +285,100 @@ def test_unknown_engine(self):
            df = tm.makeDataFrame()
            df.to_csv(path)
            with tm.assert_raises_regex(ValueError, 'Unknown engine'):
-                read_csv(path, engine='pyt')
+                pd.read_csv(path, engine='pyt')
+


Can you put a marker comment (e.g. a line of --- or whatever) before the compression tests to delineate this section of the tests? I'm also OK with a new test file, tests_compression.py (maybe simpler).

dhimmel · 2018-07-31T21:20:32Z

lgtm. ping on green

@jreback it's 🍏, i.e. #008000

Refs pandas-dev#22011 (comment) Blanks are needed before but not after or in between.

jreback · 2018-07-31T23:55:32Z

lgtm

@WillAyd merge when satisfied

WillAyd

Small nits / question around comments

WillAyd · 2018-08-01T00:50:43Z

pandas/tests/io/test_compression.py

+def test_dataframe_compression_defaults_to_infer(
+        write_method, write_kwargs, read_method, compression_only):
+    # Test that DataFrame.to_* methods default to inferring compression from
+    # paths. GH 22004


Just change comment to # GH22004 (standard in other tests). The rest here doesn't add anything that isn't inferred from the test name

Done in cf5b62e

WillAyd · 2018-08-01T00:50:55Z

pandas/tests/io/test_compression.py

+def test_series_compression_defaults_to_infer(
+        write_method, write_kwargs, read_method, read_kwargs,
+        compression_only):
+    # Test that Series.to_* methods default to inferring compression from


Same as above

WillAyd · 2018-08-01T01:01:38Z

pandas/tests/io/test_compression.py

+    # Assert that passing a file object to to_csv while explicitly specifying a
+    # compression protocol triggers a RuntimeWarning, as per GH 21227.
+    # Note that pytest has an issue that causes assert_produces_warning to fail
+    # in Python 2 if the warning has occurred in previous tests


May be overlooking it but where in the links is there mention about Python 2 behavior? We get some random Resource Warnings in our tests which is maybe related so could be good info, but wasn't immediately apparent to me where that was

The pytest issue at https://git.io/fNEBm / pytest-dev/pytest#2917.
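
The symptom described in this thread is that a warning already raised by an earlier test is not seen again. Python's `warnings` module has once-per-location bookkeeping that produces a similar symptom, and a rough analogue can be shown on Python 3 with the standard library. This is a sketch of the general mechanism under the "default" filter action, not a reproduction of the Python 2 behavior behind the linked pytest issue. `emit` and `count_recorded` are names made up for the example.

```python
import warnings


def emit():
    # Always warns from the same source line.
    warnings.warn("compression has no effect on file objects", RuntimeWarning)


def count_recorded(action, calls=2):
    """Call `emit` `calls` times under `action` and count recorded warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            emit()
    return len(caught)
```

Under the "default" action the second call from the same location is suppressed, whereas "always" records every call. A test helper that expects to see the warning therefore depends on which filter is active and on what ran before it.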

WillAyd

lgtm I'll merge after AppVeyor goes green. Thanks @dhimmel

dhimmel · 2018-08-01T18:13:10Z

Not sure what the Travis failure is about:

_____________________________ ERROR collecting gw0 _____________________________
Different tests were collected between gw1 and gw0. The difference is:
--- gw1
+++ gw0

gfyoung · 2018-08-01T20:59:14Z

@dhimmel : Looks like all of our required builds are passing (that gw0 / gw1 can be finicky)!

However, I would like to double check our non-required builds (they're acting up a bit) first before merging this (cc @jreback @WillAyd ).

gfyoung · 2018-08-01T21:23:42Z

@dhimmel : Thanks a lot for this!

Closes pandas-devgh-22004.


dhimmel · 2018-08-06T21:35:15Z

Just wanted to add that I posted on Steem about this pull request, as part of the https://utopian.io initiative. Today, the utopian account upvoted my post, thereby rewarding it, as an incentive for open source contributions! So thanks Utopian and happy to help any other Pandas contributors get set up with a Steem account and use Utopian, just email me.

Closes pandas-devgh-22004.

Default to_csv & to_json to compression='infer'

8689167

dhimmel commented Jul 21, 2018

View reviewed changes

WillAyd requested changes Jul 21, 2018

View reviewed changes

WillAyd added the IO JSON read_json, to_json, json_normalize label Jul 21, 2018

to_json compression=infer in pandas/core/generic.py

3ccfb00

dhimmel added 2 commits July 21, 2018 19:34

Simplify CSVFormatter.save

648bf4d

Exploratory commit of what CSVFormatter.save should look like

be724fa

dhimmel added 4 commits July 23, 2018 14:52

fixup! Simplify CSVFormatter.save

9fe27c9

"Revert changes not related to compression default

65f0689

TST: test to_csv infers compression by default

868e671

Debugging print statements

c3b76ee

Attempt to diagnose testing failure of Python 2 test_compression_warning https://travis-ci.org/pandas-dev/pandas/jobs/407300547#L3853

Debugging: use logging rather than print

cebc0d9

dhimmel force-pushed the default-to-infer-compression branch from d41ede5 to cebc0d9 Compare July 23, 2018 21:11

WillAyd requested changes Jul 24, 2018

View reviewed changes

Organize / simplify pandas/tests/test_common.py imports

12f14e2

dhimmel added 2 commits July 30, 2018 13:14

Ignore flake error needed for test

6db23d9

fixup! Organize / simplify pandas/tests/test_common.py imports

e3a0f56

jreback requested changes Jul 31, 2018

View reviewed changes

jreback added this to the 0.24.0 milestone Jul 31, 2018

jreback added IO Data IO issues that don't fit into a more specific label IO CSV read_csv, to_csv labels Jul 31, 2018

dhimmel added 3 commits July 31, 2018 10:03

change import: cmn to icom

af8c137

Blank lines after versionchanged

f8829a6

Move compression tests to new file tests/io/test_compression.py

918c0f8

blank lines before .. versionchanged

eadf68e

Refs pandas-dev#22011 (comment) Blanks are needed before but not after or in between.

jreback approved these changes Jul 31, 2018

View reviewed changes

WillAyd requested changes Aug 1, 2018

View reviewed changes

Remove comments and space after GH

cf5b62e

WillAyd approved these changes Aug 1, 2018

View reviewed changes

gfyoung merged commit 93f154c into pandas-dev:master Aug 1, 2018

toobaz mentioned this pull request Aug 2, 2018

DEPR: Deprecate Series.to_csv signature #21896

Merged

4 tasks

dberenbaum pushed a commit to dberenbaum/pandas that referenced this pull request Aug 3, 2018

API: Default to_* methods to compression='infer' (pandas-dev#22011)

c872e40

Closes pandas-devgh-22004.

dhimmel mentioned this pull request Sep 21, 2018

In-memory to_csv compression #22555

Closed

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

API: Default to_* methods to compression='infer' (pandas-dev#22011)

bc40588

Closes pandas-devgh-22004.

WillAyd mentioned this pull request Apr 29, 2019

to_pickle compression does not work with in-memory buffers #26237

Closed
