Default to_* methods to compression='infer' #22011

Merged
merged 40 commits into from
Aug 1, 2018

Conversation

Contributor

@dhimmel dhimmel commented Jul 21, 2018

This PR does the following:

  • Updates the default compression for the to_csv, to_json, and to_pickle methods to 'infer'.
  • Adds test_compression_defaults_to_infer to test that compression='infer' is default for the relevant to_* methods.
  • Fixes a bug in CSVFormatter where setting compression='infer' with a file object would produce a RuntimeWarning.
  • Adds documentation to test_compression_warning which can fail due to a pytest bug.
  • Cleans up how the encoding argument in CSVFormatter is processed.
  • Moves compression tests from pandas/tests/test_common.py to pandas/tests/io/test_common.py
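
As a minimal sketch of what the new default means for callers (the file name and DataFrame contents are illustrative assumptions, not taken from the PR), writing to a path ending in .gz should produce a gzip file without passing compression explicitly:

```python
import gzip
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name; only the ".gz" extension matters for inference.
    path = os.path.join(tmpdir, "example.csv.gz")

    # No compression argument: with this PR the default is 'infer'.
    df.to_csv(path, index=False)

    # gzip.open fails on read if the file is not actually gzip-compressed.
    with gzip.open(path, "rt") as handle:
        text = handle.read()
```

Before this change the same call would have written plain text to a file that merely carried a .gz name.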

@@ -28,7 +28,7 @@
 # interface to/from
 def to_json(path_or_buf, obj, orient=None, date_format='epoch',
             double_precision=10, force_ascii=True, date_unit='ms',
-            default_handler=None, lines=False, compression=None,
+            default_handler=None, lines=False, compression='infer',
Contributor Author

Not sure where to update the to_json docs... didn't see a docstring in this function.

Member

Convert the object to a JSON string.

Member

@WillAyd WillAyd left a comment

Needs tests

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Jul 21, 2018
Contributor Author

dhimmel commented Jul 21, 2018

The following test is failing in Python 2.7 (see travis log):

# GH 21227
def test_compression_warning(compression_only):
    df = DataFrame(100 * [[0.123456, 0.234567, 0.567567],
                          [12.32112, 123123.2, 321321.2]],
                   columns=['X', 'Y', 'Z'])
    with tm.ensure_clean() as filename:
        f, _handles = _get_handle(filename, 'w', compression=compression_only)
        with tm.assert_produces_warning(RuntimeWarning,
                                        check_stacklevel=False):
            with f:
                df.to_csv(f, compression=compression_only)

Let me look into #21227 as to what this test is for.

Update: this test was added in 91451cb / #21478 (not #21227). @minggli can you explain the purpose of test_compression_warning? I'm not sure why switching the compression default is causing this to fail. It seems like in Python 2, the test expects a RuntimeWarning that is not occurring.

Here is the relevant pytest fixture:

pandas/pandas/conftest.py

Lines 131 to 138 in 55cbd7d

@pytest.fixture(params=['gzip', 'bz2', 'zip',
                        pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression_only(request):
    """
    Fixture for trying common compression types in compression tests excluding
    uncompressed case
    """
    return request.param


codecov bot commented Jul 21, 2018

Codecov Report

Merging #22011 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22011      +/-   ##
==========================================
- Coverage   92.07%   92.06%   -0.01%     
==========================================
  Files         170      170              
  Lines       50690    50704      +14     
==========================================
+ Hits        46672    46680       +8     
- Misses       4018     4024       +6
Flag Coverage Δ
#multiple 90.47% <100%> (-0.01%) ⬇️
#single 42.3% <100%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 94.11% <ø> (ø) ⬆️
pandas/core/frame.py 97.26% <ø> (ø) ⬆️
pandas/io/json/json.py 92.47% <ø> (ø) ⬆️
pandas/core/generic.py 96.47% <ø> (ø) ⬆️
pandas/io/formats/csvs.py 98.21% <100%> (+0.55%) ⬆️
pandas/core/arrays/datetimelike.py 94.02% <0%> (-1.04%) ⬇️
pandas/core/dtypes/common.py 94.87% <0%> (-0.34%) ⬇️
pandas/util/testing.py 85.69% <0%> (-0.21%) ⬇️
pandas/core/indexes/datetimes.py 95.54% <0%> (-0.14%) ⬇️
pandas/core/indexing.py 93.79% <0%> (-0.03%) ⬇️
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d30c4a0...cf5b62e. Read the comment docs.

Contributor Author

dhimmel commented Jul 21, 2018

Needs tests

@WillAyd I believe the different values for compression are already being tested, such that we don't need to test that compression='infer' works. Are you saying that I should be testing that changing the default actually changes the default? It seems like that could create an excessive amount of tests were we to test every default argument, but am happy to proceed as you see fit.

Contributor

minggli commented Jul 21, 2018

@dhimmel saw your comment. Thanks for contributing! test_compression_warning expects a RuntimeWarning when a file handle is passed to the to_csv method and the compression kwarg is also supplied.

This is because it's not supported as stated in:

if self.compression and hasattr(self.path_or_buf, 'write'):

I don't see why it would fail your work on to_json.

Having checked out your PR, I saw the issue but can't replicate it on master. Are you working off the latest master branch?
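
The guard quoted above can be sketched in isolation. This is a hypothetical stand-in (save_csv is not a pandas function, and the warning text is illustrative) that reproduces only the condition, assuming the warning is emitted with the standard warnings module:

```python
import io
import warnings


def save_csv(path_or_buf, compression=None):
    # Hypothetical stand-in for CSVFormatter.save; only the guard quoted
    # above is reproduced, and nothing is actually written.
    if compression and hasattr(path_or_buf, "write"):
        warnings.warn(
            "compression has no effect when passing file-like object as input.",
            RuntimeWarning,
            stacklevel=2,
        )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    save_csv(io.StringIO(), compression="gzip")  # file object: warns
    save_csv("out.csv.gz", compression="gzip")   # path: no warning

categories = [w.category for w in caught]
```

Only the call that passes an object with a write attribute produces a RuntimeWarning.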

Member

WillAyd commented Jul 21, 2018

Are you saying that I should be testing that changing the default actually changes the default

Yep. Don't need to overthink it with all the various parameter combinations but at least need a test to ensure this now defaults to infer (since that is what you are changing)
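
One lightweight way to check a default, separate from the behavioural tests this PR ended up adding, is to inspect the method signatures. This is only a sketch: it assumes a pandas version in which these methods expose a compression parameter, and the result depends on the installed release.

```python
import inspect

import pandas as pd

# Methods whose compression default is under discussion in this PR.
targets = {
    "DataFrame.to_csv": pd.DataFrame.to_csv,
    "DataFrame.to_json": pd.DataFrame.to_json,
    "DataFrame.to_pickle": pd.DataFrame.to_pickle,
    "Series.to_csv": pd.Series.to_csv,
}

# Read the declared default of the compression parameter for each method.
defaults = {
    name: inspect.signature(func).parameters["compression"].default
    for name, func in targets.items()
}
```

A signature check like this only confirms the declared default; it says nothing about whether inference actually happens, which is why a write-and-inspect test is still useful.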

Contributor Author

dhimmel commented Jul 21, 2018

Are you working off the latest master branch?

@minggli, this PR currently branches off at 322dbf4, which is now two commits behind. I can rebase, but don't think that's the issue.

I don't see why it would fail your work on to_json.

I was thinking the change of the to_csv compression default may have caused this issue, but it doesn't make sense to me. The test explicitly specifies compression in df.to_csv(f, compression=compression_only), so I don't see how my PR would affect test_compression_warning.

Contributor

minggli commented Jul 21, 2018

@dhimmel,

I think the change of default in to_csv did change things. There is no infer_compression procedure in to_csv. I think that's what caused the test to fail.

Adding _infer_compression in to_csv should solve this problem.

by the way, it appears that #17900 has removed zip from docstrings in to_csv and to_json based on Oct 2017 discussion but zip compression for writing has been added in 0.23 Jun 2018:

- zip compression is supported via ``compression=zip`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)

could you add 'zip' back in the docstrings of to_csv and to_json?

Contributor Author

dhimmel commented Jul 21, 2018

there is no infer_compression procedure in to_csv

Hmm. I thought to_csv now supports compression='infer' as of #17900 by @Dobatymo and @gfyoung. Basically, compression should be inferred by _get_handle but it appears the zip logic operates outside of this delegation.

could you add 'zip' back in the docstrings of to_csv and to_json?

Is there consensus on this change? IIRC from reading past issues, it was considered poor practice to save a single file to a zip archive. Thus pandas would be lenient on reading, but stricter on writing. I don't have a strong opinion, but don't want to hold up this PR for another controversial issue. So let's diagnose the failing test and see if it's somehow related to the zip issue.

Contributor

minggli commented Jul 22, 2018

#17900 added infer in _get_handle, but to_csv uses the compression argument before calling _get_handle; it therefore raises a RuntimeWarning at https://github.com/dhimmel/pandas/blob/648bf4d1810a2c2b9cbff1d4b941ab7cb7bc0b35/pandas/io/formats/csvs.py#L133 because compression is now 'infer' instead of None as it was before. On its own this shouldn't be a problem, but because of pytest-dev/pytest#2917, test_compression_warning fails when the warning has already been raised in an earlier test. Simply changing the order of the tests is not thread-safe, so I think the best way is:

inside to_csv:
compression = _infer_compression(path_or_buf, compression)
or inside csvs
self.compression = _infer_compression(path_or_buf, compression)

Regarding zip in the docstring: it's what production is showing right now, and it's already supported.

import os
import pandas as pd

a = pd.DataFrame(10000 * [[123, 234, 435]], columns=['A', 'B', 'C'])
a.to_csv('test_compressed', index=False, compression='zip')

b = pd.read_csv('test_compressed', compression='zip')

assert a.equals(b)
os.remove('test_compressed')
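
The inference step proposed above (resolving 'infer' before the file-object check) can be sketched without pandas. resolve_compression and would_warn are hypothetical names for illustration, and the extension map is an assumption based on the extensions listed in this PR's docstrings; this is not the pandas implementation of _infer_compression.

```python
import io

# Extensions named in the docstring changes of this PR.
_EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".zip": "zip", ".xz": "xz"}


def resolve_compression(path_or_buf, compression):
    # Explicit values (including None) pass through untouched.
    if compression != "infer":
        return compression
    # Inference only applies to paths, never to file objects.
    if not isinstance(path_or_buf, str):
        return None
    for extension, method in _EXTENSIONS.items():
        if path_or_buf.endswith(extension):
            return method
    return None


def would_warn(path_or_buf, compression):
    # Same condition as the guard in CSVFormatter discussed above.
    return bool(compression and hasattr(path_or_buf, "write"))


buffer = io.StringIO()
warns_unresolved = would_warn(buffer, "infer")
warns_resolved = would_warn(buffer, resolve_compression(buffer, "infer"))
inferred = resolve_compression("data.csv.bz2", "infer")
```

Because 'infer' is a truthy string, the unresolved default trips the guard for any file object; resolving it first turns it into None for file objects, so the spurious warning goes away.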

Contributor Author

dhimmel commented Jul 22, 2018

So I'm a bit concerned with what has happened since I last worked on compression & file IO. My impression was that we should start to use a unified API for inferring compression and opening files: see #15008. _get_handle now delegates to _infer_compression . It seems however that CSVFormatter.save has now gotten further away from using the unified API. In be724fa I replace CSVFormatter.save with the simplified workflow. If writing to zip is a feature that we want to support, then shouldn't this go in _get_handle?

Happy to revert be724fa or put it in another PR, just wanted to see the CI test output to know what this would break.

Contributor

jreback commented Jul 23, 2018

_get_handle now delegates to _infer_compression .

why is this a problem? this is clear separation of concerns. I don't think the goal has drifted from when you last worked on this. A number of bugs / consolidations have happened in the interim. Happy to take further cleanup.

Contributor Author

dhimmel commented Jul 23, 2018

why is this a problem?

Having _get_handle call _infer_compression is good. It simplifies the number of functions that individual to_* methods have to call. However, the issue with CSVFormatter.save is that it performs a custom workflow for handling zip files, which has made the code quite complicated with several lines calling _get_handle. The issue came up because the failing test is part of this custom compression workflow CSVFormatter now contains. However, I apologize for getting a bit sidetracked. I should keep this PR focused on changing the default for compression to infer.


pep8speaks commented Jul 23, 2018

Hello @dhimmel! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 01, 2018 at 14:41 Hours UTC

@dhimmel dhimmel force-pushed the default-to-infer-compression branch from d41ede5 to cebc0d9 Compare July 23, 2018 21:11
Contributor Author

dhimmel commented Jul 24, 2018

Why is test_compression_warning failing?

There is a testing failure that has completely stumped me. It is only occurring in some Python 2 builds. The failing test is:

# GH 21227
def test_compression_warning(compression_only):
    df = DataFrame(100 * [[0.123456, 0.234567, 0.567567],
                          [12.32112, 123123.2, 321321.2]],
                   columns=['X', 'Y', 'Z'])
    with tm.ensure_clean() as filename:
        f, _handles = _get_handle(filename, 'w', compression=compression_only)
        with tm.assert_produces_warning(RuntimeWarning,
                                        check_stacklevel=False):
            with f:
                df.to_csv(f, compression=compression_only)

The error is:

E               AssertionError: Did not see expected warning of class 'RuntimeWarning'.

It doesn't make sense to me how the changes in this PR would affect whether the RuntimeWarning is raised. This PR only changes the default compression value. The test specifies compression so the default should not matter.

In cebc0d9 I added some debugging statements to help diagnose the testing failure

In CSVFormatter.save:

[screenshot: warning-debug]

In test_compression_warning:

[screenshot: test-debug-changes]

Here's the output:

------------------------------ Captured log call -------------------------------
test_common.py             261 WARNING  debug_1: gzip
test_common.py             266 WARNING  debug_2: gzip
csvs.py                    138 WARNING  debug_3: gzip
csvs.py                    139 WARNING  debug_4: <gzip open file '/tmp/tmpQYAr5L', mode 'wb' at 0x7f7c02e56db0 0x7f7c04698b10>
csvs.py                    141 WARNING  debug_5: True
csvs.py                    143 WARNING  debug_6: in loop, should RuntimeWarn

So the logging shows in the failing test that the code enters the if statement that triggers the RuntimeWarning. So I don't understand how the warning could not be raised or where this issue could be coming from.

@WillAyd other than this issue how do things look? I added a test for to_csv defaulting to inference.

Member

@WillAyd WillAyd left a comment

Generally looks OK, though I think the test can be improved. Will have to take a look again after some of the logging things get cleaned up

Test that to_csv defaults to inferring compression from paths.
https://github.com/pandas-dev/pandas/pull/22011
"""
df = DataFrame({"A": [1]})
Member

While I suppose this "works" it is definitely focused on gzip and doesn't make any assertions about how the behavior works across a combination of infer, None and an explicit compression.

We have a fixture called compression in conftest.py that covers the various compression options - is there a way to parametrize that as part of this test and make more comprehensive assertions about the potential combinations of behavior?

Contributor Author

The possible values that compression can take for to_csv are already tested. The purpose of this test is simply to test that compression is defaulting to 'infer'. I kept the test simple so that it ONLY tests that the default is infer and won't break or malfunction should other aspects of the API change. The test will break if the default for compression is put back to None (and hopefully not for other reasons).

I think it may make sense to make a similar test for to_json, but don't see how testing compression='infer' for all possible compression extensions is within scope of this PR as this PR just changes the default and is not creating the infer option. Is the worry that there is currently inadequate testing?

Member

My point was that this test only makes sure that gzip compression works by default with infer, but how does it guarantee that other compression types play nice with infer? I'll admit it's a subtle distinction but nuances like that do pop up and it doesn't seem like it should be that much more work to leverage the existing compression fixture for this test to increase coverage

Contributor Author

it should be that much more work to leverage the existing compression fixture for this test to increase coverage

In abd19e3, I modified an existing parametrized test to look for compression by default for paths where inference should occur. This actually caught an issue (we hadn't switched default for Series.to_csv -- fixed in 2f670fe).

Should I delete the gzip test? Note the parametrized test doesn't test that the right compression is occurring, just that a compression is occurring.

Member

Instead of going about it in this fashion can you not do a round trip with to_csv and read_csv using infer with a file extension for the former and the parametrized value as an argument for compression in the latter? So in pseudo-code:

with tm.ensure_clean('compressed.csv.{}'.format(compression_only)) as path:
    df.to_csv(path)
    result = pd.read_csv(path, compression=compression_only)

tm.assert_frame_equal(result, df)
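
A runnable variant of the round trip above might look like the following sketch. It uses tempfile rather than the pandas test helpers, and it maps each compression name to its file extension (an assumption worth stating: the fixture value 'gzip' does not match the '.gz' extension that inference looks for, so formatting the fixture value straight into the file name would not trigger inference for gzip).

```python
import os
import tempfile

import pandas as pd

# Compression name (as accepted by read_csv) -> extension used for inference.
EXTENSIONS = {"gzip": "gz", "bz2": "bz2", "zip": "zip", "xz": "xz"}

df = pd.DataFrame({"A": [1, 2, 3], "B": [4.5, 5.5, 6.5]})
round_tripped = {}

with tempfile.TemporaryDirectory() as tmpdir:
    for compression, extension in EXTENSIONS.items():
        path = os.path.join(tmpdir, "compressed.csv." + extension)
        # Write relying on the default compression='infer'.
        df.to_csv(path, index=False)
        # Read back with the compression stated explicitly.
        round_tripped[compression] = pd.read_csv(path, compression=compression)
```

If the writer had silently skipped compression, the explicit compression argument on the read side would fail to decode the file, so this does exercise the default on the write path.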

Contributor Author

I'm not a big fan of the roundtrip approach here, since it never actually tests that the file is compressed on disk. Given that the to_* and read_* methods rely on much of the same compression infrastructure, I think it's possible to modify the code such that all compression gets disabled and the roundtrip works perfectly. Now hopefully there are enough other tests to catch such a situation.
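
One way to check the bytes on disk directly, independent of the pandas read path, is to compare the start of each file against the format's well-known magic number. This is a sketch of that idea rather than the test that was merged; the file names are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Leading bytes that identify each container format.
MAGIC = {
    "gz": b"\x1f\x8b",
    "bz2": b"BZh",
    "zip": b"PK\x03\x04",
    "xz": b"\xfd7zXZ\x00",
}

df = pd.DataFrame({"A": [1, 2, 3]})
compressed_on_disk = {}

with tempfile.TemporaryDirectory() as tmpdir:
    for extension, magic in MAGIC.items():
        path = os.path.join(tmpdir, "data.csv." + extension)
        df.to_csv(path, index=False)  # default compression='infer'
        with open(path, "rb") as handle:
            compressed_on_disk[extension] = handle.read(len(magic)) == magic

    # Control: a plain .csv path should start with the header text instead.
    plain_path = os.path.join(tmpdir, "data.csv")
    df.to_csv(plain_path, index=False)
    with open(plain_path, "rb") as handle:
        plain_head = handle.read(1)
```

Unlike a round trip, this would still fail if both the write and read paths stopped compressing at the same time.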

Member

WillAyd commented Jul 24, 2018

@minggli any thoughts on the issue @dhimmel is describing above?

Contributor

minggli commented Jul 24, 2018

@dhimmel

#17900 added infer in _get_handle, but to_csv uses the compression argument before calling _get_handle; it therefore raises a RuntimeWarning at https://github.com/dhimmel/pandas/blob/648bf4d1810a2c2b9cbff1d4b941ab7cb7bc0b35/pandas/io/formats/csvs.py#L133 because compression is now 'infer' instead of None as it was before. On its own this shouldn't be a problem, but because of pytest-dev/pytest#2917, test_compression_warning fails when the warning has already been raised in an earlier test. Simply changing the order of the tests is not thread-safe, so I think the best way is:

inside to_csv:
compression = _infer_compression(path_or_buf, compression)
or inside csvs
self.compression = _infer_compression(path_or_buf, compression)
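
The "already raised in an earlier test" effect described above comes down to Python's per-location warning registry. The sketch below uses only the standard warnings module to show the once-per-location behaviour of the "default" filter. It illustrates the mechanism in general and does not reproduce the pytest issue itself; as far as I understand it, the Python 2 problem was that this registry was not reset between warning-capturing contexts.

```python
import warnings


def emit():
    # Same text, category, and line number on every call, so every call
    # maps to the same key in the module's warning registry.
    warnings.warn("compression has no effect", RuntimeWarning)


def count_warnings(action):
    # Record the warnings that get past the filter for two identical calls.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        emit()
        emit()
    return len(caught)


once_per_location = count_warnings("default")
every_time = count_warnings("always")
```

Under "default" the second identical warning is suppressed, while "always" reports both. A test that asserts a warning is produced is therefore sensitive to whether the same warning already fired from the same location.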


codecov bot commented Jul 30, 2018

Codecov Report

Merging #22011 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22011      +/-   ##
==========================================
+ Coverage   92.07%   92.07%   +<.01%     
==========================================
  Files         170      170              
  Lines       50690    50685       -5     
==========================================
- Hits        46672    46669       -3     
+ Misses       4018     4016       -2
Flag Coverage Δ
#multiple 90.48% <100%> (ø) ⬆️
#single 42.31% <100%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/generic.py 96.47% <ø> (ø) ⬆️
pandas/io/json/json.py 92.47% <ø> (ø) ⬆️
pandas/core/frame.py 97.26% <ø> (ø) ⬆️
pandas/core/series.py 94.11% <ø> (ø) ⬆️
pandas/io/formats/csvs.py 98.21% <100%> (+0.55%) ⬆️
pandas/core/indexing.py 93.79% <0%> (-0.03%) ⬇️
pandas/core/groupby/generic.py 86.79% <0%> (-0.02%) ⬇️
pandas/core/dtypes/dtypes.py 96.03% <0%> (-0.02%) ⬇️
pandas/core/sparse/series.py 95.22% <0%> (ø) ⬆️
pandas/core/arrays/interval.py 92.6% <0%> (+0.27%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d30c4a0...12f14e2. Read the comment docs.

Contributor Author

dhimmel commented Jul 30, 2018

Tests passing as of e3a0f56. @WillAyd and @jreback ready for re-review.

Contributor

@jreback jreback left a comment

lgtm. ping on green

If 'infer' and `path_or_buf` is path-like, then detect compression
from the following extensions: '.gz', '.bz2', '.zip' or '.xz'
(otherwise no compression).
.. versionchanged:: 0.24.0
Contributor

I think you need to have a blank after this or it has a warning, @TomAugspurger @datapythonista ?

Contributor Author

Done in f8829a6, but would be good to hear from @TomAugspurger and @datapythonista, since we have complex situations such as:

DOCLINE
DOCLINE
.. versionchanged:: 0.23.0
   here is what was added
.. versionchanged:: 0.24.0 here is what changed

DOCLINE

For example, is the above OKAY or do we need additional blanks?

Member

@dhimmel I think you need the additional blank lines (before, and not sure if after).

The reason is not so much about a standard in this case, but about sphinx understanding the directive. What we expect in the documentation is that it's rendered like the validate parameter in the merge docstring: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html

But if you don't leave the right blank lines, sphinx doesn't detect it's a directive, and the text is rendered as it is. See this case: https://pandas.pydata.org/pandas-docs/version/0.23.1/generated/pandas.IntervalIndex.from_tuples.html

So, the best is if you can build the documentation, and check that it's rendered all right. This can be done by ./doc/make.py html (or ./doc/make.py html --single pandas.DataFrame.read_csv)

Let me know if you have any issue.

Contributor Author

Updated in eadf68e and built the docs locally to confirm they're rendering properly.

Turns out the blank line before is required. After is not required. In between multiple statements is not required.
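
Put together, the layout rule found here (blank line before the first directive, none needed between consecutive directives or after) would look roughly like this in a numpydoc-style docstring. to_example is a hypothetical function used only to show the layout:

```python
def to_example(path_or_buf, compression="infer"):
    """
    Hypothetical writer used only to illustrate docstring layout.

    Parameters
    ----------
    path_or_buf : str or file handle
        Destination for the output.
    compression : {'infer', 'gzip', 'bz2', 'zip', 'xz', None}, default 'infer'
        A string representing the compression to use in the output file,
        only used when the first argument is a filename.

        .. versionadded:: 0.21.0
        .. versionchanged:: 0.24.0
           'infer' option added and set to default
    """
    return compression
```

Building the docs locally, as suggested above, remains the reliable way to confirm that sphinx renders the directives rather than printing them verbatim.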

A string representing the compression to use in the output file,
only used when the first argument is a filename.

.. versionadded:: 0.21.0
.. versionchanged:: 0.24.0
'infer' option added and set to default

Contributor

e.g. like this is good

Allowed values are None, 'gzip', 'bz2', 'zip', 'xz', and 'infer'.
This input is only used when the first argument is a filename.
.. versionchanged:: 0.24.0
'infer' option added and set to default
date_format: string, default None
Contributor

same


import pandas as pd
import pandas.util.testing as tm
import pandas.io.common as cmn
Contributor

can you rename to icom

Contributor Author

Done in af8c137

@@ -285,4 +285,100 @@ def test_unknown_engine(self):
             df = tm.makeDataFrame()
             df.to_csv(path)
             with tm.assert_raises_regex(ValueError, 'Unknown engine'):
-                read_csv(path, engine='pyt')
+                pd.read_csv(path, engine='pyt')

Contributor

can you put a marker comment (e.g. a line of --- or whatever) labelled "compression tests" to delineate this section of the tests? (also ok with a new test file tests_compression.py, maybe simpler)

@jreback jreback added this to the 0.24.0 milestone Jul 31, 2018
@jreback jreback added IO Data IO issues that don't fit into a more specific label IO CSV read_csv, to_csv labels Jul 31, 2018
Contributor Author

dhimmel commented Jul 31, 2018

lgtm. ping on green

@jreback it's 🍏, i.e. #008000

Refs 
pandas-dev#22011 (comment)

Blanks are needed before but not after or in between.
Contributor

jreback commented Jul 31, 2018

lgtm

@WillAyd merge when satisfied

Member

@WillAyd WillAyd left a comment

Small nits / question around comments

def test_dataframe_compression_defaults_to_infer(
write_method, write_kwargs, read_method, compression_only):
# Test that DataFrame.to_* methods default to inferring compression from
# paths. GH 22004
Member

Just change comment to # GH22004 (standard in other tests). The rest here doesn't add anything that isn't inferred from the test name

Contributor Author

Done in cf5b62e

def test_series_compression_defaults_to_infer(
write_method, write_kwargs, read_method, read_kwargs,
compression_only):
# Test that Series.to_* methods default to inferring compression from
Member

Same as above

# Assert that passing a file object to to_csv while explicitly specifying a
# compression protocol triggers a RuntimeWarning, as per GH 21227.
# Note that pytest has an issue that causes assert_produces_warning to fail
# in Python 2 if the warning has occurred in previous tests
Member

May be overlooking it but where in the links is there mention about Python 2 behavior? We get some random Resource Warnings in our tests which is maybe related so could be good info, but wasn't immediately apparent to me where that was

Contributor Author

Member

@WillAyd WillAyd left a comment

lgtm I'll merge after AppVeyor goes green. Thanks @dhimmel

Contributor Author

dhimmel commented Aug 1, 2018

Not sure what the Travis failure is about:

_____________________________ ERROR collecting gw0 _____________________________
Different tests were collected between gw1 and gw0. The difference is:
--- gw1
+++ gw0

Member

gfyoung commented Aug 1, 2018

@dhimmel : Looks like all of our required builds are passing (that gw0 / gw1 can be finicky)!

However, I would like to double check our non-required builds (they're acting up a bit) first before merging this (cc @jreback @WillAyd ).

@gfyoung gfyoung merged commit 93f154c into pandas-dev:master Aug 1, 2018
Member

gfyoung commented Aug 1, 2018

@dhimmel : Thanks a lot for this!

dberenbaum pushed a commit to dberenbaum/pandas that referenced this pull request Aug 3, 2018
minggli added a commit to minggli/pandas that referenced this pull request Aug 5, 2018
* master: (47 commits)
  Run tests in conda build [ci skip] (pandas-dev#22190)
  TST: Check DatetimeIndex.drop on DST boundary (pandas-dev#22165)
  CI: Fix Travis failures due to lint.sh on pandas/core/strings.py (pandas-dev#22184)
  Documentation: typo fixes in MultiIndex / Advanced Indexing (pandas-dev#22179)
  DOC: added .join to 'see also' in Series.str.cat (pandas-dev#22175)
  DOC: updated Series.str.contains see also section (pandas-dev#22176)
  0.23.4 whatsnew (pandas-dev#22177)
  fix: scalar timestamp assignment (pandas-dev#19843) (pandas-dev#19973)
  BUG: Fix get dummies unicode error (pandas-dev#22131)
  Fixed py36-only syntax [ci skip] (pandas-dev#22167)
  DEPR: pd.read_table (pandas-dev#21954)
  DEPR: Removing previously deprecated datetools module (pandas-dev#6581) (pandas-dev#19119)
  BUG: Matplotlib scatter datetime (pandas-dev#22039)
  CLN: Use public method to capture UTC offsets (pandas-dev#22164)
  implement tslibs/src to make tslibs self-contained (pandas-dev#22152)
  Fix categorical from codes nan 21767 (pandas-dev#21775)
  BUG: Better handling of invalid na_option argument for groupby.rank(pandas-dev#22124) (pandas-dev#22125)
  use memoryviews instead of ndarrays (pandas-dev#22147)
  Remove depr. warning in SeriesGroupBy.count (pandas-dev#22155)
  API: Default to_* methods to compression='infer' (pandas-dev#22011)
  ...
Contributor Author

dhimmel commented Aug 6, 2018

Just wanted to add that I posted on Steem about this pull request, as part of the https://utopian.io initiative. Today, the utopian account upvoted my post, thereby rewarding it, as an incentive for open source contributions! So thanks Utopian and happy to help any other Pandas contributors get set up with a Steem account and use Utopian, just email me.

Labels
IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label IO JSON read_json, to_json, json_normalize

9 participants