TST: Compression Inference Tests for read_* #17262

Closed
gfyoung opened this issue Aug 15, 2017 · 9 comments
Labels
IO Data (IO issues that don't fit into a more specific label), Testing (pandas testing functions or related to the test suite)

@gfyoung
Member

gfyoung commented Aug 15, 2017

xref: #17206 (comment)

cc @dhimmel

@gfyoung gfyoung added the IO Data and Testing labels Aug 15, 2017
@gfyoung gfyoung added this to the 0.21.0 milestone Aug 15, 2017
@dhimmel
Contributor

dhimmel commented Aug 15, 2017

See #17206 (comment). In short, not all read_* methods use the functionality in io.common. Consolidating them would be nice (see #15008), but it's a huge task. If we tested compression inference for all read_* methods, many would likely fail, since there's lots of undesirable duplicated functionality across the read code.

@gfyoung
Member Author

gfyoung commented Aug 15, 2017

Fair enough. Perhaps you can test whichever functions hit the io.common path and revise your whatsnew entry to reflect that.

@jreback
Contributor

jreback commented Aug 15, 2017

> If we tested compression inference for all read_* methods, many would likely fail

@dhimmel but that's exactly the point. I am quite happy to have a comprehensive test, that xfails things that are not converted / implemented.
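As a rough illustration of that idea, here is a minimal sketch of a parametrized test that lists every reader and xfails the ones not yet converted. The two readers are toy stand-ins invented for this example (they are not pandas functions), and the test name and reason string are likewise assumptions.

```python
import gzip
import tempfile
from pathlib import Path

import pytest


def read_text_converted(path):
    """Toy reader (hypothetical) that infers gzip from the file extension."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return handle.read()


def read_text_unconverted(path):
    """Toy reader (hypothetical) that ignores the extension entirely."""
    with open(path, "rt", encoding="utf-8") as handle:
        return handle.read()


# Every reader is listed; the ones not yet converted carry an xfail mark,
# so the gap is recorded in the test report instead of being left untested.
READER_PARAMS = [
    pytest.param(read_text_converted, id="converted"),
    pytest.param(
        read_text_unconverted,
        id="unconverted",
        marks=pytest.mark.xfail(reason="compression inference not implemented"),
    ),
]


@pytest.mark.parametrize("reader", READER_PARAMS)
def test_reader_infers_gzip(reader):
    # Write a gzip file whose compression is signalled only by its extension.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.txt.gz"
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.write("hello")
        assert reader(path) == "hello"
```

Once a reader gains support, its xfail mark can simply be dropped from the parameter list.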

@dhimmel
Contributor

dhimmel commented Aug 16, 2017

Looking through the source code, I believe io.common._infer_compression is only called in the following three places:

io.pickle.to_pickle()

inferred_compression = _infer_compression(path, compression)

io.pickle.read_pickle()

inferred_compression = _infer_compression(path, compression)

io.parsers._read()

compression = _infer_compression(filepath_or_buffer, compression)

The first two are for pickle IO. Tracking down where/how io.parsers._read gets used has been a bit more challenging. Will keep looking into it.
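For anyone following along, the kind of extension-based inference being discussed can be sketched roughly as below. This is a standalone illustration and not the pandas implementation: `infer_compression_sketch` and its extension table are hypothetical, and the real `_infer_compression` may differ in its details.

```python
import os
from pathlib import Path

# Hypothetical extension table, assumed for illustration only.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def infer_compression_sketch(filepath_or_buffer, compression):
    """Return a compression name, or None when no compression applies."""
    if compression is None:
        return None
    if compression == "infer":
        try:
            # os.fspath accepts str and os.PathLike objects such as pathlib.Path.
            path = os.fspath(filepath_or_buffer)
        except TypeError:
            # Buffers and other non-path objects have no extension to inspect.
            return None
        if isinstance(path, bytes):
            path = os.fsdecode(path)
        for extension, name in _EXTENSION_TO_COMPRESSION.items():
            if path.endswith(extension):
                return name
        return None
    if compression in _EXTENSION_TO_COMPRESSION.values():
        return compression
    raise ValueError("Unrecognized compression type: {}".format(compression))


print(infer_compression_sketch(Path("data.csv.gz"), "infer"))
```

Converting the argument with `os.fspath` before checking the extension is what lets non-string paths take the same route as plain strings.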

@dhimmel
Contributor

dhimmel commented Aug 16, 2017

io.parsers._read is called by io.parsers._make_parser_function:

return _read(filepath_or_buffer, kwds)

_make_parser_function is then used in io.parsers to create read_csv and read_table

pandas/pandas/io/parsers.py

Lines 667 to 671 in a46e5be

read_csv = _make_parser_function('read_csv', sep=',')
read_csv = Appender(_read_csv_doc)(read_csv)
read_table = _make_parser_function('read_table', sep='\t')
read_table = Appender(_read_table_doc)(read_table)

Therefore, I believe that, at the current time, _infer_compression is only used by the user-facing IO functions read_pickle, to_pickle, read_csv, and read_table.

@gfyoung @jreback: should I revise the What's New Entry for #17206 to list these four functions?

- `read_*` methods can now infer compression from non-string paths, such as ``pathlib.Path`` objects (:issue:`17206`).
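To make the behaviour in that entry concrete, here is a small sketch that passes ``pathlib.Path`` objects to the CSV and pickle functions and relies on the file extension for compression. The file names and data are made up for the example, and it assumes a pandas version that already includes the change from #17206.

```python
import gzip
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Write a gzip-compressed CSV by hand so that only the reader is exercised.
    csv_path = Path(tmp) / "example.csv.gz"
    with gzip.open(csv_path, "wt", encoding="utf-8") as handle:
        handle.write("a,b\n1,2\n3,4\n")

    # compression="infer" is the default; it is spelled out here for clarity.
    frame = pd.read_csv(csv_path, compression="infer")

    # Round-trip through pickle, again relying on the ".gz" extension.
    pkl_path = Path(tmp) / "example.pkl.gz"
    frame.to_pickle(pkl_path)
    magic = pkl_path.read_bytes()[:2]  # gzip files start with bytes 1f 8b
    restored = pd.read_pickle(pkl_path)

print(frame)
```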

@gfyoung
Member Author

gfyoung commented Aug 16, 2017

@dhimmel : Agreed, part of your PR should be to revise the whatsnew and add tests for any affected functions (and maybe even those that aren't affected but should eventually gain support).

@jreback
Contributor

jreback commented Sep 23, 2017

@dhimmel any interest in working on this one? It would be great.

@jreback
Contributor

jreback commented Jul 8, 2018

@gfyoung can you evaluate this issue, e.g. tick boxes, close, etc.?

@gfyoung
Member Author

gfyoung commented Jul 8, 2018

This looks good to go! #17900 actually took care of this.

@gfyoung gfyoung closed this as completed Jul 8, 2018
@gfyoung gfyoung modified the milestones: Contributions Welcome, 0.24.0 Jul 8, 2018