Uniform file IO API and consolidated codebase #15008
Comments
Here are my thoughts on the API.
Regarding the consolidated codebase:
That sounds great! Regarding the py2/py3 separation, I think we should just do what is most practical here (having a certain separation makes the code more clear, too much separation can make it more complex again. In any case, having a few but scattered One more consolidation that would be possible for |
Let's wait for #13317 and any other IO PRs that I don't know about to be merged. I'm hesitant to commit since I know it will cut into my other obligations. But if no one else is interested in implementing, I'll consider.
Totally agree. There are still a few things I need to understand before I can make that call. One issue is
Agree the |
@dhimmel can you annotate the above (or maybe make it a table) and add an x/check for whether each method supports pathlib-like objects / compression / URLs?
agree! 👍
It seems better to split `_get_handle` into two or more functions so that each function is simpler.
@gfyoung can you evaluate this issue, e.g. close, tick boxes, etc. |
@jreback : This looks to be a much more substantial refactoring at the moment. The checkboxes were more of an enumeration of methods instead of actual tasks AFAICT. |
Request for API consistency between to_sql and to_gbq: Desired solution:
Do you prefer having a separate ticket? |
Yes, the |
There are at least three things that many of the IO methods must deal with: reading from URL, reading/writing to a compressed format, and different text encodings. It would be great if all io functions where these factors were relevant could use the same code (consolidated codebase) and expose the same options (uniform API).
In #14576, we consolidated the codebase but more consolidation is possible. In `io.common.py`, there are three functions that must be sequentially called to get a file-like object: `get_filepath_or_buffer`, `_infer_compression`, and `_get_handle`. This should be consolidated into a single function, which can then delegate to sub functions.

Currently, pandas supports the following io methods. First for reading:

And then for writing:
Some of these should definitely use the consolidated/uniform API, such as `read_csv`, `read_html`, `read_pickle`, `read_excel`.

Some functions perhaps should be kept separate, such as `read_feather` or `read_clipboard`.
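
To make the proposal concrete, here is a rough sketch of what a single entry point could look like. It is not pandas code: `open_handle`, `infer_compression`, and the extension table are hypothetical names made up for illustration, and it uses only the standard library. The one function handles the three concerns in order (URL, compression, encoding) and delegates the inference step to a helper.

```python
import bz2
import gzip
import io
import lzma
from urllib.parse import urlparse
from urllib.request import urlopen

# Hypothetical extension-to-codec table for this sketch; not pandas' own.
_EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz"}
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, "xz": lzma.open}


def infer_compression(path, compression="infer"):
    """Resolve 'infer' to a codec name (or None) from the file extension."""
    if compression != "infer":
        return compression
    for ext, codec in _EXTENSIONS.items():
        if str(path).endswith(ext):
            return codec
    return None


def open_handle(path, mode="r", encoding="utf-8", compression="infer"):
    """Single entry point returning a text file-like object (sketch only)."""
    if mode not in ("r", "w"):
        raise ValueError("this sketch only supports text modes 'r' and 'w'")
    codec = infer_compression(path, compression)
    if codec is not None and codec not in _OPENERS:
        raise ValueError(f"unknown compression: {codec!r}")

    # Step 1: a URL is fetched into an in-memory binary buffer (read-only).
    source = path
    if urlparse(str(path)).scheme in ("http", "https"):
        if mode != "r":
            raise ValueError("URLs can only be read")
        with urlopen(path) as response:
            source = io.BytesIO(response.read())

    # Step 2: open the binary stream, decompressing if a codec applies.
    if codec is not None:
        raw = _OPENERS[codec](source, mode + "b")
    elif isinstance(source, io.BytesIO):
        raw = source
    else:
        raw = open(source, mode + "b")

    # Step 3: layer text encoding/decoding on top of the binary stream.
    return io.TextIOWrapper(raw, encoding=encoding)
```

With something like this, a reader or writer would only need `with open_handle(path, "r", encoding=..., compression=...) as f:` and every IO method that goes through it would expose the same `encoding` and `compression` options.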