create a community supported set of typical converters for read_csv #1180

Closed
timmie opened this issue May 2, 2012 · 6 comments
Labels: Enhancement, IO Data

Comments

@timmie
Contributor

timmie commented May 2, 2012

Unfortunately, almost no input data file uses ISO format; most use arbitrary column layouts and date formats.

Following
https://groups.google.com/forum/?fromgroups#!topic/pydata/pZjQMX_avmY

and

#854

I would suggest adding:

https://github.com/pydata/pandas/tree/master/pandas/tseries/converters.py

There, users could contribute and share typical converters for parsing the dates and times in their input files into pandas objects.
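To make the idea concrete, here is a minimal sketch of what one such shared converter might look like. The function name `parse_dmy_dots` and the day-first dotted format are assumptions chosen for illustration; nothing like this is defined in the thread.

```python
from datetime import datetime


def parse_dmy_dots(text):
    """Parse a day-first date such as '02.05.2012' into a datetime.

    Hypothetical converter: the name and the format are illustrative only.
    """
    # Strip padding that often surrounds cells in hand-made data files.
    return datetime.strptime(text.strip(), "%d.%m.%Y")


# Apply the converter to a few raw cell values, as a reader would per cell.
sample_cells = ["02.05.2012", " 07.05.2012 "]
parsed = [parse_dmy_dots(cell) for cell in sample_cells]
```

A function of this shape could be passed per column through the `converters` argument of `pandas.read_csv`, for example `converters={"date": parse_dmy_dots}`, where the column name `date` is again an assumption.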

@timmie
Contributor Author

timmie commented May 7, 2012

OK, I assume that this is not core Pandas stuff.

So we could have a separate pandas.contrib package.

There, users could upload their converters along with a sample data file.

The converters may then be documented in proper docstrings.

I did this for scikits.timeseries.tsfromtxt, and it worked quite well.
The only problem is how to assign meaningful names to the converter functions.

What do you think?

@changhiskhan
Contributor

I think this is a great idea. We hope to make an announcement about this once v0.8 is released and the API is stable.

In the meantime, would you be interested in taking the lead and creating pandas/io/converters.py with some docs and a few sample converters? Further feedback on the converter API/interface would be greatly appreciated.

@timmie
Contributor Author

timmie commented May 7, 2012

Yes, sure.

But I'd rather wait until the multi-column date-time functionality is there:
#1186
#1174

@timmie
Contributor Author

timmie commented May 8, 2012

Here is an example (still for tsfromtxt):

def dc_h_0to23_cols(year, month, day, hour):
    """Column-separated datetime with hours counted 0-23.

    .. csv-table:: Hourly Values: 0-23
           :header: "YYYY", "MM", "DD", "HH:MM", "value"
           :delim: ;

           2004;2;1;00:00;0
           2004;2;1;01:00;0
           [...];[...];[...];[...];[...]
           2004;2;1;22:00;0
           2004;2;1;23:00;0

    Note
    -----
    assumed datecols::

        datecols = (0, 1, 2, 3)

    """
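The docstring above documents the file layout, but the function body is not shown. As a rough sketch of what such a converter might do, assuming each date column arrives as a string and the fourth column holds "HH:MM", the four values can be combined into one timestamp. The name `combine_ymd_hhmm` is hypothetical and this is not the original tsfromtxt implementation.

```python
from datetime import datetime


def combine_ymd_hhmm(year, month, day, hhmm):
    """Combine separate date columns and an 'HH:MM' column (hours 0-23)."""
    # Split the time cell into hour and minute; both are plain integers.
    hour, minute = (int(part) for part in hhmm.strip().split(":"))
    return datetime(int(year), int(month), int(day), hour, minute)


# Rows taken from the sample table in the docstring (delimiter ';').
rows = ["2004;2;1;00:00;0", "2004;2;1;23:00;0"]
# The first four fields are the date columns; the fifth is the value.
stamps = [combine_ymd_hhmm(*row.split(";")[:4]) for row in rows]
```

Because the hours run from 0 to 23, they map directly onto `datetime` without any day rollover; a file counting hours 1-24 would need extra handling that this sketch does not attempt.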

@ghost

ghost commented Jan 1, 2014

It seems to me that this will either stagnate or grow into a melange of tailored
solutions to the 1001 weird data problems found in the wild that most people won't see.

I don't think users will look for these recipes when they encounter these problems
in their own data. They'll either hack a collection of helpers to suit the data they
work with or just solve the problem with a once-off. There's no general pattern here
to grow into a coherent collection of solutions.

The idea of a pandas.contrib is interesting in itself, but there is no clear conception of that project
yet. We'll wait for that consensus to materialize.

closing.

@timmie
Contributor Author

timmie commented Jan 1, 2014

@y-p
I understand that you want to close this long-stalled PR.
But I think the proposed solution has been misunderstood:
What if we add an example file for each converter template?
I think it could be a useful resource...

This issue was closed.