Minor change to csv reading #146

Closed
wants to merge 3 commits into from

MLnick commented Sep 16, 2011

Hi Wes

Firstly, congrats on such an amazing project! I love prototyping in python / numpy / ipython, but I always envy some of R's features. I tried pandas out about 9 months ago and although it was interesting, it seemed very rough around the edges. Now, however, it is looking really polished and I've been using it for prototyping and testing some trading models, and everything works extremely well. I hope it keeps growing, together with the integration with scikits statsmodels/timeseries and maybe even scikits.learn in future ...

Anyway, as I was starting to dive into the code, I came across the read_csv function and noticed that read_table fully duplicates it. The csv module in Python actually has full support for arbitrary delimiters, so there is no need for the duplication. There is also csv.Sniffer().sniff(sample), which attempts to sniff out the delimiter automatically. This commit tries to "magically" handle any arbitrary CSV file without needing to specify a separator, whether the fields are separated by spaces, tabs, commas, semicolons or other weird separators (I have a file at work with "^" separators :). If that doesn't work, one can fall back on specifying the separator explicitly (so read_csv looks more like read_table). In future it could make sense to simply have one read_data or read_table function.
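For anyone reading along, here is a rough sketch of the idea, not the code in this commit. It is written for Python 3, and `read_rows`, `sep` and `sample_size` are made-up names for illustration. It sniffs the delimiter from a sample when none is given and otherwise uses the one passed in:

```python
import csv
import io


def read_rows(text, sep=None, sample_size=1024):
    """Parse delimited text into a list of rows (lists of strings).

    Hypothetical helper: if sep is None, the delimiter is guessed with
    csv.Sniffer; otherwise the caller's separator is used as-is.
    """
    if sep is None:
        try:
            # Only the delimiter is taken from the sniffed dialect here.
            sep = csv.Sniffer().sniff(text[:sample_size]).delimiter
        except csv.Error:
            # Sniffing is a heuristic and can fail; ask for an explicit sep.
            raise ValueError("could not detect delimiter; pass sep explicitly")
    return list(csv.reader(io.StringIO(text), delimiter=sep))


# Delimiter guessed from the data.
rows = read_rows("a^b^c\n1^2^3\n4^5^6\n")

# Fallback: separator given explicitly, no sniffing involved.
rows_explicit = read_rows("a|b\n1|2\n", sep="|")
```

The sniffer works from character frequencies across lines, so very short or irregular samples may not be detected reliably, which is why the explicit `sep` fallback stays.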

Incidentally, csv.Sniffer() also tries to sniff out other things like quote escaping and double quoting, but this commit effectively only uses it for the delimiter. If users run into problems with quote / string escaping, one could always let the sniffer try to figure out the full dialect.
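A minimal sketch of that "full dialect" variant, again for Python 3 and again not part of this commit (`read_rows_sniffed` is a hypothetical name). Instead of pulling out just the delimiter, the whole sniffed dialect is handed to csv.reader, so the quote character is picked up as well:

```python
import csv
import io


def read_rows_sniffed(text, sample_size=1024):
    """Parse text using the complete dialect guessed by csv.Sniffer.

    Hypothetical helper: delimiter, quotechar, doublequote and
    skipinitialspace all come from the sniffed dialect.
    """
    dialect = csv.Sniffer().sniff(text[:sample_size])
    return list(csv.reader(io.StringIO(text), dialect))


# The quoted field contains the delimiter itself.
sample = 'id;note;qty\n1;"x;y";2\n2;"z";3\n'
rows = read_rows_sniffed(sample)
```

In this sample the quoted field `"x;y"` stays a single value, whereas a plain split on `;` would break it in two.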

wesm (Member) commented Sep 17, 2011

Rebased into wesm/master. Thanks for these changes, a big help!
