Hi Wes
Firstly, congrats on such an amazing project! I love prototyping in python / numpy / ipython, but I always envy some of R's features. I tried pandas out about 9 months ago and although it was interesting, it seemed very rough around the edges. Now, however, it is looking really polished and I've been using it for prototyping and testing some trading models, and everything works extremely well. I hope it keeps growing, together with the integration with scikits statsmodels/timeseries and maybe even scikits.learn in future ...
Anyway, as I was starting to dive into the code, I came across the `read_csv` function and noticed that there was full duplication in `read_table`. The `csv` module in python actually has full support for arbitrary delimiters, so there is no need for the duplication. Also, there is `csv.Sniffer().sniff(sample)` that attempts to sniff out the delimiter automatically. This commit tries to "magically" handle any arbitrary CSV file without needing to specify a separator, whether separated by blank spaces, tabs, commas, semicolons or other weird separators (I have a file at work with "^" separators :). If it doesn't work, one can fall back on specifying the separator (so `read_csv` looks more like `read_table`). In future it could make sense to simply have one `read_data` or `read_table` function.

Incidentally, the `csv.Sniffer()` also tries to sniff out other things like quote escaping and double quoting, but this commit effectively only uses it for the delimiter. If problems with quote / string escaping crop up with users one could always let the sniffer try to figure out the full dialect.
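
To make the idea concrete, here is a rough sketch of delimiter sniffing with an explicit fallback, using only the standard library. The helper names `sniff_delimiter` and `read_rows` are hypothetical and exist only for illustration; this is not the code in the commit, just the general shape of the approach under the assumption that the whole input is available as a string.

```python
import csv
import io


def sniff_delimiter(sample, fallback=","):
    """Guess the delimiter of a text sample (hypothetical helper).

    csv.Sniffer is a heuristic: it raises csv.Error when it cannot
    settle on a delimiter, in which case we return `fallback`.
    """
    try:
        return csv.Sniffer().sniff(sample).delimiter
    except csv.Error:
        return fallback


def read_rows(text, sep=None):
    """Parse delimited text into a list of rows (hypothetical helper).

    If `sep` is None, the delimiter is sniffed from the first 1024
    characters; otherwise the caller's separator is used as-is.
    """
    if sep is None:
        sep = sniff_delimiter(text[:1024])
    return list(csv.reader(io.StringIO(text), delimiter=sep))


caret_data = "a^b^c\n1^2^3\n4^5^6\n"
rows = read_rows(caret_data)            # delimiter sniffed from the data
rows_explicit = read_rows("a|b\n1|2\n", sep="|")  # explicit fallback path
```

Only the `delimiter` attribute of the sniffed dialect is used here, matching what the commit does. Letting the sniffer decide quoting as well would presumably mean passing the whole sniffed dialect to `csv.reader` instead of just `delimiter=sep`. Since the sniffer is heuristic and can guess wrong on small or irregular samples, keeping the explicit `sep` argument as an override seems worthwhile.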