
Ignore comments in tokenizer #68

Closed
hadley opened this issue Mar 11, 2015 · 13 comments

Comments

@hadley
Member

hadley commented Mar 11, 2015

read.csv() etc support a comment argument to ignore (e.g.) everything after #.
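To make the request concrete, here is a minimal Python sketch of comment handling at the tokenizer level. It is not readr's implementation; `read_delim_with_comments`, `strip_comment`, and the `comment` parameter are hypothetical names. Comment handling is off unless a comment string is passed, and a `#` inside a double-quoted field is kept, which is one way to limit mis-parses when the data legitimately contains `#`. It assumes no quoted field spans multiple lines.

```python
import csv
import io
from typing import List, Optional


def strip_comment(line: str, comment: Optional[str]) -> str:
    """Hypothetical helper: drop everything from `comment` to end of line, outside quotes."""
    if not comment:
        return line  # comment handling is off by default
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # An escaped "" pair toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        elif not in_quotes and line.startswith(comment, i):
            # Also trim whitespace left in front of the comment.
            return line[:i].rstrip()
    return line


def read_delim_with_comments(text: str, delim: str = ",",
                             comment: Optional[str] = None) -> List[List[str]]:
    """Hypothetical reader: strip comments line by line, then tokenize with csv."""
    kept = []
    # Assumption: no quoted field spans a line break.
    for line in text.splitlines():
        stripped = strip_comment(line, comment)
        # With comments on, lines left empty (comment-only or blank) are dropped.
        if comment and not stripped.strip():
            continue
        kept.append(stripped)
    return list(csv.reader(io.StringIO("\n".join(kept)), delimiter=delim))
```

With `comment="#"`, a comment-only line is dropped and a trailing comment is trimmed, while a quoted `"a#b"` survives intact; with the default `comment=None`, every line goes to the CSV reader unchanged.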

Please 👍 this issue if you'd like this feature.

hadley changed the title from "Support for comments" to "Ignore comments in tokenizer" Mar 11, 2015
@lmullen

lmullen commented Mar 11, 2015

👍 I sometimes encounter CSV files with comments. I'm sorry to say I even used to add them myself.

@jimhester
Collaborator

👎 for me. The only time I have used this feature in read.csv() is when it caused an incorrect parse because my data contained #, and I had to read the man page to turn it off.

If you do decide to add it please make the default off!

@leondutoit
Copy link

👍 with default == off; I see people commenting in delimited files all the time.

@hadley
Member Author

hadley commented Mar 12, 2015

The default would definitely be off; I've also had bad experiences where I had to turn it off.

@jennybc
Member

jennybc commented Mar 12, 2015

👍

Some instruments write reasonable delimited files but put metadata in the file itself, not necessarily at the top, so this would be a good complement to skip =. It's also more flexible than skip =, which I assume requires a specific number of lines.

@PeteHaitch

👍 for the reasons @jennybc said.

@davharris
Copy link

👍

I was just about to open an issue to suggest this. My current use case is reading in files produced by the Stan package for MCMC (example output; note the comment lines at the beginning, middle, and end of the file).

Currently, I have to make two passes through the file and do some extra fiddling, and it would be nice if all of that could be automated.

If it doesn't add too much complication, it might be nice if the comments (or at least their positions) could be included as an attribute as well, similar to how "problems" are handled.
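A rough Python sketch of this suggestion, assuming comments are whole lines starting with `#` (as in the beginning/middle/end pattern described above). `split_comments` is a hypothetical name, and returning a `(rows, comments)` tuple stands in for R's attribute mechanism; this is not how readr implements it.

```python
import csv
import io
from typing import List, Tuple


def split_comments(text: str, comment: str = "#"
                   ) -> Tuple[List[List[str]], List[Tuple[int, str]]]:
    """Hypothetical one-pass reader: parse data rows and record full-line comments."""
    data_lines: List[str] = []
    comments: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith(comment):
            # Keep the comment text (minus the marker) and its 1-based line number.
            comments.append((lineno, line.lstrip()[len(comment):].strip()))
        else:
            data_lines.append(line)
    rows = list(csv.reader(io.StringIO("\n".join(data_lines))))
    return rows, comments
```

Because data rows and comments come out of the same pass, there's no second read of the file, and the recorded line numbers show where each comment sat relative to the data.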

Thanks for making yet another great package!

@defconst

👍

Sometimes needed for comments and metadata.

@rpruim

rpruim commented May 8, 2015

My bad for just opening an issue to request this; sorry for the cruft. It seems to me that it does no harm when not used and can make or break things when the file you need to read has comments in it.

Frankly, I'd like to see more people put metadata in commented portions of delimited files, because otherwise it tends to go missing.

@sjackman
Copy link

👍

@stefano-meschiari

👍

@kcha

kcha commented Sep 1, 2015

👍

hadley closed this as completed in 2ccdde4 Sep 23, 2015
@jennybc
Member

jennybc commented Sep 23, 2015

This is very exciting. 🎉

lock bot locked and limited conversation to collaborators Sep 25, 2018