-
Notifications
You must be signed in to change notification settings - Fork 155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Why does agate use lineterminator LF instead of Python's default CRLF? #669
Comments
I don't recall the exact reasoning for this, but as with all the output formatting it was done to maximize compatibility with something or other. I'm not particularly persuaded by that RFC because it only "documents the format that seems to be followed by most implementations." Also, this is the first complaint I've heard about it. That being said, I'm always open to being wrong if somebody can show me a good example of this breaking things. (And demonstrate that CRLF won't break things elsewhere!) |
Firstly, I think one tool shouldn't modify something without telling the user. I can understand that if you have [LF] in input you should have [LF] in output. But if I have [CRLF] in input, why should I have [LF] output? At least, you should tell the user of this transformation, even if you don't explain it. Some old softwares, specially in the Windows world, are not able to read correctly documents with [LF]: NotePad, for example, is often used to open CSV files by beginners. Secondly, I know that the RFC "documents the format that seems to be followed by most implementations" but why modify the end-of-line when people would like to follow the only "specification" that exist. Some people wants to follow the RFC, even if it's not really a clear specification. I'm currently writing a document for new open data publishers. CSV Lint, which I recommand to them, is yelling when we have [lf] end of lines:
This make difficult to promote csvclean because I have to provide them a horrible command line as a workaround: Third point:
I'm not able to demonstrate that CRLF won't break things elsewhere. Don't you have unit tests and real life CSV files to give a try? I'm not a programmer but if you want to, I can build real life CSV test files with different sort of end-of-lines, encodings and so on. |
@CharlesNepote Hey Charles.
You are welcome to your beliefs, but this runs directly contrary to the stated intention of csvkit. The relevant documentation is in the Principles of Development. In short, we always prefer a uniform output format. This is the only way we can effectively support piping between the tools. (And it's also just good practice to encourage people to use common output formats.)
I get that. And I even encourage it. However, that spec existed when I created csvkit and I saw no evidence anyone took it seriously. Perhaps its better supported today than it was. In either case, the vast majority of software will parse either LF or CRLF correctly.
I agree this is annoying and I'll consider changing the default in the next major release, however, this is the first complaint I've heard so I'm skeptical it's a common use-case. CSVLint could also add a flag to ignore the line endings...
Yeah, tons of them, but what you're describing isn't a unit testing issue, it's an integration issue. csvkit is working as intended. If others care to weigh in on this issue, I'm all ears. |
Thanks for your answers.
Ok, I understand quite well and this point of view has also clear advantages. At least there might be a small documentation and UI issue:
If you agree, I'll open a generic bug report in csvkit: "Better explain common transformations in each csvkit tool". And I'll try to make more precise suggestions. That said, I'm still a bit disappointed that two great and so useful tools such as csvkit and CSVLint doesn't share the same ideas... That's life ;) Maybe there are more tools that tend to strictly follow the RFC but I can't say. I'll try to investigate. Maybe you're right and this issue only affects a few tools such as NotePad.
I don't think it would be good: CSVLint UI is probably better with as less options as possible. At least they could write a better error message not to frighten users. |
@CharlesNepote Yes, please open a documentation issue on csvkit. If you can also open a pull request, we will be able to make the changes more quickly. |
wireservice/csvkit#792
The text was updated successfully, but these errors were encountered: