Added utf-8 BOM removal #630
After reading some discussions on the internet, I would say that removing the UTF-8 BOM when formatting files is not a good idea.
Why do you think it's not a good idea? The BOM is not mandatory for UTF-8 files, and it causes problems for applications that are not UTF-8 aware (including ktlint). These applications don't expect non-ASCII bytes at the beginning of the file and try to parse them as ASCII characters, so the file ends up being parsed incorrectly. In my opinion, removing it from the start of the file is safe: the BOM bytes cannot be there for any other reason that would require them, and their presence just prevents ktlint from working as expected. Do you have any other ideas on how to handle UTF-8 files with a BOM? I guess it can be dealt with relatively easily when only validating files, but when fixing style violations it will likely be trickier, since the whole text content of the file gets manipulated.
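To illustrate the point about the bytes at the start of the file: the UTF-8 BOM is the three-byte encoding (EF BB BF) of U+FEFF, and the JDK's UTF-8 decoder keeps it as an ordinary first character instead of dropping it. The minimal sketch below detects the BOM on raw bytes and shows that it survives decoding, so anything that parses the resulting `String` sees an extra character before the first token. The names `UTF8_BOM` and `hasUtf8Bom` are illustrative assumptions, not part of ktlint.

```kotlin
// Illustrative constant (not a ktlint API): the UTF-8 encoding of U+FEFF is EF BB BF.
val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

// Returns true when the raw file bytes start with the UTF-8 BOM.
fun hasUtf8Bom(bytes: ByteArray): Boolean =
    bytes.size >= 3 &&
        bytes[0] == UTF8_BOM[0] &&
        bytes[1] == UTF8_BOM[1] &&
        bytes[2] == UTF8_BOM[2]

fun main() {
    val bytes = UTF8_BOM + "fun main() {}".toByteArray(Charsets.UTF_8)
    val text = String(bytes, Charsets.UTF_8)
    println(hasUtf8Bom(bytes))   // true
    // The JDK decoder keeps the BOM as the first character of the decoded text.
    println(text[0] == '\uFEFF') // true
    println(text.length)         // 14 (13 source characters + the BOM)
}
```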
For example, I've read the following issue: editorconfig/editorconfig#297, where some people complain about BOM support being removed. In general, I would say it is not ktlint's responsibility to remove the BOM when formatting files. The BOM itself is unrelated to Kotlin code style, and people may have added it intentionally.
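One way to respect an intentionally added BOM while still giving the parser clean input is to strip it before formatting and re-attach it to the output only if the input had one. This is a minimal sketch of that idea, not necessarily what this PR does; `splitBom`, `formatPreservingBom`, and the `format` function parameter are hypothetical names, with `format` standing in for ktlint's real formatter.

```kotlin
val BOM = '\uFEFF'

// Splits the text into (hadBom, textWithoutBom).
fun splitBom(text: String): Pair<Boolean, String> =
    if (text.startsWith(BOM)) true to text.substring(1) else false to text

// Runs the (hypothetical) formatter on BOM-free text and restores the BOM
// only when the original input started with one.
fun formatPreservingBom(text: String, format: (String) -> String): String {
    val (hadBom, body) = splitBom(text)
    val formatted = format(body)
    return if (hadBom) BOM + formatted else formatted
}

fun main() {
    // Stand-in formatter for the example: trims trailing whitespace on every line.
    val trimTrailing: (String) -> String = { s -> s.lines().joinToString("\n") { it.trimEnd() } }
    val result = formatPreservingBom("\uFEFFval x = 1   \n", trimTrailing)
    println(result == "\uFEFFval x = 1\n") // true
}
```

Keeping the BOM decision outside the formatter means the formatting logic never has to know about it, which addresses the concern that the whole text content is manipulated when fixing violations.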
Current approach for removing it in
Right, makes sense; I've updated my PR accordingly.
@Raibaz could you fix the code style? Other than that, your PR looks good to me.
Thank you for your contribution!
This fixes #272 by adding explicit removal of the UTF-8 BOM from the content of the files being parsed; it also introduces a minor refactoring that reduces code duplication.
Note that this also removes the BOM from the output when formatting files. This shouldn't be a big deal: UTF-8 files work fine without it, and it is in fact generally recommended to omit it.
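The behaviour described above (strip the BOM from the content being parsed, then write the formatted text back without it) can be sketched with a single shared read helper, which is also one way a refactoring could avoid duplicating the BOM handling between the lint and format paths. This is an assumption-laden sketch using only `java.nio.file`; `readSourceWithoutBom` is a hypothetical name, not the helper introduced by this PR.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical shared helper: reads a file as UTF-8 and drops a leading BOM, if any.
fun readSourceWithoutBom(path: Path): String =
    String(Files.readAllBytes(path), Charsets.UTF_8).removePrefix("\uFEFF")

fun main() {
    val tmp = Files.createTempFile("bom", ".kt")
    Files.write(tmp, "\uFEFFfun f() = 1\n".toByteArray(Charsets.UTF_8))

    val text = readSourceWithoutBom(tmp)
    println(text.startsWith("fun")) // true

    // Writing the text back as plain UTF-8 yields a file without a BOM,
    // which matches the output behaviour noted in the description.
    Files.write(tmp, text.toByteArray(Charsets.UTF_8))
    println(Files.readAllBytes(tmp).first() == 'f'.code.toByte()) // true

    Files.delete(tmp)
}
```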