Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize all EOLs as LF #163

Closed
stasm opened this issue Aug 3, 2018 · 9 comments
Closed

Normalize all EOLs as LF #163

stasm opened this issue Aug 3, 2018 · 9 comments

Comments

@stasm
Copy link
Contributor

stasm commented Aug 3, 2018

I propose that the AST normalize the parsing of all supported line endings to LF. We already use LF to represent blank lines in multiline patterns. Normalizing line endings in the parsed results removes surprises and ensures consistency across platforms.

Serializers can implement an option to serialize using CRLF if so desired or required by the platform.

In essence, this is similar to git's core.autocrlf = input option.

@Pike
Copy link
Contributor

Pike commented Aug 7, 2018

I'm not strongly opinionated here, but I've looked at this a bit in the light of cross-channel. Interestingly, this change would benefit cross channel in a scenario where an old file has windows line endings, and a new one got that "fixed". With newline normalization, the old and the new white-space are considered equal, and that avoids duplication.

I'm not sure about EOF -> \n. That leaves spans in the AST that extend outside of the original string. That could make tooling harder.

I checked locally, I wouldn't see a positive impact of that in cross-channel, missing trailing newline seems to be supported well by the current behavior.

@stasm
Copy link
Contributor Author

stasm commented Aug 8, 2018

As I make progress in #160, I see that it removes all hardcoded \n from the reference parser. Consequently, this issue might be less pressing than I initially thought it was. I'm happy about the positive-to-neutral feedback so far :)

@zbraniecki
Copy link
Collaborator

This would help high-performance parsers in particular. As I noted in the previous conversation, the EOL problem is a type of problem where most of the time we pay maximal price to satisfy least likely scenario. This feels wrong.

Removing that allows any system that requires CRLF to replace_all before sending it to Fluent, while our parsers can skip paying that price in total.

In favor.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

I think maybe my description of this issue wasn't clear. Or maybe I've misunderstood your comment :) I only propose that we don't store CRLF in the AST. I still believe the parser should accept both LF and CRLF as input. in #154 we only removed the support for the bare CR. Is this how you understood this issue?

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

@Pike Thanks for verifying this proposal in light of the current Mozilla tooling.

I'm not sure about EOF -> \n. That leaves spans in the AST that extend outside of the original string. That could make tooling harder.

Yeah, I'm also not sure about this. This is only important for Junk, right? All other interesting entries don't include the trailing EOL in their spans.

@Pike
Copy link
Contributor

Pike commented Aug 10, 2018

Both #160 and #116 could change that, right?

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

That's not my plan in #160. My plan is for parsers to continue to trim the trailing EOLs when building the AST regardless of how the EBNF defines the grammar. I think the same applies to #116.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

To put it differently: the EBNF is not where we define where spans start and end.

@stasm
Copy link
Contributor Author

stasm commented Aug 10, 2018

We talked about this OTP and the consensus was to go ahead with the normalization of supported EOLs (LF and CRLF). We won't be making any changes to the parsing of EOF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants