Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

rustfix doesn't work on files with CRLF line endings #176

Closed
Rantanen opened this issue Oct 1, 2019 · 10 comments
Closed

rustfix doesn't work on files with CRLF line endings #176

Rantanen opened this issue Oct 1, 2019 · 10 comments

Comments

@Rantanen
Copy link

Rantanen commented Oct 1, 2019

I had trouble earlier where cargo fix --edition resulted in a lot of mangled code, such as use ::foo::bar; -> usecrate::foo::barr; with the first space being omitted and last character duplicated.

I tracked this into the way span offsets and replacements work. As far as I could gather, the span offsets count both LF and CRLF as a single character, this means each CRLF makes span offsets off by one.

However I'm a bit confused on this for two reasons:

  • "Does not work with CRLF" would seem like a problem that most people on Windows would encounter, yet I couldn't find any references to this issue anywhere. The 2018 edition has been out for almost a year so I would have expected to find reports of this elsewhere.
  • The tests/parse_and_replace.rs has a not(windows) cfg with a comment to fix these tests on Windows; So at some point a problem with the tests has been recognized, but no further issue has been raised - so I'm assuming the breakage has only affected tests and the fixes themselves have been working just fine on Windows.

Is the span change a recent breakage by rustc/cargo or is this something specific to my environment? I'm testing this with a recent nightly (2019-09-30).

@ehuss
Copy link
Collaborator

ehuss commented Oct 1, 2019

This looks to be a relatively recent breakage in 1.39, most likely caused by rust-lang/rust#62948. It is now reporting post-normalized byte offsets. That seems like a bug in rustc to me.

cc @matklad, were you aware that JSON diagnostic spans are now reporting incorrect byte offsets? Is that something that can be easily fixed? To me, this seems like a pretty big regression for cargo fix to no longer work for windows users, and something that would be good to get fixed for beta.

@matklad
Copy link
Member

matklad commented Oct 1, 2019

No, I wasn't aware of that. Note, however, that that PR didn't modify any tests. That I think is reasonable, IIRC, we use (line, offset) coordinates, and those are invariant with respect to line endings.

So maybe something interprets those offsets wrong?

Additionally, I should note that Rust since-forever did some normalization (namely, BOM-stripping), so if some code relied on the fact that offsets are non-normalized, that code was technically not correct (albeit in a pretty narrow edge case).

@matklad
Copy link
Member

matklad commented Oct 1, 2019

Ah, I see now that JSON reports both (line, col) and byte offsets coordinates. We definitely should fix the latter, both for \r\n and for BOM. Presumably, rust-lang/rust#62948 didn't change any tests because there were no tests with BOM / \r\n?

Not sure how annoying this would be to fix: computing the table of "adjustments" and storing it alongside SourceFile shouldn't be hard. Threading this info to the place where we produce byte_start, byte_end might be tricker.

Additionally, I am traveling this week, so won't be able to look into this immediately.

cc @petrochenkov

@Rantanen
Copy link
Author

Rantanen commented Oct 1, 2019

Oh, and I guess the reason for the #[cfg(not(windows))] in rustfix parse_and_replace.rs tests is that those tests depend on recorded rustc output that has been recorded in LF-only environment and thus the byte offsets there are incompatible with CRLF line endings in Windows.

@ehuss
Copy link
Collaborator

ehuss commented Oct 1, 2019

Ah, yea, it looks like rustfix uses the byte_start/end values (not the line/column values). I think it could be switched to line/column, but I don't know how hard that would be. But the byte_start/end values should be fixed, otherwise they are misleading/wrong.

I'm not surprised there aren't any BOM/CRLF tests for the JSON output. I see maybe 4 tests specifically for JSON, so it doesn't really have much explicit test coverage at all.

@Rantanen
Copy link
Author

Rantanen commented Oct 2, 2019

I had a quick look at this with the intention of taking a stab at implementing the fix in rust-lang/rust. I'll try to file an issue over at rust-lang/rust at some point, but given it's late, I'm lazy and such issue doesn't exist yet, I'd love to get a comment on the following here:

My current plan is as follows:

  • Add normalized_chars: Vec<NormalizedChar> to libsyntax_pos::SourceFile
  • In SourceFile::new, add &mut Vec<NormalizedChar> parameter to both the remove_bom and normalize_newlines functions and fill that Vec as those functions strip characters from the original source string.
  • Assume the only bit that is interested in the diagnostics reporting is the DiagnosticSpan in libsyntax::json. Fortunately the from_span_full there seems to have access to the SourceFile itself (Receiving libsyntax_pos::Loc from the JsonEmitter -> SourceMapper and the Loc has a Lrc<SourceFile> as one of the fields).

@matklad, does this make any sense or do you foresee any problems with this approach?

@matklad
Copy link
Member

matklad commented Oct 2, 2019 via email

@Rantanen
Copy link
Author

Rantanen commented Oct 2, 2019

Yeah that makes sense, although I planned on using i32 as the offset to allow negative offsets - not currently needed, but given they shouldn't complicate the actual implementation either, I figured we could just as well support the hypothetical scenario where normalization adds bytes in the future anyway.

Also I feel a signed value would make it (very slightly) clearer that the data is relative offset instead of absolute position.

I was also playing with the idea of using (usize, usize) to mark the last normalized character with the final calculation being final = result.1 + (original - result.0) - this would make things even more awesome and flexible as then we could support non-linear normalizations(!!!), but I don't think I want to contribute anything towards such insanity. :)

Centril added a commit to Centril/rust that referenced this issue Oct 8, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
Centril added a commit to Centril/rust that referenced this issue Oct 8, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
matklad added a commit to matklad/rust that referenced this issue Oct 10, 2019
…, r=petrochenkov"

This reverts commit ef1ecbe, reversing
changes made to fc8765d.

That changed unfortunately broke rustfix on windows:

rust-lang/rustfix#176

Specifically, what ef1ecbe did was to
enforce normalization of \r\n to \n at file loading time, similarly to
how we deal with Byte Order Mark. Normalization changes raw offsets in
files, which are exposed via `--error-format=json`, and used by rusfix.

The proper solution here (which also handles the latent case with BOM) is

rust-lang#65074

However, since it's somewhat involved, and we are time sensitive, we
prefer to revert the original change on beta.
Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
Centril added a commit to Centril/rust that referenced this issue Oct 25, 2019
Fix the start/end byte positions in the compiler JSON output

Track the changes made during normalization in the `SourceFile` and use this information to correct the `start_byte` and `end_byte` fields in the JSON output.

This should ensure the start/end byte fields can be used to index the original file, even if Rust normalized the source code for parsing purposes. Both CRLF to LF and BOM removal are handled with this one.

The rough plan was discussed with @matklad in rust-lang/rustfix#176 - although I ended up going with `u32` offset tracking so I wouldn't need to deal with `u32 + i32` arithmetics when applying the offset to the span byte positions.

Fixes rust-lang#65029
Mark-Simulacrum pushed a commit to Mark-Simulacrum/rust that referenced this issue Oct 26, 2019
…, r=petrochenkov"

This reverts commit ef1ecbe, reversing
changes made to fc8765d.

That changed unfortunately broke rustfix on windows:

rust-lang/rustfix#176

Specifically, what ef1ecbe did was to
enforce normalization of \r\n to \n at file loading time, similarly to
how we deal with Byte Order Mark. Normalization changes raw offsets in
files, which are exposed via `--error-format=json`, and used by rusfix.

The proper solution here (which also handles the latent case with BOM) is

rust-lang#65074

However, since it's somewhat involved, and we are time sensitive, we
prefer to revert the original change on beta.
@ehuss
Copy link
Collaborator

ehuss commented Oct 28, 2019

This appears to be fixed in the latest beta and nightly. Should this issue be closed?
And thanks for working on the fix @Rantanen!

@Rantanen
Copy link
Author

Oh, probably! Forgot this one got left open.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants