Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure that all file loading happens via SourceMap #63525

Merged
merged 2 commits into from
Aug 16, 2019

Conversation

matklad
Copy link
Member

@matklad matklad commented Aug 13, 2019

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using FileLoader for binary
files in the future as well

cc #62948

r? @petrochenkov

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 13, 2019
cx.source_map().new_source_file(file.into(), src);

base::MacEager::expr(cx.expr_str(sp, interned_src))
match cx.source_map().load_binary_file(&file) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether include_str should normalize file contents... This approach does not do normalization, just like the previous version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it should.
The content included by include_str is still text / source code, if it's a long multi-line string encoded on Windows and stored in a file with UTF-8 BOM, then all those \r\ns and BOM shouldn't get into the string for the same reason why any other (non-include_bytes) inclusion method filters them away from string literals.

Am I right that #62948 applied the normalization to include_str as well and crater found no regression?
This should give as the right to make the change (in this PR or in #62948, not much difference).

@petrochenkov
Copy link
Contributor

r=me if you don't plan to address the include_str issue in this PR.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 13, 2019
@matklad
Copy link
Member Author

matklad commented Aug 14, 2019

Am I right that #62948 applied the normalization to include_str as well and crater found no regression?

I think that's not the case

  • it was a check run, so it couldn't have detected this.
  • similar to include_bytes, normalization was applied only to the data stored in the SourceMap, the actual included string was not normalized.
  • we know that code tested by crater doesn't include_str files with isolated \r though.

I've added commit that adds normalization to include_str, which currently means BOM removal. It seems reasonable to me because programs (ie, compiler) are supposed to work the same regardless of the presence of BOM. Moreover, if we don't strip BOM, there's a high chance that it ends up being in the middle of some text stream produced by the program.

This is technically a breaking change, so we should probably FCP it?

@matklad
Copy link
Member Author

matklad commented Aug 14, 2019

we know that code tested by crater doesn't include_str files with isolated \r though.

Actually, even this is not true:

https://github.com/nabijaczleweli/codepage-437/blob/2e17570377121dbd2bab5adebfcb46172804b95b/tests/cp437_control/mod.rs#L7

@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-6.0 of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-08-14T07:16:12.7992804Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-08-14T07:16:12.8215170Z ##[command]git config gc.auto 0
2019-08-14T07:16:12.8281514Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-08-14T07:16:12.8340369Z ##[command]git config --get-all http.proxy
2019-08-14T07:16:12.8507856Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/63525/merge:refs/remotes/pull/63525/merge
---
2019-08-14T07:16:47.9082646Z do so (now or later) by using -b with the checkout command again. Example:
2019-08-14T07:16:47.9082682Z 
2019-08-14T07:16:47.9082951Z   git checkout -b <new-branch-name>
2019-08-14T07:16:47.9082984Z 
2019-08-14T07:16:47.9083058Z HEAD is now at a0394cbd6 Merge a9539d26c8a1ad712ab267cf268cf9382e8e3b51 into 60960a260f7b5c695fd0717311d72ce62dd4eb43
2019-08-14T07:16:47.9241723Z ##[section]Starting: Collect CPU-usage statistics in the background
2019-08-14T07:16:47.9244991Z ==============================================================================
2019-08-14T07:16:47.9245055Z Task         : Bash
2019-08-14T07:16:47.9245124Z Description  : Run a Bash script on macOS, Linux, or Windows
---
2019-08-14T08:20:38.4718807Z .................................................................................................... 1300/8876
2019-08-14T08:20:45.0594073Z .................................................................................................... 1400/8876
2019-08-14T08:20:51.5005907Z .................................................................................................... 1500/8876
2019-08-14T08:21:02.3701079Z ....................................................................................i............... 1600/8876
2019-08-14T08:21:10.2804886Z i................................................................................................... 1700/8876
2019-08-14T08:21:17.1908312Z ...........................................................................iiiii.................... 1800/8876
2019-08-14T08:21:39.9297110Z .................................................................................................... 2000/8876
2019-08-14T08:21:42.4636502Z .................................................................................................... 2100/8876
2019-08-14T08:21:45.3137163Z .................................................................................................... 2200/8876
2019-08-14T08:21:53.2129437Z .................................................................................................... 2300/8876
---
2019-08-14T08:25:54.6872874Z .................................................................................................... 5300/8876
2019-08-14T08:26:02.2079100Z .........i.......................................................................................... 5400/8876
2019-08-14T08:26:07.7927947Z .................................................................................................... 5500/8876
2019-08-14T08:26:20.4199569Z .................................................................................................... 5600/8876
2019-08-14T08:26:34.7056198Z ....ii...i..ii...........i.......................................................................... 5700/8876
2019-08-14T08:26:50.8705475Z .................................................................................................... 5900/8876
2019-08-14T08:26:55.7624602Z .................................................................................................... 6000/8876
2019-08-14T08:26:55.7624602Z .................................................................................................... 6000/8876
2019-08-14T08:27:10.2967102Z .....i..ii.......................................................................................... 6100/8876
2019-08-14T08:27:29.6584902Z ................................................i................................................... 6300/8876
2019-08-14T08:27:31.8903728Z .................................................................................................... 6400/8876
2019-08-14T08:27:34.4588392Z ....................i............................................................................... 6500/8876
2019-08-14T08:27:39.1733742Z .................................................................................................... 6600/8876
---
2019-08-14T08:31:45.6341068Z ------------------------------------------
2019-08-14T08:31:45.6341129Z stderr:
2019-08-14T08:31:45.6341351Z ------------------------------------------
2019-08-14T08:31:45.6341597Z thread 'main' panicked at 'assertion failed: `(left == right)`
2019-08-14T08:31:45.6341707Z   left: `[239, 187, 191, 84, 104, 105, 115, 32, 102, 105, 108, 101, 32, 115, 116, 97, 114, 116, 115, 32, 119, 105, 116, 104, 32, 66, 79, 77, 46, 10, 76, 105, 110, 101, 115, 32, 97, 114, 101, 32, 115, 101, 112, 97, 114, 97, 116, 101, 100, 32, 98, 121, 32, 92, 114, 92, 110, 46, 10]`,
2019-08-14T08:31:45.6342309Z  right: `[239, 187, 191, 84, 104, 105, 115, 32, 102, 105, 108, 101, 32, 115, 116, 97, 114, 116, 115, 32, 119, 105, 116, 104, 32, 66, 79, 77, 46, 13, 10, 76, 105, 110, 101, 115, 32, 97, 114, 101, 32, 115, 101, 112, 97, 114, 97, 116, 101, 100, 32, 98, 121, 32, 92, 114, 92, 110, 46, 13, 10]`', /checkout/src/test/ui/include-macros/normalization.rs:4:5
2019-08-14T08:31:45.6342446Z 
2019-08-14T08:31:45.6342672Z ------------------------------------------
2019-08-14T08:31:45.6342719Z 
2019-08-14T08:31:45.6342745Z 
---
2019-08-14T08:31:45.6385221Z thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:536:22
2019-08-14T08:31:45.6385506Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
2019-08-14T08:31:45.6405553Z 
2019-08-14T08:31:45.6405653Z 
2019-08-14T08:31:45.6407500Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
2019-08-14T08:31:45.6408007Z 
2019-08-14T08:31:45.6408033Z 
2019-08-14T08:31:45.6416254Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
2019-08-14T08:31:45.6416339Z Build completed unsuccessfully in 1:08:40
2019-08-14T08:31:45.6416339Z Build completed unsuccessfully in 1:08:40
2019-08-14T08:31:46.3715398Z ##[error]Bash exited with code '1'.
2019-08-14T08:31:46.3776457Z ##[section]Starting: Checkout
2019-08-14T08:31:46.3778418Z ==============================================================================
2019-08-14T08:31:46.3778473Z Task         : Get sources
2019-08-14T08:31:46.3778519Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@matklad matklad force-pushed the centraliza-file-loading branch 2 times, most recently from a5d9445 to 9f09e91 Compare August 14, 2019 09:31
@petrochenkov
Copy link
Contributor

Actually, even this is not true:

That example kinda makes me rethink the prohibition of bare \rs.
Raw strings (including include_str) cannot use escaping and have no choice than to use bare \r if the carriage return needs to be in the string.

Actually, we may want to prohibit only \r\r\n rather than all bare \rs at loading time, this will already ensure that the file can survive repeated EOL conversions (e.g. by Git).

@matklad
Copy link
Member Author

matklad commented Aug 14, 2019

this will already ensure that the file can survive repeated EOL conversions (e.g. by Git).

It looks like Git treats files with lone CR as binary and doesn't convert them:
https://github.com/git/git/blob/e35b8cb8e212e3557efc565157ceb5cbaaf0d87f/convert.c#L90-L103

I also begin to feel that forbidding lone CR causes more problems that it solves. Sole \r does not seem that different from any other weird unicode/ascii junk like \v. I'll implement simple \r\r\n -> \r\n conversion, and we can have a separate "weird whitespace" lint if we want to, but it seems to be unlikely that folks will hit this problem in practice.

@matklad
Copy link
Member Author

matklad commented Aug 14, 2019

Marking as waiting for review. I've changed this to normalize the string, and I think we should move forward with this regardless of #62948.

@rustbot modify labels: -S-waiting-on-author, +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 14, 2019
@petrochenkov
Copy link
Contributor

Let's just keep the status quo behavior of include_str (at least for now) and not introduce a new third flavor of normalization.

I wanted to write out some possible use cases for various flavor of normalization, but I just need to go and sleep now.

TLDR is that BOM+EOL normalization is the common case requirement for text (mod m;, include!, include_str! (simply long strings, generated strings)) and should attempt to to change include_str in this direction and see what breaks (crater, some other crates.io search, live experiment on nightly), but include_bytes_as_str reading bytes without any normalization like include_bytes and packaging them as str instead of [u8; N] would also make sense, even if rarely (some ASCII-based binary protocols requiring \r\n (do they exist?)? Old WinAPI text controls unable to work with \n?).

@matklad matklad force-pushed the centraliza-file-loading branch from 9f09e91 to 0885a8d Compare August 15, 2019 07:38
That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well
@matklad matklad force-pushed the centraliza-file-loading branch from 0885a8d to 14bc998 Compare August 15, 2019 07:43
@matklad
Copy link
Member Author

matklad commented Aug 15, 2019

based on #63525 (comment)

@bors r=petrochenkov

I've removed the normalization commit, but I've added the test that normalization doesn't happen.

I think the way forward here is to complete #62948 first, and then have a separate discussion/crater runs about include_str normalization.

but include_bytes_as_str reading bytes without any normalization like include_bytes and packaging them as str instead of [u8; N] would also make sense, even if rarely (some ASCII-based binary protocols requiring \r\n (do they exist?)?

I feel this can be build on top of include_bytes! with 80% of usability, which seems ok for a rare use-case.

fn my_text_file_with_crlf() -> &'static str {
    unsafe { 
        std::str::from_utf8_unchecked(&include_bytes!("my_text_file_with_crlf.txt")[..])
    }
}

#[test] 
fn my_text_file_is_acutualy_a_text_file() {
    std::str::from_utf8(&include_bytes!("my_text_file_with_crlf.txt")[..])
        .unwrap()
}

@bors
Copy link
Contributor

bors commented Aug 15, 2019

📌 Commit 14bc998 has been approved by petrochenkov

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2019
@bors
Copy link
Contributor

bors commented Aug 15, 2019

💡 This pull request was already approved, no need to approve it again.

@bors
Copy link
Contributor

bors commented Aug 15, 2019

📌 Commit 14bc998 has been approved by petrochenkov

Centril added a commit to Centril/rust that referenced this pull request Aug 15, 2019
…petrochenkov

Make sure that all file loading happens via SourceMap

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well

cc rust-lang#62948

r? @petrochenkov
Centril added a commit to Centril/rust that referenced this pull request Aug 16, 2019
…petrochenkov

Make sure that all file loading happens via SourceMap

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well

cc rust-lang#62948

r? @petrochenkov
bors added a commit that referenced this pull request Aug 16, 2019
Rollup of 10 pull requests

Successful merges:

 - #60492 (Add custom nth_back for Chain)
 - #61780 (Finalize the error type for `try_reserve`)
 - #63495 ( Remove redundant `ty` fields from `mir::Constant` and `hair::pattern::PatternRange`.)
 - #63525 (Make sure that all file loading happens via SourceMap)
 - #63595 (add sparc64-unknown-openbsd target)
 - #63604 (Some update for vxWorks)
 - #63613 (Hygienize use of built-in macros in the standard library)
 - #63632 (A couple of comment fixes.)
 - #63634 (ci: properly set the job name in CPU stats)
 - #63636 (ci: move linkcheck from mingw-2 to mingw-1)

Failed merges:

r? @ghost
@bors
Copy link
Contributor

bors commented Aug 16, 2019

☔ The latest upstream changes (presumably #63640) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 16, 2019
@bors bors merged commit 14bc998 into rust-lang:master Aug 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants