Ensure all iterations in Rayon iterators run in the presence of panics #68171

Closed
wants to merge 3 commits into from

Zoxc
Contributor

@Zoxc Zoxc commented Jan 13, 2020

This ensures that fatal errors cannot non-deterministically hide errors that occur later.

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 13, 2020
@rust-highfive
Collaborator

The job x86_64-gnu-llvm-7 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

2020-01-13T05:30:49.9028066Z ##[command]git remote add origin https://github.com/rust-lang/rust
2020-01-13T05:30:49.9110200Z ##[command]git config gc.auto 0
2020-01-13T05:30:49.9185983Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2020-01-13T05:30:49.9247252Z ##[command]git config --get-all http.proxy
2020-01-13T05:30:50.5539966Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/68171/merge:refs/remotes/pull/68171/merge
---
2020-01-13T06:28:32.8741686Z ........................................i...............i........................................... 4900/9518
2020-01-13T06:28:41.7971140Z .................................................................................................... 5000/9518
2020-01-13T06:28:47.9003149Z ...................................................................................i................ 5100/9518
2020-01-13T06:28:53.0984069Z .................................................................................................... 5200/9518
2020-01-13T06:29:02.9838156Z ......................................................ii.ii...........i............................. 5300/9518
2020-01-13T06:29:11.9000245Z .................................................................................................... 5500/9518
2020-01-13T06:29:21.8759454Z .................................................................................................... 5600/9518
2020-01-13T06:29:27.9257426Z .......................................i............................................................ 5700/9518
2020-01-13T06:29:34.1816761Z .................................................................................................... 5800/9518
2020-01-13T06:29:44.5128756Z .................................................................................................... 5900/9518
2020-01-13T06:29:53.8969609Z ..............................ii...i..ii...........i................................................ 6000/9518
2020-01-13T06:30:11.7922168Z .................................................................................................... 6200/9518
2020-01-13T06:30:19.5835631Z .................................................................................................... 6300/9518
2020-01-13T06:30:31.7683193Z ..........................................................i..ii..................................... 6400/9518
2020-01-13T06:30:59.4782615Z .................................................................................................... 6600/9518
2020-01-13T06:31:01.6186267Z ..................................i................................................................. 6700/9518
2020-01-13T06:31:03.7953368Z .................................................................................................... 6800/9518
2020-01-13T06:31:06.2739284Z ..................................i................................................................. 6900/9518
---
2020-01-13T06:32:39.1921665Z .................................................................................................... 7500/9518
2020-01-13T06:32:43.4589455Z .................................................................................................... 7600/9518
2020-01-13T06:32:49.4256651Z .................................................................................................... 7700/9518
2020-01-13T06:32:56.4341513Z .................................................................................................... 7800/9518
2020-01-13T06:33:05.9311431Z ...................................................................................iiii............. 7900/9518
2020-01-13T06:33:21.8877954Z .................i......i........................................................................... 8100/9518
2020-01-13T06:33:26.9071010Z .................................................................................................... 8200/9518
2020-01-13T06:33:39.8946205Z .................................................................................................... 8300/9518
2020-01-13T06:33:49.4264580Z .................................................................................................... 8400/9518
---
2020-01-13T06:35:44.9452896Z 18 For more information about an error, try `rustc --explain E0432`.
2020-01-13T06:35:44.9453062Z 
2020-01-13T06:35:44.9453208Z 
2020-01-13T06:35:44.9453518Z The actual stderr differed from the expected stderr.
2020-01-13T06:35:44.9453998Z Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy2/privacy2.stderr
2020-01-13T06:35:44.9454476Z To update references, rerun the tests and pass the `--bless` flag
2020-01-13T06:35:44.9454966Z To only update this specific test, also pass `--test-args privacy/privacy2.rs`
2020-01-13T06:35:44.9455344Z error: 1 errors occurred comparing output.
2020-01-13T06:35:44.9455494Z status: exit code: 1
2020-01-13T06:35:44.9456477Z command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/privacy/privacy2.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy2" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy2/auxiliary" "-A" "unused"
2020-01-13T06:35:44.9457147Z ------------------------------------------
2020-01-13T06:35:44.9457314Z 
2020-01-13T06:35:44.9457706Z ------------------------------------------
2020-01-13T06:35:44.9457909Z stderr:
---
2020-01-13T06:35:44.9459510Z 
2020-01-13T06:35:44.9459655Z error[E0603]: function `foo` is private
2020-01-13T06:35:44.9460377Z   --> /checkout/src/test/ui/privacy/privacy2.rs:23:20
2020-01-13T06:35:44.9460628Z    |
2020-01-13T06:35:44.9460812Z LL |     use bar::glob::foo;
2020-01-13T06:35:44.9461079Z 
2020-01-13T06:35:44.9461214Z error: requires `sized` lang_item
2020-01-13T06:35:44.9461350Z 
2020-01-13T06:35:44.9461495Z error: requires `sized` lang_item
---
2020-01-13T06:35:44.9467658Z 12 
2020-01-13T06:35:44.9467782Z 
2020-01-13T06:35:44.9467900Z 
2020-01-13T06:35:44.9468038Z The actual stderr differed from the expected stderr.
2020-01-13T06:35:44.9468489Z Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy3/privacy3.stderr
2020-01-13T06:35:44.9469072Z To update references, rerun the tests and pass the `--bless` flag
2020-01-13T06:35:44.9469576Z To only update this specific test, also pass `--test-args privacy/privacy3.rs`
2020-01-13T06:35:44.9470159Z error: 1 errors occurred comparing output.
2020-01-13T06:35:44.9470376Z status: exit code: 1
2020-01-13T06:35:44.9471408Z command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/privacy/privacy3.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy3" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/privacy/privacy3/auxiliary" "-A" "unused"
2020-01-13T06:35:44.9472061Z ------------------------------------------
2020-01-13T06:35:44.9473039Z 
2020-01-13T06:35:44.9474286Z ------------------------------------------
2020-01-13T06:35:44.9476398Z stderr:
2020-01-13T06:35:44.9477111Z ------------------------------------------
2020-01-13T06:35:44.9477331Z error[E0432]: unresolved import `bar::gpriv`
2020-01-13T06:35:44.9477911Z    |
2020-01-13T06:35:44.9478057Z LL |     use bar::gpriv;
2020-01-13T06:35:44.9478224Z    |         ^^^^^^^^^^ no `gpriv` in `bar`
2020-01-13T06:35:44.9478507Z error: requires `sized` lang_item
2020-01-13T06:35:44.9478629Z 
2020-01-13T06:35:44.9478767Z error: requires `sized` lang_item
2020-01-13T06:35:44.9478888Z 
---
2020-01-13T06:35:44.9483207Z thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:387:22
2020-01-13T06:35:44.9483398Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
2020-01-13T06:35:44.9485968Z 
2020-01-13T06:35:44.9488740Z 
2020-01-13T06:35:44.9492421Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-7/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "7.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
2020-01-13T06:35:44.9493104Z 
2020-01-13T06:35:44.9493225Z 
2020-01-13T06:35:44.9503781Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
2020-01-13T06:35:44.9503873Z Build completed unsuccessfully in 0:59:27
2020-01-13T06:35:44.9556259Z == clock drift check ==
2020-01-13T06:35:44.9577885Z   local time: Mon Jan 13 06:35:44 UTC 2020
2020-01-13T06:35:44.9995220Z   network time: Mon, 13 Jan 2020 06:35:44 GMT
2020-01-13T06:35:44.9995302Z == end clock drift check ==
2020-01-13T06:35:45.4232933Z 
2020-01-13T06:35:45.4330565Z ##[error]Bash exited with code '1'.
2020-01-13T06:35:45.4361950Z ##[section]Starting: Checkout
2020-01-13T06:35:45.4363564Z ==============================================================================
2020-01-13T06:35:45.4363618Z Task         : Get sources
2020-01-13T06:35:45.4363665Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Seems not great that we have to do this, but makes sense too. Overall I'm not sure about the approach. (see followup comment)

@@ -181,46 +199,40 @@ cfg_if! {
             ($($blocks:tt),*) => {
                 // We catch panics here ensuring that all the blocks execute.
                 // This makes behavior consistent with the parallel compiler.
-                let mut panic = None;
+                let panic = ::rustc_data_structures::sync::Lock::new(None);
Member

Can we use $crate here? That should make it work anywhere.

Contributor Author

Sure.
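
For readers unfamiliar with the `$crate` suggestion above, here is a minimal sketch of how it works in a `macro_rules!` macro. The `sync::Lock` type and the `new_lock!` macro are hypothetical stand-ins written for illustration; they are not the real `rustc_data_structures` items.

```rust
// Hypothetical stand-in for a `sync` module that exposes a `Lock` type.
pub mod sync {
    use std::sync::Mutex;

    pub struct Lock<T>(Mutex<T>);

    impl<T> Lock<T> {
        pub fn new(value: T) -> Self {
            Lock(Mutex::new(value))
        }

        pub fn into_inner(self) -> T {
            // Recover the value even if the mutex was poisoned by a panic.
            self.0.into_inner().unwrap_or_else(|e| e.into_inner())
        }
    }
}

// Hypothetical macro. `$crate` expands to a path to the root of the crate
// that defines the macro, so the expansion resolves `sync::Lock` regardless
// of what the calling crate has imported or how it named the dependency.
macro_rules! new_lock {
    ($value:expr) => {
        $crate::sync::Lock::new($value)
    };
}
```

A hard-coded path such as `::some_crate::sync::Lock` only resolves when the caller has a crate available under exactly that name, which is presumably why `$crate` "should make it work anywhere".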

src/librustc_data_structures/sync.rs
@Mark-Simulacrum
Member

I'm not sure that this is quite the right approach to take, at least in the parallel case. I would appreciate @cuviper taking a look -- is there some better way to get this behavior (all "pieces" of a parallel iterator running to conclusion) in rayon?

I'm also a bit worried that we might not actually get the right behavior even with this patch, as I'm not sure if Rayon will ever lose work when a worker thread panics. We should also be careful more generally in that regard, as I would want us to avoid losing track of jobserver tokens and eventually stalling out entirely due to panics in worker threads (Cc @alexcrichton re:lazy spawn, too, in case this has effects there).

@Zoxc Zoxc mentioned this pull request Jan 14, 2020
@alexcrichton
Member

In terms of robustness of jobserver tokens it sort of depends on the rayon integration with jobserver, but I don't think it's necessary to catch panics to be robust, we'd just need to audit

@Zoxc Zoxc mentioned this pull request Jan 14, 2020
@Mark-Simulacrum
Member

Well, I guess it is true that presumably whenever we do panic we're going to end up failing the build, so it doesn't matter too much in practice whether we're leaking a jobserver token.

@Zoxc
Contributor Author

Zoxc commented Jan 16, 2020

I'm not sure that this is quite the right approach to take, at least in the parallel case. I would appreciate @cuviper taking a look -- is there some better way to get this behavior (all "pieces" of a parallel iterator running to conclusion) in rayon?

The way to do this is to not use panics for error reporting in rustc, but just for actual bugs. That is easier said than done though. I also don't think there is a way to get the behavior we want here from Rayon iterators.

I'm also a bit worried that we might not actually get the right behavior even with this patch, as I'm not sure if Rayon will ever lose work when a worker thread panics.

The primitives exposed by rayon_core (join, spawn and scope) will not lose work due to panics.

@bors
Contributor

bors commented Jan 16, 2020

☔ The latest upstream changes (presumably #68272) made this pull request unmergeable. Please resolve the merge conflicts.

@Mark-Simulacrum
Member

We discussed this a bit in our last parallel meeting, and I felt that we probably don't want to do this, at least not yet. It's not clear that we care about determinism in the error path (which is the only case that this affects, I believe) -- any tests that are currently different between master and parallel can be tagged as -Zthreads=1, I imagine? Or we can do some sorting of the error messages when comparing?

This feels like it may have a nontrivial performance cost, and regardless I would not myself want us to commit to making sure things continue to work (e.g., we don't ICE) after the first fatal error panic in the compiler.

As such, I would not be willing to accept this PR as-is, as I believe it goes in the wrong direction. Happy to hear arguments against that position though!
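
The "sorting of the error messages when comparing" idea mentioned above could look roughly like the sketch below. It assumes diagnostics are separated by blank lines in stderr, and both function names are hypothetical; this is not how compiletest compares output. Note that sorting only hides differences in ordering. It does not help when the two runs emit different sets of errors.

```rust
/// Splits stderr into blank-line-separated diagnostics and sorts them,
/// so that two outputs can be compared independently of emission order.
/// Assumption: one diagnostic per blank-line-separated block.
fn normalize_diagnostics(stderr: &str) -> Vec<String> {
    let mut blocks: Vec<String> = stderr
        .split("\n\n")
        .map(|block| block.trim().to_string())
        .filter(|block| !block.is_empty())
        .collect();
    blocks.sort();
    blocks
}

/// Returns true when both outputs contain the same diagnostics,
/// ignoring the order in which they were printed.
fn same_diagnostics(expected: &str, actual: &str) -> bool {
    normalize_diagnostics(expected) == normalize_diagnostics(actual)
}
```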

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 19, 2020
@Zoxc
Contributor Author

Zoxc commented Jan 19, 2020

It's not clear that we care about determinism in the error path

It is very clear we care about this. Running the compiler multiple times on the same input should produce the same set of compiler errors.

any tests that are currently different between master and parallel can be tagged as -Zthreads=1, I imagine?

No. All tests already run with -Zthreads=1 so they produce a deterministic error message ordering.

This feels like it may have a nontrivial performance cost

I don't think that is likely, but we can measure that.

I would not myself want us to commit to making sure things continue to work (e.g., we don't ICE) after the first fatal error panic in the compiler.

This is a property we already committed to by having any parallelism at all because work can happen in parallel with the first fatal error being raised and that work cannot ICE.

@Mark-Simulacrum
Member

It is very clear we care about this. Running the compiler multiple times on the same input should produce the same set of compiler errors.

I don't personally think this is true; I would be fine with nondeterministic (and different) output from the compiler when it errors.

To be clear, I think the performance cost isn't critical here, or at least I wouldn't worry about it. We're unlikely to use parallelism at a granularity where it would matter in practice, I suspect.

This is a property we already committed to by having any parallelism at all because work can happen in parallel with the first fatal error being raised and that work cannot ICE.

I think that's slightly different -- right? In the sense that today, the first fatal error will propagate out and exit the compiler, even if ICEs occur on other threads. Whereas this PR would instead propagate the ICE, I think, if I follow it correctly.

@Zoxc
Contributor Author

Zoxc commented Jan 21, 2020

In the sense that today, the first fatal error will propagate out and exit the compiler, even if ICEs occur on other threads.

That's not true, we'll unwind with either the ICE or the fatal error. Being first doesn't matter. That's an unrelated thing we should fix. If there's both an ICE and a fatal error, we should unwind with the ICE essentially giving fatal errors less priority. Currently we could have a scenario where the compiler ICEs, but exits like there was a regular fatal error. It will still print the ICE to stderr though, as that happens before unwinding.
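
The priority rule described above (unwind with the ICE when both an ICE and a fatal error occurred) can be sketched with plain `std::panic` primitives. `FatalErrorMarker` here is a hypothetical stand-in for whatever payload the compiler uses for fatal errors, and `run_all_prefer_ice` is an illustrative name, not an existing rustc function.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical marker payload standing in for a "regular" fatal error.
struct FatalErrorMarker;

type Payload = Box<dyn Any + Send + 'static>;

/// Runs every task and returns the payload the caller should unwind with.
/// Any payload that is not a `FatalErrorMarker` is treated as a bug (ICE)
/// and takes priority over fatal errors, regardless of which came first.
fn run_all_prefer_ice(tasks: Vec<Box<dyn FnOnce()>>) -> Option<Payload> {
    let mut chosen: Option<Payload> = None;
    for task in tasks {
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(task)) {
            let new_is_ice = !payload.is::<FatalErrorMarker>();
            let keep_old = match &chosen {
                None => false,
                // Keep the first ICE; replace a fatal error only with an ICE.
                Some(old) => !old.is::<FatalErrorMarker>() || !new_is_ice,
            };
            if !keep_old {
                chosen = Some(payload);
            }
        }
    }
    chosen
}
```

A caller would pass the returned payload to `std::panic::resume_unwind` once all tasks have finished.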

@Mark-Simulacrum
Member

I don't follow. In the situation where an ICE only occurs if an error has been encountered (i.e. we have stored a Ty::Err or something in "global" state), then in previous parallel and non-parallel code, we would never hit such an ICE. In the current PR, we would consistently hit that ICE. "If there's both an ICE and a fatal error, we should unwind with the ICE essentially giving fatal errors less priority." -- I disagree; I think if we do have a fatal error already there's no need to print ICEs that occur (possibly as a result of that error). In practice of course I think we can't 100% catch this in parallel code... but that seems like a bad reason to not even try.

I believe that we have an underlying disagreement that goes beyond the specifics of this PR -- I don't think it's true that we need to be deterministic when we're going to fail to compile. You, I believe, think we should be. I don't know how to resolve that disagreement.

@Zoxc
Contributor Author

Zoxc commented Jan 30, 2020

It seems like you want to suppress panics that occur after the first fatal error. That seems like a horrible idea to me as it masks compiler bugs, but anyway that is orthogonal to this PR.

The effect of this PR would be to make the set of error messages deterministic for the parallel compiler and also make test output consistent between the parallel and non-parallel compiler with the downside of a probably insignificant performance regression.

Let's see what @rust-lang/compiler has to say. @estebank and @michaelwoerister in particular might have opinions.

@Mark-Simulacrum
Member

This PR also has the effect that, in both cases, we always evaluate all elements of some iterators (those that are "par_iter", but the current trajectory is towards many such iterators I believe), regardless of fatal errors or ICEs during that evaluation.

@Zoxc
Contributor Author

Zoxc commented Jan 31, 2020

@Mark-Simulacrum Yes, for the non-parallel compiler previous users of par_iter will now execute all the iterations in the presence of panics. This brings it in line with the other abstractions in sync like parallel!, par_for_each_in and join, which ensure that all the "parts" execute before unwinding panics. For the parallel compiler, executing all the iterations in the presence of panics was already a behavior Rayon could take, though in practice it is likely to miss some iterations.
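
The "execute all the parts before unwinding" behavior described above can be illustrated with a small sequential sketch. `for_each_all` is a hypothetical helper written for this illustration, assuming the same catch-then-resume pattern hinted at in the diff; it is not the actual code from the PR.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Sequential stand-in for a `par_for_each_in`-style helper (hypothetical):
/// every item is visited even if an earlier iteration panics, and the first
/// captured panic is re-raised only after all iterations have run.
fn for_each_all<T>(items: impl IntoIterator<Item = T>, mut f: impl FnMut(T)) {
    let mut first_panic = None;
    for item in items {
        // Catch the panic so that the remaining iterations still execute.
        if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(|| f(item))) {
            if first_panic.is_none() {
                first_panic = Some(payload);
            }
        }
    }
    // Propagate the first panic once every iteration has had its turn.
    if let Some(payload) = first_panic {
        panic::resume_unwind(payload);
    }
}
```

With a plain `for` loop the first panic would unwind immediately and later items would never be visited, which is the source of the differing error output between the two compiler modes.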

@Zoxc
Contributor Author

Zoxc commented Jan 31, 2020

@bors try @rust-timer queue

@rust-timer
Collaborator

Awaiting bors try build completion

@bors
Contributor

bors commented Jan 31, 2020

⌛ Trying commit 934abf2 with merge 91caf4340bf16936868be675355f00a5e07ccc36...

@bors
Contributor

bors commented Jan 31, 2020

☀️ Try build successful - checks-azure
Build commit: 91caf4340bf16936868be675355f00a5e07ccc36 (91caf4340bf16936868be675355f00a5e07ccc36)

@rust-timer
Collaborator

Queued 91caf4340bf16936868be675355f00a5e07ccc36 with parent 34700c1, future comparison URL.

@rust-timer
Collaborator

Finished benchmarking try commit 91caf4340bf16936868be675355f00a5e07ccc36, comparison URL.

@estebank
Contributor

Let's see what ... has to say. @estebank and ... in particular might have opinions.

I'm ok with this as long as we don't kill performance.

@michaelwoerister
Member

I don't really have a lot of time to look into this. Is this something that would warrant a design meeting? That would certainly make it easier for me personally to schedule time for it.

@Mark-Simulacrum
Member

I don't know that this individually has the weight needed for a design meeting. Perhaps it does. I am not sure that I can drive such a discussion.

I'm going to try and get some folks from the parallel compiler WG to weigh in as well, though I think (hope!) I've faithfully represented those discussions here as well.

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 2, 2020
@crlf0710
Member

crlf0710 commented Apr 5, 2020

@rustbot modify labels to +T-compiler

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2020
@nikomatsakis
Contributor

Closing this pull request as Zoxc is stepping back from compiler development; see rust-lang/team#316.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Jul 19, 2022
use `par_for_each_in` in `par_body_owners` and `collect_crate_mono_items`

Using `par_iter` in non-parallel mode will cause the entire process to abort when any iteration panics. So we can use `par_for_each_in` instead to make the error message consistent with parallel mode. This means that the compiler will output more error messages in some cases. This fixes the following ui tests when `parallel-compiler = true` is set:
```
    [ui] src/test\ui\privacy\privacy2.rs
    [ui] src/test\ui\privacy\privacy3.rs
    [ui] src/test\ui\type_length_limit.rs
```

This refers to rust-lang#68171

Updates rust-lang#75760
Labels
S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.