OSX linker segfaulting on Travis #38878
Comments
Is there a way to collect the coredump from the segfault so we could attempt to track down the reason behind the segfault? Perhaps we could at least pass |
@Mark-Simulacrum your guess is as good as mine!
If you set |
This commit attempts to debug the segfaults that we've been seeing on OSX on Travis. I have no idea what's going on here mostly, but let's try to look at core dumps and get backtraces to see what's going on. This commit itself is mostly a complete shot in the dark, I'm not sure if this even works... cc rust-lang#38878
https://travis-ci.org/rust-lang/rust/jobs/193795162 is the first job where we got a stack trace:
I wouldn't necessarily call that... illuminating
I wonder if there would be a way to print what the files we're linking are? Maybe that would help since maybe the linker segfaults on an improperly formatted file or something like that; knowing what the files are (names and lengths) may help. I think passing |
PRs are always welcome! I don't have any magical tricks up my sleeves to implement tricks like that unfortunately.
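For anyone picking this up, the idea of logging the linker inputs (names and lengths) could be sketched roughly as below. This is only an illustration: `describe_inputs` and `log_inputs` are hypothetical helper names, not anything that exists in rustc, and how the list of input paths is obtained is left open.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns (path, size in bytes) for each linker input.
/// Fails with the underlying I/O error if any input cannot be inspected.
fn describe_inputs<P: AsRef<Path>>(inputs: &[P]) -> io::Result<Vec<(PathBuf, u64)>> {
    let mut described = Vec::with_capacity(inputs.len());
    for input in inputs {
        let path = input.as_ref();
        // `metadata` follows symlinks and reports the file length in bytes.
        let len = fs::metadata(path)?.len();
        described.push((path.to_path_buf(), len));
    }
    Ok(described)
}

/// Hypothetical helper: prints one line per input to stderr before linking.
fn log_inputs<P: AsRef<Path>>(inputs: &[P]) -> io::Result<()> {
    for (path, len) in describe_inputs(inputs)? {
        eprintln!("linker input: {} ({} bytes)", path.display(), len);
    }
    Ok(())
}
```

Calling `log_inputs` right before the linker is spawned would leave the names and sizes in the build log, so a crashing run could be compared against a passing one.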
Next successful stack trace: https://travis-ci.org/rust-lang/rust/jobs/194499380
Well the pthreads explains why it's nondeterministic at least...
This is a complete random shot in the dark to help suppress the OSX linker segfaults being found on rust-lang#38878. The segfault apparently happens during an assertion in [this source file][1], which appears to be related to a worker thread pool for parsing a bunch of object files. Presumably there's some concurrency bug triggering the segfault?

Poking around the source to see if we could disable this multithreading behavior didn't turn up many results, but one check in the [file above][1] was related to `_options.pipelineEnabled()`, which seemed suspicious. That in turn is read from [this file][2] in the `fPipelineFifo` instance variable (if it's non-null). That instance variable is in turn set from [another file][3] as a result of `getenv("LD_PIPELINE_FIFO")`.

This PR now sets that env var for all builders, including the OSX ones. Will this help? I have no idea! But it at least seems related and hopefully isn't too hard to try out and/or back out.

[1]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/InputFiles.cpp.auto.html
[2]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/Options.h.auto.html
[3]: https://opensource.apple.com/source/ld64/ld64-274.2/src/ld/Options.cpp.auto.html
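To make the mechanism concrete, here is a minimal sketch of passing `LD_PIPELINE_FIFO` to a child linker process from Rust. The PR itself sets the variable in the builders' environment, so this is not what it does; `linker_command` is a hypothetical helper, and the value the variable should carry is an assumption left to the caller.

```rust
use std::process::Command;

/// Hypothetical helper: builds a linker invocation with `LD_PIPELINE_FIFO`
/// set in the child's environment. `fifo_value` is whatever the caller
/// decides to pass; the right value is an assumption, not taken from ld64.
fn linker_command(program: &str, fifo_value: &str) -> Command {
    let mut cmd = Command::new(program);
    // Only the child process sees this variable; the parent's
    // environment is left untouched.
    cmd.env("LD_PIPELINE_FIFO", fifo_value);
    cmd
}
```

Setting the variable per invocation like this would make the experiment easy to back out, since nothing outside the spawned linker is affected.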
Random attempt to help this: #40243
This is a last-ditch attempt to help our pain with dealing with rust-lang#38878 on the bots. A new environment variable is added to the compiler, `RUSTC_RETRY_LINKER_ON_SEGFAULT`, which will instruct the compiler to automatically retry the final linker invocation if it looks like the linker segfaulted (up to 2 extra times). Unfortunately there have been no successful attempts to debug rust-lang#38878. The only information seems to be that the linker (e.g. `ld` on OSX) is segfaulting somewhere in some thread pool implementation. This appears to be spurious as failed PRs will later merge. The hope is that this helps the queue keep moving without clogging and delaying PRs due to rust-lang#38878.
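The retry behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `run_with_retry` and `killed_by_segfault` are hypothetical names, the crash is detected from the child's exit status, and signal 11 is assumed to be SIGSEGV (true on Linux and macOS).

```rust
use std::io;
use std::os::unix::process::ExitStatusExt;
use std::process::ExitStatus;

/// Signal number for a segmentation fault (assumed: Linux and macOS).
const SIGSEGV: i32 = 11;

/// Hypothetical helper: true if the child was terminated by SIGSEGV.
fn killed_by_segfault(status: &ExitStatus) -> bool {
    status.signal() == Some(SIGSEGV)
}

/// Runs `spawn` once, then up to `extra_retries` more times for as long as
/// the child keeps segfaulting. Returns the last status and the number of
/// attempts made. Any other outcome (success, ordinary failure, or an I/O
/// error from spawning) is returned immediately without retrying.
fn run_with_retry<F>(mut spawn: F, extra_retries: u32) -> io::Result<(ExitStatus, u32)>
where
    F: FnMut() -> io::Result<ExitStatus>,
{
    let mut attempts = 0;
    loop {
        let status = spawn()?;
        attempts += 1;
        if !killed_by_segfault(&status) || attempts > extra_retries {
            return Ok((status, attempts));
        }
        eprintln!("linker segfaulted, retrying (attempt {})", attempts + 1);
    }
}
```

A caller would pass a closure such as `|| Command::new("cc").args(&args).status()` with `extra_retries` set to 2, matching the "up to 2 extra times" above. Ordinary link errors are deliberately not retried, so real failures still surface on the first attempt.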
Looks like #40422 did the trick, we haven't seen this in ~2 weeks, so closing.
Fix #38878 again: restart the linker when seeing SIGBUS in addition to SIGSEGV. In #45985 (comment) we see a linker crash due to Bus Error (signal 10) on macOS. The error was not caught by #40422 since that PR only handles Segmentation Fault (signal 11). The crash log indicates the problem is the same as #38878, so we just amend #40422 to include SIGBUS as well. (Additionally, modified how the crash logs are printed so that irrelevant logs are truly filtered out.)
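As a rough sketch of the widened check, the predicate below treats both signals as retryable. The name `should_retry_linker` is hypothetical, and the signal numbers are the macOS ones quoted above (SIGBUS is a different number on Linux), so this is an assumption-laden illustration rather than the amended rustc code.

```rust
/// Bus Error on macOS, as quoted in the crash report (assumed: macOS only).
const SIGBUS_MACOS: i32 = 10;
/// Segmentation Fault.
const SIGSEGV: i32 = 11;

/// Hypothetical predicate: true if the linker died from a signal that is
/// treated as spurious and therefore worth retrying.
fn should_retry_linker(signal: Option<i32>) -> bool {
    matches!(signal, Some(SIGSEGV) | Some(SIGBUS_MACOS))
}
```

Keeping the retryable signals in one predicate means another signal can be added later without touching the retry loop itself.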
I've seen this quite a lot recently
Example logs:
Example Travis runs:
I'm opening a tracking issue so we can collect some more logs and hopefully draw conclusions from them at some point. Until then I'm not really sure how we'd deal with this...