stable rustc --version hangs forever #56736

fuchsnj · 2018-12-12T03:27:19Z

Starting with nightly-2018-11-04 and every later nightly, just checking the version (or doing anything else with rustc) causes the process to hang forever. It seems to be waiting on a lock, since there is no CPU usage.

Stable Rust works fine, as does any nightly version before this one.

OS is Ubuntu 18.04.1 LTS
Rust stable/nightly versions were installed with rustup

nathan@nathan-Precision-7510:~$ rustc +nightly-2018-11-04 --version
^C
nathan@nathan-Precision-7510:~$ rustc +nightly-2018-11-03 --version
rustc 1.32.0-nightly (8b096314a 2018-11-02)

The last few lines from strace rustc +nightly-2018-11-04 --version are

...
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
readlink("/etc/malloc.conf", 0x7ffd8dd4a900, 4096) = -1 ENOENT (No such file or directory)
open("/proc/sys/vm/overcommit_memory", O_RDONLY) = 3
futex(0x7fbdfd2510c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x7fbdfe014228, FUTEX_WAIT_PRIVATE, 2, NULL

It hangs on the FUTEX_WAIT call forever
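A hang like this can be told apart from a merely slow start by running the command under a timeout. The following is a minimal sketch in Python and is not part of the original report; the helper name `hangs` and the 10-second default are assumptions chosen for illustration.

```python
import subprocess


def hangs(cmd, timeout=10.0):
    """Return True if `cmd` is still running after `timeout` seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return True
    return False
```

On an affected machine one would expect `hangs(["rustc", "+nightly-2018-11-04", "--version"])` to report a hang and the same call with `+nightly-2018-11-03` not to. If `rustc` is not on the PATH, the call raises `FileNotFoundError` rather than returning.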


ehuss · 2018-12-12T03:46:47Z

Another user reported the same issue here: rust-lang/cargo#6384

That is the first release that removed jemalloc. I would suspect that is related, but I don't have any ideas on how to reproduce.

fuchsnj · 2018-12-12T04:02:33Z

Here is a stacktrace of the stuck process
https://gist.github.com/fuchsnj/5612923e3613a915b65aece0dd920149

This was captured with gdb on a stuck process running rustc +nightly --version
I just updated my nightly version, so it should be running the latest. (Couldn't really tell you which version that was though...)

ehuss · 2018-12-12T04:26:47Z

Do you have ESET antivirus installed (or any other security software)?

I installed ESET and I'm able to reproduce it. It looks like jemalloc is getting stuck recursively trying to initialize itself.

@alexcrichton My guess is that something about how jemalloc 5 initializes itself has changed, maybe?

fuchsnj · 2018-12-12T04:30:03Z

Yes, I have ESET antivirus installed.

alexcrichton · 2018-12-12T05:09:52Z

cc @gnzlbg, do you know if jemalloc has a fix for this perhaps?

ehuss · 2018-12-12T06:58:03Z

It appears to be an issue with ESET, jemalloc 5, and rustc being built for an old kernel.

It looks like jemalloc 5 has started to use CLOEXEC. Since rustc is built for a very old Linux kernel, jemalloc has to use fcntl instead of just passing O_CLOEXEC (which requires 2.6.23). fcntl is intercepted by ESET, which attempts to find the open symbol with dlsym. dlsym requires calling calloc, which hangs because jemalloc is still in the middle of initializing.
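The chain described above (allocator initialization, then an intercepted call, then a re-entrant allocation) can be modelled with a toy example. This is only a sketch of the general pattern, written in Python: `ToyAllocator`, `plain_syscall` and `intercepting_syscall` are hypothetical names, and none of it is jemalloc's or ESET's actual code. A timeout stands in for the real mutex so the demonstration terminates instead of hanging.

```python
import threading


class ToyAllocator:
    """Toy model of lazy initialization guarded by a non-reentrant lock."""

    def __init__(self, hooked_syscall):
        self._init_lock = threading.Lock()  # non-reentrant, like a plain mutex
        self._initialized = False
        self._hooked_syscall = hooked_syscall

    def _ensure_init(self, wait=0.2):
        if self._initialized:
            return True
        # A real mutex would block forever here; the timeout lets the demo end.
        if not self._init_lock.acquire(timeout=wait):
            return False
        try:
            # Initialization makes a call that third-party code may intercept.
            self._hooked_syscall(self)
            self._initialized = True
            return True
        finally:
            self._init_lock.release()

    def calloc(self):
        """Return a placeholder allocation, or None if init could not finish."""
        return "memory" if self._ensure_init() else None


def plain_syscall(alloc):
    """Stand-in for a call that nobody intercepts."""


inner_results = []


def intercepting_syscall(alloc):
    """Stand-in for an interposed call whose symbol lookup needs calloc."""
    inner_results.append(alloc.calloc())
```

With `plain_syscall`, the first `calloc()` completes initialization normally. With `intercepting_syscall`, the nested `calloc()` finds the initialization lock already held by its own caller and cannot proceed; in this model it gives up after the timeout, whereas a blocking lock would wait forever, which matches the FUTEX_WAIT seen in the strace output.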

I have confirmed building rustc locally (with jemalloc) that it does not hang, presumably because it is using O_CLOEXEC.
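The difference between the two code paths can be sketched as follows. This is an illustration in Python rather than jemalloc's C code, and the function names are assumptions. The one-step variant asks the kernel for close-on-exec at open time; the two-step variant opens first and then sets the flag with a separate fcntl call, which is the extra call an interposing library gets a chance to intercept.

```python
import fcntl
import os


def open_one_step(path):
    """Open with close-on-exec set atomically (needs O_CLOEXEC support)."""
    return os.open(path, os.O_RDONLY | os.O_CLOEXEC)


def open_two_step(path):
    """Open, then set close-on-exec with a separate fcntl call."""
    fd = os.open(path, os.O_RDONLY)
    # Python marks new descriptors close-on-exec by itself; undo that so the
    # descriptor looks like one returned by a plain C open().
    os.set_inheritable(fd, True)
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    return fd


def is_cloexec(fd):
    """Report whether the descriptor has FD_CLOEXEC set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

Both functions leave the descriptor in the same state; only the number of calls made to get there differs.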

I don't offhand see any workarounds (other than using a newer kernel).

gnzlbg · 2018-12-12T08:50:32Z

> I don't offhand see any workarounds (other than using a newer kernel).

The PR that started using CLOEXEC was jemalloc/jemalloc#872, which fixed jemalloc/jemalloc#528. We could patch jemalloc to not use CLOEXEC when built for Rust, but it looks to me like jemalloc is doing the right thing here, and that this is a corner case that should be handled on ESET's side.

We should open a bug with ESET about their jemalloc 5 / fcntl / Rust support, maybe they can roll a fix quickly. Depending on their timeline, patching jemalloc to not use CLOEXEC shouldn't be hard: when will the first stable Rust version with this issue land? I think we should consider it a regression.

alexcrichton · 2018-12-12T15:18:55Z

Opening a bug (if we can) with ESET sounds good to me for now, but if that doesn't pan out we can probably work around this and just not use cloexec there as it's a short-lived fd anyway

pnkfelix · 2018-12-14T11:27:26Z

Something I think deserves clarification here: as of PR #55238 (resolving issue #36963), builds of rustc stopped linking in jemalloc by default; however, if I am correctly reading the documentation and commit messages of that PR, the rustc built via CI opts back into having jemalloc linked to rustc (on Linux or Mac OS X). (and thus the nightly you get via rustup or otherwise downloading CI-built executables will link to jemalloc).

It's a pretty confusing situation, IMO, since attempts to locally replicate the behavior described here via a local build of rustc would need to turn that flag back on. (I think @ehuss is saying in their comment above that they took care to opt back into jemalloc in their local build. But it is easily overlooked.)

(Also: the CI's opting back into using jemalloc affects not just the nightly builds but also the beta and stable ones......? I'm a bit flummoxed as to why we would want the out-of-the-box local build to differ in this way from what we deploy. At the very least I would expect more prominent documentation as to how to properly recreate the CI's build.)

gnzlbg · 2018-12-14T11:32:24Z

> the rustc built via CI opts back into having jemalloc linked to rustc.

IIUC the intent was for rustc to always depend on jemalloc by default, since that was the status-quo before that change, but to allow people to build it without jemalloc, e.g., if they want to use it in a system where jemalloc is not available. It might be that this did not fully materialize.

pnkfelix · 2018-12-14T11:37:55Z

Yes, I too thought that was the intent. But...:

rust/config.toml.example, lines 399 to 401 in f4b07e0:

    # Link the compiler against `jemalloc`, where on Linux and OSX it should
    # override the default allocator for rustc and LLVM.
    #jemalloc = false

It could very well be that I am wrong about my expectations, and that if one wants to replicate the CI build product, one should take care to actually run configure with args taken from e.g.:

rust/.travis.yml, line 33 in f4b07e0:

    RUST_CONFIGURE_ARGS="--enable-extended --enable-profiler --enable-lldb --set rust.jemalloc"

(or with configure args taken from https://github.com/rust-lang/rust/blob/master/appveyor.yml, as appropriate to one's platform)

pnkfelix · 2018-12-14T11:43:55Z

I'm just going to open a separate issue about this discrepancy between the CI vs local builds, rather than continuing to clutter up this issue's comment thread. Sorry for the noise!

pnkfelix · 2018-12-20T14:33:38Z

T-compiler triage. This issue is tagged as a regression but has no T-label, so no team has default responsibility for it. Based on the comments in the issue here, I do not think T-compiler is in a position to fix this; it seems to be a T-infra problem, probably? (And a problem that T-infra may well choose to close as "wont-fix".)

turboladen · 2019-01-15T18:22:02Z

FWIW, we started getting this after updating to 1.31 on Ubuntu 14.04.5 LTS, but we are not running ESET. So far it's only on one instance in our AWS stack, but that instance is responsible for a whole feature set in our beta environment. We've tried 1.31.0 and 1.31.1 so far, and have also reinstalled rustup + rust. So far the behavior is pretty consistent. We don't, however, get this on the same feature set's staging instance, so we're trying to track down the differences. Hopefully we'll find something, as this is currently hanging up the dev+QA cycle right before a scheduled app release.

fuchsnj · 2019-01-26T18:53:24Z

As of Rust 1.32 (Jan 17th) this now affects the latest Rust stable version.

aidanhs · 2019-02-04T17:17:03Z

We discussed this in the infra team meeting a few weeks ago, and basically decided (given that this appears to be a strange interaction between jemalloc and ESET that we're stuck in the middle of) to wait until beta (and stable!) to see if more people reported the issue.

Given that we've not seen more reports, unfortunately this isn't going to be something we prioritise - our hope is that either jemalloc or ESET fixes things.

That said - @turboladen, we're interested in your report. Did you manage to track anything down, e.g. via strace?

turboladen · 2019-02-04T17:42:20Z

@aidanhs unfortunately we didn't really get time to get much info on it. We ended up cloning over the AWS image we had for our same app from a different environment (beta was having the problem described in this ticket, staging was not); I do believe, however, that we had jemalloc installed on the beta instance (trying to help speed up Rails), but did not have jemalloc on the staging instance.

antekone · 2019-04-01T13:05:09Z

Just FYI, ESET has fixed the issue in version 4.5.14.0. So if anyone is using an older product version and suffers from this hang, please try updating to 4.5.14.0 or later.
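Checking an installed version against 4.5.14.0 needs a numeric comparison, because a plain string comparison would rank "4.5.9.0" above "4.5.14.0". A minimal sketch in Python, assuming the version is reported as a purely numeric dotted string (how ESET actually reports its version is not covered in this thread, and the helper names are hypothetical):

```python
def parse_version(text):
    """Turn a dotted version such as '4.5.14.0' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


# First version reported in this thread to contain the fix.
FIXED_IN = parse_version("4.5.14.0")


def has_fix(installed):
    """Return True if `installed` is at or above the fixed version."""
    return parse_version(installed) >= FIXED_IN
```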

estebank added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Dec 12, 2018

fuchsnj changed the title ~~nightly rustc --version hangs forever (regression)~~ nightly rustc --version hangs forever Dec 12, 2018

pnkfelix mentioned this issue Dec 14, 2018

Why is jemalloc linked to rustc by default *only* via CI? #56812

Closed

ehuss mentioned this issue Dec 17, 2018

cargo install never ends on nightly and linux rust-lang/cargo#6384

Closed

pietroalbini added the I-nominated label Jan 3, 2019

fuchsnj changed the title ~~nightly rustc --version hangs forever~~ stable rustc --version hangs forever Jan 26, 2019

aidanhs removed the I-nominated label Feb 4, 2019

ehuss mentioned this issue Feb 22, 2019

I tried installing rust using rustup but after install rustc isn't working. rust-lang/rustup#1669

Closed

nagisa mentioned this issue Mar 2, 2019

rustc segmentation fault #58871

Closed

pnkfelix mentioned this issue Mar 11, 2019

rustc segfault #59032

Closed

fuchsnj closed this as completed Apr 4, 2019
