Spurious timeouts compiling librustc on the Android bot #34559

Closed
alexcrichton opened this issue Jun 29, 2016 · 10 comments
Labels
A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason)

Comments

@alexcrichton
Member

Looks like the Android bot is having trouble compiling librustc in stage1: it's taking more than 30 minutes to compile, which triggers the "no output received in 30 minutes" warning.

I don't think that librustc should take 30 minutes to build on the bots, so something fishy is happening.

@alexcrichton alexcrichton added the A-spurious Area: Spurious failures in builds (spuriously == for no apparent reason) label Jun 29, 2016
@alexcrichton
Member Author

It appears that for some reason LLVM isn't optimized. A small sample from perf record shows these as the top functions:

  5.38%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::Use::get() const
  5.29%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::Value::getValueID() const
  4.90%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::User::getOperandList()

These all look like trivial functions that should have been inlined.

Now to figure out why it's not optimized...
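The perf excerpt above is the kind of evidence that points at an unoptimized build: tiny accessors such as llvm::Use::get() dominating the profile. As a minimal sketch for tallying such samples, assuming the default perf report column layout shown above (overhead, command, shared object, `[.]`, symbol), the lines can be parsed and summed per shared object. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical parser for lines in the layout shown above:
#   "  5.38%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::Use::get() const"
PERF_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[[^\]]+\]\s+(.+?)\s*$")

SAMPLE = """\
  5.38%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::Use::get() const
  5.29%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::Value::getValueID() const
  4.90%  rustc  librustc_llvm-fe3cdf61.so  [.] llvm::User::getOperandList()
"""


def parse_perf_report(text):
    """Return (percent, command, shared_object, symbol) for every sample line."""
    rows = []
    for line in text.splitlines():
        m = PERF_LINE.match(line)
        if m:  # skip "#" headers, blank lines and anything else that doesn't fit
            pct, comm, dso, sym = m.groups()
            rows.append((float(pct), comm, dso, sym))
    return rows


def share_by_object(rows):
    """Sum the sampled overhead per shared object."""
    totals = defaultdict(float)
    for pct, _comm, dso, _sym in rows:
        totals[dso] += pct
    return dict(totals)


if __name__ == "__main__":
    rows = parse_perf_report(SAMPLE)
    for pct, _comm, _dso, sym in rows:
        print(f"{pct:5.2f}%  {sym}")
    print(share_by_object(rows))
```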

@alexcrichton
Member Author

Right, librustc_llvm is also 100MB on the bot, whereas locally it's 42MB, just another sign it's not optimized.
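The size comparison is a cheap first check that can be automated. A minimal sketch, assuming you have a known-good optimized build to compare against; the 2x threshold and the function names are hypothetical, and a large library can also come from debug info or assertions, so this only flags a build for a closer look.

```python
import os

MB = 1_000_000  # the thread only gives rough sizes ("100MB" vs "42MB")


def size_ratio(candidate_bytes: int, reference_bytes: int) -> float:
    """How many times larger the candidate library is than a known-good build."""
    if reference_bytes <= 0:
        raise ValueError("reference size must be positive")
    return candidate_bytes / reference_bytes


def looks_unoptimized(candidate_bytes: int, reference_bytes: int,
                      threshold: float = 2.0) -> bool:
    """Heuristic only: the 2.0 threshold is an arbitrary assumption."""
    return size_ratio(candidate_bytes, reference_bytes) >= threshold


def compare_files(candidate_path: str, reference_path: str,
                  threshold: float = 2.0) -> bool:
    """Apply the heuristic to two files on disk, e.g. the bot's and a local librustc_llvm."""
    return looks_unoptimized(os.path.getsize(candidate_path),
                             os.path.getsize(reference_path), threshold)


if __name__ == "__main__":
    print(f"{size_ratio(100 * MB, 42 * MB):.2f}x")  # about 2.38x for the sizes above
    print(looks_unoptimized(100 * MB, 42 * MB))
```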

@alexcrichton
Member Author

For now I've just cleaned out the LLVM directory so it'll rebuild from scratch, hoping that it will optimize correctly this time...

@alexcrichton
Member Author

Ok, that didn't work, so I blew away the entire build directory. Something about that did the trick. I was unable to reproduce on the bot or figure out what actually happened.

For posterity, the problem appears to have started with this timeline:

  1. #33890 ("Drive trans from the output of the translation item collector") attempted to land but bounced because of legitimate failures on other platforms. It did, however, trigger an LLVM rebuild on the Android bot, because the llvm-auto-clean-trigger file was touched.
  2. After that bounced, #33940 ("hashmap: use siphash-1-3 as default hasher") was tested. This recompiled LLVM again because llvm-auto-clean-trigger had changed again. It then bounced due to failures on other bots, but it also looked like it was stuck compiling libsyntax.
  3. All further builds which didn't fail other bots were timing out on the Android bot.
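The timeline hinges on the llvm-auto-clean-trigger mechanism: touching that file makes every bot that tests the PR rebuild LLVM, even when the PR itself later bounces. Below is a hypothetical sketch of such a trigger, assuming a stamp file that records the trigger's contents at the last clean. The stamp path, directory layout and function names are made up for illustration; this is not rustc's actual build code.

```python
import shutil
import tempfile
from pathlib import Path


def needs_clean(trigger: Path, stamp: Path) -> bool:
    """Clean if there is no stamp yet or the trigger's contents differ from the recorded copy."""
    if not stamp.exists():
        return True
    return trigger.read_bytes() != stamp.read_bytes()


def maybe_clean_llvm(trigger: Path, llvm_build_dir: Path, stamp: Path) -> bool:
    """Wipe llvm_build_dir when the trigger changed; return True if a clean happened."""
    if not needs_clean(trigger, stamp):
        return False
    shutil.rmtree(llvm_build_dir, ignore_errors=True)
    llvm_build_dir.mkdir(parents=True)
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_bytes(trigger.read_bytes())  # remember which trigger contents we cleaned for
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        trigger = root / "llvm-auto-clean-trigger"
        build = root / "build" / "llvm"
        stamp = root / "build" / "llvm-clean-stamp"
        trigger.write_text("first edit\n")
        print(maybe_clean_llvm(trigger, build, stamp))  # True: no stamp yet
        print(maybe_clean_llvm(trigger, build, stamp))  # False: trigger unchanged
        trigger.write_text("second edit\n")             # e.g. another PR touches it
        print(maybe_clean_llvm(trigger, build, stamp))  # True: LLVM is wiped again
```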

I have no idea why LLVM was recompiling itself without optimizations. It may have something to do with running cmake over an existing LLVM directory and somehow corrupting something, but that's just a guess. All attempts to configure LLVM passed -DCMAKE_BUILD_TYPE=Release, so I don't know how a missing optimization flag ever leaked in.
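When a build is suspected of being unoptimized despite -DCMAKE_BUILD_TYPE=Release, the CMakeCache.txt in the LLVM build directory shows what CMake actually recorded. A minimal sketch, assuming the usual NAME:TYPE=VALUE cache format and GCC/Clang-style -O flags; the function names are hypothetical. It checks the flags as well as the build type, since a Release build whose cached flags lost their -O level would still compile without optimization.

```python
import re
import sys
from pathlib import Path

# CMakeCache.txt entries look like NAME:TYPE=VALUE; lines starting with
# "#" or "//" are comments.
ENTRY = re.compile(r"^([^:=]+):([A-Z]+)=(.*)$")

OPTIMIZED_TYPES = {"Release", "RelWithDebInfo", "MinSizeRel"}


def parse_cmake_cache(text: str) -> dict:
    """Map cache variable names to their (string) values."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "//")):
            continue
        m = ENTRY.match(line)
        if m:
            name, _type, value = m.groups()
            entries[name] = value
    return entries


def is_optimized_config(cache_text: str) -> bool:
    """True if the cached build type is an optimized one and the effective
    C++ flags end with an -O level other than -O0 (GCC/Clang spelling assumed)."""
    entries = parse_cmake_cache(cache_text)
    build_type = entries.get("CMAKE_BUILD_TYPE", "")
    if build_type not in OPTIMIZED_TYPES:
        return False
    # CMake appends the per-configuration flags after CMAKE_CXX_FLAGS,
    # so the last -O flag in this combined list is the one that applies.
    flags = (entries.get("CMAKE_CXX_FLAGS", "").split()
             + entries.get("CMAKE_CXX_FLAGS_" + build_type.upper(), "").split())
    opt_flags = [f for f in flags if f.startswith("-O")]
    return bool(opt_flags) and opt_flags[-1] != "-O0"


if __name__ == "__main__":
    # Usage: python check_cache.py path/to/llvm/build/CMakeCache.txt
    print(is_optimized_config(Path(sys.argv[1]).read_text()))
```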

In any case, something to remain vigilant for, but nothing is actionable here with a lack of a reproduction so I'm gonna close this. If Android starts timing out again though we should look for a similar pattern of recompilations of LLVM.

Also I have absolutely no idea why this seemed to only affect the Android bot...

@retep998
Member

retep998 commented Jul 1, 2016

Why was this closed if a case of this just occurred 10 hours ago? #34577 (comment)

@alexcrichton
Member Author

@retep998 as clearly stated here:

In any case, something to remain vigilant for, but nothing is actionable here with a lack of a reproduction so I'm gonna close this. If Android starts timing out again though we should look for a similar pattern of recompilations of LLVM.

It appears to not be spurious, so I'm reopening.

@alexcrichton
Member Author

FWIW, the current workaround is to blow away the obj directory and restart the current build.

@alexcrichton
Member Author

Ok, more investigation. I was wondering why this wasn't plaguing any other Linux bot, like the linux or linux-cross builders. Turns out they're both using CMake 3.2+: linux-cross installs CMake on Ubuntu 15.10 from the standard repos, and the linux bot goes out of its way to get CMake 3.2 on Ubuntu 14.04. The android bot, however, installs vanilla CMake on Ubuntu 14.04, which means it's running 2.8 instead of 3.2.

As a result, I'm tempted to chalk this up to a random CMake bug fixed between 2.8 and 3.2, so I'm going to update the Android builder and see what happens.
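Since the suspicion is a CMake bug fixed somewhere between 2.8 and 3.2, one option is for a builder to refuse to configure LLVM with an older CMake rather than silently produce a slow build. A minimal sketch, assuming the "cmake version X.Y.Z" line that cmake --version prints; the 3.2 minimum mirrors the versions discussed here, and the function names are hypothetical.

```python
import re
import subprocess

MIN_CMAKE = (3, 2, 0)  # mirrors the 2.8 vs 3.2 comparison above


def parse_cmake_version(output: str) -> tuple:
    """Extract (major, minor, patch) from `cmake --version` output."""
    m = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if m is None:
        raise ValueError(f"unrecognized cmake --version output: {output!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def cmake_is_new_enough(output: str, minimum: tuple = MIN_CMAKE) -> bool:
    """Tuple comparison, so (2, 8, 12) < (3, 2, 0) < (3, 5, 1)."""
    return parse_cmake_version(output) >= minimum


def installed_cmake_output(cmake: str = "cmake") -> str:
    """Run the given cmake binary and return what it prints for --version."""
    result = subprocess.run([cmake, "--version"], capture_output=True,
                            text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    out = installed_cmake_output()
    if not cmake_is_new_enough(out):
        raise SystemExit(f"CMake {parse_cmake_version(out)} is older than {MIN_CMAKE}")
    print("CMake version OK:", parse_cmake_version(out))
```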

alexcrichton added a commit to rust-lang-deprecated/rust-buildbot that referenced this issue Jul 1, 2016
This gives us CMake > 2.8 and should hopefully help fixing rust-lang/rust#34559
@alexcrichton
Member Author

New image is baked, configured, and ready to go. All we need now is a buildbot restart. Those are pretty expensive, so I'm going to delay that until this actually happens again or we have a better reason to do so in the meantime.

@alexcrichton
Member Author

New image deployed, closing, will reopen if this still happens.


2 participants