Travis OSX build time very slow #6239

Closed
ehuss opened this issue Oct 30, 2018 · 14 comments · Fixed by #6254

Comments

@ehuss
Contributor

ehuss commented Oct 30, 2018

The OSX job on Travis seems to be getting significantly slower over time, and has been getting timed out recently.

I charted the time of the OSX job for the past 1000 builds: https://docs.google.com/spreadsheets/d/1BKJNq3bc8NDb5Q-bkRAQOTIaYVbgBejPsm4owDqkYeo/edit?usp=sharing

There is a huge jump starting about 3 months ago: https://travis-ci.org/rust-lang/cargo/builds/410508080 I don't think that PR is responsible; I tested before and after it, and there was no measurable time difference.

I'm wondering if something has changed with Travis's OSX setup. @alexcrichton do you have any insight? I can continue to investigate if not.

@ehuss
Contributor Author

ehuss commented Oct 30, 2018

Ah, on July 31 Travis changed the default OS X image: https://blog.travis-ci.com/2018-07-19-xcode9-4-default-announce I wonder how that would have had such a large impact.

@alexcrichton
Member

Oh dear thanks for investigating this! I'm not really sure what may have changed, but the image update definitely sounds suspicious. I wonder if that was the time that it switched to APFS as well for a filesystem? Maybe we're doing something that's heavily non-optimized there?

@ehuss
Contributor Author

ehuss commented Oct 30, 2018

I don't have any good news, yet. I have seen some things that I don't yet understand.

  • xcode9.4 and xcode10 both have the same problem (they are OS X 10.13).
  • xcode9.2 (OS X 10.12) finishes in about 22 minutes. However, it typically has around 10-20 test failures due to mtime issues.
  • The 10.13 images use HFS, which is a little surprising. It still has 1-second mtime resolution. However, they don't encounter the same mtime failures that the 10.12 images have. I don't know why.
    • Running locally on my machine, there is not much time difference between APFS and HFS performance-wise (HFS is actually a little slower than APFS).
  • Building cargo itself takes about the same amount of time (about 5m 20s).
  • Tests on 10.12 take about 15m, on 10.13 it is 41m (if it even succeeds within the time limit).
  • Some tools report minor hardware differences between the images, but nothing that looks particularly fishy.

I'm running low on ideas. I'm thinking of doing some more fine-grained timing tests. Let me know if you have any other ideas.

@alexcrichton
Member

> The 10.13 images use HFS

To confirm, you mean APFS?

I've not noticed any slowdown locally myself, so this mostly has to do with Travis's configuration of Macs, maybe? It does seem related to APFS for sure...

Beyond that though I don't know what would cause this :(

@ehuss
Contributor Author

ehuss commented Oct 30, 2018

> To confirm, you mean APFS?

No, HFS. From mount:

/dev/disk0s2 on / (hfs, local, journaled)

And I wrote a little Python script to verify the mtime behavior. And it's documented as HFS at https://docs.travis-ci.com/user/reference/osx/#file-system and https://docs.travis-ci.com/user/reference/overview/#virtualisation-environment-vs-operating-system.
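The Python script mentioned above is not included in the thread. As a rough illustration of the same idea, here is a minimal Rust sketch that creates a few probe files and checks whether their mtimes ever carry a sub-second component. The function names and the probe file names are hypothetical, and a run where every sample lands on a whole second only suggests (it does not prove) 1-second resolution.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;
use std::thread;
use std::time::{Duration, UNIX_EPOCH};

/// Returns the sub-second part (in nanoseconds) of a file's mtime.
fn mtime_subsec_nanos(path: &Path) -> io::Result<u32> {
    let modified = fs::metadata(path)?.modified()?;
    let since_epoch = modified
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    Ok(since_epoch.subsec_nanos())
}

/// Writes `samples` probe files into `dir` and reports whether every mtime
/// landed exactly on a whole second. Hypothetical helper for illustration.
fn looks_like_one_second_resolution(dir: &Path, samples: u32) -> io::Result<bool> {
    for i in 0..samples {
        let path = dir.join(format!("mtime-probe-{}", i));
        File::create(&path)?.write_all(b"probe")?;
        let nanos = mtime_subsec_nanos(&path)?;
        fs::remove_file(&path)?;
        if nanos != 0 {
            // A non-zero sub-second part means the resolution is finer than 1s.
            return Ok(false);
        }
        // Spread the samples out so they do not all share one timestamp.
        thread::sleep(Duration::from_millis(50));
    }
    Ok(true)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let coarse = looks_like_one_second_resolution(&dir, 5)?;
    println!("mtimes look second-granular: {}", coarse);
    Ok(())
}
```

On a filesystem with 1-second mtimes (as reported here for HFS), every sample should land on a whole second.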

@alexcrichton
Member

Oh dear if that's the case then I definitely don't know what's going on! I think travis does have limited support for ssh'ing into a build, so perhaps their support can be emailed to see if they can help out?

@ehuss
Contributor Author

ehuss commented Oct 31, 2018

Update: I'm pretty sure it is unrelated to rust, cargo, or our testsuite. I've been running some benchmarks, and the CPU performance seems to be significantly slower on the 10.13 images. Do you think it would be more effective to contact support via email, or to file an issue on github? Or maybe try something else?

@alexcrichton
Member

I don't necessarily have a lot of luck with either, but I've had somewhat more luck personally with the support email. Feel free to cc me on the email too!

@ehuss
Contributor Author

ehuss commented Oct 31, 2018

Let me know if there's anything you want to try in the meantime. Some ideas:

  • Go back to xcode9.2. We would need to deal with #5940 (timing errors on some cargo tests on macOS). I realized that the xcode9.4 image avoids those errors only because it runs so slowly that nothing finishes within a second. Perhaps the tests listed in that issue could be (temporarily?) disabled on Travis OSX?
  • Temporarily disable some tests on Travis OSX. For example, disable cross-compile tests and proptest tests. That would buy just a few minutes, though. Maybe other chunks that have low risk of being osx-specific could be disabled.
    • In particular, I'm curious about the long-term viability of the 32-bit cross-tests, since 32-bit is now deprecated. I actually can't get them to work at all on my 10.14 system (something about missing 32-bit sdk files).

@alexcrichton
Member

Given that it seems nothing can land right now it does seem prudent yeah to try something out at least! I think it's fine to disable the cross-test (we should probably just whitelist Linux as doing those) and otherwise I'd ideally advocate for switching back to xcode9.2, but given the existing bugs it may also be fine to simply comment out some long running tests.

Do you have an idea if there are some tests taking much longer than others?

@ehuss
Contributor Author

ehuss commented Oct 31, 2018

Unfortunately most of the tests take about the same amount of time. The only outliers are the resolve tests.

A very rough estimate of how long the top 5 modules take on travis (estimating that travis is ~4 times slower than my local system):

  1. build: 6.6 min
  2. test: 6.3 min
  3. build_script: 5.6 min
  4. doc: 5 min
  5. resolve: 4.7 min

The vast majority of resolve's time is spent in 4 tests. The others are slow just because they have a huge number of tests.

Removing the cross-compile tests saves less time than I thought (around 15-20 seconds).

@alexcrichton
Member

Bah :(

It may be best for now to switch back to xcode9.2 and just ignore all the failing tests

@Eh2406
Contributor

Eh2406 commented Nov 2, 2018

I can scale back the resolve tests if that will fix things. (Either in general or just on the one platform.) Which 4 are the big ones? Just a guess, but:

  • passes_validation
  • limited_independence_of_irrelevant_alternatives
  • resolving_with_constrained_cousins_backtrack
  • resolving_with_deep_traps

Did I guess correctly?

@ehuss
Contributor Author

ehuss commented Nov 2, 2018

Here are the top 4:

7.152365 test resolve::resolving_with_constrained_cousins_backtrack ... ok
18.546984 test resolve::limited_independence_of_irrelevant_alternatives ... ok
22.062093 test resolve::passes_validation ... ok
23.042393 test resolve::resolving_with_many_equivalent_backtracking ... ok

Turning them off should save about 4 minutes, so that might allow it to squeak by, but it still might time out occasionally depending on what mood Travis is in. The check to disable I've been using in some of my experiments is `cfg!(target_os = "macos") && env::var("CI").is_ok()`.
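For readers unfamiliar with that check, here is a minimal sketch of how it could gate an expensive test body. Only the `cfg!`/`env::var` expression comes from the comment above; `is_macos_ci`, `expensive_resolver_check`, and `run_unless_macos_ci` are hypothetical names, and the summation is just a stand-in for real work.

```rust
use std::env;

/// True when compiled for macOS and the `CI` environment variable is set.
/// This is the expression quoted in the comment above.
fn is_macos_ci() -> bool {
    cfg!(target_os = "macos") && env::var("CI").is_ok()
}

/// Hypothetical stand-in for a slow resolver test body.
fn expensive_resolver_check() -> u64 {
    (1..=1_000u64).sum()
}

/// Runs the expensive check unless we are on macOS CI, in which case it is
/// skipped and `None` is returned.
fn run_unless_macos_ci() -> Option<u64> {
    if is_macos_ci() {
        return None;
    }
    Some(expensive_resolver_check())
}

fn main() {
    match run_unless_macos_ci() {
        Some(value) => println!("ran expensive check, result = {}", value),
        None => println!("skipped expensive check on macOS CI"),
    }
}
```

Note that `cfg!` is evaluated at compile time while the `CI` lookup happens at run time, so the skipped code is still compiled on every platform.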

bors added a commit that referenced this issue Nov 3, 2018
Fix slow MacOS Travis issue.

OS X 10.13 images on Travis are running very slow and causing timeouts. This PR does two things:

- Use OS X 10.12 (`xcode9.2`) which is much faster.
- Implement a change to the testsuite to handle 1-second resolution mtimes on HFS. When a test executes cargo multiple times, and the first run finishes in under 1 second, the second one will think it needs to rebuild because the mtime of the files equals the mtime of the output. This change forces the mtime of every project to be created 1 second in the past. Tests that are still sensitive to mtimes are adjusted on a case-by-case basis.

Closes #6239, Closes #5940
@bors bors closed this as completed in #6254 Nov 3, 2018
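The implementation in #6254 is not shown in this thread. As a rough sketch of the idea in the commit message (pushing a file's mtime 1 second into the past so that a fast first build does not share a timestamp with its inputs), the following uses `std::fs::File::set_modified`, which assumes Rust 1.75 or newer. The helper name `backdate_mtime` and the file name are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Moves the mtime of `path` earlier by `by` and returns the requested new
/// mtime. Hypothetical helper; not the code from the actual PR.
fn backdate_mtime(path: &Path, by: Duration) -> io::Result<SystemTime> {
    // Open with write access so the timestamp update is permitted everywhere.
    let file = OpenOptions::new().write(true).open(path)?;
    let current = file.metadata()?.modified()?;
    let backdated = current
        .checked_sub(by)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "mtime underflow"))?;
    file.set_modified(backdated)?;
    Ok(backdated)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("backdate-mtime-demo.txt");
    fs::write(&path, b"fn main() {}")?;

    let before = fs::metadata(&path)?.modified()?;
    backdate_mtime(&path, Duration::from_secs(1))?;
    let after = fs::metadata(&path)?.modified()?;

    // With 1-second mtime resolution, the file now looks strictly older than
    // anything produced during the current second.
    println!("mtime moved back: {}", after < before);
    fs::remove_file(&path)?;
    Ok(())
}
```

Applying this to every file in a freshly generated test project would give the effect described above; tests that depend on exact mtimes would still need individual handling, as the commit message notes.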