Test every commit of web-platform-tests within 1 hour #164
This is totally achievable by sharding the runs across multiple instances. In gecko CI, Linux opt builds are arbitrarily sharded into 19 chunks (12 testharness, 6 reftests, 1 wdspec), and the longest run from a recent m-c commit took 37 minutes (most were under 25; unfortunately there isn't a good way to even out the timings if one directory happens to be particularly slow to run). It's literally just a matter of having enough machine resources. Gecko could probably run in 15 chunks without affecting the e2e time too much, because reftests are fast; so you are looking at ~60 machines to do Firefox + Chrome + Safari + Edge within this time limit. Having said that, I have no idea how this would interact with Sauce; I don't know if they would like us using 30 simultaneous connections, and that's presumably slower than local runs anyway. I don't really understand the Travis comment. What's the intent there?
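For illustration, a minimal Python sketch (with made-up directory names, not the actual gecko CI chunking logic) of what arbitrarily sharding test directories across a fixed number of chunks can look like:

```python
# Minimal sketch: deal test directories out round-robin across N chunks.
# Illustrative only, not how gecko CI actually assigns chunks, and it does
# nothing to even out per-directory run times, which is the limitation
# mentioned above.
def assign_chunks(test_dirs, total_chunks):
    chunks = [[] for _ in range(total_chunks)]
    for i, test_dir in enumerate(sorted(test_dirs)):
        chunks[i % total_chunks].append(test_dir)
    return chunks

if __name__ == "__main__":
    example_dirs = ["css/css-grid", "dom", "fetch", "html", "webdriver/tests"]
    for n, chunk in enumerate(assign_chunks(example_dirs, 3), start=1):
        print("chunk %d: %s" % (n, ", ".join(chunk)))
```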
Me neither, and I'm not taking for granted that we can keep using Sauce, or keep using the same account. If we need to maintain our own infrastructure to achieve fast-enough runs, then that's probably what we'll do. Just using many connections would be the first thing to investigate though.
I think it will look increasingly silly that we have two setups for running tests, which may be subtly different. More importantly, in web-platform-tests/wpt#7073, web-platform-tests/wpt#7475 and web-platform-tests/wpt#7660, what we have is mostly a capacity problem. If we had a way to do full runs in <50 minutes, then Travis could use that. For web-platform-tests/wpt#7475 specifically, if we had very fast results for each commit, then we could possibly use those instead of running the tests without the changes. But that's a bit more speculative.
So for Travis we don't want to run all the tests for each PR, but we do want to run each modified test in a way that exposes stability issues. So on one hand I agree that having a way for Travis to delegate that work to a larger pool of machines would be useful. For web-platform-tests/wpt#7475 I think that using the day-old wpt.fyi results is better than adding extra Travis load on each push, since it will only make a difference in edge cases (where tests are changing rapidly).
Suggestion: we could set up a sharded Travis CI or Circle CI run as a non-blocking builder for all WPT PRs. Travis CI has a build matrix limit of 200, so hypothetically, if we needed 60 machines as @jgraham said, we could fit under that limit. I'd lean toward Circle CI though, so we have the option of expanding to more browsers & builds. Circle CI also natively supports sharding.* Looking at how some other large OSS projects with big test suites on GitHub solve this issue, here's a somewhat random, probably biased sample:
Slightly related: imho I think it's a prerequisite for all of these options that we containerize the builds. I've worked on this in #153 and have already moved the Firefox cron job over to using the container. One benefit of containerization once we start sharding is that shard startup time should be super quick, since the shards won't have to generate the manifest from scratch or clone the whole repo. *It looks like for parallelism Circle CI exposes env vars for the shard index and total shard count.
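As a rough sketch of how a shard could use that parallelism information (assuming the CIRCLE_NODE_INDEX / CIRCLE_NODE_TOTAL environment variables that Circle CI documents; in practice the test paths would come from the wpt manifest):

```python
import os

def tests_for_this_shard(all_test_paths):
    # Assumes Circle CI's documented CIRCLE_NODE_INDEX / CIRCLE_NODE_TOTAL
    # variables; the defaults mean "single shard" when they are unset.
    index = int(os.environ.get("CIRCLE_NODE_INDEX", "0"))
    total = int(os.environ.get("CIRCLE_NODE_TOTAL", "1"))
    # all_test_paths is any iterable of test paths (e.g. read from the
    # wpt manifest); each shard keeps every Nth path.
    return [path for i, path in enumerate(sorted(all_test_paths))
            if i % total == index]
```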
On Travis, at least, there's a limit to how many concurrent machines we get. Setting up 60 instances wouldn't help if only one job ran at a time. I don't know what the situation with other providers is, but it seems unlikely anyone is going to give us that kind of resources without a special arrangement, likely involving money. I think the three options are probably:
The last is particularly unappealing ;) Independent of that, containerizing the builds seems reasonable, but whether it cuts down on setup time is at least a little unclear and depends on how caching works. Taskcluster uses docker for everything on Linux, but whenever a new instance is provisioned there's a noticeable setup time to download the image. And the VCS checkout happens inside the container; it's not a static part of the container. So I'm not sure exactly what you are imagining, but it's not entirely clear that e.g. generating a new container per run is viable (and I have other plans to make the manifest in particular faster to generate, by downloading a cached copy).
Update: I've been doing some thinking along these lines, and as an intermediate solution and a step up from what we currently have, I set up a Jenkins cluster on GKE: https://ci.wpt.fyi. I migrated Edge yesterday (see the currently running build) and hope to migrate Safari and FF soon. This solves the following problems:
web-platform-tests/wpt#8063 is a good example of why we need full runs to be fast enough to be done in Travis. Currently the only way I have to be confident in such a change is to run it through Chromium's bots, and it was rather a lot of work. And it still wouldn't catch if the change broke everything in Safari, for example.
I'm not sure that is a good example of why we need full runs in Travis, as opposed to an example of a missing rule in the logic that detects relevant changes to test for a build. We certainly aren't going to be able to stability-check with full runs, and for most things running every test is massive overkill. I agree that the ability to request full runs for PRs where we think the changes are substantial would be a good improvement.
Filed a bug for that. But as we keep improving those rules, more and more PRs will (correctly) run so many tests that they'll time out and fail. I don't know if the sum of the IDL tests is past that threshold.
Closing this, just like #108. Having wpt.fyi results available within 1 hour is still our objective, but it doesn't make sense to track as a monolithic goal here. (In this repo we should track cycle time instead; a 30-minute cycle time would be required to consistently keep latency below 1 hour, while a 1-hour cycle time gives a mean latency of about 90 minutes, since a commit waits on average half a cycle for the next run to start and then a full cycle for it to finish.)
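Spelled out as a back-of-the-envelope calculation (assuming commits arrive uniformly and a run takes one full cycle):

```python
def latency_minutes(cycle_minutes):
    # A commit waits between 0 and one full cycle for the next run to
    # start (half a cycle on average), then a full cycle for the run.
    mean = cycle_minutes / 2.0 + cycle_minutes
    worst = cycle_minutes + cycle_minutes
    return mean, worst

print(latency_minutes(60))  # (90.0, 120): 1 h cycle -> mean latency 90 min
print(latency_minutes(30))  # (45.0, 60): 30 min cycle keeps worst case at 1 h
```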
A more aggressive goal than #108.
I have found myself recently making changes and waiting for them to show up on the dashboard, like in web-platform-tests/wpt#7758 (comment). Given that Chromium and Gecko run most of the tests as part of their own CI and waterfall, it should be well within reach for the web-platform-tests dashboard to run all of the tests for every commit.
In the past 6 months, there have been on average ~9 commits per day (using `--first-parent`, because we would only test things that master has pointed to). This would require some kind of sharding to make it always run fast enough.
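The exact command behind that count isn't given; as one way to reproduce a figure like it (a sketch assuming a local clone with an `origin/master` ref):

```python
import subprocess

# Sketch: count first-parent commits on master over the last six months
# and derive an average per day. Assumes a local clone with an
# "origin/master" ref; the original measurement method isn't specified.
out = subprocess.check_output([
    "git", "rev-list", "--count", "--first-parent",
    "--since=6 months ago", "origin/master",
])
commits = int(out.strip())
print("%.1f commits per day" % (commits / (6 * 30.0)))
```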
If we can get runs down to <50 minutes, it means we could also use the same running infra for Travis CI, and deal with the worst case of having to run every test.
@mattl @lukebjerring @jgraham