
[ci] Remove hardcoded test shards #10743

Merged: 1 commit into apache:main from generate_shard, Apr 8, 2022

Conversation

@driazati (Member) commented Mar 23, 2022

This moves the sharding logic from being inlined in the Jenkinsfile to a template, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will match. This also changes the test allocation from the previous manual balancing to a random split between shards: each shard needs to know only its shard index and the total number of shards, then it hashes each test and skips it unless the hash falls within its allocated range. This breaks up related tests across shards, but has the downside that any change to the number of shards will shuffle where tests end up (ideally this is rare once we settle on a good number of shards). Some long-running tests are also manually allocated round-robin to different shards so that they land on separate shards as much as possible.

For now this applies only to the GPU frontend tests, but eventually we could expand it to more jobs.
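
To make the scheme concrete, here is a minimal sketch of the hash-based allocation described above. This is an illustration, not the code from this PR; the function name and the choice of md5 are assumptions.

```python
import hashlib
import os

def should_run(nodeid: str, shard_index: int, num_shards: int) -> bool:
    # Hash the pytest node id to a stable integer and keep the test only if
    # it falls into this shard's bucket. md5 is used (rather than Python's
    # built-in hash()) so the result is identical across processes and runs.
    digest = int(hashlib.md5(nodeid.encode()).hexdigest(), 16)
    return digest % num_shards == shard_index

# With TVM_NUM_SHARDS=3, each shard keeps roughly a third of the tests,
# chosen pseudo-randomly but deterministically from just these two values.
num_shards = int(os.environ.get("TVM_NUM_SHARDS", "1"))
shard_index = int(os.environ.get("TVM_SHARD_INDEX", "0"))
print(should_run("tests/python/relay/test_op.py::test_add", shard_index, num_shards))
```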

cc @areusch

@driazati driazati force-pushed the generate_shard branch 5 times, most recently from 62aed5c to 9ee0878 on March 23, 2022 21:06
@driazati driazati changed the title from "generate shard" to "[ci] Remove hardcoded test shards" on Mar 23, 2022
@driazati driazati force-pushed the generate_shard branch 16 times, most recently from 1676f79 to 05cf7cc on March 25, 2022 22:59
@driazati driazati marked this pull request as ready for review March 25, 2022 23:00
@github-actions github-actions bot requested a review from areusch March 25, 2022 23:03
@driazati driazati force-pushed the generate_shard branch 2 times, most recently from ea246e1 to e8dcd19 on March 29, 2022 20:25
@driazati driazati marked this pull request as draft March 29, 2022 20:25
@driazati driazati force-pushed the generate_shard branch 3 times, most recently from e44ff6e to a26b0ce on March 29, 2022 20:46
@driazati driazati force-pushed the generate_shard branch 4 times, most recently from ebc594c to 920b6d7 on March 31, 2022 00:21
@areusch (Contributor) left a comment

thanks @driazati, this looks pretty good! just a couple of small suggestions, which we could also defer if you feel strongly against them.


{% macro sharded_test_step(name, num_shards, node, ws) %}
{% for shard_index in range(1, num_shards + 1) %}
  '{{ name }} {{ shard_index }} of {{ num_shards }}': {
@areusch (Contributor):

minor nit: want to 0-pad `shard_index` and `num_shards` here?

@driazati (Member, Author):

meh, this is intended for humans and everything is `text-align: center`-ed anyway, so 0-padding won't make it easier to read IMO (also we are at like 3 shards max, not 10, so we can revisit later)

@areusch (Contributor):

ok sg. the main intent was that Jenkins sorts these alphabetically iiuc.

@driazati (Member, Author):

ah I see, we should still be ok on that front but would probably need to pad if we shard more than n=9
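
The sorting concern is easy to see in plain Python: unpadded shard labels sort lexicographically, so padding would only start to matter at 10+ shards. A hypothetical illustration, not code from this PR:

```python
# Unpadded labels: "10" sorts between "1" and "2" lexicographically.
labels = [f"frontend: GPU {i} of 10" for i in range(1, 11)]
print(sorted(labels)[:3])
# ['frontend: GPU 1 of 10', 'frontend: GPU 10 of 10', 'frontend: GPU 2 of 10']

# Zero-padded labels restore the intended numeric order.
padded = [f"frontend: GPU {i:02d} of 10" for i in range(1, 11)]
print(sorted(padded)[:3])
# ['frontend: GPU 01 of 10', 'frontend: GPU 02 of 10', 'frontend: GPU 03 of 10']
```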

(Two further review threads on jenkins/macros.j2 were resolved; the comments are outdated.)
@driazati driazati force-pushed the generate_shard branch 3 times, most recently from 49ba711 to a69f079 on April 6, 2022 18:34
@driazati driazati marked this pull request as ready for review April 6, 2022 18:40
@driazati driazati requested a review from areusch April 6, 2022 18:40
driazati added a commit to driazati/tvm that referenced this pull request Apr 6, 2022
@driazati driazati force-pushed the generate_shard branch 2 times, most recently from 4b2b890 to 7b79899 on April 6, 2022 21:32
driazati added a commit to driazati/tvm that referenced this pull request Apr 6, 2022
@driazati driazati force-pushed the generate_shard branch 3 times, most recently from 4ef21fa to 4c84028 on April 7, 2022 19:33
@driazati driazati marked this pull request as draft April 7, 2022 19:33
@driazati driazati force-pushed the generate_shard branch 2 times, most recently from aa568d0 to 040be72 on April 7, 2022 22:06
@driazati driazati marked this pull request as ready for review April 8, 2022 00:58
@driazati (Member, Author) commented Apr 8, 2022

I also verified the tests run in the various shards against the tests from a recent main run; the overall set of tests run is the same, so the sharding is working correctly. Code is here https://gist.github.com/f636700cd68b5717350c107a0eaaee4e for the curious.
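
A sketch of the kind of cross-check described (the linked gist has the actual code; the file names here are hypothetical):

```python
# Compare the union of tests run across all shards against a baseline main run.
def read_tests(path: str) -> set:
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

baseline = read_tests("main_run_tests.txt")  # hypothetical: test ids, one per line
sharded = set().union(*(read_tests(f"shard_{i}_tests.txt") for i in range(3)))

print("missing from shards:", baseline - sharded)
print("extra in shards:", sharded - baseline)
assert baseline == sharded, "sharding dropped or added tests"
```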

@areusch (Contributor) left a comment

thanks @driazati, a couple of small comments here and there, but we could defer them as they're less important than reducing CI time


def pytest_collection_modifyitems(config, items):
    if not all(k in os.environ for k in ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]):
        # Only apportion tests if in CI and in a job that is set up for it
        return
@areusch (Contributor):

nit: could log here if CI is present
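
One way the suggested logging could look. A sketch only: the guard clause comes from the snippet above, the log call is the reviewer's suggestion, and the wording and use of the `logging` module are assumptions.

```python
import logging
import os

def pytest_collection_modifyitems(config, items):
    if not all(k in os.environ for k in ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]):
        # Log when a job runs unsharded because the sharding variables are
        # absent, so this is visible in CI output per the review suggestion.
        if "CI" in os.environ:
            logging.info("CI is set but TVM_NUM_SHARDS/TVM_SHARD_INDEX are not; running all tests")
        return
    # ... apportion tests between shards as in the hashing sketch above ...
```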

"tests/python/topi/python/test_topi_conv2d_winograd.py::test_conv2d_nchw",
"tests/python/relay/test_py_converter.py::test_global_recursion",
]
HARDCODED_ALLOCATIONS = {}
@areusch (Contributor):

nit: could do with a dict comprehension: `HARDCODED_ALLOCATIONS = {i: v for i, v in enumerate(_slowest_tests)}`
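
For context, the round-robin allocation of the slowest tests mentioned in the PR description could look roughly like this. `_slowest_tests` and `HARDCODED_ALLOCATIONS` come from the quoted snippet; keying the dict by test name (the reverse of the literal suggestion above) is an assumption that makes node-id lookups direct, and `shard_for` is a hypothetical helper.

```python
import hashlib

_slowest_tests = [
    "tests/python/topi/python/test_topi_conv2d_winograd.py::test_conv2d_nchw",
    "tests/python/relay/test_py_converter.py::test_global_recursion",
]
# Each slow test gets a sequential index, which round-robins onto a shard
# via `index % num_shards` below, keeping slow tests on separate shards.
HARDCODED_ALLOCATIONS = {test: i for i, test in enumerate(_slowest_tests)}

def shard_for(nodeid: str, num_shards: int) -> int:
    if nodeid in HARDCODED_ALLOCATIONS:
        return HARDCODED_ALLOCATIONS[nodeid] % num_shards
    return int(hashlib.md5(nodeid.encode()).hexdigest(), 16) % num_shards
```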

@areusch areusch merged commit 0c17f07 into apache:main Apr 8, 2022
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Apr 11, 2022
Lucien0 pushed a commit to Lucien0/tvm that referenced this pull request Apr 19, 2022
altanh pushed a commit to altanh/tvm that referenced this pull request Apr 28, 2022