[ci] Remove hardcoded test shards #10743
thanks @driazati this looks pretty good! just a couple small suggestions, could also defer if you feel strongly against them.
{% macro sharded_test_step(name, num_shards, node, ws) %}
{% for shard_index in range(1, num_shards + 1) %}
'{{ name }} {{ shard_index }} of {{ num_shards }}': {
minor nit: want to 0-pad shard_index and num_shards here?
meh, this is intended for humans and everything is `text-align: center`-ed anyways so 0-padding won't make it easier to read IMO (also we are at like 3 shards max not 10 so we can revisit later)
ok sg. the main intent was that Jenkins sorts these alphabetically iiuc.
ah I see, we should still be ok on that front but would probably need to pad if we shard more than n=9
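To make the n=9 threshold concrete, here is a small sketch of how plain string sorting treats the labels once there are ten shards. The helper `shard_label` and the stage name `frontend: GPU` are made up for illustration; only the `name i of n` label shape comes from the macro in the diff, and how Jenkins actually orders stages is not verified here.

```python
def shard_label(name, shard_index, num_shards, pad=False):
    """Build a stage label like 'name 2 of 10', optionally zero-padded."""
    # Pad to the number of digits in num_shards so every label has equal width.
    width = len(str(num_shards)) if pad else 1
    return f"{name} {shard_index:0{width}d} of {num_shards:0{width}d}"


# "frontend: GPU" is a placeholder stage name.
plain = [shard_label("frontend: GPU", i, 10) for i in range(1, 11)]
padded = [shard_label("frontend: GPU", i, 10, pad=True) for i in range(1, 11)]

# Lexicographic order puts "10 of 10" right after "1 of 10" when unpadded,
# while the padded labels already sort in numeric order.
print(sorted(plain)[:3])
print(sorted(padded)[:3])
```

With nine or fewer shards both variants sort identically, which matches the conclusion above that padding can wait.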
This moves the sharding logic from being inlined in the Jenkinsfile to being templated, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will follow. This also changes the test allocation from manual balancing to random assignment between shards. Each shard needs to know only its shard number and the total number of shards; it then hashes each test and skips it unless the hash falls within its allocated tests. This breaks up related tests across shards, with the downside that any change to the number of shards will shuffle where the tests end up (ideally this is rare once we settle on a good number of shards to use). For now this only applies to the GPU frontend tests, but eventually we could expand it to more.
I also compared the tests that ran in the various shards against the tests from a recent main run; the overall set of tests run is the same, so the sharding is working correctly. Code is here https://gist.github.com/f636700cd68b5717350c107a0eaaee4e for the curious.
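A minimal sketch of the hash-based allocation described above, for readers who want to see the idea in isolation. The function names, the choice of SHA-256, and the 0-based shard index are assumptions for illustration; the diff excerpts in this thread do not show which hash or indexing `conftest.py` uses.

```python
import hashlib


def shard_for(nodeid: str, num_shards: int) -> int:
    """Map a pytest node ID to a shard index in [0, num_shards)."""
    # A stable digest is used instead of the built-in hash(), whose value for
    # strings changes between interpreter runs unless PYTHONHASHSEED is fixed.
    digest = hashlib.sha256(nodeid.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def select_for_shard(nodeids, shard_index, num_shards):
    """Return the node IDs that the given shard should run."""
    return [n for n in nodeids if shard_for(n, num_shards) == shard_index]


# Hypothetical node IDs, for illustration only.
tests = [f"tests/python/frontend/test_example.py::test_case_{i}" for i in range(20)]
shards = [select_for_shard(tests, i, 3) for i in range(3)]
```

Because each shard computes the same function over the same node IDs, every test lands in exactly one shard with no coordination between jobs. Changing `num_shards` changes the modulus, which is why the assignment reshuffles.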
thanks @driazati , couple small comments here and there but could defer them as they're less important than reducing CI time
def pytest_collection_modifyitems(config, items):
    if not all(k in os.environ for k in ["CI", "TVM_NUM_SHARDS", "TVM_SHARD_INDEX"]):
        # Only apportion tests if in CI and in a job that is set up for it
nit: could log here if CI is present
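One way the logging nit could look, as a sketch only: the helper name `sharding_enabled`, the logger, and the message text are hypothetical, and only the three environment variable names come from the diff.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Variable names taken from the diff excerpt above.
SHARD_VARS = ("TVM_NUM_SHARDS", "TVM_SHARD_INDEX")


def sharding_enabled(environ=os.environ):
    """Return True only when CI and both shard variables are set."""
    if "CI" not in environ:
        return False
    missing = [k for k in SHARD_VARS if k not in environ]
    if missing:
        # In CI but not configured for sharding: say so rather than skip silently.
        logger.info("CI is set but %s missing; running all tests", ", ".join(missing))
        return False
    return True
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dict, for example `sharding_enabled({"CI": "1"})`.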
    "tests/python/topi/python/test_topi_conv2d_winograd.py::test_conv2d_nchw",
    "tests/python/relay/test_py_converter.py::test_global_recursion",
]
HARDCODED_ALLOCATIONS = {}
nit: could do with dict comprehension: `HARDCODED_ALLOCATIONS = {i: v for i, v in enumerate(_slowest_tests)}`
This moves the sharding logic from being inlined in the Jenkinsfile to templated, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will work to match. This also changes the test allocation from a manual balancing before to be random between shards. Each shard needs to know only its shard number and the total number of shards, then it hashes each test and skips it unless that hash falls within its allocated tests. This breaks up related tests across shards but has the downside that any change to the number of shards will shuffle around where the tests end up (but ideally this is rare as we settle on a good number of shards to use). This only does this for the GPU frontend tests but eventually we could expand it to more.

Co-authored-by: driazati <driazati@users.noreply.github.com>
This moves the sharding logic from being inlined in the Jenkinsfile to being templated, so we can change just the number of shards and the test allocation in `conftest.py` and the Jenkinsfile will follow. This also changes the test allocation from manual balancing to random assignment between shards. Each shard needs to know only its shard number and the total number of shards; it then hashes each test and skips it unless the hash falls within its allocated tests. This breaks up related tests across shards, with the downside that any change to the number of shards will shuffle where the tests end up (ideally this is rare once we settle on a good number of shards to use). Some tests are also manually allocated via round-robin to different shards to ensure that long-running tests run on different shards as much as possible. For now this only applies to the GPU frontend tests, but eventually we could expand it to more.
cc @areusch