New scheduler for distribution of groups of related tests #191
Conversation
hi, i did a initial skim and this looks great, its a nice improvement 👍 and def should go insoon we should spend a bit of time to discuss/prepare for different definitions of suite, as for some people it more important to partition based on a session scoped parameterized fixture for example however currently xdist does not pass on parameter values and scopes when informing the master of tests - so this may turn out as a unnecessary/bad exercise |
Definitely, thanks @carlos-jenkins for the PR. I will take a closer look ASAP, possibly today. |
Seems good at first sight. Main thing I would say is that naming things is hard. pytest never really defined what a suite is, and searching the current pytest.org for "suite" seems to suggest it gets used very differently. Internally pytest calls these collectors, but when using --collect-only you see them as <Module>, <Class>, etc., so not really sure what to suggest.
Also, tests would be nice of course :)
xdist/dsession.py
Outdated
@@ -791,3 +366,6 @@ def pytest_testnodedown(self, node, error):
# def pytest_xdist_rsyncfinish(self, source, gateways):
#     targets = ", ".join(["[%s]" % gw.id for gw in gateways])
#     self.write_line("rsyncfinish: %s -> %s" %(source, targets))

__all__ = ['DSession']
I'm not too keen on starting to add __all__ variables to xdist modules. These are all internal modules and maintain no outward API, so I don't see what benefit this brings. This comment obviously applies everywhere __all__ is used.
I agree, I don't think xdist is mature enough to promise any API other than what we promise through hooks.
xdist/scheduler/loadsuite.py
Outdated
class LoadSuiteScheduling:
    """
    Implement load scheduling, but grouping by suite, across nodes.
Tiny irrelevant nitpicking, but this probably wants to be on the line above with the opening quotes as is done in the other modules. Other docstrings below as well.
I understand the need for consistency in the codebase and for PEP 257 compliance. Will change it.
xdist/scheduler/loadsuite.py
Outdated
if log is None:
    self.log = Producer('loadsuitesched')
else:
    self.log = log.loadsuitesched
I'm not entirely sure what object is supposed to be passed, but this looks a little suspicious to me. If you're confident it's right then never mind.
I'm not confident about it either. I just followed the structure of the other schedulers to be sure:

pytest-xdist/xdist/dsession.py, line 37 in 144b37a:
self.log = log.eachsched

pytest-xdist/xdist/dsession.py, line 193 in 144b37a:
self.log = log.loadsched

The documentation says:

:log: A py.log.Producer instance.

which is this one (and marked Deprecated). It seems that attribute access (__getattr__) triggers the creation of a new logger using the attribute name as argument to the constructor. I'd prefer to leave it for consistency with the other schedulers.
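The pattern being described can be sketched with a minimal stand-in (this is an illustrative mimic of the behavior attributed to py.log.Producer above, not the real implementation; the class and attribute names here are assumptions):

```python
class Producer:
    """Toy logger: attribute access mints a child logger named after
    the attribute, mimicking the behavior described above."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Only called for attributes not found normally, so accessing
        # `name` never recurses into here.
        return Producer("%s:%s" % (self.name, attr))


log = Producer("xdist")
print(log.loadsuitesched.name)  # -> xdist:loadsuitesched
```

So `self.log = log.loadsuitesched` effectively creates a sub-logger keyed by the scheduler's name, which is why the pattern repeats across schedulers.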
Also, as the others said: great work and thanks for the PR!
As a follow-up on the whole suite thing and Ronny's comment on it, just to get some random (maybe useless) ideas started: how about considering a command-line option which takes a list of the collection IDs which should be run on one node? That is, if you have a structure like
we should mark this as experimental and break it if the need arises (reminds me that we should settle experimentation properly)
All in all excellent work! 👍
I left some comments for discussion.
xdist/dsession.py
Outdated
'load': LoadScheduling,
'loadsuite': LoadSuiteScheduling
}
return schedulers[dist](config, log)
I wonder if we should make this experiment a separate plugin instead? pytest-xdist-loadsuite, for example. The addition of pytest_xdist_make_scheduler was done exactly to allow this kind of experimentation. 😁
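As a rough sketch of what such a separate plugin could look like (the hook name comes from the comment above; the scheduler class is a stand-in for the one added by this PR, and FakeConfig is an illustrative substitute for pytest's real config object):

```python
class LoadSuiteScheduling:
    """Stand-in for the scheduler class added by this PR."""

    def __init__(self, config, log):
        self.config = config
        self.log = log


def pytest_xdist_make_scheduler(config, log):
    # xdist calls this hook to obtain a scheduler; returning None
    # falls back to the builtin schedulers.
    if config.getoption("dist") == "loadsuite":
        return LoadSuiteScheduling(config, log)
    return None


class FakeConfig:
    """Illustrative substitute for pytest's config object."""

    def __init__(self, dist):
        self.dist = dist

    def getoption(self, name):
        return self.dist
```

A plugin shipping only this hook (plus the real scheduler class) would be enough to add the new distribution mode without touching xdist's core.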
Up to you, no problem on my part either. I really think this would add value to the xdist core, but as you prefer :)
Actually it seems this is something everybody wants, so let's continue the discussion. Did you read the comments about using a marker to customize this behavior? Do you have an opinion on that?
xdist/scheduler/loadsuite.py
Outdated
self.numnodes = len(parse_spec_config(config))
self.collection = None

self.workqueue = OrderedDict()
About the question regarding OrderedDict vs Python 2.6: IMO it would be acceptable for py26 users to get an error message if they try to use this new scheduler, "loadsuite scheduler is not supported in Python 2.6" or something like this. I think this is acceptable because py26 users will still have access to this plugin and the other schedule modes, plus we will probably drop support for py26 somewhat soonish.
Also, another and simpler alternative is to depend on ordereddict from PyPI for py26 only, as we did recently in pytest-dev/pytest#2617.
Done! Modified the setup.py to add the dependency conditionally.
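One common way such a conditional dependency can be expressed in setup.py is sketched below (the base requirement shown is illustrative, not the PR's exact setup.py; the ordereddict package backports collections.OrderedDict to Python 2.6):

```python
import sys

# Illustrative base dependency list; xdist's real setup.py differs.
install_requires = ["execnet>=1.1"]

# collections.OrderedDict is missing from the Python 2.6 stdlib, so
# pull in the PyPI backport only on that interpreter.
if sys.version_info[:2] == (2, 6):
    install_requires.append("ordereddict")
```

On 2.7+ the list stays unchanged, so the extra dependency never reaches modern interpreters.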
xdist/scheduler/loadsuite.py
Outdated
(...)
}

:assigned_work: Ordered dictionary that maps worker nodes with their
Great docs!
xdist/scheduler/loadsuite.py
Outdated
self.registered_collections[node] = list(collection)
return

# A new node has been added later, perhaps an original one died.
When this happens, shouldn't we register the new node and its collection, like it is done in line 225 above?
You're right, I mistranslated from the load scheduler. Will fix it.
xdist/scheduler/loadsuite.py
Outdated
self.assigned_work[node][suite][nodeid] = True
self._reschedule(node)

def _asign_work_unit(self, node):
typo: assign
Thanks!
xdist/scheduler/loadsuite.py
Outdated
assigned_to_node[suite] = work_unit

# Ask the node to execute the workload
worker_collection = self.registered_collections[node]
Sorry if I'm missing something, but wouldn't it be simpler to keep a single dict mapping nodeid to its completed status, instead of each node having its own separate data structure? We already guarantee that all workers have collected the exact same items, so we can use that assumption to drop registered_collections. The fewer data structures to keep in sync, the better, IMHO. I think you can change self.collection into a dict of suite -> list of item ids.
Following the life cycle of these data structures is a little complex. In particular, registered_collections is filled on initialization, while the nodes start and the structure is being constructed. On the other hand, collection acts as a Boolean saying both that the collection of tests in all nodes was completed and that the initial distribution of work has been done. You cannot overload the semantics of both into one.

In the particular line of code that you commented on, we can replace worker_collection = self.registered_collections[node] with self.collection without problem; both will be equivalent. But I preferred to grab the index from the node itself for correctness, and to make it clearer that this needs to happen after the node was registered (independently of whether it was registered on initialization or later).

I think you can change self.collection into a dict of suite -> list of item ids.

We could, but because of what I wrote in the first paragraph you will still need a structure to build while initializing, and a second one / a boolean to indicate initialization is done; and if you look, there are other utilitarian uses of self.collection that justify its existence.
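To make the suite -> item ids idea concrete, the grouping step itself (splitting each nodeid on the first `::`, as the PR description explains) can be sketched roughly like this (function name and return shape are assumptions for illustration, not the PR's exact code):

```python
from collections import OrderedDict


def group_by_suite(nodeids):
    """Group test ids into 'work units' keyed by the suite part of the
    nodeid (everything before the first '::'), preserving collection
    order. Each work unit maps nodeid -> completed flag."""
    workqueue = OrderedDict()
    for nodeid in nodeids:
        suite = nodeid.split("::", 1)[0]
        workqueue.setdefault(suite, OrderedDict())[nodeid] = False
    return workqueue


units = group_by_suite([
    "tests/test_alpha.py::test_one",
    "tests/test_alpha.py::test_two",
    "tests/test_beta.py::test_one",
])
print(list(units))  # -> ['tests/test_alpha.py', 'tests/test_beta.py']
```

The scheduler then hands out whole work units, so all tests sharing a module-level fixture land on the same worker.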
xdist/scheduler/loadsuite.py
Outdated
self._reschedule(node)
return

# XXX allow nodes to have different collections
This comment can probably be removed
Done.
xdist/scheduler/loadsuite.py
Outdated
# Avoid having more workers than work
if len(self.workqueue) < len(self.nodes):
    # FIXME: Can we handle this better?
How about we shut down the nodes in excess instead of raising an error?
How can I do that? node.shutdown()?
I tested calling shutdown on the extra nodes and it seems to work correctly:
72ac6a3#diff-5dfc52f66c10897e228ed2af4fc310ceR351
The nodes don't die until the suite has finished, but it seems they die OK.
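The approach under discussion can be sketched as a small helper (all names here are hypothetical; the node object is a stand-in exposing only the `shutdown()` call mentioned above):

```python
class FakeNode:
    """Illustrative stand-in for an xdist worker node."""

    def __init__(self, name):
        self.name = name
        self.shut_down = False

    def shutdown(self):
        # On a real node this asks the worker to exit after its
        # current work; here we just record the request.
        self.shut_down = True


def shutdown_excess_nodes(nodes, workqueue):
    """If there are more worker nodes than pending work units, ask the
    surplus nodes to shut down instead of raising an error, and return
    the nodes that were kept."""
    keep = len(workqueue)
    for node in nodes[keep:]:
        node.shutdown()
    return nodes[:keep]
```

This matches the observed behavior: surplus workers keep running until their shutdown completes rather than dying immediately.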
I'm not sure; this kind of information seems to me like it should be defined in the tests themselves, not controlled by the command line. How about we use a mark to group tests, for example Thinking a little further, having the
You're entirely right, markers are a much better way to go about this.
(force-pushed 337eadc to a7e68f9)
Agree. "Suite" has different meanings depending on the context and the person you ask. I'll change the name of everything to
Implementing grouping using either a command line option or markers puts a lot of burden on the user. Consider the options:

This will imply modifying the tox file, Jenkins file, or whatever triggers the execution of the tests each time a test is added. In our case, we have many colleagues working and adding tests every day. Not only will the command line option become very, very large, it will become unmaintainable.

This will also imply that all users must add the In addition, we currently do not pass more than the list of collected tests between nodes; adding introspection to the tests to check if they are marked will imply a large change to the architecture of how xdist currently works, and doesn't add that much value. So, in order to make it "generic" we are putting a lot of burden on the user. Let's return to what we are trying to solve. #18 lists 3 use cases:

The third I'll discuss later, but the first two are basically the reuse of fixtures. Currently pytest has 4 scopes for fixtures:

Let's analyze them one by one:

In consequence, grouping by fixture scope is completely supported by this scheduler. So, what about the third use case?

That one is arbitrary grouping, which implies markers, which is what you propose. My first impression is: do we really need that? Is that a real use case? For what reasons do we need it, other than hypothetical ones? Plus, #18 suggests a solution:

This means that you must use a fixture to group them: a scoped fixture, either by class or module, thus basically implying that we need grouping by those scopes, which is what this PR implements. I added to the example suite the

Because this algorithm groups by the first My 2 cents. Regards and thanks for the comments.
(force-pushed 8364dcf to da48e21)
Hi, I just pushed a version with corrections for all the feedback provided, even changing the name to 94c1425#diff-13d1cb017e0edf623ce2c3a8a49f9569R278 I changed the With this, I think only two things are missing:
Please let me know if you want to move this implementation forward. Thanks.
Please , let me know if you want to move this implementation forward. Thanks. |
(force-pushed da48e21 to b2b9d74)
Hi @carlos-jenkins, first of all thanks for the detailed response. We appreciate both your efforts on this PR and your patience in following up on the discussion! 👍
You are absolutely right that forcing users to manually keep track of the marking is a huge maintainability problem. I should have mentioned that this marker can be added automatically by a simple hook:

def pytest_collection_modifyitems(items):
    for item in items:
        suite_name = item.module.fspath
        item.add_marker(pytest.mark.xdist_group(suite_name))

The same can easily be extended to group by class as well, or any other way a user needs. Also, I'm not suggesting that users should implement this hook themselves: by implementing the distribution scheduling in a more general way, we can provide builtin options in xdist which use the idea of the hook above to set markers on items automatically, while still allowing users to implement more complex scheduling themselves if they want to. For example, we could create a command-line option which controls how tests should be grouped by default:
and so on.
Indeed we would have to change some things in the architecture, but unless I'm mistaken they would be minimal changes: currently the workers send just a list of item ids to the master; to be able to handle the marker idea, we would in addition have to send a dict of "test id -> marker group name". The workers can easily obtain the value of the Now that I think more about it, we don't even need a separate scheduler; we might be able to change the default
This is only true if the session fixture is used by all the tests, like an Here's a use case based on a session-scoped fixture: suppose you have a session fixture which is very costly to create (it starts up a server inside a docker container), but this session fixture is only used by a small portion of your tests (say 10%). In the current state of things, if you spawn 8 workers, chances are that you will end up spinning up 8 of those fixtures, because the tests which use this fixture will most likely go to separate workers. With the

def pytest_collection_modifyitems(items):
    for item in items:
        if 'docker_server' in item.fixturenames:
            item.add_marker(pytest.mark.xdist_group('docker-server'))

So all tests which use this fixture (wherever they are defined) will go to the same worker, spinning up a single server.

Another example: in a test suite a few tests (8 out of 400) take a very long time (8+ minutes). Those slow tests don't have much in common; they are slow because they run numerical simulations which, due to their parameters, take a lot of time. When we initially put them in xdist, unfortunately it happened that a lot of those tests would end up in the same worker. For example, 6 of those tests might land on the same worker, because xdist currently is not deterministic when deciding which test goes where, so a single worker might end up taking 48min (8x6) to finish, while the other workers will have finished their tests long before that. By marking them with a custom mark (say

def pytest_collection_modifyitems(items):
    numprocesses = items[0].config.numprocesses
    current_worker = 0
    for item in items:
        m = item.get_marker('very_slow')
        if m:
            item.add_marker(pytest.mark.xdist_group(f'worker-{current_worker}'))
            current_worker += 1
            if current_worker >= numprocesses:
                current_worker = 0

This will distribute the slow tests evenly across the available workers. The examples above all came from work, and we had to implement ugly hacks to get around them.

@carlos-jenkins I hope I'm not coming across as trying to shoot down your efforts; I'm just trying to brainstorm with everyone involved here to see if perhaps we can get to a better/more general implementation.

Guys, if we can't get to a consensus now (I mean everyone involved in the discussion), I would rather we work towards merging this PR as it stands, because it will improve how things currently work. Also, I think it would be possible to implement this idea of using markers afterwards with some internal refactoring, and without breaking the behavior being introduced here. Again @carlos-jenkins thanks for your patience!
@carlos-jenkins thanks for your updates. In the near future I'd like to get interested people together for changing the shape; currently I'm already preparing to split boxed/looponfail out of xdist and to set up more comprehensible state machines for xdist; afterwards I'd like to extend scheduling for per-file, per-fixture and similar details.
Hello, somehow I don't find this outcome very satisfying. Here we had code that works right now and solves a sub-set of the problem, so can't we find a way to use it? We all have grand plans for many things but realistically also know they only rarely get realised.

If we were to add this code as-is to the core now, we'd have a new scheduler type. We'd just need to be a little careful about naming it, but it would work just fine. It is my understanding that if we then later add a mark-scheduler we do not have to do anything else and there would be no conflict. So I'm not actually sure there's any practical downside to adding this now at all. Did I overlook something?

An alternative approach which was mentioned, I think, and may not be too much extra work, is to turn this scheduler into a plugin. I'd then update the standard documentation to point to this plugin, and it would be a great example for other people who want to customise their own scheduling. Again, I've not looked into the practicalities of this, but what are the changes needed to turn this into a plugin?

Cheers
I agree, it was not my intention to just shut down the PR at all. I believe we can add this to the core and refactor things later to support the mark-based distribution idea. Creating a separate plugin with this functionality is also possible, I think.
(Oops, clicked on the wrong button)
then let's flag it as experimental and merge? @carlos-jenkins would you be OK with such an integration?
@RonnyPfannschmidt sure, for me this is fine. We can pin the hash of the merge in our
Thanks @carlos-jenkins! Before release, how about we create a separate option --group-by, which is used when --dist is load? By default, group-by would be none, which sends tests individually to workers, and group-by=module provides the functionality of this PR. Having a separate option would allow us to extend it to support grouping by classes in the future. So this:
Becomes:
What do you guys think?
I think that is perfect @nicoddemus. Just a detail: the current implementation groups by the last
Since this works right now, I propose merging first as an experimental feature, then pushing changes on top.
I think we should at least add some functional tests and a changelog entry presenting the feature before merging.
Oh right. Hmm, not sure how to name it then; Or we can leave it as But as @RonnyPfannschmidt mentioned, we can leave the bikeshedding of the option name for later.
On 31 July 2017 at 18:37, Bruno Oliveira ***@***.***> wrote:
Before release, how about we create a separate option --group-by, which
is used when --dist is load? By default, group-by would be none which
sends tests individually to workers, and group-by=module provides the
functionality of this PR. Having a separate option would allow us to extend
the option to support grouping by classes in the future.
So this:
pytest --dist=loadsuite ...
Becomes:
pytest --dist=load --group-by=module ...
What do you guys think?
I'm not very keen on this. It assumes that the next
scheduler feature will want a different kind of grouping. If we do this as
just a new scheduler name, e.g. `--dist=loadbunch` or so, then we don't
introduce anything new to any other part of xdist. If we later add another
grouping and it still makes sense to add `--group-by`, then we can add it at
that point.
Anyway, happy to be outvoted though ;-)
No problem in adding this later; I guess it is acceptable to break the UI later for an experimental feature. 👍
On 31 July 2017 at 21:27, Bruno Oliveira ***@***.***> wrote:
No problem in adding this later, I guess it is acceptable we break the UI
later for an experimental feature.
This wouldn't break the API in the future either; the feature would only
gain more than one way to be activated: both `--dist=load --group-by=xxx`
and `--dist=loadxxx`.
@flub you are right, a separate
Hi guys, I'm writing an acceptance test for this feature so we can get it in, but I'm not getting the results I was expecting. Consider 3 test files, each containing:

import pytest

@pytest.mark.parametrize('i', range(10))
def test(i):
    pass

Running:
I get this output:
As can be seen, tests from
Hi @nicoddemus, thanks for the effort on testing this. I've noted that you're using
The correct way to call it is with:
You should be able to read:
Rá, indeed! Thanks! I will push my tests to your branch soon. 😁
Pushed two very basic functional tests for I didn't add any docs yet.
If you guys are OK with the implementation so far, I guess we can merge it now that we have some regression tests in place.
@nicoddemus thanks for tying up @carlos-jenkins' fabulous work. We now need to decide how to document this as experimental and how to communicate/implement eventual changes to it.
Hi, this PR is a possible fix for #18 (and its duplicate #84).

As stated in #18, the current implementation of the LoadScheduler distributes tests without taking their relation into account. In particular, if the user is using a module level fixture that performs a large amount of work, distribution in this manner will trigger the fixture in each node, causing a large overhead.

In my case, we use the Topology Modular Framework to build a topology of nodes. When using the topology_docker builder, each module will start a set of Docker nodes, connect them, initialize them, and configure them; only then can the testing start. When running on other builders, we have an even larger overhead for building and destroying the topology.

With this new scheduler, the tests will be aggregated by suite (basically anything before the :: in the nodeid of a test). I called these chunks of related tests "work units". The scheduler then distributes complete work units to the workers, thus triggering the module level fixtures only once per worker.

We are running a test suite of more than 150 large tests in series that is taking 20 hours. We tried running it with the xdist LoadScheduler and it took even more (30 hours, and then we stopped it). With this change we are able to scale the solution, and with just 4 workers we were able to reduce it to 5 hours.

I've included an example test suite using the loadsuite scheduler in examples/loadsuite that shows 3 simple suites (alpha, beta, gamma), each taking a total of 50 seconds to complete. The results are as follows:

This PR still requires a little bit of work, which I'm willing to do with your guidance. In particular:

- I used OrderedDicts to keep track of ordering, but they aren't available on Python 2.6. There is a PyPI package for Python 2.6 and below that includes them, but I was hesitant to add it, or maybe to add it conditionally for Python 2.6 only.
- len(workers) > len(suites). Currently we are just raising a RuntimeException.

Thank you very much for your consideration of this PR.