Review before/after feature behavior #124
-
My goal would be to remove the dependency on the test runners' (MsTest, xUnit, NUnit) behavior for the feature hooks. This could be achieved in the following way:
My other idea would be to deprecate this feature. The behavior can be simulated with before/after scenario hooks anyway if necessary, as sketched below. What do you think?
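To illustrate the "simulate with before/after scenario" idea, here is a minimal sketch. It assumes Reqnroll keeps the SpecFlow-style `[Binding]`/`[BeforeScenario]` attributes, constructor injection of `FeatureContext`, and its string-keyed dictionary behavior; the `"IsInitialized"` key is purely illustrative.

```csharp
using Reqnroll; // assumption: SpecFlow-style attributes and contexts under the Reqnroll namespace

[Binding]
public class FeatureSetupSimulation
{
    private readonly FeatureContext _featureContext;

    public FeatureSetupSimulation(FeatureContext featureContext)
    {
        _featureContext = featureContext;
    }

    [BeforeScenario]
    public void EnsureFeatureLevelSetup()
    {
        // Run the expensive setup only once per FeatureContext lifetime.
        // "IsInitialized" is just an illustrative key, not an existing Reqnroll concept.
        if (!_featureContext.ContainsKey("IsInitialized"))
        {
            _featureContext["IsInitialized"] = true;
            // ... expensive, feature-scoped initialization goes here ...
        }
    }
}
```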
-
I understand your concern and I am fine with finding another name, but "execution context" is a term used for async and it is not exactly the same as ours (although it is similar), so that would be confusing as well. So if you have a good suggestion, please share it. I will also think about that. Unfortunately it is also in the API (like
In code-based test tools you can declare and use fields, so there the class-level hooks can be used to initialize and manipulate these (see the sketch below). In Gherkin files you cannot declare variables and the steps are not tied to a particular feature file, so the usage is more limited and the analogy does not work fully.
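As an illustration of the code-based side of the analogy, here is a plain NUnit fixture (no Reqnroll involved) where a class-level hook initializes a field shared by the tests; `ExpensiveService` is a made-up dependency standing in for anything costly to create.

```csharp
using System;
using NUnit.Framework;

[TestFixture]
public class InvoiceTests
{
    // A field declared on the test class; every test of this fixture can use it.
    private ExpensiveService _service;

    [OneTimeSetUp]
    public void InitializeSharedState()
    {
        // The class-level hook initializes the shared field once for the whole fixture.
        _service = new ExpensiveService();
    }

    [OneTimeTearDown]
    public void CleanUpSharedState() => _service.Dispose();

    [Test]
    public void Calculates_total()
    {
        Assert.That(_service.CalculateTotal(42m), Is.EqualTo(42m));
    }
}

// Made-up expensive dependency used only for the illustration.
public class ExpensiveService : IDisposable
{
    public decimal CalculateTotal(decimal amount) => amount;
    public void Dispose() { }
}
```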
Unfortunately this is not enough. The example usages of the before/after feature hooks that I have seen (before parallel times) were that they
I'm not saying that these are good usages, but I have seen people using the hooks this way. For these cases it is not enough if the before feature hook runs some time (but not necessarily directly) before the first scenario in that feature file; they need it to run directly before the first scenario. If we want to apply a more relaxed specification (with the "once before" semantics), we should find a good use case where this is useful, otherwise we just break the current one without adding a benefit. And this is where I usually get stuck. I can't really find a good use case for before/after feature hooks that makes sense in modern execution environments (parallel, random order).
I also don't have a strong opinion here. But I think the tighter integration with the test framework is what we have now: we rely on the test framework infrastructure, apply workarounds for the unmatching behavior of MsTest, and need to store the test runner in a static field to ensure that the feature context and the scenario context are linked (they need to run on the same test runner instance), roughly as in the sketch below. So the question is: if we keep the tighter integration with the test framework, how could we improve the current situation? Do you have some ideas?
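For readers less familiar with the generated code, here is a simplified, hypothetical illustration of that coupling. It is not the actual Reqnroll code generator output; `GetTestRunnerForThisWorker` is a placeholder, and `ITestRunner` is assumed to live in the `Reqnroll` namespace as it did in SpecFlow.

```csharp
using System;
using Reqnroll; // assumption: ITestRunner is available here

public class CalculatorFeatureTests // one generated test class per feature file
{
    // The class-level setup of the test framework is static (or behaves statically),
    // so the ITestRunner has to be parked in a static field to reach the scenarios.
    private static ITestRunner testRunner;

    // Corresponds to MsTest [ClassInitialize] / NUnit [OneTimeSetUp] / xUnit fixture setup.
    public static void FeatureSetup()
    {
        testRunner = GetTestRunnerForThisWorker();
        // The before-feature hooks run on this runner and populate its FeatureContext.
    }

    public void Scenario_AddTwoNumbers()
    {
        // The scenario must run on the very same runner instance; otherwise the data
        // stored in the FeatureContext by the before-feature hook is not visible here.
        _ = testRunner;
    }

    // Placeholder: stands in for whatever mechanism assigns a runner to a test-thread.
    private static ITestRunner GetTestRunnerForThisWorker() =>
        throw new NotImplementedException();
}
```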
-
Yet another try to describe the before/after feature hook problem after some analysis.

Let's start with the specification of the before/after feature hooks: the before/after feature hooks run directly before and after a set of scenarios that are placed, and only placed, in the same feature file. The hooks can use a storage (FeatureContext) whose lifetime spans between the pairing before and after feature hooks. A consequence of this is that the hooks of a particular feature might be executed multiple times (and therefore have multiple FeatureContext instances) if the scenarios of that feature file are not executed in a single block. This specification requires a local sequence (in order to identify the "set of scenarios that are placed and only placed in the same feature file"), and that requires the test-thread (worker) concept in the end.

The technical implementation of the feature hooks currently: in order to trigger the before/after feature hooks we used the class-level setup/teardown infrastructure of the test execution frameworks, but this works differently for all 3 frameworks (and probably also depends on which parallel model you use with them). One of the key challenges here is how we connect the data that the user saved in the before feature hook to the execution of the scenario. So far (at least for xUnit and MsTest) we used static fields to store this data (by saving the TestRunner into a static field; the TestRunner is unique per test-thread (worker)). (The MsTest "ClassInitialize" method has to be static, so there were not many other options.) This technical implementation was good as long as the parallelization was per-class at most, because the test executions of the same class could happily work together on the same static TestRunner. This is obviously not true for test-level parallelization.

Side note: the PR #119 of @obligaron uses a separate static field for storing the TestRunner for the before/after feature hook, but this is not sufficient, because it might happen that the data you save into the FeatureContext in the before feature hook is not "visible" during the scenario execution.

Maybe too early to conclude here, but I can see the following possible options:
I feel that many of you would tend to go for option 2), and I don't have any problem with that in principle. As far as I understood, your main arguments are basically about making the Reqnroll codebase and the concepts simpler. I can agree with this, but this is really a tough decision, not from the technical point of view, but from a project & product perspective. My fears about this are the following:
If we want to have simplicity, I would rather go for option 3). That is a clearer situation and - as you have mentioned around the resource management - we can provide examples of pooling, etc. where the same behavior can be achieved without feature hooks (see the sketch below). Option 1) can be used as a quick win: it simplifies the code generation, works with scenario-level parallelism, is easy, and is not a breaking change. But it keeps the complexity and the "cognitive load" of the test-thread (worker) concept. It can also be seen as a "delaying" option, as we can later still go for option 2) or 3).
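A minimal sketch of the pooling idea, assuming Reqnroll keeps the SpecFlow-style `[Binding]`/`[BeforeScenario]` attributes, constructor injection of `FeatureContext`/`ScenarioContext`, and their dictionary-like behavior; `DatabasePool` and `DatabaseFixture` are made-up names used only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using Reqnroll; // assumption: SpecFlow-style attributes and contexts under the Reqnroll namespace

[Binding]
public class DatabasePool
{
    // Expensive resources are created lazily and shared across scenarios,
    // so no before/after feature hook is needed.
    private static readonly ConcurrentDictionary<string, Lazy<DatabaseFixture>> Pool =
        new ConcurrentDictionary<string, Lazy<DatabaseFixture>>();

    private readonly FeatureContext _featureContext;
    private readonly ScenarioContext _scenarioContext;

    public DatabasePool(FeatureContext featureContext, ScenarioContext scenarioContext)
    {
        _featureContext = featureContext;
        _scenarioContext = scenarioContext;
    }

    [BeforeScenario]
    public void ProvideDatabase()
    {
        // One fixture per feature title, created on first use and reused afterwards,
        // regardless of execution order or parallelism.
        var fixture = Pool.GetOrAdd(
            _featureContext.FeatureInfo.Title,
            _ => new Lazy<DatabaseFixture>(() => new DatabaseFixture())).Value;

        _scenarioContext["Database"] = fixture;
    }
}

// Made-up expensive resource; its constructor would do the costly setup.
public class DatabaseFixture { }
```

Disposal of the pooled fixtures could then happen once at the end of the test run (e.g. in an after-test-run hook) instead of per feature file.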
-
This issue (#189) seems relevant to the discussion.
-
It seems that the current before/after feature behavior has some strange aspects, so this discussion can be used to decide on what we should do with this feature.
Here is the current behavior (discussed in #119, but reviewed again here).
Note: I will often refer to the term test-thread below. As the name is a bit misleading, here is its definition: Reqnroll somehow assigns the test execution requests to test-threads. These are basically logical flows (not physical threads!) and Reqnroll ensures that the test executions within the same test-thread are not started in parallel. (How it ensures that is out of scope here, and it might depend on the test runner framework, but this is given.) Of course in an unlucky situation it can happen that every scenario will be executed in a unique test-thread, so there is no guarantee that it is "optimal", but usually it is.
- The `ITestRunner` interface is the representation of the execution methods of a particular test-thread, so we have as many `TestRunner` instances as the number of test-threads.
- Currently the hooks are triggered by the class-level setup/teardown methods of the test runners (MsTest, xUnit, NUnit). These methods are static (or have static behavior).
- For Reqnroll to behave correctly, the before/after feature hook must run on the same test-thread as the related scenario executions and hooks. (Otherwise some strange null reference exceptions occur.)
- As the test runners might use different tactics to run the class-level setup/teardown methods and the tests, especially in parallel situations, it is difficult to ensure that Reqnroll assigns these executions to the same test-thread. The way is slightly different for each test runner, but mainly it works by saving the `ITestRunner` to a static field during the class-level setup method and reusing it for the test execution.
- MsTest has a special handling of the class-level teardown method, because it might be called delayed, so in some situations a scenario of the next class might arrive at the test-thread faster than the teardown of the previous one. To handle this, there is a special trick implemented in `TestExecutionEngine.OnFeatureStartAsync`: if the previous feature was not finished, we finish that first (see the sketch after this list).
- So altogether the behavior is pretty complex and it depends heavily on the behavior of the test runners (which might even change).
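To make the MsTest workaround easier to follow, here is a rough, self-contained sketch of the idea. It is not the actual `TestExecutionEngine` code; all member names are illustrative and the real implementation differs in its details.

```csharp
using System.Threading.Tasks;

// Illustrative only: "finish the previous feature first" before starting the next one.
public class FeatureLifecycleSketch
{
    // Title of the feature whose teardown the test framework has not executed yet.
    private string _pendingFeature;

    public async Task OnFeatureStartAsync(string featureTitle)
    {
        if (_pendingFeature != null)
        {
            // MsTest may call the class-level teardown late, so the previous feature can
            // still be "open" on this test-thread; close it before starting the new one.
            await OnFeatureEndAsync();
        }

        _pendingFeature = featureTitle;
        // ... create the FeatureContext and fire the before-feature hooks here ...
    }

    public Task OnFeatureEndAsync()
    {
        // ... fire the after-feature hooks and dispose the FeatureContext here ...
        _pendingFeature = null;
        return Task.CompletedTask;
    }
}
```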
There are some important consequences of the fact that Reqnroll requires the before/after feature hook to be run on the same test-thread as the related scenarios:
Originally, the before/after feature hooks were designed for caching some automation infrastructure elements that are expensive to initialize and related to a particular feature file (it is not easy to find a good example though).
With modern test runners (parallel execution, randomized order), this benefit is quite limited, because the scenarios of the same feature file usually do not run in a contiguous block anyway. But of course this depends on the runner configuration as well.
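For illustration, here is a contrived example of that originally intended usage. It assumes the SpecFlow-style static `[BeforeFeature]`/`[AfterFeature]` hooks with `FeatureContext` parameter injection are still available in Reqnroll; `ReportingServer` is a made-up expensive dependency.

```csharp
using System;
using Reqnroll; // assumption: SpecFlow-style hook attributes under the Reqnroll namespace

[Binding]
public class ReportingFeatureHooks
{
    // Feature hooks are static, so the cached resource is handed over via the FeatureContext.
    [BeforeFeature]
    public static void StartReportingServer(FeatureContext featureContext)
    {
        featureContext["ReportingServer"] = new ReportingServer(); // expensive to start
    }

    [AfterFeature]
    public static void StopReportingServer(FeatureContext featureContext)
    {
        ((ReportingServer)featureContext["ReportingServer"]).Dispose();
    }
}

// Made-up expensive dependency used only for the illustration.
public class ReportingServer : IDisposable
{
    public void Dispose() { /* shut down */ }
}
```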
//CC @Code-Grump @obligaron