Add staging pipeline to PRs #45178
Conversation
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @ViktorHofer
@@ -0,0 +1,110 @@
# The staging pipeline only makes sense to run it on PRs.
trigger: none
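As a minimal sketch, a PR-only Azure Pipelines trigger block of this kind typically looks like the following (branch names here are illustrative assumptions, not copied from the actual file):

```yaml
# Illustrative sketch of a PR-only pipeline: no CI (rolling) trigger,
# only a PR trigger. Branch names are assumptions for this example.
trigger: none   # disables rolling/CI builds on push

pr:
  branches:
    include:
    - main      # run the pipeline only for PRs targeting main
```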
I disabled batched rolling builds for the staging pipeline as it didn't make sense to have it. However I did think that it might be useful for people trying to get a new platform stable, to have rolling builds that do fail the build when there are test failures.
Opinions on that?
I must admit I'm not a big fan of this idea, for two reasons. First, subtle differences between PR and rolling builds are enormously cryptic for most developers; I myself have learned to appreciate and leverage them in the runtime repo, but only after having been involved in infrastructure for quite some time. Second, according to my understanding, the conceptual thinking is that failures in the staging runs are not "hard errors"; they are just indicators that are subsequently processed by a different statistical mechanism to identify "true errors". I worry that having them manifest as "hard errors" in half of the runs would cause more harm than good - after all, [the few] developers standing up runs for new platforms are typically experienced engineers used to digging through logs to validate their efforts, and it seems counter-intuitive to add a feature benefiting just them while potentially harming / confusing most other developers. Just my one cent.
Ideally, a staging pipeline should do exactly what the prod pipeline does, i.e. run as part of PRs and CI/rolling builds, on the same platforms (unless a new platform is being introduced), etc. That way, when a test running in staging passes consistently, it can be promoted to the prod pipeline with confidence. IMHO, we should have rolling builds along with PR builds for the staging pipeline.
Thanks for your thoughts on this. @trylek I completely agree with your points; however, the reason I was asking is that there are 3 points that made me raise this question:
1st is @sunandabalu's point above.
2nd, in order to measure pass rate, we will need a rolling build with "failures" in it; PR builds of this pipeline will always be green even when there are test failures, so it will be harder to get data on that.
3rd, if someone breaks a test on their PR because this build was green, rolling builds that actually fail on test failures will let us know sooner rather than later, and will also make it easier to track which change actually broke it.
Please let me know if that makes sense.
Thanks Santi for the clarifications, they basically make sense to me. I think part of the problem is that, according to my recollections, in our past discussions on the subject we combined and maybe somewhat conflated two fundamentally different types of tests that cause trouble in PR / CI runs and should be moved or reclassified:
(a) Tests that exhibit some inherent failure rate due to relying on external factors, typically network availability (e.g. some of the HTTP or SSL tests);
(b) Tests being brought up, typically the Mono tests on devices, that exhibited some initial flakiness after they started running on a regular basis; issues got filed for these and they ultimately became green, possibly modulo group (a).
For category (b) the staging pipeline is a trivial and natural solution similar to flaky test quarantine used by ASP.NET infra. Its application to (a) is more tricky because then your bullet 1 (sunandabalu's comment) becomes somewhat contradictory to the remaining bullets 2-3. Simply put, you cannot just "quarantine" all networking tests because they fail 1% of the time due to transient network failures.
You can move them to another pipeline so that you can track and measure them separately, but that doesn't change the fact that our mental model, based on Jared's initial analysis, is that their results are not "binary". Here we hit the problem that AzDO infra is built on tests having binary results. So, as we don't seem to have any easy leverage to make AzDO implement non-binary test outcomes in their core infra, we need some customization or bolt-on to make them work.
This customization basically boils down to one of three options: reporting individual failures as PR errors (which is what we have today and don't like); not reporting them as errors at all and using some custom side tooling a la runfo to monitor their long-term trends and raise red flags / file issues when their failure rate exceeds a certain threshold (which is what I suggested, but is not ideal either, as you point out); or reporting them as errors in the staging pipeline and not in the "real" PR pipeline (which seems to go against sunandabalu's point that the two pipelines should behave in a similar manner, letting tests freely flow between them; I also cannot help feeling it's somewhat counter-intuitive).
Maybe if AzDO at least supported three-state test outcomes, we could use some "unknown" or "yellow" state for these; but most likely it's a no-go. That seems to imply we're down to identifying the "lesser evil" and I think your pick is probably best from a practical perspective despite my reservations.
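The "monitor long-term trends and flag tests whose failure rate exceeds a threshold" idea mentioned above can be sketched in a few lines. This is a hypothetical illustration in Python, not runfo's actual logic; the function name, test names, and data are all made up:

```python
# Hypothetical sketch: flag tests whose observed failure rate over many
# runs exceeds a threshold, in the spirit of side tooling like runfo.
from collections import Counter

def flaky_tests(results, threshold=0.01):
    """results: iterable of (test_name, passed) tuples across many runs.
    Returns the sorted names whose failure rate exceeds `threshold`."""
    runs, failures = Counter(), Counter()
    for name, passed in results:
        runs[name] += 1
        if not passed:
            failures[name] += 1
    return sorted(n for n in runs if failures[n] / runs[n] > threshold)

# Made-up history: one test fails 1 run out of 50, another never fails.
results = ([("System.Net.Http.Test", False)]
           + [("System.Net.Http.Test", True)] * 49
           + [("System.IO.Test", True)] * 100)
print(flaky_tests(results, threshold=0.01))  # → ['System.Net.Http.Test']
```

The point of the sketch is that such tooling sits outside the binary pass/fail reporting of the pipeline itself: the pipeline records raw outcomes, and the threshold logic decides separately what deserves a red flag.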
Thanks, @trylek, for the points. While I completely agree with this, I think it would only apply if we were to include bucket (a), which we agreed is out of scope at the moment, as we already have a mechanism for that: adding `ActiveIssue` to those tests.
So I think we should be able to enable CI runs for this pipeline with failures on the tests, to track pass rate and to track changes that might have broken something, since the intent of the pipeline is only bullet point (b) that you mention. Anyway, I will do this in a separate PR.
I presume, too, that if there were additional tools to make the tests more resilient to transient issues (like network flakiness), that would be acceptable as well, @trylek?
This is definitely a useful improvement compared to what we had before. I just wanted to point out that the question of "resiliency to transient issues" (no matter whether that means resiliency of the tests themselves or of the test infra) is largely orthogonal and not fully addressed / fixed by this change - this was not immediately obvious to me and may have caused some of my confusion.
👍
Force-pushed from cce7311 to 2f8af75
Looks great to me, thank you!
Thanks, @safern.
This PR adds a staging pipeline which sets `continueOnError: true` on test steps. Also, we're moving the Android legs that run on an emulator to this pipeline, as they've been unstable for a long time. I've validated the generated yml, and `continueOnError` is set to `true` accordingly.

I tried to do it in a minimal way by defining a template expression variable on the common variable template, but `continueOnError` is evaluated so early in the template expansion that the variable was evaluating to `Null`. So this was the best way I could think of where the value of the variable is still in a central location.

I only added `continueOnError: true` for test steps, as we should protect build verticals of new platforms from PRs breaking them; verticals are usually stable on new platforms, so a PR that adds a new platform should add a vertical that is green from the beginning.

The pipeline name is `runtime-staging`; I also thought about `runtime-onboarding`, let me know what your preference is.

Fixes: #44031

cc: @sunandabalu @markwilkie @jkotas @fanyang-mono @naricc
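As a rough sketch of the placement described above (step names and scripts here are assumptions for illustration, not the repo's actual templates):

```yaml
# Sketch only: continueOnError on the test step lets test failures be
# reported without failing the staging run, while build failures still
# fail it so new-platform verticals stay protected.
steps:
- script: ./build.sh
  displayName: Build product and tests
  # no continueOnError here: a broken build should still fail the PR
- script: ./run-tests.sh
  displayName: Run tests
  continueOnError: true   # test failures are recorded but non-fatal
```

Note that `continueOnError` accepts a template expression, but because templates are expanded at compile time, a variable defined too late in the expansion is not yet available, which is the `Null` problem described above.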