Add staging pipeline to PRs #45178
Conversation
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @ViktorHofer
@@ -0,0 +1,110 @@
# The staging pipeline only makes sense to run it on PRs.
trigger: none
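As a minimal sketch, a PR-only Azure Pipelines trigger block of this kind typically looks like the following (branch names here are illustrative assumptions, not copied from the actual file):

```yaml
# Illustrative sketch of a PR-only pipeline: no CI (rolling) trigger,
# only a PR trigger. Branch names are assumptions for this example.
trigger: none   # disables rolling/CI builds on push

pr:
  branches:
    include:
    - main      # run the pipeline only for PRs targeting main
```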
I disabled batched rolling builds for the staging pipeline as it didn't make sense to have it. However I did think that it might be useful for people trying to get a new platform stable, to have rolling builds that do fail the build when there are test failures.
Opinions on that?
I must admit I'm not a big fan of this idea, for two reasons. First, subtle differences between PR and rolling builds are enormously cryptic for most developers; I myself have learned to appreciate and leverage them in the runtime repo, but only after having been involved in infrastructure for quite some time. Second, according to my understanding, the conceptual thinking is that failures in the staging runs are not "hard errors"; they are just indicators that are subsequently processed by a different statistical mechanism to identify "true errors". I worry that having them manifest as "hard errors" in half of the runs would cause more harm than good - after all, [the few] developers standing up runs for new platforms are typically experienced engineers used to digging through logs to validate their efforts, and it seems counter-intuitive to add a feature benefiting just them while potentially harming / confusing most other developers. Just my one cent.
Ideally, a staging pipeline should do exactly what the prod pipeline does, i.e. run as part of PRs and CI/rolling builds, on the same platforms (unless a new platform is being introduced), etc. That way, when a test running in staging passes consistently, it can be promoted to the prod pipeline with confidence. IMHO, we should have rolling builds along with PR builds for the staging pipeline.
Thanks for your thoughts on this. @trylek I completely agree with your points; however, the reason I was asking is that there are 3 points that made me raise this question:
1st is @sunandabalu's point above.
2nd, in order to measure pass rate, we will need a rolling build with "failures" in it; PR builds of this pipeline will always be green even when there are test failures, so it will be harder to get data on that.
3rd, if someone breaks a test on their PR because this build was green, rolling builds that actually fail on test failures will let us know sooner rather than later, and will also make it easier to track which change actually broke it.
Please let me know if that makes sense.
Thanks Santi for the clarifications, they basically make sense to me. I think part of the problem is that, according to my recollections, in our past discussions on the subject we combined and maybe somewhat conflated two fundamentally different types of tests that cause trouble in PR / CI runs and should be moved or reclassified:
(a) Tests that exhibit some inherent failure rate due to relying on external factors, typically network availability (e.g. some of the HTTP or SSL tests);
(b) Tests being brought up, typically the Mono tests on devices, that exhibited some initial flakiness after they started running on a regular basis; issues got filed for these and they ultimately became green, possibly modulo group (a).
For category (b) the staging pipeline is a trivial and natural solution similar to flaky test quarantine used by ASP.NET infra. Its application to (a) is more tricky because then your bullet 1 (sunandabalu's comment) becomes somewhat contradictory to the remaining bullets 2-3. Simply put, you cannot just "quarantine" all networking tests because they fail 1% of the time due to transient network failures.
You can move them to another pipeline so that you can track and measure them separately, but that doesn't change the fact that our mental model, based on Jared's initial analysis, is that their results are not "binary". Here we hit the problem that AzDO infra is built on tests having binary results. So, as we don't seem to have any easy leverage to make AzDO implement non-binary test outcomes in their core infra, we need some customization or bolt-on to make them work.
This customization basically boils down to one of three options: reporting individual failures as PR errors (which is what we have today and don't like); not reporting them as errors at all and using some custom side tooling a la runfo to monitor their long-term trends and raise red flags / file issues when their failure rate exceeds a certain threshold (which is what I suggested, but is not ideal either, as you point out); or reporting them as errors in the staging pipeline and not in the "real" PR pipeline (which seems to go against sunandabalu's point that the two pipelines should behave in a similar manner, letting tests freely flow between them; I also cannot help feeling it's somewhat counter-intuitive).
Maybe if AzDO at least supported three-state test outcomes, we could use some "unknown" or "yellow" state for these; but most likely it's a no-go. That seems to imply we're down to identifying the "lesser evil" and I think your pick is probably best from a practical perspective despite my reservations.
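The "monitor long-term trends and flag tests whose failure rate exceeds a threshold" idea mentioned above can be sketched in a few lines. This is a hypothetical illustration in Python, not runfo's actual logic; the function name, test names, and data are all made up:

```python
# Hypothetical sketch: flag tests whose observed failure rate over many
# runs exceeds a threshold, in the spirit of side tooling like runfo.
from collections import Counter

def flaky_tests(results, threshold=0.01):
    """results: iterable of (test_name, passed) tuples across many runs.
    Returns the sorted names whose failure rate exceeds `threshold`."""
    runs, failures = Counter(), Counter()
    for name, passed in results:
        runs[name] += 1
        if not passed:
            failures[name] += 1
    return sorted(n for n in runs if failures[n] / runs[n] > threshold)

# Made-up history: one test fails 1 run out of 50, another never fails.
results = ([("System.Net.Http.Test", False)]
           + [("System.Net.Http.Test", True)] * 49
           + [("System.IO.Test", True)] * 100)
print(flaky_tests(results, threshold=0.01))  # → ['System.Net.Http.Test']
```

The point of the sketch is that such tooling sits outside the binary pass/fail reporting of the pipeline itself: the pipeline records raw outcomes, and the threshold logic decides separately what deserves a red flag.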
Thanks, @trylek, for the points. While I completely agree with this, I think it would only apply if we were to include bucket (a), which we agreed is out of scope at the moment, as we already have a mechanism for that: adding `ActiveIssue` to those tests.
So I think we should be able to enable CI runs for this pipeline with failures on the tests, to track pass rate and to track changes that might have broken something, since the intent of the pipeline is only bullet point (b) that you mention. Anyway, I will do this in a separate PR.
I presume, too, that if there were additional tools to make the tests more resilient to transient issues (like network flakiness), that would be acceptable as well, @trylek?
This is definitely a useful improvement compared to what we had before. I just wanted to point out that the question of "resiliency to transient issues" (no matter whether that means resiliency of the tests themselves or of the test infra) is largely orthogonal and not fully addressed / fixed by this change - this was not immediately obvious to me and may have caused some of my confusion.
👍
Force-pushed from cce7311 to 2f8af75
Looks great to me, thank you!
Thanks, @safern.
This PR adds a staging pipeline which sets `continueOnError: true` on test steps. Also, we're moving the Android legs that run on an emulator to this pipeline, as they've been unstable for a long time. I've validated the generated yml, and `continueOnError` is set to `true` accordingly.

I tried to do it in a minimal way by defining a template expression variable on the common variable template, but `continueOnError` is evaluated so early in the template expansion that the variable was evaluating to `Null`. So this was the best way I could think of where the value of the variable is still in a central location.

I only added `continueOnError: true` for test steps, as we should protect build verticals of new platforms from PRs breaking them; verticals are usually stable on new platforms, so a PR that adds a new platform should add a vertical that is green from the beginning.

The pipeline name is `runtime-staging`; I also thought about `runtime-onboarding`, let me know what your preference is.

Fixes: #44031

cc: @sunandabalu @markwilkie @jkotas @fanyang-mono @naricc
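As a rough sketch of the placement described above (step names and scripts here are assumptions for illustration, not the repo's actual templates):

```yaml
# Sketch only: continueOnError on the test step lets test failures be
# reported without failing the staging run, while build failures still
# fail it so new-platform verticals stay protected.
steps:
- script: ./build.sh
  displayName: Build product and tests
  # no continueOnError here: a broken build should still fail the PR
- script: ./run-tests.sh
  displayName: Run tests
  continueOnError: true   # test failures are recorded but non-fatal
```

Note that `continueOnError` accepts a template expression, but because templates are expanded at compile time, a variable defined too late in the expansion is not yet available, which is the `Null` problem described above.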