-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing MetricEventSource tests #55385
Conversation
I've seen two recent non-deterministic failures: 1. After starting the EventListener there is a delay of one collection interval (currently 0.3 seconds) where we expect calls to counter.Add and Histogram.Record() to update stats. On a fast machine this code would run in under a millisecond but in some test runs it still wasn't happening fast enough. 2. We were seeing error events from a previous test get observed in a later test because session id was being ignored for error events. Fixes: 1. Added an initial collection interval where no counter APIs will be invoked and the test will delay up to 5 seconds waiting for this. Hopefully this delay makes it more likely that the Add/Record APIs are ready to execute promptly once the 2nd interval begins. If that still isn't sufficient we can either increase the collection intervals or disable the tests. 2. Tightened up the session id matching so only error events with empty id or the expected id are allowed.
@tarekgh |
Am I understanding correctly that we expect these events to occur and if they don't the test is meant to fail? If that's the case, I recommend using a much larger timeout than 5 seconds. Elsewhere we generally use at least 30 if not 60 seconds in such cases: such values are much more tolerant of load in CI, you only end up paying the full time if the test is going to fail anyway, and if the event occurs later in that timeframe, it's better to have spent the extra time to make the test succeed rather than having flaky tests. (I also recommend putting it into a SuccessTimeout constant such that the value is centralized to one updateable location.) |
Yep, that is correct. I can change 5 sec -> 60 sec. |
|
||
public MetricEventSourceTests(ITestOutputHelper output) | ||
{ | ||
_output = output; | ||
} | ||
|
||
[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsNotBrowser))] | ||
[OuterLoop("Slow and has lots of console spew")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am seeing you have added outer loop attribute to some test, do you want enable CI run the outer loops tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I hope you enable the outerloop tests before you merge to ensure running the touched tests. Thanks
I don't care if every PR runs them, as long as they run periodically. Were you saying that as-is the automation will never run them (in which case I want to change it), or that they won't run on every PR (which I am fine with)? |
Outerloop doesn't run on any PR by default; you have to ask the bot to run the legs via a comment on a PR. Outerloop does run on various rolling builds daily. |
/azp list |
/azp run runtime-libraries-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
src/libraries/System.Diagnostics.DiagnosticSource/tests/MetricEventSourceTests.cs
Outdated
Show resolved
Hide resolved
Thanks, that sounds like a fine state of affairs for the tests I marked as outerloop. [EDIT] |
4feb86c
to
deaeea4
Compare
I've seen two recent non-deterministic failures:
interval (currently 0.3 seconds) where we expect calls to counter.Add
and Histogram.Record() to update stats. On a fast machine this code
would run in well under a millisecond but in some test runs it still wasn't
happening fast enough.
later test because session id was being ignored for error events.
Fixes:
invoked and the test will delay up to 5 seconds waiting for this.
Hopefully this delay makes it more likely that the Add/Record APIs
are ready to execute promptly once the 2nd interval begins. If that
still isn't sufficient we can either increase the collection intervals or
disable the tests.
empty id or the expected id are allowed.
Fixes #55315 and #55313 (I hope)