-
-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky console-subscriber
integration tests
#473
Labels
Comments
hds
added a commit
that referenced
this issue
Oct 11, 2023
The `ConsoleLayer` uses the `Callsites` struct to store and check for callsites for specific kinds of traces, for example spawn spans or waker events. `Callsites` stores a fixed size array of pointers to the `Metadata` for each callsite and a length to indicate how many callsites it has registered. The length and each individual pointer are stored in atomics. Since it is possible for these values to change individually, if a callsite lookup fails, we check if the length of the array has changed while we were checking the pointers, if it has, the lookup is started again. However, there is still a possible race condition. If the length changes, but the lookup occurs before the callsite pointer is actually written, then we may miss a callsite that is in the process of being registered. In this case, the pointer which is loaded from the `Callsites` array will be null. This change adds a check for this case (null ptr), and reperforms the lookup if it occurs. This race condition was found while chasing down the source of #473. It doesn't solve the flakiness, but it can reduce the likelihood of it occuring, thus it is a mitigation only. In reality, neither of these race condition checks should be needed, as we would expect that `tracing` guarantees that `ConsoleLayer` completes `register_callsite()` before `on_event()` or `new_span()` are called.
hds
added a commit
that referenced
this issue
Oct 11, 2023
A flakiness problem has been discovered with the `console-subscriber` integration tests introduced in #452. Issue #473 is tracking the issue. It has been observed that we only "miss" the wake operation event when it comes from `yield_now()`, but not when it comes from a task that yielded due to `sleep`, even when the duration is zero. it is likely that this is due to nature of the underlying race condition. This change removes all the calls to `yield_now()` from the `framework` tests, except those where we wish to actually test self wakes.
hds
added a commit
that referenced
this issue
Oct 25, 2023
A flakiness problem has been discovered with the `console-subscriber` integration tests introduced in #452. Issue #473 is tracking the issue. It has been observed that we only "miss" the wake operation event when it comes from `yield_now()`, but not when it comes from a task that yielded due to `sleep`, even when the duration is zero. it is likely that this is due to nature of the underlying race condition. This change removes all the calls to `yield_now()` from the `framework` tests, except those where we wish to actually test self wakes. Additionally, all the sleeps have been moved out into a separate function which describes why we're using `sleep` instead of `yield_now` when either of them would be sufficient.
Here is another one: |
These two tests are so flaky that I'm going to delete them until we fix the problem. From #[test]
fn sleep_wakes() {
let expected_task = ExpectedTask::default()
.match_default_name()
.expect_wakes(1)
.expect_self_wakes(0);
let future = async {
sleep(Duration::ZERO).await;
};
assert_task(expected_task, future);
}
#[test]
fn double_sleep_wakes() {
let expected_task = ExpectedTask::default()
.match_default_name()
.expect_wakes(2)
.expect_self_wakes(0);
let future = async {
sleep(Duration::ZERO).await;
sleep(Duration::ZERO).await;
};
assert_task(expected_task, future);
} |
hds
added a commit
that referenced
this issue
Apr 10, 2024
We have identified that some of the `console-subscriber` integration tests are flaky (#473). This appears to be a result of the following issue in `tracing` tokio-rs/tracing#2743. Flaky tests are really worse than no tests. So these tests will be removed until the flakiness can be fixed.
hds
added a commit
that referenced
this issue
Apr 10, 2024
We have identified that some of the `console-subscriber` integration tests are flaky (#473). This appears to be a result of the following issue in `tracing` tokio-rs/tracing#2743. Flaky tests are really worse than no tests. So these tests will be removed until the flakiness can be fixed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
What crate(s) in this repo are involved in the problem?
console-subscriber
What is the issue?
The
console-subscriber
integration tests show flaky behavior locally.The tests were originally introduced in #452.
How can the bug be reproduced?
Run all the
framework
tests locally. Sometimes theself_wakes
test fails.To reproduce this, I ran the tests in a loop until they failed. On my machine it typically takes under 5 minutes to produce a failure.
Logs, error output, etc
No response
Versions
main
branch.console-subscriber
v0.2.0Possible solution
It appears that the root cause for this issue is the one described in tokio-rs/tracing#2743.
Additional context
No response
Would you like to work on fixing this bug?
yes
The text was updated successfully, but these errors were encountered: