-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: track simulated timers per dispatcher to simplify thread interactions and locking. #12609
test: track simulated timers per dispatcher to simplify thread interactions and locking. #12609
Conversation
… cross thread interactions. Signed-off-by: Antonio Vicente <avd@google.com>
…tcher thread to avoid data race found by tsan. Signed-off-by: Antonio Vicente <avd@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flushing comments as I'll be in meetings for quite a bit starting now.
This looks great so far.
|
||
bool isEnabled(Alarm& alarm); | ||
void enableAlarm(Alarm& alarm, const std::chrono::microseconds& duration); | ||
void disableAlarm(Alarm& alarm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ABSL_LOCKS_EXCLUDED(mutex_) here and similar APIs on this object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
class AlarmSet { | ||
public: | ||
bool empty() { return alarms_.empty(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
public: | ||
bool empty() { return alarms_.empty(); } | ||
|
||
const AlarmRegistration& next() { return *alarms_.begin(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ASSERT(!empty()) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
} | ||
|
||
private: | ||
std::set<AlarmRegistration> alarms_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (this was pre-existing): call this ordered_alarms_, just for clarity at use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to sorted_alarms_
Signed-off-by: Joshua Marantz <jmarantz@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flushing more comments.
MonotonicTime current_monotonic_time_ ABSL_GUARDED_BY(mutex_); | ||
SystemTime current_system_time_ ABSL_GUARDED_BY(mutex_); | ||
|
||
MonotonicTime next_monotonic_time_ ABSL_GUARDED_BY(mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what this means is the timestamp of the next scheduled event on this scheduler? Add comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that the two fields were not needed after all. Changed to just monotonic_time_ and system_time_
if (armed_) { | ||
disableTimer(); | ||
if (!registered_alarms_.empty()) { | ||
return registered_alarms_.next().time_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that a registered but not-yet-triggered alarm will be earlier than a triggered alarm?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairly sure the answer was no, but I'm removing this method since it is no longer needed after Matt's awesome WaitFor refactor.
time_system_.decPendingLockHeld(); | ||
|
||
if (duration.count() == 0) { | ||
if (Runtime::runtimeFeatureEnabled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we cache this bool in the helper class? Or do we expect it to change during the course of a test method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see someone setting runtime features after creating the SimulatedScheduler but before creating the first alarm. Does it hurt to check the runtime features this often?
Given @antoniovicente is OOO for the rest of the week, let's continue in #12614. |
…r_dispatcher Signed-off-by: Antonio Vicente <avd@google.com>
…ving draft status from the PR. Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
please fix format so CI runs. |
Signed-off-by: Antonio Vicente <avd@google.com>
SchedulableCallbackPtr run_alarms_cb_; | ||
|
||
absl::Mutex mutex_; | ||
bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment that this bool has negative sense to make it easier to use with absl::Condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed implementation to avoid the double negatives.
} | ||
|
||
private: | ||
std::set<AlarmRegistration> sorted_alarms_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have thread annotations on this member var and the methods above? or if it is to be left unguarded due to a threading discipline can you call that out in comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);
Signed-off-by: Antonio Vicente <avd@google.com>
@@ -125,6 +125,7 @@ envoy_cc_test( | |||
|
|||
envoy_cc_test( | |||
name = "guarddog_impl_test", | |||
size = "small", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explanation of this change:
Before other changes in this PR this test occasionally times out due to flaky alarm behavior as seen in #12638
This test usually runs in about 5s to 10s in standard confirgurations, and 20 seconds under clang-tsan. It may make sense to reduce the configured timeout for this test to small which has an associated timeout of 60 secs in order to get faster feedback when this test fails due to timeout.
} | ||
|
||
private: | ||
std::set<AlarmRegistration> sorted_alarms_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);
SchedulableCallbackPtr run_alarms_cb_; | ||
|
||
absl::Mutex mutex_; | ||
bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed implementation to avoid the double negatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, this makes sense at a high level. I didn't analyze all of the locking, etc. deeply. One API thought for consideration. Thank you for working on this!
/wait
@@ -107,6 +107,11 @@ class GradientControllerTest : public testing::Test { | |||
.value()); | |||
} | |||
|
|||
template <typename DurationType> void advanceTimeAndLoop(DurationType duration) { | |||
time_system_.advanceTimeAsync(duration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WDYT about having advanceTimeAsync
take a dispatcher and whether to run it blocking or non-blocking? It seems like it would be almost always broken to not do this pattern when using this API? If there are some cases I guess the dispatcher could be optional in the API but it would at least make test writers think about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jmarantz @antoniovicente ping on this one. I see a new commit but I'm unsure of the plan here. If this is not needed we can merge but wanted to check.
/wait-any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, was doing some research in order to provide a complete reply to your comment.
As you noticed I rolled back this file since the changes to it are no longer required once I moved to updating monotonic time directly instead of updating next_monotonic_time and propagating it to event loops next time they run.
The idea of taking a dispatcher on advanceTimeAsync is an interesting suggestion. I found that most uses of this function happen in cases where there is a single active Dispatcher, but there's 1 exception to this in ServerStatsTest.FlushStats. Still it makes sense to have a SimulatedTimeSystem function that moves time forward and then runs the dispatcher event loop. Preferences for doing that change in this PR or a followup cleanup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferences for doing that change in this PR or a followup cleanup?
Up to you, I'm fine either way. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do the cleanup as a followup.
Signed-off-by: Antonio Vicente <avd@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Commit Message: test: track simulated timers per dispatcher to simplify thread interactions and locking.
Additional Description:
Risk Level: low, test-only
Testing: unit
Docs Changes:
Release Notes:
Issue #12585 #12638