Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: track simulated timers per dispatcher to simplify thread interactions and locking. #12609

Merged

Conversation

antoniovicente
Copy link
Contributor

@antoniovicente antoniovicente commented Aug 12, 2020

Commit Message: test: track simulated timers per dispatcher to simplify thread interactions and locking.
Additional Description:
Risk Level: low, test-only
Testing: unit
Docs Changes:
Release Notes:
Issue #12585 #12638

… cross thread interactions.

Signed-off-by: Antonio Vicente <avd@google.com>
…tcher thread to avoid data race found by tsan.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente
Copy link
Contributor Author

cc @mattklein123 @jmarantz

@jmarantz jmarantz self-assigned this Aug 12, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing comments as I'll be in meetings for quite a bit starting now.

This looks great so far.


bool isEnabled(Alarm& alarm);
void enableAlarm(Alarm& alarm, const std::chrono::microseconds& duration);
void disableAlarm(Alarm& alarm) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ABSL_LOCKS_EXCLUDED(mutex_) here and similar APIs on this object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


class AlarmSet {
public:
bool empty() { return alarms_.empty(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

public:
bool empty() { return alarms_.empty(); }

const AlarmRegistration& next() { return *alarms_.begin(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ASSERT(!empty()) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}

private:
std::set<AlarmRegistration> alarms_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (this was pre-existing): call this ordered_alarms_, just for clarity at use?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to sorted_alarms_

jmarantz added a commit to jmarantz/envoy that referenced this pull request Aug 12, 2020
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flushing more comments.

MonotonicTime current_monotonic_time_ ABSL_GUARDED_BY(mutex_);
SystemTime current_system_time_ ABSL_GUARDED_BY(mutex_);

MonotonicTime next_monotonic_time_ ABSL_GUARDED_BY(mutex_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what this means is the timestamp of the next scheduled event on this scheduler? Add comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the two fields were not needed after all. Changed to just monotonic_time_ and system_time_

test/test_common/simulated_time_system.cc Show resolved Hide resolved
if (armed_) {
disableTimer();
if (!registered_alarms_.empty()) {
return registered_alarms_.next().time_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible that a registered but not-yet-triggered alarm will be earlier than a triggered alarm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly sure the answer was no, but I'm removing this method since it is no longer needed after Matt's awesome WaitFor refactor.

time_system_.decPendingLockHeld();

if (duration.count() == 0) {
if (Runtime::runtimeFeatureEnabled(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we cache this bool in the helper class? Or do we expect it to change during the course of a test method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see someone setting runtime features after creating the SimulatedScheduler but before creating the first alarm. Does it hurt to check the runtime features this often?

@mattklein123
Copy link
Member

Given @antoniovicente is OOO for the rest of the week, let's continue in #12614.

…r_dispatcher

Signed-off-by: Antonio Vicente <avd@google.com>
@jmarantz jmarantz reopened this Aug 20, 2020
…ving draft status from the PR.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente antoniovicente marked this pull request as ready for review August 21, 2020 16:55
Signed-off-by: Antonio Vicente <avd@google.com>
@jmarantz
Copy link
Contributor

please fix format so CI runs.

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
SchedulableCallbackPtr run_alarms_cb_;

absl::Mutex mutex_;
bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment that this bool has negative sense to make it easier to use with absl::Condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed implementation to avoid the double negatives.

}

private:
std::set<AlarmRegistration> sorted_alarms_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have thread annotations on this member var and the methods above? or if it is to be left unguarded due to a threading discipline can you call that out in comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);

Signed-off-by: Antonio Vicente <avd@google.com>
@@ -125,6 +125,7 @@ envoy_cc_test(

envoy_cc_test(
name = "guarddog_impl_test",
size = "small",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation of this change:
Before other changes in this PR this test occasionally times out due to flaky alarm behavior as seen in #12638

This test usually runs in about 5s to 10s in standard confirgurations, and 20 seconds under clang-tsan. It may make sense to reduce the configured timeout for this test to small which has an associated timeout of 60 secs in order to get faster feedback when this test fails due to timeout.

}

private:
std::set<AlarmRegistration> sorted_alarms_;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All synchronization for this object is external. See:
AlarmSet registered_alarms_ ABSL_GUARDED_BY(mutex_);
AlarmSet triggered_alarms_ ABSL_GUARDED_BY(mutex_);

SchedulableCallbackPtr run_alarms_cb_;

absl::Mutex mutex_;
bool not_running_cbs_ ABSL_GUARDED_BY(mutex_) = true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed implementation to avoid the double negatives.

jmarantz
jmarantz previously approved these changes Aug 25, 2020
@mattklein123 mattklein123 self-assigned this Aug 25, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, this makes sense at a high level. I didn't analyze all of the locking, etc. deeply. One API thought for consideration. Thank you for working on this!

/wait

@@ -107,6 +107,11 @@ class GradientControllerTest : public testing::Test {
.value());
}

template <typename DurationType> void advanceTimeAndLoop(DurationType duration) {
time_system_.advanceTimeAsync(duration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about having advanceTimeAsync take a dispatcher and whether to run it blocking or non-blocking? It seems like it would be almost always broken to not do this pattern when using this API? If there are some cases I guess the dispatcher could be optional in the API but it would at least make test writers think about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmarantz @antoniovicente ping on this one. I see a new commit but I'm unsure of the plan here. If this is not needed we can merge but wanted to check.

/wait-any

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, was doing some research in order to provide a complete reply to your comment.

As you noticed I rolled back this file since the changes to it are no longer required once I moved to updating monotonic time directly instead of updating next_monotonic_time and propagating it to event loops next time they run.

The idea of taking a dispatcher on advanceTimeAsync is an interesting suggestion. I found that most uses of this function happen in cases where there is a single active Dispatcher, but there's 1 exception to this in ServerStatsTest.FlushStats. Still it makes sense to have a SimulatedTimeSystem function that moves time forward and then runs the dispatcher event loop. Preferences for doing that change in this PR or a followup cleanup?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferences for doing that change in this PR or a followup cleanup?

Up to you, I'm fine either way. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do the cleanup as a followup.

Signed-off-by: Antonio Vicente <avd@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants