Fixed HA Tracker jitter causing unnecessary CAS operations #1861

Merged

pracucci
Contributor

What this PR does:
Today @pstibrany and I spent a few hours debugging an unexpected pattern in the number of CAS operations done by the distributors when the HA tracker is enabled.

We found out that PR #1748, which introduced the update timeout jitter, also introduced a time window (as long as the jitter) during which every request triggers a CAS, but the CAS operation itself doesn't update the replica's timestamp. The CAS is triggered if now - receivedAt >= updateTimeout, but the CAS function is a no-op if now - receivedAt < updateTimeout + jitter.

While adding tests, I also realized that checkReplicaTimestamp() was broken (tests were actually failing), so I fixed it.

Which issue(s) this PR fixes:
No issue

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Contributor

@pstibrany pstibrany left a comment

Nice bug!

if r.GetReplica() != replica {
err = fmt.Errorf("replicas did not match: %s != %s", r.GetReplica(), replica)
continue outer
}
if timestamp.Time(r.GetReceivedAt()).Equal(expected) {
} else if !timestamp.Time(r.GetReceivedAt()).Equal(expected) {
Contributor

Tiny detail, but using 'else if' when the previous 'if' body ends with continue, break or return is not very nice. The previous version was clearer.
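
For illustration, a minimal sketch of the style being suggested, using plain 'if' blocks that each end in continue. The entry type and the check function are hypothetical stand-ins, not the real test helper, and timestamps are simplified to int64.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical stand-in for the stored replica descriptor.
type entry struct {
	replica    string
	receivedAt int64
}

// check walks successive snapshots (standing in for retries) and returns
// nil as soon as one has both the expected replica and timestamp.
func check(snapshots []entry, replica string, expected int64) error {
	err := errors.New("no snapshots to check")
	for _, s := range snapshots {
		if s.replica != replica {
			err = fmt.Errorf("replicas did not match: %s != %s", s.replica, replica)
			continue
		}
		// No 'else' needed here: the branch above always continues.
		if s.receivedAt != expected {
			err = fmt.Errorf("timestamps did not match: %d != %d", s.receivedAt, expected)
			continue
		}
		return nil
	}
	return err
}

func main() {
	snapshots := []entry{{"a", 1}, {"b", 1}, {"b", 2}}
	fmt.Println(check(snapshots, "b", 2))
}
```

Because every failing branch ends in continue, each condition reads as an independent guard and the success path stays at the lowest indentation level.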

Contributor Author

Right! Fixed

if r.GetReplica() != replica {
err = fmt.Errorf("replicas did not match: %s != %s", r.GetReplica(), replica)
continue outer
}
- if timestamp.Time(r.GetReceivedAt()).Equal(expected) {
+ if !timestamp.Time(r.GetReceivedAt()).Equal(expected) {
Contributor

Nice catch! :)

Contributor

@cstyan cstyan left a comment

LGTM overall, the actual change is pretty minor but good catch!

Were the whitespace/newline and mtime changes intentional?

})
assert.NoError(t, err)

// Write the first time.
mtime.NowForce(start)
Contributor

was there something that necessitated this change?

Contributor Author

It avoids flaky tests.

})
assert.NoError(t, err)

// Write the first time.
mtime.NowForce(startTime)
Contributor

same here

Contributor Author

It avoids flaky tests.

Contributor Author

pracucci commented Dec 2, 2019

@gouthamve Do we want this in 0.4.0?

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci pracucci force-pushed the fix-ha-tracker-cas-contention branch from 47e140c to 9e37d00 Compare December 2, 2019 11:48
@gouthamve gouthamve merged commit d74af59 into cortexproject:master Dec 2, 2019
@pracucci pracucci deleted the fix-ha-tracker-cas-contention branch December 2, 2019 12:02