[ES-1292925] Fix metrics with reusable counter resets #107
Signed-off-by: Yi Jin <yi.jin@databricks.com>
var it adjustableSeriesIterator
if m.isCounter {
	it = &counterErrAdjustSeriesIterator{Iterator: r.Iterator(nil)}
} else {
	it = &noopAdjustableSeriesIterator{Iterator: r.Iterator(nil)}
}
You are applying counter adjustments to the raw data samples before deduplication. This is more complicated than applying the adjustments to the single time series after deduplication. The adjustments will interfere with the quorum-based deduplication logic, and I'm concerned this may introduce other edge cases.
Can you add another test case such that one raw time series has a large gap with a reset, but the other two have complete data?
replica 0: [[1000, 10], [10000, 8], [11000, 10]]
replica 1: [[1000, 10], [2000, 0], [3000, 1], [4000, 2], [5000, 3], [6000, 4], [7000, 5], [8000, 6], [9000, 7], [10000, 8], [11000, 10]]
replica 2: [[1000, 10], [2000, 0], [3000, 1], [4000, 2], [5000, 3], [6000, 4], [7000, 5], [8000, 6], [9000, 7], [10000, 8], [11000, 10]]
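To make the requested case concrete, here is a minimal Go sketch of the three replicas as test data. The `sample` type and the `accumulateResets` and `reviewerReplicas` helpers are hypothetical names invented for illustration. They are not part of Thanos or of this PR, and the accumulation shown is a generic "add the pre-reset value back" scheme, not the logic of `counterErrAdjustSeriesIterator`.

```go
package main

// sample is one (timestamp in ms, value) pair, matching the review comment.
type sample struct {
	t int64
	v float64
}

// accumulateResets is a hypothetical helper, not Thanos code. Whenever a
// value drops below its predecessor it treats that as a counter reset and
// adds the pre-reset value to a running offset, so the output never decreases.
func accumulateResets(in []sample) []sample {
	out := make([]sample, 0, len(in))
	var offset, prev float64
	for i, s := range in {
		if i > 0 && s.v < prev {
			offset += prev // reset detected: carry the last value forward
		}
		prev = s.v
		out = append(out, sample{t: s.t, v: s.v + offset})
	}
	return out
}

// reviewerReplicas returns the data from the review comment: replica 0 has
// a large gap that spans the reset, replicas 1 and 2 have complete data.
func reviewerReplicas() [][]sample {
	gap := []sample{{1000, 10}, {10000, 8}, {11000, 10}}
	full := []sample{
		{1000, 10}, {2000, 0}, {3000, 1}, {4000, 2}, {5000, 3}, {6000, 4},
		{7000, 5}, {8000, 6}, {9000, 7}, {10000, 8}, {11000, 10},
	}
	return [][]sample{gap, full, full}
}
```

Under this simplified scheme every replica ends at 20 once the reset is accounted for. Replica 0 only sees the drop from 10 to 8, while replicas 1 and 2 see the drop from 10 to 0. A test for this case would presumably check that the deduplicated output is not distorted by the gapped replica.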
Added new test cases. The reason I've done it this way is to reuse counterErrAdjustSeriesIterator, which needs adjustAtValue() to be called somewhere; passing a merged time series to the original newDedupSeries doesn't work.
	// feed the merged series into dedup series which apply counter adjustment
	return NewMergedSeries(s.lset, repl, s.f)
}
if s.deduplicationFunc == AlgorithmChain {
FYI @yuchen-db, I've also added the Prometheus implementation, but in case you're wondering, it fails a number of unit tests.
pkg/query/querier.go
Outdated
@@ -210,18 +210,20 @@ func newQuerierWithOpts(

 	partialResponseStrategy := storepb.PartialResponseStrategy_ABORT
 	if opts.GroupReplicaPartialResponseStrategy {
-		level.Debug(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
+		level.Info(logger).Log("msg", "Enabled group-replica partial response strategy in newQuerierInternal")
This will be very chatty. I intentionally changed it to Debug previously.
Every query will print this log.
I found that I don't actually need this log line; it is already logged here:
level.Info(logger).Log("msg", "databricks querier features", "opts", fmt.Sprintf("%+v", opts))
pkg/query/querier.go
Outdated
		partialResponseStrategy = storepb.PartialResponseStrategy_GROUP_REPLICA
	} else if partialResponse {
		partialResponseStrategy = storepb.PartialResponseStrategy_WARN
	}
	level.Info(logger).Log("msg", "Deduplication algorithm applied", "func", opts.DeduplicationFunc)
Every query will print this log.
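If a message like this is still wanted at Info level, one option is to emit it once per process rather than once per query. The sketch below uses only the standard library. `querierFactory`, `newQuerier`, and `countingWriter` are hypothetical stand-ins, not Thanos types, and the output format only loosely imitates a key-value logger.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// countingWriter is a hypothetical test sink that counts newline-terminated
// log lines instead of printing them.
type countingWriter struct{ lines int }

func (w *countingWriter) Write(p []byte) (int, error) {
	for _, b := range p {
		if b == '\n' {
			w.lines++
		}
	}
	return len(p), nil
}

// querierFactory is a hypothetical stand-in for whatever builds a querier
// per query; it is not the Thanos API.
type querierFactory struct {
	out       io.Writer
	dedupFunc string
	logOnce   sync.Once
}

// newQuerier is called once per query. The sync.Once guard ensures the
// informational message is written only on the first call.
func (f *querierFactory) newQuerier() string {
	f.logOnce.Do(func() {
		fmt.Fprintf(f.out, "msg=%q func=%q\n", "Deduplication algorithm applied", f.dedupFunc)
	})
	return f.dedupFunc
}
```

Calling `newQuerier` repeatedly on the same factory writes a single line. Whether this is preferable to lowering the level to Debug or dropping the line depends on how often the options can change at runtime.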
This PR tries to fix the case where a reusable counter metric resets:
Added a number of unit tests and made sure they pass when counter functions are applied.
Also unit tested that the original time series is returned when counter functions are not applied.
Integration tests are pending; sent for early review and feedback.
Integration tests work as expected:
Changes
Verification