-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store/bucket: make getFor() work with interleaved resolutions #1146
Conversation
Without this, we get max resolutions like 1, 2, 3 which do not mean anything to getFor(). With this, we get proper data from Thanos Store.
Makes a querier via queryableCreator and checks if the maxSourceResolution was passed properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks legit to me :D just nit on naming
Makes it clearer that it's just a temporary variable for the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, tests seem good
Approving without @bwplotka :p
Since this touches a critical part of the code I think it would be best to wait for @bwplotka to review this (and, of course, anyone else who wants :P) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GiedriusS I think this is our bug found & fixed!
Generally good, but let's revert the previous change #1154 and leave or better document the maxResolution tranformation?
I need to dive into this between
thing, it's 1AM and want to ensure we don't miss anything further. For example not sure if divide and conquer recurrency here helps with readability or not.
pkg/query/querier.go
Outdated
@@ -55,7 +55,7 @@ type queryable struct { | |||
|
|||
// Querier returns a new storage querier against the underlying proxy store API. | |||
func (q *queryable) Querier(ctx context.Context, mint, maxt int64) (storage.Querier, error) { | |||
return newQuerier(ctx, q.logger, mint, maxt, q.replicaLabel, q.proxy, q.deduplicate, int64(q.maxSourceResolution/time.Millisecond), q.partialResponse, q.warningReporter), nil | |||
return newQuerier(ctx, q.logger, mint, maxt, q.replicaLabel, q.proxy, q.deduplicate, int64(q.maxSourceResolution), q.partialResponse, q.warningReporter), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite confusing - hard to tell for me if this makes sense (:
This has the same unit as mint, maxt int64 right? Maybe making the underlying variable maxSourceResolutionMiliseonconds
or keeping it time.Duration as long as possible makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm.. am I right that is PR was actually wrong? #1144 And we moved to miliseconds there but we are doing exactly the same here and we missed it? 🤔
If yes, let's remove https://github.com/improbable-eng/thanos/blob/2c4f64b1b96907295a7f8e99d8fd64697f0eb12a/pkg/query/api/v1.go#L227 I think it's wrong. All is in time.Duration so moving to miliseconds there does not make sense. It's here that we transition from duration to integer having arbitrary unit (miliseconds) so I think this should stay and last PR should be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yea, master is broken now in terms of downsampling if I am not wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert PR:#1154
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should move the whole function to return int64
and in a form appropriate to the caller since as we can see it is hard to make a mental connection between such distant places in the code? Because that's what parsing should do IMHO - we shouldn't do more parsing down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yea, master is broken now in terms of downsampling if I am not wrong?
I think that's a bit of an overstatement if it has never worked properly 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but now it does not work even more, so there is regression (:
Anyway, I just want to avoid confusion, sorry for being quite strict here ):
Now once we had no regression we can actually make that work and nice. Yes, I am fine with:
- sticking to
maxResolutionMilisec int64
name everywhere. - parseResolution returning int64 straight away.
Thoughts? (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed that's what I've done. Hopefully it is much more clearer now about what kind of units are being used and that no one in the future will run into this problem like me when I was trying to hunt down one problem but accidentally got caught up in this.
pkg/store/bucket.go
Outdated
@@ -1030,6 +1032,19 @@ func (s *bucketBlockSet) getFor(mint, maxt, minResolution int64) (bs []*bucketBl | |||
if len(bs) == 0 { | |||
return s.getFor(mint, maxt, s.resolutions[i]) | |||
} | |||
|
|||
until := len(bs) - 1 | |||
for j := 0; j < until; j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky, need to dive more but generally makes sense. Explaining what we are seeing now. Thanks for tests catching this as well!
All of this assumes that blocks within different resolutions are aligned ideally right? Can we comment this somehow?
Also be aware of this: #1104 Hope you algorthim would be nicely extendable I guess.
cc @mjd95
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They always should be aligned ideally or on the same timestamps because in other cases we would get a problem known as "overlapping blocks".
This change should be easy to integrate into that PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, It makes total sense BUT to me left := s.getFor(mint, bs[0].meta.MinTime, s.resolutions[i])
and right
are just between
0 and first min and lastMax and maxt, right?
Can we remove this left and right and treat them properly in this loop as between
? I think it would simplify flow and algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea sounds good and I played around with it a bit but I doubt that this function can be made more succinct because we have three distinct cases of the append
call: at the beginning of the array, between two elements, and at the end. Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, is it really too complex? I will propose something to your PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposed: GiedriusS#1
IMO it's way simpler, what do you think? @GiedriusS
// getFor returns a time-ordered list of blocks that cover date between mint and maxt.
// Blocks with the biggest resolution possible but not bigger than the given max resolution are returned.
func (s *bucketBlockSet) getFor(mint, maxt, maxResolutionMillis int64) (bs []*bucketBlock) {
if mint == maxt {
return nil
}
s.mtx.RLock()
defer s.mtx.RUnlock()
// Find first matching resolution.
i := 0
for ; i < len(s.resolutions) && s.resolutions[i] > maxResolutionMillis; i++ {
}
// Fill the given interval with the blocks for the current resolution.
// Our current resolution might not cover all data, so recursively fill the gaps with higher resolution blocks if there is any.
start := mint
for _, b := range s.blocks[i] {
if b.meta.MaxTime <= mint {
continue
}
if b.meta.MinTime >= maxt {
break
}
if i+1 < len(s.resolutions) {
bs = append(bs, s.getFor(start, b.meta.MinTime, s.resolutions[i+1])...)
}
bs = append(bs, b)
start = b.meta.MaxTime
}
if i+1 < len(s.resolutions) {
bs = append(bs, s.getFor(start, maxt, s.resolutions[i+1])...)
}
return bs
}
* Convert parseDownsamplingParam() into parseDownsamplingParamMillis() which properly returns int64 for use in the querier code * Add parseDownsamplingMillis() tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 👍 Some suggestions, but overall thanks for spotting this bug.
@@ -223,7 +223,7 @@ func (api *API) parseDownsamplingParam(r *http.Request, step time.Duration) (max | |||
return 0, &ApiError{errorBadData, errors.Errorf("negative '%s' is not accepted. Try a positive integer", maxSourceResolutionParam)} | |||
} | |||
|
|||
return maxSourceResolution, nil | |||
return int64(maxSourceResolution / time.Millisecond), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
mint, maxt int64 | ||
window int64 | ||
} | ||
input := []resBlock{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about joinging those together in form of table test? essentiall add those as test cases to TestBucketBlockSet_addGet
? might be more consistent and easier to read just "cases" as test itself feels similar TBH?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These ones are a bit special because we are doing a property-based test instead of an ad-hoc one. I have removed the other test cases that I've added before and only left this one in - this already covers the "interleaved resolutions" case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
pkg/store/bucket.go
Outdated
@@ -1030,6 +1032,19 @@ func (s *bucketBlockSet) getFor(mint, maxt, minResolution int64) (bs []*bucketBl | |||
if len(bs) == 0 { | |||
return s.getFor(mint, maxt, s.resolutions[i]) | |||
} | |||
|
|||
until := len(bs) - 1 | |||
for j := 0; j < until; j++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, It makes total sense BUT to me left := s.getFor(mint, bs[0].meta.MinTime, s.resolutions[i])
and right
are just between
0 and first min and lastMax and maxt, right?
Can we remove this left and right and treat them properly in this loop as between
? I think it would simplify flow and algorithm.
This doesn't really matter as the tests show.
Only leave the property tests in place since they catch all of the errors.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for tests, but proposed bit simpler algorithm for implementation, WDYT: GiedriusS#1 @GiedriusS ?
Simplified goFor implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for finding and fixing this bug, was so hard to find out this (: The property testing is something awesome to talk about in some Golang meetup!
Add interleaved resolutions test for getFor(). Ordinary users could run into such a case when, for example, Thanos Compact would downsample metrics but also the metrics with the better resolution would still be available. Fix this by making getFor() look for blocks "in the middle" as well.
Fix the getDownsamplingParam() interface (and rename it to getDownsamplingParamMillis()) so that it would return a value which will be ready to use by the querier code.
Add unit tests for all of that.
Verification:
go test
and Grafana dashboards actually started showing all of the data.