Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/bucket: make getFor() work with interleaved resolutions #1146

Merged
merged 25 commits into from
May 27, 2019

Conversation

GiedriusS
Copy link
Member

@GiedriusS GiedriusS commented May 15, 2019

Add interleaved resolutions test for getFor(). Ordinary users could run into such a case when, for example, Thanos Compact would downsample metrics but also the metrics with the better resolution would still be available. Fix this by making getFor() look for blocks "in the middle" as well.

Fix the getDownsamplingParam() interface (and rename it to getDownsamplingParamMillis()) so that it would return a value which will be ready to use by the querier code.

Add unit tests for all of that.

Verification: go test and Grafana dashboards actually started showing all of the data.

@GiedriusS GiedriusS changed the title store/bucket_test: add interleaved resolutions test for getFor() store/bucket: make getFor() work with interleaved resolutions May 15, 2019
Giedrius Statkevičius added 2 commits May 16, 2019 09:53
Without this, we get max resolutions like 1, 2, 3 which do not mean
anything to getFor(). With this, we get proper data from Thanos Store.
@GiedriusS GiedriusS changed the title store/bucket: make getFor() work with interleaved resolutions store/bucket: make getFor() work with interleaved resolutions, pass proper resolution from querier May 16, 2019
Giedrius Statkevičius added 3 commits May 16, 2019 11:18
Makes a querier via queryableCreator and checks if the
maxSourceResolution was passed properly.
Copy link
Contributor

@domgreen domgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks legit to me :D just nit on naming

pkg/store/bucket.go Outdated Show resolved Hide resolved
Makes it clearer that it's just a temporary variable for the loop.
Copy link
Contributor

@domgreen domgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tests seem good

Approving without @bwplotka :p

@GiedriusS
Copy link
Member Author

Since this touches a critical part of the code I think it would be best to wait for @bwplotka to review this (and, of course, anyone else who wants :P)

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GiedriusS I think this is our bug found & fixed!

Generally good, but let's revert the previous change #1154 and leave or better document the maxResolution tranformation?

I need to dive into this between thing, it's 1AM and want to ensure we don't miss anything further. For example not sure if divide and conquer recurrency here helps with readability or not.

@@ -55,7 +55,7 @@ type queryable struct {

// Querier returns a new storage querier against the underlying proxy store API.
func (q *queryable) Querier(ctx context.Context, mint, maxt int64) (storage.Querier, error) {
return newQuerier(ctx, q.logger, mint, maxt, q.replicaLabel, q.proxy, q.deduplicate, int64(q.maxSourceResolution/time.Millisecond), q.partialResponse, q.warningReporter), nil
return newQuerier(ctx, q.logger, mint, maxt, q.replicaLabel, q.proxy, q.deduplicate, int64(q.maxSourceResolution), q.partialResponse, q.warningReporter), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite confusing - hard to tell for me if this makes sense (:

This has the same unit as mint, maxt int64 right? Maybe making the underlying variable maxSourceResolutionMiliseonconds or keeping it time.Duration as long as possible makes sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.. am I right that is PR was actually wrong? #1144 And we moved to miliseconds there but we are doing exactly the same here and we missed it? 🤔

If yes, let's remove https://github.com/improbable-eng/thanos/blob/2c4f64b1b96907295a7f8e99d8fd64697f0eb12a/pkg/query/api/v1.go#L227 I think it's wrong. All is in time.Duration so moving to miliseconds there does not make sense. It's here that we transition from duration to integer having arbitrary unit (miliseconds) so I think this should stay and last PR should be reverted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yea, master is broken now in terms of downsampling if I am not wrong?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert PR:#1154

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should move the whole function to return int64 and in a form appropriate to the caller since as we can see it is hard to make a mental connection between such distant places in the code? Because that's what parsing should do IMHO - we shouldn't do more parsing down the road.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yea, master is broken now in terms of downsampling if I am not wrong?

I think that's a bit of an overstatement if it has never worked properly 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but now it does not work even more, so there is regression (:

Anyway, I just want to avoid confusion, sorry for being quite strict here ):

Now once we had no regression we can actually make that work and nice. Yes, I am fine with:

  • sticking to maxResolutionMilisec int64 name everywhere.
  • parseResolution returning int64 straight away.

Thoughts? (:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed that's what I've done. Hopefully it is much more clearer now about what kind of units are being used and that no one in the future will run into this problem like me when I was trying to hunt down one problem but accidentally got caught up in this.

pkg/store/bucket.go Outdated Show resolved Hide resolved
@@ -1030,6 +1032,19 @@ func (s *bucketBlockSet) getFor(mint, maxt, minResolution int64) (bs []*bucketBl
if len(bs) == 0 {
return s.getFor(mint, maxt, s.resolutions[i])
}

until := len(bs) - 1
for j := 0; j < until; j++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tricky, need to dive more but generally makes sense. Explaining what we are seeing now. Thanks for tests catching this as well!

All of this assumes that blocks within different resolutions are aligned ideally right? Can we comment this somehow?

Also be aware of this: #1104 Hope you algorthim would be nicely extendable I guess.
cc @mjd95

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They always should be aligned ideally or on the same timestamps because in other cases we would get a problem known as "overlapping blocks".

This change should be easy to integrate into that PR.

Copy link
Member

@bwplotka bwplotka May 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, It makes total sense BUT to me left := s.getFor(mint, bs[0].meta.MinTime, s.resolutions[i]) and right are just between 0 and first min and lastMax and maxt, right?

Can we remove this left and right and treat them properly in this loop as between? I think it would simplify flow and algorithm.

Copy link
Member Author

@GiedriusS GiedriusS May 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea sounds good and I played around with it a bit but I doubt that this function can be made more succinct because we have three distinct cases of the append call: at the beginning of the array, between two elements, and at the end. Please correct me if I'm wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is it really too complex? I will propose something to your PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed: GiedriusS#1

IMO it's way simpler, what do you think? @GiedriusS

// getFor returns a time-ordered list of blocks that cover date between mint and maxt.
// Blocks with the biggest resolution possible but not bigger than the given max resolution are returned.
func (s *bucketBlockSet) getFor(mint, maxt, maxResolutionMillis int64) (bs []*bucketBlock) {
	if mint == maxt {
		return nil
	}

	s.mtx.RLock()
	defer s.mtx.RUnlock()

	// Find first matching resolution.
	i := 0
	for ; i < len(s.resolutions) && s.resolutions[i] > maxResolutionMillis; i++ {
	}

	// Fill the given interval with the blocks for the current resolution.
	// Our current resolution might not cover all data, so recursively fill the gaps with higher resolution blocks if there is any.
	start := mint
	for _, b := range s.blocks[i] {
		if b.meta.MaxTime <= mint {
			continue
		}
		if b.meta.MinTime >= maxt {
			break
		}

		if i+1 < len(s.resolutions) {
			bs = append(bs, s.getFor(start, b.meta.MinTime, s.resolutions[i+1])...)
		}
		bs = append(bs, b)
		start = b.meta.MaxTime
	}

	if i+1 < len(s.resolutions) {
		bs = append(bs, s.getFor(start, maxt, s.resolutions[i+1])...)
	}
	return bs
}

Giedrius Statkevičius added 3 commits May 17, 2019 11:49
* Convert parseDownsamplingParam() into parseDownsamplingParamMillis()
which properly returns int64 for use in the querier code
* Add parseDownsamplingMillis() tests
@GiedriusS GiedriusS changed the title store/bucket: make getFor() work with interleaved resolutions, pass proper resolution from querier store/bucket: make getFor() work with interleaved resolutions May 17, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍 Some suggestions, but overall thanks for spotting this bug.

@@ -223,7 +223,7 @@ func (api *API) parseDownsamplingParam(r *http.Request, step time.Duration) (max
return 0, &ApiError{errorBadData, errors.Errorf("negative '%s' is not accepted. Try a positive integer", maxSourceResolutionParam)}
}

return maxSourceResolution, nil
return int64(maxSourceResolution / time.Millisecond), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

mint, maxt int64
window int64
}
input := []resBlock{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about joinging those together in form of table test? essentiall add those as test cases to TestBucketBlockSet_addGet? might be more consistent and easier to read just "cases" as test itself feels similar TBH?

Copy link
Member Author

@GiedriusS GiedriusS May 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ones are a bit special because we are doing a property-based test instead of an ad-hoc one. I have removed the other test cases that I've added before and only left this one in - this already covers the "interleaved resolutions" case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

pkg/store/bucket.go Outdated Show resolved Hide resolved
@@ -1030,6 +1032,19 @@ func (s *bucketBlockSet) getFor(mint, maxt, minResolution int64) (bs []*bucketBl
if len(bs) == 0 {
return s.getFor(mint, maxt, s.resolutions[i])
}

until := len(bs) - 1
for j := 0; j < until; j++ {
Copy link
Member

@bwplotka bwplotka May 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, It makes total sense BUT to me left := s.getFor(mint, bs[0].meta.MinTime, s.resolutions[i]) and right are just between 0 and first min and lastMax and maxt, right?

Can we remove this left and right and treat them properly in this loop as between? I think it would simplify flow and algorithm.

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for tests, but proposed bit simpler algorithm for implementation, WDYT: GiedriusS#1 @GiedriusS ?

Simplified goFor implementation.
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for finding and fixing this bug, was so hard to find out this (: The property testing is something awesome to talk about in some Golang meetup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants