sidecar/compact/store/receiver - Add the prefix option to buckets #5337
pkg/objstore/prefixed_bucket.go
Outdated
}

func NewPrefixedBucket(bkt Bucket, prefix string) Bucket {
	pbkt := &PrefixedBucket{bkt: bkt, prefix: strings.Trim(prefix, DirDelim)}
The usage of `DirDelim` here is something that tricked me.
This is because we're stating that directories are always going to be separated by `/`. I believe this would make the filesystem bucket type not work on Windows. But then I saw that the metrics bucket uses slashes too, and as nobody has complained, I believe Windows users don't run Thanos very often.
What is your opinion on that?
Yeah, I think your observation is correct; however, I think we could still fix this. Could you open an issue for it? I think we could mark it as a "good first issue" because it should be relatively straightforward. 🤔
Done: #5380
pkg/objstore/client/factory.go
Outdated
	return objstore.NewTracingBucket(objstore.BucketWithMetrics(bucket.Name(), bucket, reg)), nil

var prefixedBucket objstore.Bucket
if validPrefix(bucketConf.Config.Prefix) {
Although this looks like a micro performance improvement, I think it also reduces the risk of bugs for those who don't want to use prefixes (while slightly improving performance for them too). WDYT?
If prefix is configured but it is invalid, should we catch that error case and print the error log?
I liked your idea :)
But I'm not sure how I would differentiate between an empty key and a non-existent key. I would love to hear more ideas on this one.
Aaaaand I just noticed we broke some e2e tests due to our new config field. We will work on that :) Feedback is welcome anyway!
Thanks for the contribution. I want to see prefix support in the objstore library so it is nice to see someone working on it again.
Can you add some E2E tests for this?
Btw I think we have a separate repo for the objstore library now: https://github.com/thanos-io/objstore. But the Thanos code is still using the old lib, so I am wondering whether we need to make the changes in that repo as well.
pkg/objstore/client/factory.go
Outdated
	prefixedBucket = objstore.NewPrefixedBucket(bucket, bucketConf.Config.Prefix)
	level.Debug(logger).Log("msg", "using prefix on bucket access", "prefix", bucketConf.Config.Prefix)
} else {
	prefixedBucket = bucket
I think this else block is unnecessary. What about just using the code below on line 91?
bucket = objstore.NewPrefixedBucket(bucket, bucketConf.Config.Prefix)
The thing is that we wanted to avoid using the prefixedBucket decorator in cases where it is not needed. That would reduce the blast radius in case we introduce a bug. What do you think?
I agree with @yeya24 here, it reads nicer when all of the logic is hidden inside of the prefixed bucket. Perhaps we could move this check there?
Nice idea @GiedriusS, we moved this validation to `NewPrefixedBucket`.
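A minimal sketch of the outcome of this thread, assuming the constructor itself decides whether to decorate: callers always call `NewPrefixedBucket`, and the undecorated bucket comes back when the prefix is unusable. The `Bucket` interface below is a deliberately tiny, hypothetical stand-in for `objstore.Bucket`, and the body of `validPrefix` is an assumption, since the PR's implementation is not shown in this conversation.

```go
package main

import "strings"

// dirDelim mirrors the "/" delimiter (DirDelim in the PR).
const dirDelim = "/"

// Bucket is a hypothetical, heavily simplified stand-in for objstore.Bucket.
type Bucket interface {
	// Path returns the full object path the bucket would use for name.
	Path(name string) string
}

// rawBucket passes object names through unchanged.
type rawBucket struct{}

func (rawBucket) Path(name string) string { return name }

// prefixedBucket decorates another Bucket and prepends a fixed prefix.
type prefixedBucket struct {
	bkt    Bucket
	prefix string
}

func (p *prefixedBucket) Path(name string) string {
	return p.bkt.Path(p.prefix + dirDelim + name)
}

// validPrefix reports whether anything is left of prefix once the
// surrounding delimiters are trimmed. Assumed behaviour, not the PR's code.
func validPrefix(prefix string) bool {
	return strings.Trim(prefix, dirDelim) != ""
}

// NewPrefixedBucket wraps bkt only when the prefix is valid; otherwise it
// returns bkt untouched, so users without a prefix never go through the
// decorator.
func NewPrefixedBucket(bkt Bucket, prefix string) Bucket {
	if !validPrefix(prefix) {
		return bkt
	}
	return &prefixedBucket{bkt: bkt, prefix: strings.Trim(prefix, dirDelim)}
}
```

This keeps the factory free of an `if/else` while still limiting the blast radius for deployments that do not configure a prefix.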
We've added a bunch of e2e tests: to be more precise, on all of the currently tested flows. Feedback is more than welcome :D
Thanks for your awesome work! 🍻 Haven't looked through the whole code yet but added some comments from my side.
docs/storage.md
Outdated
@@ -102,6 +102,8 @@ At a minimum, you will need to provide a value for the `bucket`, `endpoint`, `ac

However if you set `aws_sdk_auth: true` Thanos will use the default authentication methods of the AWS SDK for go based on [known environment variables](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html) (`AWS_PROFILE`, `AWS_WEB_IDENTITY_TOKEN_FILE` ... etc) and known AWS config files (~/.aws/config). If you turn this on, then the `bucket` and `endpoint` are the required config keys.

The field `prefix` can be used to transparently use prefixes in your S3 bucket. That way, you may point distinct Thanos instances to the same bucket, while avoiding one instance messing with the data from another instance. This allows multiple Thanos deployments to use the same bucket without having to list all the files in it when you point Thanos store to the bucket (even when you are using tenants feature).
Maybe we can add a sentence saying that in practice you won't need this, because ULIDs are really close to random (github.com/oklog/ulid). Or, in other words, I think the use case here is to separate blocks coming from different sources into paths with different prefixes, to make it easier to understand what's going on, i.e. you don't have to use Thanos tooling to know which blocks came from where.
Perhaps we could remove the "messing" part and mention something like this? ☝️
Hi!
I don't know if I understood all the advantages you describe, but the part I did understand makes sense.
I replaced the second part with the explanation you gave here. But please feel free to ask for changes to it: you certainly answer more Thanos-related questions than I do, so you probably know more about how people use it :)
pkg/objstore/prefixed_bucket.go
Outdated

func conditionalPrefix(prefix, name string) string {
	if len(name) > 0 && len(prefix) > 0 {
		return prefix + DirDelim + name
nit: let's call `withPrefix` here? Actually, I think we only need to check for `len(name) > 0` here, because `len(prefix) > 0` is already guaranteed by our initial checks.
Makes sense. Done!
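The simplification agreed on above can be sketched as follows. It assumes the prefix has already been validated as non-empty when the bucket was created, so only `name` is checked. What the helper returns for an empty `name` is not visible in the snippet under review; returning the name unchanged is an assumption made here for illustration.

```go
package main

// DirDelim mirrors the "/" delimiter used by the PR.
const DirDelim = "/"

// conditionalPrefix joins prefix and name with DirDelim.
// It relies on the caller having validated that prefix is non-empty,
// so the only remaining check is on name.
func conditionalPrefix(prefix, name string) string {
	if len(name) > 0 {
		return prefix + DirDelim + name
	}
	// Assumption: an empty name is passed through untouched.
	return name
}
```

For example, `conditionalPrefix("tenant-a", "01ABC/meta.json")` yields `tenant-a/01ABC/meta.json`.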
pkg/objstore/prefixed_bucket.go
Outdated
// object name including the prefix of the inspected directory.
// Entries are passed to function in sorted order.
func (p *PrefixedBucket) Iter(ctx context.Context, dir string, f func(string) error, options ...IterOption) error {
	if len(p.prefix) > 0 {
Nit: we now check whether the prefix is valid at the beginning (which includes a check that it is not empty), so this is not needed.
Done!
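As a rough illustration of what a prefixed `Iter` has to do once the emptiness check is gone: list the prefixed directory in the underlying bucket, then strip the prefix from every name before handing it to the caller. The `iterFunc` type below is a hypothetical, simplified stand-in for `Bucket.Iter` (no context, no options), and the handling of an empty `dir` is an assumption, not the PR's code.

```go
package main

import "strings"

// dirDelim mirrors the "/" delimiter (DirDelim in the PR).
const dirDelim = "/"

// iterFunc is a hypothetical, simplified stand-in for Bucket.Iter:
// it calls f for each object name found under dir.
type iterFunc func(dir string, f func(name string) error) error

// prefixedIter wraps inner so that callers list dir relative to prefix and
// receive object names with the prefix stripped again.
// prefix is assumed to be non-empty and already trimmed of delimiters.
func prefixedIter(inner iterFunc, prefix string) iterFunc {
	return func(dir string, f func(name string) error) error {
		// Assumption: an empty dir means "list the prefix root".
		pdir := prefix
		if dir != "" {
			pdir = prefix + dirDelim + dir
		}
		return inner(pdir, func(name string) error {
			return f(strings.TrimPrefix(name, prefix+dirDelim))
		})
	}
}
```

Callers therefore never see the prefix, which is what makes the option transparent to the rest of Thanos.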
pkg/objstore/prefixed_bucket_test.go
Outdated
)

func TestPrefixedBucket_Acceptance(t *testing.T) {
	prefix := "/someprefix/anotherprefix/"
Since I believe that you will clean up the code a bit either way, could we put these strings into a string slice and iterate over the elements, and call those functions? I think it would make this way neater.
Done!
Looks good, just a few suggestions 👍
pkg/objstore/client/factory.go
Outdated
	return objstore.NewTracingBucket(objstore.BucketWithMetrics(bucket.Name(), objstore.NewPrefixedBucket(bucket, prefix), reg)), nil
}

func prefixFromConfig(bucketConf *BucketConfig) string {
I am wondering where this function should live.
We left it here because that way we don't expose the `config` abstraction to the prefixed bucket. On the other hand, this exposes the `prefix` parameter to the factory.
Any opinion?
With this #5337 (comment) I think we could remove this function in general 😄 what do you think?
One more thing I don't know how to do: how can I update the docs with the
Yeah, we generate those docs automatically from the structs. Since this option is "hidden" now in the types, the check cannot pass. What about adding a new field to the BucketConfig type? Then we wouldn't have to have hacks like this.
@@ -41,6 +41,7 @@ const (
type BucketConfig struct {
	Type   ObjProvider `yaml:"type"`
	Config interface{} `yaml:"config"`
	Prefix string      `yaml:"prefix" default:""`
Hey @GiedriusS, I followed your suggestion here :) The only thing I don't love about bringing the prefix into BucketConfig is that all the other bucket options live inside Config 🤔
But with this change, we also keep the PrefixedBucket layer with its validations and don't risk breaking something by changing all the Config types that BucketConfig handles, so I think it's the best alternative!
👍 yeah, I think this is the best solution out of the alternatives
docs/components/tools.md
Outdated
@@ -811,7 +814,7 @@ Flags:
      --rewrite.to-relabel-config-file=<file-path>
                                 Path to YAML file that contains relabel configs
                                 that will be applied to blocks
      --tmp.dir="/tmp/thanos-rewrite"
      --tmp.dir="/var/folders/pr/h0912y1s3zj507j30fzj87gh0000gn/T/thanos-rewrite"
What's this random directory? 🤔
I'm not sure, actually. This change was generated when I ran `make check-docs`, but the docs check kept breaking. I removed this change and the docs check passed. I don't understand why, since it also runs `make check-docs` 🤔
This happens to me too. I feel it is an issue specific to macOS. The temp dir is generated incorrectly.
🎉 let's see if there are any other comments/suggestions. 🍻 thanks for your work
CHANGELOG.md
Outdated
@@ -10,6 +10,8 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re

## Unreleased

- [#5337](https://github.com/thanos-io/thanos/pull/5337) Thanos Object Store: Add the `prefix` option to buckets
Can we move the item under `Added`?
Done! Thanks @yeya24
Signed-off-by: jademcosta <jade.costa@nubank.com.br>
Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br>
The idea is that if it works, we can add it for all other providers. Signed-off-by: jademcosta <jade.costa@nubank.com.br>
Signed-off-by: jademcosta <jademcosta@gmail.com>
We already check if the prefix is not empty when creating the bucket. Signed-off-by: jademcosta <jade.costa@nubank.com.br>
Thanks a lot @GiedriusS!
LGTM. Thanks for the contribution!
* Remove debug line (#5245) Signed-off-by: Matej Gera <matejgera@gmail.com> * e2e: fix compact test's flakiness (#5246) Fix the compact test's by running this sub-test sequentially. The further steps depend on this test's results so it's wrong to run it as a sub-test. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * bump prometheus version to v2.33.5 (#5256) Signed-off-by: Ben Ye <ben.ye@bytedance.com> * info: Return store info only when the service is ready (#5255) * return store info only when the service is ready Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix test Signed-off-by: Ben Ye <ben.ye@bytedance.com> * Merge release 0.25 to main (#5210) * Cut 0.25.0-rc.0 (#5184) Signed-off-by: Matej Gera <matejgera@gmail.com> * Cut v0.25.0 (#5209) Signed-off-by: Matej Gera <matejgera@gmail.com> * Create v0.25.1 built with Go 1.17.8 (#5226) The binaries published with this release are built with Go1.17.8 to avoid [CVE-2022-24921](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-24921). 
Signed-off-by: Matthias Loibl <mail@matthiasloibl.com> * *: Cut 0.25.2 rc.0 (#5247) * fix: add null check to exemplar data (#5202) Signed-off-by: Thomas Mota <tmm@danskecommodities.com> * Ruler: Fix WAL directory in stateless mode (#5242) Signed-off-by: Matej Gera <matejgera@gmail.com> * Update CHANGELOG, VERSION Signed-off-by: Matej Gera <matejgera@gmail.com> * Updates busybox SHA (#5234) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> Co-authored-by: Tomás Mota <tomasrebelomota@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * Cut v0.25.2 Signed-off-by: Matej Gera <matejgera@gmail.com> Update tutorials Signed-off-by: Matej Gera <matejgera@gmail.com> Co-authored-by: Matthias Loibl <mail@matthiasloibl.com> Co-authored-by: Tomás Mota <tomasrebelomota@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * Implement GRPC query API (#5250) With the current GRPC APIs, layering Thanos Queriers results in the root querier getting all of the samples and executing the query in memory. As a result, the intermediary Queriers do not do any intensive work and merely transport samples from the Stores to the root Querier. When data is perfectly sharded, users can implement a pattern where the root Querier instructs the intermediary ones to execute the queries from their stores and return back results. The results can then be concatenated by the root querier and returned to the user. In order to support this use case, this commit implements a GRPC API in the Querier which is analogous to the HTTP Query API exposed by Prometheus. 
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com> * Change error cleanup in `objstore.DownloadDir` to delete files not destination dir (#5229) * Change error cleanup in objstore.DownloadDir to delete files not directories Dst is always a directory. If any file after the first fails to download, the cleanup will fail because the destination already contains at least one file. This commit changes the cleanup logic to clean up successfully downloaded files one by one instead of attempting to clean up the whole dst directory. Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Add cleanup of root dst directory. Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Add unit test for cleanup of DownloadDir Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Fix linter Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update index.html (#5264) * Add SumUp logo to adopters (#5267) Signed-off-by: Guilherme Souza <101073+guilhermef@users.noreply.github.com> * receive: Added tenant ID error handling of remote write requests. (#5269) Plus better explanation. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Add TIXnGO logo to adopters (#5273) Signed-off-by: Pierre Hanselmann <pierre.hanselmann@gmail.com> * Fix miekgdns resolver to work with CNAME records too (#5271) * Fix miekgdns resolver to work with CNAME records too Signed-off-by: Marco Pracucci <marco@pracucci.com> * Remove unused context Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/discovery/dns/miekgdns/resolver.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Lucas Servén Marín <lserven@gmail.com> Co-authored-by: Lucas Servén Marín <lserven@gmail.com> * UI: Remove old ui (#5145) * remove old ui Signed-off-by: Augustin Husson <husson.augustin@gmail.com> * add changelog Signed-off-by: Augustin Husson <husson.augustin@gmail.com> * update assets Signed-off-by: Augustin Husson <husson.augustin@gmail.com> * Updates busybox SHA (#5283) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * build(deps): bump moment from 2.29.1 to 2.29.2 in /pkg/ui/react-app (#5274) Bumps [moment](https://github.com/moment/moment) from 2.29.1 to 2.29.2. - [Release notes](https://github.com/moment/moment/releases) - [Changelog](https://github.com/moment/moment/blob/develop/CHANGELOG.md) - [Commits](https://github.com/moment/moment/compare/2.29.1...2.29.2) --- updated-dependencies: - dependency-name: moment dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * docs: fix URLs preventing generation and unblock CI (#5285) * docs: fix Ian Billett's GitHub handle I noticed that CI was failing [0] for PR https://github.com/thanos-io/thanos/pull/5284 because Ian had changed his GitHub handle from @ianbillett to @bill3tt. This commit fixes this. 
[0] https://github.com/thanos-io/thanos/runs/6050355497?check_suite_focus=true#step:5:135 Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * docs: fix broken links to GitHub docs Currently, documentation generation is failing because mdox can't fetch some GitHub documentation pages since the URLs for the help content has changed. This commit updates the links to use the correct URLs. Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * MAINTAINERS.md: regenerate Signed-off-by: Lucas Servén Marín <lserven@gmail.com> * UI: Update vulnerable dependencies (#5233) * refactor global window typings Use declaration merging for better window types Signed-off-by: Gabriel Bernal <gbernal@redhat.com> * bump vulnerable react-scripts version Signed-off-by: Gabriel Bernal <gbernal@redhat.com> * Add Vestiaire Collective as adopter (#5289) Signed-off-by: claude ebaneck <claudeforlife@gmail.com> Co-authored-by: claude ebaneck <claude.ebaneck@vestiairecollective.com> * Implement Query API discovery (#5291) A recent commit (#5250) added a GRPC API to Thanos Query which allows executing PromQL over GRPC. This API is currently not discoverable through endpointsets which makes it hard for other Thanos components to use it. This commit extends endpointsets with a GetQueryAPIClients method which returns Query API clients to all components which support this API. Signed-off-by: fpetkovski <filip.petkovsky@gmail.com> * Added support for ppc64le (#5290) * Added support for ppc64le Signed-off-by: Marvin Giessing <marvin.giessing@gmail.com> * Updated Changelog Signed-off-by: Marvin Giessing <marvin.giessing@gmail.com> * Updated promu & protoc Signed-off-by: Marvin Giessing <marvin.giessing@gmail.com> * Updated Makefile comment Signed-off-by: Marvin Giessing <marvin.giessing@gmail.com> * Added target API tests (+goleak). (#5260) Attempted to repro https://github.com/thanos-io/thanos/issues/5257, but no good luck. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Revert "Added target API tests (+goleak). (#5260)" (#5297) This reverts commit 955ea6dcae2529ad5b5b97a6a11150a5906d775a. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Use correct filesystem/network path separators when uploading blocks (#5281) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> * query-frontend: Don't cache request with dedup=false (#5300) * query-frontend: Added repro for dedup affecting precision of querying. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * QFE does not cache request with dedup=false. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Move info about queries that skip cache logic to docs Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update CHANGELOG Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Run docs formatter Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix e2e tests where caching logic is desired Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * mixin: Fix typo in ThanosCompactHalted alert (#5306) Signed-off-by: Pedro Araujo <pedro.araujo@saltpay.co> * Avoid starting goroutines for memcached batch requests before gate (#5301) Use the doWithBatch function to avoid starting goroutines to fetch batched results from memcached before they are allowed to run via the concurrency Gate. This avoids starting many goroutines which cannot make any progress due to a concurrency limit. 
Fixes #4967 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Cut readme for 0.26 (#5311) Co-authored-by: Wiard van Rij <wvanrij@roku.com> * Reviewed and updated Changelog for 0.26-rc0 (#5313) Signed-off-by: Wiard van Rij <wvanrij@roku.com> Co-authored-by: Wiard van Rij <wvanrij@roku.com> * Cut 0.26.0-rc.0 set version correctly (#5317) Signed-off-by: Wiard van Rij <wvanrij@roku.com> Co-authored-by: Wiard van Rij <wvanrij@roku.com> * docs: Fix broken link to introduction blog (#5319) Signed-off-by: jmjf <jamee.mikell@gmail.com> * Ensure memcached batched requests handle context cancelation (#5314) * Ensure memcached batched requests handle context cancellation Ensure that when the context used for Memcached GetMulti is cancelled, getMultiBatched does not hang waiting for results that will never be generated (since the batched requests will not run if the context has been cancelled). Fixes an issue introduced in #5301 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Lint fixes Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Code review changes: run batches unconditionally Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * stalebot: add generic label to avoid stalebot (#5322) Add a generic label which tells stalebot not to close issues marked with it. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Use proper replicalabels in GRPC Query API (#5308) The GRPC Query API uses only the replica labels coming from the RPC request and ignores the ones configured when starting the querier. This commit ensures that the API falls back on the preconfigured replica labels when they are not provided in the request. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * groupcache: reduce log severity (#5323) Sometimes certain operations can fail with some error(-s) being expected e.g. a deletion marker might or might not exist. 
Thus, these log lines could get triggered even though nothing bad is happening. Since the expected errors are known only at the very end, near the call site, and because `error`s are already logged in other places, and because these Fetch()/Store() functions are working in best-effort scenario, I propose reducing the severity of these log lines to `debug`. Fixes https://github.com/thanos-io/thanos/issues/5265. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Update release process (#5325) * update release process Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Add info about VERSION file Signed-off-by: Wiard van Rij <wvanrij@roku.com> * query-frontend: improve docs on requestes excluded from cache (#5326) Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * cut release 0.26.0 (#5330) Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Updates busybox SHA (#5336) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * receive: fix deadlock on interrupt in routerOnly mode (#5339) * fix receive router deadlock on interrupt Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com> * Update changelog Signed-off-by: François Gouteroux <francois.gouteroux@gmail.com> * docs: Updated information about our community call. (#5309) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * reloader: Force trigger reload when config rollbacked (#5324) * Add Cache metrics to groupcache (#5352) Add metrics about the hot and main caches[0]. * Number of bytes in each cache. * Number of items in each cache. * Counter of evictions from each cache. [0]: https://pkg.go.dev/github.com/vimeo/galaxycache#CacheStats Signed-off-by: SuperQ <superq@gmail.com> * e2e: Refactored service helpers to be consistent with new API. (#5348) * test: Added Alert compatibilty test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Tmp. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Update. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * update. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * update. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * e2e: Refactored service helpers for newest e2e version. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Removed alert combatibiltiy test for now. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed lint. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed lint2. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed nginx service. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixes. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Refactored ruler. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * fixes. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fixed compactor. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Fix. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * What about now? Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * groupcache: fix handling of slashes (#5357) Use https://github.com/julienschmidt/httprouter#catch-all-parameters for the groupcache route otherwise slashes in the cache's key gets interpreted by the router and thus groupcache's function never gets invoked, and all clients get 404. Remove test regarding cache hit because now Thanos Store during test constantly generates cache hits due to 1s delay between block information refreshes. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Adds more info about the formatting part. (#5347) * Adds more info about the formatting part. 
Closes #5282 Signed-off-by: Wiard van Rij <wvanrij@roku.com> * adds extra newline Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Update promdoc to solve #5344 (#5345) Signed-off-by: Wiard van Rij <wvanrij@roku.com> * e2e: Refactored Receive Builder to be consistent with other helpers. (#5358) * e2e: Refactored Receive Builder to be consistent with other helpers. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Addressed comments. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Updates busybox SHA (#5365) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * e2e: Fixed exemplar support in receive helper. (#5372) Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Enforce memcached concurrency limit with unbatched requests (#5360) * Enforce memcached concurrency limit with unbatched requests This ensures that requests that are _not_ split into batches still count towards the concurrency limit that the client enforces. This fixes an issue introduced in #5301 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Lint fix Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * docs: fix link (#5379) I think I've found a replacement for the dead link. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * cache: do not copy data in groupcache (#5378) Add a unsafe codec which uses the given byte slices directly to avoid copying - we are doing ioutil.ReadAll() either way so there is no need to copy anything. 
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * fix ruler send empty alerts (#5377) Signed-off-by: Ben Ye <ben.ye@bytedance.com> * Add custom `errors` package with stack trace functionality (#5239) * feat: a simple stacktrace utility Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * feat: custom errors package with new, errorf, wrapping, unwrapping and stacktrace Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * chore: update existing errors import (small subset) Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * chore: update comments Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * add errors into skip-files linter config Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * intoduce UnwrapTillCause to suffice the limitation of Unwrap Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * Revert "chore: update existing errors import (small subset)" This reverts commit d27f0177fe6c8a357ba10e4ac8bfee87c8bf985c. Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * revert makefile && golangcilint file Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * apply PR feedbacks Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * stacktrace and errors test Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * fix typo Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * update stacktrace testing regex Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * add lint ignore for standard errors import inside errors pkg Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * [test files] add copyright headers Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * add no lint to avoid false misspell detection of keyword Tast Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * update stacktrace output test line number with regex pattern Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * return pc slice with reduced capacity Signed-off-by: Bisakh Mondal 
<bisakhmondal00@gmail.com> * segregate formatted vs non formatted methods Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * update with only f functions Signed-off-by: Bisakh Mondal <bisakhmondal00@gmail.com> * Group memcached keys based on server when performing batch gets (#5356) * Group memcached keys based on server when performing batch gets Order and group keys during batch get operations based on the memcached server they will be sharded to. This reduces the number of connections that must be made within each batch of get operations. Fixes #5353 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Code review changes Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Fix error in testutil method added Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Code review: comments for selector interface Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * QueryFrontend: pre-compile regexp (#5383) * pre compile regexp Signed-off-by: Jin Dong <djdongjin95@gmail.com> * rename oppattern to labelvaluespattern Signed-off-by: Jin Dong <djdongjin95@gmail.com> * [FEAT] adding thanos consul blogpost (#5387) Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com> * Fix empty $externalLabels when templating labels in rule. 
(#5394) Signed-off-by: Rostislav Benes <r.dee.b.b@gmail.com> Co-authored-by: Rostislav Benes <r.dee.b.b@gmail.com> * support series relabeling on Thanos receiver (#5391) * support series relabeling on Thanos receiver Signed-off-by: Ben Ye <ben.ye@bytedance.com> * add changelog Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix lint Signed-off-by: Ben Ye <ben.ye@bytedance.com> * update lint Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix e2e test Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix relabel config pass Signed-off-by: Ben Ye <ben.ye@bytedance.com> * cleanup white space Signed-off-by: Ben Ye <ben.ye@bytedance.com> * address review comments Signed-off-by: Ben Ye <ben.ye@bytedance.com> * address comments Signed-off-by: Ben Ye <ben.ye@bytedance.com> * update comment Signed-off-by: Ben Ye <ben.ye@bytedance.com> * Expose GatherFileStats. (#5400) Signed-off-by: Peter Štibraný <pstibrany@gmail.com> * Rule: Error out earlier when building alertmanager config (#5405) * Error out earlier when building alertmanager config Signed-off-by: Jéssica Lins <jessicaalins@gmail.com> * Add test case for empty host Signed-off-by: Jéssica Lins <jlins@redhat.com> * [5130] [.*:] Upgrade Minio used for local development and e2e tests (#5392) * add updated bingo .gitignore Signed-off-by: B0go <victorbogo@icloud.com> * update bingo minio version to commit 91130e884b5df59d66a45a0aad4f48db88f5ca63 Signed-off-by: B0go <victorbogo@icloud.com> * trigger CI Signed-off-by: B0go <victorbogo@icloud.com> * Submit a proposal for vertical query sharding (#5350) Signed-off-by: fpetkovski <filip.petkovsky@gmail.com> * query: Close() after using query (#5410) * query: Close() after using query This should reduce memory usage because Close() returns points back to a sync.Pool. 
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * CHANGELOG: add item Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * query: call Close() in gRPC API too Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * avoided potential panic due to divide by 0 (#5412) Signed-off-by: Aditi Ahuja <ahuja.aditi@gmail.com> * sidecar/compact/store/receiver - Add the prefix option to buckets (#5337) * Create prefixed bucket Signed-off-by: jademcosta <jade.costa@nubank.com.br> * started PrefixedBucket tests Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * finish objstore tests Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Simplify string removal logic Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Test more prefix cases on PrefixedBucket Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Only use a prefixedbucket if we have a valid prefix Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Add single unit test for prefixedBucket prefix Signed-off-by: jademcosta <jade.costa@nubank.com.br> * test other prefixes on UsesPrefixTest Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * add remaining methods to UsesPrefixTest Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * add prefix to docs examples Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Simplify Iter method Signed-off-by: jademcosta <jade.costa@nubank.com.br> * add prefix explanation to S3 docs Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Conclusion of prefix sentence on docs Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Use DirDelim instead of magic string Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Add log when using prefixed bucket Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove "@" from test string to make them simpler Signed-off-by: jademcosta <jade.costa@nubank.com.br> * fix BucketConfig Config type - 
back to interface Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * add changelog Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * add missing checks in UsesPrefixTest Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * fix linter and test errors Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Add license to new files Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove autogenerated docs Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove duplicated transformation of string->[]byte Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Add prefixed bucket on all e2e tests for S3 The idea is that if it works, we can add for all other providers. Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Add e2e tests using prefixed bucket to all providers Signed-off-by: jademcosta <jade.costa@nubank.com.br> * refactor: move validPrefix to prefixed_bucket logic Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Enhance the documentation about prefix. Signed-off-by: jademcosta <jademcosta@gmail.com> * Fix format Signed-off-by: jademcosta <jademcosta@gmail.com> * Add prefix entry on bucket config example Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Removing redundancies on prefix checks and tests We already check if the prefix if not empty when creating the bucket. 
Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove redundant YAML unmarshal Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove unused parameter Signed-off-by: jademcosta <jade.costa@nubank.com.br> * Remove docs that should be auto-geneated Signed-off-by: jademcosta <jade.costa@nubank.com.br> * refactor: move prefix to config root level Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * add auto generated docs Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * fix changelog Signed-off-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> Co-authored-by: Maria Eduarda Duarte <dudammduarte@yahoo.com.br> * Ruler: Change default evaluation interval to 1m (#5417) * Change default eval interval to 1m Signed-off-by: Matej Gera <matejgera@gmail.com> * Update CHANGELOG Signed-off-by: Matej Gera <matejgera@gmail.com> * Updates busybox SHA (#5423) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * receive: Added Ketamo Consistent hashing (#5408) * Add support for consistent hashing in receivers This commit adds support for distributing series in Receivers using consistent hashing based on the libketama algorithm. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Use require package for test assertions Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Rename algorithm from consistent to ketama Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * S3: Add config option to enforce the minio DNS lookup (#5409) * Add config option to enforce the minio DNS lookup Signed-off-by: Jakob Hahn <jakob.hahn@hetzner.com> * Useenums instead of boolean for bucket_lookup_type Signed-off-by: Jakob Hahn <jakob.hahn@hetzner.com> * Expose tsdb status in receiver (#5402) * Expose tsdb status in receiver This commit implements the api/v1/status/tsdb API in the Receiver. 
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Add docs and todo Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Fix tests Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Receive: option to extract tenant from client certificate (#5153) * added option to extract tenant from client certificate Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * added suggestions from PR Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * removed else cases Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * corrected location of certificate field check Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * fixed issue with err definition Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * updated docs Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> * corrected comment Signed-off-by: Magnus Kaiser <magnus.kaiser@gec.io> Co-authored-by: Magnus Kaiser <magnus.kaiser@gec.io> * Improve ketama hashring replication (#5427) With the Ketama hashring, replication is currently handled by choosing subsequent nodes in the list of endpoints. This can lead to existing nodes getting more series when the hashring is scaled. This commit changes replication to choose subsequent nodes from the hashring which should not create new series in old nodes when the hashring is scaled. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Cut readme for 0.27 (#5429) Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Added alert compliance test for Thanos (#5315) * test: Added Alert compatibilty test. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Tmp. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Update. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * update. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * update. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * e2e: Refactored service helpers for newest e2e version. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Removed alert combatibiltiy test for now. 
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * e2e: Added test for compatibility. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Added Querier /alerts API. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * e2e:Added replica labels. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Option to remove replica-label. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * skip. Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> * Use stateful ruler and default resend delay Signed-off-by: Matej Gera <matejgera@gmail.com> * Update docs Signed-off-by: Matej Gera <matejgera@gmail.com> Co-authored-by: Matej Gera <matejgera@gmail.com> * 0.27-rc0 Update readme and version (#5430) * Update readme and version Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Fix newlines Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Fixes typo Signed-off-by: Wiard van Rij <wvanrij@roku.com> * fixes noise Signed-off-by: Wiard van Rij <wvanrij@roku.com> * Alert Compliance: Fix wrong ruler configuration (#5433) * [receive] Export metrics about remote write requests per tenant (#5424) * Add write metrics to Thanos Receive Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Let the middleware count inflight HTTP requests Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update Receive write metrics type & definition Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Put option back in its place to avoid big diff Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fetch tenant from headers instead of context It might not be in the context in some cases. 
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Delete unnecessary tenant parser middleware Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Refactor & reuse code for HTTP instrumentation Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add missing copyright to some files Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add changelog entry for Receive & new HTTP metrics Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Remove TODO added by accident Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Make error handling code shorter Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Make switch statement simpler Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Remove method label from timeseries' metrics Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Count samples of all series instead of each Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Remove in-flight requests metric Will add this in a follow-up PR to keep this small. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Change timeseries/samples metrics to histograms The buckets were picked based on the fact that Prometheus' default remote write configuration (see https://prometheus.io/docs/practices/remote_write/#memory-usage) set a max of 500 samples sent per second. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix Prometheus registry for histograms Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix comment in NewHandler functions There are now four metrics instead of five. 
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> * remove unused block-sync-concurrency flag (#5426) * remove unused block-sync-concurrency flag Signed-off-by: Ben Ye <ben.ye@bytedance.com> * add changelog Signed-off-by: Ben Ye <ben.ye@bytedance.com> * update Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix e2e test Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix tests Signed-off-by: Ben Ye <ben.ye@bytedance.com> * fix docs typo in metric thanos_compact_halted (#5448) Signed-off-by: Nikita Matveenko <nikitapecasa@gmail.com> * Implement tenant expiration (#5420) * Implement tenant expiration This commit adds dynamic TSDB pruning for tenants which have not received new samples within a certain period of time. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Add link to receiver tenant-lifecycle-management Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Docs: Remove Katacoda links (#5454) * Remove Katacoda links Signed-off-by: Matej Gera <matejgera@gmail.com> * Remove one more reference Signed-off-by: Matej Gera <matejgera@gmail.com> * Fixed lint on Go 1.18.3+ (#5459) Signed-off-by: bwplotka <bwplotka@gmail.com> * Add HTTP metrics for in-flight requests (#5440) * Add HTTP metrics for in-flight requests Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix changelog entry after PR creation Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix link in old CHANGELOG entry Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix style in the CHANGELOG All the entries should end up with a period. 
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Improve help for in-flight htttp requests metric Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Move changelog entry pending PR Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add a method label to the in-flight HTTP requests Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * docs: Fix heading level of "Excluded from caching" (#5455) * Refactor DefaultTransport() from objstore to package exthttp (#5447) * Refactoring the DefaultTransport func in package exthttp Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Refactoring the DefaultTransport func from s3 in package exthttp Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Updated helpers.go corrected argument for DefaultTransport() in helpers.go Signed-off-by: Srushti (sroo-sh-tee) <73685894+SrushtiSapkale@users.noreply.github.com> * Changed the argument type in getContainerURL Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Update pkg/exthttp/transport.go Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Srushti (sroo-sh-tee) <73685894+SrushtiSapkale@users.noreply.github.com> * Update pkg/exthttp/transport.go Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Srushti (sroo-sh-tee) <73685894+SrushtiSapkale@users.noreply.github.com> * Removed the use of NewTransport() in cos.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Moved TLSConfig struct and functions that need it from objstore to exthttp Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Changed s3.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Kept s3.go and helpers.go unchanged to not break the cortex deps Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Consistency changed made while pair++ programming. 
Signed-off-by: bwplotka <bwplotka@gmail.com> * Created a new tlsconfig in exthttp and minor changes in cos.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Commented in s3.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Minor changes in transport.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Changed transport.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Changed transport.go and tlsconfig.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Removed changes from prometheus.mod and prometheus.sum Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> * Minor updation in cos.go Signed-off-by: Srushti Sapkale <srushtiisapkale@gmail.com> Co-authored-by: bwplotka <bwplotka@gmail.com> * receive: Fix race condition when pruning tenants (#5460) Pruning Receiver tenants has a race condition caused by concurrently removing items from the tenants map. This commit addresses the issue by using a mutex to guard the tenants map. Signed-off-by: fpetkovski <filip.petkovsky@gmail.com> * Adding SCMP as an adopter (#5466) Signed-off-by: Chris Ng <2509212+chris-ng-scmp@users.noreply.github.com> * Updated busybox version. (#5471) Signed-off-by: bwplotka <bwplotka@gmail.com> * Refactor endpoint ref clients Signed-off-by: Matej Gera <matejgera@gmail.com> * Fix E2E test env name clash Signed-off-by: Matej Gera <matejgera@gmail.com> * Build with Go 1.18 (#5258) * Build with Go 1.18 Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr> * Try something Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr> * Upgrade minio Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr> * Replace json-iterator/reflect2 in bingo Signed-off-by: Sylvain Rabot <sylvain@abstraction.fr> * Ignore 405 errors for prometheus buildVersion API requests (#5477) Older versions of prometheus (such as 2.7 which is shipped by Debian buster) return a 405 error for non-existent API endpoints instead of the 404 returned by more recent versions. 
Signed-off-by: Nicolas Dandrimont <olasd@softwareheritage.org> * *: Cut 0.27.0 (#5473) * Cut 0.27.0 Signed-off-by: Matej Gera <matejgera@gmail.com> * Updated busybox version. (#5471) Signed-off-by: bwplotka <bwplotka@gmail.com> Signed-off-by: Matej Gera <matejgera@gmail.com> * Docs: Remove Katacoda links (#5454) * Remove Katacoda links Signed-off-by: Matej Gera <matejgera@gmail.com> * Remove one more reference Signed-off-by: Matej Gera <matejgera@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Matej Gera <matejgera@gmail.com> * Update compact.md (#5465) * During 1h downsampling skip XOR chunks that may erroneously be present in 5m resolution blocks (#5453) * Add fpetkovski to triage list Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Use Azure BlobURL.Download instead of in-memory buffer (#5451) Modify the azure.Bucket get methods to use BlobURL.Download for fetching blobs and blob ranges. This avoids the need to allocate a buffer for storing the entire expected size of the object in memory. Instead, use a ReaderCloser view of the body returned by the download method. 
See grafana/mimir#2229 Signed-off-by: Nick Pillitteri <nick.pillitteri@grafana.com> * Update storage.md (#5486) * [receive] Add per-tenant charts to Receive's example dashboard (#5472) * Start to add tenant charts to Receive Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Properly filter HTTP status codes Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix tenant error rate chart Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Refactor to improve readability and consistency Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Refactor one more usage of code and tenant labels Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Filter tenant metrics to the Receive handler Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Format math expression properly Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update CHANGELOG Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add samples charts to series & samples row Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Bump Go version in all the GH Actions (#5487) * Bump go version in go mod This is a follow up to #5258, which made the project be built with Go 1.18. 
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update Go version in all GH Actions Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Run go mod tidy Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Added changelog entry Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Put back Go 1.17 in go.mod Because we don't use any Go 1.18 feature yet, so it's not needed Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update go.sum after changing go.mod to go 1.17 Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Remove non-user-impacting entry for changelog Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * objstore: Download and Upload block files in parallel (#5475) * Parallel Chunks Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * test Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Changelog Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * making ApplyDownloadOptions private Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * upload concurrency Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Upload Test Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * update change log Signed-off-by: Alan Protasio <approtas@amazon.com> 
Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Change comments Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Address comments Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Remove duplicate entries on changelog Signed-off-by: Alan Protasio <approtas@amazon.com> Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Addressing Comments Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * update golang.org/x/sync Signed-off-by: alanprot <alanprot@gmail.com> Signed-off-by: Alan Protasio <approtas@amazon.com> * Adding Commentts Signed-off-by: Alan Protasio <approtas@amazon.com> * Use default HTTP config for E2E S3 tests (#5483) Signed-off-by: Matej Gera <matejgera@gmail.com> * chore: Included githubactions in the dependabot config (#5364) This should help with keeping the GitHub actions updated on new releases. This will also help with keeping it secure. Dependabot helps in keeping the supply chain secure https://docs.github.com/en/code-security/dependabot GitHub actions up to date https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot https://github.com/ossf/scorecard/blob/main/docs/checks.md#dependency-update-tool Signed-off-by: naveensrinivasan <172697+naveensrinivasan@users.noreply.github.com> * bump codemirror and promql editor to the last version (#5491) Signed-off-by: Augustin Husson <husson.augustin@gmail.com> * receiver: Expose stats for all tenants (#5470) * receiver: Expose stats for all tenants Thanos Receiver supports the Prometheus tsdb status API and can expose TSDB stats for a single tenant. 
This commit extends that functionality and allows users to request TSDB stats for all tenants using the all_tenants=true query parameter. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Add back chunk count Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Simplify TSDBStats interface Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Return empty result for no stats Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * CHANGELOG.md: regenerate (#5495) * receive: Fix stats nil pointer panic (#5494) When fetching TSDB stats from receivers, certain TSDBs might not be initialized yet. This can lead to a nil pointer access when the status endpoint is accessed before all TSDBs are initialized. This commit adds an explicit check for each tenant's TSDB when exporting TSDB stats. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Update query.md (#5496) Fix typo of parameter --store.sd-files Signed-off-by: Firxiao <Firxiao@users.noreply.github.com> * Parallel download blocks - Follow up of #5475 (#5493) * Download blocks in parallel Signed-off-by: Alan Protasio <approtas@amazon.com> * remove the go func Signed-off-by: Alan Protasio <approtas@amazon.com> * Doc Signed-off-by: Alan Protasio <approtas@amazon.com> * CHANGELOG Signed-off-by: Alan Protasio <approtas@amazon.com> * doc Signed-off-by: alanprot <alanprot@gmail.com> * AddressComments Signed-off-by: alanprot <alanprot@gmail.com> * fix typo Signed-off-by: Alan Protasio <approtas@amazon.com> * Upgrade mdox with cache and some http settings to reduce CI failures (#5500) * Pin mdox to latest master commit It suppors now a cache for link validation and some HTTP configuration that can be used to help avoid intermittent CI failures. 
Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add mdox cache and HTTP configuration The cache has a default TTL (5 days) A timeout of 1m and 10 connections per host at transport level should help us reduce the intermittent failures if we have to invalidate the cache. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add Github Action cache for the mdox cache Using the hash of the md files as cache key. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Upgrade cache actions to v3 and add restore key Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Empty commit to test CI build cache Signed-off-by: GitHub <noreply@github.com> * Use 2.5 days as jitter for mdox cache Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix bad editor auto-formating again Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Updated minio-go to latest; removed fork. (#5474) * Updated minio-go fork to latest. NOTE: Optimization is propopsed to upstream to avoid fork in future. Relates to https://github.com/thanos-io/thanos/issues/5101 and https://github.com/thanos-io/thanos/issues/5130 Signed-off-by: bwplotka <bwplotka@gmail.com> # Conflicts: # go.mod # go.sum * Removed fork. Signed-off-by: bwplotka <bwplotka@gmail.com> * Added comment. 
Signed-off-by: bwplotka <bwplotka@gmail.com> * Receiver: Handle storage exemplar multi-error (#5502) * Handle exemplar store errors as conflict Signed-off-by: Matej Gera <matejgera@gmail.com> * Adjust tests Signed-off-by: Matej Gera <matejgera@gmail.com> * Update CHANGELOG Signed-off-by: Matej Gera <matejgera@gmail.com> * Fixing Race condition Introduced by #5493 (#5503) * Update busybox image versions (#5506) Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Updates busybox SHA (#5507) Signed-off-by: GitHub <noreply@github.com> Co-authored-by: yeya24 <yeya24@users.noreply.github.com> * chore: Update Prometheus dependency (#5484) * chore: Update Prometheus dependency Update Prometheus from v2.33.5 to v2.36.2. Signed-off-by: SuperQ <superq@gmail.com> * Update query tests for cortex changes. Signed-off-by: SuperQ <superq@gmail.com> * Use the default rules.RuleGroupPostProcessFunc. Signed-off-by: SuperQ <superq@gmail.com> * Update QueryStats use. Signed-off-by: SuperQ <superq@gmail.com> * Update Cortex. Signed-off-by: SuperQ <superq@gmail.com> * Update queryfrontend for Cortex changes. Signed-off-by: SuperQ <superq@gmail.com> * Bump pprof. Signed-off-by: SuperQ <superq@gmail.com> * Add changelog entry. 
Signed-off-by: SuperQ <superq@gmail.com> * Adapt to changed query stats API Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Sync dependencies Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Reflect changed metric names Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Kemal Akkoyun <kakkoyun@gmail.com> Co-authored-by: Kemal Akkoyun <kakkoyun@users.noreply.github.com> * chore: Vendor Cortex dependency as an internal package (#5504) * Vendor Cortex dependency as an internal package Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add gitattributes Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Skip checks for vendored directory Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Add copyright headers for Cortex Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * *: Move objstore out of repo (#5510) * *: Move objstore out of repo Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * Fix doc checks Signed-off-by: Kemal Akkoyun <kakkoyun@gmail.com> * chore: Update Prometheus to v2.37.0 (#5511) * chore: Update Prometheus to v2.37.0 Update Prometheus to the latest release. Note that Prometheus upstream now tags v0.x.y to map to the 2.x.y releases. Signed-off-by: SuperQ <superq@gmail.com> * Cleanup direct/indirect go.mod requirements. Signed-off-by: SuperQ <superq@gmail.com> * chore: Update Go modules (#5516) * Update weaveworks/common to remove node_exporter indirect dep. * Update simonpasquier/klog-gokit/v2. * Update google.golang.org/grpc lock to v1.45.0. * Cleanup replacements that are now handled by indirect requirements. * Fixup grpc.WithInsecure() use. Signed-off-by: SuperQ <superq@gmail.com> * chore: Update Go modules (#5518) * Reuse upstream TSDB status structs (#5526) This commit replaces the copied TSDB status structs with direct references from prometheus/prometheus. 
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Fix proposal on website (#5530) Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Update all bingo dependencies (#5525) This commit updates all bingo dependencies to their latest versions. It pins golang.org/x/sys to v0.0.0-20220715151400-c0bba94af5f8 for the github.com/google/go-jsonnet dependency in order to prevent failures when running make docs on Mac OS. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * delete_katacoda (#5529) Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * Remove empty RuleGroups in api/v1/rules when using matchers (#5537) * Remove empty RuleGroups in api/v1/rules Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestion Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Rename variables Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * fix(api): When querying api query on endpoint alerts return a json struct with alerts in lowercase. 
(#5534) To be same result as prometheus api Signed-off-by: Guillaume audic <audic.gui@gmail.com> * Receiver: Add benchmark for receive writer (#5533) * Add benchmark for receive writer Signed-off-by: Matej Gera <matejgera@gmail.com> * Incorporate feedback - Clearer parameter naming; use a separate temp dir for bench Signed-off-by: Matej Gera <matejgera@gmail.com> * Submit a proposal for Active Series Limiting for Hashring Topology (#5415) * Add proposal for Active Series Limiting for Hashring Topology Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Resize images Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add Observatorium as an alternative Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestions; add TODO Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Update proposal Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestions: add sections numbers Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Refactor EndpointSet (#5538) * Refactor EndpointSet This commit refactors the EndpointSet struct in order to make it easier to understand and work with. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Handle context cancellation in endpoint mock Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Make additions and removals of refs atomic. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Fix changed-docs grep regex (#5556) Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Added Vertical Query Sharding to Query-Frontend (#5342) * Update faillint to v1.10.0 Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Implement query sharding This commit implements query sharding for grouping PromQL expressions. Sharding is initiated by analyzing the PromQL and extracting grouping labels. Extracted labels are propagated down to Stores which partition the response by hashmoding all series on those labels. 
If a query is shardable, the partitioning and merging process will be initiated by the Query Frontend. The Query Frontend will make N distinct queries across a set of Queriers and merge the results back before presenting them to the user. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * First code review pass Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Use sync pool to reuse sharding buffers Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Add test for binary expression with constant Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Include external labels in series sharding Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Rule: Fix e2e test flake (#5558) * Rule: Fix e2e test flake Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix lint Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Check errors Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Change to github.com/thanos-io/thanos/pkg/errors Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestion Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix multi-tenant exemplar matchers (#5554) * Fix multi-tenant exemplar matchers The exemplar proxy synthesizes a query based on PromQL expression matchers and individual store's label sets. When a store has multiple label sets with same label names but different values (e.g. multitenant Receivers), each exemplar matcher will be repeated once for each label set. Because of this, a receiver hosting 200 tenants can get the same exemplar matcher 200 times. This leads to the underlying stores slowing down and timing out when asked for exemplars. This commit modifies the exemplar proxy to deduplicate matchers and only send a matcher once to an underlying store. 
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Address CR comments Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Receive: add per request limits for remote write (#5527) * Add per request limits for remote write Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Remove useless TODO item Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Refactor write request limits test Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add write concurrency limit to Receive Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Change write limits config option name Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Document remote write concurrenty limit Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add changelog entry Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Format docs Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Extract request limiting logic from handler Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add copyright header Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add a TODO for per-tenant limits Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add default value and hide the request limit flags Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Improve TODO comment in request limits Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Update Receive docs after flags wre made hidden Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Add note about WIP in Receive request limits doc Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix typo in Receive docs Co-authored-by: Filip Petkovski 
<filip.petkovsky@gmail.com> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix help text for concurrent request limit Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Use byte unit helpers for improved readability Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Removed check for nil writeGate The constructor sets the writeGate to a noopGate. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Better organize linebreaks Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix help text for limits hit metric Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Apply some english feedback Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Improve limits & gates documentationb Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix import clause Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Use a 3 node hashring for write limits test This should ensure the request fanout logic cannot somehow interfere with the request limit logic. Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Fix comment Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Douglas Camata <159076+douglascamata@users.noreply.github.com> * Announce sharding in ruler and store proxy (#5560) The ruler and store proxy currently support series sharding through the components that they use. However, this capability is not announced to the querier. This commit modifies their Info calls to indicate to the querier that it doesn't need to shard the response it receives from rulers and other store proxies. 
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com> * Fix flaky e2e tests (#5563) * Tools: Fix e2e test flake Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Metadata: Fix flaky e2e test Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Compact: Fix flaky e2e test Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Bumping actions/cache to v3 for e2e tests Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add missing e2e.WaitMissingMetrics Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Meta-monitoring based active series limiting (#5520) * Add initial PoC for meta-monitoring Receive active series limits Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add e2e tests, rebase Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add multitenant test + remake diagrams Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestions; Make naming consistent; Rm/Add metrics Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Reuse meta-monitoring client Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix panic Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Cache meta-monitoring query result Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fix lint Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fail fast when limiting Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Implement suggestions: docs + mutex + struct Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add interface and no-op Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add changelog entry Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Add seriesLimitSupported to handler Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Remove tools fork Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Change docs header Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Remove usage of 
ioutil (#5564) Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * docs/contribution.md: Update required Go version (#5557) * delete_katacoda Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * updated go version Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * update golang version Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * updated Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * Retrigger CI Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * Retrigger CI Signed-off-by: Akshit42-hue <patelakshit2025@gmail.com> * fix an expression param in a link to an alert in the rules page (#5562) Signed-off-by: Rostislav Benes <r.dee.b.b@gmail.com> Co-authored-by: Rostislav Benes <r.dee.b.b@gmail.com> * Receiver: Validate labels in write requests (#5508) * Add label set validation method Signed-off-by: Matej Gera <matejgera@g…
Changes
This PR is another attempt at implementing what was proposed in #1318.
Some decisions had to be made to get it working; those are highlighted with comments in the code.
Although we used PR #3289 to understand where this could be implemented, we rewrote everything from scratch.
Many thanks to the author of PR #3289 :)
Verification
We've added unit tests, and added the prefix test to all e2e tests.
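For readers skimming the description, here is a minimal sketch of the key-prefixing behaviour the new option is meant to provide. It is not the code from this PR: `prefixedName` and `dirDelim` are hypothetical names made up for illustration, and only the `strings.Trim(prefix, DirDelim)` normalization is taken from the diff under review.

```go
package main

import (
	"fmt"
	"strings"
)

// dirDelim stands in for objstore's DirDelim ("/") discussed in the review thread.
const dirDelim = "/"

// prefixedName is a hypothetical helper, not part of the PR. It trims
// delimiters from both ends of the prefix, then joins the prefix and the
// object name with a single delimiter. An empty prefix leaves the name as-is.
func prefixedName(prefix, name string) string {
	p := strings.Trim(prefix, dirDelim)
	if p == "" {
		return name
	}
	if name == "" {
		return p
	}
	return p + dirDelim + name
}

func main() {
	// A prefix written with stray slashes is normalized before joining.
	fmt.Println(prefixedName("/tenants/a/", "01ABC/meta.json"))
}
```

Under these assumptions, a wrapper bucket would apply such a helper to every object name before delegating to the underlying bucket, so existing callers keep using unprefixed names.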