forked from thanos-io/thanos
Release v0.37 #109
Merged
* fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
We have seen deadlocks with endpoint discovery caused by the metric collector hanging and not releasing the store labels lock. This causes the endpoint update to hang, which also makes all endpoint readers hang on acquiring a read lock for the resolved endpoints slice. This commit makes sure the Collect method on the metrics collector has a built-in timeout to guard against cases where an upstream call never reads from the collection channel. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
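The timeout guard described above can be illustrated with a plain channel send. This is only a minimal sketch, not the collector code from the commit: `sendWithTimeout` is a hypothetical helper, and an `int` channel stands in for the metrics channel that a real Collect method writes to.

```go
package main

import (
	"fmt"
	"time"
)

// sendWithTimeout tries to deliver v on ch and gives up after d.
// It reports whether the value was delivered. A collector that uses
// this pattern cannot block forever when nobody reads from ch, so any
// lock it holds is released in bounded time.
func sendWithTimeout(ch chan<- int, v int, d time.Duration) bool {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case ch <- v:
		return true
	case <-timer.C:
		return false
	}
}

func main() {
	// A buffered channel has room, so the send succeeds immediately.
	buffered := make(chan int, 1)
	fmt.Println("delivered:", sendWithTimeout(buffered, 1, 50*time.Millisecond))

	// Nobody reads from this unbuffered channel, so the send times out.
	blocked := make(chan int)
	fmt.Println("delivered:", sendWithTimeout(blocked, 1, 50*time.Millisecond))
}
```

The important design point is that the caller learns the send was dropped and can return, rather than holding its lock while waiting on a reader that never arrives.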
…ne (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
In LabelNames and LabelValues, gRPC calls were not pruned properly. While results are not wrong, this leads to inefficient fan-out for setups with many endpoints. We took the opportunity to unify the store filtering and, more generally, the larger layout of the gRPC methods, including logging and tracing. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Appending warn to changelog about breaking change Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com> * Including warning emoji Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com> --------- Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
…7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max, min, max_over_time and min_over_time. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
* *: Using native histograms for grpc middleware metrics Since we updated the middleware library, we can now use native histograms to keep track of latencies in grpc calls. This is a semi-breaking change if people enabled native histogram collection on their Prometheus monitoring Thanos instances. Signed-off-by: Pedro Tanaka <pedro.stanaka@gmail.com> adding change log Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com> * removing empty space; Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com> * Put full disclaimer in changelog Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com> --------- Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* compact: recover from panics (thanos-io#7318) For thanos-io#6775, it would be useful to know the exact block IDs to aid debugging. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> * Sidecar: wait for prometheus on startup (thanos-io#7323) Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Receive: fix serverAsClient.Series goroutines leak (thanos-io#6948) * fix serverAsClient goroutines leak Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * fix lint Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * update changelog Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * delete invalid comment Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove temp dev test Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * remove timer channel drain Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> --------- Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> * Receive: fix stats (thanos-io#7373) If we account stats for remote write and local writes we will count them twice since the remote write will be counted locally again by the remote receiver instance. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline (thanos-io#7382) * *: Ensure objstore flag values are masked & disable debug/pprof/cmdline Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * small fix Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Query: dont pass query hints to avoid triggering pushdown (thanos-io#7392) If we have a new querier it will create query hints even without the pushdown feature being present anymore. Old sidecars will then trigger query pushdown which leads to broken max,min,max_over_time and min_over_time. 
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> * Cut patch release v0.35.1 Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Signed-off-by: Michael Hoffmann <mhoffm@posteo.de> Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com> Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com> Co-authored-by: Michael Hoffmann <mhoffm@posteo.de> Co-authored-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Previously we deferred starting the gRPC server by blocking the whole startup until we could ping Prometheus. This breaks use cases that rely on the config reloader to start Prometheus. We fix it by using a channel to defer starting the gRPC server and loading external labels in an actor concurrently. Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
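The channel-based deferral can be sketched roughly as follows. This is an illustration under assumptions, not the sidecar code: `startWhenReady`, the `ready` and `stop` channels, and the simulated probe are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errStopped = errors.New("stopped before becoming ready")

// startWhenReady waits until ready is closed and then calls start.
// If stop is closed first, it returns errStopped without starting.
// Only this goroutine waits; the rest of startup is not blocked.
func startWhenReady(ready, stop <-chan struct{}, start func() error) error {
	select {
	case <-ready:
		return start()
	case <-stop:
		return errStopped
	}
}

func main() {
	ready := make(chan struct{})
	stop := make(chan struct{})

	// Hypothetical probe: pretend the upstream becomes reachable after
	// a short delay, then signal readiness by closing the channel.
	go func() {
		time.Sleep(20 * time.Millisecond)
		close(ready)
	}()

	err := startWhenReady(ready, stop, func() error {
		fmt.Println("gRPC server started")
		return nil
	})
	fmt.Println("err:", err)
}
```

Closing a channel is used as the readiness signal because every waiter observes it, and it needs no further coordination once it has fired.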
* Update Prometheus Signed-off-by: alanprot <alanprot@gmail.com> * Updating prometheus to 4e664035e84e Signed-off-by: alanprot <alanprot@gmail.com> * Temporarily pinning prometheus common Signed-off-by: alanprot <alanprot@gmail.com> * fixing lint Signed-off-by: alanprot <alanprot@gmail.com> * Using jsoniter to encode promql responses Signed-off-by: alanprot <alanprot@gmail.com> * Removing e2e test case with invalid hyphen on a matcher -> prometheus now supports this use case Signed-off-by: alanprot <alanprot@gmail.com> * Updating prometheus to v0.52.2-0.20240606174736-edd558884b24 Signed-off-by: alanprot <alanprot@gmail.com> * pinning grpc to v1.63.2 Signed-off-by: alanprot <alanprot@gmail.com> --------- Signed-off-by: alanprot <alanprot@gmail.com> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-21-10.us-west-2.compute.internal>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Allow suppressing environment variable expansion errors for unset variables, which keeps the reloader from crashing. Unset variables are left as is instead. Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
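A rough sketch of the tolerant expansion behaviour follows. The `$(VAR)` placeholder syntax, the `expandEnv` helper, and the `tolerateUnset` switch are assumptions made for illustration; the reloader's actual flag and implementation may differ.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// envRe matches placeholders of the form $(NAME). The syntax is an
// assumption made for this sketch.
var envRe = regexp.MustCompile(`\$\(([a-zA-Z_0-9]+)\)`)

// expandEnv replaces each $(NAME) with the value returned by lookup.
// When a variable is unset and tolerateUnset is true, the placeholder
// is left untouched; otherwise an error is returned.
func expandEnv(s string, lookup func(string) (string, bool), tolerateUnset bool) (string, error) {
	var firstErr error
	out := envRe.ReplaceAllStringFunc(s, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "$(" and trailing ")"
		if v, ok := lookup(name); ok {
			return v
		}
		if !tolerateUnset && firstErr == nil {
			firstErr = fmt.Errorf("environment variable %q is not set", name)
		}
		return m
	})
	if firstErr != nil {
		return "", firstErr
	}
	return out, nil
}

func main() {
	cfg := "home: $(HOME)\nsecret: $(SURELY_NOT_SET_12345)"

	out, err := expandEnv(cfg, os.LookupEnv, true)
	fmt.Println(out, err)

	_, err = expandEnv(cfg, os.LookupEnv, false)
	fmt.Println("strict mode error:", err)
}
```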
* Update adopters.yml Signed-off-by: Rishabh Soni <risrock02@gmail.com> * Add files via upload Signed-off-by: Rishabh Soni <risrock02@gmail.com> --------- Signed-off-by: Rishabh Soni <risrock02@gmail.com>
Signed-off-by: Vasiliy Rumyantsev <4119114+xBazilio@users.noreply.github.com>
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
Recently ran into an issue with Istio in particular, where leaving the trailing dot on the SRV record returned by `dnssrvnoa` lookups led to an inability to connect to the endpoint. Removing the trailing dot fixes this behaviour. Now, technically, this is a valid URL, and it shouldn't be a problem. One could definitely argue that Istio should be responsible here for ensuring that the traffic is delivered. The problem seems rooted in how Istio attempts to do wildcard matching of URLs it receives: including the dot leads it to look up an empty DNS field, which is invalid. The approach I take here is actually copied from how Prometheus does it. Therefore I hope we can sneak this through with the argument that 'this is how Prometheus does it', regardless of whether or not this is philosophically correct... Signed-off-by: verejoel <j.verezhak@gmail.com>
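The fix amounts to stripping the root-zone dot from the SRV target before building the address. A minimal sketch, assuming a hypothetical helper `srvAddr` (the host name and port below are made-up examples, not values from the commit):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// srvAddr builds a host:port address from an SRV target and port,
// dropping any trailing dot that marks the name as fully qualified.
func srvAddr(target string, port uint16) string {
	host := strings.TrimRight(target, ".")
	return net.JoinHostPort(host, strconv.Itoa(int(port)))
}

func main() {
	// SRV lookups typically return fully qualified names ending in ".".
	fmt.Println(srvAddr("store-0.thanos.svc.cluster.local.", 10901))
}
```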
Bumps [go.opentelemetry.io/contrib/propagators/autoprop](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.38.0 to 0.53.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@zpages/v0.38.0...zpages/v0.53.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/propagators/autoprop dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [go.opentelemetry.io/contrib/samplers/jaegerremote](https://github.com/open-telemetry/opentelemetry-go-contrib) from 0.7.0 to 0.22.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go-contrib@v0.7.0...v0.22.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/contrib/samplers/jaegerremote dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…hanos-io#7492) * compact: Update filtered blocks list before second downsample pass If the second downsampling pass is given the same filteredMetas list as the first pass, it will create duplicates of blocks created in the first pass. It will also not be able to do further downsampling e.g. 5m->1h using blocks created in the first pass, as it will not be aware of them. The metadata was already being synced before the second pass, but not updated into the filteredMetas list. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * Update changelog Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * e2e/compact: Fix number of blocks cleaned assertion The value was increased in 2ed48f7 to fix the test, with the reasoning that the hardcoded value must have been taken from a run of the CI that didn't reach the max value due to CI worker lag. More likely the real reason is that commit 68bef3f the day before had caused blocks to be duplicated during downsampling. The duplicate block is immediately marked for deletion, causing an extra +1 in the number of blocks cleaned. Subtracting one from the value again now that the block duplication issue is fixed. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> * e2e/compact: Revert change to downsample count assertion Combined with the previous commit this effectively reverts all of 2ed48f7, in which two assertions were changed to (unknowingly) account for a bug which had just been introduced in the downsampling code, causing duplicate blocks. This assertion change I am less sure on the reasoning for, but after running through the e2e tests several times locally, it is consistent that the only downsampling happens in the "compact-working" step, and so all other steps would report 0 for their total downsamples metric. Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk> --------- Signed-off-by: Thomas Hartland <thomas.hartland@diamond.ac.uk>
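The effect of a stale block list can be shown with a toy model. Everything here is hypothetical (the `block` type, the `downsamplePass` function, the ID scheme) and is far simpler than the compactor, which also considers time ranges and block ages; it only illustrates why the second pass needs the refreshed list.

```go
package main

import "fmt"

// block is a toy stand-in for block metadata.
type block struct {
	id         string
	resolution string // "raw", "5m" or "1h"
	source     string // id of the block this one was downsampled from
}

// next maps a resolution to the one it gets downsampled to.
var next = map[string]string{"raw": "5m", "5m": "1h"}

// downsamplePass returns the blocks it would create, given only the
// blocks it knows about. It skips work whose result is already known.
func downsamplePass(metas []block) []block {
	have := map[string]bool{} // key: source id + "/" + resolution
	for _, b := range metas {
		if b.source != "" {
			have[b.source+"/"+b.resolution] = true
		}
	}
	var created []block
	for _, b := range metas {
		target, ok := next[b.resolution]
		if !ok || have[b.id+"/"+target] {
			continue
		}
		created = append(created, block{id: b.id + "-" + target, resolution: target, source: b.id})
	}
	return created
}

func main() {
	metas := []block{{id: "A", resolution: "raw"}}
	first := downsamplePass(metas)

	// Stale list: the second pass repeats the first pass's work.
	fmt.Println("stale:    ", downsamplePass(metas))

	// Refreshed list: the second pass builds on the new 5m block.
	refreshed := append(append([]block{}, metas...), first...)
	fmt.Println("refreshed:", downsamplePass(refreshed))
}
```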
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
…s.go (thanos-io#7552) Signed-off-by: Nishant Bansal <nishant.bansal.mec21@iitbhu.ac.in>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.24.0 to 0.25.0. - [Commits](golang/crypto@v0.24.0...v0.25.0) --- updated-dependencies: - dependency-name: golang.org/x/crypto dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
thanos-io#7528) Bumps [go.opentelemetry.io/otel/bridge/opentracing](https://github.com/open-telemetry/opentelemetry-go) from 1.21.0 to 1.28.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.21.0...v1.28.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/bridge/opentracing dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This commit adds the option of filtering rules by rule name, rule group, or file. This brings the rule API closer in line with the current Prometheus API. Signed-off-by: Jacob Baungard Hansen <jacobbaungard@redhat.com>
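The filtering semantics could look roughly like the sketch below. The `rule` and `ruleFilter` types and the `filterRules` function are hypothetical, and the choice that an empty list means "no restriction" while the three criteria are combined with AND is an assumption of this example rather than something the commit message states.

```go
package main

import "fmt"

// rule is a toy stand-in for a rule returned by the rules API.
type rule struct {
	Name, Group, File string
}

// ruleFilter lists the accepted values for each criterion. An empty
// list places no restriction on that criterion (an assumption).
type ruleFilter struct {
	Names, Groups, Files []string
}

// allowed reports whether v passes a single criterion.
func allowed(accepted []string, v string) bool {
	if len(accepted) == 0 {
		return true
	}
	for _, a := range accepted {
		if a == v {
			return true
		}
	}
	return false
}

// filterRules keeps the rules that satisfy every criterion.
func filterRules(rules []rule, f ruleFilter) []rule {
	var out []rule
	for _, r := range rules {
		if allowed(f.Names, r.Name) && allowed(f.Groups, r.Group) && allowed(f.Files, r.File) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rules := []rule{
		{Name: "HighLatency", Group: "api", File: "api.yaml"},
		{Name: "DiskFull", Group: "node", File: "node.yaml"},
		{Name: "HighLatency", Group: "db", File: "db.yaml"},
	}
	fmt.Println(filterRules(rules, ruleFilter{Names: []string{"HighLatency"}, Groups: []string{"db"}}))
}
```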
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.26.0 to 0.27.0. - [Commits](golang/net@v0.26.0...v0.27.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…hanos-io#7525) Bumps [go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc](https://github.com/open-telemetry/opentelemetry-go) from 1.27.0 to 1.28.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](open-telemetry/opentelemetry-go@v1.27.0...v1.28.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yuchen Wang <162491048+yuchen-db@users.noreply.github.com>
* support hedged requests in store Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * hedged roundtripper with tdigest for dynamic delay Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * refactor struct and fix lint Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * Improve hedging implementation Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * Improved hedging implementation Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * Update store doc Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * fix white space Signed-off-by: milinddethe15 <milinddethe15@gmail.com> * add enabled field Signed-off-by: milinddethe15 <milinddethe15@gmail.com> --------- Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
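For readers unfamiliar with hedging, the idea can be sketched as follows: issue a request, and if it has not answered within some delay, issue a second one and take whichever finishes first. This sketch uses a fixed delay and a hypothetical `hedged` helper; the commits above mention a t-digest for choosing the delay dynamically, which is not modelled here, and error handling is simplified to returning the first result of any kind.

```go
package main

import (
	"fmt"
	"time"
)

// hedged runs call once and, if no result arrives within delay, runs
// it a second time. It returns whichever result arrives first.
func hedged(delay time.Duration, call func(attempt int) (string, error)) (string, error) {
	type result struct {
		val string
		err error
	}
	// Buffered for both attempts so the slower one never blocks on send.
	results := make(chan result, 2)
	launch := func(attempt int) {
		go func() {
			v, err := call(attempt)
			results <- result{v, err}
		}()
	}

	launch(0)
	timer := time.NewTimer(delay)
	defer timer.Stop()

	select {
	case r := <-results:
		return r.val, r.err // first attempt was fast enough
	case <-timer.C:
		launch(1) // first attempt is slow, send the hedge
	}
	r := <-results
	return r.val, r.err
}

func main() {
	got, err := hedged(20*time.Millisecond, func(attempt int) (string, error) {
		if attempt == 0 {
			time.Sleep(200 * time.Millisecond) // simulate a slow backend
		}
		return fmt.Sprintf("response from attempt %d", attempt), nil
	})
	fmt.Println(got, err)
}
```

A production round tripper would also cancel the losing request; this sketch simply lets it finish in the background.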
I always get this in logs:
```
err: receive capnp conn: close tcp ...: use of closed network connection
```
This is also visible in the e2e test. After Done() returns, the connection is closed either way so no need to close it again. Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
* Fix a storage GW bug that loses TSDB infos when joining them * E2E test demonstrating a bug in the MinT calculation in distributed Engine Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
…o#7915) * always close block series client at the end Signed-off-by: Ben Ye <benye@amazon.com> * add back close for loser tree Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com>
* Update objstore and promql-engine to latest Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> * Fixes after upgrade Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com> --------- Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Merge the db_main branch, which has been running for a few weeks, into the release branch. A few highlights to call out:
Changes
Verification