Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix typo in .json file name #6051

Closed
wants to merge 15 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 36 additions & 13 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -20,45 +20,68 @@ We use *breaking :warning:* to mark changes that are not backward compatible (re
- [#6035](https://github.com/thanos-io/thanos/pull/6035) Replicate: Support all types of matchers to match blocks for replication. Change matcher parameter from string slice to a single string.

### Fixed
- [#5995] (https://github.com/thanos-io/thanos/pull/5993) Sidecar: Loads the TLS certificate during startup.

- [#5995](https://github.com/thanos-io/thanos/pull/5995) Sidecar: Loads the TLS certificate during startup.
- [#6044](https://github.com/thanos-io/thanos/pull/6044) Receive: mark ouf of window errors as conflict, if out-of-window samples ingestion is activated
- [#6066](https://github.com/thanos-io/thanos/pull/6066) Tracing: fixed panic because of nil sampler
- [#6067](https://github.com/thanos-io/thanos/pull/6067) Receive: fixed panic when querying uninitialized TSDBs.

### Changed

- [#6010](https://github.com/thanos-io/thanos/pull/6010) *: Upgrade Prometheus to v0.41.0.
- [#5887](https://github.com/thanos-io/thanos/pull/5887) Tracing: Make sure rate limiting sampler is the default, as was the case in version pre-0.29.0.

## [v0.30.1](https://github.com/thanos-io/thanos/tree/release-0.30) - 4.01.2023

### Fixed

- [#6009](https://github.com/thanos-io/thanos/pull/6009) Query Frontend/Store: fix duplicate metrics registration in Redis client

## [v0.30.0](https://github.com/thanos-io/thanos/tree/release-0.30) - 2.01.2023

## [v0.30.0](https://github.com/thanos-io/thanos/tree/release-0.30) - in progress.
NOTE: Querier's `query.promql-engine` flag enabling new PromQL engine is now unhidden. We encourage users to use new experimental PromQL engine for efficiency reasons.

### Fixed

- [#5716](https://github.com/thanos-io/thanos/pull/5716) DNS: Fix miekgdns resolver LookupSRV to work with CNAME records.
- [#5844](https://github.com/thanos-io/thanos/pull/5844) Query Frontend: Fixes @ modifier time range when splitting queries by interval.
- [#5854](https://github.com/thanos-io/thanos/pull/5854) Query Frontend: Handles `lookback_delta` param in query frontend.
- [#5854](https://github.com/thanos-io/thanos/pull/5854) Query Frontend: `lookback_delta` param is now handled in query frontend.
- [#5860](https://github.com/thanos-io/thanos/pull/5860) Query: Fixed bug of not showing query warnings in Thanos UI.
- [#5856](https://github.com/thanos-io/thanos/pull/5856) Store: Fixed handling of debug logging flag.
- [#5230](https://github.com/thanos-io/thanos/pull/5230) Rule: Stateless ruler support restoring `for` state from query API servers. The query API servers should be able to access the remote write storage.
- [#5880](https://github.com/thanos-io/thanos/pull/5880) Query Frontend: Fixes some edge cases of query sharding analysis.
- [#5893](https://github.com/thanos-io/thanos/pull/5893) Cache: Fixed redis client not respecting `SetMultiBatchSize` config value.
- [#5966](https://github.com/thanos-io/thanos/pull/5966) Query: Fixed mint and maxt when selecting series for the `api/v1/series` HTTP endpoint.
- [#5997](https://github.com/thanos-io/thanos/pull/5997) Rule: switch to miekgdns DNS resolver as the default one.
- [#5948](https://github.com/thanos-io/thanos/pull/5948) Store: `chunks_fetched_duration` wrong calculation.
- [#5910](https://github.com/thanos-io/thanos/pull/5910) Receive: Fixed ketama quorum bug that was could cause success response for failed replication. This also optimize heavily receiver CPU use.

### Added

- [#5945](https://github.com/thanos-io/thanos/pull/5945) Tools: Added new `no-downsample` marker to skip blocks when downsampling via `thanos tools bucket mark --marker=no-downsample-mark.json`. This will skip downsampling for blocks with the new marker.
- [#5814](https://github.com/thanos-io/thanos/pull/5814) Store: Add metric `thanos_bucket_store_postings_size_bytes` that shows the distribution of how many postings (in bytes) were needed for each Series() call in Thanos Store. Useful for determining limits.
- [#5801](https://github.com/thanos-io/thanos/pull/5801) Store: add a new limiter `--store.grpc.downloaded-bytes-limit` that limits the number of bytes downloaded in each Series/LabelNames/LabelValues call. Use `thanos_bucket_store_postings_size_bytes` for determining the limits.
- [#5839](https://github.com/thanos-io/thanos/pull/5839) Receive: Add parameter `--tsdb.out-of-order.time-window` to set time window for experimental out-of-order samples ingestion. Disabled by default (set to 0s). Please note if you enable this option and you use compactor, make sure you set the `--enable-vertical-compaction` flag, otherwise you might risk compactor halt.
- [#5836](https://github.com/thanos-io/thanos/pull/5836) Receive: Add hidden flag `tsdb.memory-snapshot-on-shutdown` to enable experimental TSDB feature to snapshot on shutdown. This is intended to speed up receiver restart.
- [#5814](https://github.com/thanos-io/thanos/pull/5814) Store: Added metric `thanos_bucket_store_postings_size_bytes` that shows the distribution of how many postings (in bytes) were needed for each Series() call in Thanos Store. Useful for determining limits.
- [#5703](https://github.com/thanos-io/thanos/pull/5703) StoreAPI: Added `hash` field to series' chunks. Store gateway and receive implements that field and proxy leverage that for quicker deduplication.
- [#5801](https://github.com/thanos-io/thanos/pull/5801) Store: Added a new flag `--store.grpc.downloaded-bytes-limit` that limits the number of bytes downloaded in each Series/LabelNames/LabelValues call. Use `thanos_bucket_store_postings_size_bytes` for determining the limits.
- [#5836](https://github.com/thanos-io/thanos/pull/5836) Receive: Added hidden flag `tsdb.memory-snapshot-on-shutdown` to enable experimental TSDB feature to snapshot on shutdown. This is intended to speed up receiver restart.
- [#5839](https://github.com/thanos-io/thanos/pull/5839) Receive: Added parameter `--tsdb.out-of-order.time-window` to set time window for experimental out-of-order samples ingestion. Disabled by default (set to 0s). Please note if you enable this option and you use compactor, make sure you set the `--enable-vertical-compaction` flag, otherwise you might risk compactor halt.
- [#5889](https://github.com/thanos-io/thanos/pull/5889) Query Frontend: Added support for vertical sharding `label_replace` and `label_join` functions.
- [#5865](https://github.com/thanos-io/thanos/pull/5865) Compact: Retry on sync metas error.
- [#5889](https://github.com/thanos-io/thanos/pull/5889) Query Frontend: Support sharding vertical sharding `label_replace` and `label_join` functions.
- [#5819](https://github.com/thanos-io/thanos/pull/5819) Store: Add a few objectives for Store's data touched/fetched amount and sizes. They are: 50, 95, and 99 quantiles.
- [#5819](https://github.com/thanos-io/thanos/pull/5819) Store: Added a few objectives for Store's data summaries (touched/fetched amount and sizes). They are: 50, 95, and 99 quantiles.
- [#5837](https://github.com/thanos-io/thanos/pull/5837) Store: Added streaming retrival of series from object storage.
- [#5940](https://github.com/thanos-io/thanos/pull/5940) Objstore: Support for authenticating to Swift using application credentials.
- [#5977](https://github.com/thanos-io/thanos/pull/5977) Tools: Added remove flag on bucket mark command to remove deletion, no-downsample or no-compact markers on the block.
- [#5945](https://github.com/thanos-io/thanos/pull/5945) Tools: Added new `no-downsample` marker to skip blocks when downsampling via `thanos tools bucket mark --marker=no-downsample-mark.json`. This will skip downsampling for blocks with the new marker.
- [#5977](https://github.com/thanos-io/thanos/pull/5977) Tools: Added remove flag on bucket mark command to remove deletion, no-downsample or no-compact markers on the block

### Changed

- [#5716](https://github.com/thanos-io/thanos/pull/5716) DNS: Fix miekgdns resolver LookupSRV to work with CNAME records.
- [#5785](https://github.com/thanos-io/thanos/pull/5785) Query: `thanos_store_nodes_grpc_connections` now trimms `external_labels` label name longer than 1000 character. It also allows customizations in what labels to preserve using `query.conn-metric.label` flag.
- [#5542](https://github.com/thanos-io/thanos/pull/5542) Mixin: Added query concurrency panel to Querier dashboard.
- [#5846](https://github.com/thanos-io/thanos/pull/5846) Query Frontend: vertical query sharding supports subqueries.
- [#5909](https://github.com/thanos-io/thanos/pull/5909) Receive: compact tenant head after no appends have happened for 1.5 `tsdb.max-block-size`.
- [#5593](https://github.com/thanos-io/thanos/pull/5593) Cache: switch Redis client to [Rueidis](https://github.com/rueian/rueidis). Rueidis is [faster](https://github.com/rueian/rueidis#benchmark-comparison-with-go-redis-v9) and provides [client-side caching](https://redis.io/docs/manual/client-side-caching/). It is highly recommended to use it so that repeated requests for the same key would not be needed.
- [#5896](https://github.com/thanos-io/thanos/pull/5896) *: Upgrade Prometheus to v0.40.7 without implementing native histogram support. *Querying native histograms will fail with `Error executing query: invalid chunk encoding "<unknown>"` and native histograms in write requests are ignored.*
- [#5999](https://github.com/thanos-io/thanos/pull/5999) *: Upgrade Alertmanager dependency to v0.25.0.
- [#5909](https://github.com/thanos-io/thanos/pull/5909) Receive: Compact tenant head after no appends have happened for 1.5 `tsdb.max-block-size`.
- [#5838](https://github.com/thanos-io/thanos/pull/5838) Mixin: Added data touched type to Store dashboard.
- [#5922](https://github.com/thanos-io/thanos/pull/5922) Compact: Retry on clean, partial marked errors when possible.

### Removed

Expand Down
1 change: 0 additions & 1 deletion cmd/thanos/compact.go
Original file line number Diff line number Diff line change
Expand Up @@ -406,7 +406,6 @@ func runCompact(
defer cleanMtx.Unlock()

if err := sy.SyncMetas(ctx); err != nil {
cancel()
return errors.Wrap(err, "syncing metas")
}

Expand Down
6 changes: 6 additions & 0 deletions cmd/thanos/receive.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ func registerReceive(app *extkingpin.App) {
EnableExemplarStorage: true,
HeadChunksWriteQueueSize: int(conf.tsdbWriteQueueSize),
EnableMemorySnapshotOnShutdown: conf.tsdbMemorySnapshotOnShutdown,
EnableNativeHistograms: conf.tsdbEnableNativeHistograms,
}

// Are we running in IngestorOnly, RouterOnly or RouterIngestor mode?
Expand Down Expand Up @@ -783,6 +784,7 @@ type receiveConfig struct {
tsdbMaxExemplars int64
tsdbWriteQueueSize int64
tsdbMemorySnapshotOnShutdown bool
tsdbEnableNativeHistograms bool

walCompression bool
noLockFile bool
Expand Down Expand Up @@ -895,6 +897,10 @@ func (rc *receiveConfig) registerFlag(cmd extkingpin.FlagClause) {
"[EXPERIMENTAL] Enables feature to snapshot in-memory chunks on shutdown for faster restarts.").
Default("false").Hidden().BoolVar(&rc.tsdbMemorySnapshotOnShutdown)

cmd.Flag("tsdb.enable-native-histograms",
"[EXPERIMENTAL] Enables the ingestion of native histograms.").
Default("false").Hidden().BoolVar(&rc.tsdbEnableNativeHistograms)

cmd.Flag("hash-func", "Specify which hash function to use when calculating the hashes of produced files. If no function has been specified, it does not happen. This permits avoiding downloading some files twice albeit at some performance cost. Possible values are: \"\", \"SHA256\".").
Default("").EnumVar(&rc.hashFunc, "SHA256", "")

Expand Down
Binary file added docs/img/distributed-execution-proposal-6.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
86 changes: 86 additions & 0 deletions docs/proposals-accepted/202209-receive-tenant-external-labels.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
---
type: proposal
title: Allow statically specifying tenant-specific external labels in Receivers
status: accepted
owner: haanhvu
menu: proposals-accepted
---

## 1 Related links/tickets

https://github.com/thanos-io/thanos/issues/5434

## 2 Why

We would like to do cross-tenant activities like grouping tenants' blocks or querying tenants that share the same attributes. Tenant's external labels can help us do those.

## 3 Pitfalls of the current solution

Currently, we can only add external labels to Receiver itself, not to each tenant in the Receiver. So we can't do cross-tenant activities like grouping tenants' blocks or querying tenants that share the same attributes.

## 4 Goals

* Allow users to statically add arbitrary tenants’ external labels in the easiest way possible
* Allow users to statically change arbitrary tenants’ external labels in the easiest way possible
* Changes in tenants’ external labels are handled correctly
* Backward compatibility (e.g., with Receiver’s external labels) is assured
* Tenants’ external labels are handled separately in RouterIngestor, RouterOnly, and IngestorOnly modes

## 5 Non-goals

* Logically split RouterOnly and IngestorOnly modes (Issue [#5643](https://github.com/thanos-io/thanos/issues/5643)): If RouterOnly and IngestorOnly modes are logically split, implementing tenants’ external labels in RouterOnly and IngestorOnly modes would be less challenging. However, fixing this issue will not be a goal of this proposal, because it's not directly related to tenants' external labels. Regardless, fixing this issue before implementing tenants’ external labels in RouterOnly and IngestorOnly modes would be the best-case scenario.
* Dynamically extract tenants' external labels from time series' data: This proposal only covers statically specifying tenants' external labels. Dynamically receiving and extracting tenants' external labels from time series' data will be added as a follow-up to this proposal.

## 6 Audience

Users who are admin personas and need to perform admin operations on Thanos for multiple tenants

## 7 How

In the hashring config, there will be new field `external_labels`. Something like this:

```
[
{
"hashring": "tenant-a-b",
"endpoints": ["127.0.0.1:10901"],
"tenants": ["tenant-a, tenant-b"]
"external_labels": ["key1=value1", "key2=value2", "key3=value3"]
},
]
```

In Receivers' MultiTSDB, external labels will be extended to each corresponding tenant's label set when the tenant's TSDB is started.

Next thing we have to do is handling changes for tenants' external labels. That is, whenever users make any changes to tenants' external labels, Receivers' MultiTSDB will update those changes in each corresponding tenant's label set.

We will handle the cases of hard tenancy first. Once tenants' external labels can be handled in those cases, we will move to soft tenancy cases.

Tenants’ external labels will be first implemented in RouterIngestor, since this is the most commonly used mode.

After that, we can implement tenants’ external labels in RouterOnly and IngestorOnly modes. As stated above, the best-case scenario would be logically splitting RouterOnly and IngestorOnl (Issue [#5643](https://github.com/thanos-io/thanos/issues/5643)) before implement tenants’ external labels in each.

For the tests, the foremost ones are testing defining one or multiple tenants’ external labels correctly, handling changes in tenants’ external labels correctly, backward compatibility with Receiver’s external labels, and shipper detecting and uploading tenants’ external labels correctly to block storage. We may add more tests in the future but currently these are the most important ones to do first.

## 8 Implementation plan

* Add a new `external_labels` field in the hashring config
* Allow MultiTSDB to extend external labels to each corresponding tenant's label set
* Allow MultiTSDB to update each tenant's label set whenever its external labels change
* Handle external labels in soft tenancy cases
* Implement tenants’ external labels in RouterOnly
* Implement tenants’ external labels in IngestorOnly

### 9 Test plan

* Defining one or multiple tenants’ external labels correctly
* Handling changes in tenants’ external labels correctly
* Backward compatibility with Receiver’s external labels
* Shipper detecting and uploading tenants’ external labels correctly to block storage

### 10 Follow-up

* Dynamically extract tenants' external labels from time series' data: Once statically specifying tenants' external labels have been implemented and tested successfully and completely, we can think of implementing dynamically receiving and extracting tenants' external labels from time series' data.
* Automatically making use of tenants' external labels: We can think of the most useful use cases with tenants' external labels and whether we should automate any of those use cases. One typical case is automatically grouping new blocks based on tenants' external labels.

(Both of these are Ben's ideas expressed [here](https://github.com/thanos-io/thanos/pull/5720#pullrequestreview-1167923565).)
30 changes: 30 additions & 0 deletions docs/proposals-accepted/202301-distributed-query-execution.md
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,36 @@ Store groups can be created by either partitioning TSDBs by time (time-based par

<img src="../img/distributed-execution-proposal-5.png" alt="Distributed query execution" width="400"/>

### Distributed execution against Receive components

We currently lack the mechanism to configure a Querier against a subset of TSDBs, unless that Querier is exclusively attached to Stores that have those TSDBs. In the case of Receivers, TSDBs are created and pruned dynamically, which makes it hard to apply the distributed query model against this component.

To resolve this issue, this proposal suggests adding a `"selector.relabel-config` command-line flag to the Query component that will work the same way as the Store Gateway selector works. For each query, the Querier will apply the given relabel config against each Store's external label set and decide whether to keep or drop a TSDB from the query. After the relabeling is applied, the query will be rewritten to target only those TSDBs that match the selector.

An example config that only targets TSDBs with external labels `tenant=a` would be:

```
- source_labels: [tenant]
action: keep
regex: a
```

With this mechanism, a user can run a pool of Queriers with a selector config as follows:

```
- source_labels: [ext_label_a, ext_label_b]
action: hashmod
target_label: query_shard
modulus: ${query_shard_replicas}
- action: keep
source_labels: [query_shard]
regex: ${query_shard_instance}
```

<img src="../img/distributed-execution-proposal-6.png">

This approach can also be used to create Querier shards against Store Gateways, or any other pool of Store components.

## 7 Alternatives

A viable alternative to the proposed method is to add support for Query Pushdown in the Thanos Querier. By extracting better as described in https://github.com/thanos-io/thanos/issues/5984, we can decide to execute a query in a local Querier, similar to how the sidecar does that against Prometheus.
Expand Down
2 changes: 1 addition & 1 deletion docs/release-process.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ Release shepherd responsibilities:
|---------|----------------------|-------------------------------|
| v0.32.0 | (planned) 2023.03.09 | No one ATM |
| v0.31.0 | (planned) 2023.01.26 | No one ATM |
| v0.30.0 | (planned) 2022.12.15 | `@bwplotka` |
| v0.30.0 | 2022.12.21 | `@bwplotka` |
| v0.29.0 | 2022.10.21 | `@GiedriusS` |
| v0.28.0 | 2022.08.22 | `@yeya24` |
| v0.27.0 | 2022.06.21 | `@wiardvanrij` and `@matej-g` |
Expand Down
2 changes: 1 addition & 1 deletion docs/storage.md
Original file line number Diff line number Diff line change
Expand Up @@ -644,7 +644,7 @@ total 2209344
drwxr-xr-x 2 bwplotka bwplotka 4096 Dec 10 2019 chunks
-rw-r--r-- 1 bwplotka bwplotka 1962383742 Dec 10 2019 index
-rw-r--r-- 1 bwplotka bwplotka 6761 Dec 10 2019 meta.json
-rw-r--r-- 1 bwplotka bwplotka 111 Dec 10 2019 delete-mark.json # <-- Optional marker.
-rw-r--r-- 1 bwplotka bwplotka 111 Dec 10 2019 deletion-mark.json # <-- Optional marker.
-rw-r--r-- 1 bwplotka bwplotka 124 Dec 10 2019 no-compact-mark.json # <-- Optional marker.

01DN3SK96XDAEKRB1AN30AAW6E/chunks:
Expand Down
Loading