Implement lazy retrieval of series from object store. (thanos-io#5837)
* Implement lazy retrieval of series from object store.

The bucket store fetches series in a single blocking operation from
object storage. This is likely not an ideal strategy when it comes to
latency and resource usage. In addition, it causes the store to buffer
everything in memory before starting to send results to queriers.

This commit modifies the series retrieval to use the proxy response heap
and take advantage of the k-way merge used in the proxy store.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add batching

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Preload series in batches

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Emit proper stats

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Extract block series client

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Fix CI

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Address review comments

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Use emptyPostingsCount in lazyRespSet

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Reuse chunk metas

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Avoid overallocating for small responses

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Add metric for chunk fetch time

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Regroup imports

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Change counter to uint64

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
fpetkovski authored Nov 28, 2022
1 parent 46873aa commit 39fa005
Showing 7 changed files with 371 additions and 235 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -17,6 +17,9 @@ kube/.minikube
data/
test/e2e/e2e_*

# Ignore benchmarks dir.
benchmarks/

# Ignore promu artifacts.
/.build
/.release
2 changes: 2 additions & 0 deletions cmd/thanos/query.go
@@ -29,6 +29,7 @@ import (

v1 "github.com/prometheus/prometheus/web/api/v1"
"github.com/thanos-community/promql-engine/engine"

apiv1 "github.com/thanos-io/thanos/pkg/api/query"
"github.com/thanos-io/thanos/pkg/compact/downsample"
"github.com/thanos-io/thanos/pkg/component"
@@ -97,6 +98,7 @@ func registerQuery(app *extkingpin.App) {

queryTimeout := extkingpin.ModelDuration(cmd.Flag("query.timeout", "Maximum time to process query by query node.").
Default("2m"))

promqlEngine := cmd.Flag("query.promql-engine", "PromQL engine to use.").Default(string(promqlEnginePrometheus)).
Enum(string(promqlEnginePrometheus), string(promqlEngineThanos))

6 changes: 6 additions & 0 deletions cmd/thanos/store.go
@@ -6,6 +6,7 @@ package main
import (
"context"
"fmt"
"strconv"
"time"

"github.com/alecthomas/units"
@@ -56,6 +57,7 @@ type storeConfig struct {
httpConfig httpConfig
indexCacheSizeBytes units.Base2Bytes
chunkPoolSize units.Base2Bytes
seriesBatchSize int
maxSampleCount uint64
maxTouchedSeriesCount uint64
maxDownloadedBytes units.Base2Bytes
@@ -129,6 +131,9 @@ func (sc *storeConfig) registerFlag(cmd extkingpin.FlagClause) {
cmd.Flag("block-meta-fetch-concurrency", "Number of goroutines to use when fetching block metadata from object storage.").
Default("32").IntVar(&sc.blockMetaFetchConcurrency)

cmd.Flag("debug.series-batch-size", "The batch size when fetching series from TSDB blocks. Setting the number too high can lead to slower retrieval, while setting it too low can lead to throttling caused by too many calls made to object storage.").
Hidden().Default(strconv.Itoa(store.SeriesBatchSize)).IntVar(&sc.seriesBatchSize)

sc.filterConf = &store.FilterConfig{}

cmd.Flag("min-time", "Start of time range limit to serve. Thanos Store will serve only metrics, which happened later than this value. Option can be a constant time in RFC3339 format or time duration relative to current time, such as -1d or 2h45m. Valid duration units are ms, s, m, h, d, w, y.").
@@ -340,6 +345,7 @@ func runStore(
store.WithChunkPool(chunkPool),
store.WithFilterConfig(conf.filterConf),
store.WithChunkHashCalculation(true),
store.WithSeriesBatchSize(conf.seriesBatchSize),
}

if conf.debugLogging {
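
The hunks above wire a hidden `debug.series-batch-size` flag into the store through `store.WithSeriesBatchSize`. The implementation of that option is not shown in this part of the diff, so the following is only a sketch of how such a flag-to-option path is commonly built in Go. It uses the standard `flag` package instead of kingpin, and `bucketStore`, `withSeriesBatchSize`, `newBucketStore`, `parseBatchSize`, and the default of 1000 are all hypothetical names and values chosen for illustration.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultSeriesBatchSize is an assumed value for illustration only; the
// real store.SeriesBatchSize constant is not visible in this diff.
const defaultSeriesBatchSize = 1000

// bucketStore is a hypothetical stand-in for the real store type.
type bucketStore struct {
	seriesBatchSize int
}

// option mutates a bucketStore during construction.
type option func(*bucketStore)

// withSeriesBatchSize returns an option that overrides the batch size.
func withSeriesBatchSize(n int) option {
	return func(s *bucketStore) { s.seriesBatchSize = n }
}

// newBucketStore applies the default first, then each option in order.
func newBucketStore(opts ...option) *bucketStore {
	s := &bucketStore{seriesBatchSize: defaultSeriesBatchSize}
	for _, o := range opts {
		o(s)
	}
	return s
}

// parseBatchSize reads the batch size from command-line style arguments,
// falling back to the default when the flag is absent.
func parseBatchSize(args []string) (int, error) {
	fs := flag.NewFlagSet("store", flag.ContinueOnError)
	n := fs.Int("debug.series-batch-size", defaultSeriesBatchSize,
		"batch size when fetching series from TSDB blocks")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *n, nil
}

func main() {
	n, err := parseBatchSize([]string{"--debug.series-batch-size=500"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	s := newBucketStore(withSeriesBatchSize(n))
	fmt.Println("series batch size:", s.seriesBatchSize)
}
```

Keeping the default inside the constructor means callers that never pass the option still get a working batch size, which matches how the flag in the diff takes its default from a package-level constant.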