Skip to content

Commit

Permalink
Ruler: protect overrides map with mutex when accessing tenant configs (
Browse files Browse the repository at this point in the history
…#11601)

**What this PR does / why we need it**:
A ruler handling many hundreds of rules can provoke a situation where
the WAL appender reads & modifies tenant configs concurrently in an
unsafe way; this PR protects that with a mutex.

**Which issue(s) this PR fixes**:
Fixes #11569

(cherry picked from commit cd3cf62)
  • Loading branch information
Danny Kopping authored and grafana-delivery-bot[bot] committed Jan 8, 2024
1 parent c601756 commit f13f143
Show file tree
Hide file tree
Showing 3 changed files with 37 additions and 1 deletion.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@
* [10375](https://github.com/grafana/loki/pull/10375) **trevorwhitney**: Fix ingester query when getting label values by passing matchers
* [11433](https://github.com/grafana/loki/pull/11433) Upgrade otelhttp from 0.40.0 -> 0.44.0 and base alpine image from 3.18.3 -> 3.18.5 to fix a few CVES (CVE-2023-45142, CVE-2022-21698, CVE-2023-5363).

* [11074](https://github.com/grafana/loki/pull/11074) **hainenber** Fix panic in lambda-promtail due to mishandling of empty DROP_LABELS env var.
* [11195](https://github.com/grafana/loki/pull/11195) **canuteson** Generate tsdb_shipper storage_config even if using_boltdb_shipper is false
* [11551](https://github.com/grafana/loki/pull/11551) **dannykopping** Do not reflect label names in request metrics' "route" label.
* [11601](https://github.com/grafana/loki/pull/11601) **dannykopping** Ruler: Fixed a panic that can be caused by concurrent read-write access of tenant configs when there are a large amount of rules.

##### Changes

Expand Down
7 changes: 6 additions & 1 deletion pkg/ruler/registry.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (
"fmt"
"net/url"
"strings"
"sync"
"time"

"github.com/go-kit/log"
Expand Down Expand Up @@ -33,7 +34,8 @@ type walRegistry struct {
logger log.Logger
manager instance.Manager

metrics *storageRegistryMetrics
metrics *storageRegistryMetrics
overridesMu sync.Mutex

config Config
overrides RulesLimits
Expand Down Expand Up @@ -223,6 +225,9 @@ func (r *walRegistry) getTenantConfig(tenant string) (instance.Config, error) {
}

func (r *walRegistry) getTenantRemoteWriteConfig(tenant string, base RemoteWriteConfig) (*RemoteWriteConfig, error) {
r.overridesMu.Lock()
defer r.overridesMu.Unlock()

overrides, err := base.Clone()
if err != nil {
return nil, fmt.Errorf("error generating tenant remote-write config: %w", err)
Expand Down
27 changes: 27 additions & 0 deletions pkg/ruler/registry_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ import (
"fmt"
"net/url"
"strings"
"sync"
"testing"
"time"

Expand Down Expand Up @@ -376,6 +377,32 @@ func TestTenantRemoteWriteConfigWithOverride(t *testing.T) {
assert.ElementsMatch(t, actual, expected, "QueueConfig capacity do not match")
}

func TestTenantRemoteWriteConfigWithOverrideConcurrentAccess(t *testing.T) {
require.NotPanics(t, func() {
reg := setupRegistry(t, cfg, newFakeLimits())
var wg sync.WaitGroup
for i := 0; i < 1000; i++ {
wg.Add(1)
go func(reg *walRegistry) {
defer wg.Done()

_, err := reg.getTenantConfig(enabledRWTenant)
require.NoError(t, err)
}(reg)

wg.Add(1)
go func(reg *walRegistry) {
defer wg.Done()

_, err := reg.getTenantConfig(additionalHeadersRWTenant)
require.NoError(t, err)
}(reg)
}

wg.Wait()
})
}

func TestTenantRemoteWriteConfigWithoutOverride(t *testing.T) {
reg := setupRegistry(t, backCompatCfg, newFakeLimitsBackwardCompat())

Expand Down

0 comments on commit f13f143

Please sign in to comment.