tsdb: expand mem per worker based on sql pool size #74662

Merged

Conversation

dhartunian
Collaborator

@dhartunian dhartunian commented Jan 11, 2022

Previously, the memory limit for all tsdb workers was set at a static
64MiB. This cap created issues seen in #24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the MemoryPoolSize configuration
variable. This is set to be 25% of the system memory by default. The
tsdb memory cap per-worker is now doubled until it reaches 1/128 of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated for all running tsdb queries.
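A minimal sketch of the doubling described above (illustrative names only, not the actual server code):

	// Grow the shared tsdb worker budget from its 64 MiB default until it
	// reaches 1/128 of the SQL memory pool.
	func tsdbQueryMemoryMax(memoryPoolSize int64) int64 {
		queryMemoryMax := int64(64 << 20) // 64 MiB default
		for queryMemoryMax < memoryPoolSize/128 {
			queryMemoryMax *= 2
		}
		return queryMemoryMax
	}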

In addition, the ts server is now connected to the same BytesMonitor
instance as the SQL memory monitor and workers will be capped at double
the query limit. Results are monitored as before but a cap is not
introduced there since we didn't have one present previously.

This behavior is gated behind a private cluster setting that's enabled
by default and sets the ratio at 1/128 of the SQL memory pool.

Resolves #72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.

@dhartunian dhartunian requested review from nvanbenschoten and a team January 11, 2022 00:30
@dhartunian dhartunian requested review from a team as code owners January 11, 2022 00:30
@dhartunian dhartunian requested review from tbg and removed request for a team January 11, 2022 00:30
@cockroach-teamcity
Member

This change is Reviewable

@jordanlewis
Member

I get a little nervous about overall server memory usage when I see a change like this, though I suppose that before, none of the tsdb pools were accounted for whatsoever.

This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?

Collaborator

@rimadeodhar rimadeodhar left a comment

Given this PR will result in logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems like this logic is disabled by setting QueryMemoryMax within cfg. I'm not sure how dynamically configurable that value is, but if it is dynamic, then a feature flag might not be needed.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 126 at r1 (raw file):

	}
	if cfg.QueryMemoryMax != 0 {
		queryMemoryMax = cfg.QueryMemoryMax

So if there is a config setting, the doubling logic will not apply. Is that intentional? When is the QueryMemoryMax set within cfg?

Code quote:

	if cfg.QueryMemoryMax != 0 {
		queryMemoryMax = cfg.QueryMemoryMax
	}

pkg/ts/server_test.go, line 307 at r1 (raw file):

	}{
		{poolSize: int64(2 * GiB), expectedTsDBWorkerMax: int64(64 * MiB)},
		{poolSize: int64(32 * GiB), expectedTsDBWorkerMax: int64(256 * MiB)},

How is the expectedTsDBWorkerMax computed in relation to the poolSize? Shouldn't it be 1/128th of the pool size? What am I missing?

Code quote:

		{poolSize: int64(2 * GiB), expectedTsDBWorkerMax: int64(64 * MiB)},
		{poolSize: int64(32 * GiB), expectedTsDBWorkerMax: int64(256 * MiB)},
		{poolSize: int64(48 * GiB), expectedTsDBWorkerMax: int64(512 * MiB)},

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 119 at r1 (raw file):

	queryMemoryMax := queryMemoryMax
	// Double size until we hit 1/128 of the memory pool setting. Our
	// typical default here is 64 MiB which corresponds to a pool of 8 GiB

That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).


pkg/ts/server.go, line 122 at r1 (raw file):

	// which corresponds to 32 GiB of system memory (assuming a default
	// setting of 25% for the pool).
	for queryMemoryMax < memoryPoolSize/128 {

You could replace the for-loop with the equivalent,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

You could replace the for-loop with the equivalent,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)

Correcting the above typo,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from ea73a12 to 10f488b Compare January 14, 2022 19:02
Collaborator Author

@dhartunian dhartunian left a comment

This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?

@jordanlewis there are 8 workers per node but I now realize the limit is for all workers, not per-worker. Adjusted commit/PR message to reflect this.

I've hooked it into the SQL Memory monitor in the latest update.

Given this PR will result in logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems like this logic is disabled by setting QueryMemoryMax within cfg. I'm not sure how dynamically configurable that value is, but if it is dynamic, then a feature flag might not be needed.

@rimadeodhar good idea. I'm gating behind a private cluster setting that's enabled by default

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, @srosenberg, and @tbg)


pkg/ts/server.go, line 119 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).

I misunderstood initially; the cap is on all workers and is divided per query by the number of workers down below.


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Correcting the above typo,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)

neat! Had to do a bit of casting, do you know if that can be avoided here?


pkg/ts/server.go, line 126 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

So if there is a config setting, the doubling logic will not apply. Is that intentional? When is the QueryMemoryMax set within cfg?

Yes this is intentional because I assume if someone is manually configuring, they wouldn't expect the number to be arbitrarily modified.

This is currently only used in tests to exercise the chunking behavior with artificially low limits.


pkg/ts/server_test.go, line 307 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

How is the expectedTsDBWorkerMax computed in relation to the poolSize? Shouldn't it be 1/128th of the pool size? What am I missing?

It's not exactly 1/128 because we double until we cross that threshold.

2 GiB / 128 = 16 MiB so we stay at 64 MiB limit
32 GiB / 128 = 256 MiB which we hit exactly by doubling twice (64 * 2 * 2)
48 GiB / 128 = 384 MiB which we cross by doubling the example above one more time (64 * 2 * 2 * 2)

My latest change adjusts the logic slightly to take the floor of the log, which keeps the third example at the 256 MiB limit.
See what you think of the logarithm-based formula; maybe that's easier to understand, but I'm not sure.
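For illustration, a hypothetical helper that mirrors the arithmetic above with the floor-of-the-log adjustment (the names here are made up for the example, not taken from the PR):

	// Double the default until one more doubling would exceed poolSize/128,
	// i.e. take the floor of the log rather than crossing the threshold.
	func tsdbWorkerMax(defaultMax, poolSize int64) int64 {
		target := poolSize / 128
		out := defaultMax
		for out*2 <= target {
			out *= 2
		}
		return out
	}

	// tsdbWorkerMax(64<<20, 2<<30)  == 64 MiB  (2 GiB pool)
	// tsdbWorkerMax(64<<20, 32<<30) == 256 MiB (32 GiB pool)
	// tsdbWorkerMax(64<<20, 48<<30) == 256 MiB (48 GiB pool; the floor keeps it here)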

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, and @tbg)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

neat! Had to do a bit of casting, do you know if that can be avoided here?

Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,

maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))
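For reference, a compiling version of this suggestion still needs a float64 conversion for math.Log2 — a sketch under the powers-of-two assumption, not the code that was ultimately merged:

	import "math"

	// Shift-based equivalent of the doubling loop; assumes both values are
	// positive powers of two so the integer division is exact.
	func growQueryMemoryMax(queryMemoryMax, memoryPoolSize int64) int64 {
		maxAvail := memoryPoolSize / 128
		if maxAvail > queryMemoryMax {
			queryMemoryMax <<= int(math.Log2(float64(maxAvail / queryMemoryMax)))
		}
		return queryMemoryMax
	}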

@tbg tbg requested a review from rimadeodhar January 17, 2022 08:03
Member

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,

maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))

There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.

Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)


-- commits, line 21 at r2:
Nit: missing space - be capped not becapped :).


-- commits, line 29 at r2:
This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.


pkg/ts/server.go, line 51 at r2 (raw file):

)

var TSDBAutoMemoryGrowthEnabled = func() *settings.BoolSetting {

Wondering whether, instead of making this a boolean, we should make it tsdb_worker_memory_cap or something like that, whereby setting it to 0 disables it and it can be adjusted dynamically to allow a fractional value to be set? It could be set to 1/128 by default.
The pros are that we can tweak the memory cap for individual clusters if we run into this issue again; the cons are that the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine-grained to allow for more control than just a simple boolean. What do you think?

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from 10f488b to bcaec7e Compare January 27, 2022 00:20
Collaborator Author

@dhartunian dhartunian left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)


-- commits, line 21 at r2:

Previously, rimadeodhar (Rima Deodhar) wrote…

Nit: missing space - be capped not becapped :).

Done.


-- commits, line 29 at r2:

Previously, rimadeodhar (Rima Deodhar) wrote…

This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.

Added configuration.


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.

Thx for the reminder @tbg. I futzed with it a bit but still had to cast the arguments for math.Log2 so I'm sticking with the original formulation.


pkg/ts/server.go, line 51 at r2 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Wondering whether, instead of making this a boolean, we should make it tsdb_worker_memory_cap or something like that, whereby setting it to 0 disables it and it can be adjusted dynamically to allow a fractional value to be set? It could be set to 1/128 by default.
The pros are that we can tweak the memory cap for individual clusters if we run into this issue again; the cons are that the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine-grained to allow for more control than just a simple boolean. What do you think?

I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch 2 times, most recently from 322b83b to 755355b Compare January 31, 2022 20:57
Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 2 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)


-- commits, line 21 at r4:
Nit: be capped


pkg/ts/server.go, line 51 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.

Nice. I think the explanation is good, so it should help avoid any confusion.


pkg/ts/server.go, line 148 at r4 (raw file):

	targetMemRatio := TSDBRatioOfSQLMemoryPool.Get(&settings.SV)
	if targetMemRatio > 0 {
		// Double size until we hit 1/128 of the memory pool setting. Our

Nit: Comment is not accurate anymore. Should be until we hit 1/targetMemRatio of the memory pool setting.


pkg/ts/server.go, line 152 at r4 (raw file):

		// which corresponds to 8 GiB of system memory (assuming a default
		// setting of 25% for the pool).
		memoryCap := memoryPoolSize / targetMemRatio

Since targetMemRatio is now configurable externally, should we confirm that it doesn't exceed memoryPoolSize before doing this division? That way we prevent any unintentional behavior.
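Something along these lines could work as the guard (purely illustrative; the helper name is a placeholder, not the PR's code):

	// Ignore ratios that are zero, negative, or larger than the pool itself,
	// so a misconfigured setting cannot drive the cap to zero.
	func tsdbMemoryCap(memoryPoolSize, targetMemRatio int64) (int64, bool) {
		if targetMemRatio <= 0 || targetMemRatio > memoryPoolSize {
			return 0, false
		}
		return memoryPoolSize / targetMemRatio, true
	}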

@dhartunian
Collaborator Author

Thinking about this a little further, I'm coming to the conclusion that the cluster setting is not the correct direction for this feature. I can see in the history of the code for the SQL memory flag that it used to be a cluster setting but was turned into a command-line flag, since nodes may not be identical across a cluster. I think a similar approach may be useful here: we can add a --tsdb-max-mem command-line flag that defaults to max(64MiB, 1% of memory). This makes it easy to fold it into the same limit accounting that the cache and SQL memory flags currently use, and it simplifies the formula instead of deriving it from another flag's value.

I was initially hesitant to pollute the command line with yet another flag, but I wonder if perhaps this one can be silent or undocumented since it generally should not be touched and there's no query performance gain to be had from tweaking it.

I'll check with server team about what determination to use when adding new flags.
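As a rough sketch of the proposed default (totalSystemMemory is a placeholder for whatever memory detection is used):

	// Default --max-tsdb-memory to the greater of 64 MiB and 1% of system memory.
	func defaultTSDBMemory(totalSystemMemory int64) int64 {
		const floor int64 = 64 << 20 // 64 MiB
		if onePercent := totalSystemMemory / 100; onePercent > floor {
			return onePercent
		}
		return floor
	}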

@knz
Contributor

knz commented Feb 2, 2022

David, it looks like you have developed the right line of thinking about the motivation for a new command-line flag. The reasoning sounds good to me.

Sometimes @bdarnell has a more nuanced opinion (I'll let Ben chime in if he so desires) but IMHO it's warranted here.

Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic.

This PR introduces a new CLI flag `--max-tsdb-memory` to mirror the
functionality of the `--max-sql-memory` flag by accepting bytes or a
percentage of system memory. The default is set to be `1%` of system
memory or 64 MiB, whichever is greater. This ensures that performance
after this PR is equal or better for timeseries queries without eating
too far into memory budgets for SQL.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor and workers will be capped at double
the query limit. Results are monitored as before but a cap is not
introduced there since we didn't have one present previously.

Resolves cockroachdb#72986

Release note (cli change, ops change): A new CLI flag `--max-tsdb-memory`
is now available that can set the memory budget for timeseries queries
when processing requests from the Metrics page in DB Console. Most
customers should not need to tweak this setting as the default of 1% of
system memory or 64 MiB, whichever is greater, is adequate for most
deployments. In the case where a deployment of hundreds of nodes has
low per-node memory available (below 8 GiB, for instance), it may be
necessary to increase this value to `2%` or higher in order to render
timeseries graphs for the cluster using the DB Console. Otherwise, the
default settings will be adequate for the vast majority of deployments.
@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from 755355b to 97150df Compare February 2, 2022 22:00
@dhartunian dhartunian requested a review from a team as a code owner February 2, 2022 22:00
Collaborator Author

@dhartunian dhartunian left a comment

@rimadeodhar I modified the PR to use a command-line flag, which makes the implementation quite simple and re-uses the existing config var to hold the value.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

Contributor

@bdarnell bdarnell left a comment

We used to have problems detecting memory limits in containers. Are those fully resolved these days? If not, this flag would require action on the part of any deployments whose memory limits are not being detected properly. (and then again for the next such flag, etc).

Rather than recommending that anyone who sets --max-sql-memory and --cache also add each new memory flag, we should consider adding a --total-memory flag to override our detection of the memory limit so that the defaults of .25, .25, and .01 can be relied upon.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

@dhartunian
Collaborator Author

We used to have problems detecting memory limits in containers. Are those fully resolved these days? If not, this flag would require action on the part of any deployments whose memory limits are not being detected properly. (and then again for the next such flag, etc).

I'll check on this, but in this PR we maintain the floor of 64 MiB, which is what we allocate today, so in the case of us detecting less memory than is available there would not be a regression from how the binary behaves today.

Rather than recommending that anyone who sets --max-sql-memory and --cache also add each new memory flag, we should consider adding a --total-memory flag to override our detection of the memory limit so that the defaults of .25, .25, and .01 can be relied upon.

Thanks for the suggestion @bdarnell, tracking the CLI feature here #76535

@bdarnell
Contributor

in the case of us detecting less memory than is available

The known problems were in the other direction: we'd either see the total physical memory of the machine without regard to any limits placed on the container, or we'd get some completely implausible number (exabytes). These would result in OOMs as the system (especially the --cache flag) thought there was much more memory available than there actually was. That's less of a problem for this flag since it defaults to 0.01 instead of 0.25, but if we're likely to actually use all the memory allowed here it could go too high.

Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 9 of 9 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

@dhartunian
Collaborator Author

bors r+

@craig
Contributor

craig bot commented Feb 15, 2022

Build succeeded:


Successfully merging this pull request may close these issues.

ui: graphs don't load on 240 node cluster
8 participants