tsdb: expand mem per worker based on sql pool size #74662

Merged

Conversation

dhartunian
Collaborator

@dhartunian dhartunian commented Jan 11, 2022

Previously, the memory limit for all tsdb workers was set at a static
64MiB. This cap created issues seen in #24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic. The per-worker limit is now raised based on the amount of memory
available to the SQL Pool via the MemoryPoolSize configuration
variable. This is set to be 25% of the system memory by default. The
tsdb memory cap per-worker is now doubled until it reaches 1/128 of
the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will
correspond to 512 MiB allocated for all running tsdb queries.
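A minimal sketch of the doubling described above (illustrative names only, not the actual server code):

	// Grow the shared tsdb worker budget from its 64 MiB default until it
	// reaches 1/128 of the SQL memory pool.
	func tsdbQueryMemoryMax(memoryPoolSize int64) int64 {
		queryMemoryMax := int64(64 << 20) // 64 MiB default
		for queryMemoryMax < memoryPoolSize/128 {
			queryMemoryMax *= 2
		}
		return queryMemoryMax
	}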

In addition, the ts server is now connected to the same BytesMonitor
instance as the SQL memory monitor and workers will be capped at double
the query limit. Results are monitored as before but a cap is not
introduced there since we didn't have one present previously.

This behavior is gated behind a private cluster setting that's enabled
by default and sets the ratio at 1/128 of the SQL memory pool.

Resolves #72986

Release note (ops change): customers running clusters with 240 nodes or
more can effectively access tsdb metrics.

@dhartunian dhartunian requested review from nvanbenschoten and a team January 11, 2022 00:30
@dhartunian dhartunian requested review from a team as code owners January 11, 2022 00:30
@dhartunian dhartunian requested review from tbg and removed request for a team January 11, 2022 00:30
@cockroach-teamcity
Member

This change is Reviewable

@jordanlewis
Member

I get a little nervous about overall server memory usage when I see a change like this, though I suppose that before, none of the tsdb pools were accounted for whatsoever.

This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?

Collaborator

@rimadeodhar rimadeodhar left a comment

Given this PR will result in logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems like this logic is disabled by setting QueryMemoryMax within cfg. I'm not sure how dynamically configurable that value is, but if it is dynamic, then a feature flag might not be needed.

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 126 at r1 (raw file):

	}
	if cfg.QueryMemoryMax != 0 {
		queryMemoryMax = cfg.QueryMemoryMax

So if there is a config setting, the doubling logic will not apply. Is that intentional? When is the QueryMemoryMax set within cfg?

Code quote:

	if cfg.QueryMemoryMax != 0 {
		queryMemoryMax = cfg.QueryMemoryMax
	}

pkg/ts/server_test.go, line 307 at r1 (raw file):

	}{
		{poolSize: int64(2 * GiB), expectedTsDBWorkerMax: int64(64 * MiB)},
		{poolSize: int64(32 * GiB), expectedTsDBWorkerMax: int64(256 * MiB)},

How is the expectedTsDBWorkerMax computed in relation to the poolSize? Shouldn't it be 1/128th of the pool size? What am I missing?

Code quote:

		{poolSize: int64(2 * GiB), expectedTsDBWorkerMax: int64(64 * MiB)},
		{poolSize: int64(32 * GiB), expectedTsDBWorkerMax: int64(256 * MiB)},
		{poolSize: int64(48 * GiB), expectedTsDBWorkerMax: int64(512 * MiB)},

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 119 at r1 (raw file):

	queryMemoryMax := queryMemoryMax
	// Double size until we hit 1/128 of the memory pool setting. Our
	// typical default here is 64 MiB which corresponds to a pool of 8 GiB

That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).


pkg/ts/server.go, line 122 at r1 (raw file):

	// which corresponds to 32 GiB of system memory (assuming a default
	// setting of 25% for the pool).
	for queryMemoryMax < memoryPoolSize/128 {

You could replace the for-loop with the equivalent,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

You could replace the for-loop with the equivalent,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)

Correcting the above typo,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from ea73a12 to 10f488b Compare January 14, 2022 19:02
Collaborator Author

@dhartunian dhartunian left a comment

This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?

@jordanlewis there are 8 workers per node but I now realize the limit is for all workers, not per-worker. Adjusted commit/PR message to reflect this.

I've hooked it into the SQL Memory monitor in the latest update.

Given this PR will result in logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems like this logic is disabled by setting QueryMemoryMax within cfg. I'm not sure how dynamically configurable that value is, but if it is dynamic, then a feature flag might not be needed.

@rimadeodhar good idea. I'm gating behind a private cluster setting that's enabled by default

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, @srosenberg, and @tbg)


pkg/ts/server.go, line 119 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).

I misunderstood initially; the cap is on all workers and is divided per query by the number of workers down below.


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Correcting the above typo,

queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)

neat! Had to do a bit of casting, do you know if that can be avoided here?


pkg/ts/server.go, line 126 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

So if there is a config setting, the doubling logic will not apply. Is that intentional? When is the QueryMemoryMax set within cfg?

Yes this is intentional because I assume if someone is manually configuring, they wouldn't expect the number to be arbitrarily modified.

This is currently only used in tests to exercise the chunking behavior with artificially low limits.


pkg/ts/server_test.go, line 307 at r1 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

How is the expectedTsDBWorkerMax computed in relation to the poolSize? Shouldn't it be 1/128th of the pool size? What am I missing?

It's not exactly 1/128 because we double until we cross that threshold.

2 GiB / 128 = 16 MiB so we stay at 64 MiB limit
32 GiB / 128 = 256 MiB which we hit exactly by doubling twice (64 * 2 * 2)
48 GiB / 128 = 384 MiB which we cross by doubling the example above one more time (64 * 2 * 2 * 2)

My latest change adjusts the logic slightly to take the floor of the log, which keeps the third example at the 256 MiB limit.
See what you think of the logarithm-based formula; maybe that's easier to understand, but I'm not sure.
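For illustration, a hypothetical helper that mirrors the arithmetic above with the floor-of-the-log adjustment (the names here are made up for the example, not taken from the PR):

	// Double the default until one more doubling would exceed poolSize/128,
	// i.e. take the floor of the log rather than crossing the threshold.
	func tsdbWorkerMax(defaultMax, poolSize int64) int64 {
		target := poolSize / 128
		out := defaultMax
		for out*2 <= target {
			out *= 2
		}
		return out
	}

	// tsdbWorkerMax(64<<20, 2<<30)  == 64 MiB  (2 GiB pool)
	// tsdbWorkerMax(64<<20, 32<<30) == 256 MiB (32 GiB pool)
	// tsdbWorkerMax(64<<20, 48<<30) == 256 MiB (48 GiB pool; the floor keeps it here)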

Member

@srosenberg srosenberg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, and @tbg)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

neat! Had to do a bit of casting, do you know if that can be avoided here?

Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,

maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))
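For reference, a compiling version of this suggestion still needs a float64 conversion for math.Log2 — a sketch under the powers-of-two assumption, not the code that was ultimately merged:

	import "math"

	// Shift-based equivalent of the doubling loop; assumes both values are
	// positive powers of two so the integer division is exact.
	func growQueryMemoryMax(queryMemoryMax, memoryPoolSize int64) int64 {
		maxAvail := memoryPoolSize / 128
		if maxAvail > queryMemoryMax {
			queryMemoryMax <<= int(math.Log2(float64(maxAvail / queryMemoryMax)))
		}
		return queryMemoryMax
	}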

@tbg tbg requested a review from rimadeodhar January 17, 2022 08:03
Member

@tbg tbg left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,

maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))

There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.

Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)


-- commits, line 21 at r2:
Nit: missing space - be capped not becapped :).


-- commits, line 29 at r2:
This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.


pkg/ts/server.go, line 51 at r2 (raw file):

)

var TSDBAutoMemoryGrowthEnabled = func() *settings.BoolSetting {

Wondering whether, instead of making this a boolean, we should make it tsdb_worker_memory_cap or something like that, whereby setting it to 0 disables it and it can be adjusted dynamically to allow a fractional value to be set? It could be set to 1/128 by default.
The pros are that we can tweak the memory cap for individual clusters if we run into this issue again; the cons are that the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine-grained to allow for more control than just a simple boolean. What do you think?

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from 10f488b to bcaec7e Compare January 27, 2022 00:20
Collaborator Author

@dhartunian dhartunian left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)


-- commits, line 21 at r2:

Previously, rimadeodhar (Rima Deodhar) wrote…

Nit: missing space - be capped not becapped :).

Done.


-- commits, line 29 at r2:

Previously, rimadeodhar (Rima Deodhar) wrote…

This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.

Added configuration.


pkg/ts/server.go, line 122 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.

Thx for the reminder @tbg. I futzed with it a bit but still had to cast the arguments for math.Log2 so I'm sticking with the original formulation.


pkg/ts/server.go, line 51 at r2 (raw file):

Previously, rimadeodhar (Rima Deodhar) wrote…

Wondering whether, instead of making this a boolean, we should make it tsdb_worker_memory_cap or something like that, whereby setting it to 0 disables it and it can be adjusted dynamically to allow a fractional value to be set? It could be set to 1/128 by default.
The pros are that we can tweak the memory cap for individual clusters if we run into this issue again; the cons are that the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine-grained to allow for more control than just a simple boolean. What do you think?

I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.

@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch 2 times, most recently from 322b83b to 755355b Compare January 31, 2022 20:57
Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 2 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)


-- commits, line 21 at r4:
Nit: be capped


pkg/ts/server.go, line 51 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.

Nice. I think the explanation is good, so it should help avoid any confusion.


pkg/ts/server.go, line 148 at r4 (raw file):

	targetMemRatio := TSDBRatioOfSQLMemoryPool.Get(&settings.SV)
	if targetMemRatio > 0 {
		// Double size until we hit 1/128 of the memory pool setting. Our

Nit: Comment is not accurate anymore. Should be until we hit 1/targetMemRatio of the memory pool setting.


pkg/ts/server.go, line 152 at r4 (raw file):

		// which corresponds to 8 GiB of system memory (assuming a default
		// setting of 25% for the pool).
		memoryCap := memoryPoolSize / targetMemRatio

Since targetMemRatio is now configurable externally, should we confirm that it doesn't exceed memoryPoolSize before doing this division? That way we prevent any unintentional behavior.
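Something along these lines could work as the guard (purely illustrative; the helper name is a placeholder, not the PR's code):

	// Ignore ratios that are zero, negative, or larger than the pool itself,
	// so a misconfigured setting cannot drive the cap to zero.
	func tsdbMemoryCap(memoryPoolSize, targetMemRatio int64) (int64, bool) {
		if targetMemRatio <= 0 || targetMemRatio > memoryPoolSize {
			return 0, false
		}
		return memoryPoolSize / targetMemRatio, true
	}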

@dhartunian
Collaborator Author

Thinking about this a little further, I'm coming to the conclusion that the cluster setting is not the correct direction for this feature. I can see in the history of the code for the SQL memory flag that it used to be a cluster setting but was turned into a command-line flag, since nodes may not be identical across a cluster. I think a similar approach may be useful here: we can add a --tsdb-max-mem command-line flag that defaults to max(64MiB, 1% of memory). This makes it easy to fold it into the same limit accounting that the cache and SQL memory flags currently use, and it simplifies the formula instead of deriving it from another flag's value.

I was initially hesitant to pollute the command line with yet another flag, but I wonder if perhaps this one can be silent or undocumented since it generally should not be touched and there's no query performance gain to be had from tweaking it.

I'll check with server team about what determination to use when adding new flags.
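As a rough sketch of the proposed default (totalSystemMemory is a placeholder for whatever memory detection is used):

	// Default --max-tsdb-memory to the greater of 64 MiB and 1% of system memory.
	func defaultTSDBMemory(totalSystemMemory int64) int64 {
		const floor int64 = 64 << 20 // 64 MiB
		if onePercent := totalSystemMemory / 100; onePercent > floor {
			return onePercent
		}
		return floor
	}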

@knz
Contributor

knz commented Feb 2, 2022

David, it looks like you have developed the right line of thinking about the motivation for a new command-line flag. The reasoning sounds good to me.

Sometimes @bdarnell has a more nuanced opinion (I'll let Ben chime in if he so desires) but IMHO it's warranted here.

Previously, the memory limit for all `tsdb` workers was set at a static
64MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit
on a 30 node cluster. To alleviate the issue, the number of workers was
reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger
clusters and have decided to make the per-query worker memory limit
dynamic.

This PR introduces a new CLI flag `--max-tsdb-memory` to mirror the
functionality of the `--max-sql-memory` flag by accepting bytes or a
percentage of system memory. The default is set to be `1%` of system
memory or 64 MiB, whichever is greater. This ensures that performance
after this PR is equal or better for timeseries queries without eating
too far into memory budgets for SQL.

In addition, the ts server is now connected to the same `BytesMonitor`
instance as the SQL memory monitor and workers will be capped at double
the query limit. Results are monitored as before but a cap is not
introduced there since we didn't have one present previously.

Resolves cockroachdb#72986

Release note (cli change, ops change): A new CLI flag `--max-tsdb-memory`
is now available that can set the memory budget for timeseries queries
when processing requests from the Metrics page in DB Console. Most
customers should not need to tweak this setting as the default of 1% of
system memory or 64 MiB, whichever is greater, is adequate for most
deployments. In the case where a deployment of hundreds of nodes has
low per-node memory available (below 8 GiB, for instance), it may be
necessary to increase this value to `2%` or higher in order to render
timeseries graphs for the cluster using the DB Console. Otherwise, the
default settings will be adequate for the vast majority of deployments.
@dhartunian dhartunian force-pushed the expand-memory-cap-for-tsdb-queries branch from 755355b to 97150df Compare February 2, 2022 22:00
@dhartunian dhartunian requested a review from a team as a code owner February 2, 2022 22:00
Collaborator Author

@dhartunian dhartunian left a comment

@rimadeodhar I modified the PR to use a command-line flag, which makes the implementation quite simple and re-uses the existing config var to hold the value.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

Contributor

@bdarnell bdarnell left a comment

We used to have problems detecting memory limits in containers. Are those fully resolved these days? If not, this flag would require action on the part of any deployments whose memory limits are not being detected properly. (and then again for the next such flag, etc).

Rather than recommending that anyone who sets --max-sql-memory and --cache also add each new memory flag, we should consider adding a --total-memory flag to override our detection of the memory limit so that the defaults of .25, .25, and .01 can be relied upon.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

@dhartunian
Collaborator Author

We used to have problems detecting memory limits in containers. Are those fully resolved these days? If not, this flag would require action on the part of any deployments whose memory limits are not being detected properly. (and then again for the next such flag, etc).

I'll check on this, but in this PR we maintain the floor of 64 MiB, which is what we allocate today, so in the case of us detecting less memory than is available there would not be a regression from how the binary behaves today.

Rather than recommending that anyone who sets --max-sql-memory and --cache also add each new memory flag, we should consider adding a --total-memory flag to override our detection of the memory limit so that the defaults of .25, .25, and .01 can be relied upon.

Thanks for the suggestion @bdarnell, tracking the CLI feature here #76535

@bdarnell
Contributor

in the case of us detecting less memory than is available

The known problems were in the other direction: we'd either see the total physical memory of the machine without regard to any limits placed on the container, or we'd get some completely implausible number (exabytes). These would result in OOMs as the system (especially the --cache flag) thought there was much more memory available than there actually was. That's less of a problem for this flag since it defaults to 0.01 instead of 0.25, but if we're likely to actually use all the memory allowed here it could go too high.

Collaborator

@rimadeodhar rimadeodhar left a comment

Reviewed 9 of 9 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)

@dhartunian
Collaborator Author

bors r+

@craig
Contributor

craig bot commented Feb 15, 2022

Build succeeded:


Successfully merging this pull request may close these issues.

ui: graphs don't load on 240 node cluster
8 participants