tsdb: expand mem per worker based on sql pool size #74662
Conversation
I get a little nervous about overall server memory usage when I see a change like this, though I suppose that before, none of the tsdb pools were accounted for whatsoever. This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?
Given this PR introduces logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems this logic is disabled by setting QueryMemoryMax within cfg. I'm not sure how dynamically configurable that value is, but if it is dynamic, then a feature flag might not be needed.
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)
pkg/ts/server.go, line 126 at r1 (raw file):
So if there is a config setting, the doubling logic will not apply. Is that intentional? When is QueryMemoryMax set within cfg?
Code quote:
if cfg.QueryMemoryMax != 0 {
queryMemoryMax = cfg.QueryMemoryMax
}
pkg/ts/server_test.go, line 307 at r1 (raw file):
How is the expectedTsDBWorkerMax computed in relation to the poolsize, shouldn't it be 1/128th of the poolsize? What am I missing?
Code quote:
{poolSize: int64(2 * GiB), expectedTsDBWorkerMax: int64(64 * MiB)},
{poolSize: int64(32 * GiB), expectedTsDBWorkerMax: int64(256 * MiB)},
{poolSize: int64(48 * GiB), expectedTsDBWorkerMax: int64(512 * MiB)},
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)
pkg/ts/server.go, line 119 at r1 (raw file):
queryMemoryMax := queryMemoryMax
// Double size until we hit 1/128 of the memory pool setting. Our
// typical default here is 64 MiB which corresponds to a pool of 8 GiB
That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).
pkg/ts/server.go, line 122 at r1 (raw file):
// which corresponds to 32 GiB of system memory (assuming a default
// setting of 25% for the pool).
for queryMemoryMax < memoryPoolSize/128 {
You could replace the for-loop with the equivalent,
queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @tbg)
pkg/ts/server.go, line 122 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
You could replace the for-loop with the equivalent,
queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128)
Correcting the above typo,
queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)
(force-pushed from ea73a12 to 10f488b)
This might be a good time to add memory accounting to tsdb worker pools rather than keep them unaccounted. How many workers are there per node?
@jordanlewis there are 8 workers per node but I now realize the limit is for all workers, not per-worker. Adjusted commit/PR message to reflect this.
I've hooked it into the SQL Memory monitor in the latest update.
Given this PR introduces logic that could increase memory usage, it would be nice to gate it behind a feature flag so we can quickly disable it if necessary. Although, based on the code below, it seems this logic is disabled by setting the QueryMemoryMax within cfg. Not sure how dynamically configurable that value is, but if it is dynamic, then maybe feature flag might not be needed.
@rimadeodhar good idea. I'm gating behind a private cluster setting that's enabled by default
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, @srosenberg, and @tbg)
pkg/ts/server.go, line 119 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
That seems like a lot since we'd also need to account for the multiplier, i.e., QueryWorkerMax. What's the absolute max for TSDB in this example? I echo Jordan's comment that we should consider limiting the total memory allocated to TSDB (across all workers).
I misunderstood initially, the cap is on all workers and is divided per-query by num workers down below.
pkg/ts/server.go, line 122 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Correcting the above typo,
queryMemoryMax = queryMemoryMax << math.Log2(memoryPoolSize/128/queryMemoryMax)
neat! Had to do a bit of casting, do you know if that can be avoided here?
pkg/ts/server.go, line 126 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
So if there is a config setting, the doubling logic will not apply. Is that intentional? When is QueryMemoryMax set within cfg?
Yes this is intentional because I assume if someone is manually configuring, they wouldn't expect the number to be arbitrarily modified.
This is currently only used in tests to exercise the chunking behavior with artificially low limits.
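For illustration, a minimal sketch of that precedence; the types here are invented for the example and only the cfg.QueryMemoryMax override mirrors the code being discussed:

package main

import "fmt"

// config stands in for the real server config; only QueryMemoryMax mirrors
// the field discussed above, everything else is omitted.
type config struct {
	QueryMemoryMax int64 // 0 means "not set"
}

// resolve starts from the automatically grown value, but lets an explicit
// config override win, which is how tests force artificially low limits to
// exercise chunking.
func resolve(cfg config, grown int64) int64 {
	queryMemoryMax := grown
	if cfg.QueryMemoryMax != 0 {
		queryMemoryMax = cfg.QueryMemoryMax
	}
	return queryMemoryMax
}

func main() {
	fmt.Println(resolve(config{}, 256<<20))                        // grown value wins
	fmt.Println(resolve(config{QueryMemoryMax: 1 << 20}, 256<<20)) // explicit test override wins
}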
pkg/ts/server_test.go, line 307 at r1 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
How is the expectedTsDBWorkerMax computed in relation to the poolsize, shouldn't it be 1/128th of the poolsize? What am I missing?
It's not exactly 1/128 because we double until we cross that threshold.
2 GiB / 128 = 16 MiB, so we stay at the 64 MiB limit.
32 GiB / 128 = 256 MiB, which we hit exactly by doubling twice (64 * 2 * 2).
48 GiB / 128 = 384 MiB, which we cross by doubling the example above one more time (64 * 2 * 2 * 2).
My latest change adjusts the logic slightly to take the floor of the log, which keeps the third example at the 256 MiB limit.
See what you think of the logarithm-based formula, maybe that's easier to understand but I'm not sure.
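For readers following along, a self-contained sketch of the doubling behavior described above; the 64 MiB starting default and the 1/128 threshold come from the PR, everything else is illustrative:

package main

import "fmt"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// workerMax doubles the 64 MiB default until it crosses 1/128 of the pool.
// (The floor-of-log variant mentioned above would instead stop at the last
// power of two at or below the threshold, keeping the 48 GiB case at 256 MiB.)
func workerMax(memoryPoolSize int64) int64 {
	queryMemoryMax := 64 * MiB
	for queryMemoryMax < memoryPoolSize/128 {
		queryMemoryMax *= 2
	}
	return queryMemoryMax
}

func main() {
	fmt.Println(workerMax(2*GiB) / MiB)  // 64: threshold is 16 MiB, below the default
	fmt.Println(workerMax(32*GiB) / MiB) // 256: hits the 256 MiB threshold exactly
	fmt.Println(workerMax(48*GiB) / MiB) // 512: crosses the 384 MiB threshold
}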
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten, @rimadeodhar, and @tbg)
pkg/ts/server.go, line 122 at r1 (raw file):
Previously, dhartunian (David Hartunian) wrote…
neat! Had to do a bit of casting, do you know if that can be avoided here?
Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,
maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))
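For concreteness, here is roughly what the shift-based form looks like once the casts are written out (a sketch: math.Log2 operates on float64, which is the casting being discussed; a guard would still be needed for pools small enough that maxAvail is below the default):

package main

import (
	"fmt"
	"math"
)

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

func main() {
	queryMemoryMax := 64 * MiB
	memoryPoolSize := 32 * GiB

	// The casts are what make this noisier than the for-loop: the ratio has
	// to be converted to float64 for math.Log2, and the shift amount back to
	// an integer.
	maxAvail := memoryPoolSize / 128
	queryMemoryMax <<= int(math.Log2(float64(maxAvail / queryMemoryMax)))

	fmt.Println(queryMemoryMax / MiB) // 256
}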
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)
pkg/ts/server.go, line 122 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Both memoryPoolSize and queryMemoryMax are powers of 2, right? So, the integer division is without remainder, i.e., equivalent to floating point division; thus, we can get away with one cast. I'd also use a temp. for better readability, e.g.,
maxAvail := memoryPoolSize/128
queryMemoryMax <<= int(math.Log2(maxAvail / queryMemoryMax))
There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @nvanbenschoten)
-- commits, line 21 at r2:
Nit: missing space - be capped not becapped :).
-- commits, line 29 at r2:
This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.
pkg/ts/server.go, line 51 at r2 (raw file):
)
var TSDBAutoMemoryGrowthEnabled = func() *settings.BoolSetting {
Wondering whether, instead of making it a boolean, we should make it tsdb_worker_memory_cap or something like that, whereby setting it to 0 disables it and it can be increased dynamically to allow a fractional value to be set? It could be set to 1/128 by default.
The pros are we can tweak the memory cap for individual clusters if we run into this issue again, the cons are the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine grained to allow for more control than just a simple boolean. What do you think?
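A rough, plain-Go sketch of the semantics being proposed here, standing in for whatever cluster-setting plumbing would carry the value; zero disables the growth, any positive denominator caps tsdb memory at poolSize/denominator, and all names are invented for illustration:

package main

import "fmt"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// memoryCap applies the suggested semantics: a denominator of 0 keeps the
// static default, any positive value caps tsdb memory at poolSize/denominator
// (never dropping below the static default).
func memoryCap(poolSize, denominator, staticDefault int64) int64 {
	if denominator <= 0 {
		return staticDefault
	}
	limit := poolSize / denominator
	if limit < staticDefault {
		return staticDefault
	}
	return limit
}

func main() {
	fmt.Println(memoryCap(32*GiB, 128, 64*MiB) / MiB) // 256 with the default 1/128 ratio
	fmt.Println(memoryCap(32*GiB, 0, 64*MiB) / MiB)   // 64: growth disabled
}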
(force-pushed from 10f488b to bcaec7e)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rimadeodhar)
Previously, rimadeodhar (Rima Deodhar) wrote…
Nit: missing space - be capped not becapped :).
Done.
Previously, rimadeodhar (Rima Deodhar) wrote…
This is a good question. I don't have any thoughts here to be honest. Ideally, this should be configurable so that the memory limit can be increased/decreased dynamically and we don't have to worry about it being hardcoded.
Added configuration.
pkg/ts/server.go, line 122 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
There is no performance concern here, and I would (generally, everywhere where perf is not the issue) prefer we gravitate towards the clearest, not the most efficient, solution here.
Thx for the reminder @tbg. I futzed with it a bit but still had to cast the arguments for math.Log2, so I'm sticking with the original formulation.
pkg/ts/server.go, line 51 at r2 (raw file):
Previously, rimadeodhar (Rima Deodhar) wrote…
Wondering whether instead of making boolean, should we make it tsdb_worker_memory_cap or something like that whereby setting it to 0 disables it and can be increased dynamically to allow for a fraction value to be set? Could be set to 1/128 by default.
The pros are we can tweak the memory cap for individual clusters if we run into this issue again, the cons are the cluster setting becomes a bit unwieldy to use. My instinct is that this cluster setting will not be manipulated that frequently by end users but will mostly be used by DB and support engineers. So it might be better to make it more fine grained to allow for more control than just a simple boolean. What do you think?
I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.
(force-pushed from 322b83b to 755355b)
Reviewed 2 of 4 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)
-- commits, line 21 at r4:
Nit: be capped
pkg/ts/server.go, line 51 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
I like this idea. I made it an int, do you think that's too confusing? I worry about asking people to put in decimal values.
Nice. I think the explanation is good, so it should help avoid any confusion.
pkg/ts/server.go, line 148 at r4 (raw file):
targetMemRatio := TSDBRatioOfSQLMemoryPool.Get(&settings.SV)
if targetMemRatio > 0 {
	// Double size until we hit 1/128 of the memory pool setting. Our
Nit: Comment is not accurate anymore. Should be until we hit 1/targetMemRatio of the memory pool setting.
pkg/ts/server.go, line 152 at r4 (raw file):
// which corresponds to 8 GiB of system memory (assuming a default
// setting of 25% for the pool).
memoryCap := memoryPoolSize / targetMemRatio
Since targetMemRatio is now configurable externally, should we confirm that it doesn't exceed the memoryPoolSize before doing this division? That way we prevent any unintentional behavior.
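Something like the following guard is presumably what's being asked for (a sketch; only the ratio/pool names echo the quoted snippet, and the zero fallback is an assumption used purely to mark the "disabled" case):

package main

import "fmt"

const GiB = int64(1) << 30

// capFromRatio only divides when the externally configurable ratio is
// positive and no larger than the pool itself, so an extreme value cannot
// drive the cap to zero.
func capFromRatio(memoryPoolSize, targetMemRatio int64) int64 {
	if targetMemRatio <= 0 || targetMemRatio > memoryPoolSize {
		return 0 // illustrative "disabled" value
	}
	return memoryPoolSize / targetMemRatio
}

func main() {
	fmt.Println(capFromRatio(8*GiB, 128))    // 67108864 bytes (64 MiB)
	fmt.Println(capFromRatio(8*GiB, 16*GiB)) // 0: ratio exceeds the pool, guard trips
}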
Thinking about this a little further, I'm coming to the conclusion that the cluster setting is not the correct direction for this feature. I can see in the history of the code for the SQL memory flag that this used to be a cluster setting but was turned into a command line flag, since nodes may not be identical across a cluster. I think a similar approach may be useful here and we can set a `--max-tsdb-memory` flag. I was initially hesitant to pollute the command line with yet another flag, but I wonder if perhaps this one can be silent or undocumented since it generally should not be touched and there's no query performance gain to tweaking it. I'll check with the server team about what determination to use when adding new flags.
David, it looks like you have developed the right line of thinking about the motivation for a new command-line flag. The reasoning sounds good to me. Sometimes @bdarnell has a more nuanced opinion (I'll let Ben chime in if he so desires) but IMHO it's warranted here.
Previously, the memory limit for all `tsdb` workers was set at a static 64 MiB. This cap created issues seen in cockroachdb#24018 where this limit was hit on a 30 node cluster. To alleviate the issue, the number of workers was reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger clusters and have decided to make the per-query worker memory limit dynamic. This PR introduces a new CLI flag `--max-tsdb-memory` to mirror the functionality of the `--max-sql-memory` flag by accepting bytes or a percentage of system memory. The default is set to be `1%` of system memory or 64 MiB, whichever is greater. This ensures that performance after this PR is equal or better for timeseries queries without eating too far into memory budgets for SQL.

In addition, the ts server is now connected to the same `BytesMonitor` instance as the SQL memory monitor and workers will be capped at double the query limit. Results are monitored as before but a cap is not introduced there since we didn't have one present previously.

Resolves cockroachdb#72986

Release note (cli change, ops change): A new CLI flag `--max-tsdb-memory` is now available that can set the memory budget for timeseries queries when processing requests from the Metrics page in DB Console. Most customers should not need to tweak this setting as the default of 1% of system memory or 64 MiB, whichever is greater, is adequate for most deployments. In the case where a deployment of hundreds of nodes has low per-node memory available (below 8 GiB, for instance) it may be necessary to increase this value to `2%` or higher in order to render timeseries graphs for the cluster using the DB Console. Otherwise, the default settings will be adequate for the vast majority of deployments.
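As a quick illustration of the stated default (a sketch: the 1%-or-64 MiB rule comes from the PR description above, while the helper name and the example memory figures are made up):

package main

import "fmt"

const (
	MiB = int64(1) << 20
	GiB = int64(1) << 30
)

// defaultTSDBMemory returns 1% of detected system memory or 64 MiB,
// whichever is greater, mirroring the default described in the PR.
func defaultTSDBMemory(systemMemory int64) int64 {
	budget := systemMemory / 100
	if budget < 64*MiB {
		return 64 * MiB
	}
	return budget
}

func main() {
	fmt.Println(defaultTSDBMemory(4*GiB) / MiB)  // 64: 1% would be only ~40 MiB
	fmt.Println(defaultTSDBMemory(16*GiB) / MiB) // 163: 1% of 16 GiB
}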
(force-pushed from 755355b to 97150df)
@rimadeodhar modified the PR to use a command line flag, which makes the implementation quite simple and re-uses the existing config var to hold the value.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)
We used to have problems detecting memory limits in containers. Are those fully resolved these days? If not, this flag would require action on the part of any deployments whose memory limits are not being detected properly. (and then again for the next such flag, etc).
Rather than recommending that anyone who sets --max-sql-memory and --cache also add each new memory flag, we should consider adding a --total-memory flag to override our detection of the memory limit so that the defaults of .25, .25, and .01 can be relied upon.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)
I'll check on this, but in this PR we maintain the floor of 64 MiB, which is what we allocate today, so in the case where we detect less memory than is available there would not be a regression from how the binary behaves today.
Thanks for the suggestion @bdarnell, tracking the CLI feature here: #76535
The known problems were in the other direction: we'd either see the total physical memory of the machine without regard to any limits placed on the container, or we'd get some completely implausible number (exabytes). These would result in OOMs as the system (especially the
Reviewed 9 of 9 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @nvanbenschoten, and @rimadeodhar)
bors r+
Build succeeded:
Previously, the memory limit for all `tsdb` workers was set at a static 64 MiB. This cap created issues seen in #24018 where this limit was hit on a 30 node cluster. To alleviate the issue, the number of workers was reduced, raising the per-worker allocation.

We've currently hit this limit again as part of load testing with larger clusters and have decided to make the per-query worker memory limit dynamic. The per-worker limit is now raised based on the amount of memory available to the SQL Pool via the `MemoryPoolSize` configuration variable. This is set to be 25% of the system memory by default. The `tsdb` memory cap per-worker is now doubled until it reaches `1/128` of the memory pool setting.

For example, on a node with 128 - 256 GiB of memory, this will correspond to 512 MiB allocated for all running `tsdb` queries.

In addition, the ts server is now connected to the same `BytesMonitor` instance as the SQL memory monitor and workers will be capped at double the query limit. Results are monitored as before but a cap is not introduced there since we didn't have one present previously.

This behavior is gated behind a private cluster setting that's enabled by default and sets the ratio at 1/128 of the SQL memory pool.

Resolves #72986

Release note (ops change): customers running clusters with 240 nodes or more can effectively access tsdb metrics.