
High memory usage on Nomad agent due to static buffer in template client? #18508

Closed · deverton-godaddy opened this issue Sep 15, 2023 · 5 comments · Fixed by #18524

@deverton-godaddy (Contributor)

Nomad version

$ nomad version
Nomad v1.5.8
BuildDate 2023-07-21T13:48:41Z
Revision b3a653dff306e57edcd63bce6c4a54c0918f2f3d

Operating system and Environment details

Amazon Linux 2 on amd64 and aarch64

Issue

The Nomad agent has surprisingly high memory usage for our use case. We're seeing 8 GB of memory usage for about 1000 allocations on a single machine.

Reproduction steps

Not entirely sure of the exact scenario that causes this, but it's possibly related to the number of templates. Each of the 1000 tasks has one template, which, for reasons discussed below, lines up with some of the memory usage numbers.

Expected Result

Not using 8 GB of RAM for 1000 allocations with one template each.

Actual Result

We enabled pprof and grabbed a heap dump. The top 10 entries are shown below; there is one pretty large outlier.

Showing nodes accounting for 1.43GB, 86.87% of 1.64GB total
Dropped 393 nodes (cum <= 0.01GB)
Showing top 10 nodes out of 111
      flat  flat%   sum%        cum   cum%
    1.24GB 75.74% 75.74%     1.24GB 75.74%  google.golang.org/grpc/test/bufconn.newPipe
    0.03GB  1.92% 77.66%     0.06GB  3.80%  github.com/hashicorp/nomad/client/taskenv.(*Builder).Build
    0.02GB  1.43% 79.09%     0.03GB  1.85%  github.com/hashicorp/nomad/client/taskenv.(*Builder).buildEnv
    0.02GB  1.39% 80.48%     0.02GB  1.48%  encoding/json.(*Decoder).refill
    0.02GB  1.34% 81.82%     0.02GB  1.34%  fmt.Sprintf
    0.02GB  1.29% 83.11%     0.04GB  2.51%  github.com/hashicorp/nomad/client/taskenv.(*Builder).setNode
    0.02GB  1.14% 84.24%     0.02GB  1.22%  io.copyBuffer
    0.02GB  1.01% 85.25%     0.02GB  1.01%  github.com/armon/go-metrics.(*Metrics).filterLabels
    0.01GB  0.87% 86.12%     0.01GB  0.87%  bufio.NewReaderSize
    0.01GB  0.75% 86.87%     0.01GB  0.75%  golang.org/x/exp/maps.Clone[...]

Tracing this through the code lands on the line ln := bufconn.Listen(1024 * 1024), which

  1. seems like quite a large static buffer, and
  2. uses test code from the gRPC library in a non-test context.

However, the only place that seems to call it is agent.setupClient(), which should be called once at startup.

So I'm a bit stumped on how we end up with so many instances of the buffer.

The full profile chart is below:
[heap profile graph: profile001]
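
For reference, here is a rough sketch of the helper/bufconndialer wrapper in question. The New function and the 1 MiB Listen call match the code linked above; the DialContext method and its signature are a reconstruction (assumed to mirror http.Transport.DialContext) and may not match the real file exactly.

package bufconndialer

import (
    "context"
    "net"

    "google.golang.org/grpc/test/bufconn"
)

// BufConnWrapper wraps a bufconn.Listener so in-process clients can reach
// the agent's API without a real network socket.
type BufConnWrapper struct {
    listener *bufconn.Listener
}

// New returns a net.Listener the agent can serve on, plus a wrapper whose
// DialContext can be plugged into an HTTP transport. Note that Listen only
// records the buffer size; the two per-connection buffers are allocated
// when a connection is actually dialed.
func New() (net.Listener, *BufConnWrapper) {
    ln := bufconn.Listen(1024 * 1024)
    return ln, &BufConnWrapper{listener: ln}
}

// DialContext returns an in-memory connection to the wrapped listener.
func (w *BufConnWrapper) DialContext(ctx context.Context, _, _ string) (net.Conn, error) {
    return w.listener.DialContext(ctx)
}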

@tgross (Member) commented Sep 15, 2023

Hi @deverton-godaddy! First, thanks so much for showing up with a heap profile here! 😀

I think you're hitting an unfortunate design constraint around template blocks, which is that we embed consul-template to run the templating engine, and each running template is its own instance of consul-template. Each template runner that connects "back" to Nomad for Nomad services or variables needs to open its own connection. That bufconn.Listen method only configures the listener; it doesn't allocate the buffers until a connection is made, and then it creates two buffers of that size (1 MiB here). That adds up quite a bit.
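
To make the "two buffers per connection" point concrete, here is a small, self-contained Go sketch (not Nomad code, and the numbers are illustrative). It assumes the current google.golang.org/grpc/test/bufconn behavior, where each dialed connection eagerly allocates a send pipe and a receive pipe of the configured size:

package main

import (
    "fmt"
    "net"
    "runtime"

    "google.golang.org/grpc/test/bufconn"
)

func main() {
    const bufSize = 1024 * 1024 // 1 MiB, the size used by the Nomad helper

    ln := bufconn.Listen(bufSize)
    defer ln.Close()

    // Accept in the background so the Dial calls below don't block.
    go func() {
        for {
            if _, err := ln.Accept(); err != nil {
                return
            }
        }
    }()

    var before runtime.MemStats
    runtime.GC()
    runtime.ReadMemStats(&before)

    // Each Dial creates a fresh pair of in-memory pipes (send and receive),
    // each with bufSize capacity, so N connections cost roughly N*2*bufSize.
    conns := make([]net.Conn, 0, 100)
    for i := 0; i < 100; i++ {
        c, err := ln.Dial()
        if err != nil {
            panic(err)
        }
        conns = append(conns, c)
    }

    var after runtime.MemStats
    runtime.ReadMemStats(&after)
    fmt.Printf("roughly %d MiB of heap growth for %d connections\n",
        (after.HeapAlloc-before.HeapAlloc)/(1024*1024), len(conns))
}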

The heap profile you're showing has 1272 MiB of heap used by these buffers (at 2 × 1 MiB per connection, that's roughly 636 connections), so I suspect maybe only ~600 of the 1000 templates are querying Nomad services or variables.

Given all that:

  • Having a MiB-sized buffer for both tx/rx does seem like a lot. It looks like this code was lifted from Vault (which does, or at least used to do, the same thing, but not at the scale Nomad runs at, where we potentially have 100s or 1000s of templates), so I'm going to take that back to the team to figure out whether we can safely resize that or make it a user-configurable buffer.
  • This bit of gRPC testing code was itself lifted out of Vault in client: add Nomad template service functionality to runner #12458. I can talk to @jrasell and @schmichael to get their thoughts on how we might tighten that up.
  • Also, the entire heap in your profile is 1680MiB which is a lot less than the 8GiB RSS you're reporting, so it might also be helpful to get a goroutine profile (with ?debug=2) to verify there's no unexpected excess goroutine spawning either.
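
If it helps, here is a minimal sketch of pulling that goroutine profile programmatically, assuming the client agent was started with enable_debug = true so the standard Go /debug/pprof routes are exposed on the agent's HTTP port (the address and output filename below are placeholders):

package main

import (
    "io"
    "net/http"
    "os"
)

func main() {
    // debug=2 returns full, human-readable stacks for every goroutine,
    // which makes it easy to spot unexpected goroutine growth.
    resp, err := http.Get("http://127.0.0.1:4646/debug/pprof/goroutine?debug=2")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    out, err := os.Create("goroutines.txt")
    if err != nil {
        panic(err)
    }
    defer out.Close()

    if _, err := io.Copy(out, resp.Body); err != nil {
        panic(err)
    }
}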

@tgross (Member) commented Sep 15, 2023

Still chatting with the team about this, but in the meanwhile I did a quick smoke test cranking that buffer down to only 10k, and it worked just fine even with Variables that were 4x that size (as I'd expect). Obviously that means more read calls on that buffer, so I'm not totally certain we'd recommend it yet without doing some more testing on the impact across many allocations.

But if you wanted to try it yourself on a client, the diff was:

diff --git a/helper/bufconndialer/bufconndialer.go b/helper/bufconndialer/bufconndialer.go
index 2a7650d57d..22acb22758 100644
--- a/helper/bufconndialer/bufconndialer.go
+++ b/helper/bufconndialer/bufconndialer.go
@@ -24,7 +24,7 @@ type BufConnWrapper struct {
 // New returns a new BufConnWrapper with a new bufconn.Listener. The wrapper
 // provides a dialer for creating connections to the listener.
 func New() (net.Listener, *BufConnWrapper) {
-       ln := bufconn.Listen(1024 * 1024)
+       ln := bufconn.Listen(1024 * 10)
        return ln, &BufConnWrapper{listener: ln}
 }

(Or split the difference and do 1024 * 100 if that feels dodgy.)

tgross added a commit that referenced this issue Sep 15, 2023
In #12458 we added an in-memory connection buffer so that template runners that
want access to the Nomad API for Service Registration and Variables can
communicate with Nomad without having to create a real HTTP client. The size of
this buffer (1 MiB) was taken directly from its usage in Vault, and each
connection makes 2 such buffers (send and receive). Because each template runner
has its own connection, when there are large numbers of allocations this adds up
to significant memory usage.

The largest Nomad Variable payload is 64KiB, plus a small amount of
metadata. Service Registration responses are much smaller, and we don't include
check results in them (as Consul does), so the size is relatively bounded. We
should be able to safely reduce the size of the buffer by a factor of 10 or more
without forcing the template runner to make multiple read calls over the buffer.

Fixes: #18508
@tgross (Member) commented Sep 15, 2023

After some discussion with the team, I've opened #18524 with a draft PR setting the buffer to 100KiB so as to allow reading the largest Variable with a single read call.

@deverton-godaddy (Contributor, Author)

I'll dig into why we see a much higher RSS at some point, because I was also surprised that the heap profile didn't match what we see at the system level. I figured I'd report the obvious outlier first, since it seemed like it might be an easy fix, and it looks like it is.

You mention that "each running template is its own instance of consul-template", which suggests that if a job has, say, 4 templates, there are 4 connections and so 4 MB of buffers (before the patch)? How feasible is it that this might eventually be converted to a single shared buffer (or a pool of them)? At the density and template use we're targeting, even the 10x reduction in memory usage might not be enough.

tgross added a commit that referenced this issue Sep 18, 2023
@tgross (Member) commented Sep 18, 2023

You mention that "each running template is its own instance of consul-template", which suggests that if a job has, say, 4 templates, there are 4 connections and so 4 MB of buffers (before the patch)?

Correct.

How feasible is it that this might eventually be converted to a single shared buffer (or a pool of them)? At the density and template use we're targeting, even the 10x reduction in memory usage might not be enough.

We're probably looking to make template runners more isolated, rather than less. The sandbox the runners live in is currently all code in the agent, and this has been a historic source of privilege escalation bugs that we'd like to solve by design rather than by relying on being infallible developers (😀). Something more like what we've done with artifact blocks, where the runner is in a separate process, is probably where we want to go. That'd also let us do things like persist template state across client agent restarts (#13313) and keep template updates coming even when the client is down.

But we don't want to do that kind of thing and then make the memory usage problem worse, for sure, so we'd likely need to rework a lot of the consul-template internals so that we end up with a very lean process. In the meantime, I've just landed the PR to reduce the buffer size. I might try to look at making that a user-configurable option, because I suspect we could even get away with 10k instead of 100k for a lot of users, but I'm not going to be able to look at that for a few weeks at best.

@tgross added this to the 1.6.x milestone Sep 18, 2023