
feature request: set sidecar_task globally #9161

Open

fredwangwang opened this issue Oct 23, 2020 · 8 comments
@fredwangwang (Contributor)

Nomad version

0.12.5

Operating system and Environment details

linux

Problem

For our Envoy sidecar we are using a custom Docker image as well as setting up some environment variables, which results in a big chunk of configuration that needs to be added every time we declare a Consul-connected service. This 1. kind of pollutes the job file and 2. makes management of the sidecar_task a bit challenging.

So we would like to have something like sidecar-task-defaults, which would declare the default sidecar_task in one go.
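For illustration, the kind of block we currently have to copy into every connect-enabled task group looks roughly like this (the image name and env values are made up):

sidecar_task {
  config {
    image = "our-registry.example.com/envoy-custom:v1"
  }

  env {
    # Illustrative values only
    ENVOY_EXTRA_SETTING = "some-value"
    COLLECTOR_ADDR      = "10.10.10.1:8125"
  }
}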

@tgross (Member) commented Oct 23, 2020

Hi @fredwangwang! We're currently working on HCL2 support, which will open the door to future improvements on the jobspec like having blocks you can include. I think this is probably a better approach than a cluster-wide or client-wide configuration value.

@idrennanvmware (Contributor) commented Oct 23, 2020

Hi @tgross

So the real issue we are working around here is that we have a telemetry collector running on the host (let's say 10.10.10.1), but the Envoy proxy is running in the bridge/mesh network, as is the application emitting telemetry, and they both emit to 127.0.0.1, which is actually 127.0.0.1 inside "their" container and not the host. So we need to be able to inject the host address (10.10.10.1) into the application (easy with Nomad template/interpolation) AND into the Envoy container so that it can set its address in the Envoy config, which is 127.0.0.1 by default (not possible with interpolation). Basically, we can't collect Envoy telemetry without this.
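For the application side this is straightforward, e.g. something like the following (the env var name is made up):

task "app" {
  driver = "docker"

  env {
    # Nomad interpolates the node attribute, so the app sees the host address.
    COLLECTOR_ADDR = "${attr.unique.network.ip-address}:8125"
  }
}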

Hope this provides some clarity to the issue we are trying to work around

tgross added the theme/consul/connect (Consul Connect integration) label Oct 26, 2020
@tgross (Member) commented Oct 26, 2020

> So the real issue we are working around here is that we have a telemetry collector running on the host (let's say 10.10.10.1)

Am I understanding the topology correctly that you're pushing metrics to this collector (ex. it's something like statsd) and the collector isn't scraping metrics from envoy and the application? This is a bit of a challenging case with service mesh setups (no matter the orchestrator or mesh, as far as I can tell), because the collector isn't in the same network namespace as the application so it's almost like talking to another host entirely. It might be worth looking at having the collector register itself as a Consul service that you can look up by DNS name, to reduce the differences in runtime config between jobs.
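A minimal sketch of that idea, as a Consul agent service definition (the name, port, and tags are illustrative):

service {
  # Registering the host-level collector with the local Consul agent would
  # let jobs address it as telemetry-collector.service.consul instead of a
  # hard-coded host IP.
  name = "telemetry-collector"
  port = 8125
  tags = ["statsd", "udp"]
}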

> AND into the Envoy container so that it can set its address in the Envoy config, which is 127.0.0.1 by default (not possible with interpolation).

I'm not sure what you mean here by "not possible with interpolation"? The sidecar_task should accept all the same interpolation hooks that a regular task does (and if not, that's a bug we should fix).

@idrennanvmware (Contributor)

Hi @tgross

Yeah the topology is a bit weird while we transition to mesh. The problem is ensuring that other services still have backwards compatibility with the telemetry collector. There are a few options we can consider (including running multiple collectors during this phase) though so we aren't out of options. Just balancing a few things :)

As far as the interpolation goes, we had this:

connect {
  sidecar_service {
    proxy {
      config {
        envoy_dogstatsd_url = "udp://${attr.unique.network.ip-address}:8125"
      }
    }
  }
}

but the Envoy sidecar kept crashing over and over again until we removed the interpolation:

connect {
  sidecar_service {
    proxy {
      config {
        envoy_dogstatsd_url = "udp://10.10.10.1:8125"
      }
    }
  }
}

@tgross (Member) commented Oct 26, 2020

Ok, that missing interpolation isn't intentional for sure. I've added a comment about that to #7221 so we can get that fixed up. I'm going to tag this issue as an HCL issue, as a use case for an "include" type block for configuration, but I think the underlying problem you're trying to solve is best addressed by fixing interpolation.

@fredwangwang (Contributor, Author) commented Oct 26, 2020

> Am I understanding the topology correctly that you're pushing metrics to this collector (ex. it's something like statsd) and the collector isn't scraping metrics from envoy and the application?

That is accurate. Here is some more context.

First I want to say that both of them (pushing metrics from the app and from Envoy) are workaround-able.

  1. For the app: currently our app hardcodes the collector address to localhost; we can update it so that it takes an env var to set the collector address.
  2. For Envoy: we did discover that sidecar_task allows setting an environment variable, which does accept the Nomad variable string ${attr.unique.network.ip-address}. So we can set an env var like COLLECTOR_ADDR and reference it via envoy_dogstatsd_url = "$COLLECTOR_ADDR" as directed here; a sketch follows this list. (By the way, that section is not very obvious about how to set the environment variable, so that might be something worth mentioning explicitly in the docs.)
     However, Nomad does not actually pass the environment from sidecar_task to Consul when executing the bootstrap, so the only way right now is to set COLLECTOR_ADDR in the environment when launching Nomad itself.
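Putting point 2 together, the sketch we were aiming for looks roughly like this (values are illustrative, and per the note above the env var is not actually passed through to the bootstrap today):

connect {
  sidecar_service {
    proxy {
      config {
        # References the env var set on the sidecar task below
        envoy_dogstatsd_url = "$COLLECTOR_ADDR"
      }
    }
  }

  sidecar_task {
    env {
      COLLECTOR_ADDR = "udp://${attr.unique.network.ip-address}:8125"
    }
  }
}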

However, what we really want is to get both the app and Envoy to emit to the collector on the host without changing the app source code or envoy_dogstatsd_url if possible. As you mentioned @tgross, service mesh makes this kind of hard because statsd uses UDP instead of TCP. I tried to "work around it" (again, lol) by changing the envoy_bootstrap_json_tpl in the proxy-defaults settings:

  1. insert a static UDP cluster pointing to {{ .AgentAddress }}:8125 ({{ .AgentAddress }} is the host address, and 8125 is the collector port)
  2. insert a static UDP listener listening on 8125 that forwards to the static cluster above
  3. insert a stats_sinks section:
"stats_sinks": [{"name": "envoy.dog_statsd",
    "config": {"address": {"socket_address": {
          "address": "127.0.0.1",
          "port_value": 8125
        }}}}],

Now the app is able to push to localhost:8125 within the container, which is routed to host:8125 by the static UDP listener we "hijacked" into the Envoy rules. Envoy can push to localhost:8125 as well! Everything works fine, with just one problem: now we are changing a 200-line Go template, which by the way breaks between Consul minor versions (as expected, since we are really changing some internals).


So, that is a really convoluted way of saying there is no easy way right now for us to send UDP traffic between services with service mesh.

And I guess what I am asking for (setting sidecar_task globally) is really just one implementation of how I think I can solve my real problem.

Alternative thought: would UDP protocol support be possible in sidecar upstreams? The above basically proved that Envoy handles UDP traffic just fine; as long as Consul Connect and Nomad support it, I think it would be a really nice way to route UDP traffic between services without all the hacks.

@idrennanvmware (Contributor) commented Nov 2, 2020

@tgross we found another instance of this for #7221. The network job stanza, specifically DNS, does not allow interpolation either. Example:

network {
  mode = "bridge"

  port "api_expose_healthcheck" {
    to = -1
  }

  dns {
    servers = ["127.0.0.1", "${attr.unique.network.ip-address}", "8.8.8.8"]
  }
}

will render the literal string "${attr.unique.network.ip-address}".

@idrennanvmware (Contributor) commented Jan 7, 2021

I had created a separate ticket (#9749), but @tgross pointed me back to this one since it's very similar, and I wanted to add a new use case for the sidecar, specifically around Envoy resources.

--------- ORIGINAL POST----------
This is a feature request for the Nomad team to consider (or tell us why it's a bad idea) :)

Scenario
As we move more and more services to the mesh we are seeing a significant increase in the amount of resources required to run the various tasks. Often our tasks are 128 CPU or less, but with the Envoy default of 250 for just the sidecar this stacks up quickly, and we end up needing more resources to run the sidecar than the main task itself.

Very often, at least in our scenarios, these tasks need far fewer resources than the default, and while we can customize them at the job level, the impact is that we have to ask every service team to reduce their resources (and this can be a large number of teams and changes from the default).

If we had the ability to set this at a cluster level, we could start with a lower default for all Connect sidecars. We would still want to be able to override the default at the job level (see the sketch below), of course, but that would only be needed as teams required more resources, versus the inverse we have today with over-provisioning.
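For context, the per-job override we currently ask each team to add looks something like this (values are illustrative):

sidecar_task {
  # Lower the default Envoy sidecar resources for this job only; a
  # cluster-wide default would make this per-job repetition unnecessary.
  resources {
    cpu    = 64
    memory = 64
  }
}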

Thanks for the consideration!
Ian
