
[collector] Update collector to use pod ip #603

Merged

Conversation

TylerHelmuth
Member

The collector's security guidelines are being updated so that containerized environments are no longer excluded from the 0.0.0.0 guidance. This PR brings the charts into compliance.

Related issue: open-telemetry/opentelemetry-collector#6938

@TylerHelmuth TylerHelmuth requested a review from a team January 17, 2023 18:06
@TylerHelmuth
Member Author

/cc @povilasv

@Aneurysm9 (Member) left a comment


I'm not seeing where the MY_POD_IP environment variable is being set. I'd expect to see a mapping in the pod spec to create an environment variable using the downward API.

@TylerHelmuth
Member Author

@Aneurysm9 the env var is being set here

The chart was already using it for the Prometheus config.
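
For reference, the mechanism being discussed is roughly the following: the pod spec injects the pod IP via the downward API, and the collector config then references that environment variable. A rough sketch, not the chart's exact template:

# Pod spec: the downward API injects the pod IP as MY_POD_IP
env:
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

# Collector config: receivers bind to the pod IP instead of 0.0.0.0
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: ${MY_POD_IP}:4317
      http:
        endpoint: ${MY_POD_IP}:4318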

@povilasv
Contributor

povilasv commented Jan 17, 2023

I like the change :) Question: would this change potentially break users? For example, host network mode, or hosts with multiple IPs, or something like that?

How would a user go back to 0.0.0.0 if they wanted to?

@TylerHelmuth
Member Author

@povilasv probably, yes. Networking in k8s is complicated enough that this will probably affect at least some users. I'll add an entry to the UPGRADING.md doc.

@povilasv (Contributor) left a comment


LGTM. Maybe also add some text to the chart's NOTES.txt so that folks see this during an upgrade?

@dmitryax
Member

LGTM

@TylerHelmuth TylerHelmuth merged commit 2e8b7ad into open-telemetry:main Jan 24, 2023
@TylerHelmuth TylerHelmuth deleted the update-security-guidance branch January 24, 2023 22:59
@cep21

cep21 commented Feb 17, 2023

I recently ran into an issue that I think is related to this. If the recommendations for the OpenTelemetry Collector in Kubernetes are:

  • Use as a daemonset
  • Use ${MY_POD_IP}

Then how do services talk to the daemonset pods in this configuration? Do you recommend a Service object for that? Usually those two things conflict, since a Service doesn't necessarily route traffic to the collector pod on the local node.

@povilasv
Contributor

@cep21 if you have an issue, please open an issue rather than commenting on a merged PR.

Typically, services talk to the daemonset collector over a port exposed on the local node. For example:

        env:
          - name: NODE
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: "$(NODE):4317"

@fernferret
Contributor

fernferret commented Mar 24, 2023

I believe this PR prevents the default configuration from using a kubectl port-forward since that tries to connect to 127.0.0.1 in the container.

This is the error message I got (with a Deployment based install):

kubectl port-forward -n obs svc/otel-coll 4317:4317
Forwarding from 127.0.0.1:4317 -> 4317
Forwarding from [::1]:4317 -> 4317
Handling connection for 4317
E0324 22:22:01.547223   48156 portforward.go:407] an error occurred forwarding 4317 -> 4317:
  error forwarding port 4317 to pod ac1...e39, uid : failed to execute portforward in network
  namespace "/var/run/netns/cni-4e...9d": failed to connect to localhost:4317 inside namespace
  "ac1...e39", IPv4: dial tcp4 127.0.0.1:4317: connect: connection refused IPv6 dial tcp6:
  address localhost: no suitable address found
E0324 22:22:01.547596   48156 portforward.go:233] lost connection to pod

I don't think this behavior is wrong, but it was surprising to a novice and bit me today. I was trying to debug some ingest pipelines and figured I'd just port-forward to run a quick test; that didn't quite work out, but I did learn a lot more about configuring the otel-collector!

I think it would be nice if a note about this caveat were added to the excellent NOTES.txt message (since that message is what led me here).

Would using a NetworkPolicy be sufficient to mitigate the concerns of the 0.0.0.0 binding, or is binding to the Pod IP + 127.0.0.1 (option 2 below) still preferable?
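For illustration, a NetworkPolicy restricting ingress to the OTLP ports might look roughly like the sketch below. The name and label selector are assumptions and depend on how the chart labels its pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: otel-collector-ingress   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: opentelemetry-collector   # assumed chart label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector: {}   # tighten to the namespaces that send telemetry
      ports:
        - protocol: TCP
          port: 4317
        - protocol: TCP
          port: 4318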

I did end up playing around with 2 options:

Option 1: Revert back to 0.0.0.0

config:
  exporters:
    otlp:
      endpoint: test-collector:4317
      tls:
        insecure: true
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  service:
    pipelines:
      traces:
        exporters:
          - logging
          - otlp
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp

Option 2: Add a 127.0.0.1 binding

This is the option I went with for now since this is just in my lab. I think (I'm still an OTEL novice) this would let me be a bit more flexible in routing traces.

Here's the snippet from my values file:

config:
  exporters:
    otlp:
      endpoint: test-collector:4317
      tls:
        insecure: true
  receivers:
    otlp/local:
      protocols:
        grpc:
          endpoint: 127.0.0.1:4317
        http:
          endpoint: 127.0.0.1:4318
  service:
    pipelines:
      traces:
        exporters:
          - logging
          - otlp
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - otlp/local
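
For clarity, the pipeline above can reference both otlp and otlp/local because (assuming the chart merges user-supplied config over its default config, and that the default now binds the otlp receiver to the pod IP) the effective receivers section ends up roughly like this sketch:

receivers:
  otlp:              # from the chart's default config
    protocols:
      grpc:
        endpoint: ${MY_POD_IP}:4317
      http:
        endpoint: ${MY_POD_IP}:4318
  otlp/local:        # from the override above
    protocols:
      grpc:
        endpoint: 127.0.0.1:4317
      http:
        endpoint: 127.0.0.1:4318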

@TylerHelmuth
Member Author

@fernferret thanks for sharing with us. I think using Pod IP is the most appropriate default value, but I like the idea of updating our docs to include instructions on how to configure the chart for local development port-forwarding.

@fernferret
Contributor

@TylerHelmuth Awesome, thanks for the reply. If you'd like me to open a PR for the documentation, I'm willing to write something up.

Do you know if there are any practical performance differences between running the two receivers vs just one in a production setup? In my head the answer is "no", since I'd imagine the 127.0.0.1 listener just blocks on the socket and doesn't do much when there is no debug traffic, but I haven't looked at the actual code performing this yet.

I'd imagine that once I get comfortable I wouldn't need/want this in production.

@TylerHelmuth
Member Author

@fernferret feel free to open a PR to improve the docs.

moh-osman3 pushed a commit to moh-osman3/opentelemetry-helm-charts that referenced this pull request Apr 12, 2023
* Update collector to use pod ip

* bump chart

* Update UPGRADING.md

* Add endpoint note in NOTES.txt

* bump chart

* fix bump