Component(s)
extension/observer/k8sobserver
What happened?
Description
When deploying the Collector with the k8s_observer extension enabled, it never starts successfully and falls into a constant restart loop because the readiness/liveness probes never succeed.
Even after increasing the timeoutSeconds of the probes, the situation does not improve.
Steps to Reproduce
Using the provided configuration, deploy the collector with Helm: helm install daemonset open-telemetry/opentelemetry-collector --values daemonset.yaml
Check that the Collector Pods fail to reach a healthy state and are constantly restarted.
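The exact daemonset.yaml used here is not reproduced below; a minimal values overlay along these lines is enough to hit the same symptom (an illustrative sketch, assuming the opentelemetry-collector Helm chart's config override and the documented k8s_observer settings):

```yaml
# Illustrative sketch only -- not the original daemonset.yaml from this report.
# Assumes the opentelemetry-collector Helm chart's `config` override and the
# documented k8s_observer extension settings.
mode: daemonset
config:
  extensions:
    k8s_observer:
      auth_type: serviceAccount
      observe_pods: true
  service:
    extensions:
      - k8s_observer   # note: health_check is not listed here
```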
Expected Result
Collector Pods should be running without restarts with this basic configuration.
Actual Result
Constant restarts.
Collector version
0.90.0
Environment information
GKE: v1.27.3-gke.100
Log output
2023-11-29T12:51:34.736Z info service@v0.90.0/telemetry.go:86 Setting up own telemetry...
2023-11-29T12:51:34.736Z info service@v0.90.0/telemetry.go:203 Serving Prometheus metrics {"address": "10.68.0.132:8888", "level": "Basic"}
2023-11-29T12:51:34.737Z info exporter@v0.90.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "traces", "name": "debug"}
2023-11-29T12:51:34.738Z info memorylimiterprocessor@v0.90.0/memorylimiter.go:138 Using percentage memory limiter {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "total_memory_mib": 3928, "limit_percentage": 80, "spike_limit_percentage": 25}
2023-11-29T12:51:34.738Z info memorylimiterprocessor@v0.90.0/memorylimiter.go:102 Memory limiter configured {"kind": "processor", "name": "memory_limiter", "pipeline": "traces", "limit_mib": 3142, "spike_limit_mib": 982, "check_interval": 5}
2023-11-29T12:51:34.738Z info exporter@v0.90.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "logs", "name": "debug"}
2023-11-29T12:51:34.738Z info exporter@v0.90.0/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2023-11-29T12:51:34.739Z info kube/client.go:113 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "traces", "labelSelector": "", "fieldSelector": "spec.nodeName=gke-otel-demo-default-pool-0f2ce7eb-c7qr"}
2023-11-29T12:51:34.739Z warn jaegerreceiver@v0.90.0/factory.go:49 jaeger receiver will deprecate Thrift-gen and replace it with Proto-gen to be compatbible to jaeger 1.42.0 and higher. See https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/18485 for more details. {"kind": "receiver", "name": "jaeger", "data_type": "traces"}
2023-11-29T12:51:34.739Z info kube/client.go:113 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "metrics", "labelSelector": "", "fieldSelector": "spec.nodeName=gke-otel-demo-default-pool-0f2ce7eb-c7qr"}
2023-11-29T12:51:34.740Z info kube/client.go:113 k8s filtering {"kind": "processor", "name": "k8sattributes", "pipeline": "logs", "labelSelector": "", "fieldSelector": "spec.nodeName=gke-otel-demo-default-pool-0f2ce7eb-c7qr"}
2023-11-29T12:51:34.741Z info service@v0.90.0/service.go:148 Starting otelcol-contrib... {"Version": "0.90.0", "NumCPU": 2}
2023-11-29T12:51:34.741Z info extensions/extensions.go:34 Starting extensions...
2023-11-29T12:51:34.741Z info extensions/extensions.go:37 Extension is starting... {"kind": "extension", "name": "k8s_observer"}
2023-11-29T12:51:34.741Z info extensions/extensions.go:45 Extension started. {"kind": "extension", "name": "k8s_observer"}
2023-11-29T12:51:34.742Z info otlpreceiver@v0.90.0/otlp.go:83 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.68.0.132:4317"}
2023-11-29T12:51:34.742Z info otlpreceiver@v0.90.0/otlp.go:101 Starting HTTP server {"kind": "receiver", "name": "otlp", "data_type": "traces", "endpoint": "10.68.0.132:4318"}
2023-11-29T12:51:34.742Z info service@v0.90.0/service.go:174 Everything is ready. Begin running and processing data.
2023-11-29T12:52:04.138Z info otelcol@v0.90.0/collector.go:258 Received signal from OS {"signal": "terminated"}
2023-11-29T12:52:04.139Z info service@v0.90.0/service.go:188 Starting shutdown...
2023-11-29T12:52:04.140Z info extensions/extensions.go:52 Stopping extensions...
2023-11-29T12:52:04.140Z info service@v0.90.0/service.go:202 Shutdown complete.
Additional context
It seems that the collector fails to report ready/healthy. Here is what the describe output shows:
Warning Unhealthy 4m24s (x11 over 5m23s) kubelet Readiness probe failed: Get "http://10.68.0.132:13133/": dial tcp 10.68.0.132:13133: connect: connection refused
Warning Unhealthy 4m24s (x6 over 5m14s) kubelet Liveness probe failed: Get "http://10.68.0.132:13133/": dial tcp 10.68.0.132:13133: connect: connection refused
Normal Killing 4m24s (x2 over 4m54s) kubelet Container opentelemetry-collector failed liveness probe, will be restarted
The issue is that the probe relies on the health_check extension, which is now omitted from the service extensions list. It should be included along with any other desired extensions:
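For example, a minimal sketch of the relevant part of the collector configuration (other sections omitted):

```yaml
# Sketch: health_check must be both declared under `extensions` and listed in
# service.extensions, otherwise the endpoint on port 13133 that the readiness
# and liveness probes target never starts.
extensions:
  health_check: {}
  k8s_observer:
    auth_type: serviceAccount
    observe_pods: true
service:
  extensions:
    - health_check
    - k8s_observer
```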
I guess this is something the helm chart should check at startup and explicitly inform the user about. I will create an issue in the helm-charts repo, since this is not specifically related to the collector's implementation.
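At the Helm values level, the corrected overlay would look roughly like this (a sketch, assuming the chart's config override replaces the default service.extensions list rather than appending to it, which is why health_check has to be repeated explicitly):

```yaml
# Sketch of a corrected values overlay -- not the original daemonset.yaml.
mode: daemonset
config:
  extensions:
    k8s_observer:
      auth_type: serviceAccount
      observe_pods: true
  service:
    extensions:
      - health_check   # keep the chart's default health endpoint for the probes
      - k8s_observer
```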