-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set gRPC logger to zap #2237
Comments
**Description:** Adds a connection state reporter to the Jaeger exporter sending state changes to the log and as a metric. **Link to tracking Issue:** Closes #1861, as it's the best we can do at this exporter level. On a more general level, the collector could provide more verbose info about the underlying gRPC connections. See #2237 for more information. **Testing:** unit and manual tests. **Documentation:** None. Here are the complete logs for an OpenTelemetry Collector that got started without a Jaeger backend available. Note that the config.yaml contains two exporters, one (`jaeger`) with the `insecure: true` option, and a second one (`jaeger/2`) without this option. A Jaeger all-in-one is then started, without TLS support. This ends with the first exporter eventually transitioning to "READY", but the second won't. ``` 2020-12-02T11:17:26.116+0100 INFO service/service.go:409 Starting OpenTelemetry Collector... {"Version": "latest", "GitHash": "<NOT PROPERLY GENERATED>", "NumCPU": 12} 2020-12-02T11:17:26.116+0100 INFO service/service.go:253 Setting up own telemetry... 2020-12-02T11:17:26.130+0100 INFO service/telemetry.go:101 Serving Prometheus metrics {"address": "localhost:8888", "level": 0, "service.instance.id": "b4f3bc7c-2593-48e5-9ef9-8b585bbcf4fc"} 2020-12-02T11:17:26.131+0100 INFO service/service.go:290 Loading configuration... 2020-12-02T11:17:26.131+0100 INFO service/service.go:301 Applying configuration... 2020-12-02T11:17:26.131+0100 INFO service/service.go:322 Starting extensions... 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:306 Exporter is enabled. {"component_kind": "exporter", "exporter": "jaeger"} 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:306 Exporter is enabled. {"component_kind": "exporter", "exporter": "jaeger/2"} 2020-12-02T11:17:26.132+0100 INFO service/service.go:337 Starting exporters... 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:92 Exporter is starting... {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger"} 2020-12-02T11:17:26.132+0100 INFO jaegerexporter/exporter.go:183 State of the connection with the Jaeger Collector backend {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "CONNECTING"} 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:97 Exporter started. {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger"} 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:92 Exporter is starting... {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2"} 2020-12-02T11:17:26.132+0100 INFO jaegerexporter/exporter.go:183 State of the connection with the Jaeger Collector backend {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2", "state": "CONNECTING"} 2020-12-02T11:17:26.132+0100 INFO builder/exporters_builder.go:97 Exporter started. {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2"} 2020-12-02T11:17:26.132+0100 INFO builder/pipelines_builder.go:207 Pipeline is enabled. {"pipeline_name": "traces", "pipeline_datatype": "traces"} 2020-12-02T11:17:26.132+0100 INFO service/service.go:350 Starting processors... 2020-12-02T11:17:26.132+0100 INFO builder/pipelines_builder.go:51 Pipeline is starting... {"pipeline_name": "traces", "pipeline_datatype": "traces"} 2020-12-02T11:17:26.132+0100 INFO builder/pipelines_builder.go:61 Pipeline is started. {"pipeline_name": "traces", "pipeline_datatype": "traces"} 2020-12-02T11:17:26.132+0100 INFO builder/receivers_builder.go:235 Receiver is enabled. {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "traces"} 2020-12-02T11:17:26.132+0100 INFO service/service.go:362 Starting receivers... 2020-12-02T11:17:26.132+0100 INFO builder/receivers_builder.go:70 Receiver is starting... {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"} 2020-12-02T11:17:26.132+0100 INFO otlpreceiver/otlp.go:93 Starting GRPC server on endpoint 0.0.0.0:4317 {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"} 2020-12-02T11:17:26.132+0100 INFO otlpreceiver/otlp.go:130 Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680 {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"} 2020-12-02T11:17:26.132+0100 INFO otlpreceiver/otlp.go:93 Starting GRPC server on endpoint 0.0.0.0:55680 {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"} 2020-12-02T11:17:26.132+0100 INFO builder/receivers_builder.go:75 Receiver started. {"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"} 2020-12-02T11:17:26.132+0100 INFO service/service.go:265 Everything is ready. Begin running and processing data. 2020-12-02T11:17:27.132+0100 INFO jaegerexporter/exporter.go:183 State of the connection with the Jaeger Collector backend {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2", "state": "TRANSIENT_FAILURE"} 2020-12-02T11:17:28.132+0100 INFO jaegerexporter/exporter.go:183 State of the connection with the Jaeger Collector backend {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "TRANSIENT_FAILURE"} 2020-12-02T11:17:43.132+0100 INFO jaegerexporter/exporter.go:183 State of the connection with the Jaeger Collector backend {"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "READY"} ``` And here are the metrics for the final state: ``` # HELP otelcol_jaegerexporter_conn_state Last connection state: 0 = Idle, 1 = Connecting, 2 = Ready, 3 = TransientFailure, 4 = Shutdown # TYPE otelcol_jaegerexporter_conn_state gauge otelcol_jaegerexporter_conn_state{exporter_name="jaeger",service_instance_id="341f6179-0c34-4ad1-b2e5-19b2bed4c9d1"} 2 otelcol_jaegerexporter_conn_state{exporter_name="jaeger/2",service_instance_id="341f6179-0c34-4ad1-b2e5-19b2bed4c9d1"} 3 ``` Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
@jpkrohling now that #4009 is checked in. I am happy to work on this enhancement. Just as a confirmation do you envision this enhancement in this way? service:
telemetry:
logs:
level: "DEBUG"
development: true
grpc_logs: true
|
I think I would include the gRPC logs as info, and without any entry in the configuration. If people need to customize it for some reason, we might want to wrap it behind a config option. |
Thank you @jpkrohling - Just as a confirmation
service:
telemetry:
development: true
|
For 1, I think it could be made part of the production profile. As someone managing a production-quality instance, I would be interested in knowing when the connection between the collector and the backend Jaeger breaks and restores. For 2, I would vote for a minimal solution for the moment and see what people really need. For 3, not so sure. gRPC is somewhat different, as the client won't immediately fail if the server isn't available especially when the retries are enabled. As such, it's hard to tell what's the problem when things stop working. For HTTP, an error in the connection is usually caught right away and the client can decide to fail (and log) or retry itself. |
@pavankrish123, are you still interested in working on this one? |
* adding NewNoopMeterProvider to follow trace api * Update CHANGELOG.md Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> * fix lint Co-authored-by: Anthony Mirabella <a9@aneurysm9.com> Co-authored-by: Tyler Yahn <MrAlias@users.noreply.github.com>
@jpkrohling yes I am still interested. (Sorry was on a break). Will push out initial commit by this weekend. |
(comment copied from PR for the sake of completeness) @jpkrohling - during my research with grpc logging, i found this settings that can be used to set and control grpc logs. Grpc by defaults sets its logging to error and logger output is set to stderr. It also allows controlling these settings via environmental variables GRPC_GO_LOG_SEVERITY_LEVEL, GRPC_GO_LOG_VERBOSITY_LEVEL and GRPC_GO_LOG_FORMATTER So if we do something like We get effectively the same detail of grpc messages without any code changes. 2021-11-12T20:02:23.318-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "logging"}
2021-11-12T20:02:23.318-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "logging"}
2021-11-12T20:02:23.318-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "otlp"}
2021/11/12 20:02:23 INFO: [core] original dial target is: "0.0.0.0:8889"
2021/11/12 20:02:23 INFO: [core] dial target "0.0.0.0:8889" parse failed: parse "0.0.0.0:8889": first path segment in URL cannot contain colon
2021/11/12 20:02:23 INFO: [core] fallback to scheme "passthrough"
2021/11/12 20:02:23 INFO: [core] parsed dial target is: {Scheme:passthrough Authority: Endpoint:0.0.0.0:8889 URL:{Scheme:passthrough Opaque: User: Host: Path:/0.0.0.0:8889 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}}
2021/11/12 20:02:23 INFO: [core] Channel authority set to "0.0.0.0:8889"
2021/11/12 20:02:23 INFO: [core] ccResolverWrapper: sending update to cc: {[{0.0.0.0:8889 <nil> <nil> 0 <nil>}] <nil> <nil>}
2021/11/12 20:02:23 INFO: [core] ClientConn switching balancer to "pick_first"
2021/11/12 20:02:23 INFO: [core] Channel switches to new LB policy "pick_first"
2021-11-12T20:02:23.318-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "otlp"}
2021-11-12T20:02:23.318-0800 info service/service.go:96 Starting processors...
2021-11-12T20:02:23.318-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"}
2021/11/12 20:02:23 INFO: [core] Subchannel Connectivity change to CONNECTING
2021-11-12T20:02:23.318-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "traces"}
2021/11/12 20:02:23 INFO: [core] Subchannel picks a new address "0.0.0.0:8889" to connect
2021-11-12T20:02:23.318-0800 info service/service.go:101 Starting receivers...
2021/11/12 20:02:23 INFO: [core] Channel Connectivity change to CONNECTING
2021-11-12T20:02:23.318-0800 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"}
2021-11-12T20:02:23.318-0800 info otlpreceiver/otlp.go:68 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"}
2021-11-12T20:02:23.319-0800 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"}
2021-11-12T20:02:23.319-0800 info service/telemetry.go:92 Setting up own telemetry...
2021-11-12T20:02:23.323-0800 info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "f20d3995-e635-4f2a-9f2a-7767d0d38eac", "service.version": "v0.35.0-279-gb8544417"}
2021-11-12T20:02:23.323-0800 info service/collector.go:235 Starting otelcol... {"Version": "v0.35.0-279-gb8544417", "NumCPU": 12}
2021-11-12T20:02:23.323-0800 info service/collector.go:131 Everything is ready. Begin running and processing data.
2021/11/12 20:02:24 WARNING: [core] grpc: addrConn.createTransport failed to connect to {0.0.0.0:8889 0.0.0.0:8889 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp 0.0.0.0:8889: connect: connection refused"
2021/11/12 20:02:24 INFO: [core] Subchannel Connectivity change to TRANSIENT_FAILURE
2021/11/12 20:02:24 INFO: [core] Channel Connectivity change to TRANSIENT_FAILURE
2021/11/12 20:02:25 INFO: [core] Subchannel Connectivity change to IDLE
2021/11/12 20:02:25 INFO: [core] Channel Connectivity change to IDLE Do we still want these messages go through our collector logger? I wonder if we are getting any value by doing so especially because collector logger is also writing (and can only write to) to the same STDERR ? (and we expose basically same logging configuration on collector as well) Let me know how you feel about using env variables to control grpc logging. Happy to discard this PR. |
My personal opinion is that the collector shouldn't require knowledge about gRPC from the people running the collector. If they set the collector logging level to debug, I would expect to see the gRPC messages in the log, hopefully in a consistent format with the rest of the log entries. We may want to take this discussion to the SIG meeting next week for a decision (I'm not able to join today's). |
@jpkrohling any leads on how to proceed - Thank you |
@pavankrish123, try to create a thread on Slack to discuss it. If we can't get a green light from a maintainer (@bogdandrutu?), let's discuss this tomorrow during the SIG meeting. |
I would have all the INFO statements from your previous comment as part of the prod profile. I don't think they are that verbose. If possible at all, one thing I would change is to omit the subchannel logging, having it only for the main channel. |
Adding a feature - This feature sets the grpc logger to one derived from collector logger configuration and enables users to control grpc logs. - In the case of collector default configuration, gRPC logs minimum level is set to WARN (to reduce the log traffic) but shared rest of collector's logger configuration. ```bash $ ./bin/cmd-otelcol --config ./bin/config.yaml 2021-11-30T16:47:38.293-0800 info service/collector.go:201 Applying configuration... 2021-11-30T16:47:38.294-0800 info builder/exporters_builder.go:254 Exporter was built. {"kind": "exporter", "name": "otlp"} 2021-11-30T16:47:38.294-0800 info builder/exporters_builder.go:254 Exporter was built. {"kind": "exporter", "name": "logging"} 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:222 Pipeline was built. {"name": "pipeline", "name": "metrics"} 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:222 Pipeline was built. {"name": "pipeline", "name": "traces"} 2021-11-30T16:47:38.295-0800 info builder/receivers_builder.go:224 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "traces"} 2021-11-30T16:47:38.295-0800 info builder/receivers_builder.go:224 Receiver was built. {"kind": "receiver", "name": "otlp", "datatype": "metrics"} 2021-11-30T16:47:38.295-0800 info service/service.go:86 Starting extensions... 2021-11-30T16:47:38.295-0800 info service/service.go:91 Starting exporters... 2021-11-30T16:47:38.295-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "logging"} 2021-11-30T16:47:38.295-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "logging"} 2021-11-30T16:47:38.295-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "otlp"} 2021-11-30T16:47:38.295-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "otlp"} 2021-11-30T16:47:38.295-0800 info service/service.go:96 Starting processors... 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "metrics"} 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "metrics"} 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"} 2021-11-30T16:47:38.295-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "traces"} 2021-11-30T16:47:38.295-0800 info service/service.go:101 Starting receivers... 2021-11-30T16:47:38.295-0800 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"} 2021-11-30T16:47:38.295-0800 info otlpreceiver/otlp.go:68 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"} 2021-11-30T16:47:38.302-0800 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"} 2021-11-30T16:47:38.302-0800 info service/telemetry.go:92 Setting up own telemetry... 2021-11-30T16:47:38.310-0800 info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "a8caf36a-00df-407e-b356-23569252e525", "service.version": "latest"} 2021-11-30T16:47:38.310-0800 info service/collector.go:250 Starting cmd-otelcol... {"Version": "0.40.0-dev", "NumCPU": 12} 2021-11-30T16:47:38.310-0800 info service/collector.go:138 Everything is ready. Begin running and processing data. 2021-11-30T16:47:38.384-0800 warn zapgrpc/zapgrpc.go:191 [core] grpc: addrConn.createTransport failed to connect to {otelcol:4317 otelcol:4317 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host" 2021-11-30T16:47:42.912-0800 INFO loggingexporter/logging_exporter.go:40 TracesExporter {"#spans": 200} ``` - for all other levels - gRPC logs min log level takes whatever is set for collector log sharing rest of collector's logger configuration. for example ```yaml service: telemetry: logs: level: "DEBUG" encoding: "console" pipelines: traces: receivers: [otlp] processors: [batch] exporters: [logging, otlp] ``` in non development made ```bash $ ./bin/cmd-otelcol --config ./bin/config.yaml 2021-11-30T17:35:09.442-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "otlp"} 2021-11-30T17:35:09.442-0800 info zapgrpc/zapgrpc.go:174 [core] original dial target is: "otelcol:4317" 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] parsed dial target is: {Scheme:otelcol Authority: Endpoint:4317 URL:{Scheme:otelcol Opaque:4317 User: Host: Path: RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] fallback to scheme "passthrough" 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] parsed dial target is: {Scheme:passthrough Authority: Endpoint:otelcol:4317 URL:{Scheme:passthrough Opaque: User: Host: Path:/otelcol:4317 RawPath: ForceQuery:false RawQuery: Fragment: RawFragment:}} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] Channel authority set to "otelcol:4317" 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] ccResolverWrapper: sending update to cc: {[{otelcol:4317 <nil> <nil> 0 <nil>}] <nil> <nil>} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] ClientConn switching balancer to "pick_first" 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] Channel switches to new LB policy "pick_first" 2021-11-30T17:35:09.443-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "otlp"} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to CONNECTING 2021-11-30T17:35:09.443-0800 info builder/exporters_builder.go:40 Exporter is starting... {"kind": "exporter", "name": "logging"} 2021-11-30T17:35:09.443-0800 info builder/exporters_builder.go:48 Exporter started. {"kind": "exporter", "name": "logging"} 2021-11-30T17:35:09.443-0800 info service/service.go:96 Starting processors... 2021-11-30T17:35:09.443-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel picks a new address "otelcol:4317" to connect 2021-11-30T17:35:09.443-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "traces"} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc0000f4440, {CONNECTING <nil>} 2021-11-30T17:35:09.443-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "metrics"} 2021-11-30T17:35:09.443-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to CONNECTING 2021-11-30T17:35:09.443-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "metrics"} 2021-11-30T17:35:09.443-0800 info service/service.go:101 Starting receivers... 2021-11-30T17:35:09.444-0800 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"} 2021-11-30T17:35:09.444-0800 info otlpreceiver/otlp.go:68 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"} 2021-11-30T17:35:09.445-0800 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"} 2021-11-30T17:35:09.445-0800 info service/telemetry.go:92 Setting up own telemetry... 2021-11-30T17:35:09.449-0800 warn zapgrpc/zapgrpc.go:191 [core] grpc: addrConn.createTransport failed to connect to {otelcol:4317 otelcol:4317 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host" 2021-11-30T17:35:09.449-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:35:09.449-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc0000f4440, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:35:09.449-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:35:09.455-0800 info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "a579c69f-edb3-4fca-8911-e6fd9a5f5344", "service.version": "latest"} 2021-11-30T17:35:09.456-0800 info service/collector.go:243 Starting cmd-otelcol... {"Version": "0.40.0-dev", "NumCPU": 12} 2021-11-30T17:35:09.456-0800 info service/collector.go:136 Everything is ready. Begin running and processing data. 2021-11-30T17:35:10.449-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to IDLE 2021-11-30T17:35:10.449-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc0000f4440, {IDLE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:35:10.449-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to IDLE 2021-11-30T17:35:12.072-0800 info zapgrpc/zapgrpc.go:174 [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing" 2021-11-30T17:35:12.255-0800 INFO loggingexporter/logging_exporter.go:40 TracesExporter {"#spans": 2} 2021-11-30T17:35:28.595-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:35:28.595-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc0000f4440, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:35:28.595-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:35:28.595-0800 info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup otelcol: no such host\"", "interval": "13.101300599s"} 2021-11-30T17:35:32.049-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to IDLE 2021-11-30T17:35:32.049-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc0000f4440, {IDLE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:35:32.049-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to IDLE ``` DEBUG in development mode ```yaml service: telemetry: logs: level: "DEBUG" development: true encoding: "console" pipelines: traces: receivers: [otlp] processors: [batch] exporters: [logging, otlp] ``` produces ```bash 2021-11-30T17:30:55.091-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "metrics"} 2021-11-30T17:30:55.091-0800 info builder/pipelines_builder.go:54 Pipeline is starting... {"name": "pipeline", "name": "traces"} 2021-11-30T17:30:55.091-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel picks a new address "otelcol:4317" to connect 2021-11-30T17:30:55.091-0800 info builder/pipelines_builder.go:65 Pipeline is started. {"name": "pipeline", "name": "traces"} 2021-11-30T17:30:55.091-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc00011c8f0, {CONNECTING <nil>} 2021-11-30T17:30:55.091-0800 info service/service.go:101 Starting receivers... 2021-11-30T17:30:55.091-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to CONNECTING 2021-11-30T17:30:55.091-0800 info builder/receivers_builder.go:68 Receiver is starting... {"kind": "receiver", "name": "otlp"} 2021-11-30T17:30:55.091-0800 info otlpreceiver/otlp.go:68 Starting GRPC server on endpoint 0.0.0.0:4317 {"kind": "receiver", "name": "otlp"} 2021-11-30T17:30:55.091-0800 info builder/receivers_builder.go:73 Receiver started. {"kind": "receiver", "name": "otlp"} 2021-11-30T17:30:55.091-0800 info service/telemetry.go:92 Setting up own telemetry... 2021-11-30T17:30:55.101-0800 info service/telemetry.go:116 Serving Prometheus metrics {"address": ":8888", "level": "basic", "service.instance.id": "38ece474-2cdc-42a3-b442-46477936c2b7", "service.version": "latest"} 2021-11-30T17:30:55.101-0800 info service/collector.go:243 Starting cmd-otelcol... {"Version": "0.40.0-dev", "NumCPU": 12} 2021-11-30T17:30:55.101-0800 info service/collector.go:136 Everything is ready. Begin running and processing data. 2021-11-30T17:30:55.236-0800 warn zapgrpc/zapgrpc.go:191 [core] grpc: addrConn.createTransport failed to connect to {otelcol:4317 otelcol:4317 <nil> <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host" go.uber.org/zap/zapgrpc.(*Logger).Warningln go.uber.org/zap@v1.19.1/zapgrpc/zapgrpc.go:191 google.golang.org/grpc/internal/grpclog.WarningDepth google.golang.org/grpc@v1.42.0/internal/grpclog/grpclog.go:46 google.golang.org/grpc/grpclog.(*componentData).WarningDepth google.golang.org/grpc@v1.42.0/grpclog/component.go:41 google.golang.org/grpc/internal/channelz.Warningf google.golang.org/grpc@v1.42.0/internal/channelz/logging.go:75 google.golang.org/grpc.(*addrConn).createTransport google.golang.org/grpc@v1.42.0/clientconn.go:1315 google.golang.org/grpc.(*addrConn).tryAllAddrs google.golang.org/grpc@v1.42.0/clientconn.go:1249 google.golang.org/grpc.(*addrConn).resetTransport google.golang.org/grpc@v1.42.0/clientconn.go:1184 google.golang.org/grpc.(*addrConn).connect google.golang.org/grpc@v1.42.0/clientconn.go:845 2021-11-30T17:30:55.237-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:30:55.237-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc00011c8f0, {TRANSIENT_FAILURE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:30:55.237-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to TRANSIENT_FAILURE 2021-11-30T17:30:56.238-0800 info zapgrpc/zapgrpc.go:174 [core] Subchannel Connectivity change to IDLE 2021-11-30T17:30:56.238-0800 info zapgrpc/zapgrpc.go:174 [core] pickfirstBalancer: UpdateSubConnState: 0xc00011c8f0, {IDLE connection error: desc = "transport: Error while dialing dial tcp: lookup otelcol: no such host"} 2021-11-30T17:30:56.238-0800 info zapgrpc/zapgrpc.go:174 [core] Channel Connectivity change to IDLE ^C2021-11-30T17:30:58.448-0800 info service/collector.go:167 Received signal from OS {"signal": "interrupt"} 2021-11-30T17:30:58.448-0800 info service/collector.go:259 Starting shutdown... 2021-11-30 ``` **Link to tracking Issue:** #2237 **Testing:** Added unit tests and manual testing
I believe this is done, please reopen/comment if not |
Is your feature request related to a problem? Please describe.
As part of #2102, we should provide users with logs about the underlying connections provided and used by receivers and exporters. Given that quite a few of them use gRPC clients and servers, the bootstrap procedure for the collector should have a line similar to this:
This would cause messages like this when connections fail to be established:
Upon success, this would be shown:
And when there's a problem with the TLS handshake (see #1861):
Describe alternatives you've considered
While I think this is useful to be active for all profiles (dev, prod), some users might consider this too verbose. In such case, I'd be fine in replacing it only when the profile is set to
dev
(or log-level=debug).The text was updated successfully, but these errors were encountered: