Sidecar proxy with TLSV1_ALERT_PROTOCOL_VERSION error #17362

Open · CarbonCollins opened this issue Apr 25, 2023 · 11 comments
Labels: type/bug (Feature does not function as expected)

CarbonCollins commented Apr 25, 2023

Nomad version

Output from nomad version

Nomad v1.5.3
BuildDate 2023-04-04T20:09:50Z
Revision 434f7a1745c6304d607562daa9a4a635def7153f

Output from consul version

Consul v1.15.2
Revision 5e08e229
Build Date 2023-03-30T17:51:19Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Operating system and Environment details

Ubuntu 22.04.2

Issue

When attempting to use Consul Connect on a job, the sidecar never reaches a healthy state and prints the following warning every few seconds:

[2023-04-25 16:59:09.648][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 77s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION

Reproduction steps

With Consul Connect enabled and configured, deploy a job that uses an Envoy sidecar for Consul Connect and start the job. After an initialization period the sidecar starts printing the warning message every few seconds. Both Consul and Nomad have ACLs enabled.
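
For context, the job uses the standard group-level Connect sidecar pattern. A simplified sketch is below; the group/service names and port are taken from the job linked later in this issue, while the datacenter and Docker image are placeholders:

job "loki" {
  datacenters = ["soc"]

  group "loki" {
    network {
      mode = "bridge"
    }

    service {
      name = "loki"
      port = "3100"

      connect {
        # Nomad injects the Envoy sidecar task for this service
        sidecar_service {}
      }
    }

    task "loki" {
      driver = "docker"

      config {
        # placeholder image for illustration
        image = "grafana/loki:2.8.0"
      }
    }
  }
}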

Consul TLS, address, and port configs (these are templated with Ansible):

addresses {
  dns = "127.0.0.1 {{ cluster_internal_ip }}"
  http = "127.0.0.1 {{ cluster_internal_ip }}"
  https = "127.0.0.1 {{ cluster_internal_ip }}"
  grpc = "127.0.0.1 {{ cluster_internal_ip }}"
  grpc_tls = "127.0.0.1 {{ cluster_internal_ip }}"
}

ports {
  dns = {{ consul_dns_port | int }}
  http = {{ consul_http_port | int }}
  https = {{ consul_https_port | int }}
  grpc = {{ consul_grpc_port | int }}
  grpc_tls = {{ consul_grpc_tls_port | int }}
}

tls {
  defaults {
    tls_min_version = "TLSv1_3"

    ca_file = "{{ consul_tls_dir }}/certs/{{ cluster_base_domain }}-agent-ca.pem"
  }

  internal_rpc {
    verify_server_hostname = true
    verify_incoming        = true
    verify_outgoing        = true
  }

  https {
    tls_min_version = "TLSv1_3"
    ca_file         = "{{ consul_tls_dir }}/certs/ca.pem"
    cert_file       = "{{ consul_tls_dir }}/certs/local.pem"
    key_file        = "{{ consul_tls_dir }}/private/local-key.pem"
  }

  grpc {
    use_auto_cert   = true
    verify_incoming = true
  }
}

The Consul server additionally has:

tls {
  defaults {
    cert_file = "{{ consul_tls_dir }}/certs/{{ cluster_datacenter }}-server-{{ cluster_base_domain }}.pem"
    key_file = "{{ consul_tls_dir }}/private/{{ cluster_datacenter }}-server-{{ cluster_base_domain }}-key.pem"
  }
}

auto_encrypt {
  allow_tls = true
}

connect {
  enabled = true
}

Nomad Consul config:

consul {
  allow_unauthenticated = false

  address = "127.0.0.1:{{ consul_https_port }}"
  grpc_address = "127.0.0.1:{{ consul_grpc_tls_port }}"

  ca_file = "{{ nomad_config_dir }}/.consul/ca.pem"
  grpc_ca_file = "{{ nomad_config_dir }}/.consul/agent-ca.pem"

  ssl = true
  verify_ssl = true
  share_ssl = true

  auto_advertise = true
  checks_use_advertise = true

  server_service_name = "nomad"
  client_service_name = "nomad-client"

  server_auto_join = false
  client_auto_join = false
}

Expected Result

The sidecar registers its service with Consul and becomes healthy, ready to serve traffic.

Actual Result

The sidecar registers its service with Consul but never reaches a healthy state. Instead it prints warnings like the following every few seconds:

[2023-04-25 16:59:09.648][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 77s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION

Job file (if appropriate)

I'm currently attempting to run this job file:

https://gitlab.com/carboncollins-cloud/monitoring/log-management/-/blob/main/job.template.nomad.hcl

Sidecar Logs in full:

[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:406] statically linked extensions:
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.http.stateful_header_formatters: envoy.http.stateful_header_formatters.preserve_case, preserve_case
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.upstream_options: envoy.extensions.upstreams.http.v3.HttpProtocolOptions, envoy.extensions.upstreams.tcp.v3.TcpProtocolOptions, envoy.upstreams.http.http_protocol_options, envoy.upstreams.tcp.tcp_protocol_options
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   quic.http_server_connection: quic.http_server_connection.default
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.dubbo_proxy.serializers: dubbo.hessian2
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.tls.cert_validator: envoy.tls.cert_validator.default, envoy.tls.cert_validator.spiffe
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.route.early_data_policy: envoy.route.early_data_policy.default
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.dubbo_proxy.filters: envoy.filters.dubbo.router
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.access_loggers: envoy.access_loggers.file, envoy.access_loggers.http_grpc, envoy.access_loggers.open_telemetry, envoy.access_loggers.stderr, envoy.access_loggers.stdout, envoy.access_loggers.tcp_grpc, envoy.access_loggers.wasm, envoy.file_access_log, envoy.http_grpc_access_log, envoy.open_telemetry_access_log, envoy.stderr_access_log, envoy.stdout_access_log, envoy.tcp_grpc_access_log, envoy.wasm_access_log
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.formatter: envoy.formatter.metadata, envoy.formatter.req_without_query
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.http.original_ip_detection: envoy.http.original_ip_detection.custom_header, envoy.http.original_ip_detection.xff
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.grpc_credentials: envoy.grpc_credentials.aws_iam, envoy.grpc_credentials.default, envoy.grpc_credentials.file_based_metadata
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.http.stateful_session: envoy.http.stateful_session.cookie, envoy.http.stateful_session.header
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.filters.listener: envoy.filters.listener.http_inspector, envoy.filters.listener.original_dst, envoy.filters.listener.original_src, envoy.filters.listener.proxy_protocol, envoy.filters.listener.tls_inspector, envoy.listener.http_inspector, envoy.listener.original_dst, envoy.listener.original_src, envoy.listener.proxy_protocol, envoy.listener.tls_inspector
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   network.connection.client: default, envoy_internal
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.clusters: envoy.cluster.eds, envoy.cluster.logical_dns, envoy.cluster.original_dst, envoy.cluster.static, envoy.cluster.strict_dns, envoy.clusters.aggregate, envoy.clusters.dynamic_forward_proxy, envoy.clusters.redis
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.rbac.matchers: envoy.rbac.matchers.upstream_ip_port
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.quic.proof_source: envoy.quic.proof_source.filter_chain
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.stats_sinks: envoy.dog_statsd, envoy.graphite_statsd, envoy.metrics_service, envoy.stat_sinks.dog_statsd, envoy.stat_sinks.graphite_statsd, envoy.stat_sinks.hystrix, envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd, envoy.stat_sinks.wasm, envoy.statsd
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.http.cache: envoy.extensions.http.cache.file_system_http_cache, envoy.extensions.http.cache.simple
[2023-04-25 16:57:51.484][1][info][main] [source/server/server.cc:408]   envoy.network.dns_resolver: envoy.network.dns_resolver.cares, envoy.network.dns_resolver.getaddrinfo
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.quic.connection_id_generator: envoy.quic.deterministic_connection_id_generator
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.quic.server.crypto_stream: envoy.quic.crypto_stream.server.quiche
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.tracers: envoy.dynamic.ot, envoy.tracers.datadog, envoy.tracers.dynamic_ot, envoy.tracers.opencensus, envoy.tracers.opentelemetry, envoy.tracers.skywalking, envoy.tracers.xray, envoy.tracers.zipkin, envoy.zipkin
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.matching.http.input: envoy.matching.inputs.destination_ip, envoy.matching.inputs.destination_port, envoy.matching.inputs.direct_source_ip, envoy.matching.inputs.dns_san, envoy.matching.inputs.request_headers, envoy.matching.inputs.request_trailers, envoy.matching.inputs.response_headers, envoy.matching.inputs.response_trailers, envoy.matching.inputs.server_name, envoy.matching.inputs.source_ip, envoy.matching.inputs.source_port, envoy.matching.inputs.source_type, envoy.matching.inputs.status_code_class_input, envoy.matching.inputs.status_code_input, envoy.matching.inputs.subject, envoy.matching.inputs.uri_san
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.transport_sockets.downstream: envoy.transport_sockets.alts, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, raw_buffer, starttls, tls
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.matching.http.custom_matchers: envoy.matching.custom_matchers.trie_matcher
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.dubbo_proxy.protocols: dubbo
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.health_checkers: envoy.health_checkers.redis, envoy.health_checkers.thrift
[2023-04-25 16:57:51.485][1][info][main] [source/server/server.cc:408]   envoy.filters.http: envoy.bandwidth_limit, envoy.buffer, envoy.cors, envoy.csrf, envoy.ext_authz, envoy.ext_proc, envoy.fault, envoy.filters.http.adaptive_concurrency, envoy.filters.http.admission_control, envoy.filters.http.alternate_protocols_cache, envoy.filters.http.aws_lambda, envoy.filters.http.aws_request_signing, envoy.filters.http.bandwidth_limit, envoy.filters.http.buffer, envoy.filters.http.cache, envoy.filters.http.cdn_loop, envoy.filters.http.composite, envoy.filters.http.compressor, envoy.filters.http.cors, envoy.filters.http.csrf, envoy.filters.http.custom_response, envoy.filters.http.decompressor, envoy.filters.http.dynamic_forward_proxy, envoy.filters.http.ext_authz, envoy.filters.http.ext_proc, envoy.filters.http.fault, envoy.filters.http.file_system_buffer, envoy.filters.http.gcp_authn, envoy.filters.http.grpc_http1_bridge, envoy.filters.http.grpc_http1_reverse_bridge, envoy.filters.http.grpc_json_transcoder, envoy.filters.http.grpc_stats, envoy.filters.http.grpc_web, envoy.filters.http.header_to_metadata, envoy.filters.http.health_check, envoy.filters.http.ip_tagging, envoy.filters.http.jwt_authn, envoy.filters.http.local_ratelimit, envoy.filters.http.lua, envoy.filters.http.match_delegate, envoy.filters.http.oauth2, envoy.filters.http.on_demand, envoy.filters.http.original_src, envoy.filters.http.rate_limit_quota, envoy.filters.http.ratelimit, envoy.filters.http.rbac, envoy.filters.http.router, envoy.filters.http.set_metadata, envoy.filters.http.stateful_session, envoy.filters.http.tap, envoy.filters.http.wasm, envoy.grpc_http1_bridge, envoy.grpc_json_transcoder, envoy.grpc_web, envoy.health_check, envoy.ip_tagging, envoy.local_rate_limit, envoy.lua, envoy.rate_limit, envoy.router
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.wasm.runtime: envoy.wasm.runtime.null, envoy.wasm.runtime.v8
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.common.key_value: envoy.key_value.file_based
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.resolvers: envoy.ip
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.retry_host_predicates: envoy.retry_host_predicates.omit_canary_hosts, envoy.retry_host_predicates.omit_host_metadata, envoy.retry_host_predicates.previous_hosts
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.internal_redirect_predicates: envoy.internal_redirect_predicates.allow_listed_routes, envoy.internal_redirect_predicates.previous_routes, envoy.internal_redirect_predicates.safe_cross_scheme
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.bootstrap: envoy.bootstrap.internal_listener, envoy.bootstrap.wasm, envoy.extensions.network.socket_interface.default_socket_interface
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.matching.network.input: envoy.matching.inputs.application_protocol, envoy.matching.inputs.destination_ip, envoy.matching.inputs.destination_port, envoy.matching.inputs.direct_source_ip, envoy.matching.inputs.dns_san, envoy.matching.inputs.server_name, envoy.matching.inputs.source_ip, envoy.matching.inputs.source_port, envoy.matching.inputs.source_type, envoy.matching.inputs.subject, envoy.matching.inputs.transport_protocol, envoy.matching.inputs.uri_san
[2023-04-25 16:57:51.486][1][info][main] [source/server/server.cc:408]   envoy.udp_packet_writer: envoy.udp_packet_writer.default, envoy.udp_packet_writer.gso
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.thrift_proxy.filters: envoy.filters.thrift.header_to_metadata, envoy.filters.thrift.payload_to_metadata, envoy.filters.thrift.rate_limit, envoy.filters.thrift.router
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.upstreams: envoy.filters.connection_pools.tcp.generic
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.regex_engines: envoy.regex_engines.google_re2
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.guarddog_actions: envoy.watchdog.abort_action, envoy.watchdog.profile_action
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.listener_manager_impl: envoy.listener_manager_impl.default
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.connection_handler: envoy.connection_handler.default
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.path.match: envoy.path.match.uri_template.uri_template_matcher
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.path.rewrite: envoy.path.rewrite.uri_template.uri_template_rewriter
[2023-04-25 16:57:51.487][1][info][main] [source/server/server.cc:408]   envoy.load_balancing_policies: envoy.load_balancing_policies.least_request, envoy.load_balancing_policies.random, envoy.load_balancing_policies.round_robin
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.filters.http.upstream: envoy.buffer, envoy.filters.http.admission_control, envoy.filters.http.buffer, envoy.filters.http.upstream_codec
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.access_loggers.extension_filters: envoy.access_loggers.extension_filters.cel
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.resource_monitors: envoy.resource_monitors.fixed_heap, envoy.resource_monitors.injected_resource
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.retry_priorities: envoy.retry_priorities.previous_priorities
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.http.header_validators: envoy.http.header_validators.envoy_default
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.filters.network: envoy.echo, envoy.ext_authz, envoy.filters.network.connection_limit, envoy.filters.network.direct_response, envoy.filters.network.dubbo_proxy, envoy.filters.network.echo, envoy.filters.network.ext_authz, envoy.filters.network.http_connection_manager, envoy.filters.network.local_ratelimit, envoy.filters.network.mongo_proxy, envoy.filters.network.ratelimit, envoy.filters.network.rbac, envoy.filters.network.redis_proxy, envoy.filters.network.sni_cluster, envoy.filters.network.sni_dynamic_forward_proxy, envoy.filters.network.tcp_proxy, envoy.filters.network.thrift_proxy, envoy.filters.network.wasm, envoy.filters.network.zookeeper_proxy, envoy.http_connection_manager, envoy.mongo_proxy, envoy.ratelimit, envoy.redis_proxy, envoy.tcp_proxy
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.matching.network.custom_matchers: envoy.matching.custom_matchers.trie_matcher
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.compression.compressor: envoy.compression.brotli.compressor, envoy.compression.gzip.compressor, envoy.compression.zstd.compressor
[2023-04-25 16:57:51.488][1][info][main] [source/server/server.cc:408]   envoy.thrift_proxy.transports: auto, framed, header, unframed
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.matching.common_inputs: envoy.matching.common_inputs.environment_variable
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.matching.action: envoy.matching.actions.format_string, filter-chain-name
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.http.custom_response: envoy.extensions.http.custom_response.local_response_policy, envoy.extensions.http.custom_response.redirect_policy
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.compression.decompressor: envoy.compression.brotli.decompressor, envoy.compression.gzip.decompressor, envoy.compression.zstd.decompressor
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.http.early_header_mutation: envoy.http.early_header_mutation.header_mutation
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.matching.input_matchers: envoy.matching.matchers.consistent_hashing, envoy.matching.matchers.ip
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.transport_sockets.upstream: envoy.transport_sockets.alts, envoy.transport_sockets.http_11_proxy, envoy.transport_sockets.internal_upstream, envoy.transport_sockets.quic, envoy.transport_sockets.raw_buffer, envoy.transport_sockets.starttls, envoy.transport_sockets.tap, envoy.transport_sockets.tcp_stats, envoy.transport_sockets.tls, envoy.transport_sockets.upstream_proxy_protocol, raw_buffer, starttls, tls
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.filters.udp_listener: envoy.filters.udp.dns_filter, envoy.filters.udp_listener.udp_proxy
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.rate_limit_descriptors: envoy.rate_limit_descriptors.expr
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.request_id: envoy.request_id.uuid
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.config.validators: envoy.config.validators.minimum_clusters, envoy.config.validators.minimum_clusters_validator
[2023-04-25 16:57:51.489][1][info][main] [source/server/server.cc:408]   envoy.thrift_proxy.protocols: auto, binary, binary/non-strict, compact, twitter
[2023-04-25 16:57:51.499][1][warning][misc] [source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.cluster.v3.Cluster Using deprecated option 'envoy.config.cluster.v3.Cluster.http2_protocol_options' from file cluster.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/version_history/version_history for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.
[2023-04-25 16:57:51.499][1][warning][misc] [source/common/protobuf/message_validator_impl.cc:21] Deprecated field: type envoy.config.bootstrap.v3.Admin Using deprecated option 'envoy.config.bootstrap.v3.Admin.access_log_path' from file bootstrap.proto. This configuration will be removed from Envoy soon. Please see https://www.envoyproxy.io/docs/envoy/latest/version_history/version_history for details. If continued use of this field is absolutely necessary, see https://www.envoyproxy.io/docs/envoy/latest/configuration/operations/runtime#using-runtime-overrides-for-deprecated-features for how to apply a temporary and highly discouraged override.
[2023-04-25 16:57:51.499][1][info][main] [source/server/server.cc:456] HTTP header map info:
[2023-04-25 16:57:51.506][1][info][main] [source/server/server.cc:459]   request header map: 672 bytes: :authority,:method,:path,:protocol,:scheme,accept,accept-encoding,access-control-request-headers,access-control-request-method,access-control-request-private-network,authentication,authorization,cache-control,cdn-loop,connection,content-encoding,content-length,content-type,expect,grpc-accept-encoding,grpc-timeout,if-match,if-modified-since,if-none-match,if-range,if-unmodified-since,keep-alive,origin,pragma,proxy-connection,proxy-status,referer,te,transfer-encoding,upgrade,user-agent,via,x-client-trace-id,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-downstream-service-cluster,x-envoy-downstream-service-node,x-envoy-expected-rq-timeout-ms,x-envoy-external-address,x-envoy-force-trace,x-envoy-hedge-on-per-try-timeout,x-envoy-internal,x-envoy-ip-tags,x-envoy-is-timeout-retry,x-envoy-max-retries,x-envoy-original-path,x-envoy-original-url,x-envoy-retriable-header-names,x-envoy-retriable-status-codes,x-envoy-retry-grpc-on,x-envoy-retry-on,x-envoy-upstream-alt-stat-name,x-envoy-upstream-rq-per-try-timeout-ms,x-envoy-upstream-rq-timeout-alt-response,x-envoy-upstream-rq-timeout-ms,x-envoy-upstream-stream-duration-ms,x-forwarded-client-cert,x-forwarded-for,x-forwarded-host,x-forwarded-port,x-forwarded-proto,x-ot-span-context,x-request-id
[2023-04-25 16:57:51.506][1][info][main] [source/server/server.cc:459]   request trailer map: 120 bytes: 
[2023-04-25 16:57:51.506][1][info][main] [source/server/server.cc:459]   response header map: 432 bytes: :status,access-control-allow-credentials,access-control-allow-headers,access-control-allow-methods,access-control-allow-origin,access-control-allow-private-network,access-control-expose-headers,access-control-max-age,age,cache-control,connection,content-encoding,content-length,content-type,date,etag,expires,grpc-message,grpc-status,keep-alive,last-modified,location,proxy-connection,proxy-status,server,transfer-encoding,upgrade,vary,via,x-envoy-attempt-count,x-envoy-decorator-operation,x-envoy-degraded,x-envoy-immediate-health-check-fail,x-envoy-ratelimited,x-envoy-upstream-canary,x-envoy-upstream-healthchecked-cluster,x-envoy-upstream-service-time,x-request-id
[2023-04-25 16:57:51.506][1][info][main] [source/server/server.cc:459]   response trailer map: 144 bytes: grpc-message,grpc-status
[2023-04-25 16:57:51.540][1][info][main] [source/server/server.cc:819] runtime: layers:
  - name: base
    static_layer:
      re2.max_program_size.error_level: 1048576
[2023-04-25 16:57:51.543][1][info][admin] [source/server/admin/admin.cc:67] admin address: 127.0.0.2:19001
[2023-04-25 16:57:51.545][1][info][config] [source/server/configuration_impl.cc:131] loading tracing configuration
[2023-04-25 16:57:51.545][1][info][config] [source/server/configuration_impl.cc:91] loading 0 static secret(s)
[2023-04-25 16:57:51.545][1][info][config] [source/server/configuration_impl.cc:97] loading 1 cluster(s)
[2023-04-25 16:57:51.680][1][info][config] [source/server/configuration_impl.cc:101] loading 0 listener(s)
[2023-04-25 16:57:51.680][1][info][config] [source/server/configuration_impl.cc:113] loading stats configuration
[2023-04-25 16:57:51.681][1][info][runtime] [source/common/runtime/runtime_impl.cc:463] RTDS has finished initialization
[2023-04-25 16:57:51.681][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:222] cm init: initializing cds
[2023-04-25 16:57:51.682][1][warning][main] [source/server/server.cc:794] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2023-04-25 16:57:51.684][1][info][main] [source/server/server.cc:915] starting main dispatch loop
[2023-04-25 16:58:06.683][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.cluster.v3.Cluster
[2023-04-25 16:58:06.683][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:226] cm init: all clusters initialized
[2023-04-25 16:58:06.683][1][info][main] [source/server/server.cc:896] all clusters initialized. initializing init manager
[2023-04-25 16:58:21.688][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.listener.v3.Listener
[2023-04-25 16:58:21.688][1][info][config] [source/extensions/listener_managers/listener_manager/listener_manager_impl.cc:852] all dependencies initialized. starting workers
[2023-04-25 16:58:39.965][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 48s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 16:58:44.253][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 52s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 16:59:00.740][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 69s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 16:59:09.648][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 77s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 16:59:21.633][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 89s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 16:59:33.627][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 101s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:00:02.644][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 130s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:00:16.816][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 145s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:00:17.003][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 145s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:00:24.142][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 152s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:00:48.382][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 176s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:01:16.919][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 205s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:01:39.429][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 227s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:02:00.094][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 248s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:02:07.098][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 255s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:02:09.933][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 258s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:02:38.744][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 287s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION
[2023-04-25 17:02:43.870][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 292s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436526:SSL routines:OPENSSL_internal:TLSV1_ALERT_PROTOCOL_VERSION

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

I noticed that the client seems to be refusing the RPC connection, but the port looks correct...

Apr 25 18:40:33 soc-se-07 nomad[1273]:     2023-04-25T18:40:33.849+0200 [INFO]  agent: (runner) creating new runner (dry: false, once: false)
Apr 25 18:40:33 soc-se-07 nomad[1273]:     2023-04-25T18:40:33.851+0200 [INFO]  agent: (runner) creating watcher
Apr 25 18:40:33 soc-se-07 nomad[1273]:     2023-04-25T18:40:33.852+0200 [INFO]  agent: (runner) starting
Apr 25 18:40:33 soc-se-07 nomad[1273]:     2023-04-25T18:40:33.863+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/691d305c-6906-c5fe-5608-15591f9968b6/loki/loc>
Apr 25 18:40:34 soc-se-07 nomad[1273]:     2023-04-25T18:40:34.224+0200 [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=9e54b97fcdb74055a3ffe5983f0e98b2384a43546a0ac4fe7760d56b2dda72d5
Apr 25 18:40:34 soc-se-07 nomad[1273]:     2023-04-25T18:40:34.560+0200 [INFO]  client.driver_mgr.docker: started container: driver=docker container_id=9e54b97fcdb74055a3ffe5983f0e98b2384a43546a0ac4fe7760d56b2dda72d5
Apr 25 18:40:34 soc-se-07 nomad[1273]:     2023-04-25T18:40:34.671+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=691d305c-6906-c5fe-5608-15591f9968b6 task=loki type=Started msg="Task started by client" failed=false
Apr 25 18:54:35 soc-se-07 nomad[1273]:     2023-04-25T18:54:35.909+0200 [ERROR] client.alloc_runner.runner_hook: error connecting to grpc: alloc_id=691d305c-6906-c5fe-5608-15591f9968b6 error="dial tcp 127.0.0.1:8503: connect: connection refused" dest=127.0.0.1:8503
Apr 25 18:57:49 soc-se-07 nomad[1273]:     2023-04-25T18:57:49.978+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=691d305c-6906-c5fe-5608-15591f9968b6 task=connect-proxy-loki type="Restart Signaled" msg="User requested running tasks to restart" 
Apr 25 18:57:49 soc-se-07 nomad[1273]:     2023-04-25T18:57:49.978+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=691d305c-6906-c5fe-5608-15591f9968b6 task=loki type="Restart Signaled" msg="User requested running tasks to restart" failed=false
Apr 25 18:57:50 soc-se-07 nomad[1273]:     2023-04-25T18:57:50.513+0200 [INFO]  client.driver_mgr.docker: stopped container: container_id=9fc25f6992873cb32cfe7dc75a4173bea01082307830048
CarbonCollins added the type/bug (Feature does not function as expected) label on Apr 25, 2023

CarbonCollins commented Apr 26, 2023

I realise I missed the port configs: http and grpc are disabled in Consul, with the following port numbers:

consul_http_port: -1
consul_https_port: 8501
consul_dns_port: 8600
consul_grpc_port: -1
consul_grpc_tls_port: 8503
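
(For reference, a quick way to check which TLS versions the grpc_tls listener on 8503 accepts is an openssl handshake test from the node; the host and port below come from the configs above, and with tls_min_version = "TLSv1_3" the -tls1_2 attempt would be expected to fail with a protocol version alert, matching the Envoy error:)

# run on the Nomad client node; adjust host/port if the agent listens elsewhere
openssl s_client -connect 127.0.0.1:8503 -tls1_3 </dev/null
openssl s_client -connect 127.0.0.1:8503 -tls1_2 </dev/null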

CarbonCollins (Author) commented:

I've spent a little more time this evening looking into the issue.

From a few checks and tests: if I drop the minimum supported TLS version down to TLSv1_0 in the Consul configuration and then restart the sidecar proxy, I get a new error saying that it cannot verify the CA:

[2023-04-26 18:16:17.542][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 34s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

Going into the sidecar using exec and checking the contents of /secrets/.envoy_bootstrap.cmd shows the configuration being passed to bootstrap the Envoy proxy:

connect envoy -grpc-addr unix://alloc/tmp/consul_grpc.sock -http-addr 127.0.0.1:8501 -admin-bind 127.0.0.2:19001 -address 127.0.0.1:19101 -proxy-id _nomad-task-691d305c-6906-c5fe-5608-15591f9968b6-group-loki-loki-3100-sidecar-proxy -bootstrap -token <REDACTED> -grpc-ca-file /etc/opt/nomad/.consul/agent-ca.pem -ca-file /etc/opt/nomad/.consul/ca.pem

Checking the paths defined for -grpc-ca-file and -ca-file within the container shows that no file is present, which might explain why it is not able to verify the certificate authority.

Should these CA files be mounted into the sidecar containers when running the Nomad Consul integration with share_ssl = true, or is there a manual step I am missing? (The CA files are present on the node at that path.)


shoenig commented Apr 26, 2023

Hi @CarbonCollins, did the CA and certs come from the consul tls command or were they generated by other means?

I'm not sure the CA path is relevant here; the envoy bootstrap command turns the CA into a string directly embedded in the envoy bootstrap config file.


CarbonCollins commented Apr 27, 2023

The CA and cert for the gRPC endpoint should have come from the consul tls command; the CA and certs for HTTPS were created separately (self-signed).

CarbonCollins (Author) commented:

Just to follow up on the previous message: I regenerated my Consul agent CA and server certs and ended up with the same error.

For generating the certs I used:
CA:

consul tls ca create -domain cloud.carboncollins.se -common-name="CarbonCollins-Cloud Consul Agent CA"

Server:

consul tls cert create -server -ca ./cloud.carboncollins.se-agent-ca.pem -key ./cloud.carboncollins.se-agent-ca-key.pem -dc soc -domain cloud.carboncollins.se -node soc-se-00

So I can confirm that the CA and certs for the gRPC endpoint are from the consul tls command.

CarbonCollins (Author) commented:

I've been looking into this further to see if I missed anything in the logs, checking the Consul server and client logs this time, and found:

Consul server logs:
May 03 20:17:09 soc-se-02 consul[34967]: 2023-05-03T20:17:09.930+0200 [INFO]  agent.server: New leader elected: payload=soc-se-01
May 03 20:17:10 soc-se-02 consul[34967]: 2023-05-03T20:17:10.137+0200 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-leaf error="rpc error making call: ACL not found" index=0
May 03 20:17:10 soc-se-02 consul[34967]: 2023-05-03T20:17:10.137+0200 [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: rpc error making call: ACL not found"
May 03 20:17:13 soc-se-02 consul[34967]: 2023-05-03T20:17:13.771+0200 [INFO]  agent: Synced node info
Consul client logs on the node where the sidecar is running:
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.857+0200 [WARN]  agent.client: Retrying RPC to server: method=ConnectCA.Roots server=10.0.10.12:8300 error="rpc error making call: EOF"
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.930+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.931+0200 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused" index=9
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.931+0200 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused" index=9
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.944+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.11:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.945+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.12:8300 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:00 soc-se-07 consul[187397]: 2023-05-03T20:17:00.948+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.020+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.021+0200 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=connect-ca-root error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused" index=9
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.024+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.11:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.025+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.12:8300 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.028+0200 [ERROR] agent.client: RPC failed to server: method=ConnectCA.Roots server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.149+0200 [ERROR] agent.client: RPC failed to server: method=Intention.Match server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.150+0200 [WARN]  agent.cache: handling error in Cache.Notify: cache-type=intention-match error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused" index=10
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.153+0200 [ERROR] agent.client: RPC failed to server: method=Intention.Match server=10.0.10.11:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.155+0200 [ERROR] agent.client: RPC failed to server: method=Intention.Match server=10.0.10.12:8300 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.158+0200 [ERROR] agent.client: RPC failed to server: method=Intention.Match server=10.0.10.10:8300 error="rpc error making call: rpc error getting client: failed to get conn: dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:01 soc-se-07 consul[187397]: 2023-05-03T20:17:01.775+0200 [WARN]  agent: [core][Channel hashicorp/nomad#1 SubChannel hashicorp/nomad#1683] grpc: addrConn.createTransport failed to connect to {
May 03 20:17:01 soc-se-07 consul[187397]:   "Addr": "soc-10.0.10.12:8300",
May 03 20:17:01 soc-se-07 consul[187397]:   "ServerName": "soc-se-02",
May 03 20:17:01 soc-se-07 consul[187397]:   "Attributes": null,
May 03 20:17:01 soc-se-07 consul[187397]:   "BalancerAttributes": null,
May 03 20:17:01 soc-se-07 consul[187397]:   "Type": 0,
May 03 20:17:01 soc-se-07 consul[187397]:   "Metadata": null
May 03 20:17:01 soc-se-07 consul[187397]: }. Err: connection error: desc = "transport: Error while dialing dial tcp <nil>->10.0.10.12:8300: connect: connection refused"
May 03 20:17:02 soc-se-07 consul[187397]: 2023-05-03T20:17:02.779+0200 [INFO]  agent.client.serf.lan: serf: EventMemberUpdate: soc-se-02
May 03 20:17:02 soc-se-07 consul[187397]: 2023-05-03T20:17:02.780+0200 [INFO]  agent.client: updating server: server="soc-se-02 (Addr: tcp/10.0.10.12:8300) (DC: soc)"

From this info, does it seem like the problem stems more from Consul than from Nomad?


CarbonCollins commented May 6, 2023

Just to update this issue with where I am currently.

I have upgraded from Nomad 1.5.3 to 1.5.4, and yesterday to 1.5.5, and on both of these newer versions I am still getting the same issue.

I have also updated how Consul gets its Connect CA certs: it is now using the Vault provider with the following config.

Consul server connect block
connect {
  ca_provider = "vault"

  ca_config {
    address = "https://127.0.0.1:{{ vault_api_port }}"
    ca_file = "{{ consul_config_dir }}/.vault/ca.pem"

    token = "{{ consul_connect_vault_token }}"

    root_pki_path = "c3-connect-root"
    intermediate_pki_path = "c3-connect-soc-intermediate"
  }
}
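
(As a quick sanity check that the Vault-backed Connect CA is actually being served, the Connect CA roots endpoint can be queried on the agent; the HTTPS port comes from the configs earlier in this thread, while the CA path and token handling below are placeholders, since ACLs are enabled:)

# ca.pem path and token are placeholders; adjust to your setup
curl -s --cacert /path/to/ca.pem \
  -H "X-Consul-Token: $CONSUL_HTTP_TOKEN" \
  https://127.0.0.1:8501/v1/connect/ca/roots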

After this change I am no longer seeing the leaf cert issues that I mentioned in my previous message.

Looking in Vault, I can see the certificates being generated, as I am letting Consul manage these certs.

Within the sidecar I'm seeing logs similar to what was reported in hashicorp/nomad#15360, where the sidecar starts but reports connection refused errors, and I have checked again to ensure the steps in https://developer.hashicorp.com/consul/docs/upgrading/upgrade-specific#service-mesh-compatibility-1 were done.

Checking within the sidecar, in /secrets/.envoy_bootstrap.cmd I can see that the -grpc-ca-file flag is being set. As mentioned previously in this issue (#17362), the file path that is set does not seem to be relevant, because in /secrets/envoy_bootstrap.json I can see that the agent-ca.pem has been rendered correctly under the static_resources.clusters[0].transport_socket.typed_config.common_tls_context.validation_context.trusted_ca.inline_string path in that JSON file.
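
(One way to confirm what was rendered, assuming jq and openssl are available inside the task or after copying the file out, is to pull the embedded CA back out of the bootstrap JSON at the path mentioned above and inspect it:)

# run inside the sidecar task, e.g. via nomad alloc exec
jq -r '.static_resources.clusters[0].transport_socket.typed_config.common_tls_context.validation_context.trusted_ca.inline_string' \
  /secrets/envoy_bootstrap.json | openssl x509 -noout -subject -dates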

I also stumbled across #15913, which as far as I know does not require anything from me to actively fix. Checking the file changes in there and comparing my envoy_bootstrap.json to the golden one defined for the tests, mine seems to have the correct format apart from a few ID changes, service names, and ports. The only thing that looks a bit odd (but I have no idea whether it is a problem or not) is the path to the consul.sock file, which in my config is alloc/tmp/consul_grpc.sock but in the golden one for those changes is /var/run/consul.sock. On top of this, my Envoy command has -grpc-addr unix://alloc/tmp/consul_grpc.sock set, which has the _grpc suffix on that socket, so I am not sure if this is an issue or not...

I'm going to continue trying to figure out what is going on whenever I have a bit of time, but it has not been fun getting my cluster back up and running after it was offline for a good three quarters of a year 😅

CarbonCollins (Author) commented:

OK, some more info. I moved on to setting up Nomad's RPC with TLS, and after deploying all of my Nomad servers and clients with TLS enabled on RPC (using the nomad tls CLI commands), my sidecars are now showing different behaviour.

They are no longer complaining about TLSV1_ALERT_PROTOCOL_VERSION and instead now show the following:

[2023-05-12 17:49:59.521][1][info][admin] [source/server/admin/admin.cc:67] admin address: 127.0.0.2:19001
[2023-05-12 17:49:59.524][1][info][config] [source/server/configuration_impl.cc:131] loading tracing configuration
[2023-05-12 17:49:59.524][1][info][config] [source/server/configuration_impl.cc:91] loading 0 static secret(s)
[2023-05-12 17:49:59.524][1][info][config] [source/server/configuration_impl.cc:97] loading 1 cluster(s)
[2023-05-12 17:49:59.829][1][info][config] [source/server/configuration_impl.cc:101] loading 0 listener(s)
[2023-05-12 17:49:59.829][1][info][config] [source/server/configuration_impl.cc:113] loading stats configuration
[2023-05-12 17:49:59.831][1][info][runtime] [source/common/runtime/runtime_impl.cc:463] RTDS has finished initialization
[2023-05-12 17:49:59.831][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:222] cm init: initializing cds
[2023-05-12 17:49:59.832][1][warning][main] [source/server/server.cc:794] there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
[2023-05-12 17:49:59.836][1][info][main] [source/server/server.cc:915] starting main dispatch loop
[2023-05-12 17:50:14.833][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.cluster.v3.Cluster
[2023-05-12 17:50:14.833][1][info][upstream] [source/common/upstream/cluster_manager_impl.cc:226] cm init: all clusters initialized
[2023-05-12 17:50:14.833][1][info][main] [source/server/server.cc:896] all clusters initialized. initializing init manager
[2023-05-12 17:50:29.837][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.listener.v3.Listener
[2023-05-12 17:50:29.837][1][info][config] [source/extensions/listener_managers/listener_manager/listener_manager_impl.cc:852] all dependencies initialized. starting workers
[2023-05-12 17:50:36.661][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 36s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:50:37.623][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 37s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:50:45.084][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 45s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:51:08.229][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 68s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:51:34.073][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 94s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:51:52.334][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 112s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:51:52.627][1][warning][config] [./source/common/config/grpc_stream.h:201] DeltaAggregatedResources gRPC config stream to local_agent closed since 112s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2023-05-12 17:51:52.702][1][warning][main] [source/server/server.cc:854] caught ENVOY_SIGTERM
[2023-05-12 17:51:52.702][1][info][main] [source/server/server.cc:985] shutting down server instance
[2023-05-12 17:51:52.702][1][info][main] [source/server/server.cc:920] main dispatch loop exited
[2023-05-12 17:51:52.703][1][info][main] [source/server/server.cc:972] exiting

The sidecar is also catching an ENVOY_SIGTERM after failing to validate the certificate, and is now restarting periodically.

Did I miss a requirement when setting up Consul Connect with Nomad, where if Consul has TLS enabled on its RPC then Nomad also needs its own RPC enabled with TLS?


tgross commented May 15, 2023

Did I miss a requirement for this when setting up consul connect with nomad where if consul has TLS enabled on its RPC then Nomad also needs its own RPC enabled with TLS?

Nomad never talks to Envoy. It only spins up the sidecar task and plumbs through configuration values that end up in its CLI args. So any TLS issues you're seeing can only be between Consul and Envoy; Consul is the local_agent referred to in the error log you're seeing.

I'm sorry to do this to you but at this point I feel fairly confident this isn't an issue with Nomad. I'm going to move the issue into the Consul repository so that it can get some attention from folks who have more expertise in Consul/Envoy TLS configuration than I do.

adizam transferred this issue from hashicorp/nomad on May 15, 2023
CarbonCollins (Author) commented:

No need to be sorry. Coincidentally, today I did figure out and confirm that it is a Consul issue... specifically with my client nodes.

I updated the Nomad configuration to point directly to the Consul server instead of the local agent, and everything is as happy as can be...

So it currently seems that my local Consul agents are not handling TLS correctly right now...
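
(For anyone following along, the change described above amounts to something like the following in the Nomad agent's consul block, alongside the existing TLS settings shown earlier. The server IP is illustrative only, drawn from the addresses seen in the logs above, and the ports are the HTTPS/gRPC-TLS ports listed earlier; note this bypasses the local Consul client agent, so it is a workaround rather than the usual topology:)

consul {
  # illustrative server address; replace with a reachable Consul server
  address      = "10.0.10.10:8501"
  grpc_address = "10.0.10.10:8503"
}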


t-davies commented Sep 20, 2023

FYI, I'm also running into this. With trace logging on the Consul agent that sits on the Nomad client, I see a bunch of errors relating to the TLS version:

agent: [core][Server #2] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:54522\") failed: tls: client offered only unsupported versions: [303 302 301]"

Dropping tls_min_version down to TLSv1_2 (the versions offered above, 303/302/301, correspond to TLS 1.2, 1.1, and 1.0), I now see:

[core][Server #2] grpc: Server.Serve failed to create ServerTransport: connection error: desc = "ServerHandshake(\"127.0.0.1:51722\") failed: tls: client didn't provide a certificate"

and in Envoy:

[2023-09-20 11:28:44.901][1][warning][config] [./source/common/config/grpc_stream.h:191] DeltaAggregatedResources gRPC config stream to local_agent closed since 40s ago: 14, upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436498:SSL routines:OPENSSL_internal:SSLV3_ALERT_BAD_CERTIFICATE

Edit: ah, this seems to be related to #13124, where the current workaround is to turn off verify_incoming for gRPC. That seems less than ideal, but I guess it is functional. I can also listen for gRPC only on localhost to minimise the risk until this is fixed.
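
(A minimal sketch of that workaround, following the tls stanza layout used earlier in this thread; disabling verify_incoming on the gRPC (xDS) listener weakens security, so keeping the grpc_tls listener bound to localhost in the addresses block is a sensible companion change until the underlying issue is resolved:)

tls {
  grpc {
    # workaround from the linked issue: do not require a client certificate
    # from Envoy's xDS connection to the local agent
    verify_incoming = false
  }
}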
