Error: stream_api handler "prometheus" response is 26280 bytes. Only 8000 bytes is supported #7377

Closed
joelsdc opened this issue May 27, 2021 · 25 comments · Fixed by #8641
Labels: core/streams (refers to the streams subsystem), plugins/prometheus, task/needs-investigation (requires investigation and reproduction before classifying it as a bug or not)

Comments

@joelsdc

joelsdc commented May 27, 2021

Summary

  • Kong version (2.4.1) using declarative config (db-less) running via docker.
  • We are seeing the following error in the kong_admin_error.log file:
2021/05/26 19:58:36 [error] 27#0: *112517 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: stream_api handler "prometheus" response is 26280 bytes.  Only 8000 bytes is supported
stack traceback:
coroutine 0:
	[C]: in function 'error'
	/usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: in function 'handle'
	/usr/local/share/lua/5.1/kong/init.lua:1467: in function 'stream_api'
	content_by_lua(nginx-kong-stream.conf:116):2: in main chunk, udp client: unix:, server: unix:/usr/local/kong/stream_rpc.sock
  if #res > MAX_DATA_LEN then
    error(st_format(
      "stream_api handler %q response is %d bytes.  Only %d bytes is supported",
      key, #res, MAX_DATA_LEN))
  end
  • MAX_DATA_LEN is indeed set to 8000 here (/kong/tools/stream_api.lua):
local MAX_DATA_LEN = 8000
  • This started happening after enabling a UDP stream service. With only http services the error does not happen.

Steps To Reproduce

  1. Have multiple HTTP services
  2. Have prometheus plugin enabled globally
  3. All is good
  4. Enable a UDP stream service
  5. Observe the kong admin error logs: every time the prometheus server scrapes the kong exporter (every 30s in our case) we see the error mentioned above.

Additional Details & Logs

  • Kong node info:
{
  "version": "2.4.1",
  "node_id": "e5dcfb31-7e5d-48cb-a27c-757df37535d8",
  "pids": {
    "master": 1,
    "workers": [
      27,
      28,
      29,
      30,
      31,
      32,
      33,
      34
    ]
  },
  "configuration": {
    "cassandra_contact_points": [
      "127.0.0.1"
    ],
    "cassandra_port": 9042,
    "cassandra_ssl": false,
    "cassandra_ssl_verify": false,
    "cassandra_write_consistency": "ONE",
    "cassandra_read_consistency": "ONE",
    "cassandra_lb_policy": "RequestRoundRobin",
    "cassandra_refresh_frequency": 60,
    "cassandra_repl_strategy": "SimpleStrategy",
    "cassandra_repl_factor": 1,
    "cassandra_data_centers": [
      "dc1:2",
      "dc2:3"
    ],
    "cassandra_schema_consensus_timeout": 10000,
    "upstream_keepalive_pool_size": 0,
    "ssl_protocols": "TLSv1.1 TLSv1.2 TLSv1.3",
    "lua_package_path": "./?.lua;./?/init.lua;",
    "nginx_http_ssl_protocols": "TLSv1.2 TLSv1.3",
    "nginx_stream_ssl_protocols": "TLSv1.2 TLSv1.3",
    "ssl_prefer_server_ciphers": "on",
    "nginx_http_ssl_prefer_server_ciphers": "off",
    "nginx_stream_ssl_prefer_server_ciphers": "off",
    "ssl_dhparam": "ffdhe2048",
    "nginx_http_ssl_dhparam": "ffdhe2048",
    "nginx_stream_ssl_dhparam": "ffdhe2048",
    "ssl_session_tickets": "on",
    "nginx_http_ssl_session_tickets": "on",
    "nginx_stream_ssl_session_tickets": "on",
    "ssl_session_timeout": "1d",
    "upstream_keepalive_idle_timeout": 60,
    "mem_cache_size": "128m",
    "proxy_access_log": "/usr/local/kong/logs/kong_proxy_access.log custom_fmt",
    "proxy_error_log": "/usr/local/kong/logs/kong_proxy_error.log",
    "proxy_stream_access_log": "logs/access.log basic",
    "proxy_stream_error_log": "logs/error.log",
    "admin_access_log": "/usr/local/kong/logs/kong_admin_access.log custom_fmt",
    "cluster_mtls": "shared",
    "status_access_log": "/usr/local/kong/logs/kong_status_access.log custom_fmt",
    "status_error_log": "/usr/local/kong/logs/kong_status_error.logg",
    "log_level": "notice",
    "database": "off",
    "nginx_optimizations": true,
    "go_plugins_dir": "off",
    "lua_ssl_verify_depth": 1,
    "lua_ssl_protocols": "TLSv1.1 TLSv1.2 TLSv1.3",
    "nginx_http_lua_ssl_protocols": "TLSv1.1 TLSv1.2 TLSv1.3",
    "nginx_stream_lua_ssl_protocols": "TLSv1.1 TLSv1.2 TLSv1.3",
    "go_pluginserver_exe": "/usr/local/bin/go-pluginserver",
    "untrusted_lua": "sandbox",
    "untrusted_lua_sandbox_requires": {},
    "untrusted_lua_sandbox_environment": {},
    "stream_proxy_ssl_enabled": false,
    "admin_ssl_enabled": true,
    "status_ssl_enabled": false,
    "db_cache_warmup_entities": [
      "services"
    ],
    "enabled_headers": {
      "X-Kong-Admin-Latency": true,
      "latency_tokens": true,
      "X-Kong-Upstream-Latency": true,
      "X-Kong-Upstream-Status": false,
      "server_tokens": true,
      "Via": true,
      "X-Kong-Proxy-Latency": true,
      "Server": true,
      "X-Kong-Response-Latency": true
    },
    "db_update_frequency": 5,
    "host_ports": {
      "8000": 80,
      "8443": 443
    },
    "anonymous_reports": false,
    "db_cache_ttl": 0,
    "cluster_ocsp": "off",
    "cassandra_timeout": 5000,
    "pg_timeout": 5000,
    "worker_state_update_frequency": 5,
    "dns_resolver": {},
    "dns_hostsfile": "/etc/hosts",
    "dns_error_ttl": 1,
    "nginx_main_directives": [
      {
        "name": "daemon",
        "value": "off"
      },
      {
        "name": "worker_processes",
        "value": "auto"
      },
      {
        "name": "worker_rlimit_nofile",
        "value": "auto"
      }
    ],
    "dns_stale_ttl": 4,
    "dns_order": [
      "LAST",
      "SRV",
      "A",
      "CNAME"
    ],
    "client_ssl": false,
    "nginx_http_directives": [
      {
        "name": "client_body_buffer_size",
        "value": "10m"
      },
      {
        "name": "client_max_body_size",
        "value": "0"
      },
      {
        "name": "log_format",
        "value": "custom_fmt '$remote_addr $host - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\"'"
      },
      {
        "name": "lua_shared_dict",
        "value": "prometheus_metrics 5m"
      },
      {
        "name": "lua_ssl_protocols",
        "value": "TLSv1.1 TLSv1.2 TLSv1.3"
      },
      {
        "name": "ssl_dhparam",
        "value": "/usr/local/kong/ssl/ffdhe2048.pem"
      },
      {
        "name": "ssl_prefer_server_ciphers",
        "value": "off"
      },
      {
        "name": "ssl_protocols",
        "value": "TLSv1.2 TLSv1.3"
      },
      {
        "name": "ssl_session_tickets",
        "value": "on"
      },
      {
        "name": "ssl_session_timeout",
        "value": "1d"
      }
    ],
    "nginx_upstream_directives": {},
    "nginx_proxy_directives": [
      {
        "name": "real_ip_header",
        "value": "X-Real-IP"
      },
      {
        "name": "real_ip_recursive",
        "value": "off"
      }
    ],
    "nginx_status_directives": {},
    "ssl_cert": [
      "/usr/local/kong/ssl/kong-default.crt",
      "/usr/local/kong/ssl/kong-default-ecdsa.crt"
    ],
    "error_default_type": "text/plain",
    "nginx_stream_directives": [
      {
        "name": "lua_shared_dict",
        "value": "stream_prometheus_metrics 5m"
      },
      {
        "name": "lua_ssl_protocols",
        "value": "TLSv1.1 TLSv1.2 TLSv1.3"
      },
      {
        "name": "ssl_dhparam",
        "value": "/usr/local/kong/ssl/ffdhe2048.pem"
      },
      {
        "name": "ssl_prefer_server_ciphers",
        "value": "off"
      },
      {
        "name": "ssl_protocols",
        "value": "TLSv1.2 TLSv1.3"
      },
      {
        "name": "ssl_session_tickets",
        "value": "on"
      },
      {
        "name": "ssl_session_timeout",
        "value": "1d"
      }
    ],
    "proxy_ssl_enabled": true,
    "nginx_sproxy_directives": {},
    "loaded_plugins": {
      "ldap-auth": true,
      "statsd": true,
      "bot-detection": true,
      "aws-lambda": true,
      "request-termination": true,
      "azure-functions": true,
      "zipkin": true,
      "pre-function": true,
      "post-function": true,
      "prometheus": true,
      "proxy-cache": true,
      "session": true,
      "acme": true,
      "grpc-web": true,
      "grpc-gateway": true,
      "jwt": true,
      "acl": true,
      "correlation-id": true,
      "cors": true,
      "oauth2": true,
      "tcp-log": true,
      "udp-log": true,
      "file-log": true,
      "http-log": true,
      "key-auth": true,
      "hmac-auth": true,
      "basic-auth": true,
      "ip-restriction": true,
      "request-transformer": true,
      "response-transformer": true,
      "request-size-limiting": true,
      "rate-limiting": true,
      "response-ratelimiting": true,
      "syslog": true,
      "loggly": true,
      "datadog": true
    },
    "pg_ro_ssl_verify": false,
    "nginx_http_upstream_directives": {},
    "role": "traditional",
    "lua_ssl_trusted_certificate": {},
    "nginx_pid": "/usr/local/kong/pids/nginx.pid",
    "nginx_http_ssl_session_timeout": "1d",
    "pluginserver_names": {},
    "nginx_err_logs": "/usr/local/kong/logs/error.log",
    "upstream_keepalive_max_requests": 100,
    "pg_semaphore_timeout": 60000,
    "nginx_acc_logs": "/usr/local/kong/logs/access.log",
    "nginx_kong_stream_conf": "/usr/local/kong/nginx-kong-stream.conf",
    "admin_acc_logs": "/usr/local/kong/logs/admin_access.log",
    "kong_env": "/usr/local/kong/.kong_env",
    "nginx_conf": "/usr/local/kong/nginx.conf",
    "dns_no_sync": false,
    "nginx_kong_conf": "/usr/local/kong/nginx-kong.conf",
    "admin_error_log": "/usr/local/kong/logs/kong_admin_error.log",
    "nginx_main_worker_rlimit_nofile": "auto",
    "pg_port": 5432,
    "nginx_events_worker_connections": "auto",
    "ssl_cert_csr_default": "/usr/local/kong/ssl/kong-default.csr",
    "nginx_events_multi_accept": "on",
    "ssl_cert_default": "/usr/local/kong/ssl/kong-default.crt",
    "nginx_events_directives": [
      {
        "name": "multi_accept",
        "value": "on"
      },
      {
        "name": "worker_connections",
        "value": "auto"
      }
    ],
    "ssl_cert_key_default": "/usr/local/kong/ssl/kong-default.key",
    "dns_not_found_ttl": 30,
    "ssl_cert_default_ecdsa": "/usr/local/kong/ssl/kong-default-ecdsa.crt",
    "plugins": [
      "bundled"
    ],
    "ssl_cert_key_default_ecdsa": "/usr/local/kong/ssl/kong-default-ecdsa.key",
    "nginx_http_status_directives": {},
    "client_ssl_cert_default": "/usr/local/kong/ssl/kong-default.crt",
    "client_ssl_cert_key_default": "/usr/local/kong/ssl/kong-default.key",
    "nginx_http_log_format": "custom_fmt '$remote_addr $host - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\"'",
    "admin_ssl_cert_default": "/usr/local/kong/ssl/admin-kong-default.crt",
    "admin_ssl_cert_key_default": "/usr/local/kong/ssl/admin-kong-default.key",
    "real_ip_recursive": "off",
    "admin_ssl_cert_default_ecdsa": "/usr/local/kong/ssl/admin-kong-default-ecdsa.crt",
    "db_update_propagation": 0,
    "admin_ssl_cert_key_default_ecdsa": "/usr/local/kong/ssl/admin-kong-default-ecdsa.key",
    "nginx_admin_client_max_body_size": "10m",
    "status_ssl_cert_default": "/usr/local/kong/ssl/status-kong-default.crt",
    "cluster_data_plane_purge_delay": 1209600,
    "nginx_admin_client_body_buffer_size": "10m",
    "cluster_listeners": [
      {
        "ip": "0.0.0.0",
        "port": 8005,
        "bind": false,
        "ssl": false,
        "http2": false,
        "proxy_protocol": false,
        "deferred": false,
        "reuseport": false,
        "backlog=%d+": false,
        "listener": "0.0.0.0:8005"
      }
    ],
    "status_ssl_cert_default_ecdsa": "/usr/local/kong/ssl/status-kong-default-ecdsa.crt",
    "pg_user": "kong",
    "status_ssl_cert_key_default_ecdsa": "/usr/local/kong/ssl/status-kong-default-ecdsa.key",
    "declarative_config": "/usr/local/kong/declarative/kong.yml",
    "nginx_supstream_directives": {},
    "cluster_control_plane": "127.0.0.1:8005",
    "lua_socket_pool_size": 30,
    "ssl_ciphers": "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384",
    "proxy_listen": [
      "0.0.0.0:8000 reuseport backlog=16384",
      "0.0.0.0:8443 http2 ssl reuseport backlog=16384"
    ],
    "admin_ssl_cert": [
      "/usr/local/kong/ssl/admin-kong-default.crt",
      "/usr/local/kong/ssl/admin-kong-default-ecdsa.crt"
    ],
    "db_resurrect_ttl": 30,
    "pg_host": "127.0.0.1",
    "ssl_cert_key": [
      "/usr/local/kong/ssl/kong-default.key",
      "/usr/local/kong/ssl/kong-default-ecdsa.key"
    ],
    "status_ssl_cert": {},
    "port_maps": [
      "80:8000",
      "443:8443"
    ],
    "pg_database": "kong",
    "admin_listen": [
      "0.0.0.0:8001",
      "0.0.0.0:8444 ssl"
    ],
    "status_listen": [
      "0.0.0.0:8100"
    ],
    "stream_listen": [
      "0.0.0.0:5353 udp"
    ],
    "cluster_listen": [
      "0.0.0.0:8005"
    ],
    "kic": false,
    "admin_ssl_cert_key": [
      "/usr/local/kong/ssl/admin-kong-default.key",
      "/usr/local/kong/ssl/admin-kong-default-ecdsa.key"
    ],
    "admin_listeners": [
      {
        "ip": "0.0.0.0",
        "port": 8001,
        "bind": false,
        "ssl": false,
        "http2": false,
        "proxy_protocol": false,
        "deferred": false,
        "reuseport": false,
        "backlog=%d+": false,
        "listener": "0.0.0.0:8001"
      },
      {
        "ip": "0.0.0.0",
        "port": 8444,
        "bind": false,
        "ssl": true,
        "http2": false,
        "proxy_protocol": false,
        "deferred": false,
        "reuseport": false,
        "backlog=%d+": false,
        "listener": "0.0.0.0:8444 ssl"
      }
    ],
    "status_ssl_cert_key": {},
    "proxy_listeners": [
      {
        "ip": "0.0.0.0",
        "port": 8000,
        "bind": false,
        "ssl": false,
        "http2": false,
        "proxy_protocol": false,
        "deferred": false,
        "reuseport": true,
        "backlog=16384": true,
        "listener": "0.0.0.0:8000 reuseport backlog=16384"
      },
      {
        "ip": "0.0.0.0",
        "port": 8443,
        "bind": false,
        "ssl": true,
        "http2": true,
        "proxy_protocol": false,
        "deferred": false,
        "reuseport": true,
        "backlog=16384": true,
        "listener": "0.0.0.0:8443 ssl http2 reuseport backlog=16384"
      }
    ],
    "ssl_cipher_suite": "intermediate",
    "stream_listeners": [
      {
        "ip": "0.0.0.0",
        "port": 5353,
        "bind": false,
        "ssl": false,
        "listener": "0.0.0.0:5353 udp",
        "proxy_protocol": false,
        "reuseport": false,
        "backlog=%d+": false,
        "udp": true
      }
    ],
    "nginx_daemon": "off",
    "worker_consistency": "strict",
    "nginx_main_daemon": "off",
    "nginx_worker_processes": "auto",
    "nginx_main_worker_processes": "auto",
    "nginx_stream_ssl_session_timeout": "1d",
    "trusted_ips": {},
    "real_ip_header": "X-Real-IP",
    "nginx_proxy_real_ip_header": "X-Real-IP",
    "lua_package_cpath": "",
    "nginx_proxy_real_ip_recursive": "off",
    "client_max_body_size": "0",
    "nginx_http_client_max_body_size": "0",
    "client_body_buffer_size": "8k",
    "nginx_http_client_body_buffer_size": "10m",
    "status_listeners": [
      {
        "ssl": false,
        "ip": "0.0.0.0",
        "listener": "0.0.0.0:8100",
        "port": 8100
      }
    ],
    "status_ssl_cert_key_default": "/usr/local/kong/ssl/status-kong-default.key",
    "pg_ssl": false,
    "pg_ssl_verify": false,
    "pg_max_concurrent_queries": 0,
    "cassandra_keyspace": "kong",
    "headers": [
      "server_tokens",
      "latency_tokens"
    ],
    "nginx_admin_directives": [
      {
        "name": "client_body_buffer_size",
        "value": "10m"
      },
      {
        "name": "client_max_body_size",
        "value": "10m"
      }
    ],
    "prefix": "/usr/local/kong",
    "pg_ro_ssl": false,
    "cassandra_username": "kong"
  },
  "plugins": {
    "enabled_in_cluster": [
      "prometheus",
      "ip-restriction",
      "request-termination"
    ],
    "available_on_server": {
      "ldap-auth": true,
      "statsd": true,
      "bot-detection": true,
      "aws-lambda": true,
      "request-termination": true,
      "azure-functions": true,
      "zipkin": true,
      "pre-function": true,
      "post-function": true,
      "prometheus": true,
      "proxy-cache": true,
      "session": true,
      "acme": true,
      "grpc-web": true,
      "grpc-gateway": true,
      "jwt": true,
      "acl": true,
      "correlation-id": true,
      "cors": true,
      "oauth2": true,
      "tcp-log": true,
      "udp-log": true,
      "file-log": true,
      "http-log": true,
      "key-auth": true,
      "hmac-auth": true,
      "basic-auth": true,
      "ip-restriction": true,
      "request-transformer": true,
      "response-transformer": true,
      "request-size-limiting": true,
      "rate-limiting": true,
      "response-ratelimiting": true,
      "syslog": true,
      "loggly": true,
      "datadog": true
    }
  },
  "hostname": "d0b6f7121b4d",
  "tagline": "Welcome to kong",
  "timers": {
    "running": 25,
    "pending": 3
  },
  "lua_version": "LuaJIT 2.1.0-beta3"
}
  • Ubuntu 20 & Docker 20.10.6

  • This is a stripped-down version of the config we are working with, with sensitive info replaced. Please excuse me if there are any mistakes:

_format_version: "2.1"
_transform: true

services:
- name: SERVICE1_INTERNAL
  url: http://localhost
  routes:
  - name: catch_all
    paths: ["/"]
    hosts: ["host1.example.com"]
    protocols: ['https']
- name: SERVICE2_INTERNAL
  url: http://service2
  routes:
  - name: old_endpoint
    paths: ['/']
    hosts: ['oldhost1.example.com', 'oldhost2.example.com']
    protocols: ['http']
    preserve_host: true
    strip_path: false
# - name: SERVICE3_INTERNAL
#   url: udp://service3
#   routes:
#   - name: dns
#     protocols: ['udp']
#     destinations:
#     - port: 5353

upstreams:
- name: service2
  healthchecks:
    active:
      type: http
      http_path: /_health/check
      healthy:
        successes: 2
        interval: 2
      unhealthy:
        tcp_failures: 3
        timeouts: 3
        http_failures: 3
        interval: 5
  targets:
  - target: service2_server1:80
    weight: 100
  - target: service2_server2:80
    weight: 100
  - target: service2_server3:80
    weight: 100
# - name: service3
#   targets:
#   - target: 1.2.3.4:53
#     weight: 100
#   - target: 1.2.3.5:53
#     weight: 100

plugins:
- name: prometheus
- name: request-termination
  service: SERVICE1_INTERNAL
  config:
    status_code: 200
    message: So long and thanks for all the fish!
- name: ip-restriction
  service: SERVICE2_INTERNAL
  config:
    allow:
    - 10.0.0.0/8
    - 172.16.0.0/12
    - 192.168.0.0/16

certificates:
  - snis:
      - name: host1.example.com
    key: |-
      -----BEGIN RSA PRIVATE KEY-----
      -----END RSA PRIVATE KEY-----
    cert: |-
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----
      -----BEGIN CERTIFICATE-----
      -----END CERTIFICATE-----

If you enable the two small commented out sections, the error begins. Comment them out again, error is gone.

Extra Notes

  • If this issue belongs in the kong-plugin-prometheus please let me know and I'll close/open the issues appropriately.

  • Although we see the error in the logs, the response code returned to Prometheus is still 200 OK:

1.2.3.4 6.7.8.9 - - [27/May/2021:00:14:36 +0000] "GET /metrics HTTP/1.1" 200 26444 "-" "Prometheus/2.24.0"
@joelsdc
Author

joelsdc commented May 27, 2021

Logs from all files:

==> kong_status_error.log <==
2021/05/27 02:41:36 [error] 27#0: *10908 [kong] exporter.lua:351 failed to collect stream metrics: retrieving stream-api response: timeout, client: 1.2.3.4, server: kong_status, request: "GET /metrics HTTP/1.1", host: "6.7.8.9:8100"

==> kong_status_access.log <==
1.2.3.4 6.7.8.9 - - [27/May/2021:02:41:36 +0000] "GET /metrics HTTP/1.1" 200 27891 "-" "Prometheus/2.24.0"

==> kong_admin_error.log <==
2021/05/27 02:41:36 [error] 28#0: *14412 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: stream_api handler "prometheus" response is 27725 bytes.  Only 8000 bytes is supported
stack traceback:
coroutine 0:
	[C]: in function 'error'
	/usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: in function 'handle'
	/usr/local/share/lua/5.1/kong/init.lua:1467: in function 'stream_api'
	content_by_lua(nginx-kong-stream.conf:118):2: in main chunk, udp client: unix:, server: unix:/usr/local/kong/stream_rpc.sock

@fffonion added the core/streams label May 27, 2021
@fffonion
Contributor

fffonion commented Jun 2, 2021

Maybe we can just remove the limit here. What do you think, @javierguerragiraldez?

@joelsdc
Author

joelsdc commented Jun 8, 2021

@fffonion / @javierguerragiraldez can I try to comment out:

  if #res > MAX_DATA_LEN then
    error(st_format(
      "stream_api handler %q response is %d bytes.  Only %d bytes is supported",
      key, #res, MAX_DATA_LEN))
  end

from kong/tools/stream_api.lua and test? Would this be valid or are there more places to touch? I'm just guessing here, first time looking into kong's internals :D

@joelsdc
Author

joelsdc commented Jun 14, 2021

Hey guys, before I start messing with custom/patched kong builds etc I'd like to confirm that what I'm proposing to test is actually the correct test to do.

@fffonion / @javierguerragiraldez opinions?

@javierguerragiraldez
Contributor

Just removing the limit wouldn't help, since the internal nginx structures are preallocated buffers with explicit limits too (I think it was 8192, so instead of making it something like 8186 (8 bytes header), I found it better to round down to 8000 for the payload).
For arbitrarily large payloads, it would be necessary to split the payload and rejoin it.
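
To make the split-and-rejoin idea concrete, here is a minimal Python sketch. It is not Kong code and not how the stream API is actually implemented: the 4-byte header (chunk index plus chunk count) and the function names split_payload and join_payload are assumptions made up for illustration. Only the 8000-byte ceiling comes from this thread.

```python
import struct
from typing import Dict, List

MAX_DATAGRAM = 8000            # per-datagram ceiling quoted in this thread
HEADER = struct.Struct("!HH")  # hypothetical header: chunk index, chunk count


def split_payload(payload: bytes, max_len: int = MAX_DATAGRAM) -> List[bytes]:
    """Split payload into datagrams of at most max_len bytes, header included."""
    body_len = max_len - HEADER.size
    count = max(1, -(-len(payload) // body_len))  # ceiling division, at least 1
    if count > 0xFFFF:
        raise ValueError("payload needs more chunks than the header can count")
    return [
        HEADER.pack(i, count) + payload[i * body_len:(i + 1) * body_len]
        for i in range(count)
    ]


def join_payload(datagrams: List[bytes]) -> bytes:
    """Reassemble datagrams, tolerating out-of-order arrival."""
    parts: Dict[int, bytes] = {}
    expected = None
    for datagram in datagrams:
        index, count = HEADER.unpack_from(datagram)
        if expected is None:
            expected = count
        elif count != expected:
            raise ValueError("datagrams disagree on the chunk count")
        parts[index] = datagram[HEADER.size:]
    if expected is None or set(parts) != set(range(expected)):
        raise ValueError("missing chunks")
    return b"".join(parts[i] for i in range(expected))
```

With these assumed sizes, a 26280-byte response like the one in the error log would travel as four datagrams. A real implementation would also need to handle lost datagrams and interleaved requests, which this sketch ignores.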

@joelsdc
Author

joelsdc commented Jun 21, 2021

Well, I've been trying to work around this with no success. I tried removing the "global" prometheus plugin config from my yaml file and adding "per service" prometheus plugin config on all HTTP services while skipping STREAM services. I thought this would be a valid workaround (effectively only disabling the prometheus plugin for my UDP stream service), but I get the same errors:

==> kong_status_error.log <==
2021/06/21 20:42:06 [error] 27#0: *24667 [kong] exporter.lua:351 failed to collect stream metrics: retrieving stream-api response: timeout, client: 10.150.0.78, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.0.42.34:8100"

and:

==> kong_admin_error.log <==
2021/06/21 20:42:06 [error] 28#0: *28042 lua entry thread aborted: runtime error: /usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: stream_api handler "prometheus" response is 25354 bytes.  Only 8000 bytes is supported
stack traceback:
coroutine 0:
	[C]: in function 'error'
	/usr/local/share/lua/5.1/kong/tools/stream_api.lua:109: in function 'handle'
	/usr/local/share/lua/5.1/kong/init.lua:1467: in function 'stream_api'
	content_by_lua(nginx-kong-stream.conf:116):2: in main chunk, udp client: unix:, server: unix:/usr/local/kong/stream_rpc.sock

Now I'm stuck. Any help is greatly appreciated as I'm not really sure how to split/rejoin replies in kong's RPC interface between the http and stream subsystem plugins.

@flrgh
Contributor

flrgh commented Jul 14, 2021

I ran into the same issue. It's exacerbated by the fact that the stream API server doesn't send any response in this condition, causing the client to wait around until socket:receive() eventually times out. With a default socket read timeout of 60s in our environment, the downstream client (the prometheus scraper/collector calling GET /metrics) just eventually gives up entirely.

For the time being I'm running Kong with this patch that ensures the server always sends a response (even when the data returned from the handler is invalid/too big) and sets a low (5s) timeout for socket:receive() just in case anything blows up catastrophically on the server side. I chose to remove the error() calls since the client will wind up logging the error string anyways.

diff --git a/kong/tools/stream_api.lua b/kong/tools/stream_api.lua
index db2a4fab9..fe834ceb0 100644
--- a/kong/tools/stream_api.lua
+++ b/kong/tools/stream_api.lua
@@ -56,6 +56,7 @@ function stream_api.request(key, data, socket_path)
     return nil, "sending stream-api request: " .. tostring(err)
   end

+  socket:settimeout(5000)
   data, err = socket:receive()
   if not data then
     socket:close()
@@ -102,13 +103,13 @@ function stream_api.handle()
   end

   if type(res) ~= "string" then
-    error(st_format("stream_api handler %q response is not a string", key))
+    assert(socket:send(st_pack("=SP", 2, "handler returned invalid data")))
+    return
   end

   if #res > MAX_DATA_LEN then
-    error(st_format(
-      "stream_api handler %q response is %d bytes.  Only %d bytes is supported",
-      key, #res, MAX_DATA_LEN))
+    assert(socket:send(st_pack("=SP", 2, st_format("response size (%s) exceeds max of %s", #res, MAX_DATA_LEN))))
+    return
   end

   assert(socket:send(st_pack("=SP", 0, res)))

The side effect is that stream metrics collection is still broken in all of my environments, but at least it doesn't break http metrics along with it.
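
For readers who don't read Lua, the "always send a reply" idea from the patch can be sketched in Python as below. This is an illustration only: the wire format (1-byte status, 2-byte length, body) and the names encode_reply and decode_reply are assumptions and do not match what st_pack("=SP", ...) produces. The status value 2 and the error messages mirror the patch.

```python
import struct

MAX_DATA_LEN = 8000
STATUS_OK = 0
STATUS_ERROR = 2  # the patch above sends 2 with its error replies


def encode_reply(result) -> bytes:
    """Always build a reply, turning bad handler output into an error reply."""
    if not isinstance(result, bytes):
        status, body = STATUS_ERROR, b"handler returned invalid data"
    elif len(result) > MAX_DATA_LEN:
        status = STATUS_ERROR
        body = ("response size (%d) exceeds max of %d"
                % (len(result), MAX_DATA_LEN)).encode()
    else:
        status, body = STATUS_OK, result
    # Hypothetical framing: 1-byte status, 2-byte big-endian length, then body.
    return struct.pack("!BH", status, len(body)) + body


def decode_reply(datagram: bytes):
    """Return (body, None) on success or (None, error_message) on failure."""
    status, length = struct.unpack_from("!BH", datagram)
    body = datagram[3:3 + length]
    if status != STATUS_OK:
        return None, body.decode()
    return body, None
```

The point of the design is that the caller gets an error string right away instead of blocking until the receive timeout fires.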

Chunking responses of >8000B into multiple packets seems like it'd be quite a headache to implement in UDP. Maybe this API should be reworked to use TCP instead? Sure it's more overhead, but I doubt it'd be noticeable unless the rate of requests to /metrics is insanely high.
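
As a rough illustration of why a stream socket sidesteps the datagram size limit, here is a small Python sketch of length-prefixed framing over a connected stream socket. It is not Kong code: the 4-byte length prefix and the helper names (send_message, recv_message, roundtrip) are assumptions for the example, and the 5-second timeout simply echoes the value used in the patch above.

```python
import socket
import struct
import threading

LENGTH = struct.Struct("!I")  # hypothetical 4-byte big-endian length prefix


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a length prefix followed by the payload."""
    sock.sendall(LENGTH.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return data in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buf += chunk
    return bytes(buf)


def recv_message(sock: socket.socket, timeout: float = 5.0) -> bytes:
    """Receive one length-prefixed message, giving up after timeout seconds."""
    sock.settimeout(timeout)
    (length,) = LENGTH.unpack(recv_exact(sock, LENGTH.size))
    return recv_exact(sock, length)


def roundtrip(size: int) -> bool:
    """Send size bytes across a local socket pair and check they arrive intact."""
    server, client = socket.socketpair()
    payload = bytes(i % 251 for i in range(size))
    sender = threading.Thread(target=send_message, args=(server, payload))
    sender.start()
    try:
        return recv_message(client) == payload
    finally:
        sender.join()
        server.close()
        client.close()
```

Calling roundtrip(26280) exercises a payload the size of the failing response from the original report; the kernel handles segmentation and ordering, so no manual chunking is needed.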

@jasine

jasine commented Aug 26, 2021

same issue

@jnetzel-onv

Hello,
we also still have the same issue in Kong v2.5.0.

@jasine

jasine commented Sep 19, 2021

We also still have the same issue in Kong v2.5.1.

@Murphy-hub
Contributor

same issue

@Murphy-hub
Contributor

@joelsdc Have you solved it? What is the solution?

@joelsdc
Author

joelsdc commented Nov 22, 2021

Hey @zhangshuaiNB, unfortunately not. As a workaround I'm not using Kong for non-HTTP services until this issue is solved. @flrgh posted a patch further up to partially work around this, but I don't think it's a long-term solution.

Maybe @javierguerragiraldez or @fffonion can jump in and give us another update on the status of this issue?

@Murphy-hub
Contributor

@joelsdc Thank you for your reply. Come on, irons

@cronventis

Is there any update on this?

@mayocream added the task/needs-investigation and plugins/prometheus labels Jan 7, 2022
@hanfi

hanfi commented Jan 7, 2022

still have the same issue, just updated to 2.7.0

@xiupengrong

still have the same issue, updated to 2.8.1

@esatterwhite

@flrgh I'm seeing this exact error on 2.8.1.
I don't have any tcp or udp services, only http services.

@flrgh
Contributor

flrgh commented Jul 7, 2022

Hi @xiupengrong and @esatterwhite. This change has been merged into master and will be in the next major release:

https://github.com/Kong/kong/blob/master/CHANGELOG.md#unreleased

kong/CHANGELOG.md

Lines 338 to 339 in 5d721ac

- The private stream API has been rewritten to allow for larger message payloads
[#8641](https://github.com/Kong/kong/pull/8641)


@esatterwhite if you do not have any tcp or udp services, you can stop the prometheus plugin from using the stream API by setting stream_listen = off in your kong.conf file (or in env var form, KONG_STREAM_LISTEN=off). If this doesn't work for you, please file a new bug, and we'll take a look.

@cybernagle

If we are using tcp ingress, is there an alternative way to reduce the prometheus payload or disable prometheus?

@icebob

icebob commented Oct 16, 2022

Same issue here with TCP ingress.

@joelsdc
Author

joelsdc commented Oct 16, 2022

@NagleZhang / @icebob I'm working on updating to Kong v3 myself as it has the changes that are supposed to fix this.

Are you guys on 2.X or 3.X? If you are on 3.X and the problem persists it might be worth opening a new ticket with new data.

@icebob

icebob commented Oct 16, 2022

I'm on 2.8 currently. I didn't try it on 3.x

@joelsdc
Author

joelsdc commented Oct 16, 2022

If you have an easy path to v3 I would give it a try. Check the release notes as the config format has some changes.

I will report back once I test either way just for anyone else reaching this ticket :-)

@zhangzerui20

@flrgh Same issue in kong 2.4.1 and 2.5.1. Can this change be merged into 2.4.1 and 2.5.1?

any suggestions?
