TLS handshake timeout to AWS timestream with IAM Role #15874

Closed
choseh opened this issue Sep 12, 2024 · 2 comments
Labels
bug unexpected problem or unintended behavior

Comments

choseh commented Sep 12, 2024

Relevant telegraf.conf

[global_tags]
[agent]
  interval = "60s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = true
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = true
[[inputs.kernel]]
  interval = "5m"
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "overlay", "aufs", "squashfs", "vfat"]
[[inputs.mem]]
[[inputs.netstat]]
[[inputs.processes]]
[[inputs.swap]]
  fielddrop = ["in","out"]
[[inputs.system]]
  fielddrop = ["uptime","uptime_format"]
[[processors.aws_ec2]]
  imds_tags = ["instanceId", "imageId", "instanceType"]
  ec2_tags = ["Name"]
  timeout = "10s"
  ordered = false
  max_parallel_calls = 10
  tag_cache_size = 1000
[[outputs.timestream]]
  region = "eu-central-1"
  database_name = "telegraf"
  describe_database_on_start = true
  mapping_mode = "multi-table"
  create_table_if_not_exists = true
  create_table_magnetic_store_retention_period_in_days = 365
  create_table_memory_store_retention_period_in_hours = 24
  use_multi_measure_records = true
  measure_name_for_multi_measure_records = "t"

Logs from Telegraf

Sep 12 05:36:06 hostname systemd[1]: Starting telegraf.service - Telegraf...
Sep 12 05:36:07 hostname telegraf[6909]: time="2024-09-12T05:36:07Z" level=warning msg="DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null." func="gosnowflake.(*defaultLogger).Warn" file="log.go:244"
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loading config: /etc/telegraf/telegraf.conf
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.swap" deprecated since version 1.29.0 and will be removed in 1.40.0: use 'fieldexclude' instead
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z W! DeprecationWarning: Option "fielddrop" of plugin "inputs.system" deprecated since version 1.29.0 and will be removed in 1.40.0: use 'fieldexclude' instead
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loading config: /etc/telegraf/telegraf.d/config.conf          
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Starting Telegraf 1.32.0 brought to you by InfluxData the makers of InfluxDB
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Available plugins: 235 inputs, 9 aggregators, 32 processors, 26 parsers, 62 outputs, 6 secret-stores
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loaded inputs: cpu disk exec (6x) kernel mem netstat processes swap system
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loaded aggregators:
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loaded processors: aws_ec2
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loaded secretstores:
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Loaded outputs: timestream
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! Tags enabled:
Sep 12 05:36:07 hostname systemd[1]: Started telegraf.service - Telegraf.
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"", Flush Interval:10s
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! [outputs.timestream] Constructing Timestream client for "multi-table" mode
Sep 12 05:36:07 hostname telegraf[6909]: 2024-09-12T05:36:07Z I! [outputs.timestream] Describing database "telegraf" in region "eu-central-1"
Sep 12 05:36:39 hostname telegraf[6909]: 2024-09-12T05:36:39Z E! [outputs.timestream] Couldn't describe database "telegraf". Check error, fix permissions, connectivity, create database.
Sep 12 05:36:39 hostname telegraf[6909]: 2024-09-12T05:36:39Z E! [agent] Failed to connect to [outputs.timestream], retrying in 15s, error was "operation error Timestream Write: DescribeDatabase, operation error Timestream Write: DescribeEndpoints, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post \"https://ingest.timestream.eu-central-1.amazonaws.com/\": net/http: TLS handshake timeout"
Sep 12 05:36:54 hostname telegraf[6909]: 2024-09-12T05:36:54Z I! [outputs.timestream] Constructing Timestream client for "multi-table" mode
Sep 12 05:36:54 hostname telegraf[6909]: 2024-09-12T05:36:54Z I! [outputs.timestream] Describing database "telegraf" in region "eu-central-1"
Sep 12 05:37:28 hostname telegraf[6909]: 2024-09-12T05:37:28Z E! [outputs.timestream] Couldn't describe database "telegraf". Check error, fix permissions, connectivity, create database.
Sep 12 05:37:28 hostname telegraf[6909]: 2024-09-12T05:37:28Z E! [telegraf] Error running agent: connecting output outputs.timestream: error connecting to output "outputs.timestream": operation error Timestream Write: DescribeDatabase, operation error Timestream Write: DescribeEndpoints, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://ingest.timestream.eu-central-1.amazonaws.com/": net/http: TLS handshake timeout
Sep 12 05:37:28 hostname systemd[1]: telegraf.service: Main process exited, code=exited, status=1/FAILURE
Sep 12 05:37:28 hostname systemd[1]: telegraf.service: Failed with result 'exit-code'.

System info

Telegraf 1.32.0

Docker

No response

Steps to reproduce

  1. Start Telegraf 1.32.0 with the config above.
  2. Check the logs.

...

Expected behavior

Telegraf connects to Timestream.

Actual behavior

Telegraf does not connect to Timestream: the DescribeEndpoints request ends in a TLS handshake timeout and the agent exits.

Additional info

We're using IAM roles to permit access to the Timestream database.
Telegraf v1.31 works.

@choseh choseh added the bug unexpected problem or unintended behavior label Sep 12, 2024

choseh commented Sep 16, 2024

This appears to be related to SNI and AWS Network Firewall. Telegraf 1.32 might have changed something in the request so that it no longer sends the necessary information (?)


choseh commented Sep 16, 2024

We found it.
hashicorp/terraform-provider-aws#39311 describes a similar issue; the underlying cause is Go in combination with AWS Network Firewall.
We have to set GODEBUG=tlskyber=0 to make it work again, so this is actually a Go issue.

@choseh choseh closed this as completed Sep 16, 2024