Telegraf Generating Orphaned DBus Processes on RHEL Servers #2 #13635

Open
elangovanseshan opened this issue Jul 17, 2023 · 26 comments
Labels
bug unexpected problem or unintended behavior upstream bug or issues that rely on dependency fixes

Comments

@elangovanseshan

elangovanseshan commented Jul 17, 2023

Relevant telegraf.conf

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)


# Global tags can be specified here in key="value" format.
[global_tags]
  env = "Production_Linux"
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"


# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "1m"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 1000

  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  ## This buffer only fills when writes fail to output plugin(s).
  metric_buffer_limit = 10000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "10s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Logging configuration:
  ## Run telegraf with debug log messages.
  debug = false
  ## Run telegraf in quiet mode (error log messages only).
  quiet = false
  ## Specify the log file name. The empty string means to log to stderr.
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_archives = 2
  logfile_rotation_max_size = "50MB"
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false


###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################


# # # Send telegraf metrics to file(s)
# [[outputs.file]]
#   ## Files to write to, "stdout" is a specially handled file.
#   files = ["/var/log/telegraf/telegraf.out"]

#   ## Data format to output.
#   ## Each data format has its own unique set of configuration options, read
#   ## more about them here:
#   ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_OUTPUT.md
#   data_format = "influx"


# Configuration for Wavefront server to send metrics to
[[outputs.wavefront]]
  ## URL of the Wavefront proxy server, including the port it listens on
  url = "http://metrics****************:2878"
  convert_paths = false
  namepass = ["prod.*", "qa.*", "dev.*"]





#----------------------------------
#Linux Input Plugins 
#---------------------------------

###############################################################################################
#                              Linux Input Plugins                                            #
###############################################################################################

## NETWORK METRICS
[[inputs.net]]
  name_prefix = "prod.metrics."
  interval = "15m"
  ignore_protocol_stats = true

## CPU METRICS
[[inputs.cpu]]
  name_prefix = "prod.metrics."
  interval = "10m"
  percpu = false
  totalcpu = true
  collect_cpu_time = false
  report_active = false

## DISK METRICS
[[inputs.disk]]
  name_prefix = "prod.metrics."
  interval = "30m"

[[inputs.diskio]]
  name_prefix = "prod.metrics."
  interval = "5m"

## SYSTEM METRICS
[[inputs.system]]
  name_prefix = "prod.metrics."
  interval = "10m"

## MEMORY METRICS
[[inputs.mem]]
  name_prefix = "prod.metrics."
  interval = "5m"
  fieldpass = ["active",
               "available",
               "buffered",
               "cached",
               "free",
               "inactive",
               "slab",
               "used",
               "available_percent",
               "used_percent",
               "wired",
               "commit_limit",
               "committed_as",
               "dirty",
               "high_free",
               "huge_pages_free",
               "low_free",
               "mapped",
               "page_tables",
               "shared",
               "swap_cached",
               "swap_free",
               "vmalloc_chunk",
               "vmalloc_used",
               "write_back",
               "write_back_tmp"]

[[inputs.mem]]
  name_prefix = "prod.metrics."
  interval = "60m"
  fieldpass = ["total","high_total","huge_page_size","huge_pages_total","low_total","swap_total","vmalloc_total"]

## SWAP METRICS
[[inputs.swap]]
  name_prefix = "prod.metrics."
  interval = "30m"
  fieldpass = ["free", "total","used", "used_percent"]

[[inputs.swap]]
  name_prefix = "prod.metrics."
  interval = "5m"
  fieldpass = ["in", "out"]

## TELEGRAF INTERNAL METRICS
[[inputs.internal]]
  interval = "60m"
  name_prefix = "prod.metrics."
  namepass = ["internal_gather*"]
    [inputs.internal.tagpass]
      input = ["internal"]

Logs from Telegraf

2023-07-17T15:26:21Z I! Loading config: /etc/telegraf/telegraf.d/monitor.conf
2023-07-17T15:26:21Z I! Starting Telegraf 1.27.2
2023-07-17T15:26:21Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-07-17T15:26:21Z I! Loaded inputs: cpu disk diskio exec (2x) internal mem (2x) net swap (2x) system
2023-07-17T15:26:21Z I! Loaded aggregators:
2023-07-17T15:26:21Z I! Loaded processors:
2023-07-17T15:26:21Z I! Loaded secretstores:
2023-07-17T15:26:21Z I! Loaded outputs: wavefront
2023-07-17T15:26:21Z I! Tags enabled: env=Production_Linux host=stuxsh03
2023-07-17T15:26:21Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"****", Flush Interval:10s
2023-07-17T15:27:53Z I! [agent] Hang on, flushing any cached metrics before shutdown
2023-07-17T15:27:53Z I! [agent] Stopping running outputs
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.conf
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.d/compute_services.conf
2023-07-17T15:27:58Z I! Loading config: /etc/telegraf/telegraf.d/monitor.conf
2023-07-17T15:27:58Z I! Starting Telegraf 1.27.2
2023-07-17T15:27:58Z I! Available plugins: 237 inputs, 9 aggregators, 28 processors, 23 parsers, 59 outputs, 4 secret-stores
2023-07-17T15:27:58Z I! Loaded inputs: cpu disk diskio exec (2x) internal mem (2x) net swap (2x) system
2023-07-17T15:27:58Z I! Loaded aggregators:
2023-07-17T15:27:58Z I! Loaded processors:
2023-07-17T15:27:58Z I! Loaded secretstores:
2023-07-17T15:27:58Z I! Loaded outputs: wavefront
2023-07-17T15:27:58Z I! Tags enabled: env=Production_Linux host**************
2023-07-17T15:27:58Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"***********", Flush Interval:10s
2023-07-17T15:27:58Z D! [agent] Initializing plugins
2023-07-17T15:27:58Z D! [agent] Connecting outputs
2023-07-17T15:27:58Z D! [agent] Attempting connection to [outputs.wavefront]
2023-07-17T15:27:58Z D! [outputs.wavefront] connecting over http/https using Url: ******************:2878
2023-07-17T15:27:58Z D! [agent] Successfully connected to outputs.wavefront
2023-07-17T15:27:58Z D! [agent] Starting service inputs
2023-07-17T15:28:14Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:28:14Z D! [outputs.wavefront] Wrote batch of 1 metrics in 63.547432ms
2023-07-17T15:28:14Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:28:30Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:28:46Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:03Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:29:03Z D! [outputs.wavefront] Wrote batch of 1 metrics in 33.405965ms
2023-07-17T15:29:03Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:16Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:31Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:29:47Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/etc/named"): permission denied
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/var/named"): permission denied
2023-07-17T15:30:00Z D! [inputs.disk] [SystemPS] => unable to get disk usage ("/var/named/chroot/usr/lib64/bind"): permission denied
2023-07-17T15:30:04Z D! [outputs.wavefront] Error building tags: unexpected type: string, with value: 1 day,  2:37, for: prod.metrics.system.uptime_format
2023-07-17T15:30:04Z D! [outputs.wavefront] Flushing batch of 99 points
2023-07-17T15:30:04Z D! [outputs.wavefront] Wrote batch of 99 metrics in 69.526316ms
2023-07-17T15:30:04Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:14Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:24Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:36Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:30:48Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:07Z D! [outputs.wavefront] Flushing batch of 1 points
2023-07-17T15:31:07Z D! [outputs.wavefront] Wrote batch of 1 metrics in 33.711449ms
2023-07-17T15:31:07Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:21Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics
2023-07-17T15:31:35Z D! [outputs.wavefront] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.27.2, running on Linux kernel 2.6.32-754.50.1.el6.x86_64

Docker

No response

Steps to reproduce

Reproducing the issue has been tricky because it does not always occur. On impacted systems (hundreds or more), any of the following resolved it: reverting Telegraf to an earlier version, stopping the Telegraf service and removing the orphaned processes, or performing the actions below.

What we have seen:
Upgrading Telegraf from 1.14 to 1.25.2 on RHEL servers seems to cause DBus to generate many orphaned processes. This eventually causes the system to hit the ceiling of available PIDs. Rolling back to 1.14 seems to clear the problem.

Example from one of our systems:

ps -ef|grep dbus|grep -v grep|wc -l
1459
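
To track how the count grows over time, here is a minimal sketch that counts only the dbus-daemon processes whose parent is PID 1, which is how the orphans appear in ps -ef. The function name count_orphaned_dbus is made up for illustration, and the sketch assumes a procps-style ps that accepts -eo ppid=,comm=.

```shell
#!/bin/sh
# Hypothetical helper: count dbus-daemon processes whose parent is PID 1.
# Reads "PPID COMMAND" pairs on stdin, so it works on live ps output
# or on a saved listing.
count_orphaned_dbus() {
    awk '$1 == 1 && $2 == "dbus-daemon" { n++ } END { print n + 0 }'
}

# Usage against the live process table (assumes procps ps):
#   ps -eo ppid=,comm= | count_orphaned_dbus
```

The regular system bus daemon is normally also a child of PID 1, so a small non-zero baseline is expected even on a healthy host.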

According to issue #13481, this was resolved in the recent release telegraf-1.27.2, but we are experiencing the same issue with that release as well.

Expected behavior

Telegraf works as expected.

Actual behavior

Telegraf inadvertently creates thousands of orphaned DBus processes, which eventually exhaust the available PIDs and cause system degradation.

Additional info

No response

@elangovanseshan elangovanseshan added the bug unexpected problem or unintended behavior label Jul 17, 2023
@crflanigan
Contributor

@powersj
New issue created

@crflanigan
Contributor

crflanigan commented Jul 17, 2023

As an aside, so far this issue appears to be absent from 1.24.2.

@elangovanseshan
Author

elangovanseshan commented Jul 17, 2023

Here are the dbus-daemon processes we are seeing on the server:

root       336     1  0 07:03 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       353     1  0 06:16 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       372     1  0 03:43 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       385     1  0 08:38 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       426     1  0 07:03 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       427     1  0 05:21 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       458     1  0 08:39 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       495     1  0 09:25 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       569     1  0 04:30 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       602     1  0 07:04 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       643     1  0 08:39 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       701     1  0 09:26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       823     1  0 09:26 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       840     1  0 06:17 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       884     1  0 04:31 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       896     1  0 02:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       936     1  0 05:21 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       940     1  0 07:04 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       951     1  0 08:40 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root       982     1  0 09:27 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1020     1  0 03:44 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1068     1  0 05:22 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1111     1  0 07:55 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1116     1  0 04:31 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1210     1  0 07:55 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1230     1  0 03:44 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1305     1  0 03:45 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1429     1  0 02:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1518     1  0 03:45 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1736     1  0 03:46 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1756     1  0 07:56 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1815     1  0 09:27 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1874     1  0 07:56 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1887     1  0 09:28 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1937     1  0 04:32 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      1963     1  0 03:46 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2022     1  0 08:41 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2029     1  0 04:32 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2044     1  0 09:28 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2112     1  0 07:57 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2118     1  0 09:29 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2154     1  0 05:22 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2156     1  0 04:33 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2184     1  0 09:29 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2211     1  0 02:48 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2242     1  0 07:57 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2253     1  0 08:41 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2272     1  0 04:33 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2298     1  0 05:23 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2330     1  0 09:30 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2353     1  0 07:58 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2365     1  0 04:34 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2408     1  0 07:58 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2418     1  0 07:05 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2461     1  0 05:23 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2480     1  0 07:59 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2602     1  0 07:05 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2623     1  0 04:34 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2643     1  0 07:59 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2678     1  0 03:47 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2681     1  0 08:42 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2690     1  0 05:24 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2704     1  0 04:35 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2748     1  0 04:35 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root      2788     1  0 07:06 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session 

@elangovanseshan
Author

elangovanseshan commented Jul 17, 2023

telegraf  5850     1  0 06:54 ?        00:00:02 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf telegraf.d
telegraf  5866     1  0 06:54 ?        00:00:00 /usr/bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     11702 24282  0 07:34 pts/0    00:00:00 grep --color=auto -i telegraf 

@crflanigan
Contributor

Is there a way to disable the secret store completely? We don't use it and some component related to it seems to be causing the issues.

@powersj
Contributor

powersj commented Jul 17, 2023

Thanks for the issue and logs. Are you seeing this across RHEL 6, 7, and 8 this time, or only on RHEL 6? I have a RHEL 7 VM up, looping over telegraf with --once to see whether multiple dbus-daemon processes start. I am over 10k loops in and nothing has shown up yet.

Is there a way to disable the secret store completely?

Only with a custom build of Telegraf.

Assuming that the issue is in the same secret store code as last time, that dbus command runs in the init function of that library. This means it runs as soon as the library is imported, before we have a chance to do anything else.

@elangovanseshan
Author

Thanks for your reply, Joshua. I see this issue only on RHEL 6. I have the latest version deployed on RHEL 7/8 as well, and there I don't see any issue with DBus.

ps -ef|grep -i telegraf
telegraf 14007     1 10 09:49 ?        00:00:01 /usr/bin/telegraf -pidfile /var/run/telegraf/telegraf.pid -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d
telegraf 14048     1  0 09:49 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 4 --print-address 6 --session
root     14366  7614  0 09:49 pts/0    00:00:00 grep -i telegraf

@powersj
Contributor

powersj commented Jul 17, 2023

@crflanigan, @elangovanseshan,

If you must have the newer version of Telegraf on RHEL 6, my suggestion is to consider building Telegraf with the custom builder. The result would be a ~23 MB binary containing only the plugins you need, without the secret store plugins.

git clone https://github.com/influxdata/telegraf
cd telegraf
go build -o ./tools/custom_builder/custom_builder ./tools/custom_builder
./tools/custom_builder/custom_builder --config <conf_file> --config-dir <conf_dir>

Would this be an option for you?

@powersj powersj added the waiting for response waiting for response from contributor label Jul 17, 2023
@crflanigan
Contributor

crflanigan commented Jul 17, 2023

Hi @powersj,

We can look at that.
I had thought that the Secret Store is core to Telegraf irrespective of the configuration you use post release 1.25, is that right?

Is Telegraf not supported on RHEL 6? If so, when was the last release where it was supported?

Thanks!

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jul 17, 2023
@elangovanseshan
Author

Thank you, @powersj. Let me try the custom builder without the secret store plugins.

@powersj
Contributor

powersj commented Jul 17, 2023

I had thought that the Secret Store is core to Telegraf irrespective of the configuration you use post release 1.25, is that right?

Internally to Telegraf, the secret stores are treated like the other plugins, so that you could build telegraf without it.

Is Telegraf not supported on RHEL 6? If so, when was the last release where it was supported?

We have a published doc for supported platforms, which essentially says we support OSes that are under standard support. In line with that, RHEL 6 stopped being supported at the end of 2020. RHEL 7 support will end in June 2024.

While we will not go out of our way to break any previous releases, if we do make a change that breaks them, we are less inclined to revert it, and we will not continue to test against them.

@powersj powersj added the waiting for response waiting for response from contributor label Jul 17, 2023
@crflanigan
Contributor

Ok @powersj ,

It sounds like patching this issue is unlikely since it's occurring on an unsupported OS, is that right?

Thanks!

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Jul 17, 2023
@powersj
Contributor

powersj commented Jul 18, 2023

It sounds like patching this issue is unlikely since it's occurring on an unsupported OS, is that right?

If you proposed a PR or an idea to get around this we would certainly consider it. We are not going to completely close the door on a fix.

@crflanigan
Contributor

@powersj,

Fair enough, thanks buddy!

@elangovanseshan
Author

elangovanseshan commented Jul 18, 2023

@crflanigan, @elangovanseshan,

If you must have the newer version of Telegraf on RHEL 6, my suggestion is to consider building Telegraf with the custom builder. The result would be a ~23 MB binary containing only the plugins you need, without the secret store plugins.

git clone https://github.com/influxdata/telegraf
cd telegraf
go build -o ./tools/custom_builder/custom_builder ./tools/custom_builder
./tools/custom_builder/custom_builder --config <conf_file> --config-dir <conf_dir>

Would this be an option for you?

Thanks, @powersj! The custom_builder is working fine for me. I passed the sample conf file to build the binary; it contains the cpu, disk, diskio, exec, mem, net, swap, and system input plugins, and it works fine. However, multiple internal teams use input plugins other than the ones I mentioned above, so a binary built with a limited set of input plugins would affect those internal customers. We would therefore like to build the custom binary with all input plugins but without the secret store plugins. Is there a way to build it without passing a conf file for each input plugin, or can we build it with dummy conf files that leave out the secret store plugins?

@powersj
Contributor

powersj commented Jul 18, 2023

So now I would like to build the custom binary with all input plugins except the secret store plugins

You can get a list of all the input plugins by generating the default config and grep'ing out all the input headers:

make
./telegraf config > default.toml
grep "^# \[\[inputs.*\]\]" default.toml | cut -d' ' -f2 | sort | uniq

You could then add that to your example config or pass that as a second file to the custom builder.
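
One way to turn that list into a second file for the custom builder is sketched below. It assumes the builder only needs to see each plugin's section header and that --config can be given more than once, as the "second file" suggestion implies; neither is confirmed in this thread. The function name list_input_headers and the file name all_inputs.conf are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every commented-out [[inputs.*]] header found in
# a generated default config, with the leading "# " removed and duplicates
# dropped.
list_input_headers() {
    grep '^# \[\[inputs\..*]]' "$1" | cut -d' ' -f2 | sort -u
}

# Usage (file names are illustrative):
#   ./telegraf config > default.toml
#   list_input_headers default.toml > all_inputs.conf
#   ./tools/custom_builder/custom_builder --config <conf_file> --config all_inputs.conf
```

Like the original command, the pattern only matches headers that start with "# ", so any input header that is already uncommented in the generated file would have to be added separately.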

You could also use the various build tags to build telegraf as the customization docs show using BUILDTAGS:

BUILDTAGS="custom,aggregators,inputs,outputs,parsers,processors,serializers" make

If you do start to go this route, please ensure you include everything you actually need ;) It is easy to forget or not realize you are using a serializer for example. This is why I like the custom builder + an actual config better.

@elangovanseshan
Author

@powersj Our initial testing with a custom Telegraf build containing a limited set of input and output plugins is working fine, and there is no evidence of stray dbus processes.

Also, I would like to know how we can add serializers to the custom build. I added the required inputs, outputs, aggregators, and processors through the example conf, but I am not sure about serializers.

Do we need to pass it through conf file or do we have any other option?

@powersj
Contributor

powersj commented Jul 27, 2023

Do we need to pass it through conf file or do we have any other option?

You can reference any of the serializers the same way. For example, if you want only the JSON serializer, you can add serializers.json to the build tags.

The way to determine these build tags is to look in each plugin category's all folder and check the build tags at the top of the relevant file. For example, the JSON serializer's all file shows that the JSON serializer is imported if this is not a custom build, if a user specifies serializers (which pulls in all serializers), or if they specify serializers.json.
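
Putting the tag scheme together, here is a small sketch that assembles a BUILDTAGS value from individual tags. The helper join_build_tags is hypothetical; the tag names (custom, category names such as inputs, and per-plugin names such as serializers.json) are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: join the given tags with commas, always leading with
# "custom" so that plugins are only included when explicitly requested.
join_build_tags() {
    tags="custom"
    for t in "$@"; do
        tags="$tags,$t"
    done
    printf '%s\n' "$tags"
}

# Usage: all inputs and outputs, but only the JSON serializer
#   BUILDTAGS="$(join_build_tags inputs outputs serializers.json)" make
```

Anything not named, including the secret stores, would be left out of such a build, so the earlier caution about forgetting a needed plugin applies here too.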

Does that help?

@elangovanseshan
Author

Thank you @powersj let me try this out

One more thing for your information: I initially reported that the dbus issue was happening only on RHEL 6 servers, but we have since had the issue on RHEL 7/8 as well.

So we are planning to go with a custom Telegraf build with limited plugins.

@powersj
Contributor

powersj commented Mar 19, 2024

@elangovanseshan, @crflanigan,

but we had an issue with RHEL7/8 as well .

Sorry I never responded to this. Looking at the mentioned gosnowflake issue it looks like a workaround is setting DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus in the environment as well.

For Telegraf, I am inclined to document this and link to the still open upstream issue. Thoughts?

@powersj powersj added the waiting for response waiting for response from contributor label Mar 19, 2024
@crflanigan
Contributor

Hi @powersj,

Sorry for the delayed response.

I actually commented on one of these issues for keyring and got a notification this morning that they may have resolved it? Seems like a lot of people use this library.

99designs/keyring#103

What do you think?

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 28, 2024
@powersj
Contributor

powersj commented Mar 29, 2024

Hey @crflanigan,

Did someone delete their comment? Latest I see is from Apr 12, 2023.

@powersj powersj added the upstream bug or issues that rely on dependency fixes label May 10, 2024
@Hipska
Contributor

Hipska commented May 31, 2024

@powersj I think @crflanigan was referring to snowflakedb/gosnowflake#773 (comment)

BTW, I now get that message even when not using outputs.sql at all.

WARN[0000]log.go:244 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null.
2024-05-31T14:00:14Z I! Loading config: test.toml
2024-05-31T14:00:14Z I! Starting Telegraf 1.31.0-35bff98f brought to you by InfluxData the makers of InfluxDB
2024-05-31T14:00:14Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-05-31T14:00:14Z I! Loaded inputs: snmp
2024-05-31T14:00:14Z I! Loaded aggregators:
2024-05-31T14:00:14Z I! Loaded processors:
2024-05-31T14:00:14Z I! Loaded secretstores:
2024-05-31T14:00:14Z W! Outputs are not used in testing mode!
2024-05-31T14:00:14Z I! Tags enabled:

This does not happen with telegraf 1.30.3

@trauta

trauta commented Jun 13, 2024

After upgrading Telegraf to 1.31.0, all of our hosts seem to report the Warn DBUS_SESSION_BUS_ADDRESS log message. Here is example log output from Telegraf 1.31.0 on an Ubuntu 24.04 host:

Jun 13 11:05:22 host.example.com systemd[1]: Starting telegraf.service - Telegraf...
Jun 13 11:05:22 host.example.com (telegraf)[495164]: telegraf.service: Referenced but unset environment variable evaluates to an empty string: TELEGRAF_OPTS
Jun 13 11:05:22 host.example.com telegraf[495164]: time="2024-06-13T11:05:22+02:00" level=warning msg="DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null." func="gosnowflake.(*defaultLogger).Warn" file="log.go:244"
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loading config: /etc/telegraf/telegraf.conf
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Starting Telegraf 1.31.0 brought to you by InfluxData the makers of InfluxDB
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loaded inputs: cpu disk diskio kernel mem net processes swap system
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loaded aggregators:
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loaded processors:
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loaded secretstores:
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Loaded outputs: graphite
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! Tags enabled: host=host
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"host", Flush Interval:10s
Jun 13 11:05:22 host.example.com telegraf[495164]: 2024-06-13T09:05:22Z W! DeprecationWarning: Value "false" for option "ignore_protocol_stats" of plugin "inputs.net" deprecated since version 1.27.3 and will be removed in 1.36.0: use the 'inputs.nstat' plugin instead for protocol stats
Jun 13 11:05:22 host.example.com systemd[1]: Started telegraf.service - Telegraf.

@Hipska
Contributor

Hipska commented Jun 13, 2024

@trauta Indeed, it also seems to happen on RHEL. I already warned the maintainers about it two weeks ago: https://influxcommunity.slack.com/archives/C019JDRJAE7/p1717146621896149

@powersj
Contributor

powersj commented Jun 17, 2024

When this was only on RHEL 6/7/8, I was less concerned about it, especially given the upcoming EOL date. However, it is also appearing on newer releases (e.g. Ubuntu Noble).

The root cause is the keyring dependency. We use it for the secret stores, but it appears the snowflake library we use does as well. The keyring library has not been updated, and no update appears to be planned anytime soon. Even if we moved to a fork, the snowflake library would still use the original. I haven't played with Go's replace directive enough, but it may be possible to use it?

The warning message from snowflake does tell you what to do to get rid of the message, so I don't consider this critical to fix, but it is something we are looking to figure out how to address.
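
The remedy named in the warning can be sketched as a small shell helper: use $XDG_RUNTIME_DIR/bus if it exists, otherwise fall back to /dev/null. The function name pick_dbus_address is hypothetical, and the values are taken verbatim from the warning text quoted earlier in this thread, not from any Telegraf documentation.

```shell
#!/bin/sh
# Hypothetical helper: choose a value for DBUS_SESSION_BUS_ADDRESS following
# the gosnowflake warning: "$XDG_RUNTIME_DIR/bus" if it exists, else /dev/null.
# The runtime directory is passed as the first argument.
pick_dbus_address() {
    runtime_dir="$1"
    if [ -n "$runtime_dir" ] && [ -e "$runtime_dir/bus" ]; then
        printf '%s\n' "$runtime_dir/bus"
    else
        printf '%s\n' "/dev/null"
    fi
}

# Usage: export the variable before starting Telegraf from a shell
#   export DBUS_SESSION_BUS_ADDRESS="$(pick_dbus_address "$XDG_RUNTIME_DIR")"
```

For a packaged service, the variable would have to be set in the service's environment instead; where exactly depends on the unit or init script and is not covered in this thread.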
