
Telegraf not writing data continuously, needs to be restarted to write Kafka data. #9856

Closed
pratikdas44 opened this issue Oct 4, 2021 · 7 comments
Labels
area/kafka bug unexpected problem or unintended behavior

Comments

@pratikdas44

Relevant telegraf.conf:

[[inputs.kafka_consumer]]
  brokers = ["${IIT_BROKER_ONE}","${IIT_BROKER_TWO}"]
  topics = ["${BROKER_TOPIC}"]
  offset = "newest"
  balance_strategy = "roundrobin"
  max_message_len = 1000000
  max_undelivered_messages = 1000
  #consumer_group = "iit_metrics_consumers"
  ## Data format to consume.
  data_format = "json"
  json_time_key = "beginTime"
  json_time_format = "unix"
  tag_keys = [
   "swVersion",
   "senderType",
   "VNFID",
   "RUMAC",
   "GNBNAME",
   "cluster_id"
  ]
  json_string_fields = ["counters_*"]
  name_override = "${Measurement}"
  interval = "3s"
  [inputs.kafka_consumer.tags]
    setup = "${SETUP_TAG_ONE}"

[[inputs.kafka_consumer]]
  brokers = ["${SVT_BROKER_ONE}","${SVT_BROKER_TWO}"]
  topics = ["Onecell_PM_Data_Stream"]
  offset = "newest"
  balance_strategy = "roundrobin"
  #consumer_group = "svt_metrics_consumers"
  max_message_len = 1000000
  max_undelivered_messages = 1000
  ## Data format to consume.
  data_format = "json"
  json_time_key = "beginTime"
  json_time_format = "unix"
  tag_keys = [
   "swVersion",
   "senderType",
   "VNFID",
   "RUMAC",
   "GNBNAME",
   "cluster_id"
  ]
  json_string_fields = ["counters_*"]
  name_override = "${Measurement}"
  interval = "6s"
  [inputs.kafka_consumer.tags]
    setup = "${SETUP_TAG_TWO}"

System info:

Docker Telegraf version - 1.18.0; Os- ec2 instance (type - t2.large)

Steps to reproduce:

  1. Run Telegraf with the above configuration and observe that data is not written continuously.

Expected behavior:

Telegraf should write data continuously with the above configuration.

Actual behavior:

Telegraf stops writing data after a while and needs to be restarted to resume.

Additional info:

The Kafka logs only appear every 5 minutes. I also tried offset "oldest", but since data arrives in bulk, most of the metrics get dropped, so I opted for offset "newest". Please also explain how the "newest" and "oldest" offset values behave: does the offset get auto-committed after some time?
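For context on the offset question, here is a sketch of the usual Kafka consumer-group semantics as they apply to the kafka_consumer input (general Kafka behavior, not an authoritative statement about this specific Telegraf version):

```toml
[[inputs.kafka_consumer]]
  ## "offset" only applies when the consumer group has no committed offset,
  ## i.e. on the group's first run or after the broker has expired the
  ## group's committed offsets (offsets.retention.minutes).
  ## "oldest" replays the topic from the beginning of the log;
  ## "newest" starts at the end and only sees messages produced afterwards.
  offset = "newest"

  ## Offsets are committed automatically as messages are processed. Once the
  ## group has committed offsets, restarts resume from the last committed
  ## position and the "offset" setting above is ignored. (Group name below
  ## matches the one visible in the logs later in this thread.)
  consumer_group = "telegraf_metrics_consumers"
```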

@pratikdas44 pratikdas44 added the bug unexpected problem or unintended behavior label Oct 4, 2021
@MyaLongmire
Contributor

Can you please upload your logs? They will give us a better idea of what is going on.

@pratikdas44
Author

Most of the time I only get this:

(screenshot)

But whenever data is written:

(screenshot)

No other message comes up, such as a failed push or anything else. Can you also comment on the offset part: how does offset behave with the values "oldest" and "newest"?

Thanks

@powersj
Contributor

powersj commented Oct 6, 2021

Can you provide the full logs, please? For issues with sending metrics, it helps to see what is going on between batch writes. Also, do you have any agent configuration options set? Thanks!

@pratikdas44
Author

pratikdas44 commented Oct 7, 2021

Yes, I have agent configuration set:

[agent]
metric_batch_size = 200000
metric_buffer_limit = 200000
debug = true
omit_hostname = true

Logs

2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-2.iitdms-cs-msk.wktmib.c3.kafka (unregistered)
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #2 at b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #1 at b-1.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] Successfully initialized new client
2021-10-11T10:34:31Z D! [sarama] Initializing new client
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for all topics from broker b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for [Onecell_PM_Data_Stream] from broker b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator requesting coordinator for consumergroup telegraf_metrics_consumers from b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator coordinator for consumergroup telegraf_metrics_consumers is #1 (b-1.iitdms-cs-msk.wktmib.c3.kafka)
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-1.cs-msk.9lbliq.c2.kafka (unregistered)
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #1 at b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #4 at b-4.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #2 at b-2.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #3 at b-3.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] Successfully initialized new client
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for [Onecell_PM_Data_Stream] from broker b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-1.iitdms-cs-msk.wktmib.c3.kafka (registered as #1)
2021-10-11T10:34:31Z D! [sarama] client/coordinator requesting coordinator for consumergroup telegraf_metrics_consumers from b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator coordinator for consumergroup telegraf_metrics_consumers is #3 (b-3.cs-msk.9lbliq.c2.kafka)
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-3.cs-msk.9lbliq.c2.kafka (registered as #3)
ubuntu@ip-15-0-11-48:~/Chronos_5g/onecell-chronos-5g$ docker logs --tail 20 telegraf
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-2.iitdms-cs-msk.wktmib.c3.kafka (unregistered)
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #2 at b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #1 at b-1.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] Successfully initialized new client
2021-10-11T10:34:31Z D! [sarama] Initializing new client
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for all topics from broker b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for [Onecell_PM_Data_Stream] from broker b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator requesting coordinator for consumergroup telegraf_metrics_consumers from b-2.iitdms-cs-msk.wktmib.c3.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator coordinator for consumergroup telegraf_metrics_consumers is #1 (b-1.iitdms-cs-msk.wktmib.c3.kafka)
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-1.cs-msk.9lbliq.c2.kafka (unregistered)
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #1 at b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #4 at b-4.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #2 at b-2.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/brokers registered new broker #3 at b-3.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] Successfully initialized new client
2021-10-11T10:34:31Z D! [sarama] client/metadata fetching metadata for [Onecell_PM_Data_Stream] from broker b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-1.iitdms-cs-msk.wktmib.c3.kafka (registered as #1)
2021-10-11T10:34:31Z D! [sarama] client/coordinator requesting coordinator for consumergroup telegraf_metrics_consumers from b-1.cs-msk.9lbliq.c2.kafka
2021-10-11T10:34:31Z D! [sarama] client/coordinator coordinator for consumergroup telegraf_metrics_consumers is #3 (b-3.cs-msk.9lbliq.c2.kafka)
2021-10-11T10:34:31Z D! [sarama] Connected to broker at b-3.cs-msk.9lbliq.c2.kafka (registered as #3)
2021-10-11T10:34:31Z D! [sarama] Buffer fullness: 0 / 200000 metrics
2021-10-11T10:34:31Z D! [sarama] Buffer fullness: 0 / 200000 metrics

@powersj
Contributor

powersj commented Oct 7, 2021

Is there a reason why you set the metric batch size and buffer limit to 200000? The buffer limit determines the maximum number of unwritten metrics Telegraf will hold, and with a batch size that large, writes only happen once a huge batch accumulates. The larger these values are, the longer the periods of output silence, which is exactly what you are seeing.

This is really a support request and not a bug in Telegraf. I suggest you drop your agent parameters back to the defaults and try again; you should then see metrics being sent much more frequently.
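For comparison, a minimal [agent] section using Telegraf's documented defaults (the values below come from the sample telegraf.conf; smaller batches flush far more often):

```toml
[agent]
  ## Telegraf defaults: send up to 1000 metrics per write and hold at most
  ## 10000 unwritten metrics in memory. With these values the output writes
  ## as soon as a batch fills or flush_interval elapses, instead of staying
  ## silent while a 200000-metric batch accumulates.
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  flush_interval = "10s"
```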

@powersj powersj closed this as completed Oct 7, 2021
@powersj
Contributor

powersj commented Oct 7, 2021

I would also suggest following up on our Community Slack or Community Page. Thank you!

@pratikdas44
Author

The reason for setting the buffer limit to 200000 was to avoid metrics getting dropped. Can you share some info on how I can calculate the incoming metric rate, and what the ideal batch size and buffer limit should be? I am getting confused on that part. Thanks.
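One rough way to size these values (the throughput numbers below are illustrative assumptions, not measurements from this setup): estimate the incoming metric rate, multiply by the flush interval to get the per-flush volume, and give the buffer a few flush intervals of headroom so spikes do not drop metrics:

```toml
[agent]
  ## Suppose the Kafka inputs deliver ~5000 metrics/s (hypothetical figure).
  ## Per flush: 5000 metrics/s * 10 s = 50000 metrics.
  flush_interval = "10s"
  ## Batch size caps a single write; several writes per flush is fine.
  metric_batch_size = 10000
  ## Roughly 3 flush intervals of headroom for spikes or brief output outages.
  metric_buffer_limit = 150000
```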
