Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sarama Producer hangs when broker dies no errors are thrown. #765

Closed
chaithrar opened this issue Oct 17, 2016 · 13 comments
Closed

Sarama Producer hangs when broker dies no errors are thrown. #765

chaithrar opened this issue Oct 17, 2016 · 13 comments

Comments

@chaithrar
Copy link

Versions

Sarama Version: Latest
Kafka Version:2.11-0
Go Version:1.5.3

Configuration

config.Producer.Retry.Max = 2
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
config.Producer.RequiredAcks = 1
config.Producer.Timeout = 10

Logs

No logs are present. When broker dies prodcuer.SendMessage just hangs does not return.

Problem Description

We are using sync producer to produce stats to Kafka. It works fine if the kafka cluster is up and running.
If in between kafka cluster dies then "Producer.SendMessage"/"Prodcuer.Close" Just hangs and never returns.

I have to restart the producer instance in order to come out if this issue. Is there any setting which is missing?

Regards
Chaitra

@eapache
Copy link
Contributor

eapache commented Oct 17, 2016

Try looking at config.Net.ReadTimeout and config.Net.WriteTimeout?

@chaithrar
Copy link
Author

chaithrar commented Oct 17, 2016

I did try by adding Read, Write and Dial timeout to 30 seconds , even then once the cluster is gone its still hung

@eapache
Copy link
Contributor

eapache commented Oct 17, 2016

No logs are present.

If nothing else there will be logs on producer startup, please provide whatever logs there are.

Sarama Version: Latest

Please provide an actual version number or git SHA, "Latest" changes all the time.

@chaithrar
Copy link
Author

chaithrar commented Oct 17, 2016

The One I took belongs to Git Sha "02452b3987ac0e7b93702149e96d4f9b2e6521ec"

I dont see any logs either while using Async producer or sycn producers . Is there any way to produce debug logs so that I can provide the same to you

@eapache
Copy link
Contributor

eapache commented Oct 17, 2016

You can set sarama.Logger to a log.Logger to capture Sarama debug output.

@chaithrar
Copy link
Author

# //// START OF THE ASYNC PRODUCER

[sarama] 2016/10/17 19:09:47 Initializing new client
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 client/metadata fetching metadata for all topics from broker 10.203.31.230:9092
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.31.230:9092 (unregistered)
[sarama] 2016/10/17 19:09:47 client/brokers registered new broker #5 at 10.203.31.230:9092
[sarama] 2016/10/17 19:09:47 client/brokers registered new broker #1 at 10.203.29.83:9092
[sarama] 2016/10/17 19:09:47 client/brokers registered new broker #2 at 10.203.14.96:9092
[sarama] 2016/10/17 19:09:47 client/brokers registered new broker #3 at 10.203.17.161:9092
[sarama] 2016/10/17 19:09:47 client/brokers registered new broker #4 at 10.203.41.233:9092
[sarama] 2016/10/17 19:09:47 Successfully initialized new client
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 producer/broker/3 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/3 state change to [open] on my_topic/8
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.17.161:9092 (registered as #3)
[sarama] 2016/10/17 19:09:47 producer/broker/3 state change to [open] on my_topic/3
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 producer/broker/2 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/2 state change to [open] on my_topic/2
[sarama] 2016/10/17 19:09:47 producer/broker/2 state change to [open] on my_topic/17
[sarama] 2016/10/17 19:09:47 producer/broker/2 state change to [open] on my_topic/12
[sarama] 2016/10/17 19:09:47 producer/broker/2 state change to [open] on my_topic/7
[sarama] 2016/10/17 19:09:47 producer/broker/5 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/5 state change to [open] on my_topic/5
[sarama] 2016/10/17 19:09:47 producer/broker/5 state change to [open] on my_topic/10
[sarama] 2016/10/17 19:09:47 producer/broker/5 state change to [open] on my_topic/0
[sarama] 2016/10/17 19:09:47 producer/broker/3 state change to [open] on my_topic/13
[sarama] 2016/10/17 19:09:47 producer/broker/3 state change to [open] on my_topic/18
[sarama] 2016/10/17 19:09:47 producer/broker/4 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/4 state change to [open] on my_topic/19
[sarama] 2016/10/17 19:09:47 producer/broker/4 state change to [open] on my_topic/14
[sarama] 2016/10/17 19:09:47 producer/broker/4 state change to [open] on my_topic/4
[sarama] 2016/10/17 19:09:47 producer/broker/4 state change to [open] on my_topic/9
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.29.83:9092 (registered as #1)
[sarama] 2016/10/17 19:09:47 producer/broker/1 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/1 state change to [open] on my_topic/6
[sarama] 2016/10/17 19:09:47 producer/broker/1 state change to [open] on my_topic/1
[sarama] 2016/10/17 19:09:47 producer/broker/1 state change to [open] on my_topic/16
[sarama] 2016/10/17 19:09:47 producer/broker/1 state change to [open] on my_topic/11
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.14.96:9092 (registered as #2)
[sarama] 2016/10/17 19:09:47 producer/broker/5 state change to [closing] because kafka: broker not connected
[sarama] 2016/10/17 19:09:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.41.233:9092 (registered as #4)
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/10 state change to [retrying-1]
[sarama] 2016/10/17 19:09:47 producer/broker/5 starting up
[sarama] 2016/10/17 19:09:47 producer/broker/5 state change to [open] on my_topic/15
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/10 abandoning broker 5
[sarama] 2016/10/17 19:09:47 Connected to broker at 10.203.31.230:9092 (registered as #5)
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/0 state change to [retrying-1]
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/5 state change to [retrying-1]
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/5 abandoning broker 5
[sarama] 2016/10/17 19:09:47 producer/leader/my_topic/0 abandoning broker 5
[sarama] 2016/10/17 19:09:47 producer/broker/5 shut down

# WHEN KAFKA CLUSTER GOES DOWN

[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata got error from broker while fetching metadata: EOF
[sarama] 2016/10/17 19:10:17 Closed connection to broker 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata got error from broker while fetching metadata: kafka: broker not connected
[sarama] 2016/10/17 19:10:17 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.14.96:9092
[sarama] 2016/10/17 19:10:17 client/brokers deregistered broker #-1 at 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.14.96:9092
[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata got error from broker while fetching metadata: kafka: broker not connected
[sarama] 2016/10/17 19:10:17 client/brokers deregistered broker #-1 at 10.203.31.230:9092
[sarama] 2016/10/17 19:10:17 client/metadata fetching metadata for [my_topic] from broker 10.203.14.96:9092
[sarama] 2016/10/17 19:10:47 Failed to connect to broker 10.203.14.96:9092: dial tcp 10.203.14.96:9092: i/o timeout
[sarama] 2016/10/17 19:10:47 client/metadata got error from broker while fetching metadata: dial tcp 10.203.14.96:9092: i/o timeout
[sarama] 2016/10/17 19:10:47 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 19:10:47 client/metadata fetching metadata for [my_topic] from broker 10.203.29.83:9092
[sarama] 2016/10/17 19:10:47 client/metadata got error from broker while fetching metadata: dial tcp 10.203.14.96:9092: i/o timeout
[sarama] 2016/10/17 19:10:47 client/brokers deregistered broker #-1 at 10.203.14.96:9092
[sarama] 2016/10/17 19:10:47 client/metadata fetching metadata for [my_topic] from broker 10.203.29.83:9092
[sarama] 2016/10/17 19:10:47 client/metadata got error from broker while fetching metadata: dial tcp 10.203.14.96:9092: i/o timeout
[sarama] 2016/10/17 19:10:47 client/brokers deregistered broker #-1 at 10.203.14.96:9092
[sarama] 2016/10/17 19:10:47 client/metadata fetching metadata for [my_topic] from broker 10.203.29.83:9092

@chaithrar
Copy link
Author

chaithrar commented Oct 17, 2016

Please find the start of producer and issue logs for the scenario

@chaithrar
Copy link
Author

chaithrar commented Oct 17, 2016

Seems like metadata fetching is going on for ever I did set "Metadata Retry to 2 and backoff time to 30sec"

@chaithrar
Copy link
Author

I tried closing the producer after receiving message "Brokers are not available" but Below logs continues to come.. even though I invoke Close() any config setting which is required for this.
I use 5 node kafka cluster

[sarama] 2016/10/17 20:07:50 client/metadata got error from broker while fetching metadata: dial tcp 10.203.29.89:9092: getsockopt: no route to host
[sarama] 2016/10/17 20:07:50 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 20:07:50 client/metadata fetching metadata for [my_topic] from broker 10.203.41.239:9092
[sarama] 2016/10/17 20:07:50 client/metadata got error from broker while fetching metadata: dial tcp 10.203.29.89:9092: getsockopt: no route to host
[sarama] 2016/10/17 20:07:50 client/brokers deregistered broker #-1 at 10.203.29.89:9092
[sarama] 2016/10/17 20:07:50 client/metadata fetching metadata for [my_topic] from broker 10.203.41.239:9092
[sarama] 2016/10/17 20:07:53 Failed to connect to broker 10.203.41.239:9092: dial tcp 10.203.41.239:9092: getsockopt: no route to host
[sarama] 2016/10/17 20:07:53 client/metadata got error from broker while fetching metadata: dial tcp 10.203.41.239:9092: getsockopt: no route to host
[sarama] 2016/10/17 20:07:53 ClientID is the default of 'sarama', you should consider setting it to something application-specific.
[sarama] 2016/10/17 20:07:53 client/metadata fetching metadata for [my_topic] from broker 10.203.31.234:9092
[sarama] 2016/10/17 20:07:53 client/metadata got error from broker while fetching metadata: dial tcp 10.203.41.239:9092: getsockopt: no route to host
[sarama] 2016/10/17 20:07:53 client/brokers deregistered broker #-1 at 10.203.41.239:9092
[sarama] 2016/10/17 20:07:53 client/metadata fetching metadata for [my_topic] from broker 10.203.31.234:9092

@chaithrar
Copy link
Author

chaithrar commented Oct 17, 2016

config.Producer.Retry.Max = 2
config.Producer.Retry.Backoff = (10 * time.Second)
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
config.Producer.RequiredAcks = 1
config.Producer.Timeout = (10 * time.Second)
config.Net.ReadTimeout = (10 * time.Second)
config.Net.DialTimeout = (10 * time.Second)
config.Net.WriteTimeout = (10 * time.Second)

config.Metadata.Retry.Max = 1
config.Metadata.Retry.Backoff = (10 * time.Second)
config.Metadata.RefreshFrequency = (15 * time.Minute)

With the above configuration able to work around broker failure scenarios. The Debug logs really helped thank you.

@eapache
Copy link
Contributor

eapache commented Oct 24, 2016

So in conclusion, not a bug, we just default to a lot of retries and long timeouts.

@eapache eapache closed this as completed Oct 24, 2016
@xinyu7git
Copy link

after reading the codes, i I have summarized a formula:

if all brokers can't be connected, the total timeout is:
Timeout = conf.Net.DiailTimeout*(1+(2+config.Metadata.Retry.Max)*count(brokers))

the default conf.Net.DialTimeout is 30s, and the default conf.Metadata.Retry.Max is 3.
so, the total default timeout = 30 * (1 + (2 + 3 ) * 3 ) = 30 * 16 = 480s.
the client will get an error response after 480s.

@3AceShowHand
Copy link

The timeout is also related to the number of messages buffered in the producer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants