Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with producing messages to topic with unavailable partitions #213

Closed
wkuranowski opened this issue Nov 27, 2014 · 13 comments · Fixed by #218
Closed

Problem with producing messages to topic with unavailable partitions #213

wkuranowski opened this issue Nov 27, 2014 · 13 comments · Fixed by #218
Labels

Comments

@wkuranowski
Copy link

I have a cluster with 5 brokers and topic with replication factor 2 and 10 partitions.

When I shutdown 2 or 3 brokers, producer has problem to use available partitions only (I am using Round Robin partitioner). I am unable to continuously send new messages to Input() channel. There are only short windows when this is possible (in attached log look for producing string).

[Sarama] 2014/11/27 14:48:43 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/9
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 producer/aggregator hit maximum request size, forcing blocking flush
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/5
[Sarama] 2014/11/27 14:48:44 producer/leader state change to [retrying] on adstream2/0
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20873cb40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2119f3c80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20aebe540 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee8400 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:44 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:44 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20ff11ec0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:44 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20b25de40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:44 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:44 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:44 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:44 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:44 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:44 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20bee8b80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee8b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:45 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee9480 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:45 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:45 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:45 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:45 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:45 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20bee8bc0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:45 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:45 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:45 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:45 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:46 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:46 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:46 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:46 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20bee9b00 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:46 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:46 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:46 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:46 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:46 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:46 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20bee9b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:46 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:46 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:46 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:46 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:46 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (7 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc209194800 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (6 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:47 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc209194300 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20bee94c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:47 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc209195340 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:47 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:47 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:47 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:47 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:47 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:47 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (5 retries remaining)
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:48 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:48 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:48 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:48 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:48 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:48 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919c080 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:48 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919c100 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:48 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:48 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:48 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:48 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:48 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:48 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:48 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:48 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (4 retries remaining)
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919cd40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919cd80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:49 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:49 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919d640 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:49 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919d680 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:49 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:49 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:49 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:49 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:49 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (3 retries remaining)
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20919dc80 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a4480 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (2 retries remaining)
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a4980 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc20919c0c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:50 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:50 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20919dcc0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:50 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a5140 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:50 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:50 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:50 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:50 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:50 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:50 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (1 retries remaining)
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc2091a5840 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9093
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 2
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6ea400 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9093.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{2 localhost:9093 0 ?reflect.Value? 0xc2091a58c0 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc2091a5880 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #2 at localhost:9093
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9095
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20d6eae40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6eae00 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9093
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9093: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9094
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 3
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9096
[Agent] 2014/11/27 14:48:51 Errors() output: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9094.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{3 localhost:9094 0 ?reflect.Value? 0xc20d6eb180 {0 0} } : kafka: broker: not connected
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:51 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:51 Fetching metadata from broker localhost:9092
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Agent] 2014/11/27 14:48:51 producing
[Sarama] 2014/11/27 14:48:51 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:51 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6eb600 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:51 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:51 Registered new broker #3 at localhost:9094
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9094
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9094: connection refused
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9095, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (10 retries remaining)
[Sarama] 2014/11/27 14:48:51 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:51 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:52 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:52 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8180 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:52 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:52 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:52 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:52 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:52 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8640 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:52 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:52 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:52 dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:52 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (9 retries remaining)
[Sarama] 2014/11/27 14:48:53 Fetching metadata from broker localhost:9096
[Sarama] 2014/11/27 14:48:53 Unexpected error from GetMetadata, closing broker: dial tcp 127.0.0.1:9096: connection refused
[Sarama] 2014/11/27 14:48:53 Disconnecting Broker 5
[Sarama] 2014/11/27 14:48:53 Fetching metadata from broker localhost:9092
[Sarama] 2014/11/27 14:48:53 Failed to close connection to broker localhost:9096.
[Sarama] 2014/11/27 14:48:53 kafka: broker: not connected
[Sarama] 2014/11/27 14:48:53 Error closing &{5 localhost:9096 0 ?reflect.Value? 0xc20d6f8b40 {0 0} } : kafka: broker: not connected
[Sarama] 2014/11/27 14:48:53 Registered new broker #5 at localhost:9096
[Sarama] 2014/11/27 14:48:53 Failed to fetch metadata from broker localhost:9092, waiting 750ms... (8 retries remaining)
[Sarama] 2014/11/27 14:48:53 Failed to connect to broker localhost:9096
[Sarama] 2014/11/27 14:48:53 dial tcp 127.0.0.1:9096: connection refused

@eapache
Copy link
Contributor

eapache commented Nov 28, 2014

It should recover eventually and start sending messages only on the available partitions. It would recover faster if you reduce your config.MetadataRetries and config.WaitForElection somewhat in the client - 10x750ms seems very high, and will force the producer to wait at least 7.5 seconds per failed message before it can start producing again. The defaults of 3x250ms are fine for us normally.

What happens if you reduce the timeouts, or just let it run for longer?

@wkuranowski
Copy link
Author

I have about 100k - 200k messages per second to my cluster. Even setting 3x250ms for retries doesn't seem to be good. I hope that you can just mark partitions as unavailable and fail all messages assigned to them or just reassign messages to available partitions.

Now I see that you are probably doing this in #215. Tell me when it's ready to test :)

@eapache
Copy link
Contributor

eapache commented Dec 1, 2014

You can try the leader-breaker branch now: https://github.com/Shopify/sarama/tree/leader-breaker

#215 is the first part, which tweaks behaviour so we stop sending new messages to unavailable partitions.

The leader-breaker branch is additional work on top to fail messages quickly when a partition goes down. Once #215 is merged it will have to be rebased on master and then properly vetted. Right now it hardcodes failing-fast for 10 seconds in between retries though I'm open to making that configurable if necessary.

P.S. You can also play with ChannelBufferSize in the producer config, setting it to a large value (perhaps 2048) may or may not help, I'm not sure. In a benchmark scenario (generating messages as fast as possible) it will probably just fill up and not make a difference, but if you're pushing real messages then it might give you some wiggle room.

@eapache eapache mentioned this issue Dec 8, 2014
@wkuranowski
Copy link
Author

I test current leader-breaker branch. It looks much better. But I think that there are still some issues. I get a lot of circuit breaker is open errors. It looks like messages are still sent to unavailable partitions. Is this expected behaviour for this branch? I am using round robin partitioner.

@eapache
Copy link
Contributor

eapache commented Dec 9, 2014

It should stop producing to a partition once Kafka returns LeaderNotAvailable for that partition. If your cluster isn't picking up that the leader is gone for some reason then it would keep trying. I'm not sure why that would be.

@wkuranowski
Copy link
Author

I get 'kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes.' so it looks like Kafka returns LeaderNotAvailable for that partition. And I see retries every 10 seconds.

I look at RoundRobinPartitioner code and I don't know if I understand it correctly. You pick partition number from 0 to numPartitions-1. What if partition number 2 (of 10) is not available?

@wkuranowski
Copy link
Author

Ok, I see that you select partition from WritablePartitions for round robin partitioner, so it seems good... I will try to do some more testing...

@wkuranowski
Copy link
Author

It's strange... I have 10 partitions. When I shut down 2 brokers, partition 4 and 8 become unavailable. But in error channel I see circuit breaker is open for messages with assigned partition = 9. But partition 9 is available...

@eapache do you have any idea?

@eapache
Copy link
Contributor

eapache commented Dec 9, 2014

I pushed a potential fix (f1a6b99) to the leader-breaker branch. I suspect what was happening was that partition 9, while still available, was changing leader (i.e. one of the two brokers you shut down was the old leader of 9). When this happened it was refreshing its metadata, and because of a case I missed the error for the other partitions was trickling up, causing its circuit to open.

@wkuranowski
Copy link
Author

Some more info. Here is my topic description:

Topic: adstream2        Partition: 0    Leader: 3       Replicas: 3,2   Isr: 3,2
Topic: adstream2        Partition: 1    Leader: 4       Replicas: 4,3   Isr: 3,4
Topic: adstream2        Partition: 2    Leader: 5       Replicas: 5,4   Isr: 4,5
Topic: adstream2        Partition: 3    Leader: 1       Replicas: 1,5   Isr: 1,5
Topic: adstream2        Partition: 4    Leader: 2       Replicas: 2,1   Isr: 2,1
Topic: adstream2        Partition: 5    Leader: 3       Replicas: 3,4   Isr: 3,4
Topic: adstream2        Partition: 6    Leader: 4       Replicas: 4,5   Isr: 4,5
Topic: adstream2        Partition: 7    Leader: 5       Replicas: 5,1   Isr: 1,5
Topic: adstream2        Partition: 8    Leader: 1       Replicas: 1,2   Isr: 2,1
Topic: adstream2        Partition: 9    Leader: 2       Replicas: 2,3   Isr: 3,2

When I kill broker 1 and 2, I lose partition 4 and 8 and get circuit breaker is open for messages with assigned partition 9.
When I kill broker 1 and 5, I lose partition 3 and 7 and get circuit breaker is open for messages with assigned partition 2.

Lost partitions (4, 8 or 3, 7) are correctly recognized and are not on the WriteablePartitions list. Kafka topic description also correctly marks unavailable partitions - Leader: -1.

@wkuranowski
Copy link
Author

Ok, I can confirm that f1a6b99 fixes the problem. Almost... because when I kill a broker which is also a controller in a cluster, there are still some problems. If controller was our last replica for a partition, then this partition is not recognized as unavailable. This could be a Kafka bug, because description for that partition still contains that broker as available broker, even after several minutes.

I am going to test this with Kafka 0.8.2-beta...

@wkuranowski
Copy link
Author

Kafka 0.8.2-beta looks good. Killing a controller is handled properly there. So I must say that everything now works really nice. Thanks! :)

I have a question. I want to resend failed messages from error channel. I have a separate goroutine to read from that channel. Is it safe to just construct a new MessageToSend and send to Input() channel? What about deadlocks?

eapache added a commit that referenced this issue Dec 9, 2014
- As I discovered while investigating additional issues from #213, I made a
  stupid in 1b465e7 and kept returning an error in the case where we ran out of
  leader-election retries, despite *explicitly stating in the commit message*
  that the whole point was to stop doing that.
- Simplify and reorganize flow control in several places.
- Clean up and improve log messages.
@eapache
Copy link
Contributor

eapache commented Dec 9, 2014

Is it safe to just construct a new MessageToSend and send to Input() channel? What about deadlocks?

  • you should be able to send the very same MessageToSend back to the input channel, no need to construct a new one
  • you will have to maintain a local buffer and use a select to avoid deadlocks, something like:
select {
case err := <-Errors():
    buffer = append(buffer, err.Msg)
case Input() <- buffer[0]:
    buffer = buffer[1:] // pop message from buffer
}

(edit: be sure to handle the case when the buffer is empty)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants