This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

fix input (particularly kafka-mdm) exit flow #748

Merged
merged 4 commits into from
Jan 30, 2018

Conversation

Dieterbe
Contributor

I want to test this by introducing a Kafka failure and watching the exit flow in action,
but before I spend time on that, I'd like a code sanity check.

@Dieterbe
Contributor Author

ping @woodsaj @replay

@Dieterbe
Contributor Author

Dieterbe commented Jan 24, 2018

I was able to trigger the failure scenario in the docker-cluster stack by setting the offset to oldest, fiddling a bit with the Kafka topic, and restarting metrictank1.

This is the output I got:

metrictank1_1  | [Sarama] 2018/01/24 20:55:17 kafka: error while consuming mdm/0: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.
metrictank1_1  | [Sarama] 2018/01/24 20:55:17 consumer/mdm/0 shutting down because kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.
metrictank1_1  | 2018/01/24 20:55:17 [kafkamdm.go:306 consumePartition()] [E] kafka-mdm: kafka consumer for mdm:0 has shutdown. stop consuming
metrictank1_1  | 2018/01/24 20:55:17 [I] An input plugin signalled a fatal error. Shutting down
metrictank1_1  | 2018/01/24 20:55:17 [I] CLU manager: HTTPNode metrictank1 has left the cluster
metrictank2_1  | 2018/01/24 20:55:17 [I] CLU manager: HTTPNode metrictank1 has left the cluster
metrictank3_1  | 2018/01/24 20:55:17 [I] CLU manager: HTTPNode metrictank1 has left the cluster
metrictank0_1  | 2018/01/24 20:55:17 [I] CLU manager: HTTPNode metrictank1 has left the cluster
metrictank1_1  | 2018/01/24 20:55:17 [I] API shutdown started.
metrictank1_1  | 2018/01/24 20:55:17 [I] API accept tcp [::]:6060: use of closed network connection
metrictank1_1  | 2018/01/24 20:55:17 [I] Shutting down carbon consumer
metrictank1_1  | 2018/01/24 20:55:17 [I] Shutting down kafka-mdm consumer
metrictank1_1  | 2018/01/24 20:55:17 [I] carbon-in: shutting down.
metrictank1_1  | 2018/01/24 20:55:17 [I] carbon consumer finished shutdown
metrictank1_1  | [Sarama] 2018/01/24 20:55:18 consumer/broker/0 closed dead subscription to mdm/3
metrictank1_1  | 2018/01/24 20:55:18 [I] kafka-mdm consumer for mdm:3 ended.
metrictank2_1  | 2018/01/24 20:55:18 [DEBUG] memberlist: Initiating push/pull sync with: 172.18.0.9:7946
metrictank0_1  | 2018/01/24 20:55:18 [DEBUG] memberlist: Stream connection from=172.18.0.10:48226
metrictank1_1  | [Sarama] 2018/01/24 20:55:19 consumer/broker/0 closed dead subscription to mdm/2
metrictank1_1  | [Sarama] 2018/01/24 20:55:19 consumer/broker/0 closed dead subscription to mdm/1
metrictank1_1  | 2018/01/24 20:55:19 [I] kafka-mdm consumer for mdm:1 ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] kafka-mdm consumer for mdm:2 ended.
metrictank1_1  | [Sarama] 2018/01/24 20:55:19 Closing Client
metrictank1_1  | 2018/01/24 20:55:19 [I] kafka-mdm consumer finished shutdown
metrictank1_1  | 2018/01/24 20:55:19 [I] closing store
metrictank1_1  | [Sarama] 2018/01/24 20:55:19 Closed connection to broker kafka:9092
metrictank1_1  | [Sarama] 2018/01/24 20:55:19 Closed connection to broker kafka:9092
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx stopping
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] cassandra-idx writeQueue handler ended.
metrictank1_1  | 2018/01/24 20:55:19 [I] terminating.
metrictank0_1  | 2018/01/24 20:55:19 [DEBUG] memberlist: Initiating push/pull sync with: 172.18.0.10:7946
metrictank2_1  | 2018/01/24 20:55:19 [DEBUG] memberlist: Stream connection from=172.18.0.9:56566
dockercluster_metrictank1_1 exited with code 0

@Dieterbe
Contributor Author

Dieterbe commented Jan 24, 2018

@woodsaj please re-review: I have added two commits since your last review.
Even though we'll switch out our Kafka lib, there's some useful stuff here to merge.
Thanks!

when an input plugin runs into a fatal error, it should signal this
to the caller (main routine) so that it can clean up any resources,
announce its departure to the cluster, etc, before shutting down.
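
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fakeInput`, `run`, and the step names are hypothetical, and the real plugins and shutdown routine differ. It assumes each input reports an unrecoverable error on a shared channel and leaves process termination to the caller, which then performs cleanup in the same order seen in the log output above (leave cluster, stop inputs, close store).

```go
package main

import (
	"errors"
	"fmt"
)

// fakeInput is a hypothetical input plugin used only for illustration.
type fakeInput struct {
	name string
	fail bool          // when true, the consumer hits an unrecoverable error
	done chan struct{} // closed when the consumer goroutine has exited
}

// Start launches the consumer goroutine. On an unrecoverable error it
// reports on fatal and returns; it never terminates the process itself.
func (in *fakeInput) Start(fatal chan<- error) {
	in.done = make(chan struct{})
	go func() {
		defer close(in.done)
		if in.fail {
			fatal <- errors.New(in.name + ": consumer has shut down")
		}
	}()
}

// Stop blocks until the consumer goroutine has exited.
func (in *fakeInput) Stop() { <-in.done }

// run starts every input, waits for a fatal error or a stop request,
// then shuts down in order and returns the steps it performed.
func run(inputs []*fakeInput, stop <-chan struct{}) []string {
	// Buffered so a failing input can report even if nobody is receiving yet.
	fatal := make(chan error, len(inputs))
	for _, in := range inputs {
		in.Start(fatal)
	}

	var steps []string
	select {
	case err := <-fatal:
		steps = append(steps, "fatal: "+err.Error())
	case <-stop:
		steps = append(steps, "stop requested")
	}

	steps = append(steps, "leave cluster")
	for _, in := range inputs {
		in.Stop()
		steps = append(steps, "stop "+in.name)
	}
	return append(steps, "close store")
}

func main() {
	inputs := []*fakeInput{{name: "carbon"}, {name: "kafka-mdm", fail: true}}
	for _, s := range run(inputs, nil) {
		fmt.Println(s)
	}
}
```

The point of the design is that the process still exits (with code 0 in the log above), but only after the other cluster members have been told and all resources have been released.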

also, when the kafka consumer closes (typically due to an offset problem),
exit more cleanly with a better error message. fix #520
* the mdm plugin closing the fatal chan in Start()
  did not stop it from continuing to initialize
  (even though it had already failed and main was trying to clean up)
* simpler approach: just return an error
* cleaner exit flow for Carbon.Start() failure

note: this now requires the ability to trigger shutdown from earlier in
main, which meant making a standalone function, which meant moving some
parameters to globals, which meant store is now a Store, not a CassandraStore,
and all other stores needed a new SetTracer function
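
The chain of consequences in this note might look something like the sketch below. Every name here other than `Store` and `SetTracer` is an assumption made for illustration (the `Tracer` interface, the two store types, and the `shutdown` function are hypothetical, and the real method set and signatures are not shown in this conversation). The idea is that a standalone shutdown function needs a package-level store, a package-level variable is most useful when typed as the interface, and every implementation then has to satisfy the whole interface, including `SetTracer`.

```go
package main

import "fmt"

// Tracer is a hypothetical placeholder for a tracing handle.
type Tracer interface{ Name() string }

// Store is the interface the package-level variable is typed as.
// Only SetTracer is named in the note above; Stop is assumed here.
type Store interface {
	SetTracer(t Tracer)
	Stop()
}

// cassandraStore is a hypothetical concrete store that uses the tracer.
type cassandraStore struct {
	tracer  Tracer
	stopped bool
}

func (c *cassandraStore) SetTracer(t Tracer) { c.tracer = t }
func (c *cassandraStore) Stop()              { c.stopped = true }

// nullStore is a hypothetical store that has no use for a tracer but
// must still provide SetTracer to satisfy the interface.
type nullStore struct{}

func (nullStore) SetTracer(Tracer) {}
func (nullStore) Stop()            {}

// store is package-level so shutdown can reach it from anywhere in main.
var store Store

// shutdown is a standalone function that can be called early in main,
// before or after the store has been created.
func shutdown() {
	if store != nil {
		store.Stop()
	}
}

func main() {
	shutdown() // safe even though no store exists yet

	c := &cassandraStore{}
	store = c
	shutdown()
	fmt.Println("cassandra store stopped:", c.stopped)

	store = nullStore{}
	shutdown()
}
```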
@Dieterbe Dieterbe merged commit 332a8ff into master Jan 30, 2018
@Dieterbe Dieterbe deleted the better-kafka-message branch September 18, 2018 09:07
@Dieterbe Dieterbe added this to the 0.8.1 milestone Dec 12, 2018
2 participants