
Correct readyness #717

Merged
merged 13 commits into master from correct-readyness on Sep 7, 2017

Conversation

@Dieterbe (Contributor) commented Aug 31, 2017

note that index init (backfill) is blocking, and plugin.MaintainPriority is only called afterwards. so when priority goes down from 10k, index is guaranteed to be loaded.
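
For illustration, a minimal sketch of that ordering guarantee (the types and names here are hypothetical, not the actual startup code):

type Index interface{ Init() }              // Init blocks until backfill completes
type Plugin interface{ MaintainPriority() }

// because Init is blocking, MaintainPriority only ever runs against a fully
// loaded index, so a priority below 10k implies the index is loaded
func start(idx Index, p Plugin) {
	idx.Init()              // blocking backfill
	go p.MaintainPriority() // only started afterwards
}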

future nice to have:

  • if a peer errors 503, try next-best peer that hosts the same shard if it's also ready

@Dieterbe Dieterbe force-pushed the correct-readyness branch 2 times, most recently from ffe838a to a578a40 on September 4, 2017 17:24
@Dieterbe Dieterbe changed the title from "Correct readyness - not tested yet" to "Correct readyness" on Sep 4, 2017
@Dieterbe (Contributor, Author) commented Sep 4, 2017

i've been testing this with docker-cluster, seems to work, but want to test some more cases. meanwhile @woodsaj this is good for reviewing.

cluster/node.go Outdated
	case NodeUnreachable:
		return []byte(`"NodeUnreachable"`), nil
	}
	panic(fmt.Sprintf("impossible nodestate %v", n))

Member:

don't panic. This method can return an error, so return it.
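
A sketch of that suggestion; apart from the NodeUnreachable arm shown above, the details are illustrative:

func (n NodeState) MarshalJSON() ([]byte, error) {
	switch n {
	// ... other states ...
	case NodeUnreachable:
		return []byte(`"NodeUnreachable"`), nil
	}
	// surface the problem to the caller instead of panicking
	return nil, fmt.Errorf("impossible nodestate %v", n)
}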

			log.Error(3, "kafka-mdm failed to get oldest offset of partition %s:%d. %s", topic, partition, err)
		}
	}

	if currentOffset >= 0 {
		// we can't set the offsetMetrics until we know what offset we are at.

Member:

It looks like this statement is no longer true. We do know the offset we are at; it is stored in initialOffset

@Dieterbe (Contributor, Author) commented Sep 5, 2017

@woodsaj PTAL. an interesting side effect is that what is stored in the offsetMgr database is a bit different: previously we would store -1 and -2 as long as nothing was consumed; now, since we resolve the offset to a concrete one, that is what is stored in the db
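
For reference, the -1 and -2 here are sarama's sentinel offsets (as defined in the sarama package):

const (
	OffsetNewest int64 = -1 // ask for the offset of the next message to be produced
	OffsetOldest int64 = -2 // ask for the oldest offset still available
)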


// we must determine the initial offset of our consumer
// if we just assign the current newest or oldest to it,
// it may actually be a later value than what oldest/newest was when we started the consumer

@woodsaj (Member) commented Sep 5, 2017:

this doesn't make any sense to me. There is no reason why you can't just fetch the desired offset, whether that be newest, oldest or a specific offset, and also fetch the newest offset. This is all the information needed to correctly set the lag.

There is no need to consume the messages here.

i would simply refactor this function to (see the sketch after this list):

  • Get Newest offset
  • if currentOffset == sarama.OffsetNewest then set currentOffset to newest
    else if currentOffset == sarama.OffsetOldest, then currentOffset = tryGetOffset(Oldest)
    else currentOffset remains unchanged or you can call tryGetOffset(currentOffset) to make sure the offset exists. If not then we should fall back to oldest or newest?
  • set partitionLogSizeMetric, partitionOffsetMetric and partitionLagMetric
  • call pc, err := k.consumer.ConsumePartition(topic, partition, currentOffset); currentOffset will always be an explicit offset rather than newest or oldest.
  • consume from pc.messages
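
A rough sketch of that outline (this is the reviewer's proposal, not the merged code; handleMsg and the fatal-on-error handling are placeholders):

// 1. get the newest offset; newest - currentOffset is the lag
newest, err := k.client.GetOffset(topic, partition, sarama.OffsetNewest)
if err != nil {
	log.Fatal(4, "kafka-mdm: failed to get newest offset for %s:%d. %s", topic, partition, err)
}

// 2. resolve sentinel offsets to concrete ones
switch currentOffset {
case sarama.OffsetNewest:
	currentOffset = newest
case sarama.OffsetOldest:
	currentOffset, err = k.client.GetOffset(topic, partition, sarama.OffsetOldest)
	if err != nil {
		log.Fatal(4, "kafka-mdm: failed to get oldest offset for %s:%d. %s", topic, partition, err)
	}
}

// 3. set partitionLogSizeMetric, partitionOffsetMetric and partitionLagMetric here

// 4. currentOffset is now always an explicit offset
pc, err := k.consumer.ConsumePartition(topic, partition, currentOffset)
if err != nil {
	log.Fatal(4, "kafka-mdm: failed to start partitionConsumer for %s:%d. %s", topic, partition, err)
}

// 5. consume
for msg := range pc.Messages() {
	k.handleMsg(msg)
}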

Contributor Author:

I agree with everything you say, except:

or you can call tryGetOffset(currentOffset) to make sure the offset exists. If not then we should fall back to oldest or newest?

a call to tryGetOffset for a concrete offset, followed by consumer.ConsumePartition, is racy (the offset might disappear between the two calls), and overkill (we might as well just rely on consumer.ConsumePartition telling us whether or not it failed to consume the requested offset).
if it doesn't exist, I think it's best to fatal and draw the operator's attention.
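
i.e. rely on ConsumePartition itself to reject a vanished offset (a sketch; this mirrors what master already does):

pc, err := k.consumer.ConsumePartition(topic, partition, currentOffset)
if err != nil {
	// sarama reports e.g. ErrOffsetOutOfRange if the requested offset is
	// not (or no longer) available; crash and draw the operator's attention
	log.Fatal(4, "kafka-mdm: failed to start partitionConsumer for %s:%d. %s", topic, partition, err)
}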

		break
	}
	if attempt == 7 {
		log.Fatal(3, "kafka-mdm failed to get %s of partition %s:%d. %s (attempt %d/7)", offsetStr, topic, partition, err)

Member:

this is dangerous. if the user has MT configured to resume from "last" but the last offset stored by offsetMgr is no longer available in kafka, MT will just enter a crash loop.

Member:

I think you should have this method return an error instead and let the caller decide if the error is fatal

Contributor Author:

if the user has MT configured to resume from "last" but the last offset stored by offsetMgr is no longer available in kafka

isn't the behavior here on par with what's in master? AFAICT, master will also crash because it has this:

        pc, err := k.consumer.ConsumePartition(topic, partition, currentOffset)
        if err != nil {
                log.Fatal(4, "kafka-mdm: failed to start partitionConsumer for %s:%d. %s", topic, partition, err)
        }

I think crashing in this case is the right thing to do.

Contributor:

Internal libraries should (almost) never just crash, they should return an error indicating they can't continue and let the code that's calling them decide what to do about it. What is the rationale for allowing the input processor to just stop the whole program dead in its tracks?
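
A sketch of the split being argued for (Start returning an error, and startConsuming, are hypothetical; the current interface returns nothing):

// in the plugin: report the problem, don't decide
func (k *KafkaMdm) Start(handler input.Handler) error {
	if err := k.startConsuming(); err != nil {
		return fmt.Errorf("kafka-mdm: %s", err)
	}
	return nil
}

// in main: the caller decides whether the error is fatal
if err := plugin.Start(handler); err != nil {
	log.Fatal(4, "%s", err)
}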

Contributor Author:

the rationale for why to abort is discussed at #520; this is also what i was referring to when i said the behavior is on par with master. (we've been doing this ever since kafka support was introduced)
as for why the abort comes from the input plugin and not from main, i generally agree with you and am open to refactoring that. but let's do that in a future PR. I want these fixes to get in sooner rather than later.

@@ -218,6 +218,27 @@ func (k *KafkaMdm) Start(handler input.Handler) {
 	}
 }
 
+// tryGetOffset will query kafka repeatedly for the requested offset, and return it when it has it,
+// or halt the program if it can't retrieve the value
+func (k *KafkaMdm) tryGetOffset(topic string, partition int32, offsetStr string, offset int64, attempts int, sleep time.Duration) int64 {

Member:

offsetStr is something that should be determined inside this function, not passed in.

var offsetStr string
switch offset {
case sarama.OffsetNewest:
	offsetStr = "newest"
case sarama.OffsetOldest:
	offsetStr = "oldest"
default:
	offsetStr = strconv.FormatInt(offset, 10)
}

		break
	}
	if attempt == 7 {
		log.Fatal(3, "kafka-mdm failed to get %s of partition %s:%d. %s (attempt %d/7)", offsetStr, topic, partition, err)

Member:

you also look to be missing "attempt" from the args passed to log.Fatal()

if attempt == 7 {
	log.Fatal(3, "kafka-mdm failed to get %s of partition %s:%d. %s (attempt %d/7)", offsetStr, topic, partition, err)
}
log.Error(3, "kafka-mdm failed to get %s of partition %s:%d. %s (attempt %d/7)", offsetStr, topic, partition, err)

Member:

missing "attempt" from the args passed to log.Fatal()

@@ -262,19 +300,16 @@ func (k *KafkaMdm) consumePartition(topic string, partition int32, currentOffset
 		log.Error(3, "kafka-mdm failed to commit offset for %s:%d, %s", topic, partition, err)
 	}
 	k.lagMonitor.StoreOffset(partition, currentOffset, ts)
-	offset, err := k.client.GetOffset(topic, partition, sarama.OffsetNewest)
+	newest, err := k.client.GetOffset(topic, partition, sarama.OffsetNewest)

Member:

for consistency I think this should use currentOffset, err = k.tryGetOffset(topic, partition, sarama.OffsetOldest, 1, time.Second)

Contributor Author:

if you mean "use tryGetOffset", that's fine, I can do that.
but assigning the oldest offset to currentOffset doesn't make sense here: the purpose of this code is to update newest to be the currently newest offset.

Member:

i meant use k.tryGetOffset instead of calling k.client.GetOffset directly.
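
i.e. presumably something like (assuming the error-returning tryGetOffset signature that this review converges on):

newest, err := k.tryGetOffset(topic, partition, sarama.OffsetNewest, 1, time.Second)
if err != nil {
	log.Error(3, "kafka-mdm %s", err)
}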

	if attempt == attempts {
		break
	}
	log.Warn("kafka-mdm %s", err)

Member:

You need to log before breaking, else the last log message will never be written. if attempts=1 then no log will be written at all.

Contributor Author:

the warnings here are only for failed attempts that have subsequent attempts. the final error is returned to the caller and logged there as a critical. (this is what i thought you asked for)
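
A sketch of that control flow (signature per the suggestions above, not necessarily the merged code):

func (k *KafkaMdm) tryGetOffset(topic string, partition int32, offset int64, attempts int, sleep time.Duration) (int64, error) {
	var val int64
	var err error
	for attempt := 1; attempt <= attempts; attempt++ {
		val, err = k.client.GetOffset(topic, partition, offset)
		if err == nil {
			return val, nil
		}
		if attempt == attempts {
			break // final error: returned to the caller, which logs it as critical
		}
		log.Warn("kafka-mdm %s", err) // warn only when another attempt follows
		time.Sleep(sleep)
	}
	return 0, err
}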

@woodsaj (Member) commented Sep 7, 2017

LGTM

Commits

it's up to the input plugin to signify when the priority gets low.
in case a peer node queries us because it thought we were healthy,
but it actually has outdated readiness/priority information
(e.g. we just crashed and rebooted), the failure mode is as follows:
* MT starts up, and starts the input plugin
* input is configured to consume from offset latest/oldest
* for some reason consumption doesn't start straight away, and
  the ticker in consumePartition triggers and the offset of -1/-2 is
  saved.
* consumption starts and concrete (high) offset is reported
* MaintainPriority() runs, the rate looks ridiculously high, so a
  low priority score is reported (even though there may be a big lag)
* the node effectively announces it can serve queries but in fact
  it doesn't have the data, leading to incomplete/incorrect query
  responses.

previously, the lag monitor would report 0 in certain cases
(no measurements yet, rollover, kafka recovery), which would
lead to an unrealistically low priority, making the node announce
it's in great shape while actually providing incorrect output
if it's lagging behind.

with this change:
* if we don't know what our priority should be, report 10k, same as
  default unknown priority.
* if we had to skip a lag report, just report the previous value
  (possibly also 10k or something concrete)
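
A minimal sketch of that rule (hypothetical names; the real lag monitor derives the priority value from measured lag):

type LagMonitor struct {
	lastPriority int // initialized to 10000, the default "unknown" priority
}

// Report returns 10k until there is a real measurement, and repeats the
// previous value whenever a lag report had to be skipped.
func (l *LagMonitor) Report(priority int, haveMeasurement bool) int {
	if haveMeasurement {
		l.lastPriority = priority
	}
	return l.lastPriority
}
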
BEFORE:
{
    "name": "metrictank3",
    "version": "0.7.4-85-ga578a40",
    "primary": false,
    "primaryChange": "2017-09-04T17:42:39.782065453Z",
    "state": 0,
    "priority": 10000,
    "started": "2017-09-04T17:42:39.780889544Z",
    "stateChange": "2017-09-04T17:42:39.782065509Z",
    "partitions": [
        4,
        5,
        6,
        7
    ],
    "apiPort": 6060,
    "apiScheme": "http",
    "updated": "2017-09-04T17:42:55.050884771Z",
    "remoteAddr": "172.21.0.10"
}

AFTER:

{
    "name": "metrictank0",
    "version": "0.7.4-85-ga578a40",
    "primary": true,
    "primaryChange": "2017-09-04T18:38:35.821909901Z",
    "state": "NodeReady",
    "priority": 10000,
    "started": "2017-09-04T18:38:35.820140791Z",
    "stateChange": "2017-09-04T18:38:35.821909959Z",
    "partitions": [
        0,
        1,
        2,
        3
    ],
    "apiPort": 6060,
    "apiScheme": "http",
    "updated": "2017-09-04T18:38:53.106347005Z",
    "remoteAddr": "172.21.0.9"
}

if we're starting from oldest/newest and have not consumed anything yet,
still report lag and current offset, since we can know both accurately.

In particular this fixes a bug: on a kafka cluster where nothing
was being produced (and hence nothing was being consumed), no lag
was reported, so priority was reported as 10k, and the node was
never able to become ready.

we need currentOffset to be made concrete instead of
-1 (newest) or -2 (oldest),
otherwise we cannot properly report our lag, and thus cannot
set our priority and function in the cluster.

this then allows us to simplify some things further down
in the code.
@Dieterbe Dieterbe merged commit de00028 into master Sep 7, 2017
@Dieterbe Dieterbe deleted the correct-readyness branch September 18, 2018 09:10