Use deterministic method to distribute across fuzziness window #559

replay · 2017-03-07T18:27:02Z

We're currently relying on randomness, which makes this function hard to test. It would be better to achieve an even distribution across the fuzziness window without involving randomness, because then we could unit test whether everything works as expected:
https://github.com/raintank/metrictank/blob/7cffa2d3d8eaaa2d59b07c149f5bae6deb674645/idx/cassandra/cassandra.go#L230

woodsaj · 2017-03-07T18:47:39Z

The reason behind the randomness was to ensure we didn't try to update all metricDefs at the same time.
So we could just use a hash of the metric ID as an offset value.

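e.g.

updateInterval := int64(21600) // 6 hours, in seconds
// hash() stands for any stable hash of the ID, converted to int64
if time.Now().Unix() > def.LastUpdate + (updateInterval - (def.LastUpdate % updateInterval)) + (hash(def.Id) % updateInterval) {
 // update needed
}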

assuming updateInterval = 21600 and the hash of our id is 1234

the metric is created at 1488825675 and lastUpdate is set to that value.
the next update will be at 1488825675 + (21600 - (1488825675 mod 21600)) + (1234 mod 21600)
which is about 5.65 hours (20359 seconds) later, at 1488846034.
but every update after that will be 6 hours after the previous update:
1488846034 + (21600 - (1488846034 mod 21600)) + (1234 mod 21600) = 1488867634

For generating the hash, I recommend https://github.com/huichen/murmur as it is fast and simple to use.
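
The formula and worked example above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code that was eventually merged: the function names `idOffset` and `nextUpdate` are hypothetical, and FNV-1a from the standard library is swapped in for the murmur hash recommended above so that the sketch needs no external dependency.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// idOffset maps a metric ID to a stable offset in [0, interval).
// Assumption: FNV-1a stands in for murmur; any stable hash would do here.
func idOffset(id string, interval int64) int64 {
	h := fnv.New32a()
	h.Write([]byte(id)) // Write on a hash.Hash never returns an error
	return int64(h.Sum32()) % interval
}

// nextUpdate returns the timestamp after which a definition last updated at
// lastUpdate is due again: the next interval boundary plus the ID's offset.
func nextUpdate(lastUpdate, interval, offset int64) int64 {
	boundary := lastUpdate + (interval - lastUpdate%interval)
	return boundary + offset%interval
}

func main() {
	const interval = 21600 // 6 hours, in seconds

	// Reproduce the worked example with a hash value of 1234.
	first := nextUpdate(1488825675, interval, 1234)
	second := nextUpdate(first, interval, 1234)
	fmt.Println(first, second, second-first) // 1488846034 1488867634 21600

	// The offset for a given ID is the same on every call.
	fmt.Println(idOffset("some.metric.id", interval) == idOffset("some.metric.id", interval)) // true
}
```

Because the result depends only on `lastUpdate`, the interval and the ID, a unit test can assert exact timestamps, which is what the original request was after.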

Dieterbe · 2017-03-07T19:45:58Z

Can't we simplify this formula a lot?
Maybe something like:

if time.Now().Unix() - def.LastUpdate > hash(def.Id) % updateInterval {
    // update
}

woodsaj · 2017-03-07T21:06:50Z

if time.Now().Unix() - def.LastUpdate > hash(def.Id) % updateInterval

That won't work. It would update a definition every hash(def.Id) % updateInterval seconds, which could be a really small number.
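
To make the objection concrete, here is a small sketch comparing the two checks for an ID whose hash offset happens to be tiny. The helper names `dueSimple` and `dueAligned` and the offset value 5 are assumptions chosen for illustration only.

```go
package main

import "fmt"

// dueSimple is the simplified check: the definition is due as soon as more
// than offset seconds have passed since the last update.
func dueSimple(now, lastUpdate, offset int64) bool {
	return now-lastUpdate > offset
}

// dueAligned is the boundary-aligned check proposed earlier in the thread:
// the definition is due only after the next interval boundary plus offset.
func dueAligned(now, lastUpdate, interval, offset int64) bool {
	return now > lastUpdate+(interval-lastUpdate%interval)+offset
}

func main() {
	const interval = 21600 // 6 hours, in seconds
	const offset = 5       // hypothetical: hash(def.Id) % interval is very small

	last := int64(1488846034)
	now := last + 10 // only ten seconds after the previous update

	fmt.Println(dueSimple(now, last, offset))            // true
	fmt.Println(dueAligned(now, last, interval, offset)) // false
}
```

With the simplified check, this definition would be rewritten every few seconds, whereas the aligned check waits for the next 6-hour window.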

Dieterbe · 2017-03-09T13:42:47Z

replay · 2017-03-17T15:52:59Z

@awoods, in your formula I don't understand why this:

(def.LastUpdate + ( updateInterval - (def.LastUpdate % updateInterval)) + (hash(def.Id) % updateInterval))

is better than:

(def.LastUpdate + (def.LastUpdate % updateInterval) + (hash(def.Id) % updateInterval))

Wouldn't both result in an equal distribution (just upside down), while the second is simpler?
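
One way to explore this question is to evaluate both expressions for the example values used earlier in the thread (created at 1488825675, hash value 1234). The sketch below is illustrative only: the names `nextAligned`, `nextUnaligned` and `gaps` are hypothetical, and it assumes each update happens exactly at the computed time.

```go
package main

import "fmt"

const interval = 21600 // 6 hours, in seconds

// nextAligned is the first expression: jump to the next interval boundary,
// then add the per-ID offset.
func nextAligned(last, offset int64) int64 {
	return last + (interval - last%interval) + offset%interval
}

// nextUnaligned is the second, simpler expression: add the position inside
// the current interval instead of the distance to the next boundary.
func nextUnaligned(last, offset int64) int64 {
	return last + last%interval + offset%interval
}

// gaps applies next n times and returns the seconds between consecutive
// updates, assuming each update happens exactly when it becomes due.
func gaps(next func(last, offset int64) int64, start, offset int64, n int) []int64 {
	out := make([]int64, 0, n)
	last := start
	for i := 0; i < n; i++ {
		t := next(last, offset)
		out = append(out, t-last)
		last = t
	}
	return out
}

func main() {
	fmt.Println(gaps(nextAligned, 1488825675, 1234, 3))   // [20359 21600 21600]
	fmt.Println(gaps(nextUnaligned, 1488825675, 1234, 3)) // [3709 7418 14836]
}
```

Under these assumptions the first expression settles into a fixed 6-hour cadence after the initial update, while the gaps produced by the second depend on where LastUpdate falls inside the window.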

woodsaj · 2017-03-27T02:41:52Z

Closed by #574.

replay self-assigned this Mar 7, 2017

Dieterbe mentioned this issue Mar 8, 2017

when writing old data, we hammer cassandra index with metricdef updates #561

Closed

replay added a commit that referenced this issue Mar 17, 2017

implement distribution according to suggestion on issue #559

dcb594a

replay mentioned this issue Mar 17, 2017

559 deterministic update distribution #569

Closed

replay added a commit that referenced this issue Mar 20, 2017

implement distribution according to suggestion on issue #559

470a8a9

woodsaj closed this as completed Mar 27, 2017
