Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Use deterministic method to distribute across fuzziness window #559

Closed
replay opened this issue Mar 7, 2017 · 6 comments
Closed

Use deterministic method to distribute across fuzziness window #559

replay opened this issue Mar 7, 2017 · 6 comments
Assignees

Comments

@replay
Copy link
Contributor

replay commented Mar 7, 2017

We're currently relying on randomness, which makes this function hard to test. It would be better if we could reach distribution across the fuzziness window without involving randomness because then we could unit test if everything works as expected:
https://github.com/raintank/metrictank/blob/7cffa2d3d8eaaa2d59b07c149f5bae6deb674645/idx/cassandra/cassandra.go#L230

@replay replay self-assigned this Mar 7, 2017
@woodsaj
Copy link
Member

woodsaj commented Mar 7, 2017

The reason behind the randomness was to ensure we didnt try and update all metricDefs at the same time.
So we could just use a hash of the metricId as an offset value.

eg

updateInterval := 21600
if time.Unix() > (def.LastUpdate + ( updateInterval - (def.LastUpdate % updateInterval)) + (hash(def.Id) % updateInterval) {
 // updated needed
}

assuming updateInterval = 21600 and the hash of our id is 1234

the metric is created at 1488825675, lastUpdate is set.
next update will be at 1488825675 + (21600 - (1488825675 mod 21600 )) + (1234 mod 21600)
which is 5.65hours later at 1488846034
but every update after that will be 6hours after the previous update.
1488846034 + (21600 - (1488846034 mod 21600)) + (1234 mod 21600)

for generating the hash, i recommend https://github.com/huichen/murmur as it is fast and simple to use.

@Dieterbe
Copy link
Contributor

Dieterbe commented Mar 7, 2017

can't we simplify this formula a lot?
maybe something like:

if time.Now().Unix() - def.LastUpdate > hash(def.Id) % updateInterval {
    // update

@woodsaj
Copy link
Member

woodsaj commented Mar 7, 2017

if time.Now().Unix() - def.LastUpdate > hash(def.Id) % updateInterval

that wont work. That just updates every hash(def.id) % update interval, which could be a really small number.

@Dieterbe
Copy link
Contributor

Dieterbe commented Mar 9, 2017

hammer-smiley

@replay
Copy link
Contributor Author

replay commented Mar 17, 2017

@awoods in your formula i don't understand why this:

(def.LastUpdate + ( updateInterval - (def.LastUpdate % updateInterval)) + (hash(def.Id) % updateInterval))

is better than:

(def.LastUpdate + (def.LastUpdate % updateInterval) + (hash(def.Id) % updateInterval))

Wouldn't both result in an equal distribution (just upside down), but the second is more simple?

@woodsaj
Copy link
Member

woodsaj commented Mar 27, 2017

closed by #574

@woodsaj woodsaj closed this as completed Mar 27, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants