Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider rewriting latency metrics #72

Closed
FZambia opened this issue Feb 20, 2016 · 4 comments
Closed

Consider rewriting latency metrics #72

FZambia opened this issue Feb 20, 2016 · 4 comments

Comments

@FZambia
Copy link
Member

FZambia commented Feb 20, 2016

In Centrifugo 1.4.0 we deprecated timers. Here is an issue to consider making them right and put back in future releases.

This includes: client request time, API request time.

As @banks said in #68:

If we really want them we should use something like https://www.godoc.org/github.com/vulcand/vulcan/metrics#RollingHistogram which uses a robust measurement technique (HDRHistrogram) and supports rolling window of these to have an accurate distribution of the last N minutes etc. If we did that, I'd recommend having something like 15 buckets of 1 min, with each histogram having range 100µs to 10 mins (600mill µs) with 3 significant figures. That gives you very high definition for last 15 mins in a bit over 1MB per timer (on my mac). That seems very reasonable if we have ~20 timers on a modern machine. We could make it configurable if we want to support people on super low memory VPS or something. Then expose in the results from stats at least:
mean
50%ile
99%ile
99.99%ile
max
Looks like it's not threadsafe though so would need to put a mutex around it or send results on a lightly buffered channel and have it update in single goroutine

@FZambia
Copy link
Member Author

FZambia commented Jul 28, 2016

done in #97 , will be released with next Centrifugo version

@FZambia
Copy link
Member Author

FZambia commented Sep 12, 2016

I am not very happy with how metrics currently implemented. This is current NodeInfo struct:

type NodeInfo struct {
    mu         sync.RWMutex
    UID        string `json:"uid"`
    Name       string `json:"name"`
    Goroutines int    `json:"num_goroutine"`
    Clients    int    `json:"num_clients"`
    Unique     int    `json:"num_unique_clients"`
    Channels   int    `json:"num_channels"`
    Started    int64  `json:"started_at"`
    Gomaxprocs int    `json:"gomaxprocs"`
    NumCPU     int    `json:"num_cpu"`
    metrics
    updated int64
}

It's located in metrics.go but actually only part of it are metrics... Something like this would be much better:

type NodeInfo struct {
    mu       sync.RWMutex
    UID      string `json:"uid"`
    Name     string `json:"name"`
    Metrics  map[string]int64 `json:"metrics"`
    updated  int64
}

This will allow to abstract various counters and operate by semantic names in code rather than concrete struct fields (i.e. something like Histogram registry but for counters). Also I think it's a good idea to include latencies into that metrics map as flat keys without extra latencies level as it now. Also I don't think we will need floats for metrics. This will allow to keep metrics specific code into separate package.

This is a breaking change unfortunately:(

@FZambia
Copy link
Member Author

FZambia commented Sep 20, 2016

Current latencies implementation is like quantum measurement:) Asking for metrics affects latency metrics significantly. So per specific API command histograms could really make sense. But this is only possible to insert after net/http and request decoding work already done. Will try to investigate this more.

@FZambia
Copy link
Member Author

FZambia commented Dec 17, 2016

done in v1.6.0

@FZambia FZambia closed this as completed Dec 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant