-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BS: Add metrics for the keepalive messages #3151
Conversation
d32f7b0
to
8c3e535
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @karampok and @lukedirtwalker)
go/beacon_srv/internal/keepalive/handler.go, line 94 at r1 (raw file):
keepalive, ok := h.request.Message.(*ifid.IFID) if !ok { increaseReceiveErrors("unknown")
Use labels as described below in metrics.go
make this a constant.
also unknown_ifid
, otherwise it is ambiguous
use ifid = 0
, since it is a placeholder ifid
go/beacon_srv/internal/keepalive/handler.go, line 102 at r1 (raw file):
ifid, info, err := h.getIntfInfo() if err != nil { increaseReceiveErrors("unknown")
same, probably different err message. e.g. err_processing
go/beacon_srv/internal/keepalive/handler.go, line 109 at r1 (raw file):
h.startPush(ifid) if err := h.dropRevs(ifid, keepalive.OrigIfID, info.TopoInfo().ISD_AS); err != nil { increaseReceiveErrors(ifid.String())
err_db
go/beacon_srv/internal/keepalive/metrics.go, line 25 at r1 (raw file):
) //bs_keepalive_receive_errs_total
I assume these lines are here for easy grepping ?
In that case, please add a comment on top describing the purpose.
- space between
//
andbs
go/beacon_srv/internal/keepalive/metrics.go, line 31 at r1 (raw file):
const ( promNamespace = "bs_keepalive"
Namespace should be bs
and the subsystem keepalive
. see #3104 (comment)
go/beacon_srv/internal/keepalive/metrics.go, line 37 at r1 (raw file):
initOnce sync.Once outMsg *prometheus.CounterVec
instead of having outMsg
and outErr
, have one counter vector with labels that distinguish the results.
This allows for easy sums in promql
e.g. labels: ifid: {1,2,...} ,
result`= { "success", "err_payload", "err_send", "err_db"}
bs_keepalive_sent_total(ifid, result) count
bs_keepalive_received_total(ifid, result) count
go/beacon_srv/internal/keepalive/metrics.go, line 44 at r1 (raw file):
// InitMetrics initializes the metrics func InitMetrics() {
the way this is done now, you could also simply use a init
function.
there is no way of using the package without calling InitMetrics
first anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @karampok and @oncilla)
go/beacon_srv/internal/keepalive/metrics.go, line 25 at r1 (raw file):
Previously, Oncilla wrote…
I assume these lines are here for easy grepping ?
In that case, please add a comment on top describing the purpose.
- space between
//
andbs
I would maybe add them to the variable declaration. So it's a bit closer and would maybe be seen when doing refactorings.
go/beacon_srv/internal/keepalive/handler.go, line 102 at r1 (raw file): Previously, Oncilla wrote…
I have on purpose avoid having label on what error it is. |
go/beacon_srv/internal/keepalive/metrics.go, line 37 at r1 (raw file): Previously, Oncilla wrote…
Having labels with high cardinality will blow up eventually the prometheus server. And i followed the approach of node_exporter which is |
go/beacon_srv/internal/keepalive/metrics.go, line 25 at r1 (raw file): Previously, lukedirtwalker (Lukas Vogel) wrote…
I think this would add more confusion, so I remove them. In all case you can grep by |
go/beacon_srv/internal/keepalive/metrics.go, line 44 at r1 (raw file): Previously, Oncilla wrote…
I like the |
8c3e535
to
02be0ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 5 files reviewed, 7 unresolved discussions (waiting on @oncilla)
go/beacon_srv/internal/keepalive/handler.go, line 94 at r1 (raw file):
Previously, Oncilla wrote…
Use labels as described below in metrics.go
make this a constant.
alsounknown_ifid
, otherwise it is ambiguous
useifid = 0
, since it is a placeholder ifid
Done.
go/beacon_srv/internal/keepalive/handler.go, line 102 at r1 (raw file):
Previously, karampok (Konstantinos) wrote…
I have on purpose avoid having label on what error it is.
The idea is that we have an alert on a metric error that will show the errors on bs on server X are increasing. In that moment ops should go to logs to see details. What do you think?
done
go/beacon_srv/internal/keepalive/handler.go, line 109 at r1 (raw file):
Previously, Oncilla wrote…
err_db
need still to do that
go/beacon_srv/internal/keepalive/metrics.go, line 31 at r1 (raw file):
Previously, Oncilla wrote…
Namespace should be
bs
and the subsystemkeepalive
. see #3104 (comment)
Done.
85a237f
to
fbb7782
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r2.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @karampok and @oncilla)
go/beacon_srv/internal/keepalive/handler.go, line 102 at r1 (raw file):
Previously, karampok (Konstantinos) wrote…
done
ops can just do err_*
to get all error counts.
I think it is nice to a some coarse grained distinction in the errors.
Not in this particular case, but let's assume the following:
You have a handler that needs to respond with a message
and it can either fail in the database access, sending the response or some internal error
if they are 3 different errors, you can do better alerting. E.g. the severity of internal error is probably higher than db error. And the db error is probably higher than the response error.
go/beacon_srv/internal/keepalive/handler_test.go, line 166 at r2 (raw file):
} func waitTimeout(wg *sync.WaitGroup, t *testing.T) {
testing usually should be the first argument.
go/beacon_srv/internal/keepalive/sender.go, line 57 at r2 (raw file):
} var sentIfids []common.IFIDType for ifid, intf := range topo.IFInfoMap {
do l := metrics.KeepaliveLabel{IfID: ifid, Result: metrics.ErrProcess}
here
go/beacon_srv/internal/keepalive/sender.go, line 75 at r2 (raw file):
if err := s.Send(msg, ov); err != nil { logger.Error("[keepalive.Sender] Unable to send packet", "err", err) l := metrics.KeepaliveLabels{Result: metrics.ProcessErr}
drop l := ...
go/beacon_srv/internal/keepalive/sender.go, line 82 at r2 (raw file):
sentIfids = append(sentIfids, ifid) l := metrics.KeepaliveLabels{IfID: ifid, Result: metrics.ProcessErr}
l.Result = metrics.Success
go/beacon_srv/internal/metrics/keepalive.go, line 35 at r2 (raw file):
} func (l *KeepaliveLabels) values() []string {
Values
go/beacon_srv/internal/metrics/keepalive.go, line 56 at r2 (raw file):
} // Transmits returns transmit counter
period at the end.
go/beacon_srv/internal/metrics/keepalive.go, line 61 at r2 (raw file):
} // Receives returns the receive counter
period.
go/beacon_srv/internal/metrics/metrics.go, line 21 at r2 (raw file):
// Success indicates a successful result. Success string = "success"
should be = prom.Success
(you need to rebase on master first)
go/beacon_srv/internal/metrics/metrics.go, line 23 at r2 (raw file):
Success string = "success" // ProcessErr indicates an error during processing. ProcessErr string = "process_err"
Should be ErrProcess string = "process_err"
go/beacon_srv/internal/metrics/keepalive.go, line 35 at r2 (raw file): Previously, Oncilla wrote…
Why do we need to export that? |
42c7fc3
to
87e17ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 8 files reviewed, 11 unresolved discussions (waiting on @oncilla)
go/beacon_srv/internal/keepalive/handler.go, line 102 at r1 (raw file):
Previously, Oncilla wrote…
ops can just do
err_*
to get all error counts.
I think it is nice to a some coarse grained distinction in the errors.Not in this particular case, but let's assume the following:
You have a handler that needs to respond with a message
and it can either fail in the database access, sending the response or some internal errorif they are 3 different errors, you can do better alerting. E.g. the severity of internal error is probably higher than db error. And the db error is probably higher than the response error.
there is no wrong or right thing, but let's try not to pre-optimise.
go/beacon_srv/internal/keepalive/handler.go, line 109 at r1 (raw file):
Previously, karampok (Konstantinos) wrote…
need still to do that
As far as I see the only the dropRevs
, I would let the generic error message.
go/beacon_srv/internal/keepalive/handler_test.go, line 166 at r2 (raw file):
Previously, Oncilla wrote…
testing usually should be the first argument.
Done.
go/beacon_srv/internal/keepalive/sender.go, line 57 at r2 (raw file):
Previously, Oncilla wrote…
do
l := metrics.KeepaliveLabel{IfID: ifid, Result: metrics.ErrProcess}
here
Done.
go/beacon_srv/internal/keepalive/sender.go, line 75 at r2 (raw file):
Previously, Oncilla wrote…
drop
l := ...
Done.
go/beacon_srv/internal/keepalive/sender.go, line 82 at r2 (raw file):
Previously, Oncilla wrote…
l.Result = metrics.Success
Done.
go/beacon_srv/internal/metrics/keepalive.go, line 56 at r2 (raw file):
Previously, Oncilla wrote…
period at the end.
Done.
go/beacon_srv/internal/metrics/keepalive.go, line 61 at r2 (raw file):
Previously, Oncilla wrote…
period.
Done.
go/beacon_srv/internal/metrics/metrics.go, line 21 at r2 (raw file):
Previously, Oncilla wrote…
should be
= prom.Success
(you need to rebase on master first)
Done.
go/beacon_srv/internal/metrics/metrics.go, line 23 at r2 (raw file):
Previously, Oncilla wrote…
Should be
ErrProcess string = "process_err"
This should be also from prom or we prefer to keep that value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 7 files at r3.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @karampok and @oncilla)
go/beacon_srv/internal/keepalive/sender.go, line 57 at r2 (raw file):
Previously, karampok (Konstantinos) wrote…
Done.
ifid is still missing
go/beacon_srv/internal/metrics/keepalive.go, line 35 at r2 (raw file):
Previously, karampok (Konstantinos) wrote…
Why do we need to export that?
for them to be testable.
go/beacon_srv/internal/metrics/metrics.go, line 23 at r2 (raw file):
Previously, karampok (Konstantinos) wrote…
This should be also from prom or we prefer to keep that value?
hm, yeah it is probably general enough to go to prom.
You decide
* remove goconvey from keepalive tests Contributes scionproto#3104
87e17ad
to
4400d62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 9 files reviewed, 1 unresolved discussion (waiting on @oncilla)
go/beacon_srv/internal/keepalive/sender.go, line 57 at r2 (raw file):
Previously, Oncilla wrote…
ifid is still missing
I cannot get used to reviewable :)
go/beacon_srv/internal/metrics/keepalive.go, line 35 at r2 (raw file):
Previously, Oncilla wrote…
for them to be testable.
Maybe we could consider something like that
https://medium.com/@robiplus/golang-trick-export-for-test-aa16cbd7b8cd
The risk is that people might start using it and then we want to refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r4.
Reviewable status: complete! all files reviewed, all discussions resolved
This PR works the first bullet on #3104
This change is