Fix crashing when non-configured consolidation function used #839
Conversation
mdata/aggmetric.go
Outdated
return a.sumMetric.Get(from, to)
agg = a.sumMetric
default:
	panic(fmt.Sprintf("AggMetric.GetAggregated(): unknown consolidator %q", consolidator))
Shouldn't this case be removed?
Not necessarily. While it can be handled below, it is a valid (and separate) issue that unexpected values got passed in here.
Aha, I wasn't thinking about agg getting set to a nil value above. In any case, I'd be very keen to see the panic calls converted to error returns unless there's a compelling reason not to.
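As a rough sketch of what the error-return version could look like (the types and the GetAggregated signature here are simplified stand-ins, not the code in this PR):

```go
package main

import "fmt"

// Simplified stand-ins: the real AggMetric, consolidation.Consolidator and
// the Get/GetAggregated signatures carry more state than shown here.
type Consolidator int

const (
	Sum Consolidator = iota
	Min
	Max
)

type series struct{ points []float64 }

type AggMetric struct {
	sumMetric, minMetric, maxMetric *series
}

// GetAggregated reports unknown or unconfigured consolidators to the caller
// instead of panicking.
func (a *AggMetric) GetAggregated(c Consolidator) (*series, error) {
	var agg *series
	switch c {
	case Sum:
		agg = a.sumMetric
	case Min:
		agg = a.minMetric
	case Max:
		agg = a.maxMetric
	default:
		return nil, fmt.Errorf("AggMetric.GetAggregated(): unknown consolidator %d", c)
	}
	if agg == nil {
		// valid consolidator, but never configured for this metric
		return nil, fmt.Errorf("AggMetric.GetAggregated(): consolidator %d is not configured", c)
	}
	return agg, nil
}

func main() {
	a := &AggMetric{sumMetric: &series{}}
	_, err := a.GetAggregated(Min) // minMetric was never configured
	fmt.Println(err)
}
```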
That's fair. I wasn't sure if panics got treated differently than errors, but (aside from runtime error panics) they both get turned into 500 responses. Ideally, a 400 would be returned for some of these.
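For illustration, the 400-vs-500 distinction boils down to classifying the error before writing the response. This sketch uses the standard net/http package and an invented errUnknownConsolidator sentinel; metrictank's actual handlers go through macaron and look different:

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
)

// errUnknownConsolidator stands in for the "bad user input" class of errors
// discussed above; the name and the consolidateBy parameter are illustrative.
var errUnknownConsolidator = errors.New("unknown consolidator")

func getData(consolidator string) error {
	switch consolidator {
	case "sum", "min", "max":
		return nil
	default:
		return fmt.Errorf("%q: %w", consolidator, errUnknownConsolidator)
	}
}

// handler maps known request errors to 400 and everything else to 500,
// which is the distinction asked for above.
func handler(w http.ResponseWriter, r *http.Request) {
	err := getData(r.URL.Query().Get("consolidateBy"))
	switch {
	case err == nil:
		w.WriteHeader(http.StatusOK)
	case errors.Is(err, errUnknownConsolidator):
		http.Error(w, err.Error(), http.StatusBadRequest) // user error -> 400
	default:
		http.Error(w, err.Error(), http.StatusInternalServerError) // anything unexpected -> 500
	}
}

func main() {
	http.HandleFunc("/render", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```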
note: the "automatic turning panics into http 500 responses" only works when the recover happens (the macaron middleware) in the same goroutine as the panic. in this case not, because we spawn new goroutines for all s.getTargetsLocal calls. (there's one or two other open tickets describing running into this.. unpleasantery. needless to say, significant problem that we need to resolve)
regarding when (not) to panic, we have a policy for that now : https://github.com/grafana/metrictank/blob/master/docs/CONTRIBUTING#L19
if the articulation needs revising, please do tell :)
for this PR, you don't need to worry about cleaning up panics that were already there, we can do that later. (though you're welcome to)
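A generic illustration of that restriction (plain Go, not metrictank or macaron code): a deferred recover only catches panics raised in its own goroutine, so a panic in a spawned goroutine still crashes the process.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// This deferred recover only catches panics raised in main's goroutine.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()

	go func() {
		// A panic in a spawned goroutine is NOT caught by the recover above:
		// it terminates the whole process with a stack trace, which is why the
		// recovery middleware cannot turn it into an http 500 response.
		panic("panic from a spawned goroutine")
	}()

	time.Sleep(100 * time.Millisecond)
}
```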
I switched the panics to errors, but I'm not sure at what granularity we need the metrics to be posted. There could be one metric per error case, or just a generic "GetAggregated" error. I'm also not sure of the naming convention, since the memory index uses "idx.memory" and ccache uses "cache.metric".
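To show what the per-call-site option could look like, here is a sketch that counts the unexpected input and returns an error. The expvar counter and the metric name are stand-ins; metrictank has its own stats package and naming scheme:

```go
package main

import (
	"errors"
	"expvar"
	"fmt"
)

// The expvar counter is a stand-in for metrictank's stats package, and the
// metric name is invented for illustration; the open question is whether to
// keep one counter per call site (as here) or a single generic one.
var unknownConsolidatorErrors = expvar.NewInt("recovered_errors.aggmetric.get_aggregated.unknown_consolidator")

var errUnknownConsolidator = errors.New("unknown consolidator")

func getAggregated(consolidator string) error {
	switch consolidator {
	case "sum", "min", "max":
		return nil
	default:
		// count the unexpected input and surface it to the caller instead of panicking
		unknownConsolidatorErrors.Add(1)
		return fmt.Errorf("getAggregated: %q: %w", consolidator, errUnknownConsolidator)
	}
}

func main() {
	fmt.Println(getAggregated("median"))
	fmt.Println(unknownConsolidatorErrors.Value()) // 1
}
```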
I could have sworn we used to have something that would validate the request early on and avoid trying to request an aggregation we know we don't have. Not sure where it went, as it doesn't seem to exist anymore...
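If such an early validation step were (re)introduced, it could be as simple as the following sketch. The Req type, the configured map and the function name are all hypothetical, not metrictank's real request plumbing:

```go
package main

import "fmt"

// Req and the configured map are hypothetical stand-ins; metrictank's real
// request type and aggregation settings look different.
type Req struct {
	Target       string
	Consolidator string
}

// validateRequests rejects aggregations we know we don't have before any
// per-target goroutines are spawned, so the handler could answer with a 400
// instead of hitting an error (or panic) deep in the read path.
func validateRequests(reqs []Req, configured map[string]bool) error {
	for _, req := range reqs {
		if !configured[req.Consolidator] {
			return fmt.Errorf("target %q: consolidation function %q is not configured", req.Target, req.Consolidator)
		}
	}
	return nil
}

func main() {
	configured := map[string]bool{"sum": true, "min": true, "max": true}
	reqs := []Req{{Target: "some.series", Consolidator: "median"}}
	fmt.Println(validateRequests(reqs, configured))
}
```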
those metrics are really only for cases where we detect the MT source code doing something unexpected.
i added commit 1621a55 which should clarify the error handling & recovered_errors metrics. github didn't let me create a new PR because this one already exists (never mind that i couldn't push to it).
Fixes #835