Skip to content
This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

Fix crashing when non-configured consolidation function used #839

Merged
merged 2 commits into from
Jan 31, 2018

Conversation

shanson7
Copy link
Collaborator

Fixes #835

return a.sumMetric.Get(from, to)
agg = a.sumMetric
default:
panic(fmt.Sprintf("AggMetric.GetAggregated(): unknown consolidator %q", consolidator))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this case be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. While it can be handled below, it is a valid (and separate) issue that unexpected values got passed in here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I wasn't thinking about agg getting set to a nil value above. In any case, I'd be very keen to see the panic calls converted to error returns unless there's a compelling reason not to.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I wasn't sure if panics got treated differently than errors, but (aside from runtime error panics) they both get turned into 500 responses. Ideally, a 400 would return from some of these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: the "automatic turning panics into http 500 responses" only works when the recover happens (the macaron middleware) in the same goroutine as the panic. in this case not, because we spawn new goroutines for all s.getTargetsLocal calls. (there's one or two other open tickets describing running into this.. unpleasantery. needless to say, significant problem that we need to resolve)

regarding when (not) to panic, we have a policy for that now : https://github.com/grafana/metrictank/blob/master/docs/CONTRIBUTING#L19
if the articulation needs revising, please do tell :)

for this PR, you don't need to worry about cleaning up panics that were already there, we can do that later. (though you're welcome to)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched the panics to errors, but I'm not sure at what granularity we need the metrics to be posted. There could be a metric per, or just a generic "GetAggregated" error. I'm also not sure of the naming convention, since the memory index does "idx.memory" and ccache does "cache.metric"

@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 30, 2018

I could have sworn we used to have something that would validate the request early on in the request and avoid that we would try to request an aggregation we know we don't have. not sure where it went, as it doesn't seem to exist anymore...

but I'm not sure at what granularity we need the metrics to be posted

those metrics are really only for cases where we detect the MT source code doing something unexpected.
for the "aggregator is not enabled" case, since at this point we don't seem to have the guard mentioned above, then hitting that condition is perfectly normal and not worth incrementing metrics for. so don't worry about that aspect
for the others, maybe i'll just write the code tomorrow.

@Dieterbe Dieterbe merged commit 39274c0 into grafana:master Jan 31, 2018
@Dieterbe
Copy link
Contributor

Dieterbe commented Jan 31, 2018

i added commit 1621a55 which should clarify the error handling & recovered_errors metrics. github didn't let me create a new PR because this one already exists (never mind that i couldn't push to it).
anyway i merged it to master.
there's no clear standard yet on the formatting of those errors. i try to use some of the package/model hierarchy for the first 2 fields, and the type of error for the 3rd.

@Aergonus Aergonus deleted the no_panic_for_cons branch May 7, 2018 22:30
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants