
API errors are incorrectly reported as peer errors #244

Closed
BRMatt opened this issue Apr 6, 2021 · 1 comment · Fixed by #247


@BRMatt (Contributor) commented Apr 6, 2021

Refinery appears to decide whether a failed outgoing event was destined for a peer or for the Honeycomb API by checking the event's API host option:

```go
if honeycombAPI == apiHost {
	// if the API host matches the configured honeycomb API,
	// count it as an API error
	d.Metrics.Increment(d.Name + counterResponseErrorsAPI)
} else {
	// otherwise, it's probably a peer error
	d.Metrics.Increment(d.Name + counterResponseErrorsPeer)
}
```
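To make the failure mode concrete, here is a minimal standalone sketch of the check above. The `counterRecorder` type, the `recordResponseError` function, and the exact suffix strings are hypothetical stand-ins, not Refinery's actual code. The point is only that any host string other than the configured Honeycomb API host, including an empty one, falls into the peer branch.

```go
package main

import "fmt"

// counterRecorder is a hypothetical stand-in for Refinery's metrics
// interface; it only records how often each counter name is incremented.
type counterRecorder map[string]int

func (c counterRecorder) Increment(name string) { c[name]++ }

// Assumed suffix values mirroring counterResponseErrorsAPI and
// counterResponseErrorsPeer (not copied from Refinery's source).
const (
	counterResponseErrorsAPI  = "_response_errors_api"
	counterResponseErrorsPeer = "_response_errors_peer"
)

// recordResponseError reproduces the branching in the snippet above:
// only an exact match against the configured Honeycomb API host counts
// as an API error; anything else, including an empty host, counts as a
// peer error.
func recordResponseError(m counterRecorder, transmissionName, honeycombAPI, apiHost string) {
	if honeycombAPI == apiHost {
		m.Increment(transmissionName + counterResponseErrorsAPI)
	} else {
		m.Increment(transmissionName + counterResponseErrorsPeer)
	}
}

func main() {
	m := counterRecorder{}
	const honeycombAPI = "https://api.honeycomb.io"

	// Event whose metadata carried the API host: counted as an API error.
	recordResponseError(m, "upstream", honeycombAPI, honeycombAPI)
	// Event with an empty api_host, as in the log line below: counted as
	// a peer error even though it was sent to Honeycomb.
	recordResponseError(m, "upstream", honeycombAPI, "")

	fmt.Println(m["upstream_response_errors_api"], m["upstream_response_errors_peer"]) // prints: 1 1
}
```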

During a conversation in the Pollinators Slack, we noticed that for some messages Refinery's log lines showed an empty `api_host`, even though the event was being sent to Honeycomb:

```
Apr 01 12:13:01 production-refinery-i-00c73eea958d0b46c refinery[11758]: time="2021-04-01T12:13:01Z" level=error msg="non-20x response when sending event" api_host= dataset= error="Post \"https://api.honeycomb.io/1/batch/production.traces\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" event_type= status_code=0 target=
```

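Because an empty `api_host` never matches the configured Honeycomb API host, the check above falls into the `else` branch, so timeouts from the upstream Honeycomb API were counted as peer errors: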

[Screenshot: Image 2021-04-06 at 10 36 30 am]

This is misleading: it leads operators to suspect a problem within their Refinery cluster when the errors actually come from the Honeycomb API.

Given that the metrics are already prefixed with `upstream_` or `peer_`, do we really need the `_api` and `_peer` suffixes? They are confusing: for example, how should `upstream_response_errors_peer` differ from `peer_response_errors_peer`?

Would it be possible to remove these suffixes so that operators can tell from the prefix alone whether a metric relates to the Honeycomb API or to a peer?
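A minimal sketch of this proposal, assuming (as the `upstream_`/`peer_` prefixes suggest) that Refinery sends to the Honeycomb API and to peers through two separately named senders. The `transmission` type and the `_response_errors` counter name are hypothetical, and this is not necessarily how #247 resolved the issue. Every failed response is counted under the sender's own prefix, so the metric no longer depends on the event's `api_host`.

```go
package main

import "fmt"

// counterRecorder is a hypothetical stand-in for Refinery's metrics interface.
type counterRecorder map[string]int

func (c counterRecorder) Increment(name string) { c[name]++ }

// transmission is a hypothetical stand-in for one outgoing sender:
// "upstream" (to the Honeycomb API) or "peer" (to other cluster members).
type transmission struct {
	Name    string
	Metrics counterRecorder
}

// recordResponseError counts every failed response under the sender's
// own prefix; per-event metadata such as api_host is not consulted.
func (t transmission) recordResponseError() {
	t.Metrics.Increment(t.Name + "_response_errors")
}

func main() {
	m := counterRecorder{}
	upstream := transmission{Name: "upstream", Metrics: m}
	peer := transmission{Name: "peer", Metrics: m}

	// A timeout talking to the Honeycomb API, even with an empty api_host.
	upstream.recordResponseError()
	// A failure forwarding an event to another Refinery node.
	peer.recordResponseError()

	fmt.Println(m["upstream_response_errors"], m["peer_response_errors"]) // prints: 1 1
}
```

Because the classification comes from which sender failed rather than from per-event metadata, a missing `api_host` can no longer move an upstream timeout into a peer metric.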

@vreynolds (Contributor)

Thanks for letting us know, Matt! 👋 We'll take a look as soon as we can.
