This repository has been archived by the owner on Aug 23, 2023. It is now read-only.

cassandra errors matches response.Error interface but not semantics (not http codes) #1678

Closed
Dieterbe opened this issue Feb 17, 2020 · 1 comment

Comments

@Dieterbe
Contributor

In api/response we often call WrapError to "decorate" any errors we may have and shoehorn them into our response.ErrorResp format, which has a code and an error message. The code is assumed to be an HTTP response code.
We do this by checking (in WrapError) whether the error implements this interface:

type Error interface {
        Code() int
        Error() string
}

However, the cassandra storage plugin has a Search method which may return errors that originate in the gocql package.
Cassandra has its own list of error codes. For a non-definitive, unofficial list, see https://stackoverflow.com/questions/48304330/list-of-cassandra-error-codes
These codes are also defined in gocql (see top of errors.go) and parsed (see framer.parseErrorFrame in frame.go), which returns "enhanced" types such as RequestErrWriteTimeout. But each of these enhanced types embeds this type:

type errorFrame struct {
	frameHeader

	code    int
	message string
}
func (e errorFrame) Code() int {
	return e.code
}
func (e errorFrame) Message() string {
	return e.message
}
func (e errorFrame) Error() string {
	return e.Message()
}
(...)

Thus, the gocql library provides error types that satisfy our Error interface by signature, but not by semantics: their codes are Cassandra protocol codes (such as timeout), not HTTP status codes. See this example from a customer's prod environment:

Feb 14 17:16:48 metrictank02 metrictank: 2020-02-14 17:16:48.500 [ERROR] HTTP getData() Operation timed out - received only 0 responses.
Feb 14 17:16:48 metrictank02 metrictank: 2020-02-14 17:16:48.500 [WARNING] Encountered invalid HTTP status code 4608, printing stack
Feb 14 17:16:48 metrictank02 metrictank: goroutine 2286701136 [running]:
Feb 14 17:16:48 metrictank02 metrictank: runtime/debug.Stack(0x8bcbaf, 0xc000130240, 0xfdaa70)
Feb 14 17:16:48 metrictank02 metrictank: /usr/local/go/src/runtime/debug/stack.go:24 +0xa7
Feb 14 17:16:48 metrictank02 metrictank: runtime/debug.PrintStack()
Feb 14 17:16:48 metrictank02 metrictank: /usr/local/go/src/runtime/debug/stack.go:16 +0x22
Feb 14 17:16:48 metrictank02 metrictank: github.com/grafana/metrictank/api/response.(*ErrorResp).ValidateAndFixCode(0xc08711f140)
Feb 14 17:16:48 metrictank02 metrictank: /go/src/github.com/grafana/metrictank/api/response/error.go:97 +0xa7
Feb 14 17:16:48 metrictank02 metrictank: github.com/grafana/metrictank/api/response.WrapError(0x7f3392ba8718, 0xc10905bc80, 0xc114ce7700)
Feb 14 17:16:48 metrictank02 metrictank: /go/src/github.com/grafana/metrictank/api/response/error.go:34 +0xe0
Feb 14 17:16:48 metrictank02 metrictank: github.com/grafana/metrictank/api.(*Server).getData(0xc0000aa240, 0xc08a333be0, 0xc00020b200, 0x3, 0x4)
Feb 14 17:16:48 metrictank02 metrictank: /go/src/github.com/grafana/metrictank/api/cluster.go:278 +0x223
Feb 14 17:16:48 metrictank02 metrictank: github.com/grafana/metrictank/api.(*Server).getData-fm(0xc08a333be0, 0xc00020b200, 0x3, 0x4)
Feb 14 17:16:48 metrictank02 metrictank: /go/src/github.com/grafana/metrictank/api/routes.go:39 +0x52
(...)

This is the Cassandra read timeout, which has code 4608 (0x1200).

@shanson7
Collaborator

See also #984 and #987 for more history of this issue

fitzoh added a commit that referenced this issue Feb 22, 2020
This was leading to nonsense HTTP status codes as seen in #1678.
@Dieterbe Dieterbe closed this as completed Mar 9, 2020
@Dieterbe Dieterbe added this to the sprint-8 milestone Mar 9, 2020