Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to handle scalar responses #82

Closed
jacksontj opened this issue Jul 10, 2018 · 20 comments
Closed

Unable to handle scalar responses #82

jacksontj opened this issue Jul 10, 2018 · 20 comments
Assignees
Labels
object-proxy affects the object proxy or proxy+cache prometheus Issue affects Prometheus

Comments

@jacksontj
Copy link
Contributor

If I send a simple scalar query such as /api/v1/query?query=5 trickster now returns errors. I bisected this and it seems that the error was introduced in ec4eff3 . The basic problem here is that trickster assumes the response type from prometheus as a vector, when in reality it can be (1) scalar (2) vector or (3) string. The way this worked before was that the unmarshal error was ignored and the response handed back (I have a patch that puts it back to that behavior.

From what I see that looks like caching won't work on those scalar values -- as it wasn't able to unmarshal. Alternatively we could change the model to use model.Value and then add some type-switching. But since that is a potentially large change I figured I'd get others' input first.

@SuperQ
Copy link
Collaborator

SuperQ commented Jul 10, 2018

The main reason we changed to more strict handling was that there were a bunch of cases where talking to Prometheus is interrupted (server restart, crash, etc) and Trickster would cache bad data.

We should probably type switch, like you said.

/cc @juliusv

@jacksontj
Copy link
Contributor Author

If we want to move to type switching I'd suggest consolidating the response and envelope/data to a single set of structs, something like:

type PrometheusResponse struct {
	Status string               `json:"status"`
	Data   PrometheusVectorData `json:"data"`
}

type PrometheusDataEnvelope struct {
	ResultType string       `json:"resultType"`
	Result     model.Value `json:"result"`
}

Then we can change the callers to handle the switching. If you are okay with that plan I'm okay with doing the implementation, I just want to make sure thats something we want before spending the time.

@jacksontj
Copy link
Contributor Author

Something along the lines of master...jacksontj:upstream_type

@jacksontj
Copy link
Contributor Author

Sadly the upstream structs still aren't public, so if we want to attempt to have less code I do have library methods for this in the promxy project, just not sure what feelings are for dependency-- so this approach in my branch might be sufficient.

@juliusv
Copy link
Collaborator

juliusv commented Jul 11, 2018

I haven't looked at anything here yet as I'm still half asleep from a boat trip, but FWIW this is how our API client library does it: https://github.com/prometheus/client_golang/blob/master/api/prometheus/v1/api.go#L160-L200

@SuperQ
Copy link
Collaborator

SuperQ commented Jul 11, 2018

Shouldn't we just use the client_golang API client? 😕

@juliusv
Copy link
Collaborator

juliusv commented Jul 11, 2018

Maybe, yeah :)

@jacksontj
Copy link
Contributor Author

yea, then it'll just return model.Value -- and we can just merge with those. I'll take a look into it more once I get some more time :)

@jacksontj
Copy link
Contributor Author

Although as soon as I started reading, we'll need to make changes to that client, because it only supports sending GET requests as of now.

@juliusv
Copy link
Collaborator

juliusv commented Jul 11, 2018

@jacksontj You at least have my support in updating the API client library to support POST requests (and other maybe still unsupported API features) :)

jacksontj added a commit to jacksontj/trickster that referenced this issue Jul 11, 2018
@jacksontj
Copy link
Contributor Author

To keep this issue up-to-date, I've created prometheus/client_golang#428 upstream on the client to get POST support in there, then we should be ready to go. It seems that this is a new API (not on a released version yet), so we may need to hold off on this "good" fix for a while depending on how the upstream conversation goes.

In light of that (it not being a quick fix) I've opened up #83 to get a workaround in there. That re-introduces the previous behavior (skipping the error and passing the raw response) which was what it was doing before-- so this fixes the regression, but by no means is the long-term fix.

@jranson jranson self-assigned this Apr 1, 2019
@jranson
Copy link
Member

jranson commented Apr 1, 2019

@jacksontj The proxy re-work in 1.0 addresses this, if you want to give it a go. Any Prometheus request that is not known to be a timeseries (e.g., not a query_range) routes through the proxy or the object proxy+cache (depending up on the path; prom requests to /query, as in the example provided, will go through the proxy+cache with a ttl of object_ttl_secs), and is handled generically and an HTTP object. The hang up on this issue w/ 0.1x is that everything coming down from Prom is being unmarshaled, even if it doesn't need to be. @SuperQ & @juliusv Re: the Prom Client, now that we are supporting additional upstream types like InfluxDB, I want to stick to using a generic http client for the proxy whenever possible and only use custom http clients when absolutely necessary.

@jranson jranson added prometheus Issue affects Prometheus object-proxy affects the object proxy or proxy+cache labels Apr 1, 2019
@jacksontj
Copy link
Contributor Author

jacksontj commented Apr 4, 2019

Any Prometheus request that is not known to be a timeseries (e.g., not a query_range) routes through the proxy or the object proxy+cache

To be clear, scalar responses can come from /query or /query_range. As a contrived example, if the promql query is 3 it will return 3-- which is a scalar (regardless of endpoint).

I want to stick to using a generic http client for the proxy whenever possible and only use custom http clients when absolutely necessary.

this will make caching responses difficult, since each timeseries DB has its own format/rules/etc. which will effectively require un-marshaling each of them and re-constructing.

The proxy re-work in 1.0 addresses this, if you want to give it a go.

I'll add it to my list of things to take a look at, unfortunately I've been slammed as of late-- so that will likely take some time for me to get around to, but its on the list!

When I do a quick check of the initial reported query I see the following in the logs:

time=2019-04-04T05:31:14.143123322Z app=trickster caller=proxy/deltaproxycache.go:180 level=error event="proxy object unmarshaling failed" body="{\"status\":\"success\",\"data\":{\"resultType\":\"scalar\",\"result\":[1554355874.142,\"3\"]}}"

Which seems that its just the same behavior as before; namely that it attempts to unmarshal, fails, then returns the un-cached result to the client.

@jranson
Copy link
Member

jranson commented Apr 4, 2019

@jacksontj Are you able to provide the URL to reproduce the log entry you produced? I am unable to get a scalar from a query_range in any of my attempts. Thanks!

Sorry I was not clear on the generic HTTP Proxy - we are implementing custom url/query parsing and response marshaling, per-TSDB type, already (check out that sweet influxdb capability!). What I am trying to ensure is that the actual upstream http requests over-the-wire are handled generically in the proxy, such that a custom client is not making the actual HTTP calls. That way we can enforce things like http headers and what their behavior is, in a single place. This is especially true for Prom where client_golang has parity issues with the API (e..g, not currently supporting POST).

Separately, however, if the client_golang could properly handle both unmarshaling of upstream requests, and marshaling for the downstream response, I'd totally use it. When I last looked at that in Jan '18 when i first developed Trickster, marshaling had issues like capitalization of field names in the json output that made it incompatible with Grafana. Perhaps that has changed and we can revisit? On the unmarshaling side, a quick peek at https://github.com/prometheus/client_golang/blob/master/api/prometheus/v1/api.go#L256 shows that the queryResult type is unexported, and can only be accessed via the client in Query(), QueryRange(), etc. So the golang_client cannot currently decouple object unmarshaling from its originating http request.

@jacksontj
Copy link
Contributor Author

For the log line, I simply used /api/v1/query?query=5 (presumably the same query on a query_range should get the same behavior).

Separately, however, if the client_golang could properly handle both unmarshaling of upstream requests, and marshaling for the downstream response, I'd totally use it.

If this is of interest, I do this marshal/unmarshal in promxy and because of that I already maintain libraries to do the server-side (unfortunately a copy/paste from upstream, but they refuse to make the structs etc. public -- prometheus/prometheus#3615).

@jranson
Copy link
Member

jranson commented Apr 5, 2019

Strange, i tried these (directly against prom):

$ curl http://0:9090/api/v1/query_range?query=5
{"status":"error","errorType":"bad_data","error":"cannot parse \"\" to a valid timestamp"}

$ curl 'http://0:9090/api/v1/query_range?query=5&start=0&end=1'
{"status":"error","errorType":"bad_data","error":"cannot parse \"\" to a valid duration"}

$ curl 'http://0:9090/api/v1/query_range?query=5&start=0&end=1&step=1'
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{},"values":[[0,"5"],[1,"5"]]}]}}

So i am not able to get a scalar from a query_range. Will take a look at all of those options and see if any work better. It does not really make sense for all of us to be rewriting the same structs for the http api body. My inclination would be for the client_golang to provide them if possible, perhaps a PR to that project is in order?

@jacksontj
Copy link
Contributor Author

I had tried this before in prometheus/prometheus#3615 upstream, but I can ask again in client_golang (prometheus/client_golang#571)

@jacksontj
Copy link
Contributor Author

@jranson Also, if you want to get a query_range to return a scalar:

http://localhost:9090/api/v1/query_range?query=5&start=1557282803.584&end=1557286403.584&step=14&_=1557286398170

I simply went to the ui and put 5 in the query then changed to the graph tab.

@jacksontj
Copy link
Contributor Author

@jranson mind chiming in on prometheus/client_golang#571

@jranson
Copy link
Member

jranson commented Jan 17, 2020

I believe this one is completely resolved now, please reopen if 1.0 Beta10 does not resolve. All of the provided URLs in this issue are confirmed to be compatible with Trickster.

@jranson jranson closed this as completed Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-proxy affects the object proxy or proxy+cache prometheus Issue affects Prometheus
Projects
None yet
Development

No branches or pull requests

4 participants