-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP Reset breaks connection without a retry #1020
Comments
This is the related code which decides, whether a request is idempotent https://github.com/golang/go/blob/eca0d44cec58951fb716e540dcc21c0f527686d5/src/net/http/request.go#L1420-L1433 |
Address prometheus#1020. Signed-off-by: Tomáš Dohnálek <dohnto@gmail.com>
* Make Query requests idempotent Address #1020. Signed-off-by: Tomáš Dohnálek <dohnto@gmail.com> * Use empty header Signed-off-by: Tomáš Dohnálek <dohnto@gmail.com> * Document issue with original documentation Signed-off-by: Tomáš Dohnálek <dohnto@gmail.com>
@kakkoyun Give me some time to verify this in production. I am trying to communicate with our teams, whether it would be possible to include this library from master branch rather than a release. |
I believe the fix is working. We deployed a custom build of the Grafana with The @kakkoyun Is there any timeframe to create a release so we could utilize it in production using a regular build process? Thanks |
There will be a release soon, we need to take care of several issues first. Unfortunately, there's no fixed release cadence for the project. |
Use latest released version of client_golang library with prometheus/client_golang#1020 included. This affected Grafana quering a Prometheus datasource behind a flaky load balancer (reseting long running connections) and resulted in user visible error in Grafana. With this change included, Grafana will retry the query.
@kakkoyun - any perspective on when this will be released? I see it was not included in the most recent prometheus/client_golang release (Nov 2). |
@skeilson There will be a release. Hopefully this week. |
@kakkoyun any chance of a timeframe for this to be released? I see v.1.14.0 was released on Nov 8th and it does not seem to include this PR. |
Do we have an ETA on the next release where the fix is included? Thanks |
My setup
We run a Grafana which queries Prometheus. Inbetween Grafana and Prometheus, there is a Google Cloud Load Balancer L7 which terminates SSL:
Issue
Grafana user opens a dashboard and Grafana opens a long running TCP connection. Rarely we observe a TCP Reset sent by the Load Balancer, which Grafana displays to user as 5xx error. The request is not retried.
Background
Grafana utilizes
client_golang
library and in particular itshttpapi.QueryRange
method which relies onapiClientImpl.DoGetFallback
. This method tries to perform the query using POST request and in case of specific issues (405) it fallbacks to GET.Prometheus' client prefers to use POST over GET, but this causes inconsistencies as POST is not consider idempotent.
Quote from
net/http
More information about this retry feature can be found in https://go-review.googlesource.com/c/go/+/3210/9//COMMIT_MSG .
I found it really confusing, that when the implementation of
apiClientImpl.DoGetFallback
fallbacks to GET request it might recover from some connectivity issues while with default POST request, it will just error.Expectations
For me ideally I would prefer making the POST request idempotent (as I believe they are; using one of the
Idempotency-*
header?). I am not sure if this would be acceptable as this changes the behavior of the client a bit.Reproducer
None yet.
The text was updated successfully, but these errors were encountered: