[Query] Partial response doesn't work if one of stores returned timeout #835

Closed
d-ulyanov opened this issue Feb 11, 2019 · 8 comments

@d-ulyanov
Contributor

Hi, Thanos team!

What happened:
Thanos Query returns an error (context deadline exceeded) if one of the requested stores fails with a timeout.
The partial response parameter is enabled.

Expected behaviour:
Thanos returns the data that was fetched successfully, plus a warning that one of the stores failed.

At the same time, there is another case: if one of the stores is not available at all (e.g. the process is not running), Thanos Query returns partial data and a warning.
It seems that the behaviour should be the same in both cases.

@bwplotka what do you think?

Thanos v0.2.1

@bwplotka
Member

Well, is it because the request to one of the stores actually took too long, and either your client or the Thanos Query server timeout killed the request? (: So it's not a partial error: the whole request took too long, even though the root cause was one slow store.

So the question now is: what would you expect in this case? (:

@d-ulyanov
Contributor Author

d-ulyanov commented Feb 12, 2019

Yes, in this case Thanos Query killed the request, but some stores had already returned data.
I think it's better to support partial degradation and return some data along with a warning that some of the stores timed out.

I think this case should not behave differently from the case where one of the stores is down entirely.

@d-ulyanov d-ulyanov reopened this Feb 12, 2019
@d-ulyanov
Contributor Author

d-ulyanov commented Feb 13, 2019

@bwplotka any ideas?)
One possible solution is to add an additional parameter like store.read_timeout and use it to close requests to particular stores.

@bwplotka
Member

bwplotka commented Feb 25, 2019

Hm.. yes, as long as store.read_timeout is smaller than the client timeout, that would make sense. But overall it's a tradeoff between availability & consistency. Partial response can indeed surprise you, especially when you use 3rd party tools like Grafana, where the warning is not even there.

I am fine with accepting a PR with your idea, though, to make this internal timeout configurable for Thanos Querier 👍

@d-ulyanov
Contributor Author

@bwplotka Hey! Here it is :)
#895

@bwplotka
Member

Hm.. so @povilasv fixed this issue IMO with this: #928

@bwplotka
Member

So what is #895 adding? ;p

@d-ulyanov
Contributor Author

Moving discussion to #1453
