Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store: Fix panic on nil resp from response heap #5717

Closed
wants to merge 2 commits into from

Conversation

matej-g
Copy link
Collaborator

@matej-g matej-g commented Sep 23, 2022

Signed-off-by: Matej Gera matejgera@gmail.com

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

As part of refactoring MultiTSDB to use proxy methods (#5552), I noticed that lazy evaluation strategy is causing the receiver to panic in some cases. Upon closer looker, I realized this is happening when the resp from heap is nil (which can happen if there are no responses, seems like this can happen if e.g. I query for non-existing metric). We still send the nil resp further down the line, causing a panic. This fix checks for nil and skips a resp if this is true.

Verification

Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Matej Gera <matejgera@gmail.com>
@fpetkovski
Copy link
Contributor

Hmm, shouldn’t next return false in this case?

@matej-g
Copy link
Collaborator Author

matej-g commented Sep 26, 2022

Hmm, shouldn’t next return false in this case?

So I haven't dived too much into that new heap code, I assumed this is a valid case since we can return nil from the At() method on the dedup heap (cc @GiedriusS).

@GiedriusS
Copy link
Member

Hmm, shouldn’t next return false in this case?

So I haven't dived too much into that new heap code, I assumed this is a valid case since we can return nil from the At() method on the dedup heap (cc @GiedriusS).

Maybe you could point to exact tests which are failing because of this? It would be interesting to check this out with a debugger.

@matej-g
Copy link
Collaborator Author

matej-g commented Sep 28, 2022

You can pull this branch and just change evaluation strategy at here.

For example, this rule E2E test fails as receiver will panic. But I think you can easily trigger this also with just setting up query+receiver and querying for a non-existing metric.

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a look at this and indeed nil needs to be handled but I think at a different place. I think we could either improve lazyRespSet's Empty() to wait for at least one response to be able to determine whether it is empty or we could handle nil's in Less(). I think the latter isn't enough because we might build an invalid heap? In other words, h.Min().rs might point to some respSet that isn't actually the "smallest" one with lazy proxying if my thinking is correct 🤔

GiedriusS added a commit to vinted/thanos that referenced this pull request Sep 30, 2022
With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes thanos-io#5717
Fixes thanos-io#5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@matej-g
Copy link
Collaborator Author

matej-g commented Sep 30, 2022

Not needed, will be addressed by #5746

@matej-g matej-g closed this Sep 30, 2022
GiedriusS added a commit that referenced this pull request Oct 2, 2022
* store: fix nil panic in proxy heap

With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes #5717
Fixes #5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: panic if At() called without Next()

This is pretty much a bug if this ever happens so complain loudly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: adjust test after recent changes

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
matej-g pushed a commit to matej-g/thanos that referenced this pull request Oct 5, 2022
With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes thanos-io#5717
Fixes thanos-io#5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
matej-g added a commit that referenced this pull request Oct 6, 2022
* Introduce store type

- This commit introduces StoreType and end replaces / expands the Addr() method to be StoreInfo() method.
This method returns information on whether the store is local (case of MultiTSDB) or remote. If store is remote, we also return it's address in the StoreInfo() method
- Adjust relevant parts of code in proxy
- Adjust proxy tests

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Adjust existing proxy.Client implementations

- Adjust (and slightly refactor) endpoint set client
- Adjust test implementations

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Introduce local client implementation of proxy.Client interface

- Introduces new type in MultiTSDB
- Introduce method to obtain client for a tenant. This client leverages the server-as-client store client implementation.
- Adjust tests

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Add receive to interactive test

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove old MultiTSDB code

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Fix imports

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Fix formatting

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove forgotten dead code

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove store type and adjust Addr method instead

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Disable timeout in receiver proxy

Signed-off-by: Matej Gera <matejgera@gmail.com>

* store: fix nil panic in proxy heap

With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes #5717
Fixes #5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Use lazy retrieval strategy for receive proxy

Signed-off-by: Matej Gera <matejgera@gmail.com>

Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
utukJ pushed a commit to utukJ/thanos that referenced this pull request Oct 13, 2022
* store: fix nil panic in proxy heap

With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes thanos-io#5717
Fixes thanos-io#5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: panic if At() called without Next()

This is pretty much a bug if this ever happens so complain loudly.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* store: adjust test after recent changes

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: utukj <utukphd@gmail.com>
utukJ pushed a commit to utukJ/thanos that referenced this pull request Oct 13, 2022
* Introduce store type

- This commit introduces StoreType and end replaces / expands the Addr() method to be StoreInfo() method.
This method returns information on whether the store is local (case of MultiTSDB) or remote. If store is remote, we also return it's address in the StoreInfo() method
- Adjust relevant parts of code in proxy
- Adjust proxy tests

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Adjust existing proxy.Client implementations

- Adjust (and slightly refactor) endpoint set client
- Adjust test implementations

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Introduce local client implementation of proxy.Client interface

- Introduces new type in MultiTSDB
- Introduce method to obtain client for a tenant. This client leverages the server-as-client store client implementation.
- Adjust tests

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Add receive to interactive test

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove old MultiTSDB code

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Fix imports

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Fix formatting

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove forgotten dead code

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Remove store type and adjust Addr method instead

Signed-off-by: Matej Gera <matejgera@gmail.com>

* Disable timeout in receiver proxy

Signed-off-by: Matej Gera <matejgera@gmail.com>

* store: fix nil panic in proxy heap

With lazy proxying we need to wait for at least one response before we
can build a heap properly.

Closes thanos-io#5717
Fixes thanos-io#5552.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Use lazy retrieval strategy for receive proxy

Signed-off-by: Matej Gera <matejgera@gmail.com>

Signed-off-by: Matej Gera <matejgera@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: utukj <utukphd@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants