Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Receive] Fix race condition when adding multiple new tenants at once #7941

Merged
merged 5 commits into from
Dec 3, 2024

Conversation

jnyi
Copy link
Contributor

@jnyi jnyi commented Nov 25, 2024

Update: actually fix the issue instead of reverting the old one, memorize the TSDB client list is valuable to avoid creating thousands of slices in memory, see a6fbb9f

This reverted PR #7782 and fixed Issue #7892

Reproducible by newly added unit tests TestMultiTSDBAddNewTenant:

Screenshot 2024-11-25 at 10 00 54 AM

After this fix, unit test would pass

=== RUN   TestMultiTSDBAddNewTenant
=== PAUSE TestMultiTSDBAddNewTenant
=== CONT  TestMultiTSDBAddNewTenant
--- PASS: TestMultiTSDBAddNewTenant (13.82s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-0
    --- PASS: TestMultiTSDBAddNewTenant/iteration-0 (1.39s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-1
    --- PASS: TestMultiTSDBAddNewTenant/iteration-1 (1.36s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-2
    --- PASS: TestMultiTSDBAddNewTenant/iteration-2 (1.39s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-3
    --- PASS: TestMultiTSDBAddNewTenant/iteration-3 (1.40s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-4
    --- PASS: TestMultiTSDBAddNewTenant/iteration-4 (1.40s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-5
    --- PASS: TestMultiTSDBAddNewTenant/iteration-5 (1.36s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-6
    --- PASS: TestMultiTSDBAddNewTenant/iteration-6 (1.37s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-7
    --- PASS: TestMultiTSDBAddNewTenant/iteration-7 (1.38s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-8
    --- PASS: TestMultiTSDBAddNewTenant/iteration-8 (1.41s)
=== RUN   TestMultiTSDBAddNewTenant/iteration-9
    --- PASS: TestMultiTSDBAddNewTenant/iteration-9 (1.36s)
PASS
  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

@jnyi jnyi changed the title [Receive] Fix issue-7892 race condition [Receive] Fix race condition when adding multiple new tenants at once Nov 25, 2024
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's not a fix really but a revert. Thanks for the test which reproduces the bug, it will be much easier to fix. I think we should fix this instead of reverting because the cost of creating this slice is not trivial when you have thousands of selects per second.

@jnyi
Copy link
Contributor Author

jnyi commented Nov 27, 2024

hi @GiedriusS

I got your point, this client list should be relatively stable most of the time (wasteful of memory to create slices), I spent some time to actually fix it, appreciate another review

pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
@jnyi
Copy link
Contributor Author

jnyi commented Nov 27, 2024

I think the e2e test failure is tranisent, but i don't have permission to rerun

@jnyi jnyi requested a review from GiedriusS November 27, 2024 18:21
@jnyi jnyi force-pushed the issue-7892 branch 2 times, most recently from 519b937 to ea3c2a0 Compare November 27, 2024 18:50
jnyi added 4 commits November 28, 2024 12:04
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
pkg/receive/multitsdb.go Outdated Show resolved Hide resolved
@jnyi jnyi requested a review from GiedriusS December 2, 2024 18:51
@jnyi jnyi force-pushed the issue-7892 branch 4 times, most recently from 8582bfc to 982408e Compare December 2, 2024 20:36
GiedriusS
GiedriusS previously approved these changes Dec 2, 2024
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 thanks a lot for this and sorry for the problems

@GiedriusS GiedriusS enabled auto-merge (squash) December 2, 2024 20:39
auto-merge was automatically disabled December 2, 2024 21:04

Head branch was pushed to by a user without write access

@jnyi jnyi force-pushed the issue-7892 branch 2 times, most recently from 28580fd to 83b09f5 Compare December 2, 2024 21:30
Signed-off-by: Yi Jin <yi.jin@databricks.com>
@jnyi
Copy link
Contributor Author

jnyi commented Dec 2, 2024

🚀 thanks a lot for this and sorry for the problems

no worries, I've tried to fix the tests, might help merge it since all checks pass

@jnyi jnyi requested a review from GiedriusS December 2, 2024 22:06
Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use force pushes in the future as it makes it hard to follow what changes between reviews, thanks!

@GiedriusS GiedriusS merged commit 1ea4e69 into thanos-io:main Dec 3, 2024
22 checks passed
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request Dec 3, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
saswatamcode pushed a commit to saswatamcode/thanos that referenced this pull request Dec 3, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
saswatamcode added a commit that referenced this pull request Dec 4, 2024
* Merge pull request #7674 from didukh86/query_frontend_tls_redis_fix

Query-frontend: Fix connection to Redis cluster with TLS.
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Capnp: Use segment from existing message (#7945)

* Capnp: Use segment from existing message

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

* Downgrade capnproto

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* [Receive] Fix race condition when adding multiple new tenants at once (#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Cut patch release v0.37.1

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Update promql-engine for subquery fix (#7953)

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Sidecar: Ensure limit param is positive for compatibility with older Prometheus (#7954)

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Update changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

* Fix changelog

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Yi Jin <yi.jin@databricks.com>
Co-authored-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Yi Jin <96499497+jnyi@users.noreply.github.com>
GiedriusS pushed a commit to vinted/thanos that referenced this pull request Dec 12, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
jnyi added a commit to jnyi/thanos that referenced this pull request Dec 13, 2024
…thanos-io#7941)

* [Receive] fix race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* add a change log

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* memorize tsdb local clients without race condition

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* fix data race in testing with some concurrent safe helper functions

Signed-off-by: Yi Jin <yi.jin@databricks.com>

* address comments

Signed-off-by: Yi Jin <yi.jin@databricks.com>

---------

Signed-off-by: Yi Jin <yi.jin@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants