
Redis cluster refresh of large clusters keeps I/O threads busy #2045

Closed
be-hase opened this issue Mar 15, 2022 · 7 comments
Labels
type: feature A new feature
Milestone
6.1.7

Comments

be-hase (Contributor) commented Mar 15, 2022

Bug Report

Current Behavior

For a large Redis Cluster, using enablePeriodicRefresh causes serious performance problems.

The processing time of DefaultClusterTopologyRefresh.getNodeSpecificViews increases in proportion to the cluster size.
Unfortunately, it runs on the NIO event loop thread.

See the repository README below for details.
https://github.com/be-hase/lettuce-with-large-cluster

Input Code

I have prepared code that reproduces this problem:
https://github.com/be-hase/lettuce-with-large-cluster
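
For reference, a minimal sketch (not the full reproduction; host, port, and interval are placeholder values) of how periodic topology refresh is typically enabled:

```java
import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;

import java.time.Duration;

public class PeriodicRefreshExample {

    public static void main(String[] args) {
        // Re-fetch the cluster topology on a fixed interval (placeholder value).
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofSeconds(30))
                .build();

        RedisClusterClient client = RedisClusterClient.create(RedisURI.create("redis://localhost:7000"));
        client.setOptions(ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build());

        // ... connect and issue commands; with a large cluster, each periodic refresh
        // triggers the getNodeSpecificViews computation described above ...

        client.shutdown();
    }
}
```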

Expected behavior/code

Even when enablePeriodicRefresh is used with a large Redis Cluster, performance should not deteriorate.

Environment

  • Lettuce version(s): 6.1.6.RELEASE
  • Redis version: 6.2.0

Possible Solution

Idea 1

Stop running DefaultClusterTopologyRefresh.getNodeSpecificViews on the NIO event loop.

I won't go into details, but when I customized DefaultClusterTopologyRefresh and ran this step on another thread, performance improved.

My team is considering adopting this approach as a workaround.
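
A rough sketch of the general idea (names are illustrative and this is not the actual Lettuce internals; it only shows how the CPU-bound step could be handed to a separate executor via future chaining):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OffloadSketch {

    // Illustrative stand-in for the CPU-bound topology post-processing.
    static String computeNodeSpecificViews(String rawTopology) {
        return "views(" + rawTopology + ")";
    }

    public static void main(String[] args) {
        ExecutorService computation = Executors.newSingleThreadExecutor();

        // Assume this stage completes on an I/O (event loop) thread.
        CompletableFuture<String> rawViews = CompletableFuture.completedFuture("CLUSTER NODES output");

        // thenApplyAsync(fn, executor) runs the heavy step on 'computation'
        // instead of the thread that completed the previous stage.
        CompletableFuture<String> views =
                rawViews.thenApplyAsync(OffloadSketch::computeNodeSpecificViews, computation);

        System.out.println(views.join());
        computation.shutdown();
    }
}
```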

Idea 2

Looking at the earlier flame graph, the overhead appears to be concentrated in the processing that uses BitSet.
Why not switch to a more primitive representation such as boolean[]?

However, I'm not sure whether performance would actually improve.

Additional context

NONE

be-hase (Contributor, Author) commented Mar 16, 2022

In my actual environment, I use both periodic refresh and adaptive refresh.

Should I use only adaptive refresh with a large cluster?
Are there any disadvantages to using only adaptive refresh?

be-hase (Contributor, Author) commented Mar 16, 2022

Ah, it seems good to use dynamicRefreshSources(false) for a large cluster.
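
A sketch of that configuration (the refresh interval here is an arbitrary example, and combining it with adaptive triggers is one option, not a recommendation from this thread):

```java
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;

import java.time.Duration;

public class LargeClusterRefreshOptions {

    public static void main(String[] args) {
        // Query only the seed nodes for topology instead of every discovered node,
        // and rely on adaptive triggers plus a longer periodic interval.
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .dynamicRefreshSources(false)
                .enableAllAdaptiveRefreshTriggers()
                .enablePeriodicRefresh(Duration.ofMinutes(5))
                .build();

        ClusterClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build();

        System.out.println(clientOptions);
    }
}
```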

mp911de (Collaborator) commented Mar 16, 2022

BitSet has a much smaller memory footprint than boolean[] and is easier to use. Generally, if you increase the periodic refresh interval and rely more on adaptive refresh, you will likely see fewer refreshes.

It would be possible to use a different scheduler in DefaultClusterTopologyRefresh.loadViews(…) since we're using future chaining. What is the underlying cause for this ticket, and how did you find out about the issue? Was it motivated by an incident or something similar?

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Mar 16, 2022
@mp911de mp911de changed the title Large redis cluster causes serious performance problem due to periodic cluster topology refresh Redis cluster refresh of large clusters keeps I/O threads busy Mar 16, 2022
be-hase (Contributor, Author) commented Mar 16, 2022

I ran into this problem when I upgraded from Lettuce 5 to Lettuce 6.
We are using a large Redis Cluster (96 nodes).

After investigating the cause, we determined that it was due to DefaultClusterTopologyRefresh.getNodeSpecificViews running on the NIO event loop thread.
Asynchronous cluster topology refresh was introduced in Lettuce 6, so I think that is the reason.

mp911de (Collaborator) commented Mar 16, 2022

Feel free to submit a pull request that uses ClientResources.eventExecutorGroup() in the future chaining in DefaultClusterTopologyRefresh.getNodeSpecificViews.
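
Roughly along these lines (a sketch only; the mapping lambda is a placeholder for the real getNodeSpecificViews logic):

```java
import io.lettuce.core.resource.ClientResources;
import io.lettuce.core.resource.DefaultClientResources;

import java.util.concurrent.CompletableFuture;

public class EventExecutorGroupSketch {

    public static void main(String[] args) {
        ClientResources resources = DefaultClientResources.create();

        // Placeholder for the stage that completes with raw topology data on an I/O thread.
        CompletableFuture<String> rawTopology = CompletableFuture.completedFuture("CLUSTER NODES output");

        // Hand the CPU-heavy mapping step to the computation pool exposed by
        // ClientResources instead of running it on the I/O event loop.
        CompletableFuture<String> views = rawTopology.thenApplyAsync(
                raw -> "node-specific views derived from " + raw,
                resources.eventExecutorGroup());

        System.out.println(views.join());
        resources.shutdown();
    }
}
```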

@mp911de mp911de added type: feature A new feature and removed status: waiting-for-feedback We need additional information before we can continue labels Mar 16, 2022
@mp911de mp911de added this to the 6.1.7 milestone Mar 16, 2022
be-hase (Contributor, Author) commented Mar 16, 2022

OK. I'll try to submit a PR later.

be-hase (Contributor, Author) commented Mar 17, 2022

@mp911de
I submitted PR #2048. Please check 🙏

mp911de pushed a commit that referenced this issue Mar 18, 2022
mp911de added a commit that referenced this issue Mar 18, 2022
Use thenApplyAsync(…) instead of supplyAsync(…) to reduce allocations. Adopt tests.

Original pull request: #2048.
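
For context, a rough illustration (not the actual patch) of why chaining with thenApplyAsync(…) can save an allocation compared to wrapping the step in supplyAsync(…):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ChainingSketch {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CompletableFuture<Integer> stage = CompletableFuture.completedFuture(21);

        // Variant A: wrap the continuation in a brand-new future via supplyAsync,
        // which allocates an extra CompletableFuture plus a capturing Supplier.
        CompletableFuture<Integer> viaSupply = stage.thenCompose(
                value -> CompletableFuture.supplyAsync(() -> value * 2, executor));

        // Variant B: chain directly on the existing stage; one fewer future is allocated.
        CompletableFuture<Integer> viaApply = stage.thenApplyAsync(value -> value * 2, executor);

        System.out.println(viaSupply.join() + " " + viaApply.join());
        executor.shutdown();
    }
}
```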
@mp911de mp911de closed this as completed Mar 18, 2022
GilboaAWS pushed a commit to GilboaAWS/lettuce-core-1 that referenced this issue Apr 3, 2022
Use thenApplyAsync(…) instead of supplyAsync(…) to reduce allocations. Adopt tests.

Original pull request: redis#2048.