
feat: dynamic channel pool scaled by number of outstanding request #1569

Merged
merged 28 commits into googleapis:main from dynamic-channel-pool on Feb 15, 2022

Conversation

igorbernstein2
Contributor

@igorbernstein2 igorbernstein2 commented Nov 22, 2021

This introduces the ability for gRPC channel pools to resize based on the number of outstanding RPCs.
Previously the pool was statically sized when the client was instantiated. It should automate the process described here: https://cloud.google.com/bigtable/docs/configure-connection-pools

The features added in this PR are opt-in and shouldn't change or break any existing behavior. To enable dynamic channel pools, Google clients or customers would have to set ChannelPoolSettings in StubSettings.
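
For illustration, a rough sketch of what opting in could look like, assuming the ChannelPoolSettings builder and InstantiatingGrpcChannelProvider.Builder.setChannelPoolSettings added by this work (exact method names may differ from the released API):

import com.google.api.gax.grpc.ChannelPoolSettings;
import com.google.api.gax.grpc.InstantiatingGrpcChannelProvider;

ChannelPoolSettings poolSettings =
    ChannelPoolSettings.builder()
        .setInitialChannelCount(5)
        .setMinChannelCount(2)
        .setMaxChannelCount(50)
        .setMinRpcsPerChannel(1)
        // keep this well under gRPC's ~100 concurrent streams per connection
        .setMaxRpcsPerChannel(50)
        .build();

InstantiatingGrpcChannelProvider channelProvider =
    InstantiatingGrpcChannelProvider.newBuilder()
        .setChannelPoolSettings(poolSettings)
        .build();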

This PR adds the following features:

  • Each channel in the pool tracks the maximum number of concurrent RPCs it has seen
  • Every minute, a background thread loops through all of the channels and averages the per-channel maximums
  • The average is compared to a threshold range. If it falls outside the range, the pool is resized to land in the middle of the range (see the sketch below)
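
A minimal sketch of that resize step (illustrative only; the class, method, and parameter names here are made up, and the real logic lives in ChannelPool):

final class PoolResizeSketch {
  static int resizedChannelCount(
      int currentChannels,
      double avgMaxOutstandingPerChannel, // average of the per-channel maximums
      int minRpcsPerChannel,              // lower threshold
      int maxRpcsPerChannel,              // upper threshold
      int minChannels,
      int maxChannels) {
    // Inside the threshold range: leave the pool alone.
    if (avgMaxOutstandingPerChannel >= minRpcsPerChannel
        && avgMaxOutstandingPerChannel <= maxRpcsPerChannel) {
      return currentChannels;
    }
    // Outside the range: pick a size that puts per-channel load in the middle of the range.
    double target = Math.max(1.0, (minRpcsPerChannel + maxRpcsPerChannel) / 2.0);
    double totalOutstanding = avgMaxOutstandingPerChannel * currentChannels;
    int desired = (int) Math.ceil(totalOutstanding / target);
    return Math.max(minChannels, Math.min(maxChannels, desired));
  }
}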

Please note that autoscaling will not play well with GCP clients that use affinity (Pub/Sub & Spanner). If the two features need to be combined, then we have a couple of options: we could use consistent hashing, or maybe a proper API for affinity could be added where the lifecycle of affinity can be leased.

…nelPool.

The eventual goal is to allow the channel pool to safely add and remove channels. The code has been refactored roughly as:
- SafeShutdownManagedChannel is now ChannelPool.Entry and ReleasingClientCall
- RefreshingManagedChannel has been merged into ChannelPool as a pair of functions, scheduleNextRefresh & refresh
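
For orientation, a very rough sketch of the per-channel bookkeeping that ChannelPool.Entry does for the autoscaling described above (simplified, hypothetical names; not the actual gax-grpc code):

import java.util.concurrent.atomic.AtomicInteger;

final class EntrySketch {
  // RPCs currently in flight on this entry's wrapped ManagedChannel.
  private final AtomicInteger outstandingRpcs = new AtomicInteger();
  // High-water mark of concurrent RPCs since the last resize pass.
  private final AtomicInteger maxOutstanding = new AtomicInteger();

  void onCallStarted() {
    int now = outstandingRpcs.incrementAndGet();
    maxOutstanding.accumulateAndGet(now, Math::max);
  }

  // Invoked by something like ReleasingClientCall when an RPC completes.
  void onCallReleased() {
    outstandingRpcs.decrementAndGet();
  }

  // Read and reset the high-water mark; the background thread averages these across entries.
  int getAndResetMaxOutstanding() {
    return maxOutstanding.getAndSet(outstandingRpcs.get());
  }
}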
@google-cla The google-cla bot added the cla: yes label ("This human has signed the Contributor License Agreement") on Nov 22, 2021
@chanseokoh
Contributor

Let me know once this gets out of the draft status for review.

@igorbernstein2 igorbernstein2 changed the title feat: dynamic channel pool scaled by number of outstanding request (WIP) feat: dynamic channel pool scaled by number of outstanding request Jan 7, 2022
@igorbernstein2 igorbernstein2 marked this pull request as ready for review January 7, 2022 17:36
@igorbernstein2 igorbernstein2 requested review from a team as code owners January 7, 2022 17:36
@igorbernstein2
Contributor Author

Ok, I rebased on the channel refactor, this should be ready for review

Contributor

@chanseokoh chanseokoh left a comment


Still reviewing, but releasing some comments early.


// Number of channels if each channel operated at minimum capacity
int maxChannels =
    (int) Math.ceil(actualOutstandingRpcs / (double) settings.getMinRpcsPerChannel());
Contributor


I see by default getMinRpcsPerChannel() returns 0, so we are dividing by 0. That doesn't fail, but the result is:

  • if actualOutstandingRpcs == 0, then maxChannels is 0.
  • if actualOutstandingRpcs != 0, then it's 2147483647 (Integer.MAX_VALUE).

This is still OK, since you're bounding maxChannels below. However, the division-by-0 computation is not trivial to verify and the computation result of 2147483647 seems atypical. The code gives the impression that the author probably failed to anticipate possible division by 0. At least I want to add a comment like "getMinRpcsPerChannel() can return 0, but division by 0 shouldn't cause a problem."
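
For reference, a minimal illustration of the floating-point behavior being described here (not code from this PR):

// 0 / 0.0 is NaN; Math.ceil(NaN) is NaN; casting NaN to int yields 0.
int zeroCase = (int) Math.ceil(0 / (double) 0);    // 0
// n / 0.0 for n > 0 is +Infinity; casting Infinity to int saturates at Integer.MAX_VALUE.
int nonZeroCase = (int) Math.ceil(5 / (double) 0); // 2147483647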

Contributor Author


done!

public abstract int getMinRpcsPerChannel();

/**
* Threshold to start scaling up the channel pool.
Contributor


Is it worth mentioning that there's a breaking point at 100 regarding performance (if I understand it correctly)? For example, if this is set to too high a value, a lot of RPCs may end up pending because of gRPC's limit of 100 concurrent RPCs per channel.

Contributor Author


sg

# Conflicts:
#	gax-grpc/src/main/java/com/google/api/gax/grpc/InstantiatingGrpcChannelProvider.java
@igorbernstein2
Contributor Author

Sorry for the delay. I finally found time to address your feedback. PTAL

@chanseokoh
Contributor

BTW, tests are failing.

@igorbernstein2
Contributor Author

igorbernstein2 commented Feb 14, 2022

Fixed; it was a stray import that leaked in from the merge with main.

Contributor

@chanseokoh chanseokoh left a comment


LGTM, pending a few minor comments. (For quick search, please check all previous comments that have not been marked resolved and closed.)

@igorbernstein2
Contributor Author

Ok I think everything has been addressed. If you are ok with this, please merge it when you have a chance.

@sonarqubecloud

SonarCloud Quality Gate failed.

  • Bugs: 0 (A)
  • Vulnerabilities: 0 (A)
  • Security Hotspots: 0 (A)
  • Code Smells: 22 (A)
  • Coverage: 74.4%
  • Duplication: 0.0%

@chanseokoh chanseokoh merged commit fff2bab into googleapis:main Feb 15, 2022
@igorbernstein2 igorbernstein2 deleted the dynamic-channel-pool branch February 15, 2022 16:49
@meltsufin
Member

@igorbernstein2 Do any of the code smells detected by SonarCloud make sense?
