Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added connection_pool_size configuration property (preview) #276

Merged
merged 4 commits into from
Aug 10, 2023
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 24 additions & 1 deletion databricks/sdk/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -491,6 +491,7 @@ class Config:
metadata_service_url = ConfigAttribute(env='DATABRICKS_METADATA_SERVICE_URL',
auth='metadata-service',
sensitive=True)
connection_pool_size: int = ConfigAttribute()

def __init__(self,
*,
Expand Down Expand Up @@ -893,7 +894,29 @@ def __init__(self, cfg: Config = None):

self._session = requests.Session()
self._session.auth = self._authenticate
self._session.mount("https://", HTTPAdapter(max_retries=retry_strategy))

# Number of urllib3 connection pools to cache before discarding the least
# recently used pool. Python requests default value is 10.
pool_connections = cfg.connection_pool_size
if pool_connections is None:
pool_connections = 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need to increase the connection pool size to support concurrent requests. However, why do we need multiple connection pools? Looking at requests, HttpAdapter's underlying PoolManager uses the same pool based on the <host,port,scheme> tuple, which for a single client will only be one endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do think that we need just one pool, but somehow the default client was using the same value for both.
how about exposing two configuration properties, marking them experimental, and figure out the best defaults later? single SDK client will query only one host.

how should we name pool_maxsize? connection_pool_max?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default uses the same value for both because it uses the same constant for each parameter as default values in its constructor:

    def __init__(
        self,
        pool_connections=DEFAULT_POOLSIZE,
        pool_maxsize=DEFAULT_POOLSIZE,
        max_retries=DEFAULT_RETRIES,
        pool_block=DEFAULT_POOLBLOCK,
    ):

DEFAULT_POOLSIZE is set to 10.


# The maximum number of connections to save in the pool. Improves performance
# in multithreaded situations. For now, we're setting it to the same value
# as connection_pool_size.
pool_maxsize = pool_connections

# If pool_block is False, then more connections will are created,
# but not saved after the first use. Blocks when no free connections are available.
# urllib3 ensures that no more than pool_maxsize connections are used at a time.
# Prevents platform from flooding. By default, requests library doesn't block.
pool_block = True

http_adapter = HTTPAdapter(max_retries=retry_strategy,
pool_connections=pool_connections,
pool_maxsize=pool_maxsize,
pool_block=pool_block)
self._session.mount("https://", http_adapter)

@property
def account_id(self) -> str:
Expand Down
Loading