Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added connection_pool_size configuration property (preview) #276

Merged
merged 4 commits into from
Aug 10, 2023

Conversation

nfx
Copy link
Contributor

@nfx nfx commented Aug 8, 2023

We have three things that are configurable from requests connection pooling perspective:

  • pool_connections - Number of urllib3 connection pools to cache before discarding the least recently used pool. Python requests default value is 10. This PR increases it to 20.
  • pool_maxsize - The maximum number of connections to save in the pool. Improves performance in multithreaded situations. For now, we're setting it to the same value as connection_pool_size.
  • pool_block - If pool_block is False, then more connections will are created, but not saved after the first use. Block when no free connections are available. urllib3 ensures that no more than pool_maxsize connections are used at a time. Prevents platform from flooding. By default, requests library doesn't block. We start with blocking.

This PR introduces two new configuration properties:

  • connection_pool_size configuration property, which sets pool_connections. Defaults to 20.
  • connection_pool_max_size, which sets pool_maxsize. Defaults to the same value as connection_pool_size.

nfx added 2 commits August 8, 2023 21:01
We have three things that are configurable from `requests` connection pooling perspective:

* `pool_connections` - Number of urllib3 connection pools to cache before discarding the least recently used pool. Python requests default value is 10. This PR increases it to 20.
* `pool_maxsize` - The maximum number of connections to save in the pool. Improves performance in multithreaded situations. For now, we're setting it to the same value as connection_pool_size.
* `pool_block` - If pool_block is False, then more connections will are created, but not saved after the first use. Block when no free connections are available. urllib3 ensures that no more than pool_maxsize connections are used at a time. Prevents platform from flooding. By default, requests library doesn't block. We start with blocking.

This PR introduces `connection_pool_size` configuration property, which sets `pool_connections` and `pool_maxsize` to the same value. If not set, it uses 20 instead of default 10.
Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on increasing connection pool size & blocking. However, I don't know about increasing # of connection pools.

Comment on lines 898 to 902
# Number of urllib3 connection pools to cache before discarding the least
# recently used pool. Python requests default value is 10.
pool_connections = cfg.connection_pool_size
if pool_connections is None:
pool_connections = 20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the need to increase the connection pool size to support concurrent requests. However, why do we need multiple connection pools? Looking at requests, HttpAdapter's underlying PoolManager uses the same pool based on the <host,port,scheme> tuple, which for a single client will only be one endpoint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do think that we need just one pool, but somehow the default client was using the same value for both.
how about exposing two configuration properties, marking them experimental, and figure out the best defaults later? single SDK client will query only one host.

how should we name pool_maxsize? connection_pool_max?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default uses the same value for both because it uses the same constant for each parameter as default values in its constructor:

    def __init__(
        self,
        pool_connections=DEFAULT_POOLSIZE,
        pool_maxsize=DEFAULT_POOLSIZE,
        max_retries=DEFAULT_RETRIES,
        pool_block=DEFAULT_POOLBLOCK,
    ):

DEFAULT_POOLSIZE is set to 10.

@nfx nfx requested a review from mgyucht August 9, 2023 16:10
Copy link
Contributor

@mgyucht mgyucht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit on naming, but otherwise the change seems fine.

Comment on lines 494 to 495
connection_pool_size: int = ConfigAttribute()
connection_pool_max_size: int = ConfigAttribute()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my suggestion for the naming:

  • max_connection_pools: maximum number of connection pools used by the HttpAdapter.
  • max_connections_per_pool: maximum number of connections per pool used by the HttpAdapter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. let's get it shipped

@nfx nfx enabled auto-merge August 10, 2023 11:38
@nfx nfx added this pull request to the merge queue Aug 10, 2023
@codecov-commenter
Copy link

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.02% 🎉

Comparison is base (b82fa01) 52.55% compared to head (6ed5b70) 52.57%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #276      +/-   ##
==========================================
+ Coverage   52.55%   52.57%   +0.02%     
==========================================
  Files          33       33              
  Lines       19823    19833      +10     
==========================================
+ Hits        10418    10428      +10     
  Misses       9405     9405              
Files Changed Coverage Δ
databricks/sdk/core.py 67.14% <100.00%> (+0.43%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Merged via the queue into main with commit d801265 Aug 10, 2023
7 checks passed
nfx added a commit that referenced this pull request Aug 11, 2023
* Fixed OAuth M2M corner case in `WorkspaceClient` where `DATABRICKS_ACCOUNT_ID` is present in the environment ([#273](#273)).
* Added `connection_pool_size` configuration property (preview) ([#276](#276)).
nfx added a commit that referenced this pull request Aug 11, 2023
* Fixed OAuth M2M corner case in `WorkspaceClient` where `DATABRICKS_ACCOUNT_ID` is present in the environment ([#273](#273)).
* Added `connection_pool_size` configuration property (preview) ([#276](#276)).
@nfx nfx mentioned this pull request Aug 11, 2023
mgyucht added a commit that referenced this pull request Aug 11, 2023
* Added `connection_pool_size` configuration property (preview) ([#276](#276)).
* Fixed OAuth M2M corner case in `WorkspaceClient` where `DATABRICKS_ACCOUNT_ID` is present in the environment ([#273](#273)).

API Changes:

 * Changed `create()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass.
 * Changed `get()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass.
 * Changed `update()` method for [a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html) account-level service to return `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass.
 * Changed `create()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order.
 * Changed `update()` method for [w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html) workspace-level service with new required argument order.
 * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Added `options` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Added `properties` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Added `provisioning_state` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Added `securable_kind` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Added `securable_type` field for `databricks.sdk.service.catalog.ConnectionInfo`.
 * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`.
 * Removed `properties_kvpairs` field for `databricks.sdk.service.catalog.CreateConnection`.
 * Added `options` field for `databricks.sdk.service.catalog.CreateConnection`.
 * Added `properties` field for `databricks.sdk.service.catalog.CreateConnection`.
 * Changed `algorithm` field for `databricks.sdk.service.catalog.SseEncryptionDetails` to no longer be required.
 * Removed `options_kvpairs` field for `databricks.sdk.service.catalog.UpdateConnection`.
 * Added `options` field for `databricks.sdk.service.catalog.UpdateConnection`.
 * Added `databricks.sdk.service.catalog.AccountsStorageCredentialInfo` dataclass.
 * Added `databricks.sdk.service.catalog.ConnectionInfoSecurableKind` dataclass.
 * Added `databricks.sdk.service.catalog.ProvisioningState` dataclass.
 * Added `data_security_mode` field for `databricks.sdk.service.compute.CreateCluster`.
 * Added `docker_image` field for `databricks.sdk.service.compute.CreateCluster`.
 * Added `single_user_name` field for `databricks.sdk.service.compute.CreateCluster`.
 * Removed `schema` field for `databricks.sdk.service.iam.PartialUpdate`.
 * Added `schemas` field for `databricks.sdk.service.iam.PartialUpdate`.

OpenAPI SHA: 1e3533f94335f0e6c5d9262bc1fea95b3ddcb0e1, Date: 2023-08-11
@mgyucht mgyucht mentioned this pull request Aug 11, 2023
github-merge-queue bot pushed a commit that referenced this pull request Aug 11, 2023
* Added `connection_pool_size` configuration property (preview)
([#276](#276)).
* Fixed OAuth M2M corner case in `WorkspaceClient` where
`DATABRICKS_ACCOUNT_ID` is present in the environment
([#273](#273)).

API Changes:

* Changed `create()` method for
[a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html)
account-level service to return
`databricks.sdk.service.catalog.AccountsStorageCredentialInfo`
dataclass.
* Changed `get()` method for
[a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html)
account-level service to return
`databricks.sdk.service.catalog.AccountsStorageCredentialInfo`
dataclass.
* Changed `update()` method for
[a.account_storage_credentials](https://databricks-sdk-py.readthedocs.io/en/latest/account/account_storage_credentials.html)
account-level service to return
`databricks.sdk.service.catalog.AccountsStorageCredentialInfo`
dataclass.
* Changed `create()` method for
[w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html)
workspace-level service with new required argument order.
* Changed `update()` method for
[w.connections](https://databricks-sdk-py.readthedocs.io/en/latest/workspace/connections.html)
workspace-level service with new required argument order.
* Removed `options_kvpairs` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Removed `properties_kvpairs` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Added `options` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Added `properties` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Added `provisioning_state` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Added `securable_kind` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Added `securable_type` field for
`databricks.sdk.service.catalog.ConnectionInfo`.
* Removed `options_kvpairs` field for
`databricks.sdk.service.catalog.CreateConnection`.
* Removed `properties_kvpairs` field for
`databricks.sdk.service.catalog.CreateConnection`.
* Added `options` field for
`databricks.sdk.service.catalog.CreateConnection`.
* Added `properties` field for
`databricks.sdk.service.catalog.CreateConnection`.
* Changed `algorithm` field for
`databricks.sdk.service.catalog.SseEncryptionDetails` to no longer be
required.
* Removed `options_kvpairs` field for
`databricks.sdk.service.catalog.UpdateConnection`.
* Added `options` field for
`databricks.sdk.service.catalog.UpdateConnection`.
* Added `databricks.sdk.service.catalog.AccountsStorageCredentialInfo`
dataclass.
* Added `databricks.sdk.service.catalog.ConnectionInfoSecurableKind`
dataclass.
 * Added `databricks.sdk.service.catalog.ProvisioningState` dataclass.
* Added `data_security_mode` field for
`databricks.sdk.service.compute.CreateCluster`.
* Added `docker_image` field for
`databricks.sdk.service.compute.CreateCluster`.
* Added `single_user_name` field for
`databricks.sdk.service.compute.CreateCluster`.
* Removed `schema` field for `databricks.sdk.service.iam.PartialUpdate`.
 * Added `schemas` field for `databricks.sdk.service.iam.PartialUpdate`.

OpenAPI SHA: 1e3533f94335f0e6c5d9262bc1fea95b3ddcb0e1, Date: 2023-08-11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants