Skip to content

Commit

Permalink
Revise proxy configuration, add integration testing (#325)
Browse files Browse the repository at this point in the history
## Problem

We want to expose proxy configuration fields as top-level config params
and move away from users passing an `OpenApiConfiguration` instance with
`openapi_config` (as was done in 2.x versions of the sdk). Passing these
objects is not something we documented or had shown as a named
configuration param, but is something people assumed would work as a
hangover from the way things were done in the old client.

Currently OpenApi generated code is part of our implementation, but we
may want to swap out something else in the future so it doesn't make
sense to use this `OpenApiConfiguration` object as part of our public
interface. And since the openapi config object also needs to be
configured with api key and host, it competes with these other config
params in a way that is pretty confusing.

## Solution

- Add new configuration parameters for `proxy_url`, `proxy_headers`,
`ssl_ca_certs`, and `ssl_verify`.
- Document the use of these properties in the README and doc strings
- Emit a warning message when users are passing `openapi_config`
encouraging them to use the new properties.
- Implement integration tests that run with
[mitmproxy](https://mitmproxy.org/) docker image. For now these are only
run in local development.

## Usage

```python
from pinecone import Pinecone
import urllib3 import make_headers

pc = Pinecone(
    api_key="YOUR_API_KEY",
    proxy_url='https://your-proxy.com',
    proxy_headers=make_headers(proxy_basic_auth='username:password'),
    ssl_ca_certs='path/to/cert-bundle.pem'
)
```

This will work, but should emit a deprecation warning:

```python
from pinecone import Pinecone
from pinecone.core.client.configuration import OpenApiConfiguration
import urllib3 import make_headers

config = OpenApiConfiguration()
config.ssl_ca_certs = 'path/to/cert-bundle.pem'
config.proxy_headers = make_headers(proxy_basic_auth='username:password')
config.proxy_url = 'https://your-proxy.com'

pc = Pinecone(
    api_key="YOUR_API_KEY",
    openapi_config=config
) # emits deprecation notice

```

## Future work
We will need to figure out how to deploy the mitmproxy elsewhere (cloud
run?) before running tests in CI because Github Actions seems to have a
lot of issues with running these docker containers and debugging the
errors has proven quite difficult. While GHA can run docker containers,
mounting a volume with the required certs causes the container to crash.
I'm guessing this is some kind of file permissions issue, but don't have
much visibility for debugging.

## Type of Change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [x] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

Run new tests locally with `PINECONE_API_KEY='foo'
PINECONE_INDEX_NAME='bar' poetry run pytest
tests/integration/proxy_config/ -s -v`
  • Loading branch information
jhamon authored Mar 21, 2024
1 parent a201cb6 commit 6fa0447
Show file tree
Hide file tree
Showing 27 changed files with 738 additions and 36 deletions.
16 changes: 16 additions & 0 deletions .github/workflows/testing-dependency.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -158,3 +158,19 @@ jobs:
index_name: '${{ needs.dependency-matrix-setup.outputs.index_name }}'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
urllib3_version: '${{ matrix.urllib3_version }}'

deps-cleanup:
name: Deps cleanup
runs-on: ubuntu-latest
needs:
- dependency-matrix-setup
- dependency-matrix-grpc
- dependency-matrix-grpc-312
- dependency-matrix-rest
- dependency-matrix-rest-312
steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/delete-index
with:
index_name: '${{ needs.dependency-matrix-setup.outputs.index_name }}'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
49 changes: 48 additions & 1 deletion .github/workflows/testing-integration.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,54 @@ name: "Integration Tests"
workflow_call: {}

jobs:
# setup-index:
# name: Setup proxyconfig test index
# runs-on: ubuntu-latest
# outputs:
# index_name: ${{ steps.create-index.outputs.index_name }}
# steps:
# - uses: actions/checkout@v4
# - name: Create index
# id: create-index
# uses: ./.github/actions/create-index
# with:
# PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
# NAME_PREFIX: 'proxyconfig-'
# REGION: 'us-west-2'
# CLOUD: 'aws'
# DIMENSION: 1536
# METRIC: 'cosine'

# proxy-config:
# name: Proxy config tests
# needs: [setup-index]
# runs-on: ubuntu-latest
# steps:
# - uses: actions/checkout@v4
# - uses: actions/setup-python@v5
# with:
# python-version: 3.9
# - name: Setup Poetry
# uses: ./.github/actions/setup-poetry
# - name: 'Run integration tests (proxy config)'
# run: |
# poetry run pytest tests/integration/proxy_config -s -v
# env:
# PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
# PINECONE_INDEX_NAME: ${{ needs.setup-index.outputs.index_name }}
# - name: Upload logs
# if: always()
# uses: actions/upload-artifact@v4
# with:
# name: proxy_config_test_logs
# path: tests/integration/proxy_config/logs
# - name: Cleanup index
# if: always()
# uses: ./.github/actions/delete-index
# with:
# PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
# INDEX_NAME: ${{ needs.setup-index.outputs.index_name }}

data-plane-serverless:
name: Data plane serverless integration tests
runs-on: ubuntu-latest
Expand All @@ -27,7 +75,6 @@ jobs:
spec: '${{ matrix.spec }}'
PINECONE_API_KEY: '${{ secrets.PINECONE_API_KEY }}'
freshness_timeout_seconds: 600

# data-plane-pod:
# name: Data plane pod integration tests
# runs-on: ubuntu-latest
Expand Down
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -154,3 +154,5 @@ dmypy.json
# Datasets
*.hdf5
*~

tests/integration/proxy_config/logs
79 changes: 77 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,14 +71,14 @@ from pinecone import Pinecone
pc = Pinecone() # This reads the PINECONE_API_KEY env var
```

#### Using a configuration object
#### Using configuration keyword params

If you prefer to pass configuration in code, for example if you have a complex application that needs to interact with multiple different Pinecone projects, the constructor accepts a keyword argument for `api_key`.

If you pass configuration in this way, you can have full control over what name to use for the environment variable, sidestepping any issues that would result
from two different client instances both needing to read the same `PINECONE_API_KEY` variable that the client implicitly checks for.

Configuration passed with keyword arguments takes precedent over environment variables.
Configuration passed with keyword arguments takes precedence over environment variables.

```python
import os
Expand All @@ -87,6 +87,81 @@ from pinecone import Pinecone
pc = Pinecone(api_key=os.environ.get('CUSTOM_VAR'))
```

### Proxy configuration

If your network setup requires you to interact with Pinecone via a proxy, you will need
to pass additional configuration using optional keyword parameters. These optional parameters are forwarded to `urllib3`, which is the underlying library currently used by the Pinecone client to make HTTP requests. You may find it helpful to refer to the [urllib3 documentation on working with proxies](https://urllib3.readthedocs.io/en/stable/advanced-usage.html#http-and-https-proxies) while troubleshooting these settings.

Here is a basic example:

```python
from pinecone import Pinecone

pc = Pinecone(
api_key='YOUR_API_KEY',
proxy_url='https://your-proxy.com'
)

pc.list_indexes()
```

If your proxy requires authentication, you can pass those values in a header dictionary using the `proxy_headers` parameter.

```python
from pinecone import Pinecone
import urllib3 import make_headers

pc = Pinecone(
api_key='YOUR_API_KEY',
proxy_url='https://your-proxy.com',
proxy_headers=make_headers(proxy_basic_auth='username:password')
)

pc.list_indexes()
```

### Using proxies with self-signed certificates

By default the Pinecone Python client will perform SSL certificate verification
using the CA bundle maintained by Mozilla in the [certifi](https://pypi.org/project/certifi/) package.

If your proxy server is using a self-signed certificate, you will need to pass the path to the certificate in PEM format using the `ssl_ca_certs` parameter.

```python
from pinecone import Pinecone
import urllib3 import make_headers

pc = Pinecone(
api_key="YOUR_API_KEY",
proxy_url='https://your-proxy.com',
proxy_headers=make_headers(proxy_basic_auth='username:password'),
ssl_ca_certs='path/to/cert-bundle.pem'
)

pc.list_indexes()
```

### Disabling SSL verification

If you would like to disable SSL verification, you can pass the `ssl_verify`
parameter with a value of `False`. We do not recommend going to production with SSL verification disabled.

```python
from pinecone import Pinecone
import urllib3 import make_headers

pc = Pinecone(
api_key='YOUR_API_KEY',
proxy_url='https://your-proxy.com',
proxy_headers=make_headers(proxy_basic_auth='username:password'),
ssl_ca_certs='path/to/cert-bundle.pem',
ssl_verify=False
)

pc.list_indexes()

```

### Working with GRPC (for improved performance)

If you've followed instructions above to install with optional `grpc` extras, you can unlock some performance improvements by working with an alternative version of the client imported from the `pinecone.grpc` subpackage.
Expand Down
40 changes: 31 additions & 9 deletions pinecone/config/config.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
from typing import NamedTuple, Optional, Dict
import os
import copy

from pinecone.exceptions import PineconeConfigurationError
from pinecone.config.openapi import OpenApiConfigFactory
Expand All @@ -10,7 +9,10 @@
class Config(NamedTuple):
api_key: str = ""
host: str = ""
openapi_config: Optional[OpenApiConfiguration] = None
proxy_url: Optional[str] = None
proxy_headers: Optional[Dict[str, str]] = None
ssl_ca_certs: Optional[str] = None
ssl_verify: Optional[bool] = None
additional_headers: Optional[Dict[str, str]] = {}

class ConfigBuilder:
Expand All @@ -34,7 +36,10 @@ class ConfigBuilder:
def build(
api_key: Optional[str] = None,
host: Optional[str] = None,
openapi_config: Optional[OpenApiConfiguration] = None,
proxy_url: Optional[str] = None,
proxy_headers: Optional[Dict[str, str]] = None,
ssl_ca_certs: Optional[str] = None,
ssl_verify: Optional[bool] = None,
additional_headers: Optional[Dict[str, str]] = {},
**kwargs,
) -> Config:
Expand All @@ -47,11 +52,28 @@ def build(
if not host:
raise PineconeConfigurationError("You haven't specified a host.")

return Config(api_key, host, proxy_url, proxy_headers, ssl_ca_certs, ssl_verify, additional_headers)

@staticmethod
def build_openapi_config(
config: Config, openapi_config: Optional[OpenApiConfiguration] = None, **kwargs
) -> OpenApiConfiguration:
if openapi_config:
openapi_config = copy.deepcopy(openapi_config)
openapi_config.host = host
openapi_config.api_key = {"ApiKeyAuth": api_key}
else:
openapi_config = OpenApiConfigFactory.build(api_key=api_key, host=host)
openapi_config = OpenApiConfigFactory.copy(openapi_config=openapi_config, api_key=config.api_key, host=config.host)
elif openapi_config is None:
openapi_config = OpenApiConfigFactory.build(api_key=config.api_key, host=config.host)

return Config(api_key, host, openapi_config, additional_headers)
# Check if value passed before overriding any values present
# in the openapi_config. This means if the user has passed
# an openapi_config object and a kwarg for the same setting,
# the kwarg will take precedence.
if (config.proxy_url):
openapi_config.proxy = config.proxy_url
if (config.proxy_headers):
openapi_config.proxy_headers = config.proxy_headers
if (config.ssl_ca_certs):
openapi_config.ssl_ca_cert = config.ssl_ca_certs
if (config.ssl_verify != None):
openapi_config.verify_ssl = config.ssl_verify

return openapi_config
28 changes: 27 additions & 1 deletion pinecone/config/openapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@

import certifi
import socket
import copy
import warnings

from urllib3.connection import HTTPConnection

Expand All @@ -17,11 +19,35 @@ class OpenApiConfigFactory:
@classmethod
def build(cls, api_key: str, host: Optional[str] = None, **kwargs):
openapi_config = OpenApiConfiguration()
openapi_config.api_key = {"ApiKeyAuth": api_key}
openapi_config.host = host
openapi_config.ssl_ca_cert = certifi.where()
openapi_config.socket_options = cls._get_socket_options()
openapi_config.api_key = {"ApiKeyAuth": api_key}
return openapi_config

@classmethod
def copy(cls, openapi_config: OpenApiConfiguration, api_key: str, host: str) -> OpenApiConfiguration:
'''
Copy a user-supplied openapi configuration and update it with the user's api key and host.
If they have not specified other socket configuration, we will use the default values.
We expect these objects are being passed mainly a vehicle for proxy configuration, so
we don't modify those settings.
'''
copied = copy.deepcopy(openapi_config)
warnings.warn("Passing openapi_config is deprecated and will be removed in a future release. Please pass settings such as proxy_url, proxy_headers, ssl_ca_certs, and ssl_verify directly to the Pinecone constructor as keyword arguments. See the README at https://github.com/pinecone-io/pinecone-python-client for examples.", DeprecationWarning)

copied.api_key = {"ApiKeyAuth": api_key}
copied.host = host

# Set sensible defaults if the user hasn't set them
if not copied.socket_options:
copied.socket_options = cls._get_socket_options()

# We specifically do not modify the user's ssl_ca_cert or proxy settings, as
# they may have set them intentionally. This is the main reason somebody would
# pass an openapi_config in the first place.

return copied

@classmethod
def _get_socket_options(
Expand Down
Loading

0 comments on commit 6fa0447

Please sign in to comment.