Tutorial and documentation for config-based connectors (#15027)
* 5-step tutorial

* move

* tiny bit of editing

* Update tutorial

* update docs

* reset

* move files

* record selector, request options, and more links

* update

* update

* connector definition

* link

* links

* update example

* footnote

* typo

* document string interpolation

* note on string interpolation

* update

* fix code sample

* fix

* update sample

* fix

* use the actual config

* Update as per comments

* write as yaml

* typo

* Clarify options overloading

* clarify that docker must be running

* remove extra footnote

* use venv directly

* Apply suggestions from code review

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>

* signup instructions

* update

* clarify that both dot and bracket notations are interchangeable

* Clarify how check works

* create spec and config before updating connector definition

* clarify what now_local() is

* rename to yaml structure

* Go through tutorial and update end of section code samples

* fix link

* update

* update code samples

* Update code samples

* Update to bracket notation

* remove superfluous comments

* Update docs/connector-development/config-based/tutorial/2-install-dependencies.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* Update docs/connector-development/config-based/tutorial/4-reading-data.md

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>

* fix path

* update

* motivation blurp

* warning

* warning

* fix code block

* update code samples

* update code sample

* update code samples

* small updates

* update yaml structure

* custom class example

* language annotations

* update warning

* Update tutorial to use dpath extractor

* Update record selector docs

* unit test

* link to contributing

* tiny update

* $ in front of commands

* $ in front of commands

* More readings

* link to existing config-based connectors

* index

* update

* delete broken link

* supported features

* update

* Add some links

* Update docs/connector-development/config-based/overview.md

Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>

* Update docs/connector-development/config-based/record-selector.md

Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>

* Update docs/connector-development/config-based/overview.md

Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>

* Update docs/connector-development/config-based/overview.md

Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>

* Update docs/connector-development/config-based/overview.md

Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>

* mention the unit

* headers

* remove mentions of interpolating on stream slice, etc.

* update

* exclude config-based docs

Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
4 people authored Aug 12, 2022
1 parent 1b1448d commit 288c3ca
Showing 22 changed files with 2,123 additions and 4 deletions.
@@ -83,7 +83,7 @@ def token(self) -> str:
@dataclass
class BasicHttpAuthenticator(AbstractHeaderAuthenticator):
"""
- Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using bas64
+ Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using base64
https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
The header is of the form
@@ -4,6 +4,7 @@

from typing import Mapping, Type

from airbyte_cdk.sources.declarative.auth.oauth import DeclarativeOauth2Authenticator
from airbyte_cdk.sources.declarative.auth.token import ApiKeyAuthenticator, BasicHttpAuthenticator, BearerAuthenticator
from airbyte_cdk.sources.declarative.datetime.min_max_datetime import MinMaxDatetime
from airbyte_cdk.sources.declarative.declarative_stream import DeclarativeStream
@@ -56,6 +57,7 @@
"ListStreamSlicer": ListStreamSlicer,
"MinMaxDatetime": MinMaxDatetime,
"NoPagination": NoPagination,
"OAuthAuthenticator": DeclarativeOauth2Authenticator,
"OffsetIncrement": OffsetIncrement,
"RecordSelector": RecordSelector,
"RemoveFields": RemoveFields,
@@ -52,7 +52,7 @@ class DeclarativeComponentFactory:
If the component definition is a mapping with neither a "class_name" nor a "type" field,
the factory will do a best-effort attempt at inferring the component type by looking up the parent object's constructor type hints.
If the type hint is an interface present in `DEFAULT_IMPLEMENTATIONS_REGISTRY`,
- then the factory will create an object of it's default implementation.
+ then the factory will create an object of its default implementation.
If the component definition is a list, then the factory will iterate over the elements of the list,
instantiate its subcomponents, and return a list of instantiated objects.
@@ -15,7 +15,7 @@ class YamlParser(ConnectionDefinitionParser):
"""
Parses a Yaml string to a ConnectionDefinition
- In addition to standard Yaml parsing, the input_string can contain refererences to values previously defined.
+ In addition to standard Yaml parsing, the input_string can contain references to values previously defined.
This parser will dereference these values to produce a complete ConnectionDefinition.
References can be defined using a *ref(<arg>) string.
@@ -20,6 +20,7 @@
[
("test_extract_from_array", ["data"], {"data": [{"id": 1}, {"id": 2}]}, [{"id": 1}, {"id": 2}]),
("test_extract_single_record", ["data"], {"data": {"id": 1}}, [{"id": 1}]),
("test_extract_single_record_from_root", [], {"id": 1}, [{"id": 1}]),
("test_extract_from_root_array", [], [{"id": 1}, {"id": 2}], [{"id": 1}, {"id": 2}]),
("test_nested_field", ["data", "records"], {"data": {"records": [{"id": 1}, {"id": 2}]}}, [{"id": 1}, {"id": 2}]),
("test_field_in_config", ["{{ config['field'] }}"], {"record_array": [{"id": 1}, {"id": 2}]}, [{"id": 1}, {"id": 2}]),
73 changes: 73 additions & 0 deletions docs/connector-development/config-based/authentication.md
@@ -0,0 +1,73 @@
# Authentication

The `Authenticator` defines how to configure outgoing HTTP requests to authenticate on the API source.

## Authenticators

### ApiKeyAuthenticator

The `ApiKeyAuthenticator` sets an HTTP header on outgoing requests.
The following definition will set the header "Authorization" with a value "Bearer hello":

```yaml
authenticator:
  type: "ApiKeyAuthenticator"
  header: "Authorization"
  token: "Bearer hello"
```

### BearerAuthenticator

The `BearerAuthenticator` is a specialized `ApiKeyAuthenticator` that always sets the header "Authorization" with the value "Bearer {token}".
The following definition will set the header "Authorization" with a value "Bearer hello":

```yaml
authenticator:
  type: "BearerAuthenticator"
  token: "hello"
```

More information on bearer authentication can be found [here](https://swagger.io/docs/specification/authentication/bearer-authentication/)
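
To make the relationship between the two authenticators concrete, here is a minimal Python sketch. It is not the CDK's implementation: `api_key_header` and `bearer_header` are hypothetical helpers that only build the header each YAML definition above is described as producing.

```python
from typing import Dict


def api_key_header(header: str, token: str) -> Dict[str, str]:
    """Hypothetical helper: send `token` verbatim in the named header."""
    return {header: token}


def bearer_header(token: str) -> Dict[str, str]:
    """Hypothetical helper: always use "Authorization" with a "Bearer " prefix."""
    return api_key_header("Authorization", f"Bearer {token}")


# Both YAML definitions above describe the same outgoing header.
assert api_key_header("Authorization", "Bearer hello") == bearer_header("hello")
```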

### BasicHttpAuthenticator

The `BasicHttpAuthenticator` sets the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme).
The following definition will set the header "Authorization" with a value "Basic {encoded credentials}":

```yaml
authenticator:
  type: "BasicHttpAuthenticator"
  username: "hello"
  password: "world"
```

The password is optional. APIs that use Basic HTTP authentication with a single API key can be configured by passing the key as the username:

```yaml
authenticator:
  type: "BasicHttpAuthenticator"
  username: "hello"
```
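
As an illustration of what "encoded credentials" means, the following Python sketch builds the header by hand for both definitions above. `basic_auth_header` is a hypothetical helper, and treating an omitted password as an empty string is an assumption, not something this page states.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str = "") -> Dict[str, str]:
    """Build an RFC 7617 Basic header; a missing password is assumed to be empty."""
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("utf-8")
    return {"Authorization": f"Basic {encoded}"}


print(basic_auth_header("hello", "world"))  # {'Authorization': 'Basic aGVsbG86d29ybGQ='}
print(basic_auth_header("hello"))           # {'Authorization': 'Basic aGVsbG86'}
```

Note that base64 is an encoding, not encryption, so these credentials should only be sent over HTTPS.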

### OAuth

OAuth authentication is supported through the `OAuthAuthenticator`, which requires the following parameters:

- `token_refresh_endpoint`: The endpoint to refresh the access token.
- `client_id`: The client id.
- `client_secret`: The client secret.
- `refresh_token`: The token used to refresh the access token.
- `scopes` (Optional): The scopes to request. Default: empty list.
- `token_expiry_date` (Optional): The access token expiration date formatted as RFC-3339 ("%Y-%m-%dT%H:%M:%S.%f%z").
- `access_token_name` (Optional): The field to extract the access token from in the response. Default: "access_token".
- `expires_in_name` (Optional): The field to extract expires_in from in the response. Default: "expires_in".
- `refresh_request_body` (Optional): The request body to send in the refresh request. Default: None.

```yaml
authenticator:
  type: "OAuthAuthenticator"
  token_refresh_endpoint: "https://api.searchmetrics.com/v4/token"
  client_id: "{{ config['api_key'] }}"
  client_secret: "{{ config['client_secret'] }}"
  refresh_token: ""
```
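
The parameters above map onto a standard OAuth 2.0 refresh-token exchange. The Python sketch below shows one way they could fit together, without making any network call. Both function names are hypothetical, and the payload field names (`grant_type`, `scopes`) and the merge rule for `refresh_request_body` are assumptions rather than the `OAuthAuthenticator`'s documented behavior.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple


def build_refresh_payload(
    client_id: str,
    client_secret: str,
    refresh_token: str,
    scopes: Sequence[str] = (),
    refresh_request_body: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the body of the token refresh request (field names are assumptions)."""
    payload: Dict[str, Any] = {
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }
    if scopes:
        payload["scopes"] = list(scopes)
    # Extra fields are added without overriding the ones set above (assumption).
    for key, value in (refresh_request_body or {}).items():
        payload.setdefault(key, value)
    return payload


def parse_refresh_response(
    response_json: Mapping[str, Any],
    now: datetime,
    access_token_name: str = "access_token",
    expires_in_name: str = "expires_in",
) -> Tuple[str, datetime]:
    """Return the new access token and the time at which it expires."""
    expires_in = int(response_json[expires_in_name])
    return response_json[access_token_name], now + timedelta(seconds=expires_in)
```

The payload would be sent to `token_refresh_endpoint`, and the parsed expiry plays the role of `token_expiry_date` for deciding when to refresh again.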
177 changes: 177 additions & 0 deletions docs/connector-development/config-based/error-handling.md
@@ -0,0 +1,177 @@
# Error handling

By default, only server errors (HTTP 5XX) and "too many requests" errors (HTTP 429) are retried, up to 5 times with exponential backoff.
Any other HTTP error results in a failed read.

Other behaviors can be configured through the `Requester`'s `error_handler` field.
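
The default policy described above can be summarized in a few lines of Python. This is only a sketch of the rule as stated on this page; `is_retried_by_default` and `MAX_RETRIES` are hypothetical names, not CDK identifiers.

```python
MAX_RETRIES = 5  # default number of retries stated above


def is_retried_by_default(status_code: int) -> bool:
    """Only HTTP 429 and 5XX responses are retried when no error handler is configured."""
    return status_code == 429 or 500 <= status_code < 600


for status in (404, 429, 503):
    outcome = "retry" if is_retried_by_default(status) else "fail the read"
    print(status, outcome)
```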

## Defining errors

### From status code

Response filters can be used to define how to handle requests resulting in responses with a specific HTTP status code.
For instance, this example will configure the handler to also retry responses with a 404 status code:

```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: RETRY
```

Response filters can also be used to specify HTTP errors to ignore.
For instance, this example will configure the handler to ignore responses with a 404 status code:

```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: IGNORE
```

### From error message

Errors can also be defined by parsing the error message.
For instance, this error handler will ignore responses if the error message contains the string "ignorethisresponse":

```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - error_message_contain: "ignorethisresponse"
        action: IGNORE
```

This can also be done through a more generic string interpolation strategy, which has access to the following parameter:

- `response`: the decoded response

This example ignores errors where the response contains a "code" field:

```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - predicate: "{{ 'code' in response }}"
        action: IGNORE
```

The error handler can have multiple response filters.
The following example is configured to ignore 404 errors, and retry 429 errors:

```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: IGNORE
      - http_codes: [ 429 ]
        action: RETRY
```
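
The three ways of defining an error (status code, error message, predicate) and the handling of multiple filters can be sketched in Python as follows. This is an illustrative model, not the CDK's classes: `ResponseFilter` and `resolve_action` are hypothetical, the predicate is a plain Python callable standing in for the interpolated string, and "the first matching filter decides" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, Mapping, Optional, Sequence


@dataclass(frozen=True)
class ResponseFilter:
    action: str  # "RETRY" or "IGNORE"
    http_codes: FrozenSet[int] = frozenset()
    error_message_contain: Optional[str] = None
    predicate: Optional[Callable[[Mapping[str, Any]], bool]] = None

    def matches(self, status_code: int, body: Mapping[str, Any], error_message: str = "") -> bool:
        """A filter matches if any of its configured criteria is satisfied."""
        if status_code in self.http_codes:
            return True
        if self.error_message_contain and self.error_message_contain in error_message:
            return True
        return self.predicate is not None and self.predicate(body)


def resolve_action(
    filters: Sequence[ResponseFilter],
    status_code: int,
    body: Mapping[str, Any],
    error_message: str = "",
) -> Optional[str]:
    """Return the action of the first matching filter, or None if no filter matches."""
    for response_filter in filters:
        if response_filter.matches(status_code, body, error_message):
            return response_filter.action
    return None


# Mirrors the last YAML example: ignore 404 errors, retry 429 errors.
filters = [
    ResponseFilter(action="IGNORE", http_codes=frozenset({404})),
    ResponseFilter(action="RETRY", http_codes=frozenset({429})),
]
```

When `resolve_action` returns `None`, the default behavior described at the top of this page would apply.
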

## Backoff Strategies

The error handler supports a few backoff strategies, which are described in the following sections.

### Exponential backoff

This is the default backoff strategy. The requester will backoff with an exponential backoff interval.

### Constant Backoff

When using the `ConstantBackoffStrategy`, the requester will backoff with a constant interval.
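
To contrast the exponential and constant strategies, here is a small Python sketch of how the wait time could evolve across attempts. Both functions are hypothetical, and the exponential formula (`factor * 2 ** attempt` with a factor of 5 seconds) is an assumption for illustration; this page does not specify the exact interval.

```python
def exponential_backoff(attempt: int, factor: float = 5.0) -> float:
    """Wait time doubles with every attempt (formula and factor are assumptions)."""
    return factor * 2 ** attempt


def constant_backoff(attempt: int, backoff_time_in_seconds: float = 5.0) -> float:
    """Wait time is the same regardless of how many attempts were made."""
    return backoff_time_in_seconds


for attempt in range(4):
    print(attempt, exponential_backoff(attempt), constant_backoff(attempt))
```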

### Wait time defined in header

When using the `WaitTimeFromHeaderBackoffStrategy`, the requester will backoff by an interval specified in the response header.
In this example, the requester will backoff by the response's "wait_time" header value:

```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
```

Optionally, a regular expression can be configured to extract the wait time from the header value.

```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
        regex: '[-+]?\d+'
```
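
The extraction step can be sketched in Python as follows. `wait_time_from_header` is a hypothetical function, and both the use of `re.search` and the choice to return `None` when no usable value is found are assumptions about how such a strategy might behave.

```python
import re
from typing import Mapping, Optional


def wait_time_from_header(
    headers: Mapping[str, str], header: str, regex: Optional[str] = None
) -> Optional[float]:
    """Return the wait time in seconds, or None if it cannot be extracted."""
    value = headers.get(header)
    if value is None:
        return None
    if regex is not None:
        match = re.search(regex, value)
        if match is None:
            return None
        value = match.group()
    try:
        return float(value)
    except ValueError:
        return None


print(wait_time_from_header({"wait_time": "retry in 30 seconds"}, "wait_time", r"[-+]?\d+"))  # 30.0
```

Returning `None` rather than raising makes it possible to fall back to another strategy when the header is missing or malformed.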

### Wait until time defined in header

When using the `WaitUntilTimeFromHeaderBackoffStrategy`, the requester will backoff until the time specified in the response header.
In this example, the requester will wait until the time specified in the "wait_until" header value:

```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitUntilTimeFromHeaderBackoffStrategy"
        header: "wait_until"
        regex: '[-+]?\d+'
        min_wait: 5
```

The strategy accepts an optional regular expression to extract the time from the header value, and a minimum time to wait.
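
A possible reading of this strategy, sketched in Python: the header carries a timestamp, and the wait is the difference between that timestamp and the current time, never less than `min_wait`. The function name is hypothetical, and interpreting the header value as epoch seconds is an assumption.

```python
import re
from typing import Optional


def wait_until_backoff(
    header_value: str,
    now: float,
    regex: Optional[str] = None,
    min_wait: Optional[float] = None,
) -> Optional[float]:
    """Seconds to wait until the time in the header (assumed to be epoch seconds)."""
    value = header_value
    if regex is not None:
        match = re.search(regex, header_value)
        if match is None:
            return None
        value = match.group()
    try:
        wait_until = float(value)
    except ValueError:
        return None
    # Never wait less than min_wait, and never return a negative wait.
    return max(wait_until - now, min_wait or 0.0)


print(wait_until_backoff("1600000010", now=1600000000.0, min_wait=5))  # 10.0
print(wait_until_backoff("1600000002", now=1600000000.0, min_wait=5))  # 5
```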

## Advanced error handling

The error handler can have multiple backoff strategies, allowing it to fall back to the next one if a strategy cannot be evaluated.
For instance, the following defines an error handler that will read the backoff time from a header, and default to a constant backoff if the wait time could not be extracted from the response:

```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
      - type: "ConstantBackoffStrategy"
        backoff_time_in_seconds: 5
```
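
The fallback behavior can be pictured as trying each strategy in order and using the first one that yields a value. The Python sketch below models strategies as plain callables; all names are hypothetical and this is not the CDK's implementation.

```python
from typing import Callable, Mapping, Optional, Sequence

# A strategy maps response headers to a wait time, or None if it cannot be evaluated.
BackoffStrategy = Callable[[Mapping[str, str]], Optional[float]]


def header_strategy(header: str) -> BackoffStrategy:
    def strategy(headers: Mapping[str, str]) -> Optional[float]:
        value = headers.get(header)
        try:
            return float(value) if value is not None else None
        except ValueError:
            return None

    return strategy


def constant_strategy(seconds: float) -> BackoffStrategy:
    return lambda headers: seconds


def first_backoff(strategies: Sequence[BackoffStrategy], headers: Mapping[str, str]) -> Optional[float]:
    """Return the wait time from the first strategy that can be evaluated."""
    for strategy in strategies:
        wait = strategy(headers)
        if wait is not None:
            return wait
    return None


# Mirrors the YAML above: read "wait_time", else wait a constant 5 seconds.
strategies = [header_strategy("wait_time"), constant_strategy(5.0)]
```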

The `requester` can be configured to use a `CompositeErrorHandler`, which sequentially iterates over a list of error handlers, enabling different retry mechanisms for different types of errors.

In this example, a constant backoff of 5 seconds will be applied if the response contains a "code" field, and an exponential backoff will be applied if the error code is 403:

```yaml
requester:
  <...>
  error_handler:
    type: "CompositeErrorHandler"
    error_handlers:
      - response_filters:
          - predicate: "{{ 'code' in response }}"
            action: RETRY
        backoff_strategies:
          - type: "ConstantBackoffStrategy"
            backoff_time_in_seconds: 5
      - response_filters:
          - http_codes: [ 403 ]
            action: RETRY
        backoff_strategies:
          - type: "ExponentialBackoffStrategy"
```
```
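
A rough Python model of this composite configuration is shown below. It is a sketch under assumptions: `Handler` and `composite_backoff` are hypothetical names, the first handler whose filter matches is assumed to decide the backoff, and the exponential formula (`5 * 2 ** attempt`) is illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Handler:
    should_retry: Callable[[int, Mapping[str, Any]], bool]  # (status code, decoded body)
    backoff: Callable[[int], float]  # attempt number -> seconds to wait


def composite_backoff(
    handlers: Sequence[Handler], status_code: int, body: Mapping[str, Any], attempt: int
) -> Optional[float]:
    """Iterate over the handlers in order; the first one that matches decides the wait."""
    for handler in handlers:
        if handler.should_retry(status_code, body):
            return handler.backoff(attempt)
    return None  # no handler matched: do not retry


# Mirrors the YAML above.
handlers = [
    Handler(lambda status, body: "code" in body, lambda attempt: 5.0),
    Handler(lambda status, body: status == 403, lambda attempt: 5.0 * 2 ** attempt),
]
```

With this model, a response whose body contains a "code" field always waits 5 seconds, while a plain 403 waits longer on each attempt.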
26 changes: 26 additions & 0 deletions docs/connector-development/config-based/index.md
@@ -0,0 +1,26 @@
# Index

## From scratch

- [Overview](overview.md)
- [Yaml structure](overview.md)
- [Reference docs](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html)

## Concepts

- [Authentication](authentication.md)
- [Error handling](error-handling.md)
- [Pagination](pagination.md)
- [Record selection](record-selector.md)
- [Request options](request-options.md)
- [Stream slicers](stream-slicers.md)

## Tutorial

0. [Getting started](tutorial/0-getting-started.md)
1. [Creating a source](tutorial/1-create-source.md)
2. [Installing dependencies](tutorial/2-install-dependencies.md)
3. [Connecting to the API](tutorial/3-connecting-to-the-API-source.md)
4. [Reading data](tutorial/4-reading-data.md)
5. [Incremental reads](tutorial/5-incremental-reads.md)
6. [Testing](tutorial/6-testing.md)