From 4855a722090e685df1fc0639d88cdc4f7834264a Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 25 Jul 2022 16:02:52 -0700 Subject: [PATCH 01/92] 5-step tutorial --- .../config-based/0-getting-started.md | 42 +++ .../config-based/1-create-source.md | 27 ++ .../config-based/2-install-dependencies.md | 45 ++++ .../config-based/3-connecting.md | 180 +++++++++++++ .../config-based/4-reading-data.md | 180 +++++++++++++ .../config-based/5-incremental-reads.md | 255 ++++++++++++++++++ .../tutorials/cdk-api-source/intro.md | 8 + 7 files changed, 737 insertions(+) create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/2-install-dependencies.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md create mode 100644 docs/connector-development/tutorials/cdk-api-source/intro.md diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md b/docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md new file mode 100644 index 000000000000..a1992c8ded37 --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md @@ -0,0 +1,42 @@ +# Getting Started + +## Summary + +Throughout this tutorial, we'll walk you through the creation an Airbyte source to read data from an HTTP API. + +We'll build a connector reading data from the Exchange Rates API, but the steps we'll go through will apply to other HTTP APIs you might be interested in integrating with. 
+ +The API documentations can be found [here](https://exchangeratesapi.io/documentation/). +In this tutorial, we will read data from the following endpoints: + +- `Latest Rates Endpoint` +- `Historical Rates Endpoint` + +With the end goal of implementing a Source with a single `Stream` containing exchange rates going from a base currency to many other currencies. +The output schema of our stream will look like + +```json +{ + "base": "USD", + "date": "2022-07-15", + "rates": { + "CAD": 1.28, + "EUR": 0.98 + } +} +``` + +## Exchange Rates API Setup + +Before we can get started, you'll need to generate an API access key for the Exchange Rates API. +This can be done by signing up for the Free tier plan on [Exchange Rates API](https://exchangeratesapi.io/). + +## Requirements + +- Python >= 3.9 +- Docker +- NodeJS + +## Next Steps + +Next, we'll [create a Source using the connector generator.](./1-create-source.md) \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md b/docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md new file mode 100644 index 000000000000..551d78aef6b7 --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md @@ -0,0 +1,27 @@ +# Step 1: Create the Source + +Let's start by cloning the Airbyte repository: + +``` +git clone git@github.com:airbytehq/airbyte.git +``` + +Airbyte provides a code generator which bootstraps the scaffolding for our connector. + +``` +cd airbyte/airbyte-integrations/connector-templates/generator +./generate.sh +``` + +This will bring up an interactive helper application. Use the arrow keys to pick a template from the list. Select the `Configuration Based Source` template and then input the name of your connector. The application will create a new directory in `airbyte/airbyte-integrations/connectors/` with the name of your new connector.
+ +``` +Configuration Based Source +Source name: exchange-rates-tutorial +``` + +For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. The complete source code for this tutorial can be found here [args]`. +The module's generated `README.md` contains more details on the supported commands. + +## Next steps + +Next, we'll [connect to the API source](./3-connecting.md) + +## More readings + +- [Basic Concepts](https://docs.airbyte.com/connector-development/cdk-python/basic-concepts) +- [Defining Stream Schemas](https://docs.airbyte.com/connector-development/cdk-python/schemas) +- The module's generated `README.md` contains more details on the supported commands. \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md b/docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md new file mode 100644 index 000000000000..355e6aedc41d --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md @@ -0,0 +1,180 @@ +# Step 3: Connecting to the API + +We're now ready to start implementing the connector. 
+ +The code generator already created a boilerplate connector definition in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` + +``` +schema_loader: + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ name }}.json" +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "_" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: TODO "your_api_base_url" + http_method: "GET" + authenticator: + type: TokenAuthenticator + token: "{{ config['api_key'] }}" +retriever: + type: SimpleRetriever + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + ref: "*ref(selector)" + paginator: + type: NoPagination + state: + class_name: airbyte_cdk.sources.declarative.states.dict_state.DictState +customers_stream: + type: DeclarativeStream + options: + name: "customers" + primary_key: "id" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + requester: + ref: "*ref(requester)" + path: TODO "your_endpoint_path" +streams: + - "*ref(customers_stream)" +check: + type: CheckStream + stream_names: ["customers_stream"] +``` + +Let's fill this out these TODOs with the information found in the exchange rates api docs https://exchangeratesapi.io/documentation/ + +1. First, let's rename the stream from `customers` to `rates. + +``` +rates_stream: + type: DeclarativeStream + options: + name: "rates" +``` + +and update the references in the streams list and check block: + +``` +streams: + - "*ref(rates_stream)" +check: + type: CheckStream + stream_names: ["rates"] +``` + +2. Next we'll set the base url. + According to the API documentation, the base url is "https://api.exchangeratesapi.io/v1/". + This can be set in the requester definition. + +``` +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" +``` + +3.
We can fetch the latest data by submitting a request to `/latest`. This path is specific to the stream, so we'll set it within the `rates_stream` definition. + +``` +rates_stream: + type: DeclarativeStream + options: + name: "rates" + primary_key: "id" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + requester: + ref: "*ref(requester)" + path: "/latest" +``` + +4. Next, we'll set up the authentication. + The Exchange Rates API requires an access key, which we'll need to make accessible to our connector. + We'll configure the connector to use this access key by setting the access key in a request parameter and pointing to a field in the config, which we'll populate in a later step: + +``` +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" +``` + +5. According to the Exchange Rates API documentation, we can specify the base currency of interest in a request parameter: + +``` +request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" +``` + +6. Let's populate the config so the connector can access the access key and base currency. + First, we'll add these properties to the connector spec in + `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` + +``` +documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi +connectionSpecification: + $schema: http://json-schema.org/draft-07/schema# + title: exchangeratesapi.io Source Spec + type: object + required: + - access_key + - base + additionalProperties: false + properties: + access_key: + type: string + description: >- + Your API Access Key. See here. The key is + case sensitive. + airbyte_secret: true + base: + type: string + description: >- + ISO reference currency. See here. + examples: + - EUR + - USD +``` + +7.
We also need to fill in the connection config in `secrets/config.json`. + Because of the sensitive nature of the access key, we recommend storing this config in the `secrets` directory because it is ignored by git. + +``` +echo '{"access_key": "", "base": "USD"}' > secrets/config.json +``` + +We can now run the `check` operation, which verifies that the connector can connect to the API source. + +``` +python main.py check --config secrets/config.json +``` + +which should now succeed with logs similar to: + +``` +{"type": "LOG", "log": {"level": "INFO", "message": "Check succeeded"}} +{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "SUCCEEDED"}} +``` + +## Next steps + +Next, we'll [extract the records from the response](4-reading-data.md) \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md b/docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md new file mode 100644 index 000000000000..e7d0ae38c5c8 --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md @@ -0,0 +1,180 @@ +# Step 4: Reading data + +Now that we're able to authenticate to the source API, we'll want to extract data from the responses. +Let's first add the stream to the configured catalog in `source-exchange-rates-tutorial/integration_tests/configured_catalog.json`: + +``` +{ + "streams": [ + { + "stream": { + "name": "rates", + "json_schema": {}, + "supported_sync_modes": [ + "full_refresh" + ] + }, + "sync_mode": "full_refresh", + "destination_sync_mode": "overwrite" + } + ] +} +``` + +The configured catalog declares the sync modes supported by the stream \(full refresh or incremental\). +See the [catalog tutorial](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) for more information.
+ +Let's define the stream schema in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/schemas/rates.json` +Note that the code sample below only contains CAD, EUR, and USD for simplicity. Other currencies can be added to the schema as needed. + +``` +{ + "type": "object", + "required": [ + "base", + "date", + "rates" + ], + "properties": { + "base": { + "type": "string" + }, + "date": { + "type": "string" + }, + "rates": { + "type": "object", + "properties": { + "CAD": { + "type": [ + "null", + "number" + ] + }, + "EUR": { + "type": [ + "null", + "number" + ] + }, + "USD": { + "type": [ + "null", + "number" + ] + } + } + } + } +} +``` + +You can download the JSON file describing the output schema with all currencies [here](https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`. + +``` +curl https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json > source_exchange_rates_tutorial/schemas/rates.json +``` + +We can also delete the boilerplate schema files + +``` +rm source_exchange_rates_tutorial/schemas/customers.json +rm source_exchange_rates_tutorial/schemas/employees.json +``` + +Next, we'll update the record selection to wrap the single record returned by the source in an array. + +``` +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" +``` + +The transform is defined using the `Jello` syntax, which is a Python-based JQ alternative. More details on Jello can be found [here](https://github.com/kellyjonbrazil/jello). + +We'll also set the primary key to `date`. 
+ +``` +rates_stream: + type: DeclarativeStream + options: + name: "rates" + primary_key: "date" +``` + +Here is the complete connector definition for convenience: + +``` +schema_loader: + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ name }}.json" +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" + authenticator: + type: TokenAuthenticator + token: "{{ config['api_key'] }}" +retriever: + type: SimpleRetriever + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + ref: "*ref(selector)" + paginator: + type: NoPagination + state: + class_name: airbyte_cdk.sources.declarative.states.dict_state.DictState +rates_stream: + type: DeclarativeStream + options: + name: "rates" + primary_key: "date" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + requester: + ref: "*ref(requester)" + path: "/latest" +streams: + - "*ref(rates_stream)" +check: + type: CheckStream + stream_names: ["rates"] +``` + +Reading from the source can be done by running the `read` operation + +``` +python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +``` + +The logs should show that 1 record was read from the stream. 
+ +``` +{"type": "LOG", "log": {"level": "INFO", "message": "Read 1 records from rates stream"}} +{"type": "LOG", "log": {"level": "INFO", "message": "Finished syncing rates"}} +``` + +The `--debug` flag can be set to print out debug information, including the outgoing request and its associated response: + +`python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --debug` + +## Next steps + +We now have a working implementation of a connector reading the latest exchange rates for a given currency. +However, we're limited to reading only the latest exchange rate value. +Next, we'll [enhance the connector to read data for a given date, which will enable us to backfill the stream with historical data](5-incremental-reads.md). \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md b/docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md new file mode 100644 index 000000000000..007f47f88e40 --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md @@ -0,0 +1,255 @@ +# Step 5: Incremental Reads + +We now have a working implementation of a connector reading the latest exchange rates for a given currency. +In this section, we'll update the source to read historical data instead of only reading the latest exchange rates. + +According to the API documentation, we can read the exchange rate for a specific date by querying the `/{date}` endpoint instead of `/latest`. + +We'll now add a `start_date` property to the connector.
+ +First, we'll update the spec in `source_exchange_rates_tutorial/spec.yaml`: + +``` +documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi +connectionSpecification: + $schema: http://json-schema.org/draft-07/schema# + title: exchangeratesapi.io Source Spec + type: object + required: + - start_date + - access_key + - base + additionalProperties: false + properties: + start_date: + type: string + description: Start getting data from that date. + pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ + examples: + - YYYY-MM-DD + access_key: + type: string + description: >- + Your API Access Key. See here. The key is + case sensitive. + airbyte_secret: true + base: + type: string + description: >- + ISO reference currency. See here. + examples: + - EUR + - USD +``` + +Then we'll set the `start_date` to one week ago in our connection config, `secrets/config.json`. +The following `echo` command will update your config with a start date set 7 days before today. It uses the BSD/macOS `date -v -7d` syntax; with GNU `date` (most Linux distributions), replace that call with `date -d '7 days ago' '+%Y-%m-%d'`. + +``` +echo "{\"access_key\": \"\", \"start_date\": \"$(date -v -7d '+%Y-%m-%d')\", \"base\": \"USD\"}" > secrets/config.json +``` + +And we'll update the `path` in the connector definition to point to `/{{ config.start_date }}`. +Note that we are setting a default value because the `check` operation does not know the `start_date`. We'll default to hitting `/latest`: + +``` +retriever: + requester: + path: + type: "InterpolatedString" + string: "{{ config.start_date }}" + default: "/latest" +``` + +You can test the connector by executing the `read` operation: + +`python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json` + +Looking at the output record, you should see that we read historical data instead of the latest exchange rate. +For example: +> "historical": true, "base": "USD", "date": "2022-07-18" + +The connector will now always read data for the start date, which is not exactly what we want.
+Instead, we would like to iterate over all the dates between the `start_date` and today and read data for each day. + +We can do this by adding a `DatetimeStreamSlicer` to the connector definition and updating the `path` to point to the stream slice's `start_date`. +More details on stream slicers can be found [here](./link-to-stream-slicers.md). + +Let's first define a stream slicer at the top level of the connector definition: + +``` +stream_slicer: + type: "DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config.start_date }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" + datetime_format: "%Y-%m-%d" +``` + +and refer to it in the stream's retriever. Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: + +``` +rates_stream: + type: DeclarativeStream + options: + name: "rates" + cursor_field: "date" + primary_key: "date" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + stream_slicer: + ref: "*ref(stream_slicer)" +``` + +Finally, we'll update the `path` to point to the `stream_slice`'s `start_date`: + +``` +requester: + ref: "*ref(requester)" + path: "{{ stream_slice.start_date }}" +``` + +The full connector definition in `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` should now look like this: + +``` +schema_loader: + class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema + name: "rates" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" +stream_slicer: + type:
"DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config.start_date }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" + datetime_format: "%Y-%m-%d" + cursor_field: "{{ options.cursor_field }}" +retriever: + type: SimpleRetriever + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + ref: "*ref(selector)" + paginator: + type: NoPagination +rates_stream: + type: DeclarativeStream + $options: + name: "rates" + cursor_field: "date" + primary_key: "date" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + stream_slicer: + ref: "*ref(stream_slicer)" + requester: + ref: "*ref(requester)" + path: + type: "InterpolatedString" + string: "{{ stream_slice.start_date }}" + default: "/latest" +streams: + - "*ref(rates_stream)" +check: + type: CheckStream + stream_names: ["rates"] +``` + +Running the `read` operation will now read all data for all days between start_date and now: + +``` +python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +``` + +The operation should now output more than one record: + +``` +{"type": "LOG", "log": {"level": "INFO", "message": "Read 8 records from rates stream"}} +``` + +## Supporting incremental syncs + +Instead of always reading data for all dates, we would like the connector to only read data for dates we haven't read yet. 
+This can be achieved by updating the catalog to run in incremental mode (`integration_tests/configured_catalog.json`): + +``` +{ + "streams": [ + { + "stream": { + "name": "rates", + "json_schema": {}, + "supported_sync_modes": [ + "full_refresh", + "incremental" + ] + }, + "sync_mode": "incremental", + "destination_sync_mode": "overwrite" + } + ] +} +``` + +In addition to records, the `read` operation now also outputs state messages: + +``` +{"type": "STATE", "state": {"data": {"rates": {"date": "2022-07-15"}}}} +``` + +In your output, the date (`2022-07-15` in this example) will be today's date. + +We can simulate incremental syncs by creating a state file containing the last state produced by the `read` operation, for example in +`source-exchange-rates-tutorial/integration_tests/sample_state.json`: + +``` +{ + "rates": { + "date": "2022-07-15" + } +} +``` + +Running the `read` operation will now only read data for dates later than the given state: + +``` +python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --state integration_tests/sample_state.json +``` + +No data should be read if the state is already set to today's date: + +``` +{"type": "LOG", "log": {"level": "INFO", "message": "Setting state of rates stream to {'date': '2022-07-15'}"}} +{"type": "LOG", "log": {"level": "INFO", "message": "Read 0 records from rates stream"}} +``` + +## Next steps + +Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). diff --git a/docs/connector-development/tutorials/cdk-api-source/intro.md b/docs/connector-development/tutorials/cdk-api-source/intro.md new file mode 100644 index 000000000000..7d9fc548906c --- /dev/null +++ b/docs/connector-development/tutorials/cdk-api-source/intro.md @@ -0,0 +1,8 @@ +# Getting Started +## Summary +This is a step-by-step guide for how to create an Airbyte source to read data from an HTTP API.
+There are multiple ways to implement connectors for HTTP APIs depending on your needs and your toolset of choice. +In general, we recommend people build config-based connectors, and fallback to the Python CDK if more customization is needed. +The CDK is also available in C# .NET and in TypeScript/Javascript, but these implementations are not actively maintained by Airbyte. + +### TODO: Here be a quick guide to help decide whether the dev should follow the low-code or python CDK tutorial... \ No newline at end of file From 138bd52c21c562d232d4fd12ac27299e1c7b3a86 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 25 Jul 2022 17:59:13 -0700 Subject: [PATCH 02/92] move --- .../tutorials/cdk-api-source/intro.md | 8 -------- .../config-based/0-getting-started.md | 2 +- .../{cdk-api-source => }/config-based/1-create-source.md | 2 +- .../config-based/2-install-dependencies.md | 2 +- .../{cdk-api-source => }/config-based/3-connecting.md | 0 .../{cdk-api-source => }/config-based/4-reading-data.md | 0 .../config-based/5-incremental-reads.md | 0 7 files changed, 3 insertions(+), 11 deletions(-) delete mode 100644 docs/connector-development/tutorials/cdk-api-source/intro.md rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/0-getting-started.md (93%) rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/1-create-source.md (97%) rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/2-install-dependencies.md (97%) rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/3-connecting.md (100%) rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/4-reading-data.md (100%) rename docs/connector-development/tutorials/{cdk-api-source => }/config-based/5-incremental-reads.md (100%) diff --git a/docs/connector-development/tutorials/cdk-api-source/intro.md b/docs/connector-development/tutorials/cdk-api-source/intro.md deleted file mode 100644 index 
7d9fc548906c..000000000000 --- a/docs/connector-development/tutorials/cdk-api-source/intro.md +++ /dev/null @@ -1,8 +0,0 @@ -# Getting Started -## Summary -This is a step-by-step guide for how to create an Airbyte source to read data from an HTTP API. -There are multiple ways to implement connectors for HTTP APIs depending on your needs and your toolset of choice. -In general, we recommend people build config-based connectors, and fallback to the Python CDK if more customization is needed. -The CDK is also available in C# .NET and in TypeScript/Javascript, but these implementations are not actively maintained by Airbyte. - -### TODO: Here be a quick guide to help decide whether the dev should follow the low-code or python CDK tutorial... \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md b/docs/connector-development/tutorials/config-based/0-getting-started.md similarity index 93% rename from docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md rename to docs/connector-development/tutorials/config-based/0-getting-started.md index a1992c8ded37..8c7fb19809f9 100644 --- a/docs/connector-development/tutorials/cdk-api-source/config-based/0-getting-started.md +++ b/docs/connector-development/tutorials/config-based/0-getting-started.md @@ -39,4 +39,4 @@ This can be done by signing up for the Free tier plan on [Exchange Rates API](ht ## Next Steps -Next, we'll [create a Source using the connector generator.](./1-create-source.md) \ No newline at end of file +Next, we'll [create a Source using the connector generator.](1-create-source.md) \ No newline at end of file diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md b/docs/connector-development/tutorials/config-based/1-create-source.md similarity index 97% rename from docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md rename to 
docs/connector-development/tutorials/config-based/1-create-source.md index 551d78aef6b7..4f46c605f7c1 100644 --- a/docs/connector-development/tutorials/cdk-api-source/config-based/1-create-source.md +++ b/docs/connector-development/tutorials/config-based/1-create-source.md @@ -24,4 +24,4 @@ For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. Th ## Next steps -Next, [we'll install dependencies required to run the connector](./2-install-dependencies.md) +Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/2-install-dependencies.md b/docs/connector-development/tutorials/config-based/2-install-dependencies.md similarity index 97% rename from docs/connector-development/tutorials/cdk-api-source/config-based/2-install-dependencies.md rename to docs/connector-development/tutorials/config-based/2-install-dependencies.md index 390e97528401..30297beadf35 100644 --- a/docs/connector-development/tutorials/cdk-api-source/config-based/2-install-dependencies.md +++ b/docs/connector-development/tutorials/config-based/2-install-dependencies.md @@ -36,7 +36,7 @@ The module's generated `README.md` contains more details on the supported comman ## Next steps -Next, we'll [connect to the API source](./3-connecting.md) +Next, we'll [connect to the API source](3-connecting.md) ## More readings diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md b/docs/connector-development/tutorials/config-based/3-connecting.md similarity index 100% rename from docs/connector-development/tutorials/cdk-api-source/config-based/3-connecting.md rename to docs/connector-development/tutorials/config-based/3-connecting.md diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md b/docs/connector-development/tutorials/config-based/4-reading-data.md similarity index 100% rename from 
docs/connector-development/tutorials/cdk-api-source/config-based/4-reading-data.md rename to docs/connector-development/tutorials/config-based/4-reading-data.md diff --git a/docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md b/docs/connector-development/tutorials/config-based/5-incremental-reads.md similarity index 100% rename from docs/connector-development/tutorials/cdk-api-source/config-based/5-incremental-reads.md rename to docs/connector-development/tutorials/config-based/5-incremental-reads.md From 637c2a766849653f2ad7f65427a072caac5b9b9f Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 25 Jul 2022 18:03:55 -0700 Subject: [PATCH 03/92] tiny bit of editing --- .../tutorials/config-based/0-getting-started.md | 4 ++-- .../tutorials/config-based/3-connecting.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/connector-development/tutorials/config-based/0-getting-started.md b/docs/connector-development/tutorials/config-based/0-getting-started.md index 8c7fb19809f9..aa990f8a501f 100644 --- a/docs/connector-development/tutorials/config-based/0-getting-started.md +++ b/docs/connector-development/tutorials/config-based/0-getting-started.md @@ -4,7 +4,7 @@ Throughout this tutorial, we'll walk you through the creation an Airbyte source to read data from an HTTP API. -We'll build a connector reading data from the Exchange Rates API, but the steps we'll go through will apply to other HTTP APIs you might be interested in integrating with. +We'll build a connector reading data from the Exchange Rates API, but the steps will apply to other HTTP APIs you might be interested in integrating with. The API documentations can be found [here](https://exchangeratesapi.io/documentation/). 
In this tutorial, we will read data from the following endpoints: @@ -13,7 +13,7 @@ In this tutorial, we will read data from the following endpoints: - `Historical Rates Endpoint` With the end goal of implementing a Source with a single `Stream` containing exchange rates going from a base currency to many other currencies. -The output schema of our stream will look like +The output schema of our stream will look like the following: ```json { diff --git a/docs/connector-development/tutorials/config-based/3-connecting.md b/docs/connector-development/tutorials/config-based/3-connecting.md index 355e6aedc41d..45ac9fd6511c 100644 --- a/docs/connector-development/tutorials/config-based/3-connecting.md +++ b/docs/connector-development/tutorials/config-based/3-connecting.md @@ -50,7 +50,7 @@ check: stream_names: ["customers_stream"] ``` -Let's fill this out these TODOs with the information found in the exchange rates api docs https://exchangeratesapi.io/documentation/ +Let's fill this out these TODOs with the information found in the [Exchange Rates API docs](https://exchangeratesapi.io/documentation/) 1. First, let's rename the stream from `customers` to `rates. @@ -72,7 +72,7 @@ check: ``` 2. Next we'll set the base url. - According to the API documentation, the base url is "https://api.exchangeratesapi.io/v1/". + According to the API documentation, the base url is `"https://api.exchangeratesapi.io/v1/"`. This can be set in the requester definition. 
``` From ff775e3dce9e7ec2e9f86374a4ad90c77234ee2a Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Thu, 28 Jul 2022 11:03:46 -0700 Subject: [PATCH 04/92] Update tutorial --- .../tutorials/config-based/3-connecting.md | 15 +++-- .../tutorials/config-based/4-reading-data.md | 60 ++----------------- .../config-based/5-incremental-reads.md | 27 +++++---- 3 files changed, 27 insertions(+), 75 deletions(-) diff --git a/docs/connector-development/tutorials/config-based/3-connecting.md b/docs/connector-development/tutorials/config-based/3-connecting.md index 45ac9fd6511c..bbc0d0114d5c 100644 --- a/docs/connector-development/tutorials/config-based/3-connecting.md +++ b/docs/connector-development/tutorials/config-based/3-connecting.md @@ -26,28 +26,27 @@ retriever: name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: - ref: "*ref(selector)" + $ref: "*ref(selector)" paginator: type: NoPagination - state: - class_name: airbyte_cdk.sources.declarative.states.dict_state.DictState customers_stream: type: DeclarativeStream - options: + $options: name: "customers" primary_key: "id" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" + $ref: "*ref(retriever)" requester: - ref: "*ref(requester)" + $ref: "*ref(requester)" path: TODO "your_endpoint_path" streams: - "*ref(customers_stream)" check: type: CheckStream stream_names: ["customers_stream"] + ``` Let's fill this out these TODOs with the information found in the [Exchange Rates API docs](https://exchangeratesapi.io/documentation/) @@ -136,7 +135,7 @@ connectionSpecification: required: - access_key - base - additionalProperties: false + additionalProperties: true properties: access_key: type: string diff --git a/docs/connector-development/tutorials/config-based/4-reading-data.md b/docs/connector-development/tutorials/config-based/4-reading-data.md index e7d0ae38c5c8..dbfecd2501da 100644 --- 
a/docs/connector-development/tutorials/config-based/4-reading-data.md +++ b/docs/connector-development/tutorials/config-based/4-reading-data.md @@ -25,49 +25,6 @@ The configured catalog declares the sync modes supported by the stream \(full re See the [catalog tutorial](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) for more information. Let's define the stream schema in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/schemas/rates.json` -Note that the code sample below only contains CAD, EUR, and USD for simplicity. Other currencies can be added to the schema as needed. - -``` -{ - "type": "object", - "required": [ - "base", - "date", - "rates" - ], - "properties": { - "base": { - "type": "string" - }, - "date": { - "type": "string" - }, - "rates": { - "type": "object", - "properties": { - "CAD": { - "type": [ - "null", - "number" - ] - }, - "EUR": { - "type": [ - "null", - "number" - ] - }, - "USD": { - "type": [ - "null", - "number" - ] - } - } - } - } -} -``` You can download the JSON file describing the output schema with all currencies [here](https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`. 
@@ -109,7 +66,7 @@ Here is the complete connector definition for convenience: ``` schema_loader: type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ name }}.json" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector extractor: @@ -124,30 +81,25 @@ requester: request_parameters: access_key: "{{ config.access_key }}" base: "{{ config.base }}" - authenticator: - type: TokenAuthenticator - token: "{{ config['api_key'] }}" retriever: type: SimpleRetriever name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: - ref: "*ref(selector)" + $ref: "*ref(selector)" paginator: type: NoPagination - state: - class_name: airbyte_cdk.sources.declarative.states.dict_state.DictState rates_stream: type: DeclarativeStream - options: + $options: name: "rates" primary_key: "date" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" + $ref: "*ref(retriever)" requester: - ref: "*ref(requester)" + $ref: "*ref(requester)" path: "/latest" streams: - "*ref(rates_stream)" diff --git a/docs/connector-development/tutorials/config-based/5-incremental-reads.md b/docs/connector-development/tutorials/config-based/5-incremental-reads.md index 007f47f88e40..6ae4101f1767 100644 --- a/docs/connector-development/tutorials/config-based/5-incremental-reads.md +++ b/docs/connector-development/tutorials/config-based/5-incremental-reads.md @@ -19,7 +19,7 @@ connectionSpecification: - start_date - access_key - base - additionalProperties: false + additionalProperties: true properties: start_date: type: string @@ -59,7 +59,7 @@ retriever: requester: path: type: "InterpolatedString" - string: "{{ stream_slice.start_date }}" + string: "{{ config.start_date }}" default: "/latest" ``` @@ -114,15 +114,17 @@ And we'll update the path to point to the `stream_slice`'s start_date ``` requester: ref: "*ref(requester)" - path: "{{ 
stream_slice.start_date }}" + path: + type: "InterpolatedString" + string: "{{ stream_slice.start_date }}" + default: "/latest" ``` The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: ``` schema_loader: - class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema - name: "rates" + type: JsonSchema file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector @@ -148,15 +150,16 @@ stream_slicer: datetime_format: "%Y-%m-%d %H:%M:%S.%f" step: "1d" datetime_format: "%Y-%m-%d" - cursor_field: "{{ options.cursor_field }}" retriever: type: SimpleRetriever name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: - ref: "*ref(selector)" + $ref: "*ref(selector)" paginator: type: NoPagination + stream_slicer: + $ref: "*ref(stream_slicer)" rates_stream: type: DeclarativeStream $options: @@ -164,13 +167,11 @@ rates_stream: cursor_field: "date" primary_key: "date" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" - stream_slicer: - ref: "*ref(stream_slicer)" + $ref: "*ref(retriever)" requester: - ref: "*ref(requester)" + $ref: "*ref(requester)" path: type: "InterpolatedString" string: "{{ stream_slice.start_date }}" @@ -252,4 +253,4 @@ There shouldn't be any data read if the state is today's date: ## Next steps: -Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). +Next, we'll run the [Source Acceptance Tests suite to ensure the connector invariants are respected](6-testing.md). 
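The `stream_slicer` in the definition above uses `step: "1d"` with `datetime_format: "%Y-%m-%d"`. The requester path is interpolated from `{{ stream_slice.start_date }}`, so each day in the configured range should be read with its own request to the historical rates endpoint. The Python sketch below only illustrates that idea and is not the CDK's `DatetimeStreamSlicer`. The `daily_slices` helper is hypothetical, and the sketch assumes both ends of the range are inclusive, each slice covers exactly one day, and the path is simply appended to `url_base`.

```python
from datetime import date, timedelta
from typing import Dict, Iterator

URL_BASE = "https://api.exchangeratesapi.io/v1/"  # url_base from the requester definition


def daily_slices(start: date, end: date) -> Iterator[Dict[str, str]]:
    """Yield one slice per day between start and end, both inclusive (hypothetical helper)."""
    current = start
    while current <= end:
        day = current.strftime("%Y-%m-%d")  # mirrors datetime_format: "%Y-%m-%d"
        yield {"start_date": day, "end_date": day}
        current += timedelta(days=1)  # mirrors step: "1d"


for stream_slice in daily_slices(date(2022, 7, 20), date(2022, 7, 22)):
    # path: "{{ stream_slice.start_date }}" is appended to the url_base in this sketch
    print(URL_BASE + stream_slice["start_date"])
```

This prints one URL per day, from `https://api.exchangeratesapi.io/v1/2022-07-20` through `https://api.exchangeratesapi.io/v1/2022-07-22`.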
From 6ebee74a19c695b64151297e928ca7fdf532a59b Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 07:17:11 -0700 Subject: [PATCH 05/92] update docs --- .../sources/declarative/auth/token.py | 2 +- .../parsers/class_types_registry.py | 2 + .../declarative/parsers/yaml_parser.py | 2 +- .../requesters/paginators/limit_paginator.py | 6 +- .../test_datetime_stream_slicer.py | 11 ++ .../config-based/0-getting-started.md | 9 +- .../tutorials/config-based/1-create-source.md | 2 + .../tutorials/config-based/3-connecting.md | 16 +- .../tutorials/config-based/4-reading-data.md | 13 +- .../config-based/5-incremental-reads.md | 12 ++ .../tutorials/config-based/6-testing.md | 108 +++++++++++ .../config-based/concepts/authenticator.md | 70 +++++++ .../config-based/concepts/error-handling.md | 173 ++++++++++++++++++ .../config-based/concepts/overview.md | 77 ++++++++ .../config-based/concepts/pagination.md | 100 ++++++++++ .../config-based/concepts/stream-slicers.md | 165 +++++++++++++++++ 16 files changed, 758 insertions(+), 10 deletions(-) create mode 100644 docs/connector-development/tutorials/config-based/6-testing.md create mode 100644 docs/connector-development/tutorials/config-based/concepts/authenticator.md create mode 100644 docs/connector-development/tutorials/config-based/concepts/error-handling.md create mode 100644 docs/connector-development/tutorials/config-based/concepts/overview.md create mode 100644 docs/connector-development/tutorials/config-based/concepts/pagination.md create mode 100644 docs/connector-development/tutorials/config-based/concepts/stream-slicers.md diff --git a/airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/token.py b/airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/token.py index 2b28f7f85941..8b4b26e5c222 100644 --- a/airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/token.py +++ b/airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/token.py @@ -70,7 +70,7 @@ def token(self) -> str: class 
BasicHttpAuthenticator(AbstractHeaderAuthenticator): """ - Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using bas64 + Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using base64 https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme The header is of the form diff --git a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py index ad0c268e1ac1..d35438a5a94e 100644 --- a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py +++ b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py @@ -4,6 +4,7 @@ from typing import Mapping, Type +from airbyte_cdk.sources.declarative.auth.oauth import DeclarativeOauth2Authenticator from airbyte_cdk.sources.declarative.auth.token import ApiKeyAuthenticator, BasicHttpAuthenticator, BearerAuthenticator from airbyte_cdk.sources.declarative.datetime.min_max_datetime import MinMaxDatetime from airbyte_cdk.sources.declarative.declarative_stream import DeclarativeStream @@ -56,6 +57,7 @@ "ListStreamSlicer": ListStreamSlicer, "MinMaxDatetime": MinMaxDatetime, "NoPagination": NoPagination, + "OAuthAuthenticator": DeclarativeOauth2Authenticator, "OffsetIncrement": OffsetIncrement, "RecordSelector": RecordSelector, "RemoveFields": RemoveFields, diff --git a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/yaml_parser.py b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/yaml_parser.py index b9885c6e1043..31518c74849b 100644 --- a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/yaml_parser.py +++ b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/yaml_parser.py @@ -15,7 +15,7 @@ class 
YamlParser(ConnectionDefinitionParser): """ Parses a Yaml string to a ConnectionDefinition - In addition to standard Yaml parsing, the input_string can contain refererences to values previously defined. + In addition to standard Yaml parsing, the input_string can contain references to values previously defined. This parser will dereference these values to produce a complete ConnectionDefinition. References can be defined using a *ref() string. diff --git a/airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/limit_paginator.py b/airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/limit_paginator.py index 6a41b4cf1b86..0659a75818b2 100644 --- a/airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/limit_paginator.py +++ b/airbyte-cdk/python/airbyte_cdk/sources/declarative/requesters/paginators/limit_paginator.py @@ -24,7 +24,7 @@ class LimitPaginator(Paginator): * updates the request path with "{{ response._metadata.next }}" paginator: type: "LimitPaginator" - limit_value: 10 + page_size: 10 limit_option: option_type: request_parameter field_name: page_size @@ -41,7 +41,7 @@ class LimitPaginator(Paginator): ` paginator: type: "LimitPaginator" - limit_value: 5 + page_size: 5 limit_option: option_type: header field_name: page_size @@ -58,7 +58,7 @@ class LimitPaginator(Paginator): ` paginator: type: "LimitPaginator" - limit_value: 5 + page_size: 5 limit_option: option_type: request_parameter field_name: page_size diff --git a/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py b/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py index c34f4ada975d..470dccdd7417 100644 --- a/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py +++ b/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py @@ -255,6 +255,17 @@ def 
mock_datetime_now(monkeypatch): {"start_date": "2021-01-05T00:00:00.000000+0000", "end_date": "2021-01-05T00:00:00.000000+0000"}, ], ), + ( + "test_docs", + None, + MinMaxDatetime("2021-02-01T00:00:00.000000+0000"), + MinMaxDatetime("2021-03-01T00:00:00.000000+0000"), + "1d", + cursor_field, + "31d", + datetime_format, + [], + ), ( "test_start_is_after_stream_state", {cursor_field: "2021-01-05T00:00:00.000000+0000"}, diff --git a/docs/connector-development/tutorials/config-based/0-getting-started.md b/docs/connector-development/tutorials/config-based/0-getting-started.md index aa990f8a501f..d072a21ce4d0 100644 --- a/docs/connector-development/tutorials/config-based/0-getting-started.md +++ b/docs/connector-development/tutorials/config-based/0-getting-started.md @@ -39,4 +39,11 @@ This can be done by signing up for the Free tier plan on [Exchange Rates API](ht ## Next Steps -Next, we'll [create a Source using the connector generator.](1-create-source.md) \ No newline at end of file +Next, we'll [create a Source using the connector generator.](1-create-source.md) + +## More readings + +- Source +- Stream +- Schema +- \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/1-create-source.md b/docs/connector-development/tutorials/config-based/1-create-source.md index 4f46c605f7c1..3a988b33e7c4 100644 --- a/docs/connector-development/tutorials/config-based/1-create-source.md +++ b/docs/connector-development/tutorials/config-based/1-create-source.md @@ -25,3 +25,5 @@ For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. 
Th ## Next steps Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) + +## Note - Maybe this should be combined with 0 or 2 \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/3-connecting.md b/docs/connector-development/tutorials/config-based/3-connecting.md index bbc0d0114d5c..a0429c141813 100644 --- a/docs/connector-development/tutorials/config-based/3-connecting.md +++ b/docs/connector-development/tutorials/config-based/3-connecting.md @@ -7,7 +7,7 @@ The code generator already created a boilerplate connector definition in `sourc ``` schema_loader: type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ name }}.json" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector extractor: @@ -176,4 +176,16 @@ which should now succeed with logs similar to: ## Next steps -Next, we'll [extract the records from the response](4-reading-data.md) \ No newline at end of file +Next, we'll [extract the records from the response](4-reading-data.md) + +## More readings + +- +- declarative stream +- check stream +- http requester +- authentication +- request options providers +- config +- spec file +- check operation \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/4-reading-data.md b/docs/connector-development/tutorials/config-based/4-reading-data.md index dbfecd2501da..ff024c257cc3 100644 --- a/docs/connector-development/tutorials/config-based/4-reading-data.md +++ b/docs/connector-development/tutorials/config-based/4-reading-data.md @@ -1,6 +1,6 @@ # Step 3: Reading data -Now that we're able to authenticate to the source API, we'll want to extract data from the responses. +Now that we're able to authenticate to the source API, we'll want to select data from the HTTP responses. 
Let's first add the stream to the configured catalog in `source-exchange_rates-tutorial/integration_tests/configured_catalog.json` ``` @@ -129,4 +129,13 @@ The `--debug` flag can be set to print out debug information, including the outg We now have a working implementation of a connector reading the latest exchange rates for a given currency. We're however limited to only reading the latest exchange rate value. -Next, we'll ([enhance the connector to read data for a given date, which will enable us to backfill the stream with historical data.](5-incremental-reads.md) \ No newline at end of file +Next, we'll ([enhance the connector to read data for a given date, which will enable us to backfill the stream with historical data.](5-incremental-reads.md) + +## More readings + +- record selectors +- catalog tutorial +- jello +- read operation +- primary key +- declarative stream \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/5-incremental-reads.md b/docs/connector-development/tutorials/config-based/5-incremental-reads.md index 6ae4101f1767..2d55d46b6396 100644 --- a/docs/connector-development/tutorials/config-based/5-incremental-reads.md +++ b/docs/connector-development/tutorials/config-based/5-incremental-reads.md @@ -254,3 +254,15 @@ There shouldn't be any data read if the state is today's date: ## Next steps: Next, we'll run the [Source Acceptance Tests suite to ensure the connector invariants are respected](6-testing.md). 
+ +## More readings + +- incrementals (general guide) +- incrementals (low-code specific) +- spec file +- config +- stream slicer +- datetime stream slicer +- cursor +- options +- requester \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/6-testing.md b/docs/connector-development/tutorials/config-based/6-testing.md new file mode 100644 index 000000000000..8c58e76e31f2 --- /dev/null +++ b/docs/connector-development/tutorials/config-based/6-testing.md @@ -0,0 +1,108 @@ +# Step 5: Testing + +We should make sure the connector respects the Airbyte specifications before we start using it in production. +This can be done by executing the Source Acceptance Tests (SAT). + +These tests assert that the most basic functionality works as expected and are configured in `acceptance-test-config`. + +Before running the tests, we'll create an invalid config to make sure the `check` operation fails if the credentials are wrong, and an abnormal state to verify the connector's behavior when running with an abnormal state. + +Update `integration_tests/invalid_config.json` with this content: + +``` +{"access_key": "", "start_date": "2022-07-21", "base": "USD"} +``` + +and `integration_tests/abnormal_state.json` with: + +``` +{ + "rates": { + "date": "2999-12-31" + } +} + +``` + +You can build the connector's Docker image and run the acceptance tests by running the following commands: + +``` +docker build . -t airbyte/source-exchange-rates-tutorial:dev +python -m pytest integration_tests -p integration_tests.acceptance +``` + +One test should fail: + +``` +airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py:183 TestConnection.test_check[inputs1] +``` + +This test is failing because the `check` operation is succeeding even with invalid credentials.
+This can be confirmed by running + +``` +python main.py check --config integration_tests/invalid_config.json +``` + +The `--debug` flag can be used to inspect the response: + +``` +python main.py check --debug --config integration_tests/invalid_config.json +``` + +You should see a message similar to this one: + +``` +{"type": "DEBUG", "message": "Receiving response", "data": {"headers": "{'Date': 'Thu, 28 Jul 2022 17:56:31 GMT', 'Content-Type': 'application/json; Charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'access-control-allow-methods': 'GET, HEAD, POST, PUT, PATCH, DELETE, OPTIONS', 'access-control-allow-origin': '*', 'x-blocked-at-loadbalancer': '1', 'CF-Cache-Status': 'DYNAMIC', 'Expect-CT': 'max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"', 'Report-To': '{\"endpoints\":[{\"url\":\"https:\\\\/\\\\/a.nel.cloudflare.com\\\\/report\\\\/v3?s=MpyuXqiuxH%2FEA1%2F75CQiP4bPOt0DKeg9utWdBShkseCK9f4G8R9K126fe65nIvsKWQVGMTou%2BeTRCq%2FCzgoxr2B1BT%2Bm3l6i0DFDu5sYAqHAWzd9pSoqJZ6jktjQgB5D%2BqG7jQvhIDnK\"}],\"group\":\"cf-nel\",\"max_age\":604800}', 'NEL': '{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}', 'Server': 'cloudflare', 'CF-RAY': '731f7df109709e68-SJC', 'Content-Encoding': 'gzip'}", "status": "200", "body": "{\n \"success\": false,\n \"error\": {\n \"code\": 101,\n \"type\": \"invalid_access_key\",\n \"info\": \"You have not supplied a valid API Access Key. [Technical Support: support@apilayer.com]\"\n }\n}\n"}} +``` + +The endpoint is returning a 200 HTTP response, but the message contains an error, which our connector isn't handling. 
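The status code alone cannot distinguish this response from a successful one, so the body has to be inspected. The Python sketch below shows that check. It assumes the body is JSON and treats a top-level `error` key as a failure, which is the same idea as an interpolated predicate like `{{ 'error' in response }}`. The `should_fail` helper is hypothetical and not part of the CDK. The sample bodies are abridged from the debug output above and from the tutorial's output schema.

```python
import json


def should_fail(status_code: int, body: str) -> bool:
    """Return True if a response should be treated as a failure (hypothetical helper)."""
    if status_code >= 400:
        return True  # regular HTTP errors
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False  # non-JSON bodies are not inspected in this sketch
    # The Exchange Rates API reports an invalid key with HTTP 200 and an "error" object.
    return isinstance(payload, dict) and "error" in payload


invalid_key_body = '{"success": false, "error": {"code": 101, "type": "invalid_access_key"}}'
valid_body = '{"base": "USD", "date": "2022-07-15", "rates": {"CAD": 1.28, "EUR": 0.98}}'
print(should_fail(200, invalid_key_body))  # True
print(should_fail(200, valid_body))  # False
```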
+ +This can be fixed by adding an error handler to the requester: + +``` +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" + error_handler: + response_filters: + - action: FAIL + predicate: "{{ 'error' in response }}" +``` + +The `check` operation should now fail + +``` +python main.py check --debug --config integration_tests/invalid_config.json +``` + +and the acceptance tests should pass + +``` +docker build . -t airbyte/source-exchange-rates-tutorial:dev +python -m pytest integration_tests -p integration_tests.acceptance +``` + +## Next steps: + +Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). + +## Read more: + +- acceptance tests +- check operation +- building the image +- error handling +- interpolation + +missing: + +- custom code +- pagination +- transformation \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/concepts/authenticator.md b/docs/connector-development/tutorials/config-based/concepts/authenticator.md new file mode 100644 index 000000000000..1eb6a1efc873 --- /dev/null +++ b/docs/connector-development/tutorials/config-based/concepts/authenticator.md @@ -0,0 +1,70 @@ +# Authenticator + +The `Authenticator` defines how to configure outgoing HTTP requests to authenticate on the API source. + +## Authenticators + +### ApiKeyAuthenticator + +The `ApiKeyAuthenticator` sets an HTTP header on outgoing requests. 
+The following definition will set the header "Authorization" with a value "Bearer hello": + +``` +authenticator: + type: "ApiKeyAuthenticator" + header: "Authorization" + token: "Bearer hello" +``` + +### BearerAuthenticator + +The `BearerAuthenticator` is a specialized `ApiKeyAuthenticator` that always sets the header "Authorization" with the value "Bearer {token}". +The following definition will set the header "Authorization" with a value "Bearer hello": + +``` +authenticator: + type: "BearerAuthenticator" + token: "hello" +``` + +More information on bearer authentication can be found [here](https://swagger.io/docs/specification/authentication/bearer-authentication/). + +### BasicHttpAuthenticator + +The `BasicHttpAuthenticator` sets the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme). +The following definition will set the header "Authorization" with a value "Basic " + +The encoding scheme is: + +1. Concatenate the username and the password with `":"` in between +2. Encode the resulting string in base64 +3. Decode the resulting bytes as a UTF-8 string + +``` +authenticator: + type: "BasicHttpAuthenticator" + username: "hello" + password: "world" +``` + +The password is optional.
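The three encoding steps listed above can be sketched with Python's standard `base64` module. This illustrates the scheme only and is not the CDK's `BasicHttpAuthenticator` code. The `basic_auth_header` helper is hypothetical, and treating an omitted password as an empty string is an assumption.

```python
import base64


def basic_auth_header(username: str, password: str = "") -> str:
    """Build a Basic Authorization header value (hypothetical helper, illustrative only)."""
    credentials = f"{username}:{password}"  # 1. join username and password with ":"
    encoded = base64.b64encode(credentials.encode("utf-8"))  # 2. base64-encode the bytes
    return f"Basic {encoded.decode('utf-8')}"  # 3. decode the encoded bytes to a UTF-8 string


print(basic_auth_header("hello", "world"))  # Basic aGVsbG86d29ybGQ=
print(basic_auth_header("hello"))  # Basic aGVsbG86
```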
Authenticating with APIs using Basic HTTP and a single API key can be done as: + +``` +authenticator: + type: "BasicHttpAuthenticator" + username: "hello" +``` + +### OAuth + +OAuth authentication is supported through the `OAuthAuthenticator`, which requires the following parameters: + +- token_refresh_endpoint: The endpoint to refresh the access token +- client_id: The client ID +- client_secret: The client secret +- refresh_token: The token used to refresh the access token +- scopes: The scopes to request +- token_expiry_date: The access token expiration date +- access_token_name: The field to extract the access token from in the response +- expires_in_name: The field to extract expires_in from in the response +- refresh_request_body: The request body to send in the refresh request \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/concepts/error-handling.md b/docs/connector-development/tutorials/config-based/concepts/error-handling.md new file mode 100644 index 000000000000..6aa7136493aa --- /dev/null +++ b/docs/connector-development/tutorials/config-based/concepts/error-handling.md @@ -0,0 +1,173 @@ +# Error handling + +By default, only server errors (HTTP 5XX) and too-many-requests errors (HTTP 429) are retried, up to 5 times with exponential backoff. +Other HTTP errors will result in a failed read. + +Other behaviors can be configured through the `Requester`'s `error_handler` field. + +## Defining errors + +Response filters can be used to define how to handle requests resulting in responses with a specific HTTP status code. +For instance, this example will configure the handler to also retry responses with a 404 status: + +``` +requester: + <...> + error_handler: + response_filters: + - http_codes: [404] + action: RETRY +``` + +Response filters can be used to specify HTTP errors to ignore instead of retrying.
+For instance, this example will configure the handler to ignore responses with a 404 status: + +``` +requester: + <...> + error_handler: + response_filters: + - http_codes: [404] + action: IGNORE +``` + +Errors can also be defined by parsing the error message. +For instance, this error handler will ignore responses if the error message contains the string "ignorethisresponse": + +``` +requester: + <...> + error_handler: + response_filters: + - error_message_contain: "ignorethisresponse" + action: IGNORE +``` + +This can also be done through a more generic string interpolation strategy with the following parameters: + +- response: + +This example ignores errors where the response contains a "code" field: + +``` +requester: + <...> + error_handler: + response_filters: + - predicate: "{{ 'code' in response }}" + action: IGNORE +``` + +The error handler can have multiple response filters. +The following example is configured to ignore 404 errors and retry 429 errors: + +``` +requester: + <...> + error_handler: + response_filters: + - http_codes: [404] + action: IGNORE + - http_codes: [429] + action: RETRY +``` + +## Backoff Strategies + +The error handler supports a few backoff strategies, which are described in the following sections. + +### Exponential backoff + +This is the default backoff strategy. The requester will back off with an exponentially increasing interval. + +### Constant Backoff + +When using the `ConstantBackoffStrategy`, the requester will back off with a constant interval. + +### Wait time defined in header + +When using the `WaitTimeFromHeaderBackoffStrategy`, the requester will back off by an interval specified in the response header. +In this example, the requester will back off by the response's "wait_time" header value: + +``` +requester: + <...> + error_handler: + <...> + backoff_strategies: + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" +``` + +Optionally, a regex can be configured to extract the wait time from the header value.
+ +``` +requester: + <...> + error_handler: + <...> + backoff_strategies: + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" + regex: "[-+]?\d+" +``` + +### Wait until time defined in header + +When using the `WaitUntilTimeFromHeaderBackoffStrategy`, the requester will back off until the time specified in the response header. +In this example, the requester will wait until the time specified in the "wait_until" header value: + +``` +requester: + <...> + error_handler: + <...> + backoff_strategies: + - type: "WaitUntilTimeFromHeaderBackoffStrategy" + header: "wait_until" + regex: "[-+]?\d+" + min_wait: 5 +``` + +The strategy accepts an optional regex to extract the time from the header value, and a minimum time to wait. + +## Advanced error handling + +The error handler can have multiple backoff strategies, allowing it to fall back if a strategy cannot be evaluated. +For instance, the following defines an error handler that will read the backoff time from a header and default to a constant backoff if the wait time could not be extracted from the response: + +``` +requester: + <...> + error_handler: + <...> + backoff_strategies: + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" + - type: "ConstantBackoffStrategy" + backoff_time_in_seconds: 5 + +``` + +The `requester` can be configured to use a `CompositeErrorHandler`, which sequentially iterates over a list of error handlers, enabling different retry mechanisms for different types of errors.
+ +In this example, a constant backoff of 5 seconds will be applied if the response contains a "code" field, and an exponential backoff will be applied if the HTTP status code is 403: + +``` +requester: + <...> + error_handler: + type: "CompositeErrorHandler" + error_handlers: + - response_filters: + - predicate: "{{ 'code' in response }}" + action: RETRY + backoff_strategies: + - type: "ConstantBackoffStrategy" + backoff_time_in_seconds: 5 + - response_filters: + - http_codes: [ 403 ] + action: RETRY + backoff_strategies: + - type: "ExponentialBackoffStrategy" +``` \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/concepts/overview.md b/docs/connector-development/tutorials/config-based/concepts/overview.md new file mode 100644 index 000000000000..a47bc65cbe09 --- /dev/null +++ b/docs/connector-development/tutorials/config-based/concepts/overview.md @@ -0,0 +1,77 @@ +# Config-based connectors overview + +The goal of this document is to give enough technical specifics to understand how config-based connectors work. +When you're ready to start building a connector, you can start with [the tutorial](../0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html). + +## Source + +Config-based connectors are a declarative way to define HTTP API sources. + +A source is defined by two components: + +1. The source's `Stream`s, which define the data to read +2. A `ConnectionChecker`, which describes how to run the `check` operation to test the connection to the API source + +## Stream + +Streams define the schema of the data of interest, as well as how to read it from the underlying API source. +A stream generally corresponds to a resource within the API. Streams are analogous to tables in a relational database source. + +A stream is defined by: + +1. Its name +2. A primary key: used to uniquely identify records, enabling deduplication +3.
A schema: describes the data to sync +4. A data retriever: describes how to retrieve the data from the API +5. A cursor field: used to identify the stream's state from a record +6. A set of transformations to be applied to the records read from the source before emitting them to the destination +7. A checkpoint interval: defines when to checkpoint syncs. + +More details on streams and sources can be found in the [basic concepts section](../../../../../airbyte-cdk/python/docs/concepts/basic-concepts.md). +More details on cursor fields and checkpointing can be found in the [incremental-stream section](../../../../../airbyte-cdk/python/docs/concepts/incremental-stream.md). + +## Data retriever + +The data retriever defines how to read the data from an API source. +There is currently only one implementation, the `SimpleRetriever`, which is defined by: + +1. Requester: describes how to submit requests to the API source +2. Paginator[^1]: describes how to navigate through the API's pages +3. Record selector: describes how to select records from an HTTP response +4. Stream Slicer: describes how to partition the stream, enabling incremental syncs and checkpointing + +Each of those components (and their subcomponents) is defined by an explicit interface and one or many implementations. +The developer can choose and configure the implementation they need depending on the specifications of the integration they are building against. + +### Data flow + +The retriever acts as a coordinator, moving the data between its components before emitting `AirbyteMessage`s that can be read by the platform. +The `SimpleRetriever`'s data flow can be described as follows: + +1. Given the connection config and the current stream state, the `StreamSlicer` computes the stream slices to read. +2. Iterate over all the stream slices defined by the stream slicer. +3. For each stream slice: + 1. Submit a request as defined by the requester + 2. Select the records from the response + 3.
Repeat for as long as the paginator points to a next page + +## Requester + +The `Requester` defines how to prepare HTTP requests to send to the source API [^2]. +There currently is only one implementation, the `HttpRequester`, which is defined by + +1. A base url: the root of the API source +2. A path: the specific endpoint to fetch data from for a resource +3. The HTTP method: the HTTP method to use (GET or POST) +4. A request options provider: defines the request parameters and headers to set on outgoing HTTP requests +5. An authenticator: defines how to authenticate to the source +6. An error handler: defines how to handle errors + +## Connection Checker + +The `ConnectionChecker` defines how to test the connection to the integration. + +The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. + +[^1]: The paginator is conceptually more related to the requester than the data retriever, but is part of the `SimpleRetriever` because the `SimpleRetriever` inherits from `HttpStream` to increase code reusability. +[^2]: As of today, the requester acts as a config object and is not directly responsible for preparing the HTTP requests. This is done in the `SimpleRetriever`'s parent class `HttpStream`. \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/concepts/pagination.md b/docs/connector-development/tutorials/config-based/concepts/pagination.md new file mode 100644 index 000000000000..97b6c243318e --- /dev/null +++ b/docs/connector-development/tutorials/config-based/concepts/pagination.md @@ -0,0 +1,100 @@ +# Pagination + +Given a page size and a pagination strategy, the `LimitPaginator` will point to pages of results for as long as its strategy returns a `next_page_token`. + +Iterating over pages of results is different from iterating over stream slices.
+Stream slices have semantic value: for instance, a datetime stream slice defines data for a specific date range, and 2 different stream slices will have data for different date ranges. +Conversely, pages don't have semantic value. More pages simply mean that more records are to be read, without specifying any meaningful difference between the records of the first and later pages. + +The paginator is defined by: + +- page_size: the number of records to fetch in a single request +- limit_option: how to specify the page size in the outgoing HTTP request +- pagination_strategy: how to compute the next page to fetch +- page_token_option: how to specify the next page to fetch in the outgoing HTTP request + +3 pagination strategies are supported: + +1. Page increment +2. Offset increment +3. Cursor-based + +## Pagination Strategies + +### Page increment + +When using the `PageIncrement` strategy, the page number will be set as part of the `page_token_option`. + +The following paginator example will fetch 5 records per page and specify the page number as a request_parameter: + +``` +paginator: + type: "LimitPaginator" + page_size: 5 + limit_option: + inject_into: request_parameter + field_name: page_size + pagination_strategy: + type: "PageIncrement" + page_token_option: + inject_into: "request_parameter" + field_name: "page" +``` + +### Offset increment + +When using the `OffsetIncrement` strategy, the number of records read will be set as part of the `page_token_option`.
+ +The following paginator example will fetch 5 records per page and specify the offset as a request_parameter: + +``` +paginator: + type: "LimitPaginator" + page_size: 5 + limit_option: + inject_into: request_parameter + field_name: page_size + pagination_strategy: + type: "OffsetIncrement" + page_token_option: + field_name: "offset" + inject_into: "request_parameter" + +``` + +### Cursor + +The `CursorPaginationStrategy` outputs a token by evaluating its `cursor_value` string with the following parameters: + +- `response`: decoded response +- `headers`: HTTP headers on the response +- `last_records`: List of records selected from the last response + +This cursor value can be used to request the next page of records. + +In this example, the next page of records is defined by setting the `from` request parameter to the id of the last record read: + +``` +paginator: + type: "LimitPaginator" + <...> + pagination_strategy: + type: "CursorPaginationStrategy" + cursor_value: "{{ last_records[-1].id }}" + page_token_option: + field_name: "from" + inject_into: "request_parameter" +``` + +Some APIs directly point to the URL of the next page to fetch. In this example, the URL of the next page is extracted from the response headers: + +``` +paginator: + type: "LimitPaginator" + <...> + pagination_strategy: + type: "CursorPaginationStrategy" + cursor_value: "{{ headers.urls.next }}" + page_token_option: + inject_into: "path" +``` \ No newline at end of file diff --git a/docs/connector-development/tutorials/config-based/concepts/stream-slicers.md b/docs/connector-development/tutorials/config-based/concepts/stream-slicers.md new file mode 100644 index 000000000000..141a0aa23aac --- /dev/null +++ b/docs/connector-development/tutorials/config-based/concepts/stream-slicers.md @@ -0,0 +1,165 @@ +# Stream Slicers + +`StreamSlicer`s define how to partition a stream into subsets of records. + +It can be thought of as an iterator over the stream's data, where a `StreamSlice` is the retriever's unit of work.
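The difference between stream slices and pages can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the CDK's `SimpleRetriever` code: `read_stream`, `fake_fetch`, `DATA` and the `("record" | "state", payload)` tuples are invented for the example. Each slice is paged through until the fetcher returns no next token, and a state message is emitted only once the whole slice has been read, which mirrors the checkpointing behavior described in this section.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Slice = Dict[str, Any]
Record = Dict[str, Any]
# Hypothetical page fetcher: (slice, page token or None) -> (records, next token or None)
FetchPage = Callable[[Slice, Optional[int]], Tuple[List[Record], Optional[int]]]


def read_stream(slices: Iterable[Slice], fetch_page: FetchPage) -> Iterator[Tuple[str, Any]]:
    """Read every slice page by page, emitting a state message after each slice."""
    for stream_slice in slices:
        token: Optional[int] = None
        while True:
            records, token = fetch_page(stream_slice, token)
            for record in records:
                yield ("record", record)
            if token is None:  # no next page for this slice
                break
        # Checkpoint: the slice is fully read, so its state can be emitted.
        yield ("state", stream_slice)


# Fake API: records grouped by day, served 2 at a time using an offset token.
DATA = {"2021-01-01": [1, 2, 3], "2021-01-02": [4]}


def fake_fetch(stream_slice: Slice, token: Optional[int]) -> Tuple[List[Record], Optional[int]]:
    offset = token or 0
    rows = DATA[stream_slice["date"]]
    next_offset = offset + 2
    next_token = next_offset if next_offset < len(rows) else None
    return [{"id": r} for r in rows[offset:next_offset]], next_token


messages = list(read_stream([{"date": "2021-01-01"}, {"date": "2021-01-02"}], fake_fetch))
```

The first slice needs two pages and the second needs one, yet each produces exactly one state message: pagination changes how many requests are made, while slicing decides where checkpoints happen.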
+ +When a stream is read incrementally, a state message will be output by the connector after reading every slice, which enables checkpointing. + +At the beginning of a `read` operation, the `StreamSlicer` will compute the slices to sync given the connection config and the stream's current state, +As the `Retriever` reads data from the `Source`, the `StreamSlicer` keeps track of the `Stream`'s state, which will be emitted after reading each stream slice. + +More information of stream slicing can be found in the [stream-slices section](../../../cdk-python/stream-slices.md) + +## Implementations + +This section gives an overview of the stream slicers currently implemented. + +### Datetime + +The `DatetimeStreamSlicer` iterates over a datetime range by partitioning it into time windows. +Given a start time, an end time, and a step, it will partition the interval [start, end] into small windows of the size described by the step. +For instance, + +``` +stream_slicer: + start_datetime: "2021-02-01T00:00:00.000000+0000" + end_datetime: "2021-03-01T00:00:00.000000+0000" + step: "1d" +``` + +will create one slice per day for the interval `2021-02-01` - `2021-03-01`. + +The `DatetimeStreamSlicer` also supports an optional lookback window, specifying how many days before the start_datetime to read data for. + +``` +stream_slicer: + start_datetime: "2021-02-01T00:00:00.000000+0000" + end_datetime: "2021-03-01T00:00:00.000000+0000" + lookback_window: "31d" + step: "1d" +``` + +will read data from `2021-01-01` to `2021-03-01`. + +The stream slices will be of the form `{"start_date": "2021-02-01T00:00:00.000000+0000", "end_date": "2021-02-01T00:00:00.000000+0000"}`. +The stream slices' field names can be customized through the `stream_state_field_start` and `stream_state_field_end` parameters. + +The `datetime_format` can be used to specify the format of the start and end times. It is [RFC3339](https://datatracker.ietf.org/doc/html/rfc3339#section-5.6) by default.
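A minimal Python sketch of the windowing described above, under stated assumptions: dates are handled at day granularity, the step and lookback are passed as integer numbers of days rather than strings such as `"1d"`, and windows are inclusive on both ends, so a one-day step yields slices whose start and end fall on the same day, as in the slice form shown above. `datetime_slices` is a hypothetical helper, not part of the CDK.

```python
from datetime import date, timedelta
from typing import Dict, List


def datetime_slices(start: date, end: date, step_days: int, lookback_days: int = 0) -> List[Dict[str, str]]:
    """Partition [start - lookback, end] into inclusive windows of step_days days."""
    current = start - timedelta(days=lookback_days)
    slices: List[Dict[str, str]] = []
    while current <= end:
        # Clamp the last window so it never extends past the end date.
        window_end = min(current + timedelta(days=step_days - 1), end)
        slices.append({"start_date": current.isoformat(), "end_date": window_end.isoformat()})
        current = window_end + timedelta(days=1)
    return slices


daily = datetime_slices(date(2021, 2, 1), date(2021, 3, 1), step_days=1)
with_lookback = datetime_slices(date(2021, 2, 1), date(2021, 3, 1), step_days=1, lookback_days=31)
```

With these inputs, `daily` holds 29 one-day slices from `2021-02-01` to `2021-03-01`, and `with_lookback` starts at `2021-01-01`, matching the 31-day lookback example.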
+ +The stream's state will be derived by reading the record's `cursor_field`. +If the `cursor_field` is `created`, and the record is `{"id": 1234, "created": "2021-02-02T00:00:00.000000+0000"}`, then the state after reading that record is `"created": "2021-02-02T00:00:00.000000+0000"`. [^1] + +#### Cursor update + +When reading data from the source, the cursor value will be updated to the maximum datetime among: + +- the last record's cursor field +- the start of the stream slice +- the current cursor value + This ensures that the cursor will be updated even if a stream slice does not contain any data. + +#### Specifying query start and end time + +If an API supports filtering data based on the cursor field, the `start_time_option` and `end_time_option` parameters can be used to configure this filtering. +For instance, if the API supports filtering using the request parameters `created[gte]` and `created[lte]`, then the stream slicer can specify the request parameters as follows: + +``` +stream_slicer: + type: "DatetimeStreamSlicer" + <...> + start_time_option: + field_name: "created[gte]" + inject_into: "request_parameter" + end_time_option: + field_name: "created[lte]" + inject_into: "request_parameter" +``` + +### List + +`ListStreamSlicer` iterates over values from a given list. +It is defined by: + +- The slice values, which are the valid values for the cursor field +- The cursor field on a record +- request_option: an optional request option to set on outgoing requests + +As an example, this stream slicer will iterate over the 2 repositories ("airbyte" and "airbyte-secret") and will set a request_parameter on outgoing HTTP requests.
+ +``` +stream_slicer: + type: "ListStreamSlicer" + slice_values: + - "airbyte" + - "airbyte-secret" + cursor_field: "repository" + request_option: + field_name: "repository" + inject_into: "request_parameter" +``` + +### Cartesian Product + +`CartesianProductStreamSlicer` iterates over the Cartesian product of its underlying stream slicers. + +Given 2 stream slicers with the following slices: +A: `[{"start_date": "2021-01-01", "end_date": "2021-01-01"}, {"start_date": "2021-01-02", "end_date": "2021-01-02"}]` +B: `[{"s": "hello"}, {"s": "world"}]` +the resulting stream slices are: + +``` +[ + {"start_date": "2021-01-01", "end_date": "2021-01-01", "s": "hello"}, + {"start_date": "2021-01-01", "end_date": "2021-01-01", "s": "world"}, + {"start_date": "2021-01-02", "end_date": "2021-01-02", "s": "hello"}, + {"start_date": "2021-01-02", "end_date": "2021-01-02", "s": "world"}, +] +``` + +### Substream + +`SubstreamSlicer` iterates over the parent's stream slices. +This is useful for defining sub-resources. + +We might for instance want to read all the commits for a given repository (parent resource). + +For each parent stream, the slicer needs to know: + +- what the parent stream is +- what the key of the records in the parent stream is +- which field defines the stream slice representing the parent record +- how to specify that information on an outgoing HTTP request + +Assuming the commits for a given repository can be read by specifying the repository as a request_parameter, this could be defined as: + +``` +stream_slicer: + type: "SubstreamSlicer" + parent_streams_configs: + - stream: "*ref(repositories_stream)" + parent_key: "id" + stream_slice_field: "repository" + request_option: + field_name: "repository" + inject_into: "request_parameter" +``` + +REST APIs often nest sub-resources in the URL path.
+If the URL to fetch commits were "/repositories/:id/commits", then the `Requester`'s path would need to refer to the stream slice's value and no `request_option` would be set: + +``` +retriever: + <...> + requester: + <...> + path: "/repositories/{{ stream_slice.repository }}/commits" + stream_slicer: + type: "SubstreamSlicer" + parent_streams_configs: + - stream: "*ref(repositories_stream)" + parent_key: "id" + stream_slice_field: "repository" +``` + +[^1] This is a slight oversimplification. See update cursor section for more details on how the cursor is updated From ff2b6028b48f90ab9db14bacf7d4f5278a7e670b Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 12:14:12 -0700 Subject: [PATCH 06/92] reset --- .../stream_slicers/test_datetime_stream_slicer.py | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py b/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py index 470dccdd7417..c34f4ada975d 100644 --- a/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py +++ b/airbyte-cdk/python/unit_tests/sources/declarative/stream_slicers/test_datetime_stream_slicer.py @@ -255,17 +255,6 @@ def mock_datetime_now(monkeypatch): {"start_date": "2021-01-05T00:00:00.000000+0000", "end_date": "2021-01-05T00:00:00.000000+0000"}, ], ), - ( - "test_docs", - None, - MinMaxDatetime("2021-02-01T00:00:00.000000+0000"), - MinMaxDatetime("2021-03-01T00:00:00.000000+0000"), - "1d", - cursor_field, - "31d", - datetime_format, - [], - ), ( "test_start_is_after_stream_state", {cursor_field: "2021-01-05T00:00:00.000000+0000"}, From 906f9151885978e6aabadc612235f02c2b055510 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 14:39:37 -0700 Subject: [PATCH 07/92] move files --- .../authentication.md} | 2 +- .../error-handling.md | 0 .../concepts => config-based}/overview.md |
20 +++++++++++++++---- .../concepts => config-based}/pagination.md | 0 .../stream-slicers.md | 2 +- .../tutorial}/0-getting-started.md | 0 .../tutorial}/1-create-source.md | 0 .../tutorial}/2-install-dependencies.md | 0 .../tutorial}/3-connecting.md | 0 .../tutorial}/4-reading-data.md | 0 .../tutorial}/5-incremental-reads.md | 0 .../tutorial}/6-testing.md | 0 12 files changed, 18 insertions(+), 6 deletions(-) rename docs/connector-development/{tutorials/config-based/concepts/authenticator.md => config-based/authentication.md} (99%) rename docs/connector-development/{tutorials/config-based/concepts => config-based}/error-handling.md (100%) rename docs/connector-development/{tutorials/config-based/concepts => config-based}/overview.md (79%) rename docs/connector-development/{tutorials/config-based/concepts => config-based}/pagination.md (100%) rename docs/connector-development/{tutorials/config-based/concepts => config-based}/stream-slicers.md (99%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/0-getting-started.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/1-create-source.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/2-install-dependencies.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/3-connecting.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/4-reading-data.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/5-incremental-reads.md (100%) rename docs/connector-development/{tutorials/config-based => config-based/tutorial}/6-testing.md (100%) diff --git a/docs/connector-development/tutorials/config-based/concepts/authenticator.md b/docs/connector-development/config-based/authentication.md similarity index 99% rename from docs/connector-development/tutorials/config-based/concepts/authenticator.md rename 
to docs/connector-development/config-based/authentication.md index 1eb6a1efc873..640f590ae11b 100644 --- a/docs/connector-development/tutorials/config-based/concepts/authenticator.md +++ b/docs/connector-development/config-based/authentication.md @@ -1,4 +1,4 @@ -# Authenticator +# Authentication The `Authenticator` defines how to configure outgoing HTTP requests to authenticate on the API source. diff --git a/docs/connector-development/tutorials/config-based/concepts/error-handling.md b/docs/connector-development/config-based/error-handling.md similarity index 100% rename from docs/connector-development/tutorials/config-based/concepts/error-handling.md rename to docs/connector-development/config-based/error-handling.md diff --git a/docs/connector-development/tutorials/config-based/concepts/overview.md b/docs/connector-development/config-based/overview.md similarity index 79% rename from docs/connector-development/tutorials/config-based/concepts/overview.md rename to docs/connector-development/config-based/overview.md index a47bc65cbe09..a9cff87e43d0 100644 --- a/docs/connector-development/tutorials/config-based/concepts/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -1,7 +1,13 @@ # Config-based connectors overview The goal of this document is to give enough technical specifics to understand how config-based connectors work. 
-When you're ready to start building a connector, you can start with [the tutorial](../0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) +When you're ready to start building a connector, you can start with [the tutorial](../../../config-based/tutorial/0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) + +## Overview + +Config-based connectors work by parsing a YAML configuration describing the Source, then running the configured connector using a Python backend. + +The process then submits HTTP requests to the API endpoint, and extracts records out of the response. ## Source @@ -27,12 +33,12 @@ A stream is defined by: 6. A set of transformations to be applied on the records read from the source before emitting them to the destination 7. A checkpoint interval: defines when to checkpoint syncs. -More details on streams and sources can be found in the [basic concepts section](../../../../../airbyte-cdk/python/docs/concepts/basic-concepts.md). -More details on cursor fields, and checkpointing can be found in the [incremental-stream section](../../../../../airbyte-cdk/python/docs/concepts/incremental-stream.md) +More details on streams and sources can be found in the [basic concepts section](../cdk-python/basic-concepts.md). +More details on cursor fields, and checkpointing can be found in the [incremental-stream section](../cdk-python/incremental-stream.md) ## Data retriever -The data retriever defines how to read the data from an API source. +The data retriever defines how to read the data from an API source, and acts as an orchestrator for the data retrieval flow. The is currently only one implementation, the `SimpleRetriever`, which is defined by 1. 
Requester: describes how to submit requests to the API source @@ -43,6 +49,9 @@ The is currently only one implementation, the `SimpleRetriever`, which is define Each of those components (and their subcomponents) are defined by an explicit interface and one or many implementations. The developer can choose and configure the implementation they need depending on specifications of the integrations they are building against. +More details on the paginator can be found in the [pagination section](pagination.md) +More details on the stream slicers can be found in the [stream slicers section](stream-slicers.md) + ### Data flow The retriever acts as a coordinator, moving the data between its components before emitting `AirbyteMessage`s that can be read by the platform. @@ -67,6 +76,9 @@ There currently is only one implementation, the `HttpRequester`, which is define 5. An authenticator: defines how to authenticate to the source 6. An error handler: defines how to handle errors +More details on authentication can be found in the [authentication section](authentication.md). +More details on error handling can be found in the [error handling section](error-handling.md) + ## Connection Checker The `ConnectionChecker` defines how to test the connection to the integration. 
diff --git a/docs/connector-development/tutorials/config-based/concepts/pagination.md b/docs/connector-development/config-based/pagination.md similarity index 100% rename from docs/connector-development/tutorials/config-based/concepts/pagination.md rename to docs/connector-development/config-based/pagination.md diff --git a/docs/connector-development/tutorials/config-based/concepts/stream-slicers.md b/docs/connector-development/config-based/stream-slicers.md similarity index 99% rename from docs/connector-development/tutorials/config-based/concepts/stream-slicers.md rename to docs/connector-development/config-based/stream-slicers.md index 141a0aa23aac..ed5c42eb3b8b 100644 --- a/docs/connector-development/tutorials/config-based/concepts/stream-slicers.md +++ b/docs/connector-development/config-based/stream-slicers.md @@ -9,7 +9,7 @@ When a stream is read incrementally, a state message will be output by the conne At the beginning of a `read` operation, the `StreamSlicer` will compute the slices to sync given the connection config and the stream's current state, As the `Retriever` reads data from the `Source`, the `StreamSlicer` keeps track of the `Stream`'s state, which will be emitted after reading each stream slice. 
-More information of stream slicing can be found in the [stream-slices section](../../../cdk-python/stream-slices.md) +More information of stream slicing can be found in the [stream-slices section](../cdk-python/stream-slices.md) ## Implementations diff --git a/docs/connector-development/tutorials/config-based/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md similarity index 100% rename from docs/connector-development/tutorials/config-based/0-getting-started.md rename to docs/connector-development/config-based/tutorial/0-getting-started.md diff --git a/docs/connector-development/tutorials/config-based/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md similarity index 100% rename from docs/connector-development/tutorials/config-based/1-create-source.md rename to docs/connector-development/config-based/tutorial/1-create-source.md diff --git a/docs/connector-development/tutorials/config-based/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md similarity index 100% rename from docs/connector-development/tutorials/config-based/2-install-dependencies.md rename to docs/connector-development/config-based/tutorial/2-install-dependencies.md diff --git a/docs/connector-development/tutorials/config-based/3-connecting.md b/docs/connector-development/config-based/tutorial/3-connecting.md similarity index 100% rename from docs/connector-development/tutorials/config-based/3-connecting.md rename to docs/connector-development/config-based/tutorial/3-connecting.md diff --git a/docs/connector-development/tutorials/config-based/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md similarity index 100% rename from docs/connector-development/tutorials/config-based/4-reading-data.md rename to docs/connector-development/config-based/tutorial/4-reading-data.md diff --git 
a/docs/connector-development/tutorials/config-based/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md similarity index 100% rename from docs/connector-development/tutorials/config-based/5-incremental-reads.md rename to docs/connector-development/config-based/tutorial/5-incremental-reads.md diff --git a/docs/connector-development/tutorials/config-based/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md similarity index 100% rename from docs/connector-development/tutorials/config-based/6-testing.md rename to docs/connector-development/config-based/tutorial/6-testing.md From a64c758bb8119f433595d4d0e59c22c676bf0332 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 15:26:18 -0700 Subject: [PATCH 08/92] record selector, request options, and more links --- .../config-based/overview.md | 7 +- .../config-based/record-selector.md | 134 ++++++++++++++++++ .../config-based/request-options.md | 88 ++++++++++++ .../config-based/stream-slicers.md | 7 +- .../tutorial/0-getting-started.md | 9 +- .../config-based/tutorial/1-create-source.md | 4 +- .../config-based/tutorial/3-connecting.md | 11 +- .../config-based/tutorial/4-reading-data.md | 7 +- .../tutorial/5-incremental-reads.md | 12 +- .../config-based/tutorial/6-testing.md | 13 +- 10 files changed, 243 insertions(+), 49 deletions(-) create mode 100644 docs/connector-development/config-based/record-selector.md create mode 100644 docs/connector-development/config-based/request-options.md diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index a9cff87e43d0..41a2a64a7bea 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -49,9 +49,6 @@ The is currently only one implementation, the `SimpleRetriever`, which is define Each of those components (and their subcomponents) are defined by an explicit 
interface and one or many implementations. The developer can choose and configure the implementation they need depending on specifications of the integrations they are building against. -More details on the paginator can be found in the [pagination section](pagination.md) -More details on the stream slicers can be found in the [stream slicers section](stream-slicers.md) - ### Data flow The retriever acts as a coordinator, moving the data between its components before emitting `AirbyteMessage`s that can be read by the platform. @@ -64,6 +61,10 @@ The `SimpleRetriever`'s data flow can be described as follows: 2. Select the records from the response 3. Repeat for as long as the paginator points to a next page +More details on the paginator can be found in the [pagination section](pagination.md) +More details on the record selector can be found in the [record selector section](record-selector.md) +More details on the stream slicers can be found in the [stream slicers section](stream-slicers.md) + ## Requester The `Requester` defines how to prepare HTTP requests to send to the source API [^2]. diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md new file mode 100644 index 000000000000..5f9119736b2d --- /dev/null +++ b/docs/connector-development/config-based/record-selector.md @@ -0,0 +1,134 @@ +# Record selector + +The record selector is responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering +records based on a condition. + +The current record selector implementation uses Jello to select records from the JSON-decoded HTTP response. +More information on Jello can be found at https://github.com/kellyjonbrazil/jello + +## Common recipes + +1. Selecting the whole JSON object can be done with `_` +2. Wrapping the whole JSON object in an array can be done with `[_]` +3.
Inner fields can be selected by referring to them with dot notation: `_.data` will return the data field. + +## Filtering records + +Records can be filtered by adding a record_filter to the selector. +The filter's condition is evaluated as a boolean: a record is kept only if the condition evaluates to true. + +In this example, all records with a `created_at` field greater than or equal to the stream slice's `start_time` will be filtered out: + +``` +selector: + extractor: + transform: "[_]" + record_filter: + condition: "{{ record.created_at < stream_slice.start_time }}" +``` + +## Transformations + +Fields can be added to or removed from records by adding `Transformation`s to a stream's definition. + +### Adding fields + +Fields can be added with the `AddFields` transformation. +This example adds a top-level field "field1" with the value "static_value": + +``` +stream: + <...> + transformations: + - type: AddFields + fields: + - path: ["field1"] + value: "static_value" +``` + +Fields can also be added in a nested object by writing the field's path as a list. + +Given a record of the following shape: + +``` +{ + "id": 0, + "data": + { + "field0": "some_data" + } +} +``` + +this definition will add a field in the "data" nested object: + +``` +stream: + <...> + transformations: + - type: AddFields + fields: + - path: ["data", "field1"] + value: "static_value" +``` + +resulting in the following record: + +``` +{ + "id": 0, + "data": + { + "field0": "some_data", + "field1": "static_value" + } +} +``` + +### Removing fields + +Fields can be removed from records with the `RemoveFields` transformation.
+ +Given a record of the following shape: + +``` +{ + "path": + { + "to": + { + "field1": "data_to_remove", + "field2": "data_to_keep" + } + }, + "path2": "data_to_remove", + "path3": "data_to_keep" +} +``` + +this definition will remove the 2 instances of "data_to_remove" which are found in "path2" and "path.to.field1": + +``` +the_stream: + <...> + transformations: + - type: RemoveFields + field_pointers: + - ["path", "to", "field1"] + - ["path2"] +``` + +resulting in the following record: + +``` +{ + "path": + { + "to": + { + "field2": "data_to_keep" + } + }, + "path3": "data_to_keep" +} +``` \ No newline at end of file diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md new file mode 100644 index 000000000000..727b3f502e33 --- /dev/null +++ b/docs/connector-development/config-based/request-options.md @@ -0,0 +1,88 @@ +# Request Options + +There are a few ways request parameters, headers, and the request body can be set on outgoing HTTP requests. + +## Request Options Provider + +The primary way to set request options is through the `Requester`'s `RequestOptionsProvider`. +The options can be configured as key-value pairs: + +``` +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + k1: v1 + k2: v2 + request_headers: + header_key1: header_value1 + header_key2: header_value2 +``` + +It is also possible to add a JSON-encoded body to outgoing requests. + +``` +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_body_json: + key: value +``` + +## Authenticators + +It is also possible for authenticators to set request parameters or headers as needed. +For instance, the `BearerAuthenticator` will always set the `Authorization` header.
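Because the request options provider, the authenticator and the paginator can each contribute parameters or headers, the final request has to combine them. The sketch below shows one way to do that in Python, assuming hypothetical `merge_options` and `bearer_headers` helpers. Refusing conflicting keys is a design choice made for this sketch, not documented CDK behavior. The `Bearer <token>` header format comes from the HTTP bearer-token scheme (RFC 6750).

```python
from typing import Dict, Mapping


def merge_options(*sources: Mapping[str, str]) -> Dict[str, str]:
    """Combine option mappings, refusing to silently overwrite a key set by another source."""
    merged: Dict[str, str] = {}
    for source in sources:
        for key, value in source.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"Conflicting values for request option {key!r}")
            merged[key] = value
    return merged


def bearer_headers(token: str) -> Dict[str, str]:
    # Bearer token scheme (RFC 6750): "Authorization: Bearer <token>"
    return {"Authorization": f"Bearer {token}"}


# Headers from the request options provider example, plus the authenticator's header.
provider_headers = {"header_key1": "header_value1", "header_key2": "header_value2"}
headers = merge_options(provider_headers, bearer_headers("my-api-key"))
```

Raising on conflicts makes an accidental overlap, such as a provider header that also sets `Authorization`, visible immediately instead of letting one source silently win.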
+ +More details on the various authenticators can be found in the [authentication section](authentication.md). + +## Paginators + +The `LimitPaginator` can optionally set request options through the `limit_option` and the `page_token_option`. +The respective values can be set on the outgoing HTTP requests by specifying where they should be injected. + +The following example will set the "page" request parameter value to the page to fetch, and the "page_size" request parameter to 5: + +``` +paginator: + type: "LimitPaginator" + page_size: 5 + limit_option: + inject_into: request_parameter + field_name: page_size + pagination_strategy: + type: "PageIncrement" + page_token_option: + inject_into: "request_parameter" + field_name: "page" +``` + +More details on paginators can be found in the [pagination section](pagination.md). + +## Stream slicers + +The `DatetimeStreamSlicer` can optionally set request options through the `start_time_option` and `end_time_option` fields. +The respective values can be set on the outgoing HTTP requests by specifying where they should be injected. + +The following example will set the "created[gte]" request parameter value to the start of the time window, and "created[lte]" to the end of the time window.
+ +``` +stream_slicer: + start_datetime: "2021-02-01T00:00:00.000000+0000" + end_datetime: "2021-03-01T00:00:00.000000+0000" + step: "1d" + start_time_option: + field_name: "created[gte]" + inject_into: "request_parameter" + end_time_option: + field_name: "created[lte]" + inject_into: "request_parameter" +``` + +More details on the stream slicers can be found in the [stream-slicers section](stream-slicers.md). diff --git a/docs/connector-development/config-based/stream-slicers.md b/docs/connector-development/config-based/stream-slicers.md index ed5c42eb3b8b..29e950bcd8d8 100644 --- a/docs/connector-development/config-based/stream-slicers.md +++ b/docs/connector-development/config-based/stream-slicers.md @@ -162,4 +162,9 @@ retriever: stream_slice_field: "repository" ``` -[^1] This is a slight oversimplification. See update cursor section for more details on how the cursor is updated +[^1] This is a slight oversimplification. See update cursor section for more details on how the cursor is updated + +## More readings + +- [Incremental streams](../cdk-python/incremental-stream.md) +- [Stream slices](../cdk-python/stream-slices.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index d072a21ce4d0..aa990f8a501f 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -39,11 +39,4 @@ This can be done by signing up for the Free tier plan on [Exchange Rates API](ht ## Next Steps -Next, we'll [create a Source using the connector generator.](1-create-source.md) - -## More readings - -- Source -- Stream -- Schema -- \ No newline at end of file +Next, we'll [create a Source using the connector generator.](1-create-source.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md
b/docs/connector-development/config-based/tutorial/1-create-source.md index 3a988b33e7c4..f7929502a807 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -24,6 +24,4 @@ For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. Th ## Next steps -Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) - -## Note - Maybe this should be combined with 0 or 2 \ No newline at end of file +Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/3-connecting.md b/docs/connector-development/config-based/tutorial/3-connecting.md index a0429c141813..7af6bfeed93d 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting.md +++ b/docs/connector-development/config-based/tutorial/3-connecting.md @@ -181,11 +181,6 @@ Next, we'll [extract the records from the response](4-reading-data.md) ## More readings - -- declarative stream -- check stream -- http requester -- authentication -- request options providers -- config -- spec file -- check operation \ No newline at end of file +- [Config-based connectors overview](../overview.md) +- [Authentication](../authentication.md) +- [Request options providers](../request-options.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index ff024c257cc3..a1ae0e74814d 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -133,9 +133,4 @@ Next, we'll ([enhance the connector to read data for a given date, which will en ## More readings -- record selectors -- catalog tutorial -- jello -- read operation -- primary key -- 
declarative stream \ No newline at end of file +- [Record selector](../record-selector.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 2d55d46b6396..9e12d063e1f4 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -257,12 +257,6 @@ Next, we'll run the [Source Acceptance Tests suite to ensure the connector invar ## More readings -- incrementals (general guide) -- incrementals (low-code specific -- spec file -- config -- stream slicer -- datetime stream slicer -- cursor -- options -- requester \ No newline at end of file +- [Incremental reads](../../cdk-python/incremental-stream.md) +- [Stream slicers](../stream-slicers.md) +- [Stream slices](../../cdk-python/stream-slices.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 8c58e76e31f2..7128ca01ad4c 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -95,14 +95,5 @@ Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com ## Read more: -- acceptance tests -- check operation -- building the image -- error handling -- interpolation - -missing: - -- custom code -- pagination -- transformation \ No newline at end of file +- [Error handling](../error-handling.md) +- [Pagination](../pagination.md) \ No newline at end of file From 2099b2471a351386971c2ed7982df4dd563bf442 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 16:14:27 -0700 Subject: [PATCH 09/92] update --- docs/connector-development/config-based/error-handling.md | 2 +- docs/connector-development/config-based/overview.md | 6 +++--- 2 files changed,
4 insertions(+), 4 deletions(-) diff --git a/docs/connector-development/config-based/error-handling.md b/docs/connector-development/config-based/error-handling.md index 6aa7136493aa..2bae42c75b3c 100644 --- a/docs/connector-development/config-based/error-handling.md +++ b/docs/connector-development/config-based/error-handling.md @@ -1,6 +1,6 @@ # Error handling -By default, only retry server errors (HTTP 5XX) and too many requests (HTTP 429) will be retried up to 5 times with exponential backoff. +By default, only server errors (HTTP 5XX) and too many requests (HTTP 429) will be retried up to 5 times with exponential backoff. Other HTTP errors will result in a failed read. Other behaviors can be configured through the `Requester`'s `error_handler` field. diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 41a2a64a7bea..d70407a3f652 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -20,7 +20,7 @@ A source is defined by 2 components: ## Stream -Streams define the schema of the data of interest, as well as how to read it from the underlying API source. +Streams define the schema of the data to sync, as well as how to read it from the underlying API source. A stream generally corresponds to a resource within the API. They are analogous to tables for a RDMS source. A stream is defined by: @@ -39,7 +39,7 @@ More details on cursor fields, and checkpointing can be found in the [incrementa ## Data retriever The data retriever defines how to read the data from an API source, and acts as an orchestrator for the data retrieval flow. -The is currently only one implementation, the `SimpleRetriever`, which is defined by +There is currently only one implementation, the `SimpleRetriever`, which is defined by 1. Requester: describes how to submit requests to the API source 2. 
Paginator[^1]: describes how to navigate through the API's pages @@ -61,9 +61,9 @@ The `SimpleRetriever`'s data flow can be described as follows: 2. Select the records from the response 3. Repeat for as long as the paginator points to a next page -More details on the paginator can be found in the [pagination section](pagination.md) More details on the record selector can be found in the [record selector section](record-selector.md) More details on the stream slicers can be found in the [stream slicers section](stream-slicers.md) +More details on the paginator can be found in the [pagination section](pagination.md) ## Requester From 8bab845b60c3c6178ba9dc3c7c20fb8657318760 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 16:21:05 -0700 Subject: [PATCH 10/92] update --- docs/connector-development/config-based/error-handling.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/connector-development/config-based/error-handling.md b/docs/connector-development/config-based/error-handling.md index 2bae42c75b3c..db52c90eb16e 100644 --- a/docs/connector-development/config-based/error-handling.md +++ b/docs/connector-development/config-based/error-handling.md @@ -19,7 +19,7 @@ requester: action: RETRY ``` -Response filters can be used to specify HTTP errors to ignore instead of retrying. +Response filters can be used to specify HTTP errors to ignore. For instance, this example will configure the handler to ignore responses with 404 error: ``` @@ -45,7 +45,7 @@ requester: This can also be done through a more generic string interpolation strategy with the following parameters: -- response: +- response: the decoded response This example ignores errors where the response contains a "code" field: @@ -99,7 +99,7 @@ requester: header: "wait_time" ``` -Optionally, a regex can be configured to extract the wait time from the header value. 
+Optionally, a regular expression can be configured to extract the wait time from the header value. ``` requester: @@ -129,7 +129,7 @@ requester: min_wait: 5 ``` -The strategy accepts an optional regex to extract the time from the header value, and a minimum time to wait. +The strategy accepts an optional regular expression to extract the time from the header value, and a minimum time to wait. ## Advanced error handling From a03e6f38f07e17928b72ba0cceb2501e03131543 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 16:42:44 -0700 Subject: [PATCH 11/92] connector definition --- .../config-based/connector-definition.md | 206 ++++++++++++++++++ 1 file changed, 206 insertions(+) create mode 100644 docs/connector-development/config-based/connector-definition.md diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md new file mode 100644 index 000000000000..ef1a73779e2f --- /dev/null +++ b/docs/connector-development/config-based/connector-definition.md @@ -0,0 +1,206 @@ +# Connector definition + +Connectors are defined as a YAML configuration describing the connector's Source. + +Two top-level fields are required: + +1. `streams`: list of streams that are part of the source +2. `check`: component describing how to check the connection. + +The configuration will be validated against this JSON Schema, which defines the set of valid properties. + +We recommend using the `Configuration Based Source` template from the template generator in `airbyte-integrations/connector-templates/generator` to generate the basic file structure. + +## Object instantiation + +This section describes the objects that are to be instantiated from the YAML definition.
+ +If the component is a literal, then it is returned as is: + +``` +3 +``` + +will result in + +``` +3 +``` + +If the component is a mapping with a "class_name" field, +an object of type "class_name" will be instantiated by passing the mapping's other fields to the constructor: + +``` +{ + "class_name": "fully_qualified.class_name", + "a_parameter": 3, + "another_parameter": "hello" +} +``` + +will result in + +``` +fully_qualified.class_name(a_parameter=3, another_parameter="hello") +``` + +If the component definition is a mapping with a "type" field, +the factory will look up the [CLASS_TYPES_REGISTRY](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py), replace the "type" field with "class_name" -> CLASS_TYPES_REGISTRY[type], +and instantiate the object from the resulting mapping. + +If the component definition is a mapping with neither a "class_name" nor a "type" field, +the factory will make a best-effort attempt at inferring the component type by looking up the parent object's constructor type hints. +If the type hint is an interface present in [DEFAULT_IMPLEMENTATIONS_REGISTRY](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/default_implementation_registry.py), +then the factory will create an object of its default implementation. + +If the component definition is a list, then the factory will iterate over the elements of the list, +instantiate each element, and return a list of instantiated objects. + +If the component has subcomponents, the factory will create the subcomponents before instantiating the top-level object: + +``` +{ + "type": "TopLevel", + "param": + { + "type": "ParamType", + "k": "v" + } +} +``` + +will result in + +``` +TopLevel(param=ParamType(k="v")) +``` + +Parameters can be passed down from a parent component to its subcomponents using the $options key. +This can be used to avoid repetitions.
+ +``` +outer: + $options: + MyKey: MyValue + inner: + k2: v2 +``` + +In the example above, if both outer and inner are types with a "MyKey" field, both of their "MyKey" fields will evaluate to "MyValue". + +The value can also be used for string interpolation: + +``` +outer: + $options: + MyKey: MyValue + inner: + k2: "MyKey is {{ options.MyKey }}" +``` + +In this example, outer.inner.k2 will evaluate to "MyKey is MyValue". + +More details on object instantiation can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=factory#airbyte_cdk.sources.declarative.parsers.factory.DeclarativeComponentFactory). + +## References + +Strings can contain references to values previously defined. +The parser will dereference these values to produce a complete ConnectionDefinition. + +References can be defined using a *ref() string. + +``` +key: 1234 +reference: "*ref(key)" +``` + +will produce the following definition: + +``` +key: 1234 +reference: 1234 +``` + +This also works with objects: + +``` +key_value_pairs: + k1: v1 + k2: v2 +same_key_value_pairs: "*ref(key_value_pairs)" +``` + +will produce the following definition: + +``` +key_value_pairs: + k1: v1 + k2: v2 +same_key_value_pairs: + k1: v1 + k2: v2 +``` + +The $ref keyword can be used to refer to an object and enhance it with additional key-value pairs: + +``` +key_value_pairs: + k1: v1 + k2: v2 +same_key_value_pairs: + $ref: "*ref(key_value_pairs)" + k3: v3 +``` + +will produce the following definition: + +``` +key_value_pairs: + k1: v1 + k2: v2 +same_key_value_pairs: + k1: v1 + k2: v2 + k3: v3 +``` + +References can also point to nested values.
+Nested references are ambiguous because one could define a key containing `.`. +In this example, we want to refer to the `limit` key in the `dict` object: + +``` +dict: + limit: 50 +limit_ref: "*ref(dict.limit)" +``` + +will produce the following definition: + +``` +dict: + limit: 50 +limit_ref: 50 +``` + +Whereas here, we want to access the `nested.path` value: + +``` +nested: + path: "first one" +nested.path: "uh oh" +value: "*ref(nested.path)" +``` + +will produce the following definition: + +``` +nested: + path: "first one" +nested.path: "uh oh" +value: "uh oh" +``` + +To resolve the ambiguity, we try looking for the reference key at the top level, and then traverse the structs downward +until we find a key with the given path, or until there is nothing to traverse. + +More details on referencing values can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=yamlparser#airbyte_cdk.sources.declarative.parsers.yaml_parser.YamlParser). \ No newline at end of file From d444d7461bb8e660d27a45e2391bee7a76aa81a2 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 16:43:49 -0700 Subject: [PATCH 12/92] link --- docs/connector-development/config-based/overview.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index d70407a3f652..51e5a0d5a43e 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -9,6 +9,8 @@ Config-based connectors work by parsing a YAML configuration describing the Sour The process then submits HTTP requests to the API endpoint, and extracts records out of the response. +See the [connector definition section](connector-definition.md) for more information on the YAML file describing the connector. + ## Source Config-based connectors are a declarative way to define HTTP API sources.
From 71e0a5b597a11a845a37c4df87a35554da4d36a4 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 17:02:01 -0700 Subject: [PATCH 13/92] links --- .../config-based/connector-definition.md | 2 + .../config-based/tutorial/6-testing.md | 72 ++++++++++++++++++- 2 files changed, 73 insertions(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index ef1a73779e2f..d2ce77f81f48 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -11,6 +11,8 @@ The configuration will be validated against this JSON Schema, which defines the We recommend using the `Configuration Based Source` template from the template generator in `airbyte-integrations/connector-templates/generator` to generate the basic file structure. +See the [tutorial for a complete connector definition](tutorial/6-testing.md). + ## Object instantiation This section describes the objects that are to be instantiated from the YAML definition. diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 7128ca01ad4c..78e51865e5ad 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -89,6 +89,75 @@ docker build .
-t airbyte/source-exchange-rates-tutorial:dev python -m pytest integration_tests -p integration_tests.acceptance ``` +Here is the full connector definition for reference: + +``` +schema_loader: + class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema + name: "rates" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" +stream_slicer: + type: "DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config.start_date }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" + datetime_format: "%Y-%m-%d" + cursor_field: "{{ options.cursor_field }}" +retriever: + type: SimpleRetriever + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + ref: "*ref(selector)" + paginator: + type: NoPagination +rates_stream: + type: DeclarativeStream + $options: + name: "rates" + cursor_field: "date" + primary_key: "date" + schema_loader: + ref: "*ref(schema_loader)" + retriever: + ref: "*ref(retriever)" + stream_slicer: + ref: "*ref(stream_slicer)" + requester: + ref: "*ref(requester)" + path: + type: "InterpolatedString" + string: "{{ stream_slice.start_date }}" + default: "/latest" + error_handler: + response_filters: + - predicate: "{{'error' in response}}" + action: FAIL +streams: + - "*ref(rates_stream)" +check: + type: CheckStream + stream_names: ["rates"] +``` + ## Next steps: Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). 
@@ -96,4 +165,5 @@ Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com ## Read more: - [Error handling](../error-handling.md) -- [Pagination](../pagination.md) \ No newline at end of file +- [Pagination](../pagination.md) +- [Testing connectors](../../testing-connectors/README.md) \ No newline at end of file From 7b36ca65c9f5a35d9b61f2e740aeb62440f5c15b Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:02:02 -0700 Subject: [PATCH 14/92] update example --- .../config-based/tutorial/6-testing.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 78e51865e5ad..dbcd7bd56d37 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -94,7 +94,6 @@ Here is the full connector definition for reference: ``` schema_loader: class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema - name: "rates" file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector @@ -126,7 +125,7 @@ retriever: name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: - ref: "*ref(selector)" + $ref: "*ref(selector)" paginator: type: NoPagination rates_stream: @@ -136,13 +135,13 @@ rates_stream: cursor_field: "date" primary_key: "date" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" + $ref: "*ref(retriever)" stream_slicer: - ref: "*ref(stream_slicer)" + $ref: "*ref(stream_slicer)" requester: - ref: "*ref(requester)" + $ref: "*ref(requester)" path: type: "InterpolatedString" string: "{{ stream_slice.start_date }}" @@ -155,7 +154,8 @@ streams: - "*ref(rates_stream)" check: type: CheckStream - stream_names: ["rates"] + stream_names: [ "rates" ] + ``` ## 
Next steps: From 218bfd9d8690952adb474abf62be95db307b23e7 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:08:37 -0700 Subject: [PATCH 15/92] footnote --- docs/connector-development/config-based/overview.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 51e5a0d5a43e..71973ee83967 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -27,7 +27,7 @@ A stream generally corresponds to a resource within the API. They are analogous A stream is defined by: -1. Its name +1. A name 2. A primary key: used to uniquely identify records, enabling deduplication 3. A schema: describes the data to sync 4. A data retriever: describes how to retrieve the data from the API @@ -44,7 +44,7 @@ The data retriever defines how to read the data from an API source, and acts as There is currently only one implementation, the `SimpleRetriever`, which is defined by 1. Requester: describes how to submit requests to the API source -2. Paginator[^1]: describes how to navigate through the API's pages +2. Paginator 1: describes how to navigate through the API's pages 3. Record selector: describes how to select records from an HTTP response 4. Stream Slicer: describes how to partition the stream, enabling incremental syncs and checkpointing @@ -69,7 +69,7 @@ More details on the paginator can be found in the [pagination section](paginatio ## Requester -The `Requester` defines how to prepare HTTP requests to send to the source API [^2]. +The `Requester` defines how to prepare HTTP requests to send to the source API 2. There currently is only one implementation, the `HttpRequester`, which is defined by 1. A base url: the root of the API source @@ -88,5 +88,8 @@ The `ConnectionChecker` defines how to test the connection to the integration. 
The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. -[^1] The paginator is conceptually more related to the requester than the data retriever, but is part of the `SimpleRetriever` because it inherits from `HttpStream` to increase code reusability. -[^2] As of today, the requester acts as a config object and is not directly responsible for preparing the HTTP requests. This is done in the `SimpleRetriever`'s parent class `HttpStream`. \ No newline at end of file +# Footnotes + +1. The paginator is conceptually more related to the requester than the data retriever, but is part of the `SimpleRetriever` because it inherits from `HttpStream` to increase code reusability. + +2. As of today, the requester acts as a config object and is not directly responsible for preparing the HTTP requests. This is done in the `SimpleRetriever`'s parent class `HttpStream`. \ No newline at end of file From 78599e32f6e286b92476a04d73ccd1f73e17fcd7 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:09:33 -0700 Subject: [PATCH 16/92] typo --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 71973ee83967..a340061d2716 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -69,7 +69,7 @@ More details on the paginator can be found in the [pagination section](paginatio ## Requester -The `Requester` defines how to prepare HTTP requests to send to the source API 2. +The `Requester` defines how to prepare HTTP requests to send to the source API 2. There currently is only one implementation, the `HttpRequester`, which is defined by 1. 
A base url: the root of the API source From c9bfb999a01f89f9ab63e783db9f96dc14ca68a1 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:53:49 -0700 Subject: [PATCH 17/92] document string interpolation --- .../config-based/connector-definition.md | 41 +++++++++++++++++-- 1 file changed, 38 insertions(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index d2ce77f81f48..d11d28bb982d 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -77,6 +77,10 @@ will result in TopLevel(param=ParamType(k="v")) ``` +More details on object instantiation can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=factory#airbyte_cdk.sources.declarative.parsers.factory.DeclarativeComponentFactory). + +### $options + Parameters can be passed down from a parent component to its subcomponents using the $options key. This can be used to avoid repetitions. @@ -102,8 +106,6 @@ outer: In this example, outer.inner.k2 will evaluate to "MyKey is MyValue". -More details on object instantiation can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=factory#airbyte_cdk.sources.declarative.parsers.factory.DeclarativeComponentFactory). - ## References Strings can contain references to values previously defined. @@ -205,4 +207,37 @@ value: "uh oh" To resolve the ambiguity, we try looking for the reference key at the top level, and then traverse the structs downward until we find a key with the given path, or until there is nothing to traverse.
-More details on referencing values can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=yamlparser#airbyte_cdk.sources.declarative.parsers.yaml_parser.YamlParser). \ No newline at end of file +More details on referencing values can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=yamlparser#airbyte_cdk.sources.declarative.parsers.yaml_parser.YamlParser). + +## String interpolation + +String values can be evaluated as Jinja2 templates. + +Interpolation is performed by the Jinja2 template engine. + +If the input string contains no template expressions, the interpolated string is unchanged: +`"hello world" -> "hello world"` + +The engine will evaluate the content passed within `{{ }}`, interpolating the keys from context-specific arguments. +The "options" keyword (see [$options](connector-definition.md#object-instantiation)) can be referenced. + +For example, inner_object.key will evaluate to "Hello airbyte" at runtime. + +``` +some_object: + $options: + name: "airbyte" + inner_object: + key: "Hello {{ options.name }}" +``` + +Some components also pass in additional arguments to the context. +This is the case for the [record selector](record-selector.md), which passes in an additional `response` argument. + +In addition to passing additional values through the kwargs argument, macros can be called from within the string interpolation. +For example, +"{{ max(2, 3) }}" will return 3 + +The macros available can be found [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/interpolation/macros.py).
+ +Additional information on Jinja templating can be found at https://jinja.palletsprojects.com/en/3.1.x/templates/# \ No newline at end of file From 58567c546d42293fa045aeac544cc8862180edff Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:56:01 -0700 Subject: [PATCH 18/92] note on string interpolation --- docs/connector-development/config-based/request-options.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md index 727b3f502e33..2fe1a9341039 100644 --- a/docs/connector-development/config-based/request-options.md +++ b/docs/connector-development/config-based/request-options.md @@ -35,6 +35,12 @@ requester: key: value ``` +In addition to $options, the provider can also access the following arguments for [string interpolation](connector-definition.md#string-interpolation): + +- stream_slice +- stream_state +- next_page_token + ## Authenticators It is also possible for authenticators to set request parameters or headers as needed. From 76a95aea4ba1addc73f524d1fe54c8c576819242 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 1 Aug 2022 18:56:58 -0700 Subject: [PATCH 19/92] update --- docs/connector-development/config-based/connector-definition.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 78e9d9dadb90..d11d28bb982d 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -236,7 +236,7 @@ This is the case for the [record selector](record-selector.md), which passes in In addition to passing additional values through the kwargs argument, macros can be called from within the string interpolation.
For example, -"{{ max(2, 3) }}" will return 3 +`"{{ max(2, 3) }}" -> 3` The macros available can be found [here](https://github.com/airbytehq/airbyte/blob/master/airbyte-cdk/python/airbyte_cdk/sources/declarative/interpolation/macros.py). From ecf9b34f3759f68fa6bd54d61038c0363e6bcf55 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 2 Aug 2022 09:27:16 -0700 Subject: [PATCH 20/92] fix code sample --- .../config-based/tutorial/3-connecting.md | 55 +++++++++++++++++-- 1 file changed, 51 insertions(+), 4 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting.md b/docs/connector-development/config-based/tutorial/3-connecting.md index 7af6bfeed93d..0148aa2c4e63 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting.md +++ b/docs/connector-development/config-based/tutorial/3-connecting.md @@ -86,15 +86,15 @@ requester: ``` rates_stream: type: DeclarativeStream - options: + $options: name: "rates" primary_key: "id" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" + $ref: "*ref(retriever)" requester: - ref: "*ref(requester)" + $ref: "*ref(requester)" path: "/latest" ``` @@ -122,6 +122,53 @@ request_options_provider: base: "{{ config.base }}" ``` +The full connection definition should now look like + +``` +schema_loader: + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" +selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "_" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + url_base: "https://api.exchangeratesapi.io/v1/" + http_method: "GET" + request_options_provider: + request_parameters: + access_key: "{{ config.access_key }}" + base: "{{ config.base }}" +retriever: + type: SimpleRetriever + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + $ref: "*ref(selector)" + paginator: + type: NoPagination 
+rates_stream: + type: DeclarativeStream + $options: + name: "rates" + primary_key: "id" + schema_loader: + $ref: "*ref(schema_loader)" + retriever: + $ref: "*ref(retriever)" + requester: + $ref: "*ref(requester)" + path: "/latest" +streams: + - "*ref(rates_stream)" +check: + type: CheckStream + stream_names: [ "rates" ] +``` + 6. Let's populate the config so the connector can access the access key and base currency. First, we'll add these properties to the connector spec in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` From 990d44ad5e4f0260cad7663c84aba5c808fe5987 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 2 Aug 2022 11:54:57 -0700 Subject: [PATCH 21/92] fix --- .../connector-development/config-based/tutorial/3-connecting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting.md b/docs/connector-development/config-based/tutorial/3-connecting.md index 0148aa2c4e63..dde99e01b2cf 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting.md +++ b/docs/connector-development/config-based/tutorial/3-connecting.md @@ -67,7 +67,7 @@ streams: - "*ref(rates_stream)" check: type: CheckStream - stream_names: ["rates_stream"] + stream_names: ["rates"] ``` 2. Next we'll set the base url. 
From f9b1b683640dc18b43d62f9d9469181bd1383f8d Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 2 Aug 2022 14:15:24 -0700 Subject: [PATCH 22/92] update sample --- .../config-based/tutorial/6-testing.md | 32 +++---------------- 1 file changed, 5 insertions(+), 27 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index dbcd7bd56d37..13f93fb73170 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -93,33 +93,22 @@ Here is the full connector definition for reference: ``` schema_loader: - class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema + type: JsonSchema file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector extractor: type: JelloExtractor - transform: "[_]" + transform: "_" requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" + url_base: https://api.exchangeratesapi.io/v1/ http_method: "GET" request_options_provider: request_parameters: access_key: "{{ config.access_key }}" base: "{{ config.base }}" -stream_slicer: - type: "DatetimeStreamSlicer" - start_datetime: - datetime: "{{ config.start_date }}" - datetime_format: "%Y-%m-%d" - end_datetime: - datetime: "{{ now_local() }}" - datetime_format: "%Y-%m-%d %H:%M:%S.%f" - step: "1d" - datetime_format: "%Y-%m-%d" - cursor_field: "{{ options.cursor_field }}" retriever: type: SimpleRetriever name: "{{ options['name'] }}" @@ -132,30 +121,19 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - cursor_field: "date" - primary_key: "date" + primary_key: "id" schema_loader: $ref: "*ref(schema_loader)" retriever: $ref: "*ref(retriever)" - stream_slicer: - $ref: "*ref(stream_slicer)" requester: $ref: "*ref(requester)" - path: - type: "InterpolatedString" - string: "{{ 
stream_slice.start_date }}" - default: "/latest" - error_handler: - response_filters: - - predicate: "{{'error' in response}}" - action: FAIL + path: /latest streams: - "*ref(rates_stream)" check: type: CheckStream stream_names: [ "rates" ] - ``` ## Next steps: From 945cc3e266564763892a872dbc1b7c19ec4fa733 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 2 Aug 2022 14:48:51 -0700 Subject: [PATCH 23/92] fix --- docs/connector-development/config-based/tutorial/6-testing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 13f93fb73170..5eaa017c8d26 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -99,7 +99,7 @@ selector: type: RecordSelector extractor: type: JelloExtractor - transform: "_" + transform: "[_]" requester: type: HttpRequester name: "{{ options['name'] }}" From a3349df75e99af7dd698fe6debd8eecc29b15c6f Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 2 Aug 2022 15:25:51 -0700 Subject: [PATCH 24/92] use the actual config --- .../config-based/tutorial/6-testing.md | 40 ++++++++++++++----- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 5eaa017c8d26..b15bb2888c1b 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -93,7 +93,7 @@ Here is the full connector definition for reference: ``` schema_loader: - type: JsonSchema + class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" selector: type: RecordSelector @@ -103,37 +103,59 @@ selector: requester: type: HttpRequester 
name: "{{ options['name'] }}" - url_base: https://api.exchangeratesapi.io/v1/ + url_base: "https://api.exchangeratesapi.io/v1/" http_method: "GET" request_options_provider: request_parameters: access_key: "{{ config.access_key }}" base: "{{ config.base }}" +stream_slicer: + type: "DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config.start_date }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" + datetime_format: "%Y-%m-%d" + cursor_field: "{{ options.cursor_field }}" retriever: type: SimpleRetriever name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: - $ref: "*ref(selector)" + ref: "*ref(selector)" paginator: type: NoPagination rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "id" + cursor_field: "date" + primary_key: "date" schema_loader: - $ref: "*ref(schema_loader)" + ref: "*ref(schema_loader)" retriever: - $ref: "*ref(retriever)" + ref: "*ref(retriever)" + stream_slicer: + ref: "*ref(stream_slicer)" requester: - $ref: "*ref(requester)" - path: /latest + ref: "*ref(requester)" + path: + type: "InterpolatedString" + string: "{{ stream_slice.start_date }}" + default: "/latest" + error_handler: + response_filters: + - predicate: "{{'error' in response}}" + action: FAIL streams: - "*ref(rates_stream)" check: type: CheckStream - stream_names: [ "rates" ] + stream_names: ["rates"] + ``` ## Next steps: From 318e613a6187e5accc5afbfa4f224597b46a0d42 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Sun, 7 Aug 2022 12:23:02 -0700 Subject: [PATCH 25/92] Update as per comments --- .../config-based/authentication.md | 27 ++++++------ .../config-based/connector-definition.md | 12 +++--- .../config-based/error-handling.md | 2 +- .../config-based/overview.md | 41 ++++++++----------- .../config-based/pagination.md | 39 ++++++++++++++---- .../config-based/record-selector.md | 17 ++++++-- 
.../config-based/request-options.md | 2 +- .../config-based/stream-slicers.md | 14 ++++--- .../tutorial/0-getting-started.md | 3 +- .../config-based/tutorial/1-create-source.md | 4 +- .../tutorial/2-install-dependencies.md | 7 ++-- ...g.md => 3-connecting-to-the-API-source.md} | 17 +++++--- .../config-based/tutorial/4-reading-data.md | 14 +------ .../tutorial/5-incremental-reads.md | 37 +++++++++++------ .../config-based/tutorial/6-testing.md | 4 +- 15 files changed, 139 insertions(+), 101 deletions(-) rename docs/connector-development/config-based/tutorial/{3-connecting.md => 3-connecting-to-the-API-source.md} (88%) diff --git a/docs/connector-development/config-based/authentication.md b/docs/connector-development/config-based/authentication.md index 640f590ae11b..a856a325696b 100644 --- a/docs/connector-development/config-based/authentication.md +++ b/docs/connector-development/config-based/authentication.md @@ -34,12 +34,6 @@ More information on bearer authentication can be found [here](https://swagger.io The `BasicHttpAuthenticator` set the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme). The following definition will set the header "Authorization" with a value "Basic " -The encoding scheme is: - -1. concatenate the username and the password with `":"` in between -2. Encode the resulting string in base 64 -3. 
Decode the result in utf8 - ``` authenticator: type: "BasicHttpAuthenticator" @@ -61,10 +55,19 @@ OAuth authentication is supported through the `OAuthAuthenticator`, which requir - token_refresh_endpoint: The endpoint to refresh the access token - client_id: The client id -- client_secret: Client secret +- client_secret: The client secret - refresh_token: The token used to refresh the access token -- scopes: The scopes to request -- token_expiry_date: The access token expiration date -- access_token_name: THe field to extract access token from in the response -- expires_in_name:The field to extract expires_in from in the response -- refresh_request_body: The request body to send in the refresh request \ No newline at end of file +- scopes (Optional): The scopes to request. Default: empty list. +- token_expiry_date (Optional): The access token expiration date, formatted as RFC 3339 ("%Y-%m-%dT%H:%M:%S.%f%z"). +- access_token_name (Optional): The field to extract the access token from in the response. Default: "access_token". +- expires_in_name (Optional): The field to extract expires_in from in the response. Default: "expires_in". +- refresh_request_body (Optional): The request body to send in the refresh request.
Default: None + +``` +authenticator: + type: "OAuthAuthenticator" + token_refresh_endpoint: "https://api.searchmetrics.com/v4/token" + client_id: "{{ config.api_key }}" + client_secret: "{{ config.client_secret }}" + refresh_token: "" +``` \ No newline at end of file diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 78e9d9dadb90..5b77eb788911 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -43,7 +43,7 @@ an object of type "class_name" will be instantiated by passing the mapping's oth will result in ``` -fully_qualified.class_name(a_parameter=3, another_parameter="helo" +fully_qualified.class_name(a_parameter=3, another_parameter="hello") ``` If the component definition is a mapping with a "type" field, @@ -104,11 +104,11 @@ outer: k2: "MyKey is {{ options.MyKey }}" ``` -In this example, outer.inner.k2 will evaluate to "MyValue" +In this example, outer.inner.k2 will evaluate to "MyKey is MyValue". ## References -Strings can contain references to values previously defined. +Strings can contain references to previously defined values. The parser will dereference these values to produce a complete ConnectionDefinition References can be defined using a *ref() string. @@ -204,7 +204,7 @@ nested.path: "uh oh" value: "uh oh" ``` -to resolve the ambiguity, we try looking for the reference key at the top level, and then traverse the structs downward +To resolve the ambiguity, we try looking for the reference key at the top level, and then traverse the structs downward until we find a key with the given path, or until there is nothing to traverse. More details on referencing values can be found [here](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.parsers.html?highlight=yamlparser#airbyte_cdk.sources.declarative.parsers.yaml_parser.YamlParser).
@@ -213,12 +213,10 @@ More details on referencing values can be found [here](https://airbyte-cdk.readt String values can be evaluated as Jinja2 templates. -Interpolation strategy using the Jinja2 template engine. - If the input string is a raw string, the interpolated string will be the same. `"hello world" -> "hello world"` -The engine will evaluate the content passed within {{}}, interpolating the keys from context-specific arguments. +The engine will evaluate the content passed within `{{...}}`, interpolating the keys from context-specific arguments. the "options" keyword [see ($options)](connector-definition.md#object-instantiation) can be referenced. For example, inner_object.key will evaluate to "Hello airbyte" at runtime. diff --git a/docs/connector-development/config-based/error-handling.md b/docs/connector-development/config-based/error-handling.md index db52c90eb16e..2979cb8e6241 100644 --- a/docs/connector-development/config-based/error-handling.md +++ b/docs/connector-development/config-based/error-handling.md @@ -74,7 +74,7 @@ requester: ## Backoff Strategies -The error handle supports a few backoff strategies, which are described in the following sections. +The error handler supports a few backoff strategies, which are described in the following sections. ### Exponential backoff diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index a340061d2716..499ec7588e30 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -23,34 +23,35 @@ A source is defined by 2 components: ## Stream Streams define the schema of the data to sync, as well as how to read it from the underlying API source. -A stream generally corresponds to a resource within the API. They are analogous to tables for a RDMS source. +A stream generally corresponds to a resource within the API. They are analogous to tables for a relational database source. 
A stream is defined by: 1. A name -2. A primary key: used to uniquely identify records, enabling deduplication -3. A schema: describes the data to sync -4. A data retriever: describes how to retrieve the data from the API -5. A cursor field: used to identify the stream's state from a record -6. A set of transformations to be applied on the records read from the source before emitting them to the destination -7. A checkpoint interval: defines when to checkpoint syncs. +2. Primary key (Optional): Used to uniquely identify records, enabling deduplication. Can be a string for single primary keys, a list of strings for composite primary keys, or a list of lists of strings for composite primary keys consisting of nested fields. +3. Schema: Describes the data to sync +4. [Data retriever](overview.md#data-retriever): Describes how to retrieve the data from the API +5. [Cursor field](../cdk-python/incremental-stream.md) (Optional): Field to be used as the stream cursor. Can be either a string or a list of strings if the cursor is a nested field. +6. Transformations (Optional): A set of transformations to be applied on the records read from the source before emitting them to the destination +7. [Checkpoint interval](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing) (Optional): Defines the interval at which incremental syncs should be checkpointed. More details on streams and sources can be found in the [basic concepts section](../cdk-python/basic-concepts.md). -More details on cursor fields, and checkpointing can be found in the [incremental-stream section](../cdk-python/incremental-stream.md) ## Data retriever -The data retriever defines how to read the data from an API source, and acts as an orchestrator for the data retrieval flow. +The data retriever defines how to read the data for a Stream, and acts as an orchestrator for the data retrieval flow. There is currently only one implementation, the `SimpleRetriever`, which is defined by -1.
Requester: describes how to submit requests to the API source -2. Paginator 1: describes how to navigate through the API's pages +1. Requester: Describes how to submit requests to the API source +2. Paginator: describes how to navigate through the API's pages 3. Record selector: describes how to select records from an HTTP response 4. Stream Slicer: describes how to partition the stream, enabling incremental syncs and checkpointing Each of those components (and their subcomponents) are defined by an explicit interface and one or many implementations. The developer can choose and configure the implementation they need depending on specifications of the integrations they are building against. +Since the `Retriever` is defined as part of the Stream configuration, different Streams for a given Source can use different `Retriever` definitions if needed. + ### Data flow The retriever acts as a coordinator, moving the data between its components before emitting `AirbyteMessage`s that can be read by the platform. @@ -72,12 +73,12 @@ More details on the paginator can be found in the [pagination section](paginatio The `Requester` defines how to prepare HTTP requests to send to the source API 2. There currently is only one implementation, the `HttpRequester`, which is defined by -1. A base url: the root of the API source -2. A path: the specific endpoint to fetch data from for a resource +1. A base url: The root of the API source +2. A path: The specific endpoint to fetch data from for a resource 3. The HTTP method: the HTTP method to use (GET or POST) -4. A request options provider: defines the request parameters and headers to set on outgoing HTTP requests -5. An authenticator: defines how to authenticate to the source -6. An error handler: defines how to handle errors +4. A request options provider: Defines the request parameters (query parameters), headers, and request body to set on outgoing HTTP requests +5. 
An authenticator: Defines how to authenticate to the source +6. An error handler: Defines how to handle errors More details on authentication can be found in the [authentication section](authentication.md). More details on error handling can be found in the [error handling section](error-handling.md) @@ -86,10 +87,4 @@ More details on error handling can be found in the [error handling section](erro The `ConnectionChecker` defines how to test the connection to the integration. -The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. - -# Footnotes - -1. The paginator is conceptually more related to the requester than the data retriever, but is part of the `SimpleRetriever` because it inherits from `HttpStream` to increase code reusability. - -2. As of today, the requester acts as a config object and is not directly responsible for preparing the HTTP requests. This is done in the `SimpleRetriever`'s parent class `HttpStream`. \ No newline at end of file +The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. \ No newline at end of file diff --git a/docs/connector-development/config-based/pagination.md b/docs/connector-development/config-based/pagination.md index 97b6c243318e..09ffb8cd4c0d 100644 --- a/docs/connector-development/config-based/pagination.md +++ b/docs/connector-development/config-based/pagination.md @@ -1,17 +1,17 @@ -# HTTP Requester +# Pagination Given a page size and a pagination strategy, the `LimitPaginator` will point to pages of results for as long as its strategy returns a `next_page_token`. Iterating over pages of result is different from iterating over stream slices. -Stream slices have semantic value, for instance, a Datetime stream slice defines data for a specific date range. 2 stream slices will have data for different date ranges. 
+Stream slices have semantic value; for instance, a Datetime stream slice defines data for a specific date range. Two stream slices will have data for different date ranges.
+Stream slices have semantic value, for instance, a Datetime stream slice defines data for a specific date range. Two stream slices will have data for different date ranges. Conversely, pages don't have semantic value. More pages simply means that more records are to be read, without specifying any meaningful difference between the records of the first and later pages. The paginator is defined by -- page size: the number of records to fetch in a single request -- limit_option: how to specify the page size in the outgoing HTTP request -- pagination_strategy: how to compute the next page to fetch -- page_token_option: how to specify the next page to fetch in the outgoing HTTP request +- `page_size`: The number of records to fetch in a single request +- `limit_option`: How to specify the page size in the outgoing HTTP request +- `pagination_strategy`: How to compute the next page to fetch +- `page_token_option`: How to specify the next page to fetch in the outgoing HTTP request 3 pagination strategies are supported @@ -41,6 +41,13 @@ paginator: field_name: "page" ``` +If the page contains less than 5 records, then the paginator knows there are no more pages to fetch. +If the API returns more records than requested, all records will be processed. + +Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`, +the first request will be sent as `https://cloud.airbyte.com/api/get_data?page_size=5&page=0` +and the second request as `https://cloud.airbyte.com/api/get_data?page_size=5&page=1`, + ### Offset increment When using the `OffsetIncrement` strategy, the number of records read will be set as part of the `page_token_option`. 
@@ -55,13 +62,17 @@ paginator: option_type: request_parameter field_name: page_size pagination_strategy: - type: "PageIncrement" + type: "OffsetIncrement" page_token: - field_name: "page" + field_name: "offset" inject_into: "request_parameter" ``` +Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`, +the first request will be sent as `https://cloud.airbyte.com/api/get_data?page_size=5&offset=0` +and the second request as `https://cloud.airbyte.com/api/get_data?page_size=5&offset=5`. + ### Cursor The `CursorPaginationStrategy` outputs a token by evaluating its `cursor_value` string with the following parameters: @@ -86,6 +97,11 @@ paginator: inject_into: "request_parameter" ``` +Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`, +the first request will be sent as `https://cloud.airbyte.com/api/get_data`. +Assuming the id of the last record fetched is 1000, +the next request will be sent as `https://cloud.airbyte.com/api/get_data?from=1000`. + Some APIs directly point to the URL of the next page to fetch.
In this example, the URL of the next page is extracted from the response headers: ``` @@ -97,4 +113,9 @@ paginator: cursor_value: "{{ headers.urls.next }}" page_token: inject_into: "path" -``` \ No newline at end of file +``` + +Assuming the endpoint to fetch data from is `https://cloud.airbyte.com/api/get_data`, +the first request will be sent as `https://cloud.airbyte.com/api/get_data`. +Assuming the response's next URL is `https://cloud.airbyte.com/api/get_data?page=1&page_size=100`, +the next request will be sent as `https://cloud.airbyte.com/api/get_data?page=1&page_size=100`. \ No newline at end of file diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 5f9119736b2d..26dae3f87afa 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -1,9 +1,8 @@ # Record selector -The record selector is responsible for translating an HTTP response into a list of records by extracting records from the response and optionally filtering -records based on a heuristic. +The record selector is responsible for translating an HTTP response into a list of Airbyte records by extracting records from the response and optionally filtering and shaping records based on a heuristic. -The current record selector implementation uses Jello to select record sfrom the json-decoded HTTP response. +The current record selector implementation uses Jello to select records from the JSON-decoded HTTP response.
More information on Jello can be found at https://github.com/kellyjonbrazil/jello ## Common recipes: @@ -46,6 +45,18 @@ stream: value: "static_value" ``` +This example adds a top-level field "start_date", whose value is evaluated from the stream slice: + +``` +stream: + <...> + transformations: + - type: AddFields + fields: + - path: ["start_date"] + value: "{{ stream_slice.start_date }}" +``` + Fields can also be added in a nested object by writing the fields' path as a list. Given a record of the following shape: diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md index 2fe1a9341039..ccf3611cf558 100644 --- a/docs/connector-development/config-based/request-options.md +++ b/docs/connector-development/config-based/request-options.md @@ -1,6 +1,6 @@ # Request Options -There are a few ways request parameters, headers, and body can be set on ongoing HTTP requests. +There are a few ways to set request parameters, headers, and body on outgoing HTTP requests. ## Request Options Provider diff --git a/docs/connector-development/config-based/stream-slicers.md b/docs/connector-development/config-based/stream-slicers.md index 29e950bcd8d8..df98c294e030 100644 --- a/docs/connector-development/config-based/stream-slicers.md +++ b/docs/connector-development/config-based/stream-slicers.md @@ -4,7 +4,7 @@ It can be thought of as an iterator over the stream's data, where a `StreamSlice` is the retriever's unit of work. -When a stream is read incrementally, a state message will be output by the connector after reading every slice, which enable checkpointing. +When a stream is read incrementally, a state message will be output by the connector after reading every slice, which allows for [checkpointing](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing).
At the beginning of a `read` operation, the `StreamSlicer` will compute the slices to sync given the connection config and the stream's current state, As the `Retriever` reads data from the `Source`, the `StreamSlicer` keeps track of the `Stream`'s state, which will be emitted after reading each stream slice. @@ -18,6 +18,8 @@ This section gives an overview of the stream slicers currently implemented. ### Datetime The `DatetimeStreamSlicer` iterates over a datetime range by partitioning it into time windows. +This is done by slicing the stream on the records' cursor value, defined by the Stream's `cursor_field`. + Given a start time, an end time, and a step function, it will partition the interval [start, end] into small windows of the size described by the step. For instance, @@ -59,7 +61,7 @@ When reading data from the source, the cursor value will be updated to the max d - the current cursor value This ensures that the cursor will be updated even if a stream slice does not contain any data. -#### Specifying query start and end time +#### Stream slicer on dates If an API supports filtering data based on the cursor field, the `start_time_option` and `end_time_option` parameters can be used to configure this filtering. For instance, if the API supports filtering using the request parameters `created[gte]` and `created[lte]`, then the stream slicer can specify the request parameters as @@ -76,7 +78,7 @@ stream_slicer: inject_into: "request_parameter" ``` -### List +### List stream slicer `ListStreamSlicer` iterates over values from a given list. It is defined by @@ -99,7 +101,7 @@ stream_slicer: inject_into: "request_parameter" ``` -### Cartesian Product +### Cartesian Product stream slicer `CartesianProductStreamSlicer` iterates over the cartesian product of its underlying stream slicers. @@ -117,14 +119,14 @@ the resulting stream slices are ] ``` -### Substream +### Substream slicer `SubstreamSlicer` iterates over the parent's stream slices. 
This is useful for defining sub-resources. We might for instance want to read all the commits for a given repository (parent resource). -For each parent stream, the slicer needs to know +For each stream, the slicer needs to know - what the parent stream is - what is the key of the records in the parent stream diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index aa990f8a501f..8b8407ba8c2b 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -2,7 +2,7 @@ ## Summary -Throughout this tutorial, we'll walk you through the creation an Airbyte source to read data from an HTTP API. +Throughout this tutorial, we'll walk you through the creation of an Airbyte source to read and extract data from an HTTP API. We'll build a connector reading data from the Exchange Rates API, but the steps will apply to other HTTP APIs you might be interested in integrating with. @@ -33,6 +33,7 @@ This can be done by signing up for the Free tier plan on [Exchange Rates API](ht ## Requirements +- An Exchange Rates API key - Python >= 3.9 - Docker - NodeJS diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index f7929502a807..c8c3ed28e9d7 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -1,4 +1,4 @@ -# Step 1: Create the Source +# Step 1: Generate the source connector project locally Let's start by cloning the Airbyte repository @@ -20,7 +20,7 @@ Configuration Based Source Source name: exchange-rates-tutorial ``` -For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. The complete source code for this tutorial can be found here [args]`.
@@ -36,7 +37,7 @@ The module's generated `README.md` contains more details on the supported comman ## Next steps -Next, we'll [connect to the API source](3-connecting.md) +Next, we'll [connect to the API source](3-connecting-to-the-API-source.md) ## More readings diff --git a/docs/connector-development/config-based/tutorial/3-connecting.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md similarity index 88% rename from docs/connector-development/config-based/tutorial/3-connecting.md rename to docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index dde99e01b2cf..ef1530854fd9 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -4,6 +4,8 @@ We're now ready to start implementing the connector. The code generator already created a boilerplate connector definition in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` +More details on the connector definition file can be found in the [overview](../overview.md) and [connector definition](../connector-definition.md) sections. + ``` schema_loader: type: JsonSchema @@ -51,13 +53,14 @@ check: Let's fill this out these TODOs with the information found in the [Exchange Rates API docs](https://exchangeratesapi.io/documentation/) -1. First, let's rename the stream from `customers` to `rates. +1. First, let's rename the stream from `customers` to `rates` and update the primary key to `date`: ``` rates_stream: type: DeclarativeStream - options: + $options: name: "rates" + primary_key: "date" ``` and update the references in the streams list and check block @@ -78,7 +81,7 @@ check: requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" + url_base: "https://api.exchangeratesapi.io/v1/" # Only change the url_base field ``` 3.
We can fetch the latest data by submitting a request to "/latest". This path is specific to the stream, so we'll set within the `rates_stream` definition. @@ -88,7 +91,7 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "id" + primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" retriever: @@ -99,7 +102,7 @@ rates_stream: ``` 4. Next, we'll set up the authentication. - The Exchange Rates API requires an access key, which we'll need to make accessible to our connector. + The Exchange Rates API requires an access key to be passed as a request parameter. We'll need to make this access key accessible to our connector and pass it in the `request_parameters` field of the `request_options_provider`. We'll configure the connector to use this access key by setting the access key in a request parameter and pointing to a field in the config, which we'll populate in the next step: ``` @@ -113,6 +116,8 @@ requester: access_key: "{{ config.access_key }}" ``` +Since the access key is set directly as a request parameter, we can remove the `authentication` field from the `requester`. + 5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter: ``` @@ -154,7 +159,7 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "id" + primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" retriever: diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index a1ae0e74814d..63ffa89fbd14 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -1,4 +1,4 @@ -# Step 3: Reading data +# Step 4: Reading data Now that we're able to authenticate to the source API, we'll want to select data from the HTTP responses.
Let's first add the stream to the configured catalog in `source-exchange_rates-tutorial/integration_tests/configured_catalog.json` @@ -39,7 +39,7 @@ rm source_exchange_rates_tutorial/schemas/customers.json rm source_exchange_rates_tutorial/schemas/employees.json ``` -Next, we'll update the record selection to wrap the single record returned by the source in an array. +Next, we'll update the record selection to wrap the single record returned by the source in an array in `source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` ``` selector: @@ -51,16 +51,6 @@ selector: The transform is defined using the `Jello` syntax, which is a Python-based JQ alternative. More details on Jello can be found [here](https://github.com/kellyjonbrazil/jello). -We'll also set the primary key to `date`. - -``` -rates_stream: - type: DeclarativeStream - options: - name: "rates" - primary_key: "date" -``` - Here is the complete connector definition for convenience: ``` diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 9e12d063e1f4..c107e5a5d09d 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -44,26 +44,34 @@ connectionSpecification: - USD ``` -Then we'll set the `start_date` to last week our connection config in `secrets/config.json`. -The following `echo` command will update your config with a start date set at 7 days prior to today. +Then we'll set the `start_date` to last week in our connection config in `secrets/config.json`. +Let's add a start_date field to `secrets/config.json`.
+The file should look like ``` -echo "{\"access_key\": \"\", \"start_date\": \"$(date -v -7d '+%Y-%m-%d')\", \"base\": \"USD\"}" > secrets/config.json +{ + "access_key": "", + "start_date": "2022-07-26", + "base": "USD" +} ``` +where the start date should be 7 days in the past. + And we'll update the `path` in the connector definition to point to `/{{ config.start_date }}`. Note that we are setting a default value because the `check` operation does not know the `start_date`. We'll default to hitting `/latest`: ``` retriever: requester: - path: - type: "InterpolatedString" - string: "{{ config.start_date }}" - default: "/latest" + $ref: "*ref(requester)" + path: + type: "InterpolatedString" + string: "/{{ config.start_date }}" + default: "/latest" ``` -You can test the connector by executing the `read` operation: +You can test these changes by executing the `read` operation: ```python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json``` @@ -92,21 +100,24 @@ stream_slicer: datetime_format: "%Y-%m-%d" ``` -and refer to it in the stream's retriever. Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: +and refer to it in the stream's retriever. +This will generate slices from the start date until the current date, where each slice is exactly one day. 
+ +Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: ``` rates_stream: type: DeclarativeStream - options: + $options: name: "rates" cursor_field: "date primary_key: "date" schema_loader: - ref: "*ref(schema_loader)" + $ref: "*ref(schema_loader)" retriever: - ref: "*ref(retriever)" + $ref: "*ref(retriever)" stream_slicer: - ref: "*ref(stream_slicer)" + $ref: "*ref(stream_slicer)" ``` And we'll update the path to point to the `stream_slice`'s start_date diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index b15bb2888c1b..1fb1559ceec0 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -1,7 +1,7 @@ -# Step 5: Testing +# Step 6: Testing We should make sure the connector respects the Airbyte specifications before we start using it in production. -This can be done by executing the Source-Acceptance Tests (SAT). +This can be done by executing the Source Acceptance Tests (SAT). These tests will assert the most basic functionalities work as expected and are configured in `acceptance-test-config`. 
From 9cc1e4bd31420484c0ccd614f54b4298a64a47b6 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:18:45 -0700 Subject: [PATCH 26/92] write as yaml --- .../config-based/connector-definition.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 5b77eb788911..6a22ca23b143 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -33,11 +33,10 @@ If the component is a mapping with a "class_name" field, an object of type "class_name" will be instantiated by passing the mapping's other fields to the constructor ``` -{ - "class_name": "fully_qualified.class_name", - "a_parameter: 3, - "another_parameter: "hello" -} +my_component: + class_name: "fully_qualified.class_name" + a_parameter: 3 + another_parameter: "hello" ``` will result in From f096296f3ab7dad0376617cdca13985bfa647dab Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:20:42 -0700 Subject: [PATCH 27/92] typo --- .../python/airbyte_cdk/sources/declarative/parsers/factory.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/factory.py b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/factory.py index 6303b05ca1f8..3119fd73bcd5 100644 --- a/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/factory.py +++ b/airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/factory.py @@ -52,7 +52,7 @@ class DeclarativeComponentFactory: If the component definition is a mapping with neither a "class_name" nor a "type" field, the factory will do a best-effort attempt at inferring the component type by looking up the parent object's constructor type hints. 
If the type hint is an interface present in `DEFAULT_IMPLEMENTATIONS_REGISTRY`, - then the factory will create an object of it's default implementation. + then the factory will create an object of its default implementation. If the component definition is a list, then the factory will iterate over the elements of the list, instantiate its subcomponents, and return a list of instantiated objects. From 8bd35b49b0d4f2a13328ba67512a611c1979899a Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:24:41 -0700 Subject: [PATCH 28/92] Clarify options overloading --- .../config-based/connector-definition.md | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 6a22ca23b143..409942b1bb35 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -93,6 +93,20 @@ outer: This the example above, if both outer and inner are types with a "MyKey" field, both of them will evaluate to "MyValue". +These parameters can be overwritten by subcomponents as a form of specialization: + +``` +outer: + $options: + MyKey: MyValue + inner: + $options: + MyKey: YourValue + k2: v2 +``` + +In this example, "outer.MyKey" will evaluate to "MyValue", and "inner.MyKey" will evaluate to "YourValue". 
+ The value can also be used for string interpolation: ``` From cfb4528e7e68d57b51c385f4ce2608d8ba9b0cda Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:29:33 -0700 Subject: [PATCH 29/92] clarify that docker must be running --- .../config-based/tutorial/0-getting-started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index 8b8407ba8c2b..8f16057eb8e2 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -35,7 +35,7 @@ This can be done by signing up for the Free tier plan on [Exchange Rates API](ht - An Exchange Rates API key - Python >= 3.9 -- Docker +- Docker must be running - NodeJS ## Next Steps From 85d5afb955405a5514d942add2bfdf5d1d93ce41 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:29:48 -0700 Subject: [PATCH 30/92] remove extra footnote --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 499ec7588e30..af549282d483 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -70,7 +70,7 @@ More details on the paginator can be found in the [pagination section](paginatio ## Requester -The `Requester` defines how to prepare HTTP requests to send to the source API 2. +The `Requester` defines how to prepare HTTP requests to send to the source API. There currently is only one implementation, the `HttpRequester`, which is defined by 1. 
A base url: The root of the API source From 61a75b53316f4b15363dadc2a95cff2eb986ab58 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:36:49 -0700 Subject: [PATCH 31/92] use venv directly --- .../config-based/tutorial/2-install-dependencies.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index 0c57382c82f0..d5224c9d0a81 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -8,9 +8,10 @@ If this is the case on your machine, substitute the `python` commands with `pyth The subsequent `python` invocations will use the virtual environment created for the connector. ``` -python tools/bin/update_intellij_venv.py -modules source-exchange-rates-tutorial --install-venv -cd airbyte-integrations/connectors/source-exchange-rates-tutorial +cd ../../connectors/source-exchange-rates-tutorial +python -m venv .venv source .venv/bin/activate +pip install -r requirements.txt ``` These steps create an initial python environment (using `python -m venv`), and install the dependencies required to run an API Source connector (using `pip install`). From 7e1dc959dff79132c390bf1e190ced67108571a9 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 14:40:11 -0700 Subject: [PATCH 32/92] Apply suggestions from code review Co-authored-by: Sherif A. 
Nada --- .../config-based/connector-definition.md | 11 ++++++++++- .../tutorial/3-connecting-to-the-API-source.md | 4 ++-- .../config-based/tutorial/4-reading-data.md | 2 +- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 409942b1bb35..5122490c94e0 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -9,6 +9,15 @@ Connectors are defined as a yaml configuration describing the connector's Source The configuration will be validated against this JSON Schema, which defines the set of valid properties. +The general structure of the YAML is as follows: +``` + +streams: + +check: + +``` + We recommend using the `Configuration Based Source` template from the template generator in `airbyte-integrations/connector-templates/generator` to generate the basic file structure. See the [tutorial for a complete connector definition](tutorial/6-testing.md) @@ -42,7 +51,7 @@ my_component: will result in ``` -fully_qualified.class_name(a_parameter=3, another_parameter="hello" +fully_qualified.class_name(a_parameter=3, another_parameter="hello") ``` If the component definition is a mapping with a "type" field, diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index ef1530854fd9..288f03f256a8 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -116,9 +116,9 @@ requester: access_key: "{{ config.access_key }}" ``` -Since the access key is set directly as a request parameter, we can remove the `authentication` field from the `requester`. 
+Since the access key is set directly as a request parameter, we can remove the `authenticator` field from the `requester`. -5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter: +5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter. Let's assume the user will configure this via the connector configuration in a parameter called `base`; we'll pass the value input by the user as a request parameter: ``` request_options_provider: diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 63ffa89fbd14..31d1364ad7ef 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -46,7 +46,7 @@ selector: type: RecordSelector extractor: type: JelloExtractor - transform: "[_]" + transform: "[_]" # wrap the single record returned by the API in an array ``` The transform is defined using the `Jello` syntax, which is a Python-based JQ alternative. More details on Jello can be found [here](https://github.com/kellyjonbrazil/jello).
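The `class_name` rule documented in `connector-definition.md` above passes the mapping's remaining fields to the constructor as keyword arguments. It can be sketched in a few lines of Python. This is a simplified, hypothetical stand-in for `DeclarativeComponentFactory`. It ignores `type`, `$options`, `*ref(...)` references, and nested components. `datetime.timedelta` is used only because it is a standard-library class that takes keyword arguments.

```python
import importlib
from typing import Any, Mapping


def instantiate(definition: Mapping[str, Any]) -> Any:
    """Create an object from a mapping that has a "class_name" field.

    Hypothetical, simplified sketch: the fully qualified class name is imported
    and the mapping's other fields are passed as keyword arguments.
    """
    kwargs = dict(definition)  # copy so the caller's mapping is not modified
    module_name, _, class_name = kwargs.pop("class_name").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**kwargs)


# Equivalent to datetime.timedelta(days=3, hours=2)
my_component = instantiate({"class_name": "datetime.timedelta", "days": 3, "hours": 2})
print(my_component)  # 3 days, 2:00:00
```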
From 3df507169dbfbe67bcc8b35d906886947282c0e1 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 15:37:55 -0700 Subject: [PATCH 33/92] signup instructions --- .../config-based/tutorial/0-getting-started.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index 8f16057eb8e2..cd1c19da9895 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -29,7 +29,12 @@ The output schema of our stream will look like the following: ## Exchange Rates API Setup Before we can get started, you'll need to generate an API access key for the Exchange Rates API. -This can be done by signing up for the Free tier plan on [Exchange Rates API](https://exchangeratesapi.io/). +This can be done by signing up for the Free tier plan on [Exchange Rates API](https://exchangeratesapi.io/): + +1. Visit https://exchangeratesapi.io and click "Get free API key" on the top right +2. You'll be taken to https://apilayer.com -- finish the sign up process, signing up for the free tier +3. Once you're signed in, visit https://apilayer.com/marketplace/exchangerates_data-api#documentation-tab and click "Live Demo" +4. Inside that editor, you'll see an API key. This is your API key. 
## Requirements From b0748323f5f48f036dc5d4299ecb93012fd90a77 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 15:53:56 -0700 Subject: [PATCH 34/92] update --- .../config-based/overview.md | 10 +++++----- .../config-based/pagination.md | 2 +- .../config-based/record-selector.md | 16 ++++++++++++++++ .../tutorial/3-connecting-to-the-API-source.md | 9 +++++++++ .../config-based/tutorial/6-testing.md | 2 +- 5 files changed, 32 insertions(+), 7 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index af549282d483..81117e191801 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -29,10 +29,10 @@ A stream is defined by: 1. A name 2. Primary key (Optional): Used to uniquely identify records, enabling deduplication. Can be a string for single primary keys, a list of strings for composite primary keys, or a list of list of strings for composite primary keys consisting of nested fields. -3. Schema: describes the data to sync +3. [Schema](../cdk-python/schemas.md): Describes the data to sync 4. [Data retriever](overview.md#data-retriever): Describes how to retrieve the data from the API 5. [Cursor field](../cdk-python/incremental-stream.md) (Optional): Field to use used as stream cursor. Can either be a string, or a list of strings if the cursor is a nested field. -6. Transformations (Optional): A set of transformations to be applied on the records read from the source before emitting them to the destination +6. [Transformations](./record-selector.md#transformations) (Optional): A set of transformations to be applied on the records read from the source before emitting them to the destination 7. [Checkpoint interval](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing) (Optional): Defines the interval at which incremental syncs should be checkpointed. 
More details on streams and sources can be found in the [basic concepts section](../cdk-python/basic-concepts.md). @@ -43,9 +43,9 @@ The data retriever defines how to read the data for a Stream, and acts as an orc There is currently only one implementation, the `SimpleRetriever`, which is defined by 1. Requester: Describes how to submit requests to the API source -2. Paginator: describes how to navigate through the API's pages -3. Record selector: describes how to select records from an HTTP response -4. Stream Slicer: describes how to partition the stream, enabling incremental syncs and checkpointing +2. Paginator: Describes how to navigate through the API's pages +3. Record selector: Describes how to select records from an HTTP response +4. Stream Slicer: Describes how to partition the stream, enabling incremental syncs and checkpointing Each of those components (and their subcomponents) are defined by an explicit interface and one or many implementations. The developer can choose and configure the implementation they need depending on specifications of the integrations they are building against. 
diff --git a/docs/connector-development/config-based/pagination.md b/docs/connector-development/config-based/pagination.md index 09ffb8cd4c0d..cecc3b687b1e 100644 --- a/docs/connector-development/config-based/pagination.md +++ b/docs/connector-development/config-based/pagination.md @@ -77,7 +77,7 @@ and the second request as `https://cloud.airbyte.com/api/get_data?page_size=5&of The `CursorPaginationStrategy` outputs a token by evaluating its `cursor_value` string with the following parameters: -- `response`: decoded response +- `response`: The decoded response - `headers`: HTTP headers on the response - `last_records`: List of records selected from the last response diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 26dae3f87afa..a3879aa88069 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -3,6 +3,7 @@ The record selector is responsible for translating an HTTP response into a list of Airbyte records by extracting records from the response and optionally filtering and shaping records based on a heuristic. The current record selector implementation uses Jello to select records from the json-decoded HTTP response. +The record selection uses Python syntax, where `_` means top of the object. See [common recipes](#common-recipes). More information on Jello can be found at https://github.com/kellyjonbrazil/jello ## Common recipes: @@ -11,6 +12,21 @@ More information on Jello can be found at https://github.com/kellyjonbrazil/jell 2. Wrapping the whole json object in an array can be done with `[_]` 3. 
Inner fields can be selected by referring to it with the dot-notation: `_.data` will return the data field +Given a json object of the form + +``` +{ + "data": [{"id": 0}, {"id": 1}], + "metadata": {"api-version": "1.0.0"} +} +``` + +the selector `_.data` will produce the following: + +``` +[{"id": 0}, {"id": 1}] +``` + ## Filtering records Records can be filtered by adding a record_filter to the selector. diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 288f03f256a8..2dc3fcb439ad 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -2,6 +2,15 @@ We're now ready to start implementing the connector. +Over the course of this tutorial, we'll be editing a few files that were generated by the code generator: + +- `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml`: This is the [spec file](../../connector-specification-reference.md). It describes the inputs used to configure the connector. +- `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: This is the connector definition. It describes how the data should be read from the API source. + We'll also be creating the following files: +- `source-exchange-rates-tutorial/secrets/config.json`: This is the configuration file we'll be using to test the connector. Its schema should match the schema defined in the spec file. +- `source_exchange_rates_tutorial/schemas/rates.json`: This is the [schema definition](../../cdk-python/schemas.md) for the stream we'll implement. +- `source-exchange-rates/acceptance-test-config.yml`: This is the [acceptance test configuration file](../../testing-connectors/README.md).
It describes the integration tests to be used to verify that the connector works as expected. + The code generator already created a boilerplate connector definition in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../connector-definition.md) sections. diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 1fb1559ceec0..284ec7420123 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -3,7 +3,7 @@ We should make sure the connector respects the Airbyte specifications before we start using it in production. This can be done by executing the Source Acceptance Tests (SAT). -These tests will assert the most basic functionalities work as expected and are configured in `acceptance-test-config`. +These tests will assert the most basic functionalities work as expected and are configured in `acceptance-test-config.yml`. Before running the tests, we'll create an invalid config to make sure the `check` operation fails if the credentials are wrong, and an abnormal state to verify the connector's behavior when running with an abnormal state. 
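`record-selector.md` above describes the selection semantics: `_` is bound to the decoded JSON body, selectors use Python expression syntax, and fields are reachable with dot access. These can be approximated with the standard library. This is a hypothetical approximation, not Jello or the CDK's `JelloExtractor`. `select_records` and `DotDict` are illustrative names. Calling `eval` is tolerable here only because the expression comes from the connector definition, not from the API response.

```python
import json
from typing import Any


class DotDict(dict):
    """A dict that also supports attribute access, so `_.data` behaves like `_["data"]`.

    Attribute lookup only falls back to keys when no dict attribute of that name
    exists, so a key such as "items" must still be read with `_["items"]`.
    """

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def select_records(body: str, expression: str) -> Any:
    # Decode the response so every JSON object becomes a DotDict, bind it to `_`,
    # and evaluate the selector with builtins removed (this is not a sandbox).
    _ = json.loads(body, object_hook=DotDict)
    return eval(expression, {"__builtins__": {}}, {"_": _})


body = '{"data": [{"id": 0}, {"id": 1}], "metadata": {"api-version": "1.0.0"}}'
print(select_records(body, "_.data"))  # [{'id': 0}, {'id': 1}]
print(select_records('{"base": "USD", "date": "2022-07-26"}', "[_]"))
# [{'base': 'USD', 'date': '2022-07-26'}]
```

The second call shows why the tutorial uses `[_]`. The Exchange Rates API returns a single object, and wrapping it in a list lets it be treated as one record.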
From 672eb163b0a5641d7a129c05b373462e55889772 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 15:58:19 -0700 Subject: [PATCH 35/92] clarify that both dot and bracket notations are interchangeable --- .../config-based/connector-definition.md | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/connector-definition.md index 5122490c94e0..fd70d809e258 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/connector-definition.md @@ -9,7 +9,8 @@ Connectors are defined as a yaml configuration describing the connector's Source The configuration will be validated against this JSON Schema, which defines the set of valid properties. -The general structure of the YAML is as follows: +The general structure of the YAML is as follows: + ``` streams: @@ -241,7 +242,7 @@ If the input string is a raw string, the interpolated string will be the same. The engine will evaluate the content passed within `{{...}}`, interpolating the keys from context-specific arguments. the "options" keyword [see ($options)](connector-definition.md#object-instantiation) can be referenced. -For example, inner_object.key will evaluate to "Hello airbyte" at runtime. +For example, some_object.inner_object.key will evaluate to "Hello airbyte" at runtime. ``` some_object: @@ -254,6 +255,12 @@ some_object: Some components also pass in additional arguments to the context. This is the case for the [record selector](record-selector.md), which passes in an additional `response` argument. +Both dot notation and bracket notations (with single quotes ( `'`)) are interchangeable. +This means that both these string templates will evaluate to the same string: + +1. `"{{ options.name }}"` +2. 
`"{{ options['name'] }}"` + In additional to passing additional values through the kwargs argument, macros can be called from within the string interpolation. For example, `"{{ max(2, 3) }}" -> 3` From 6575c9db07d70442846259ad9395d19bfc45a61c Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 16:06:40 -0700 Subject: [PATCH 36/92] Clarify how check works --- .../config-based/tutorial/6-testing.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 284ec7420123..5307ec730c38 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -24,6 +24,16 @@ and `integration_tests/abnormal_state.json` with ``` +The `check` operation tries to read from the streams passed in the `stream_names` array: + +``` +check: + type: CheckStream + stream_names: ["rates"] +``` + +The operation will fail if the stream does not output at least one record. See the [connection checker](../overview.md#connection-checker) section for more details. + You can build the connector's docker image and run the acceptance tests by running the following commands: ``` @@ -38,6 +48,7 @@ airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/t ``` This test is failing because the `check` operation is succeeding even with invalid credentials. 
+ This can be confirmed by running ``` From e747b4aee3cdb4c4c5427f99245eddcadf1162e5 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 16:14:01 -0700 Subject: [PATCH 37/92] create spec and config before updating connector definition --- .../3-connecting-to-the-API-source.md | 82 ++++++++++--------- 1 file changed, 42 insertions(+), 40 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 2dc3fcb439ad..9cb675336399 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -11,8 +11,49 @@ Over the course of this tutorial, we'll be editing a few files that were generat - `source_exchange_rates_tutorial/schemas/rates.json`: This is the [schema definition](../../cdk-python/schemas.md) for the stream we'll implement. - `source-exchange-rates/acceptance-test-config.yml`: This is the [acceptance test configuration file](../../testing-connectors/README.md). It describes the integration tests to be used to verify that the connector works as expected. -The code generator already created a boilerplate connector definition in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` +## Updating the connector spec and config +1. Let's populate the config so the connector can access the access key and base currency. 
+ First, we'll add these properties to the connector spec in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` + +``` +documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi +connectionSpecification: + $schema: http://json-schema.org/draft-07/schema# + title: exchangeratesapi.io Source Spec + type: object + required: + - access_key + - base + additionalProperties: true + properties: + access_key: + type: string + description: >- + Your API Access Key. See here. The key is + case sensitive. + airbyte_secret: true + base: + type: string + description: >- + ISO reference currency. See here. + examples: + - EUR + - USD +``` + +2. We also need to fill in the connection config in the `secrets/config.json` + Because of the sensitive nature of the access key, we recommend storing this config in the `secrets` directory because it is ignored by git. + +``` +echo '{"access_key": "", "base": "USD"}' > secrets/config.json +``` + +## Updating the connector definition + +Next, we'll update the connector definition which was generated by the code generation script `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../connector-definition.md) sections. ``` @@ -183,45 +224,6 @@ check: stream_names: [ "rates" ] ``` -6. Let's populate the config so the connector can access the access key and base currency. 
- First, we'll add these properties to the connector spec in - `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` - -``` -documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi -connectionSpecification: - $schema: http://json-schema.org/draft-07/schema# - title: exchangeratesapi.io Source Spec - type: object - required: - - access_key - - base - additionalProperties: true - properties: - access_key: - type: string - description: >- - Your API Access Key. See here. The key is - case sensitive. - airbyte_secret: true - base: - type: string - description: >- - ISO reference currency. See here. - examples: - - EUR - - USD -``` - -7. We also need to fill in the connection config in the `secrets/config.json` - Because of the sensitive nature of the access key, we recommend storing this config in the `secrets` directory because it is ignored by git. - -``` -echo '{"access_key": "", "base": "USD"}' > secrets/config.json -``` - We can now run the `check` operation, which verifies the connector can connect to the API source. ``` From d5ac31d9dee2616a5ab008232ecc7d0b50f139c1 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 16:17:46 -0700 Subject: [PATCH 38/92] clarify what now_local() is --- .../config-based/tutorial/5-incremental-reads.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index c107e5a5d09d..af25cb66bacf 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -101,7 +101,8 @@ stream_slicer: ``` and refer to it in the stream's retriever. -This will generate slices from the start date until the current date, where each slice is exactly one day. 
+This will generate slices from the start time until the end time, where each slice is exactly one day. +The start time is defined in the config file, while the end time is defined by the `now_local()` macro, which will evaluate to the current date in the current timezone at runtime. See the section on [string interpolation](../connector-definition.md#string-interpolation) for more details. Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: From fdce2c6768597001630ec71ccf79c3eb80648e6f Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 16:19:12 -0700 Subject: [PATCH 39/92] rename to yaml structure --- docs/connector-development/config-based/overview.md | 2 +- docs/connector-development/config-based/request-options.md | 2 +- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- .../config-based/tutorial/5-incremental-reads.md | 2 +- .../config-based/{connector-definition.md => yaml-structure.md} | 2 +- 5 files changed, 5 insertions(+), 5 deletions(-) rename docs/connector-development/config-based/{connector-definition.md => yaml-structure.md} (98%) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 81117e191801..122cf177524b 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -9,7 +9,7 @@ Config-based connectors work by parsing a YAML configuration describing the Sour The process then submits HTTP requests to the API endpoint, and extracts records out of the response. -See the [connector definition section](connector-definition.md) for more information on the YAML file describing the connector. +See the [connector definition section](yaml-structure.md) for more information on the YAML file describing the connector. 
## Source diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md index ccf3611cf558..8877340a2c6e 100644 --- a/docs/connector-development/config-based/request-options.md +++ b/docs/connector-development/config-based/request-options.md @@ -35,7 +35,7 @@ requester: key: value ``` -In addition to $options, the provider can also access the following arguments for [string interpolation](connector-definition.md#string-interpolation): +In addition to $options, the provider can also access the following arguments for [string interpolation](yaml-structure.md#string-interpolation): - stream_slice - stream_state diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 9cb675336399..cdf14664b15f 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -54,7 +54,7 @@ echo '{"access_key": "", "base": "USD"}' > secrets/config.json ## Updating the connector definition Next, we'll update the connector definition which was generated by the code generation script `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` -More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../connector-definition.md) sections. +More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../yaml-structure.md) sections. 
``` schema_loader: diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index af25cb66bacf..82d2a4262b80 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -102,7 +102,7 @@ stream_slicer: and refer to it in the stream's retriever. This will generate slices from the start time until the end time, where each slice is exactly one day. -The start time is defined in the config file, while the end time is defined by the `now_local()` macro, which will evaluate to the current date in the current timezone at runtime. See the section on [string interpolation](../connector-definition.md#string-interpolation) for more details. +The start time is defined in the config file, while the end time is defined by the `now_local()` macro, which will evaluate to the current date in the current timezone at runtime. See the section on [string interpolation](../yaml-structure.md#string-interpolation) for more details. Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: diff --git a/docs/connector-development/config-based/connector-definition.md b/docs/connector-development/config-based/yaml-structure.md similarity index 98% rename from docs/connector-development/config-based/connector-definition.md rename to docs/connector-development/config-based/yaml-structure.md index fd70d809e258..47a7dbbddd23 100644 --- a/docs/connector-development/config-based/connector-definition.md +++ b/docs/connector-development/config-based/yaml-structure.md @@ -240,7 +240,7 @@ If the input string is a raw string, the interpolated string will be the same. `"hello world" -> "hello world"` The engine will evaluate the content passed within `{{...}}`, interpolating the keys from context-specific arguments. 
-the "options" keyword [see ($options)](connector-definition.md#object-instantiation) can be referenced. +the "options" keyword [see ($options)](yaml-structure.md#object-instantiation) can be referenced. For example, some_object.inner_object.key will evaluate to "Hello airbyte" at runtime. From 198b4218d3ed0fa3663b3d4a54a7e82e87ee4d5c Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 17:43:34 -0700 Subject: [PATCH 40/92] Go through tutorial and update end of section code samples --- .../config-based/tutorial/1-create-source.md | 3 +- .../tutorial/2-install-dependencies.md | 2 +- .../3-connecting-to-the-API-source.md | 74 +++------ .../config-based/tutorial/4-reading-data.md | 12 +- .../tutorial/5-incremental-reads.md | 27 ++-- .../config-based/tutorial/6-testing.md | 140 +----------------- 6 files changed, 43 insertions(+), 215 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index c8c3ed28e9d7..e2fac304f0cf 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -1,9 +1,10 @@ # Step 1: Generate the source connector project locally -Let's start by cloning the Airbyte repository +Let's start by cloning the Airbyte repository: ``` git clone git@github.com:airbytehq/airbyte.git +cd airbyte ``` Airbyte provides a code generator which bootstraps the scaffolding for our connector. 
diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index d5224c9d0a81..795ccb57afb6 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -14,7 +14,7 @@ source .venv/bin/activate pip install -r requirements.txt ``` -These steps create an initial python environment (using `python -m venv`), and install the dependencies required to run an API Source connector (using `pip install`). +These steps create an initial python environment, and install the dependencies required to run an API Source connector. Let's verify everything works as expected by running the Airbyte `spec` operation: diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index cdf14664b15f..18cea8aef3df 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -6,8 +6,13 @@ Over the course of this tutorial, we'll be editing a few files that were generat - `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml`: This is the [spec file](../../connector-specification-reference.md). It describes the inputs used to configure the connector. - `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: This is the connector definition. It describes how the data should be read from the API source. - We'll also be creating the following files: +- `source-exchange-rates-tutorial/integration_tests/configured_catalog.json`: This is the connector's [catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md).
It describes what data is available in a source +- `source-exchange-rates-tutorial/integration_tests/sample_state.json`: Sample state object to be used to test [incremental syncs](../../cdk-python/incremental-stream.md). + +We'll also be creating the following files: + - `source-exchange-rates-tutorial/secrets/config.json`: This is the configuration file we'll be using to test the connector. It's schema should match the schema defined in the spec file. +- `source-exchange-rates-tutorial/secrets/invalid_config.json`: This is an invalid configuration file we'll be using to test the connector. It's schema should match the schema defined in the spec file. - `source_exchange_rates_tutorial/schemas/rates.json`: This is the [schema definition](../../cdk-python/schemas.md) for the stream we'll implement. - `source-exchange-rates/acceptance-test-config.yml`: This is the [acceptance test configuration file](../../testing-connectors/README.md). It describes the integration tests to be used to verify that the connector works as expected. @@ -56,51 +61,6 @@ echo '{"access_key": "", "base": "USD"}' > secrets/config.json Next, we'll update the connector definition which was generated by the code generation script `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../yaml-structure.md) sections. 
-``` -schema_loader: - type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "_" -requester: - type: HttpRequester - name: "{{ options['name'] }}" - url_base: TODO "your_api_base_url" - http_method: "GET" - authenticator: - type: TokenAuthenticator - token: "{{ config['api_key'] }}" -retriever: - type: SimpleRetriever - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - record_selector: - $ref: "*ref(selector)" - paginator: - type: NoPagination -customers_stream: - type: DeclarativeStream - $options: - name: "customers" - primary_key: "id" - schema_loader: - $ref: "*ref(schema_loader)" - retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: TODO "your_endpoint_path" -streams: - - "*ref(customers_stream)" -check: - type: CheckStream - stream_names: ["customers_stream"] - -``` - Let's fill this out these TODOs with the information found in the [Exchange Rates API docs](https://exchangeratesapi.io/documentation/) 1. First, let's rename the stream from `customers` to `rates`, and update the primary key to `date` @@ -152,8 +112,8 @@ rates_stream: ``` 4. Next, we'll set up the authentication. - The Exchange Rates API requires an access key to be passed as request parameter. We'll need to make this access key accessible to our connector, and pass it as a request_parameter in the `request_parameters` field of the `request_options_provider` - We'll configure the connector to use this access key by setting the access key in a request parameter and pointing to a field in the config, which we'll populate in the next step: + The Exchange Rates API requires an access key to be passed as header named "apikey". + This can be done using an `ApiKeyAuthenticator`, which we'll configure to point to the config's `access_key` field. 
``` requester: @@ -166,8 +126,6 @@ requester: access_key: "{{ config.access_key }}" ``` -Since the access key is set directly as a request parameter, we can remove the `authenticator` field from the `requester`. - 5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter. Let's assume the user will configure this via the connector configuration in parameter called `base`; we'll pass the value input by the user as a request parameter: ``` @@ -182,7 +140,7 @@ The full connection definition should now look like ``` schema_loader: type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" selector: type: RecordSelector extractor: @@ -191,14 +149,18 @@ selector: requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" request_options_provider: request_parameters: - access_key: "{{ config.access_key }}" base: "{{ config.base }}" retriever: type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" # Only change the url_base field name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: @@ -209,19 +171,19 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "date" + primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" retriever: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/latest" + path: "/exchangerates_data/latest" streams: - "*ref(rates_stream)" check: type: CheckStream - stream_names: [ "rates" ] + stream_names: ["rates"] ``` We can now run the `check` operation, which verifies the connector can connect to the API source. 
diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 31d1364ad7ef..aaa5bb1e6669 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -56,7 +56,7 @@ Here is the complete connector definition for convenience: ``` schema_loader: type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" selector: type: RecordSelector extractor: @@ -65,14 +65,18 @@ selector: requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" request_options_provider: request_parameters: - access_key: "{{ config.access_key }}" base: "{{ config.base }}" retriever: type: SimpleRetriever + $options: + url_base: "https://api.exchangeratesapi.io/v1/" # Only change the url_base field name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: @@ -119,7 +123,7 @@ The `--debug` flag can be set to print out debug information, including the outg We now have a working implementation of a connector reading the latest exchange rates for a given currency. We're however limited to only reading the latest exchange rate value. -Next, we'll ([enhance the connector to read data for a given date, which will enable us to backfill the stream with historical data.](5-incremental-reads.md) +Next, we'll [enhance the connector to read data for a given date, which will enable us to backfill the stream with historical data](5-incremental-reads.md). 
## More readings diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 82d2a4262b80..255ea9c11b86 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -126,10 +126,7 @@ And we'll update the path to point to the `stream_slice`'s start_date ``` requester: ref: "*ref(requester)" - path: - type: "InterpolatedString" - string: "{{ stream_slice.start_date }}" - default: "/latest" + path: "{{ stream_slice.start_date or 'latest' }}" ``` The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: @@ -137,7 +134,7 @@ The full connector definition should now look like `./source_exchange_rates_tuto ``` schema_loader: type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" selector: type: RecordSelector extractor: @@ -146,11 +143,13 @@ selector: requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" request_options_provider: request_parameters: - access_key: "{{ config.access_key }}" base: "{{ config.base }}" stream_slicer: type: "DatetimeStreamSlicer" @@ -162,21 +161,24 @@ stream_slicer: datetime_format: "%Y-%m-%d %H:%M:%S.%f" step: "1d" datetime_format: "%Y-%m-%d" + cursor_field: "{{ options.stream_cursor_field }}" retriever: type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" # Only change the url_base field name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" + stream_slicer: + $ref: "*ref(stream_slicer)" record_selector: $ref: "*ref(selector)" paginator: 
type: NoPagination - stream_slicer: - $ref: "*ref(stream_slicer)" rates_stream: type: DeclarativeStream $options: name: "rates" - cursor_field: "date" + stream_cursor_field: "date" primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" @@ -184,10 +186,7 @@ rates_stream: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: - type: "InterpolatedString" - string: "{{ stream_slice.start_date }}" - default: "/latest" + path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" streams: - "*ref(rates_stream)" check: diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 5307ec730c38..6c1ce22758d5 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -24,151 +24,13 @@ and `integration_tests/abnormal_state.json` with ``` -The `check` operation tries to read from the streams passed in the `stream_names` array: - -``` -check: - type: CheckStream - stream_names: ["rates"] -``` - -The operation will fail if the stream does not output at least one record. See the [connection checker](../overview.md#connection-checker) section for more details. - -You can build the connector's docker image and run the acceptance tests by running the following commands: - -``` -docker build . -t airbyte/source-exchange-rates-tutorial:dev -python -m pytest integration_tests -p integration_tests.acceptance -``` - -1 test should be failing - -``` -airbyte-integrations/bases/source-acceptance-test/source_acceptance_test/tests/test_core.py:183 TestConnection.test_check[inputs1] -``` - -This test is failing because the `check` operation is succeeding even with invalid credentials. 
- -This can be confirmed by running - -``` -python main.py check --config integration_tests/invalid_config.json -``` - -The `--debug` flag can be used to inspect the response: - -``` -python main.py check --debug --config integration_tests/invalid_config.json -``` - -You should see a message similar to this one: - -``` -{"type": "DEBUG", "message": "Receiving response", "data": {"headers": "{'Date': 'Thu, 28 Jul 2022 17:56:31 GMT', 'Content-Type': 'application/json; Charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'access-control-allow-methods': 'GET, HEAD, POST, PUT, PATCH, DELETE, OPTIONS', 'access-control-allow-origin': '*', 'x-blocked-at-loadbalancer': '1', 'CF-Cache-Status': 'DYNAMIC', 'Expect-CT': 'max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"', 'Report-To': '{\"endpoints\":[{\"url\":\"https:\\\\/\\\\/a.nel.cloudflare.com\\\\/report\\\\/v3?s=MpyuXqiuxH%2FEA1%2F75CQiP4bPOt0DKeg9utWdBShkseCK9f4G8R9K126fe65nIvsKWQVGMTou%2BeTRCq%2FCzgoxr2B1BT%2Bm3l6i0DFDu5sYAqHAWzd9pSoqJZ6jktjQgB5D%2BqG7jQvhIDnK\"}],\"group\":\"cf-nel\",\"max_age\":604800}', 'NEL': '{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}', 'Server': 'cloudflare', 'CF-RAY': '731f7df109709e68-SJC', 'Content-Encoding': 'gzip'}", "status": "200", "body": "{\n \"success\": false,\n \"error\": {\n \"code\": 101,\n \"type\": \"invalid_access_key\",\n \"info\": \"You have not supplied a valid API Access Key. [Technical Support: support@apilayer.com]\"\n }\n}\n"}} -``` - -The endpoint is returning a 200 HTTP response, but the message contains an error, which our connector isn't handling. 
- -This can be fixed by adding an error handler to the requester: - -``` -requester: - type: HttpRequester - name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" - http_method: "GET" - request_options_provider: - request_parameters: - access_key: "{{ config.access_key }}" - base: "{{ config.base }}" - error_handler: - response_filters: - - action: FAIL - predicate: "{{ 'error' in response }}" -``` - -The `check` operation should now fail - -``` -python main.py check --debug --config integration_tests/invalid_config.json -``` - -and the acceptance tests should pass +You can run the acceptance tests with the following commands: ``` docker build . -t airbyte/source-exchange-rates-tutorial:dev python -m pytest integration_tests -p integration_tests.acceptance ``` -Here is the full connector definition for reference: - -``` -schema_loader: - class_name: airbyte_cdk.sources.declarative.schema.json_schema.JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options.name }}.json" -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" -requester: - type: HttpRequester - name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" - http_method: "GET" - request_options_provider: - request_parameters: - access_key: "{{ config.access_key }}" - base: "{{ config.base }}" -stream_slicer: - type: "DatetimeStreamSlicer" - start_datetime: - datetime: "{{ config.start_date }}" - datetime_format: "%Y-%m-%d" - end_datetime: - datetime: "{{ now_local() }}" - datetime_format: "%Y-%m-%d %H:%M:%S.%f" - step: "1d" - datetime_format: "%Y-%m-%d" - cursor_field: "{{ options.cursor_field }}" -retriever: - type: SimpleRetriever - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - record_selector: - ref: "*ref(selector)" - paginator: - type: NoPagination -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - cursor_field: "date" - primary_key: "date" - 
schema_loader: - ref: "*ref(schema_loader)" - retriever: - ref: "*ref(retriever)" - stream_slicer: - ref: "*ref(stream_slicer)" - requester: - ref: "*ref(requester)" - path: - type: "InterpolatedString" - string: "{{ stream_slice.start_date }}" - default: "/latest" - error_handler: - response_filters: - - predicate: "{{'error' in response}}" - action: FAIL -streams: - - "*ref(rates_stream)" -check: - type: CheckStream - stream_names: ["rates"] - -``` - ## Next steps: Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). From 18bc40fbd0a6202c612354a092c8abe443e17410 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 17:44:32 -0700 Subject: [PATCH 41/92] fix link --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 18cea8aef3df..bf66387208d6 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -205,7 +205,7 @@ Next, we'll [extract the records from the response](4-reading-data.md) ## More readings -- +- [Connector definition YAML file](../yaml-structure.md) - [Config-based connectors overview](../overview.md) - [Authentication](../authentication.md) - [Request options providers](../request-options.md) \ No newline at end of file From f4e5ed4a19ebb86ec5381fee5bcb2acc1128911d Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 18:48:42 -0700 Subject: [PATCH 42/92] update --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index bf66387208d6..3c387914e420 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -70,7 +70,7 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "date" + primary_key: "date" ``` and update the references in the streams list and check block From 83e3845a0ab0a1ce5082153cbb29a6c9f6d1b80e Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 18:54:37 -0700 Subject: [PATCH 43/92] update code samples --- .../3-connecting-to-the-API-source.md | 36 +++++++++++-------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 3c387914e420..73bb03e2ca38 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -85,13 +85,12 @@ check: 2. Next we'll set the base url. According to the API documentation, the base url is `"https://api.exchangeratesapi.io/v1/"`. - This can be set in the requester definition. ``` -requester: - type: HttpRequester - name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" # Only change the url_base field +retriever: + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" # Only change the url_base field ``` 3. We can fetch the latest data by submitting a request to "/latest". This path is specific to the stream, so we'll set within the `rates_stream` definition. 
@@ -101,14 +100,14 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - primary_key: "date" + primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" retriever: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/latest" + path: "/exchangerates_data/latest" ``` 4. Next, we'll set up the authentication. @@ -119,20 +118,27 @@ rates_stream: requester: type: HttpRequester name: "{{ options['name'] }}" - url_base: "https://api.exchangeratesapi.io/v1/" http_method: "GET" - request_options_provider: - request_parameters: - access_key: "{{ config.access_key }}" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" ``` 5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter. Let's assume the user will configure this via the connector configuration in parameter called `base`; we'll pass the value input by the user as a request parameter: ``` -request_options_provider: - request_parameters: - access_key: "{{ config.access_key }}" - base: "{{ config.base }}" +requester: + type: HttpRequester + name: "{{ options['name'] }}" + http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" + request_options_provider: + request_parameters: + base: "{{ config.base }}" ``` The full connection definition should now look like From bab017bd97ad83131067953949dee4d565ff0643 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 18:58:13 -0700 Subject: [PATCH 44/92] Update code samples --- .../tutorial/5-incremental-reads.md | 59 +++++++++++++------ 1 file changed, 41 insertions(+), 18 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 255ea9c11b86..f407a2c72060 100644 --- 
a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -62,13 +62,19 @@ And we'll update the `path` in the connector definition to point to `/{{ config. Note that we are setting a default value because the `check` operation does not know the `start_date`. We'll default to hitting `/latest`: ``` -retriever: - requester: - $ref: "*ref(requester)" - path: - type: "InterpolatedString" - string: "/{{ config.start_date }}" - default: "/latest" +rates_stream: + type: DeclarativeStream + $options: + name: "rates" + stream_cursor_field: "date" + primary_key: "date" + schema_loader: + $ref: "*ref(schema_loader)" + retriever: + $ref: "*ref(retriever)" + requester: + $ref: "*ref(requester)" + path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" ``` You can test these changes by executing the `read` operation: @@ -98,6 +104,7 @@ stream_slicer: datetime_format: "%Y-%m-%d %H:%M:%S.%f" step: "1d" datetime_format: "%Y-%m-%d" + cursor_field: "{{ options.stream_cursor_field }}" ``` and refer to it in the stream's retriever. 
@@ -111,22 +118,38 @@ rates_stream: type: DeclarativeStream $options: name: "rates" - cursor_field: "date + stream_cursor_field: "date" +``` + +We'll also update the retriever to use the stream slicer: + +``` +retriever: + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" # Only change the url_base field + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + stream_slicer: + $ref: "*ref(stream_slicer)" +``` + +Finally, we'll update the path to point to the `stream_slice`'s `start_time`: + +``` +rates_stream: + type: DeclarativeStream + $options: + name: "rates" + stream_cursor_field: "date" primary_key: "date" schema_loader: $ref: "*ref(schema_loader)" retriever: $ref: "*ref(retriever)" - stream_slicer: - $ref: "*ref(stream_slicer)" -``` - -And we'll update the path to point to the `stream_slice`'s start_date - -``` -requester: - ref: "*ref(requester)" - path: "{{ stream_slice.start_date or 'latest' }}" + requester: + $ref: "*ref(requester)" + path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" ``` The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: From 37d1fded1986fc5c561fec58a373bb60178f24bf Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Mon, 8 Aug 2022 19:03:38 -0700 Subject: [PATCH 45/92] Update to bracket notation --- .../config-based/authentication.md | 4 ++-- .../config-based/pagination.md | 4 ++-- .../config-based/record-selector.md | 4 ++-- .../config-based/tutorial/4-reading-data.md | 2 +- .../config-based/tutorial/5-incremental-reads.md | 14 +++++++------- .../config-based/yaml-structure.md | 2 +- 6 files changed, 15 insertions(+), 15 deletions(-) diff --git a/docs/connector-development/config-based/authentication.md b/docs/connector-development/config-based/authentication.md index a856a325696b..d78734f3fd42 100644 --- a/docs/connector-development/config-based/authentication.md +++
b/docs/connector-development/config-based/authentication.md @@ -67,7 +67,7 @@ OAuth authentication is supported through the `OAuthAuthenticator`, which requir authenticator: type: "OAuthAuthenticator" token_refresh_endpoint: "https://api.searchmetrics.com/v4/token" - client_id: "{{ config.api_key }}" - client_secret: "{{ config.client_secret }}" + client_id: "{{ config['api_key'] }}" + client_secret: "{{ config['client_secret'] }}" refresh_token: "" ``` \ No newline at end of file diff --git a/docs/connector-development/config-based/pagination.md b/docs/connector-development/config-based/pagination.md index cecc3b687b1e..d5966793a21c 100644 --- a/docs/connector-development/config-based/pagination.md +++ b/docs/connector-development/config-based/pagination.md @@ -91,7 +91,7 @@ paginator: <...> pagination_strategy: type: "CursorPaginationStrategy" - cursor_value: "{{ last_records[-1].id }}" + cursor_value: "{{ last_records[-1]['id'] }}" page_token: field_name: "from" inject_into: "request_parameter" @@ -110,7 +110,7 @@ paginator: <...> pagination_strategy: type: "CursorPaginationStrategy" - cursor_value: "{{ headers.urls.next }}" + cursor_value: "{{ headers['urls']['next'] }}" page_token: inject_into: "path" ``` diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index a3879aa88069..45b75d1483cb 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -39,7 +39,7 @@ selector: extractor: transform: "[_]" record_filter: - condition: "{{ record.created_at < stream_slice.start_time }}" + condition: "{{ record['created_at'] < stream_slice['start_time'] }}" ``` ## Transformations @@ -70,7 +70,7 @@ stream: - type: AddFields fields: - path: ["start_date"] - value: {{ stream_slice.start_date }} + value: {{ stream_slice['start_date'] }} ``` Fields can also be added in a nested object by writing the fields' path as a 
list. diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index aaa5bb1e6669..8eb653bf2caa 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -72,7 +72,7 @@ requester: api_token: "{{ config['access_key'] }}" request_options_provider: request_parameters: - base: "{{ config.base }}" + base: "{{ config['base'] }}" retriever: type: SimpleRetriever $options: diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index f407a2c72060..9f147f7bd0e7 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -74,7 +74,7 @@ rates_stream: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" + path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" ``` You can test these changes by executing the `read` operation: @@ -97,14 +97,14 @@ Let's first define a stream slicer at the top level of the connector definition: stream_slicer: type: "DatetimeStreamSlicer" start_datetime: - datetime: "{{ config.start_date }}" + datetime: "{{ config['start_date'] }}" datetime_format: "%Y-%m-%d" end_datetime: datetime: "{{ now_local() }}" datetime_format: "%Y-%m-%d %H:%M:%S.%f" step: "1d" datetime_format: "%Y-%m-%d" - cursor_field: "{{ options.stream_cursor_field }}" + cursor_field: "{{ options['stream_cursor_field'] }}" ``` and refer to it in the stream's retriever. 
@@ -149,7 +149,7 @@ rates_stream: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" + path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" ``` The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: @@ -177,14 +177,14 @@ requester: stream_slicer: type: "DatetimeStreamSlicer" start_datetime: - datetime: "{{ config.start_date }}" + datetime: "{{ config['start_date'] }}" datetime_format: "%Y-%m-%d" end_datetime: datetime: "{{ now_local() }}" datetime_format: "%Y-%m-%d %H:%M:%S.%f" step: "1d" datetime_format: "%Y-%m-%d" - cursor_field: "{{ options.stream_cursor_field }}" + cursor_field: "{{ options['stream_cursor_field'] }}" retriever: type: SimpleRetriever $options: @@ -209,7 +209,7 @@ rates_stream: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice.start_time or 'latest' }}" + path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" streams: - "*ref(rates_stream)" check: diff --git a/docs/connector-development/config-based/yaml-structure.md b/docs/connector-development/config-based/yaml-structure.md index 47a7dbbddd23..c5d6fa758e19 100644 --- a/docs/connector-development/config-based/yaml-structure.md +++ b/docs/connector-development/config-based/yaml-structure.md @@ -124,7 +124,7 @@ outer: $options: MyKey: MyValue inner: - k2: "MyKey is {{ options.MyKey }}" + k2: "MyKey is {{ options['MyKey'] }}" ``` In this example, outer.inner.k2 will evaluate to "MyKey is MyValue" From 1fc83fe6c7b03f14dc9a23cdca11e5853b1c6a91 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 08:59:01 -0700 Subject: [PATCH 46/92] remove superfluous comments --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- .../config-based/tutorial/4-reading-data.md | 2 +- .../config-based/tutorial/5-incremental-reads.md | 2 +- 3 files changed, 3 
insertions(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 73bb03e2ca38..9a9e8dd1a7fb 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -166,7 +166,7 @@ requester: retriever: type: SimpleRetriever $options: - url_base: "https://api.apilayer.com" # Only change the url_base field + url_base: "https://api.apilayer.com" name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 8eb653bf2caa..ceeb94c299ae 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -76,7 +76,7 @@ requester: retriever: type: SimpleRetriever $options: - url_base: "https://api.exchangeratesapi.io/v1/" # Only change the url_base field + url_base: "https://api.exchangeratesapi.io/v1/" name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" record_selector: diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 9f147f7bd0e7..ff3f2e682228 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -188,7 +188,7 @@ stream_slicer: retriever: type: SimpleRetriever $options: - url_base: "https://api.apilayer.com" # Only change the url_base field + url_base: "https://api.apilayer.com" name: "{{ options['name'] }}" primary_key: "{{ options['primary_key'] }}" stream_slicer: From 
6944916035657e20dcba39a0d55344e3be8c77fd Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:42:15 -0700 Subject: [PATCH 47/92] Update docs/connector-development/config-based/tutorial/2-install-dependencies.md Co-authored-by: Augustin --- .../config-based/tutorial/2-install-dependencies.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index 795ccb57afb6..cfed3655c86b 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -18,11 +18,9 @@ These steps create an initial python environment, and install the dependencies r Let's verify everything works as expected by running the Airbyte `spec` operation: -``` +```bash python main.py spec -``` - You should see an output similar to the one below: ``` From c317a496e75a557469cfdafdea6f4ea2dc8bb382 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:42:46 -0700 Subject: [PATCH 48/92] Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 9a9e8dd1a7fb..e46b6beb6ea5 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -18,8 +18,9 @@ We'll also be creating the following files: ## Updating the connector spec and config -1. Let's populate the config so the connector can access the access key and base currency. 
- First, we'll add these properties to the connector spec in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` +Let's populate the specification (`spec.yaml`) and the configuration (`secrets/config.json`) so the connector can access the access key and base currency. + +1. We'll add these properties to the connector spec in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` ``` documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi From 096a370e04bdf67e6bddb42158f1e393fea1e8e7 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:43:22 -0700 Subject: [PATCH 49/92] Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index e46b6beb6ea5..ded59f706e4f 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -59,7 +59,7 @@ echo '{"access_key": "", "base": "USD"}' > secrets/config.json ## Updating the connector definition -Next, we'll update the connector definition which was generated by the code generation script `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml` +Next, we'll update the connector definition (`source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`). It was generated by the code generation script. More details on the connector definition file can be found in the [overview](../overview.md) and [connection definition](../yaml-structure.md) sections.
Let's fill this out these TODOs with the information found in the [Exchange Rates API docs](https://exchangeratesapi.io/documentation/) From ff804bed3fd8aee00d3a31d150b950868fd5aa85 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:43:52 -0700 Subject: [PATCH 50/92] Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index ded59f706e4f..e7577e5370a3 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -74,7 +74,7 @@ rates_stream: primary_key: "date" ``` -and update the references in the streams list and check block +and update the references in the `streams` list and `check` block ``` streams: From 49be0312c8d75aea929ef854810dc71aa15c0ba4 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:47:05 -0700 Subject: [PATCH 51/92] Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index e7577e5370a3..dbb553406389 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -94,7 +94,7 @@ retriever: url_base: "https://api.apilayer.com" # Only change the url_base 
field ``` -3. We can fetch the latest data by submitting a request to "/latest". This path is specific to the stream, so we'll set within the `rates_stream` definition. +3. We can fetch the latest data by submitting a request to the `/latest` API endpoint. This path is specific to the stream, so we'll set it within the `rates_stream` definition, at the `retriever` level. ``` rates_stream: From 67904223f13e3d0aaab62ba74adde0ba097d1b9f Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:47:27 -0700 Subject: [PATCH 52/92] Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index dbb553406389..e8e8ee785edb 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -139,7 +139,7 @@ requester: api_token: "{{ config['access_key'] }}" request_options_provider: request_parameters: - base: "{{ config.base }}" + base: "{{ config['base'] }}" ``` The full connection definition should now look like From 34214b442b5e8c1b16645ede582582242a96b6bf Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:48:36 -0700 Subject: [PATCH 53/92] Update docs/connector-development/config-based/tutorial/4-reading-data.md Co-authored-by: Augustin --- .../config-based/tutorial/4-reading-data.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index ceeb94c299ae..18f63b72af09 100644 --- 
a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -22,7 +22,7 @@ Let's first add the stream to the configured catalog in `source-exchange_rates-t ``` The configured catalog declares the sync modes supported by the stream \(full refresh or incremental\). -See the [catalog tutorial](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) for more information. +See the [catalog guide](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) for more information. Let's define the stream schema in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/schemas/rates.json` From bf9a20553b82eccc9778855de9e7f4d776a19b09 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 11:52:02 -0700 Subject: [PATCH 54/92] fix path --- .../config-based/tutorial/4-reading-data.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 18f63b72af09..62f46ae74a70 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -94,7 +94,7 @@ rates_stream: $ref: "*ref(retriever)" requester: $ref: "*ref(requester)" - path: "/latest" + path: "/exchangerates_data/latest" streams: - "*ref(rates_stream)" check: From 74e4de8eb36c0bc76f7f7cf053e7bb0dea3d9413 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 12:01:27 -0700 Subject: [PATCH 55/92] update --- .../config-based/tutorial/3-connecting-to-the-API-source.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index e8e8ee785edb..4892fa2a46d0 100644 --- 
a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -84,6 +84,9 @@ check: stream_names: ["rates"] ``` +Adding the reference in `streams` will make the stream available to the `Source`. +Adding the reference in the `check` tells the `check` operation to use that stream to test the connection. + 2. Next we'll set the base url. According to the API documentation, the base url is `"https://api.exchangeratesapi.io/v1/"`. From ca0f93ced9eeae10d0171256269c3f9682a8c692 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Tue, 9 Aug 2022 15:46:09 -0700 Subject: [PATCH 56/92] motivation blurp --- .../config-based/overview.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 122cf177524b..88d835cb692a 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -2,9 +2,24 @@ The goal of this document is to give enough technical specifics to understand how config-based connectors work. When you're ready to start building a connector, you can start with [the tutorial](../../../config-based/tutorial/0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) +See the [motivation section](./motivation.md) for more information on the motivation driving this framework. ## Overview +In building over 100 API connectors, we observed that most API connectors are implemented in a formulaic approach: + +1. Implement the API's authentication mechanism +2. Describe the schema of the data returned by the API +3. Make requests to the API's URL containing the data of interest +4. Implement pagination strategy +5. Implement error handling and rate limiting +6. 
Decode the data returned from the API +7. Keep track of what data was already synced. + +Each of these problems has a finite number of solutions. For instance, most APIs use one of 3 standard pagination mechanisms. + +The CDK's config-based interface uses a declarative approach to the problem and allows developers to specify __what__ data they want to read from a source and abstracts the specifics of the __how__. + Config-based connectors work by parsing a YAML configuration describing the Source, then running the configured connector using a Python backend. The process then submits HTTP requests to the API endpoint, and extracts records out of the response. From 9cfd2236cde713fced04b7b804a3a72edafbfb57 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 14:30:44 -0700 Subject: [PATCH 57/92] warning --- docs/connector-development/config-based/overview.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 88d835cb692a..9ca2b91a4d26 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -1,5 +1,7 @@ # Config-based connectors overview +:warning: This framework is in alpha stage. Support is not in production and is available only to selected users. + The goal of this document is to give enough technical specifics to understand how config-based connectors work. When you're ready to start building a connector, you can start with [the tutorial](../../../config-based/tutorial/0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) See the [motivation section](./motivation.md) for more information on the motivation driving this framework.
From 65a966cec6f24cdbac6d16a895355dae893994a0 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 14:31:54 -0700 Subject: [PATCH 58/92] warning --- .../config-based/tutorial/0-getting-started.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index cd1c19da9895..0decb28b52c9 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -1,5 +1,7 @@ # Getting Started +:warning: This framework is in alpha stage. Support is not in production and is available only to selected users. + ## Summary Throughout this tutorial, we'll walk you through the creation an Airbyte source to read and extract data from an HTTP API. From dd4437c0be46d44de1bd19e5b59081cfa935d5ad Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 14:42:18 -0700 Subject: [PATCH 59/92] fix code block --- .../config-based/tutorial/2-install-dependencies.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index cfed3655c86b..11691bb08bc5 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -20,6 +20,7 @@ Let's verify everything works as expected by running the Airbyte `spec` operatio ```bash python main.py spec +``` You should see an output similar to the one below: From 365c0dcc055617602141d9d72c46269136078ffa Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 14:53:04 -0700 Subject: [PATCH 60/92] update code samples --- .../3-connecting-to-the-API-source.md | 167 +++++++++--------- 1 file changed, 83 insertions(+), 84 deletions(-) diff 
--git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 4892fa2a46d0..ce6aaf64b29b 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -67,51 +67,50 @@ Let's fill this out these TODOs with the information found in the [Exchange Rate 1. First, let's rename the stream from `customers` to `rates`, and update the primary key to `date` ``` -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - primary_key: "date" +streams: + - type: DeclarativeStream + $options: + name: "rates" + primary_key: "date" ``` -and update the references in the `streams` list and `check` block +and update the references in the `check` block ``` -streams: - - "*ref(rates_stream)" check: type: CheckStream stream_names: ["rates"] ``` -Adding the reference in `streams` will make the stream available to the `Source`. Adding the reference in the `check` tells the `check` operation to use that stream to test the connection. 2. Next we'll set the base url. According to the API documentation, the base url is `"https://api.exchangeratesapi.io/v1/"`. ``` -retriever: - type: SimpleRetriever - $options: - url_base: "https://api.apilayer.com" # Only change the url_base field +definitions: + <...> + retriever: + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" ``` 3. We can fetch the latest data by submitting a request to the `/latest` API endpoint. This path is specific to the stream, so we'll set it within the `rates_stream` definition, at the `retriever` level. 
``` -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - primary_key: "date" - schema_loader: - $ref: "*ref(schema_loader)" - retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/latest" +streams: + - type: DeclarativeStream + $options: + name: "rates" + primary_key: "date" + schema_loader: + $ref: "*ref(definitions.schema_loader)" + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/latest" ``` 4. Next, we'll set up the authentication. @@ -119,78 +118,78 @@ rates_stream: This can be done using an `ApiKeyAuthenticator`, which we'll configure to point to the config's `access_key` field. ``` -requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator - header: "apikey" - api_token: "{{ config['access_key'] }}" +definitions: + <...> + requester: + type: HttpRequester + name: "{{ options['name'] }}" + http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" ``` 5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter. 
Let's assume the user will configure this via the connector configuration in parameter called `base`; we'll pass the value input by the user as a request parameter: ``` -requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator - header: "apikey" - api_token: "{{ config['access_key'] }}" - request_options_provider: - request_parameters: - base: "{{ config['base'] }}" +definitions: + <...> + requester: + <...> + request_options_provider: + request_parameters: + base: "{{ config['base'] }}" ``` The full connection definition should now look like ``` -schema_loader: - type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "_" -requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator - header: "apikey" - api_token: "{{ config['access_key'] }}" - request_options_provider: - request_parameters: - base: "{{ config.base }}" -retriever: - type: SimpleRetriever - $options: - url_base: "https://api.apilayer.com" - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - record_selector: - $ref: "*ref(selector)" - paginator: - type: NoPagination -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - primary_key: "date" +version: "0.1.0" + +definitions: schema_loader: - $ref: "*ref(schema_loader)" + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" + selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "_" + requester: + type: HttpRequester + name: "{{ options['name'] }}" + http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" + request_options_provider: + request_parameters: + base: "{{ config['base'] }}" retriever: - $ref: 
"*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/latest" + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + $ref: "*ref(definitions.selector)" + paginator: + type: NoPagination + streams: - - "*ref(rates_stream)" + - type: DeclarativeStream + $options: + name: "rates" + primary_key: "date" + schema_loader: + $ref: "*ref(definitions.schema_loader)" + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/latest" check: type: CheckStream stream_names: ["rates"] From ebaa701124bfb3f4e50b99819021b9b47f608078 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 14:54:38 -0700 Subject: [PATCH 61/92] update code sample --- .../config-based/tutorial/4-reading-data.md | 94 ++++++++++--------- 1 file changed, 49 insertions(+), 45 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 62f46ae74a70..5546af426e90 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -42,11 +42,13 @@ rm source_exchange_rates_tutorial/schemas/employees.json Next, we'll update the record selection to wrap the single record returned by the source in an array in `source_exchange_rates_tutorial/exchange_rates_tutorial.yamlsource_exchange_rates_tutorial/exchange_rates_tutorial.yaml` ``` -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" # wrap the single record returned by the API in an array +definitions: + <...> + selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" # wrap the single record returned by the API in an array ``` The transform is defined using the `Jello` 
syntax, which is a Python-based JQ alternative. More details on Jello can be found [here](https://github.com/kellyjonbrazil/jello). @@ -54,49 +56,51 @@ The transform is defined using the `Jello` syntax, which is a Python-based JQ al Here is the complete connector definition for convenience: ``` -schema_loader: - type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" -requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator - header: "apikey" - api_token: "{{ config['access_key'] }}" - request_options_provider: - request_parameters: - base: "{{ config['base'] }}" -retriever: - type: SimpleRetriever - $options: - url_base: "https://api.exchangeratesapi.io/v1/" - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - record_selector: - $ref: "*ref(selector)" - paginator: - type: NoPagination -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - primary_key: "date" +version: "0.1.0" + +definitions: schema_loader: - $ref: "*ref(schema_loader)" + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" + selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" # wrap the single record returned by the API in an array + requester: + type: HttpRequester + name: "{{ options['name'] }}" + http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" + request_options_provider: + request_parameters: + base: "{{ config['base'] }}" retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/latest" + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" + name: "{{ options['name'] }}" + primary_key: "{{ options['primary_key'] }}" + 
record_selector: + $ref: "*ref(definitions.selector)" + paginator: + type: NoPagination + streams: - - "*ref(rates_stream)" + - type: DeclarativeStream + $options: + name: "rates" + primary_key: "rates" + schema_loader: + $ref: "*ref(definitions.schema_loader)" + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/latest" check: type: CheckStream stream_names: ["rates"] From aacc30a7175859e74072a5e23763b8ec397d7c71 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 15:03:47 -0700 Subject: [PATCH 62/92] update code samples --- .../tutorial/5-incremental-reads.md | 209 +++++++++--------- 1 file changed, 106 insertions(+), 103 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index ff3f2e682228..f029213b630d 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -62,19 +62,18 @@ And we'll update the `path` in the connector definition to point to `/{{ config. Note that we are setting a default value because the `check` operation does not know the `start_date`. 
We'll default to hitting `/latest`: ``` -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - stream_cursor_field: "date" - primary_key: "date" - schema_loader: - $ref: "*ref(schema_loader)" - retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" +streams: + - type: DeclarativeStream + $options: + name: "rates" + primary_key: "rates" + schema_loader: + $ref: "*ref(definitions.schema_loader)" + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/{{config['start_date'] or 'latest'}}" ``` You can test these changes by executing the `read` operation: @@ -94,17 +93,22 @@ More details on the stream slicers can be found [here](./link-to-stream-slicers. Let's first define a stream slicer at the top level of the connector definition: ``` -stream_slicer: - type: "DatetimeStreamSlicer" - start_datetime: - datetime: "{{ config['start_date'] }}" +definitions: + requester: + <...> + stream_slicer: + type: "DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config['start_date'] }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" datetime_format: "%Y-%m-%d" - end_datetime: - datetime: "{{ now_local() }}" - datetime_format: "%Y-%m-%d %H:%M:%S.%f" - step: "1d" - datetime_format: "%Y-%m-%d" - cursor_field: "{{ options['stream_cursor_field'] }}" + cursor_field: "{{ options['stream_cursor_field'] }}" + retriever: + <...> ``` and refer to it in the stream's retriever. 
@@ -114,107 +118,106 @@ The start time is defined in the config file, while the end time is defined by t Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: ``` -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - stream_cursor_field: "date" +streams: + - type: DeclarativeStream + $options: + name: "rates" + stream_cursor_field: "date" + primary_key: "rates" + <...> ``` We'll also update the retriever to user the stream slicer: ``` -retriever: - type: SimpleRetriever - $options: - url_base: "https://api.apilayer.com" # Only change the url_base field - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - stream_slicer: - $ref: "*ref(stream_slicer)" +definitions: + <...> + retriever: + type: SimpleRetriever + <...> + stream_slicer: + $ref: "*ref(definitions.stream_slicer)" ``` Finally, we'll update the path to point to the `stream_slice`'s start_date ``` -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - stream_cursor_field: "date" - primary_key: "date" - schema_loader: - $ref: "*ref(schema_loader)" - retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" +streams: + - type: DeclarativeStream + <...> + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/{{stream_slice['start_time'] or 'latest'}}" ``` The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: ``` -schema_loader: - type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" -selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" -requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator 
- header: "apikey" - api_token: "{{ config['access_key'] }}" - request_options_provider: - request_parameters: - base: "{{ config.base }}" -stream_slicer: - type: "DatetimeStreamSlicer" - start_datetime: - datetime: "{{ config['start_date'] }}" - datetime_format: "%Y-%m-%d" - end_datetime: - datetime: "{{ now_local() }}" - datetime_format: "%Y-%m-%d %H:%M:%S.%f" - step: "1d" - datetime_format: "%Y-%m-%d" - cursor_field: "{{ options['stream_cursor_field'] }}" -retriever: - type: SimpleRetriever - $options: - url_base: "https://api.apilayer.com" - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - stream_slicer: - $ref: "*ref(stream_slicer)" - record_selector: - $ref: "*ref(selector)" - paginator: - type: NoPagination -rates_stream: - type: DeclarativeStream - $options: - name: "rates" - stream_cursor_field: "date" - primary_key: "date" +version: "0.1.0" + +definitions: schema_loader: - $ref: "*ref(schema_loader)" + type: JsonSchema + file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" + selector: + type: RecordSelector + extractor: + type: JelloExtractor + transform: "[_]" + requester: + type: HttpRequester + name: "{{ options['name'] }}" + http_method: "GET" + authenticator: + type: ApiKeyAuthenticator + header: "apikey" + api_token: "{{ config['access_key'] }}" + request_options_provider: + request_parameters: + base: "{{ config['base'] }}" + stream_slicer: + type: "DatetimeStreamSlicer" + start_datetime: + datetime: "{{ config['start_date'] }}" + datetime_format: "%Y-%m-%d" + end_datetime: + datetime: "{{ now_local() }}" + datetime_format: "%Y-%m-%d %H:%M:%S.%f" + step: "1d" + datetime_format: "%Y-%m-%d" + cursor_field: "{{ options['stream_cursor_field'] }}" retriever: - $ref: "*ref(retriever)" - requester: - $ref: "*ref(requester)" - path: "/exchangerates_data/{{ stream_slice['start_time'] or 'latest' }}" + type: SimpleRetriever + $options: + url_base: "https://api.apilayer.com" + name: "{{ options['name'] 
}}" + primary_key: "{{ options['primary_key'] }}" + record_selector: + $ref: "*ref(definitions.selector)" + paginator: + type: NoPagination + stream_slicer: + $ref: "*ref(definitions.stream_slicer)" + streams: - - "*ref(rates_stream)" + - type: DeclarativeStream + $options: + name: "rates" + stream_cursor_field: "date" + primary_key: "rates" + schema_loader: + $ref: "*ref(definitions.schema_loader)" + retriever: + $ref: "*ref(definitions.retriever)" + requester: + $ref: "*ref(definitions.requester)" + path: "/exchangerates_data/{{stream_slice['start_time'] or 'latest'}}" check: type: CheckStream stream_names: ["rates"] + ``` Running the `read` operation will now read all data for all days between start_date and now: From 3b1e85f0452fc6aa602f000909570dc97967edec Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 15:28:54 -0700 Subject: [PATCH 63/92] small updates --- .../config-based/tutorial/0-getting-started.md | 2 +- .../tutorial/3-connecting-to-the-API-source.md | 3 +-- .../config-based/tutorial/4-reading-data.md | 2 +- .../config-based/tutorial/5-incremental-reads.md | 8 ++++---- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index 0decb28b52c9..e44a1a65da85 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -14,7 +14,7 @@ In this tutorial, we will read data from the following endpoints: - `Latest Rates Endpoint` - `Historical Rates Endpoint` -With the end goal of implementing a Source with a single `Stream` containing exchange rates going from a base currency to many other currencies. +With the end goal of implementing a `Source` with a single `Stream` containing exchange rates going from a base currency to many other currencies. 
The output schema of our stream will look like the following: ```json diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index ce6aaf64b29b..8352bef2fb20 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -7,14 +7,13 @@ Over the course of this tutorial, we'll be editing a few files that were generat - `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml`: This is the [spec file](../../connector-specification-reference.md). It describes the inputs used to configure the connector. - `source-exchange-rates-tutorial/source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: This is the connector definition. It describes how the data should be read from the API source. - `source-exchange_rates-tutorial/integration_tests/configured_catalog.json`: This is the connector's [catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md). It describes what data is available in a source -- `source-exchange-rates-tutorial/integration_tests/sample_state.json`: Sample state object to be used to test [incremental syncs](../../cdk-python/incremental-stream.md). +- `source-exchange-rates-tutorial/integration_tests/sample_state.json`: This is a sample state object to be used to test [incremental syncs](../../cdk-python/incremental-stream.md). We'll also be creating the following files: - `source-exchange-rates-tutorial/secrets/config.json`: This is the configuration file we'll be using to test the connector. It's schema should match the schema defined in the spec file. - `source-exchange-rates-tutorial/secrets/invalid_config.json`: This is an invalid configuration file we'll be using to test the connector. It's schema should match the schema defined in the spec file. 
- `source_exchange_rates_tutorial/schemas/rates.json`: This is the [schema definition](../../cdk-python/schemas.md) for the stream we'll implement. -- `source-exchange-rates/acceptance-test-config.yml`: This is the [acceptance test configuration file](../../testing-connectors/README.md). It describes the integration tests to be used to verify that the connector works as expected. ## Updating the connector spec and config diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 5546af426e90..848284ddfc2c 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -21,7 +21,7 @@ Let's first add the stream to the configured catalog in `source-exchange_rates-t } ``` -The configured catalog declares the sync modes supported by the stream \(full refresh or incremental\). +The configured catalog declares the sync modes supported by the stream (full refresh or incremental). See the [catalog guide](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) for more information. Let's define the stream schema in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/schemas/rates.json` diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index f029213b630d..23f979f2da33 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -3,7 +3,7 @@ We now have a working implementation of a connector reading the latest exchange rates for a given currency. In this section, we'll update the source to read historical data instead of only reading the latest exchange rates. 
-According to the API documentation, we can read the exchange rate for a specific date by querying the "/{date}" endpoint instead of "/latest". +According to the API documentation, we can read the exchange rate for a specific date by querying the `"/exchangerates_data/{date}"` endpoint instead of `"/exchangerates_data/latest"`. We'll now add a `start_date` property to the connector. @@ -59,7 +59,7 @@ The file should look like where the start date should be 7 days in the past. And we'll update the `path` in the connector definition to point to `/{{ config.start_date }}`. -Note that we are setting a default value because the `check` operation does not know the `start_date`. We'll default to hitting `/latest`: +Note that we are setting a default value because the `check` operation does not know the `start_date`. We'll default to hitting `/exchangerates_data/latest`: ``` streams: @@ -115,7 +115,7 @@ and refer to it in the stream's retriever. This will generate slices from the start time until the end time, where each slice is exactly one day. The start time is defined in the config file, while the end time is defined by the `now_local()` macro, which will evaluate to the current date in the current timezone at runtime. See the section on [string interpolation](../yaml-structure.md#string-interpolation) for more details. 
-Note that we're also setting the `cursor_field` in the stream's `options` because it is used both by the `Stream` and the `StreamSlicer`: +Note that we're also setting the `stream_cursor_field` in the stream's `$options` so it can be accessed by the `StreamSlicer`: ``` streams: @@ -139,7 +139,7 @@ definitions: $ref: "*ref(definitions.stream_slicer)" ``` -Finally, we'll update the path to point to the `stream_slice`'s start_date +Finally, we'll update the path to point to the `stream_slice`'s start_time ``` streams: From b4498f3ccb6afd415b613686a4c77a04f4cdff91 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 15:37:04 -0700 Subject: [PATCH 64/92] update yaml structure --- docs/connector-development/config-based/yaml-structure.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/yaml-structure.md b/docs/connector-development/config-based/yaml-structure.md index c5d6fa758e19..5547c75ec56f 100644 --- a/docs/connector-development/config-based/yaml-structure.md +++ b/docs/connector-development/config-based/yaml-structure.md @@ -12,9 +12,11 @@ The configuration will be validated against this JSON Schema, which defines the The general structure of the YAML is as follows: ``` - +version: "0.1.0" +definitions: + streams: - + check: ``` From 306e9e5d795ddb670416664603bfefd0aafb322c Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 15:45:09 -0700 Subject: [PATCH 65/92] custom class example --- .../config-based/overview.md | 35 ++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 9ca2b91a4d26..9b99e579081b 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -104,4 +104,37 @@ More details on error handling can be found in the [error handling section](erro The 
`ConnectionChecker` defines how to test the connection to the integration. -The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. \ No newline at end of file +The only implementation as of now is `CheckStream`, which tries to read a record from a specified list of streams and fails if no records could be read. + +## Custom components + +Any built-in component can be replaced by a custom Python class. +To create a custom component, define a new class in a new file in the connector's module. +The class must implement the interface of the component it is replacing. For instance, a pagination strategy must implement `airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy.PaginationStrategy`. +The class must also be a dataclass where each field represents an argument to configure from the yaml file, and it must declare an `InitVar` named `options`. + +For example: + +``` +@dataclass +class MyPaginationStrategy(PaginationStrategy): + my_field: Union[InterpolatedString, str] + options: InitVar[Mapping[str, Any]] + + def __post_init__(self, options: Mapping[str, Any]): + pass + + def next_page_token(self, response: requests.Response, last_records: List[Mapping[str, Any]]) -> Optional[Any]: + pass + + def reset(self): + pass +``` + +This class can then be referenced from the yaml file using its fully qualified class name: + +```yaml +pagination_strategy: + class_name: "my_connector_module.MyPaginationStrategy" + my_field: "hello world" +``` \ No newline at end of file From c2d9b8646da59997739f31e33710bed8e0e170cf Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 15:51:18 -0700 Subject: [PATCH 66/92] language annotations --- .../config-based/authentication.md | 10 +-- .../config-based/error-handling.md | 72 +++++++++---------- .../config-based/pagination.md | 8 +-- .../config-based/record-selector.md | 42 +++++------
.../config-based/request-options.md | 8 +-- .../config-based/stream-slicers.md | 12 ++-- .../config-based/tutorial/1-create-source.md | 4 +- .../tutorial/2-install-dependencies.md | 2 +- .../3-connecting-to-the-API-source.md | 24 +++---- .../config-based/tutorial/4-reading-data.md | 18 ++--- .../tutorial/5-incremental-reads.md | 30 ++++---- .../config-based/tutorial/6-testing.md | 12 ++-- .../config-based/yaml-structure.md | 54 +++++++------- 13 files changed, 152 insertions(+), 144 deletions(-) diff --git a/docs/connector-development/config-based/authentication.md b/docs/connector-development/config-based/authentication.md index d78734f3fd42..f626234e6c00 100644 --- a/docs/connector-development/config-based/authentication.md +++ b/docs/connector-development/config-based/authentication.md @@ -9,7 +9,7 @@ The `Authenticator` defines how to configure outgoing HTTP requests to authentic The `ApiKeyAuthenticator` sets an HTTP header on outgoing requests. The following definition will set the header "Authorization" with a value "Bearer hello": -``` +```yaml authenticator: type: "ApiKeyAuthenticator" header: "Authorization" @@ -21,7 +21,7 @@ authenticator: The `BearerAuthenticator` is a specialized `ApiKeyAuthenticator` that always sets the header "Authorization" with the value "Bearer {token}". The following definition will set the header "Authorization" with a value "Bearer hello" -``` +```yaml authenticator: type: "BearerAuthenticator" token: "hello" @@ -34,7 +34,7 @@ More information on bearer authentication can be found [here](https://swagger.io The `BasicHttpAuthenticator` set the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme). 
The following definition will set the header "Authorization" with a value "Basic " -``` +```yaml authenticator: type: "BasicHttpAuthenticator" username: "hello" @@ -43,7 +43,7 @@ authenticator: The password is optional. Authenticating with APIs using Basic HTTP and a single API key can be done as: -``` +```yaml authenticator: type: "BasicHttpAuthenticator" username: "hello" @@ -63,7 +63,7 @@ OAuth authentication is supported through the `OAuthAuthenticator`, which requir - expires_in_name (Optional): The field to extract expires_in from in the response. Default: "expires_in" - refresh_request_body (Optional): The request body to send in the refresh request. Default: None -``` +```yaml authenticator: type: "OAuthAuthenticator" token_refresh_endpoint: "https://api.searchmetrics.com/v4/token" diff --git a/docs/connector-development/config-based/error-handling.md b/docs/connector-development/config-based/error-handling.md index 2979cb8e6241..cea3d4d3cdd2 100644 --- a/docs/connector-development/config-based/error-handling.md +++ b/docs/connector-development/config-based/error-handling.md @@ -10,37 +10,37 @@ Other behaviors can be configured through the `Requester`'s `error_handler` fiel Response filters can be used to define how to handle requests resulting in responses with a specific HTTP status code. For instance, this example will configure the handler to also retry responses with 404 error: -``` +```yaml requester: <...> error_handler: response_filters: - - http_codes: [404] - action: RETRY + - http_codes: [ 404 ] + action: RETRY ``` Response filters can be used to specify HTTP errors to ignore. For instance, this example will configure the handler to ignore responses with 404 error: -``` +```yaml requester: <...> error_handler: response_filters: - - http_codes: [404] - action: IGNORE + - http_codes: [ 404 ] + action: IGNORE ``` Errors can also be defined by parsing the error message. 
For instance, this error handler will ignores responses if the error message contains the string "ignorethisresponse" -``` +```yaml requester: <...> error_handler: response_filters: - - error_message_contain: "ignorethisresponse" - action: IGNORE + - error_message_contain: "ignorethisresponse" + action: IGNORE ``` This can also be done through a more generic string interpolation strategy with the following parameters: @@ -49,27 +49,27 @@ This can also be done through a more generic string interpolation strategy with This example ignores errors where the response contains a "code" field: -``` +```yaml requester: <...> error_handler: response_filters: - - predicate: "{{ 'code' in response }}" - action: IGNORE + - predicate: "{{ 'code' in response }}" + action: IGNORE ``` The error handler can have multiple response filters. The following example is configured to ignore 404 errors, and retry 429 errors: -``` +```yaml requester: <...> error_handler: response_filters: - - http_codes: [404] - action: IGNORE - - http_codes: [429] - action: RETRY + - http_codes: [ 404 ] + action: IGNORE + - http_codes: [ 429 ] + action: RETRY ``` ## Backoff Strategies @@ -89,27 +89,27 @@ When using the `ConstantBackoffStrategy`, the requester will backoff with a cons When using the `WaitTimeFromHeaderBackoffStrategy`, the requester will backoff by an interval specified in the response header. In this example, the requester will backoff by the response's "wait_time" header value: -``` +```yaml requester: <...> error_handler: <...> backoff_strategies: - - type: "WaitTimeFromHeaderBackoffStrategy" - header: "wait_time" + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" ``` Optionally, a regular expression can be configured to extract the wait time from the header value. 
-``` +```yaml requester: <...> error_handler: <...> backoff_strategies: - - type: "WaitTimeFromHeaderBackoffStrategy" - header: "wait_time" - regex: "[-+]?\d+" + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" + regex: '[-+]?\d+' ``` ### Wait until time defined in header @@ -117,16 +117,16 @@ requester: When using the `WaitUntilTimeFromHeaderBackoffStrategy`, the requester will backoff until the time specified in the response header. In this example, the requester will wait until the time specified in the "wait_until" header value: -``` +```yaml requester: <...> error_handler: <...> backoff_strategies: - - type: "WaitUntilTimeFromHeaderBackoffStrategy" - header: "wait_until" - regex: "[-+]?\d+" - min_wait: 5 + - type: "WaitUntilTimeFromHeaderBackoffStrategy" + header: "wait_until" + regex: '[-+]?\d+' + min_wait: 5 ``` The strategy accepts an optional regular expression to extract the time from the header value, and a minimum time to wait. @@ -136,24 +136,24 @@ The strategy accepts an optional regular expression to extract the time from the The error handler can have multiple backoff strategies, allowing it to fallback if a strategy cannot be evaluated. For instance, the following defines an error handler that will read the backoff time from a header, and default to a constant backoff if the wait time could not be extracted from the response: -``` +```yaml requester: <...> error_handler: <...> backoff_strategies: - - type: "WaitTimeFromHeaderBackoffStrategy" - header: "wait_time" - - type: "ConstantBackoffStrategy" - backoff_time_in_seconds: 5 - + - type: "WaitTimeFromHeaderBackoffStrategy" + header: "wait_time" + - type: "ConstantBackoffStrategy" + backoff_time_in_seconds: 5 + ``` The `requester` can be configured to use a `CompositeErrorHandler`, which sequentially iterates over a list of error handlers, enabling different retry mechanisms for different types of errors.
In this example, a constant backoff of 5 seconds, will be applied if the response contains a "code" field, and an exponential backoff will be applied if the error code is 403: -``` +```yaml requester: <...> error_handler: diff --git a/docs/connector-development/config-based/pagination.md b/docs/connector-development/config-based/pagination.md index d5966793a21c..1e7f18bbb072 100644 --- a/docs/connector-development/config-based/pagination.md +++ b/docs/connector-development/config-based/pagination.md @@ -27,7 +27,7 @@ When using the `PageIncrement` strategy, the page number will be set as part of The following paginator example will fetch 5 records per page, and specify the page number as a request_parameter: -``` +```yaml paginator: type: "LimitPaginator" page_size: 5 @@ -54,7 +54,7 @@ When using the `OffsetIncrement` strategy, the number of records read will be se The following paginator example will fetch 5 records per page, and specify the offset as a request_parameter: -``` +```yaml paginator: type: "LimitPaginator" page_size: 5 @@ -85,7 +85,7 @@ This cursor value can be used to request the next page of record. In this example, the next page of record is defined by setting the `from` request parameter to the id of the last record read: -``` +```yaml paginator: type: "LimitPaginator" <...> @@ -104,7 +104,7 @@ the next request will be sent as `https://cloud.airbyte.com/api/get_data?from=10 Some APIs directly point to the URL of the next page to fetch. 
In this example, the URL of the next page is extracted from the response headers: -``` +```yaml paginator: type: "LimitPaginator" <...> diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 45b75d1483cb..0d21005f35b3 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -34,7 +34,7 @@ The expression in the filter will be evaluated to a boolean returning true the r In this example, all records with a `created_at` field greater than the stream slice's `start_time` will be filtered out: -``` +```yaml selector: extractor: transform: "[_]" @@ -51,26 +51,26 @@ Fields can be added or removed from records by adding `Transformation`s to a str Fields can be added with the `AddFields` transformation. This example adds a top-level field "field1" with a value "static_value" -``` +```yaml stream: <...> transformations: - - type: AddFields - fields: - - path: ["field1"] - value: "static_value" + - type: AddFields + fields: + - path: [ "field1" ] + value: "static_value" ``` This example adds a top-level field "start_date", whose value is evaluated from the stream slice: -``` +```yaml stream: <...> transformations: - - type: AddFields - fields: - - path: ["start_date"] - value: {{ stream_slice['start_date'] }} + - type: AddFields + fields: + - path: [ "start_date" ] + value: "{{ stream_slice['start_date'] }}" ``` Fields can also be added in a nested object by writing the fields' path as a list.
@@ -89,14 +89,14 @@ Given a record of the following shape: this definition will add a field in the "data" nested object: -``` +```yaml stream: <...> transformations: - - type: AddFields - fields: - - path: ["data", "field1"] - value: "static_value" + - type: AddFields + fields: + - path: [ "data", "field1" ] + value: "static_value" ``` resulting in the following record: @@ -135,14 +135,14 @@ Given a record of the following shape: this definition will remove the 2 instances of "data_to_remove" which are found in "path2" and "path.to.field1": -``` +```yaml the_stream: <...> transformations: - - type: RemoveFields - field_pointers: - - ["path", "to", "field1"] - - ["path2"] + - type: RemoveFields + field_pointers: + - [ "path", "to", "field1" ] + - [ "path2" ] ``` resulting in the following record: diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md index 8877340a2c6e..f5f75a38549e 100644 --- a/docs/connector-development/config-based/request-options.md +++ b/docs/connector-development/config-based/request-options.md @@ -7,7 +7,7 @@ There are a few ways to set request parameters, headers, and body on ongoing HTT The primary way to set request options is through the `Requester`'s `RequestOptionsProvider`. The options can be configured as key value pairs: -``` +```yaml requester: type: HttpRequester name: "{{ options['name'] }}" @@ -24,7 +24,7 @@ requester: It is also possible to configure add a json-encoded body to outgoing requests. 
-``` +```yaml requester: type: HttpRequester name: "{{ options['name'] }}" @@ -55,7 +55,7 @@ The respective values can be set on the outgoing HTTP requests by specifying whe The following example will set the "page" request parameter value to the page to fetch, and the "page_size" request parameter to 5: -``` +```yaml paginator: type: "LimitPaginator" page_size: 5 @@ -78,7 +78,7 @@ The respective values can be set on the outgoing HTTP requests by specifying whe The following example will set the "created[gte]" request parameter value to the start of the time window, and "created[lte]" to the end of the time window. -``` +```yaml stream_slicer: start_datetime: "2021-02-01T00:00:00.000000+0000", end_datetime: "2021-03-01T00:00:00.000000+0000", diff --git a/docs/connector-development/config-based/stream-slicers.md b/docs/connector-development/config-based/stream-slicers.md index df98c294e030..96f0a5bd0a7e 100644 --- a/docs/connector-development/config-based/stream-slicers.md +++ b/docs/connector-development/config-based/stream-slicers.md @@ -23,7 +23,7 @@ This is done by slicing the stream on the records' cursor value, defined by the Given a start time, an end time, and a step function, it will partition the interval [start, end] into small windows of the size described by the step. For instance, -``` +```yaml stream_slicer: start_datetime: "2021-02-01T00:00:00.000000+0000", end_datetime: "2021-03-01T00:00:00.000000+0000", @@ -34,7 +34,7 @@ will create one slice per day for the interval `2021-02-01` - `2021-03-01`. The `DatetimeStreamSlicer` also supports an optional lookback window, specifying how many days before the start_datetime to read data for. 
-``` +```yaml stream_slicer: start_datetime: "2021-02-01T00:00:00.000000+0000", end_datetime: "2021-03-01T00:00:00.000000+0000", @@ -66,7 +66,7 @@ When reading data from the source, the cursor value will be updated to the max d If an API supports filtering data based on the cursor field, the `start_time_option` and `end_time_option` parameters can be used to configure this filtering. For instance, if the API supports filtering using the request parameters `created[gte]` and `created[lte]`, then the stream slicer can specify the request parameters as -``` +```yaml stream_slicer: type: "DatetimeStreamSlicer" <...> @@ -89,7 +89,7 @@ It is defined by As an example, this stream slicer will iterate over the 2 repositories ("airbyte" and "airbyte-secret") and will set a request_parameter on outgoing HTTP requests. -``` +```yaml stream_slicer: type: "ListStreamSlicer" slice_values: @@ -135,7 +135,7 @@ For each stream, the slicer needs to know Assuming the commits for a given repository can be read by specifying the repository as a request_parameter, this could be defined as -``` +```yaml stream_slicer: type: "SubstreamSlicer" parent_streams_configs: @@ -150,7 +150,7 @@ stream_slicer: REST APIs often nest sub-resources in the URL path. 
If the URL to fetch commits was "/repositories/:id/commits", then the `Requester`'s path would need to refer to the stream slice's value and no `request_option` would be set: -``` +```yaml retriever: <...> requester: diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index e2fac304f0cf..c33ae9ef5802 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -2,14 +2,14 @@ Let's start by cloning the Airbyte repository: -``` +```bash git clone git@github.com:airbytehq/airbyte.git cd airbyte ``` Airbyte provides a code generator which bootstraps the scaffolding for our connector. -``` +```bash cd airbyte-integrations/connector-templates/generator ./generate.sh ``` diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index 11691bb08bc5..06bc75dc38d6 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -7,7 +7,7 @@ The command below assume that `python` points to a version of python >=3.9.0. If this is the case on your machine, substitute the `python` commands with `python3`. The subsequent `python` invocations will use the virtual environment created for the connector. 
-``` +```bash cd ../../connectors/source-exchange-rates-tutorial python -m venv .venv source .venv/bin/activate diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 8352bef2fb20..cda6b321e226 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -21,7 +21,7 @@ Let's populate the specification (`spec.yaml`) the configuration (`secrets/confi 1. We'll add these properties to the connector spec in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/spec.yaml` -``` +```yaml documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi connectionSpecification: $schema: http://json-schema.org/draft-07/schema# @@ -52,7 +52,7 @@ connectionSpecification: 2. We also need to fill in the connection config in the `secrets/config.json` Because of the sensitive nature of the access key, we recommend storing this config in the `secrets` directory because it is ignored by git. -``` +```bash echo '{"access_key": "", "base": "USD"}' > secrets/config.json ``` @@ -65,7 +65,7 @@ Let's fill this out these TODOs with the information found in the [Exchange Rate 1. First, let's rename the stream from `customers` to `rates`, and update the primary key to `date` -``` +```yaml streams: - type: DeclarativeStream $options: @@ -75,10 +75,10 @@ streams: and update the references in the `check` block -``` +```yaml check: type: CheckStream - stream_names: ["rates"] + stream_names: [ "rates" ] ``` Adding the reference in the `check` tells the `check` operation to use that stream to test the connection. @@ -86,7 +86,7 @@ Adding the reference in the `check` tells the `check` operation to use that stre 2. Next we'll set the base url. 
According to the API documentation, the base url is `"https://api.exchangeratesapi.io/v1/"`. -``` +```yaml definitions: <...> retriever: @@ -97,7 +97,7 @@ definitions: 3. We can fetch the latest data by submitting a request to the `/latest` API endpoint. This path is specific to the stream, so we'll set it within the `rates_stream` definition, at the `retriever` level. -``` +```yaml streams: - type: DeclarativeStream $options: @@ -116,7 +116,7 @@ streams: The Exchange Rates API requires an access key to be passed as header named "apikey". This can be done using an `ApiKeyAuthenticator`, which we'll configure to point to the config's `access_key` field. -``` +```yaml definitions: <...> requester: @@ -131,7 +131,7 @@ definitions: 5. According to the ExchangeRatesApi documentation, we can specify the base currency of interest in a request parameter. Let's assume the user will configure this via the connector configuration in parameter called `base`; we'll pass the value input by the user as a request parameter: -``` +```yaml definitions: <...> requester: @@ -143,7 +143,7 @@ definitions: The full connection definition should now look like -``` +```yaml version: "0.1.0" definitions: @@ -191,12 +191,12 @@ streams: path: "/exchangerates_data/latest" check: type: CheckStream - stream_names: ["rates"] + stream_names: [ "rates" ] ``` We can now run the `check` operation, which verifies the connector can connect to the API source. -``` +```bash python main.py check --config secrets/config.json ``` diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 848284ddfc2c..b9bab51fb58b 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -3,7 +3,7 @@ Now that we're able to authenticate to the source API, we'll want to select data from the HTTP responses. 
Let's first add the stream to the configured catalog in `source-exchange_rates-tutorial/integration_tests/configured_catalog.json` -``` +```json { "streams": [ { @@ -28,20 +28,20 @@ Let's define the stream schema in `source-exchange-rates-tutorial/source_exchang You can download the JSON file describing the output schema with all currencies [here](https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`. -``` +```bash curl https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json > source_exchange_rates_tutorial/schemas/rates.json ``` We can also delete the boilerplate schema files -``` +```bash rm source_exchange_rates_tutorial/schemas/customers.json rm source_exchange_rates_tutorial/schemas/employees.json ``` Next, we'll update the record selection to wrap the single record returned by the source in an array in `source_exchange_rates_tutorial/exchange_rates_tutorial.yamlsource_exchange_rates_tutorial/exchange_rates_tutorial.yaml` -``` +```yaml definitions: <...> selector: @@ -55,7 +55,7 @@ The transform is defined using the `Jello` syntax, which is a Python-based JQ al Here is the complete connector definition for convenience: -``` +```yaml version: "0.1.0" definitions: @@ -103,12 +103,12 @@ streams: path: "/exchangerates_data/latest" check: type: CheckStream - stream_names: ["rates"] + stream_names: [ "rates" ] ``` Reading from the source can be done by running the `read` operation -``` +```bash python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json ``` @@ -121,7 +121,9 @@ The logs should show that 1 record was read from the stream. 
The `--debug` flag can be set to print out debug information, including the outgoing request and its associated response -```python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --debug``` +```bash +python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --debug +``` ## Next steps diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index 23f979f2da33..f6d18d7fbffc 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -9,7 +9,7 @@ We'll now add a `start_date` property to the connector. First we'll update the spec `source_exchange_rates_tutorial/spec.yaml` -``` +```yaml documentationUrl: https://docs.airbyte.io/integrations/sources/exchangeratesapi connectionSpecification: $schema: http://json-schema.org/draft-07/schema# @@ -48,7 +48,7 @@ Then we'll set the `start_date` to last week in our connection config in `secret Let's add a start_date field to `secrets/config.json`. The file should look like -``` +```json { "access_key": "", "start_date": "2022-07-26", @@ -61,7 +61,7 @@ where the start date should be 7 days in the past. And we'll update the `path` in the connector definition to point to `/{{ config.start_date }}`. Note that we are setting a default value because the `check` operation does not know the `start_date`. 
We'll default to hitting `/exchangerates_data/latest`: -``` +```yaml streams: - type: DeclarativeStream $options: @@ -78,7 +78,9 @@ streams: You can test these changes by executing the `read` operation: -```python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json``` +```bash +python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +``` By reading the output record, you should see that we read historical data instead of the latest exchange rate. For example: @@ -92,7 +94,7 @@ More details on the stream slicers can be found [here](./link-to-stream-slicers. Let's first define a stream slicer at the top level of the connector definition: -``` +```yaml definitions: requester: <...> @@ -117,7 +119,7 @@ The start time is defined in the config file, while the end time is defined by t Note that we're also setting the `stream_cursor_field` in the stream's `$options` so it can be accessed by the `StreamSlicer`: -``` +```yaml streams: - type: DeclarativeStream $options: @@ -129,7 +131,7 @@ streams: We'll also update the retriever to user the stream slicer: -``` +```yaml definitions: <...> retriever: @@ -141,7 +143,7 @@ definitions: Finally, we'll update the path to point to the `stream_slice`'s start_time -``` +```yaml streams: - type: DeclarativeStream <...> @@ -154,7 +156,7 @@ streams: The full connector definition should now look like `./source_exchange_rates_tutorial/exchange_rates_tutorial.yaml`: -``` +```yaml version: "0.1.0" definitions: @@ -216,13 +218,13 @@ streams: path: "/exchangerates_data/{{stream_slice['start_time'] or 'latest'}}" check: type: CheckStream - stream_names: ["rates"] + stream_names: [ "rates" ] ``` Running the `read` operation will now read all data for all days between start_date and now: -``` +```bash python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json ``` @@ -237,7 +239,7 @@ The operation should now output more 
than one record: Instead of always reading data for all dates, we would like the connector to only read data for dates we haven't read yet. This can be achieved by updating the catalog to run in incremental mode (`integration_tests/configured_catalog.json`): -``` +```json { "streams": [ { @@ -267,7 +269,7 @@ Where the date ("2022-07-15") should be replaced by today's date. We can simulate incremental syncs by creating a state file containing the last state produced by the `read` operation. `source-exchange-rates-tutorial/integration_tests/sample_state.json`: -``` +```json { "rates": { "date": "2022-07-15" @@ -277,7 +279,7 @@ We can simulate incremental syncs by creating a state file containing the last s Running the `read` operation will now only read data for dates later than the given state: -``` +```bash python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --state integration_tests/sample_state.json ``` diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 6c1ce22758d5..1c1c5cef8a16 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -9,13 +9,17 @@ Before running the tests, we'll create an invalid config to make sure the `check Update `integration_tests/invalid_config.json` with this content -``` -{"access_key": "", "start_date": "2022-07-21", "base": "USD"} +```json +{ + "access_key": "", + "start_date": "2022-07-21", + "base": "USD" +} ``` and `integration_tests/abnormal_state.json` with -``` +```json { "rates": { "date": "2999-12-31" @@ -26,7 +30,7 @@ and `integration_tests/abnormal_state.json` with You can run the acceptance tests with the following commands: -``` +```bash docker build . 
-t airbyte/source-exchange-rates-tutorial:dev python -m pytest integration_tests -p integration_tests.acceptance ``` diff --git a/docs/connector-development/config-based/yaml-structure.md b/docs/connector-development/config-based/yaml-structure.md index 5547c75ec56f..2ec08ca2f098 100644 --- a/docs/connector-development/config-based/yaml-structure.md +++ b/docs/connector-development/config-based/yaml-structure.md @@ -11,13 +11,13 @@ The configuration will be validated against this JSON Schema, which defines the The general structure of the YAML is as follows: -``` +```yaml version: "0.1.0" definitions: - + streams: -check: +check: ``` @@ -44,7 +44,7 @@ will result in If the component is a mapping with a "class_name" field, an object of type "class_name" will be instantiated by passing the mapping's other fields to the constructor -``` +```yaml my_component: class_name: "fully_qualified.class_name" a_parameter: 3 @@ -95,38 +95,38 @@ More details on object instantiation can be found [here](https://airbyte-cdk.rea Parameters can be passed down from a parent component to its subcomponents using the $options key. This can be used to avoid repetitions. -``` +```yaml outer: $options: MyKey: MyValue inner: - k2: v2 + k2: v2 ``` This the example above, if both outer and inner are types with a "MyKey" field, both of them will evaluate to "MyValue". These parameters can be overwritten by subcomponents as a form of specialization: -``` +```yaml outer: $options: MyKey: MyValue inner: - $options: - MyKey: YourValue - k2: v2 + $options: + MyKey: YourValue + k2: v2 ``` In this example, "outer.MyKey" will evaluate to "MyValue", and "inner.MyKey" will evaluate to "YourValue". 
The value can also be used for string interpolation: -``` +```yaml outer: $options: MyKey: MyValue inner: - k2: "MyKey is {{ options['MyKey'] }}" + k2: "MyKey is {{ options['MyKey'] }}" ``` In this example, outer.inner.k2 will evaluate to "MyKey is MyValue" @@ -138,21 +138,21 @@ The parser will dereference these values to produce a complete ConnectionDefinit References can be defined using a *ref() string. -``` +```yaml key: 1234 reference: "*ref(key)" ``` will produce the following definition: -``` +```yaml key: 1234 reference: 1234 ``` This also works with objects: -``` +```yaml key_value_pairs: k1: v1 k2: v2 @@ -161,7 +161,7 @@ same_key_value_pairs: "*ref(key_value_pairs)" will produce the following definition: -``` +```yaml key_value_pairs: k1: v1 k2: v2 @@ -172,7 +172,7 @@ same_key_value_pairs: The $ref keyword can be used to refer to an object and enhance it with addition key-value pairs -``` +```yaml key_value_pairs: k1: v1 k2: v2 @@ -183,7 +183,7 @@ same_key_value_pairs: will produce the following definition: -``` +```yaml key_value_pairs: k1: v1 k2: v2 @@ -197,34 +197,34 @@ References can also point to nested values. Nested references are ambiguous because one could define a key containing with `.` in this example, we want to refer to the limit key in the dict object: -``` +```yaml dict: - limit: 50 + limit: 50 limit_ref: "*ref(dict.limit)" ``` will produce the following definition: -``` +```yaml dict - limit: 50 +limit: 50 limit-ref: 50 ``` whereas here we want to access the `nested.path` value. -``` +```yaml nested: - path: "first one" + path: "first one" nested.path: "uh oh" value: "ref(nested.path) ``` will produce the following definition: -``` +```yaml nested: - path: "first one" + path: "first one" nested.path: "uh oh" value: "uh oh" ``` @@ -246,7 +246,7 @@ the "options" keyword [see ($options)](yaml-structure.md#object-instantiation) c For example, some_object.inner_object.key will evaluate to "Hello airbyte" at runtime. 
-``` +```yaml some_object: $options: name: "airbyte" From 562844b0e7c7614f3552e4b1df5972100c571fa1 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Wed, 10 Aug 2022 19:56:13 -0700 Subject: [PATCH 67/92] update warning --- docs/connector-development/config-based/overview.md | 2 +- .../config-based/tutorial/0-getting-started.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 9b99e579081b..37a175adba75 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -1,6 +1,6 @@ # Config-based connectors overview -:warning: This framework is in alpha stage. Support is not in production and is available only to selected users. +:warning: This framework is in alpha stage. Support is not in production and is available only to select users. :warning: The goal of this document is to give enough technical specifics to understand how config-based connectors work. When you're ready to start building a connector, you can start with [the tutorial](../../../config-based/tutorial/0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) diff --git a/docs/connector-development/config-based/tutorial/0-getting-started.md b/docs/connector-development/config-based/tutorial/0-getting-started.md index e44a1a65da85..5170b6ad5d26 100644 --- a/docs/connector-development/config-based/tutorial/0-getting-started.md +++ b/docs/connector-development/config-based/tutorial/0-getting-started.md @@ -1,6 +1,6 @@ # Getting Started -:warning: This framework is in alpha stage. Support is not in production and is available only to selected users. +:warning: This framework is in alpha stage. Support is not in production and is available only to select users. 
:warning: ## Summary From 08487f7dc14512308488ba4ada580eab4fc78b8a Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Thu, 11 Aug 2022 14:37:22 -0700 Subject: [PATCH 68/92] Update tutorial to use dpath extractor --- .../3-connecting-to-the-API-source.md | 4 +- .../config-based/tutorial/4-reading-data.md | 67 ------------------- .../tutorial/5-incremental-reads.md | 4 +- 3 files changed, 4 insertions(+), 71 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index cda6b321e226..04a669c658dc 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -153,8 +153,8 @@ definitions: selector: type: RecordSelector extractor: - type: JelloExtractor - transform: "_" + type: DpathExtractor + field_pointer: [] requester: type: HttpRequester name: "{{ options['name'] }}" diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index b9bab51fb58b..6520250ef0a4 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -39,73 +39,6 @@ rm source_exchange_rates_tutorial/schemas/customers.json rm source_exchange_rates_tutorial/schemas/employees.json ``` -Next, we'll update the record selection to wrap the single record returned by the source in an array in `source_exchange_rates_tutorial/exchange_rates_tutorial.yamlsource_exchange_rates_tutorial/exchange_rates_tutorial.yaml` - -```yaml -definitions: - <...> - selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" # wrap the single record returned by the API in an array -``` - -The transform is defined using the `Jello` syntax, which is a 
Python-based JQ alternative. More details on Jello can be found [here](https://github.com/kellyjonbrazil/jello). - -Here is the complete connector definition for convenience: - -```yaml -version: "0.1.0" - -definitions: - schema_loader: - type: JsonSchema - file_path: "./source_exchange_rates_tutorial/schemas/{{ options['name'] }}.json" - selector: - type: RecordSelector - extractor: - type: JelloExtractor - transform: "[_]" # wrap the single record returned by the API in an array - requester: - type: HttpRequester - name: "{{ options['name'] }}" - http_method: "GET" - authenticator: - type: ApiKeyAuthenticator - header: "apikey" - api_token: "{{ config['access_key'] }}" - request_options_provider: - request_parameters: - base: "{{ config['base'] }}" - retriever: - type: SimpleRetriever - $options: - url_base: "https://api.apilayer.com" - name: "{{ options['name'] }}" - primary_key: "{{ options['primary_key'] }}" - record_selector: - $ref: "*ref(definitions.selector)" - paginator: - type: NoPagination - -streams: - - type: DeclarativeStream - $options: - name: "rates" - primary_key: "rates" - schema_loader: - $ref: "*ref(definitions.schema_loader)" - retriever: - $ref: "*ref(definitions.retriever)" - requester: - $ref: "*ref(definitions.requester)" - path: "/exchangerates_data/latest" -check: - type: CheckStream - stream_names: [ "rates" ] -``` - Reading from the source can be done by running the `read` operation ```bash diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index f6d18d7fbffc..b886b37a658f 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -166,8 +166,8 @@ definitions: selector: type: RecordSelector extractor: - type: JelloExtractor - transform: "[_]" + type: DpathExtractor + transform: [ ] requester: type: HttpRequester name: "{{ 
options['name'] }}" From 30d25c0f4080d286001d77acf867f281f6dce26c Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Thu, 11 Aug 2022 14:50:52 -0700 Subject: [PATCH 69/92] Update record selector docs --- .../config-based/record-selector.md | 115 ++++++++++++++++-- 1 file changed, 105 insertions(+), 10 deletions(-) diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 0d21005f35b3..5dc281734220 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -2,17 +2,55 @@ The record selector is responsible for translating an HTTP response into a list of Airbyte records by extracting records from the response and optionally filtering and shaping records based on a heuristic. -The current record selector implementation uses Jello to select records from the json-decoded HTTP response. -The record selection uses Python syntax, where `_` means top of the object. See [common recipes](#common-recipes). -More information on Jello can be found at https://github.com/kellyjonbrazil/jello +The current record extraction implementation uses [dpath](https://pypi.org/project/dpath/) to select records from the json-decoded HTTP response. ## Common recipes: -1. Selecting the whole json object can be done with `_` -2. Wrapping the whole json object in an array can be done with `[_]` -3. Inner fields can be selected by referring to it with the dot-notation: `_.data` will return the data field +Here are some common patterns: -Given a json object of the form +### Selecting the whole response + +If the root of the response is an array containing the records, the records can be extracted using the following definition: + +```yaml +selector: + extractor: + field_pointer: [ ] +``` + +If the root of the response is a json object representing a single record, the record can be extracted and wrapped in an array. 
+ +For example, + +Given a response body of the form + +```json +{ + "id": 1 +} +``` + +and a selector + +```yaml +selector: + extractor: + field_pointer: [ ] +``` + +The selected records will be + +```json +[ + { + "id": 1 + } +] +``` + +### Selecting a field + +Given a response body of the form ``` { @@ -21,10 +59,67 @@ Given a json object of the form } ``` -and a selector `_.data`, will produce the following: +and a selector +```yaml +selector: + extractor: + field_pointer: [ "data" ] +``` + +The selected records will be + +```json +[ + { + "id": 0 + }, + { + "id": 1 + } +] ``` -[{"id": 0}, {"id": 1}] + +### Selecting an inner field + +Given a response body of the form + +```json +{ + "data": { + "records": [ + { + "id": 1 + }, + { + "id": 2 + } + ] + } +} +``` + +and a selector + +```yaml +selector: + extractor: + field_pointer: + - "data" + - "records" +``` + +The selected records will be + +```json +[ + { + "id": 1 + }, + { + "id": 2 + } +] ``` ## Filtering records @@ -37,7 +132,7 @@ In this example, all records with a `created_at` field greater than the stream s ```yaml selector: extractor: - transform: "[_]" + field_pointer: [ ] record_filter: condition: "{{ record['created_at'] < stream_slice['start_time'] }}" ``` From 63a295c56d4fb254481815dabc9652f6bd9bd45d Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Thu, 11 Aug 2022 14:59:21 -0700 Subject: [PATCH 70/92] unit test --- .../sources/declarative/extractors/test_dpath_extractor.py | 1 + 1 file changed, 1 insertion(+) diff --git a/airbyte-cdk/python/unit_tests/sources/declarative/extractors/test_dpath_extractor.py b/airbyte-cdk/python/unit_tests/sources/declarative/extractors/test_dpath_extractor.py index ca94f57a41bd..5b15fb5fbb6a 100644 --- a/airbyte-cdk/python/unit_tests/sources/declarative/extractors/test_dpath_extractor.py +++ b/airbyte-cdk/python/unit_tests/sources/declarative/extractors/test_dpath_extractor.py @@ -20,6 +20,7 @@ [ ("test_extract_from_array", ["data"], {"data": [{"id": 1}, 
{"id": 2}]}, [{"id": 1}, {"id": 2}]), ("test_extract_single_record", ["data"], {"data": {"id": 1}}, [{"id": 1}]), + ("test_extract_single_record_from_root", [], {"id": 1}, [{"id": 1}]), ("test_extract_from_root_array", [], [{"id": 1}, {"id": 2}], [{"id": 1}, {"id": 2}]), ("test_nested_field", ["data", "records"], {"data": {"records": [{"id": 1}, {"id": 2}]}}, [{"id": 1}, {"id": 2}]), ("test_field_in_config", ["{{ config['field'] }}"], {"record_array": [{"id": 1}, {"id": 2}]}, [{"id": 1}, {"id": 2}]), From 019cc0a535b0d0b8216035190e0ed0c49de9fd89 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:01:06 -0700 Subject: [PATCH 71/92] link to contributing --- .../connector-development/config-based/tutorial/6-testing.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 1c1c5cef8a16..eb9aa8591dbe 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -39,8 +39,11 @@ python -m pytest integration_tests -p integration_tests.acceptance Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte). +See your [Contributiong guide]() on how to get started releasing your connector. 
+ ## Read more: - [Error handling](../error-handling.md) - [Pagination](../pagination.md) -- [Testing connectors](../../testing-connectors/README.md) \ No newline at end of file +- [Testing connectors](../../testing-connectors/README.md) +- [Contribution guide](../../../contributing-to-airbyte/README.md) \ No newline at end of file From 117ee2f802e714b5104cd133845bca5ca5ca5262 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:03:48 -0700 Subject: [PATCH 72/92] tiny update --- docs/connector-development/config-based/yaml-structure.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/yaml-structure.md b/docs/connector-development/config-based/yaml-structure.md index 2ec08ca2f098..d81da3a3c0d9 100644 --- a/docs/connector-development/config-based/yaml-structure.md +++ b/docs/connector-development/config-based/yaml-structure.md @@ -2,10 +2,11 @@ Connectors are defined as a yaml configuration describing the connector's Source. -2 top-level fields are required: +3 top-level fields are required: -1. `streams`: list of streams that are part of the source -2. `check`: component describing how to check the connection. +1. `streams`: List of streams that are part of the source +2. `check`: Component describing how to check the connection. +3. `version`: The framework version. The configuration will be validated against this JSON Schema, which defines the set of valid properties. 
From 3a00dac75dac8b99e5c6b4b4366b38f0d163f21a Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:11:18 -0700 Subject: [PATCH 73/92] $ in front of commands --- .../config-based/tutorial/1-create-source.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index c33ae9ef5802..6749f7f29b93 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -3,8 +3,8 @@ Let's start by cloning the Airbyte repository: ```bash -git clone git@github.com:airbytehq/airbyte.git -cd airbyte +$ git clone git@github.com:airbytehq/airbyte.git +$ cd airbyte ``` Airbyte provides a code generator which bootstraps the scaffolding for our connector. From b2040fc50c85b31a3512d403ade19f68f663bbf4 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:13:19 -0700 Subject: [PATCH 74/92] $ in front of commands --- .../config-based/tutorial/1-create-source.md | 4 ++-- .../config-based/tutorial/2-install-dependencies.md | 10 +++++----- .../tutorial/3-connecting-to-the-API-source.md | 6 +++--- .../config-based/tutorial/4-reading-data.md | 10 +++++----- .../config-based/tutorial/5-incremental-reads.md | 6 +++--- .../config-based/tutorial/6-testing.md | 4 ++-- 6 files changed, 20 insertions(+), 20 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index 6749f7f29b93..bc3c10a87993 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -10,8 +10,8 @@ $ cd airbyte Airbyte provides a code generator which bootstraps the scaffolding for our connector. 
```bash -cd airbyte-integrations/connector-templates/generator -./generate.sh +$ cd airbyte-integrations/connector-templates/generator +$ ./generate.sh ``` This will bring up an interactive helper application. Use the arrow keys to pick a template from the list. Select the `Configuration Based Source` template and then input the name of your connector. The application will create a new directory in `airbyte/airbyte-integrations/connectors/` with the name of your new connector. diff --git a/docs/connector-development/config-based/tutorial/2-install-dependencies.md b/docs/connector-development/config-based/tutorial/2-install-dependencies.md index 06bc75dc38d6..b165557c62d2 100644 --- a/docs/connector-development/config-based/tutorial/2-install-dependencies.md +++ b/docs/connector-development/config-based/tutorial/2-install-dependencies.md @@ -8,10 +8,10 @@ If this is the case on your machine, substitute the `python` commands with `pyth The subsequent `python` invocations will use the virtual environment created for the connector. ```bash -cd ../../connectors/source-exchange-rates-tutorial -python -m venv .venv -source .venv/bin/activate -pip install -r requirements.txt +$ cd ../../connectors/source-exchange-rates-tutorial +$ python -m venv .venv +$ source .venv/bin/activate +$ pip install -r requirements.txt ``` These steps create an initial python environment, and install the dependencies required to run an API Source connector. 
@@ -19,7 +19,7 @@ These steps create an initial python environment, and install the dependencies r Let's verify everything works as expected by running the Airbyte `spec` operation: ```bash -python main.py spec +$ python main.py spec ``` You should see an output similar to the one below: diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 04a669c658dc..59c22734a0c9 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -53,7 +53,7 @@ connectionSpecification: Because of the sensitive nature of the access key, we recommend storing this config in the `secrets` directory because it is ignored by git. ```bash -echo '{"access_key": "", "base": "USD"}' > secrets/config.json +$ echo '{"access_key": "", "base": "USD"}' > secrets/config.json ``` ## Updating the connector definition @@ -154,7 +154,7 @@ definitions: type: RecordSelector extractor: type: DpathExtractor - field_pointer: [] + field_pointer: [ ] requester: type: HttpRequester name: "{{ options['name'] }}" @@ -197,7 +197,7 @@ check: We can now run the `check` operation, which verifies the connector can connect to the API source. 
```bash -python main.py check --config secrets/config.json +$ python main.py check --config secrets/config.json ``` which should now succeed with logs similar to: diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index 6520250ef0a4..f1714bcb0c4b 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -29,20 +29,20 @@ Let's define the stream schema in `source-exchange-rates-tutorial/source_exchang You can download the JSON file describing the output schema with all currencies [here](https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json) for convenience and place it in `schemas/`. ```bash -curl https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json > source_exchange_rates_tutorial/schemas/rates.json +$ curl https://raw.githubusercontent.com/airbytehq/airbyte/master/airbyte-cdk/python/docs/tutorials/http_api_source_assets/exchange_rates.json > source_exchange_rates_tutorial/schemas/rates.json ``` We can also delete the boilerplate schema files ```bash -rm source_exchange_rates_tutorial/schemas/customers.json -rm source_exchange_rates_tutorial/schemas/employees.json +$ rm source_exchange_rates_tutorial/schemas/customers.json +$ rm source_exchange_rates_tutorial/schemas/employees.json ``` Reading from the source can be done by running the `read` operation ```bash -python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +$ python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json ``` The logs should show that 1 record was read from the stream. @@ -55,7 +55,7 @@ The logs should show that 1 record was read from the stream. 
The `--debug` flag can be set to print out debug information, including the outgoing request and its associated response ```bash -python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --debug +$ python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --debug ``` ## Next steps diff --git a/docs/connector-development/config-based/tutorial/5-incremental-reads.md b/docs/connector-development/config-based/tutorial/5-incremental-reads.md index b886b37a658f..70164c4dab94 100644 --- a/docs/connector-development/config-based/tutorial/5-incremental-reads.md +++ b/docs/connector-development/config-based/tutorial/5-incremental-reads.md @@ -79,7 +79,7 @@ streams: You can test these changes by executing the `read` operation: ```bash -python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +$ python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json ``` By reading the output record, you should see that we read historical data instead of the latest exchange rate. 
@@ -225,7 +225,7 @@ check: Running the `read` operation will now read all data for all days between start_date and now: ```bash -python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json +$ python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json ``` The operation should now output more than one record: @@ -280,7 +280,7 @@ We can simulate incremental syncs by creating a state file containing the last s Running the `read` operation will now only read data for dates later than the given state: ```bash -python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --state integration_tests/sample_state.json +$ python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json --state integration_tests/sample_state.json ``` There shouldn't be any data read if the state is today's date: diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index eb9aa8591dbe..4f25b55f1582 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -31,8 +31,8 @@ and `integration_tests/abnormal_state.json` with You can run the acceptance tests with the following commands: ```bash -docker build . -t airbyte/source-exchange-rates-tutorial:dev -python -m pytest integration_tests -p integration_tests.acceptance +$ docker build . 
-t airbyte/source-exchange-rates-tutorial:dev +$ python -m pytest integration_tests -p integration_tests.acceptance ``` ## Next steps: From db243a8c6d16de2af5fc3627eb0ba854ada282ff Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:29:34 -0700 Subject: [PATCH 75/92] More readings --- .../config-based/tutorial/1-create-source.md | 6 +++++- .../config-based/tutorial/3-connecting-to-the-API-source.md | 5 ++++- .../config-based/tutorial/4-reading-data.md | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/tutorial/1-create-source.md b/docs/connector-development/config-based/tutorial/1-create-source.md index bc3c10a87993..c2eebb5ace80 100644 --- a/docs/connector-development/config-based/tutorial/1-create-source.md +++ b/docs/connector-development/config-based/tutorial/1-create-source.md @@ -25,4 +25,8 @@ For this walkthrough, we'll refer to our source as `exchange-rates-tutorial`. ## Next steps -Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) \ No newline at end of file +Next, [we'll install dependencies required to run the connector](2-install-dependencies.md) + +## More readings + +- [Connector generator](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connector-templates/generator/README.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md index 59c22734a0c9..ca6abec250be 100644 --- a/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md +++ b/docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md @@ -216,4 +216,7 @@ Next, we'll [extract the records from the response](4-reading-data.md) - [Connector definition YAML file](../yaml-structure.md) - [Config-based connectors overview](../overview.md) - 
[Authentication](../authentication.md) -- [Request options providers](../request-options.md) \ No newline at end of file +- [Request options providers](../request-options.md) +- [Schema definition](../../cdk-python/schemas.md) +- [Connector specification reference](../../connector-specification-reference.md) +- [Beginner's guide to catalog](../../../understanding-airbyte/beginners-guide-to-catalog.md) \ No newline at end of file diff --git a/docs/connector-development/config-based/tutorial/4-reading-data.md b/docs/connector-development/config-based/tutorial/4-reading-data.md index f1714bcb0c4b..a59c7d997b1f 100644 --- a/docs/connector-development/config-based/tutorial/4-reading-data.md +++ b/docs/connector-development/config-based/tutorial/4-reading-data.md @@ -66,4 +66,5 @@ Next, we'll [enhance the connector to read data for a given date, which will ena ## More readings -- [Record selector](../record-selector.md) \ No newline at end of file +- [Record selector](../record-selector.md) +- [Catalog guide](https://docs.airbyte.io/understanding-airbyte/beginners-guide-to-catalog) \ No newline at end of file From cc0d76c8e0b604fdcfa17c4d2c08ce180bd99562 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:36:20 -0700 Subject: [PATCH 76/92] link to existing config-based connectors --- docs/connector-development/config-based/overview.md | 10 +++++++++- .../config-based/tutorial/6-testing.md | 5 ++++- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 37a175adba75..09ad7f62a69c 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -137,4 +137,12 @@ This class can then be referred from the yaml file using its fully qualified cla pagination_strategy: class_name: "my_connector_module.MyPaginationStrategy" my_field: "hello world" -``` \ No newline at end of file 
+``` + +## Sample connectors + +The following connectors can serve as examples of what production-ready config-based connectors look like + +- [Greenhouse](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-greenhouse) +- [Sendgrid](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-sendgrid) +- [Sentry](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-sentry) diff --git a/docs/connector-development/config-based/tutorial/6-testing.md b/docs/connector-development/config-based/tutorial/6-testing.md index 4f25b55f1582..90129993c5ac 100644 --- a/docs/connector-development/config-based/tutorial/6-testing.md +++ b/docs/connector-development/config-based/tutorial/6-testing.md @@ -46,4 +46,7 @@ See your [Contributiong guide]() on how to get started releasing your connector. - [Error handling](../error-handling.md) - [Pagination](../pagination.md) - [Testing connectors](../../testing-connectors/README.md) -- [Contribution guide](../../../contributing-to-airbyte/README.md) \ No newline at end of file +- [Contribution guide](../../../contributing-to-airbyte/README.md) +- [Greenhouse source](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-greenhouse) +- [Sendgrid source](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-sendgrid) +- [Sentry source](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/source-sentry) \ No newline at end of file From 6cbdaa06bc7c7a89b22838c324deaac9baa3e282 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:45:24 -0700 Subject: [PATCH 77/92] index --- .../config-based/index.md | 26 +++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 docs/connector-development/config-based/index.md diff --git a/docs/connector-development/config-based/index.md
b/docs/connector-development/config-based/index.md new file mode 100644 index 000000000000..02d4703ca11b --- /dev/null +++ b/docs/connector-development/config-based/index.md @@ -0,0 +1,26 @@ +# Index + +## From scratch + +- [Overview](overview.md) +- [YAML structure](yaml-structure.md) +- [Reference docs](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) + +## Concepts + +- [Authentication](authentication.md) +- [Error handling](error-handling.md) +- [Pagination](pagination.md) +- [Record selection](record-selector.md) +- [Request options](request-options.md) +- [Stream slicers](stream-slicers.md) + +## Tutorial + +0. [Getting started](tutorial/0-getting-started.md) +1. [Creating a source](tutorial/1-create-source.md) +2. [Installing dependencies](tutorial/2-install-dependencies.md) +3. [Connecting to the API](tutorial/3-connecting-to-the-API-source.md) +4. [Reading data](tutorial/4-reading-data.md) +5. [Incremental reads](tutorial/5-incremental-reads.md) +6. [Testing](tutorial/6-testing.md) \ No newline at end of file From 619bf37973df106cb30bf05edf621e80fee7ac3e Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 08:51:58 -0700 Subject: [PATCH 78/92] update --- docs/connector-development/config-based/record-selector.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 5dc281734220..03f12864cf37 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -20,9 +20,7 @@ selector: If the root of the response is a json object representing a single record, the record can be extracted and wrapped in an array.
-For example, - -Given a response body of the form +For example, given a response body of the form ```json { From 9a4f1c94db4b115badac7ddca849cc8009155f1b Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 10:27:00 -0700 Subject: [PATCH 79/92] delete broken link --- docs/connector-development/config-based/overview.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 09ad7f62a69c..0543f08f5819 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -4,7 +4,6 @@ The goal of this document is to give enough technical specifics to understand how config-based connectors work. When you're ready to start building a connector, you can start with [the tutorial](../../../config-based/tutorial/0-getting-started.md) or dive into the [reference documentation](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html) -See the [motivation section](./motivation.md) for more information on the motivation driving this framework. ## Overview From 5337868e22eb0b84f07608b96a2547cd2c4f9f90 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 11:28:13 -0700 Subject: [PATCH 80/92] supported features --- .../config-based/overview.md | 34 ++++++++++++------- 1 file changed, 21 insertions(+), 13 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 0543f08f5819..c555cbd5ac71 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -7,19 +7,7 @@ When you're ready to start building a connector, you can start with [the tutoria ## Overview -In building over 100 API connectors, we observed that most API connectors are implemented in a formulaic approach: - -1. 
Implement the API's authentication mechanism -2. Describe the schema of the data returned by the API -3. Make requests to the API's URL containing the data of interest -4. Implement pagination strategy -5. Implement error handling and rate limiting -6. Decode the data returned from the API -7. Keep track of what data was already synced. - -Each of these problems have a finite number of solutions. For instance, most APIs use one of 3 standard pagination mechanism. - -The CDK's config-based interface uses a declarative approach to the problem and allows the developers to specify __what__ data they want to read from a source and abstracts the specifics of the __how__. +The CDK's config-based interface uses a declarative approach to building source connectors for REST APIs. Config-based connectors work by parsing a YAML configuration describing the Source, then running the configured connector using a Python backend. @@ -27,6 +15,26 @@ The process then submits HTTP requests to the API endpoint, and extracts records See the [connector definition section](yaml-structure.md) for more information on the YAML file describing the connector. +## Supported features + +| Feature | Support | +|-----------------------|---------------------------------------------------| +| Transport protocol | HTTP | +| HTTP methods | GET, POST | +| Data format | Json | +| Resource type | Collections<br/>Sub-collection | +| Pagination | Page limit<br/>Offset<br/>Cursor | +| Authentication | Header based<br/>OAuth 2.0 | +| Sync mode | Full refresh<br/>Incremental | +| Schema discovery | Only static schemas | +| Record transformation | Adding fields<br/>Removing fields<br/> | +| Error detection | From HTTP status code<br/>From error message | +| Backoff | Exponential<br/>Constant<br/>Derived from headers | +| Filtering records | :heavy_check_mark: | +| Throttling | :x: | + +If a feature you require is not supported, you can [request the feature](../../contributing-to-airbyte/README.md#requesting-new-features) and use the [Python CDK](../cdk-python/README.md). 
+ ## Source Config-based connectors are a declarative way to define HTTP API sources. From e4919d50192acd9514361c1fa8f50018991fee6c Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 11:56:13 -0700 Subject: [PATCH 81/92] update --- .../config-based/overview.md | 29 +++++++++---------- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index c555cbd5ac71..e12368254e02 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -17,21 +17,20 @@ See the [connector definition section](yaml-structure.md) for more information o ## Supported features -| Feature | Support | -|-----------------------|---------------------------------------------------| -| Transport protocol | HTTP | -| HTTP methods | GET, POST | -| Data format | Json | -| Resource type | Collections<br/>Sub-collection | -| Pagination | Page limit<br/>Offset<br/>Cursor | -| Authentication | Header based<br/>OAuth 2.0 | -| Sync mode | Full refresh<br/>Incremental | -| Schema discovery | Only static schemas | -| Record transformation | Adding fields<br/>Removing fields<br/> | -| Error detection | From HTTP status code<br/>From error message | -| Backoff | Exponential<br/>Constant<br/>Derived from headers | -| Filtering records | :heavy_check_mark: | -| Throttling | :x: | +| Feature | Support | +|-----------------------|-------------------------------------------------------| +| Transport protocol | HTTP | +| HTTP methods | GET, POST | +| Data format | Json | +| Resource type | Collections<br/>Sub-collection | +| Pagination | Page limit<br/>Offset<br/>Cursor | +| Authentication | Header based<br/>OAuth 2.0 | +| Sync mode | Full refresh<br/>Incremental | +| Schema discovery | Only static schemas | +| Record transformation | Field selection<br/>Adding fields<br/>Removing fields<br/> | +| Error detection | From HTTP status code<br/>From error message | 
+| Backoff | Exponential<br/>Constant<br/>Derived from headers | +| Filtering records | :heavy_check_mark: | If a feature you require is not supported, you can [request the feature](../../contributing-to-airbyte/README.md#requesting-new-features) and use the [Python CDK](../cdk-python/README.md). From e27fade19c0bc05160336fa493cb0b7dd795c847 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 12:12:29 -0700 Subject: [PATCH 82/92] Add some links --- .../config-based/error-handling.md | 4 +++ .../config-based/overview.md | 28 +++++++++---------- 2 files changed, 18 insertions(+), 14 deletions(-) diff --git a/docs/connector-development/config-based/error-handling.md b/docs/connector-development/config-based/error-handling.md index cea3d4d3cdd2..921f39b872b5 100644 --- a/docs/connector-development/config-based/error-handling.md +++ b/docs/connector-development/config-based/error-handling.md @@ -7,6 +7,8 @@ Other behaviors can be configured through the `Requester`'s `error_handler` fiel ## Defining errors +### From status code + Response filters can be used to define how to handle requests resulting in responses with a specific HTTP status code. For instance, this example will configure the handler to also retry responses with 404 error: @@ -31,6 +33,8 @@ requester: action: IGNORE ``` +### From error message + Errors can also be defined by parsing the error message.
For instance, this error handler will ignores responses if the error message contains the string "ignorethisresponse" diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index e12368254e02..31d124a441e3 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -17,20 +17,20 @@ See the [connector definition section](yaml-structure.md) for more information o ## Supported features -| Feature | Support | -|-----------------------|-------------------------------------------------------| -| Transport protocol | HTTP | -| HTTP methods | GET, POST | -| Data format | Json | -| Resource type | Collections<br/>Sub-collection | -| Pagination | Page limit<br/>Offset<br/>Cursor | -| Authentication | Header based<br/>OAuth 2.0 | -| Sync mode | Full refresh<br/>Incremental | -| Schema discovery | Only static schemas | -| Record transformation | Field selection<br/>Adding fields<br/>Removing fields<br/> | -| Error detection | From HTTP status code<br/>From error message | -| Backoff | Exponential<br/>Constant<br/>Derived from headers | -| Filtering records | :heavy_check_mark: | +| Feature | Support | +|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Transport protocol | HTTP | +| HTTP methods | GET, POST | +| Data format | Json | +| Resource type | Collections<br/>Sub-collection | +| [Pagination](./pagination.md) | [Page limit](./pagination.md#page-increment)<br/>[Offset](./pagination.md#offset-increment)<br/>[Cursor](./pagination.md#cursor) | +| [Authentication](./authentication.md) | [Header based](./authentication.md#ApiKeyAuthenticator)<br/>[Bearer](./authentication.md#BearerAuthenticator)<br/>[Basic](./authentication.md#BasicHttpAuthenticator)<br/>[OAuth](./authentication.md#OAuth) | +| Sync mode | Full refresh<br/>Incremental | +| Schema discovery | Only static schemas | +| [Stream slicing](./stream-slicers.md) | [Datetime](./stream-slicers.md#Datetime), [lists](./stream-slicers.md#list-stream-slicer), [parent-resource id](./stream-slicers.md#Substream-slicer) | +| [Record transformation](./record-selector.md) | [Field selection](./record-selector.md#selecting-a-field)<br/>[Adding fields](./record-selector.md#adding-fields)<br/>[Removing fields](./record-selector.md#removing-fields)<br/>[Filtering records](./record-selector.md#filtering-records) | +| [Error detection](./error-handling.md) | [From HTTP status code](./error-handling.md#from-status-code)<br/>[From error message](./error-handling.md#from-error-message) | +| [Backoff strategies](./error-handling.md#Backoff-Strategies) | [Exponential](./error-handling.md#Exponential-backoff)<br/>[Constant](./error-handling.md#Constant-Backoff)<br/>[Derived from headers](./error-handling.md#Wait-time-defined-in-header) | If a feature you require is not supported, you can [request the feature](../../contributing-to-airbyte/README.md#requesting-new-features) and use the [Python CDK](../cdk-python/README.md). From 048bddbfd6c26ccdb15b016b7cd0af972aa9ab66 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:47:25 -0700 Subject: [PATCH 83/92] Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 31d124a441e3..88dc73ff191f 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -67,7 +67,7 @@ There is currently only one implementation, the `SimpleRetriever`, which is defi 1. Requester: Describes how to submit requests to the API source 2. Paginator: Describes how to navigate through the API's pages -3. Record selector: Describes how to select records from an HTTP response +3. Record selector: Describes how to extract records from an HTTP response 4. Stream Slicer: Describes how to partition the stream, enabling incremental syncs and checkpointing Each of those components (and their subcomponents) are defined by an explicit interface and one or many implementations.
From 019aad2605a9c829297aa491cc603fd568f6ade2 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:47:36 -0700 Subject: [PATCH 84/92] Update docs/connector-development/config-based/record-selector.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> --- docs/connector-development/config-based/record-selector.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/record-selector.md b/docs/connector-development/config-based/record-selector.md index 03f12864cf37..bcaee21e6cdb 100644 --- a/docs/connector-development/config-based/record-selector.md +++ b/docs/connector-development/config-based/record-selector.md @@ -163,7 +163,7 @@ stream: - type: AddFields fields: - path: [ "start_date" ] - value: { { stream_slice[ 'start_date' ] } } + value: {{ stream_slice[ 'start_date' ] }} ``` Fields can also be added in a nested object by writing the fields' path as a list. From cc308f2bfb768cd7de7b380ab92169d5860790ec Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:47:42 -0700 Subject: [PATCH 85/92] Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 88dc73ff191f..3d639f378b4d 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -117,7 +117,7 @@ The only implementation as of now is `CheckStream`, which tries to read a record Any builtin components can be overloaded by a custom Python class. To create a custom component, define a new class in a new file in the connector's module. The class must implement the interface of the component it is replacing. 
For instance, a pagination strategy must implement `airbyte_cdk.sources.declarative.requesters.paginators.strategies.pagination_strategy.PaginationStrategy`. -The class must also be a dataclass where each field represent an argument to configure from the yaml file, and an `InitVar` named options. +The class must also be a dataclass where each field represents an argument to configure from the yaml file, and an `InitVar` named options. For example: From e3cffc8a15fde16a58cd6a14903427177b7e3db0 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:47:59 -0700 Subject: [PATCH 86/92] Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 3d639f378b4d..5589891cf823 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -71,7 +71,7 @@ There is currently only one implementation, the `SimpleRetriever`, which is defi 4. Stream Slicer: Describes how to partition the stream, enabling incremental syncs and checkpointing Each of those components (and their subcomponents) are defined by an explicit interface and one or many implementations. -The developer can choose and configure the implementation they need depending on specifications of the integrations they are building against. +The developer can choose and configure the implementation they need depending on specifications of the integration they are building against. Since the `Retriever` is defined as part of the Stream configuration, different Streams for a given Source can use different `Retriever` definitions if needed. 
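The custom-component rule stated in patch 85 requires a dataclass whose fields are the YAML arguments, plus an `InitVar` named `options`. A minimal sketch of that rule follows, reusing the `MyPaginationStrategy`/`my_field` names from the overview's YAML snippet. It is not the CDK's code. The `PaginationStrategy` base class and its methods are left out so the example runs with the standard library alone, and keeping `options["name"]` is a hypothetical use of the options.

```python
from dataclasses import InitVar, dataclass, fields
from typing import Any, Mapping


@dataclass
class MyPaginationStrategy:
    # Hypothetical custom component matching the YAML snippet
    #   class_name: "my_connector_module.MyPaginationStrategy"
    #   my_field: "hello world"
    # A real implementation would also subclass the CDK's PaginationStrategy
    # interface; the base class is omitted to keep this sketch self-contained.

    # Each regular field is an argument that can be configured from the YAML file.
    my_field: str
    # `options` is an InitVar: it is passed to __post_init__ but never stored
    # as a dataclass field on the instance.
    options: InitVar[Mapping[str, Any]]

    def __post_init__(self, options: Mapping[str, Any]) -> None:
        # Hypothetical use of the propagated options: remember the stream name.
        self.stream_name = options.get("name")


strategy = MyPaginationStrategy(my_field="hello world", options={"name": "rates"})
print(strategy.my_field, strategy.stream_name)  # hello world rates
print([f.name for f in fields(strategy)])  # ['my_field']
```

Making `options` an `InitVar` keeps the propagated options out of the component's own fields (and out of its `repr`). The component can still read from them once at construction time.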
From 785a3e4cfd33a3c868ccec4886ac1d055838d1ee Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:48:44 -0700 Subject: [PATCH 87/92] Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index 5589891cf823..a2a0d1ad4893 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -94,7 +94,7 @@ More details on the paginator can be found in the [pagination section](paginatio ## Requester The `Requester` defines how to prepare HTTP requests to send to the source API. -There currently is only one implementation, the `HttpRequester`, which is defined by +There is currently only one implementation, the `HttpRequester`, which is defined by 1. A base url: The root of the API source 2. A path: The specific endpoint to fetch data from for a resource From 2db76945c4a2bb967629e26aec570aca89ba2e83 Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:50:02 -0700 Subject: [PATCH 88/92] mention the unit --- docs/connector-development/config-based/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/connector-development/config-based/overview.md b/docs/connector-development/config-based/overview.md index a2a0d1ad4893..0a9350d43c05 100644 --- a/docs/connector-development/config-based/overview.md +++ b/docs/connector-development/config-based/overview.md @@ -56,7 +56,7 @@ A stream is defined by: 4. [Data retriever](overview.md#data-retriever): Describes how to retrieve the data from the API 5. [Cursor field](../cdk-python/incremental-stream.md) (Optional): Field to use used as stream cursor. 
Can either be a string, or a list of strings if the cursor is a nested field. 6. [Transformations](./record-selector.md#transformations) (Optional): A set of transformations to be applied on the records read from the source before emitting them to the destination -7. [Checkpoint interval](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing) (Optional): Defines the interval at which incremental syncs should be checkpointed. +7. [Checkpoint interval](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol/#state--checkpointing) (Optional): Defines the interval, in number of records, at which incremental syncs should be checkpointed. More details on streams and sources can be found in the [basic concepts section](../cdk-python/basic-concepts.md). From 2a7d5fc1c598ed3e7b211392b867753fd814133f Mon Sep 17 00:00:00 2001 From: Alexandre Girard Date: Fri, 12 Aug 2022 14:51:31 -0700 Subject: [PATCH 89/92] headers --- docs/connector-development/config-based/pagination.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/connector-development/config-based/pagination.md b/docs/connector-development/config-based/pagination.md index 1e7f18bbb072..b305fddaa9cc 100644 --- a/docs/connector-development/config-based/pagination.md +++ b/docs/connector-development/config-based/pagination.md @@ -83,6 +83,8 @@ The `CursorPaginationStrategy` outputs a token by evaluating its `cursor_value` This cursor value can be used to request the next page of record. 
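This hunk's cursor flow can be simulated with the standard library: the `from` request parameter is set to the id of the last record read (e.g. `https://cloud.airbyte.com/api/get_data?from=1000`). This is a sketch, not the CDK's `CursorPaginationStrategy`. The in-memory `RECORDS`, `PAGE_SIZE`, and `fake_api` helper are hypothetical. Stopping on an empty page is an assumed stop condition.

```python
from typing import Any, Dict, List, Mapping, Optional
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://cloud.airbyte.com/api/get_data"  # example URL used in the docs
RECORDS = [{"id": i} for i in range(1, 6)]  # hypothetical in-memory data set
PAGE_SIZE = 2  # hypothetical page size


def fake_api(url: str) -> List[Dict[str, int]]:
    # Stand-in for the HTTP call: return the next page of records whose id is
    # greater than the `from` query parameter (or the first page if absent).
    query = parse_qs(urlparse(url).query)
    start = int(query["from"][0]) if "from" in query else 0
    return [r for r in RECORDS if r["id"] > start][:PAGE_SIZE]


def next_page_token(last_records: List[Mapping[str, Any]]) -> Optional[Any]:
    # Cursor value = id of the last record read; an empty page ends pagination
    # (this stop condition is an assumption of the sketch).
    return last_records[-1]["id"] if last_records else None


def read_all() -> List[str]:
    # Follow the cursor until it runs out, returning every URL that was requested.
    urls: List[str] = []
    token: Optional[Any] = None
    while True:
        # The first request carries no cursor; later ones send it as `from`.
        url = BASE_URL if token is None else f"{BASE_URL}?{urlencode({'from': token})}"
        urls.append(url)
        token = next_page_token(fake_api(url))
        if token is None:
            return urls


for requested in read_all():
    print(requested)
```

The token is derived only from the records already read, so the client never needs to know how many pages exist.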
+#### Cursor paginator in request parameters
+
 In this example, the next page of record is defined by setting the `from` request parameter to the id of the last record read:
 
 ```yaml
@@ -102,6 +104,8 @@ the first request will be sent as `https://cloud.airbyte.com/api/get_data`
 
 Assuming the id of the last record fetched is 1000, the next request will be sent as `https://cloud.airbyte.com/api/get_data?from=1000`
 
+#### Cursor paginator in path
+
 Some APIs directly point to the URL of the next page to fetch. In this example, the URL of the next page is extracted from the response headers:
 
 ```yaml

From eba032220423de9ce5d11d9da01c67ee8ad3d532 Mon Sep 17 00:00:00 2001
From: Alexandre Girard
Date: Fri, 12 Aug 2022 14:53:46 -0700
Subject: [PATCH 90/92] remove mentions of interpolating on stream slice, etc.

---
 docs/connector-development/config-based/request-options.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/docs/connector-development/config-based/request-options.md b/docs/connector-development/config-based/request-options.md
index f5f75a38549e..daf6a5069a80 100644
--- a/docs/connector-development/config-based/request-options.md
+++ b/docs/connector-development/config-based/request-options.md
@@ -35,12 +35,6 @@ requester:
       key: value
 ```
 
-In addition to $options, the provider can also access the following arguments for [string interpolation](yaml-structure.md#string-interpolation):
-
-- stream_slice
-- stream_state
-- next_page_token
-
 ## Authenticators
 
 It is also possible for authenticators to set request parameters or headers as needed.
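The "request parameters" variant of the cursor paginator documented in patch 89/92 can be sketched in plain Python. This is a minimal illustration, not the CDK's `CursorPaginationStrategy`. The helpers `next_page_token` and `build_url` are hypothetical, and the sketch assumes each record exposes its cursor under an `id` key, as in the doc's example.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlencode

# Illustrative endpoint taken from the pagination doc's example.
BASE_URL = "https://cloud.airbyte.com/api/get_data"


def next_page_token(last_record: Optional[Mapping[str, Any]]) -> Optional[Mapping[str, Any]]:
    """Derive the next-page cursor from the last record of the current page.

    Returns None when there is no last record (empty page), which stops pagination.
    Assumption: the cursor value lives under the record's "id" key.
    """
    if not last_record:
        return None
    return {"from": last_record["id"]}


def build_url(token: Optional[Mapping[str, Any]]) -> str:
    """First request: bare URL. Later requests: cursor injected as the `from` query parameter."""
    if token is None:
        return BASE_URL
    return f"{BASE_URL}?{urlencode(token)}"


if __name__ == "__main__":
    print(build_url(None))
    print(build_url(next_page_token({"id": 1000})))
```

With no token, the first request goes to the bare URL. After a page whose last record has id 1000, the next URL is `https://cloud.airbyte.com/api/get_data?from=1000`, which matches the two requests described in the patched doc.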
From 83531210e22da1162643c65f3922f6f54c872e13 Mon Sep 17 00:00:00 2001
From: Alexandre Girard
Date: Fri, 12 Aug 2022 15:09:00 -0700
Subject: [PATCH 91/92] update

---
 docs/connector-development/config-based/authentication.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/connector-development/config-based/authentication.md b/docs/connector-development/config-based/authentication.md
index f626234e6c00..d855a5aa7c55 100644
--- a/docs/connector-development/config-based/authentication.md
+++ b/docs/connector-development/config-based/authentication.md
@@ -32,7 +32,7 @@ More information on bearer authentication can be found [here](https://swagger.io
 ### BasicHttpAuthenticator
 
 The `BasicHttpAuthenticator` set the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme).
-The following definition will set the header "Authorization" with a value "Basic "
+The following definition will set the header "Authorization" with a value "Basic {encoded credentials}"
 
 ```yaml
 authenticator:

From e6637d3ceaace0b14a3c9238c4549dabbff6e96b Mon Sep 17 00:00:00 2001
From: Alexandre Girard
Date: Fri, 12 Aug 2022 15:16:57 -0700
Subject: [PATCH 92/92] exclude config-based docs

---
 docusaurus/docusaurus.config.js | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docusaurus/docusaurus.config.js b/docusaurus/docusaurus.config.js
index 7b0a247057e1..38619eb9d7aa 100644
--- a/docusaurus/docusaurus.config.js
+++ b/docusaurus/docusaurus.config.js
@@ -61,7 +61,8 @@ const config = {
           sidebarCollapsible: true,
           sidebarPath: require.resolve('./sidebars.js'),
           editUrl: 'https://github.com/airbytehq/airbyte/blob/master/docs',
-          path: '../docs'
+          path: '../docs',
+          exclude: ['**/connector-development/config-based/**']
         },
         blog: false,
         theme: {
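Patch 91/92 documents the header value `Basic {encoded credentials}`, which follows RFC 7617. The user ID and password are joined with a single colon, and the resulting string is base64-encoded. The minimal Python sketch below makes one assumption: it encodes the credentials as UTF-8, because the RFC leaves the default charset unspecified. `basic_auth_header` is a hypothetical helper, not the `BasicHttpAuthenticator` implementation.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an RFC 7617 Basic "Authorization" header (sketch, not the CDK code)."""
    # RFC 7617 treats a user-id containing a colon as invalid,
    # because the colon separates the user-id from the password.
    if ":" in username:
        raise ValueError("user-id must not contain ':'")
    # Assumption: UTF-8 encoding of the credentials.
    raw = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


if __name__ == "__main__":
    # Test vector from RFC 7617, section 2.
    print(basic_auth_header("Aladdin", "open sesame"))
```

Running it prints `{'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}`, which is the example value given in RFC 7617.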