Source Google Search Console: add custom analytics stream (#16433)
* Source Google Search Console: add custom analytics stream

* Source Google Search Console: fix flake warnings

* Source Google Search Console: validate custom report string

* Source Google Search Console: update catalog to include custom report to acceptance tests

* Source Google Search Console: update abnormal state for custom report

* auto-bump connector version [ci skip]

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
1 parent 8852d52 commit 55f875b
Showing 12 changed files with 165 additions and 20 deletions.
@@ -387,7 +387,7 @@
- name: Google Search Console
sourceDefinitionId: eb4c9e00-db83-4d63-a386-39cfa91012a8
dockerRepository: airbyte/source-google-search-console
dockerImageTag: 0.1.13
dockerImageTag: 0.1.14
documentationUrl: https://docs.airbyte.io/integrations/sources/google-search-console
icon: googlesearchconsole.svg
sourceType: api
10 changes: 9 additions & 1 deletion airbyte-config/init/src/main/resources/seed/source_specs.yaml
@@ -3826,7 +3826,7 @@
- - "client_secret"
oauthFlowOutputParameters:
- - "refresh_token"
- dockerImage: "airbyte/source-google-search-console:0.1.13"
- dockerImage: "airbyte/source-google-search-console:0.1.14"
spec:
documentationUrl: "https://docs.airbyte.io/integrations/sources/google-search-console"
connectionSpecification:
@@ -3939,6 +3939,14 @@
type: "string"
description: "The email of the user which has permissions to access\
\ the Google Workspace Admin APIs."
custom_reports:
order: 4
type: "string"
title: "Custom Reports (Optional)"
description: "A JSON array describing the custom reports you want to sync\
\ from Google Search Console. See <a href=\"https://docs.airbyte.com/integrations/sources/google-search-console#step-2-set-up-the-google-search-console-connector-in-airbyte\"\
>the docs</a> for more information about the exact format you can use\
\ to fill out this field."
supportsNormalization: false
supportsDBT: false
supported_destination_sync_modes: []
@@ -12,5 +12,5 @@ RUN pip install .
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.1.13
LABEL io.airbyte.version=0.1.14
LABEL io.airbyte.name=airbyte/source-google-search-console
@@ -17,16 +17,21 @@ tests:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog.json"
empty_streams: []
timeout_seconds: 1800
full_refresh:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/catalog.json"
timeout_seconds: 1800
incremental:
- config_path: "secrets/config.json"
configured_catalog_path: "integration_tests/configured_catalog_incremental.json"
timeout_seconds: 1800
future_state_path: "integration_tests/abnormal_state.json"
cursor_paths:
search_analytics_by_country: [ "https://airbyte.io", "web", "date" ]
search_analytics_by_date: [ "https://airbyte.io", "web", "date" ]
search_analytics_by_device: [ "https://airbyte.io", "web", "date" ]
search_analytics_by_page: [ "https://airbyte.io", "web", "date" ]
search_analytics_by_query: [ "https://airbyte.io", "web", "date" ]
search_analytics_all_fields: [ "https://airbyte.io", "web", "date" ]
custom_dimensions: [ "https://airbyte.io", "web", "date" ]
@@ -94,5 +94,21 @@
"date": "2023-08-28"
}
}
},
"custom_dimensions": {
"https://airbyte.io": {
"web": {
"date": "2023-08-28"
},
"news": {
"date": "2023-08-28"
},
"image": {
"date": "2023-08-28"
},
"video": {
"date": "2023-08-28"
}
}
}
}
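
The `cursor_paths` entries in the acceptance test config and the nested layout of the abnormal state fit together as follows. This is a minimal sketch, assuming a cursor path is followed key by key into a stream's state; `resolve_cursor` is a hypothetical helper written for illustration, not part of the connector or the test framework.

```python
from typing import Any, Mapping, Sequence


def resolve_cursor(stream_state: Mapping[str, Any], path: Sequence[str]) -> Any:
    """Walk the nested state one key at a time and return the value found."""
    value: Any = stream_state
    for key in path:
        value = value[key]
    return value


# Same shape as the "custom_dimensions" entry added to abnormal_state.json.
abnormal_state = {
    "custom_dimensions": {
        "https://airbyte.io": {
            "web": {"date": "2023-08-28"},
            "news": {"date": "2023-08-28"},
            "image": {"date": "2023-08-28"},
            "video": {"date": "2023-08-28"},
        }
    }
}

cursor = resolve_cursor(
    abnormal_state["custom_dimensions"],
    ["https://airbyte.io", "web", "date"],
)
```

Under this assumption, each path names a site URL, then a search type, then the `date` cursor field, which is why every search type in the state carries its own `date` key.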
@@ -71,6 +71,15 @@
},
"sync_mode": "full_refresh",
"destination_sync_mode": "overwrite"
},
{
"stream": {
"name": "custom_dimensions",
"json_schema": {},
"supported_sync_modes": ["full_refresh", "incremental"]
},
"sync_mode": "full_refresh",
"destination_sync_mode": "overwrite"
}
]
}
@@ -71,6 +71,18 @@
"sync_mode": "incremental",
"cursor_field": ["date"],
"destination_sync_mode": "append"
},
{
"stream": {
"name": "custom_dimensions",
"json_schema": {},
"supported_sync_modes": ["full_refresh", "incremental"],
"source_defined_cursor": true,
"default_cursor_field": ["date"]
},
"sync_mode": "incremental",
"cursor_field": ["date"],
"destination_sync_mode": "append"
}
]
}
@@ -59,6 +59,18 @@
"sync_mode": "incremental",
"cursor_field": ["date"],
"destination_sync_mode": "append"
},
{
"stream": {
"name": "custom_dimensions",
"json_schema": {},
"supported_sync_modes": ["full_refresh", "incremental"],
"source_defined_cursor": true,
"default_cursor_field": ["date"]
},
"sync_mode": "incremental",
"cursor_field": ["date"],
"destination_sync_mode": "append"
}
]
}
@@ -3,18 +3,20 @@
#

import json
from typing import Any, List, Mapping, Tuple
from typing import Any, List, Mapping, Optional, Tuple

import pendulum
from airbyte_cdk.logger import AirbyteLogger
from airbyte_cdk.models import SyncMode
from airbyte_cdk.sources import AbstractSource
from airbyte_cdk.sources.streams import Stream
from airbyte_cdk.sources.streams.http.auth import Oauth2Authenticator
from jsonschema import validate
from source_google_search_console.service_account_authenticator import ServiceAccountAuthenticator
from source_google_search_console.streams import (
SearchAnalyticsAllFields,
SearchAnalyticsByCountry,
SearchAnalyticsByCustomDimensions,
SearchAnalyticsByDate,
SearchAnalyticsByDevice,
SearchAnalyticsByPage,
@@ -23,6 +25,18 @@
Sites,
)

custom_reports_schema = {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string", "minLength": 1},
"dimensions": {"type": "array", "items": {"type": "string", "minLength": 1}},
},
"required": ["name", "dimensions"],
},
}


class SourceGoogleSearchConsole(AbstractSource):
def check_connection(self, logger: AirbyteLogger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
@@ -62,8 +76,22 @@ def streams(self, config: Mapping[str, Any]) -> List[Stream]:
SearchAnalyticsAllFields(**stream_config),
]

streams = streams + self.get_custom_reports(config=config, stream_config=stream_config)

return streams

def get_custom_reports(self, config: Mapping[str, Any], stream_config: Mapping[str, Any]) -> List[Optional[Stream]]:
if "custom_reports" not in config:
return []

reports = json.loads(config["custom_reports"])
validate(reports, custom_reports_schema)

return [
type(report["name"], (SearchAnalyticsByCustomDimensions,), {})(dimensions=report["dimensions"], **stream_config)
for report in reports
]

@staticmethod
def get_stream_kwargs(config: Mapping[str, Any]) -> Mapping[str, Any]:
authorization = config.get("authorization", {})
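
The `get_custom_reports` method in this hunk parses the `custom_reports` string, validates it, and creates one stream class per report with the three-argument form of `type()`. A dependency-free sketch of that pattern follows. `BaseReport`, `parse_reports`, and `build_streams` are hypothetical stand-ins: the commit itself uses `jsonschema.validate` and `SearchAnalyticsByCustomDimensions`, and the hand-written checks here only approximate `custom_reports_schema`.

```python
import json
from typing import Any, Dict, List


class BaseReport:
    """Hypothetical stand-in for SearchAnalyticsByCustomDimensions."""

    def __init__(self, dimensions: List[str]) -> None:
        self.dimensions = dimensions

    @property
    def name(self) -> str:
        # Assumption for this sketch: the stream name comes from the class name.
        return type(self).__name__


def parse_reports(raw: str) -> List[Dict[str, Any]]:
    """Parse the JSON string and apply checks similar to custom_reports_schema."""
    reports = json.loads(raw)
    if not isinstance(reports, list):
        raise ValueError("custom_reports must be a JSON array")
    for report in reports:
        if not isinstance(report, dict):
            raise ValueError("each report must be a JSON object")
        name = report.get("name")
        dimensions = report.get("dimensions")
        if not isinstance(name, str) or not name:
            raise ValueError("each report needs a non-empty 'name'")
        if not isinstance(dimensions, list) or not all(
            isinstance(d, str) and d for d in dimensions
        ):
            raise ValueError("'dimensions' must be an array of non-empty strings")
    return reports


def build_streams(raw: str) -> List[BaseReport]:
    """Create one dynamically named subclass per report and instantiate it."""
    return [
        type(report["name"], (BaseReport,), {})(dimensions=report["dimensions"])
        for report in parse_reports(raw)
    ]


streams = build_streams(
    '[{"name": "custom_dimensions", "dimensions": ["date", "country", "device"]}]'
)
```

Creating a subclass per report gives each custom report its own class name, so the report name chosen in the config can serve as the stream name without a separate class being written by hand.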
@@ -105,6 +105,12 @@
}
}
]
},
"custom_reports": {
"order": 4,
"type": "string",
"title": "Custom Reports (Optional)",
"description": "A JSON array describing the custom reports you want to sync from Google Search Console. See <a href=\"https://docs.airbyte.com/integrations/sources/google-search-console#step-2-set-up-the-google-search-console-connector-in-airbyte\">the docs</a> for more information about the exact format you can use to fill out this field."
}
}
},
@@ -295,3 +295,48 @@ class SearchAnalyticsByQuery(SearchAnalytics):

class SearchAnalyticsAllFields(SearchAnalytics):
dimensions = ["date", "country", "device", "page", "query"]


class SearchAnalyticsByCustomDimensions(SearchAnalytics):
def __init__(self, dimensions: List[str], *args, **kwargs):
super(SearchAnalyticsByCustomDimensions, self).__init__(*args, **kwargs)
self.dimensions = dimensions

def get_json_schema(self) -> Mapping[str, Any]:
try:
return super(SearchAnalyticsByCustomDimensions, self).get_json_schema()
except IOError:
schema: Mapping[str, Any] = {
"$schema": "http://json-schema.org/draft-07/schema#",
"type": ["null", "object"],
"additionalProperties": True,
"properties": {
"clicks": {"type": ["null", "integer"]},
"ctr": {"type": ["null", "number"], "multipleOf": 1e-25},
"date": {"type": ["null", "string"], "format": "date"},
"impressions": {"type": ["null", "integer"]},
"position": {"type": ["null", "number"], "multipleOf": 1e-25},
"search_type": {"type": ["null", "string"]},
"site_url": {"type": ["null", "string"]},
},
}

dimension_properties = self.dimension_to_property_schema()
schema["properties"].update(dimension_properties)

return schema

def dimension_to_property_schema(self) -> dict:
dimension_to_property_schema_map = {
"country": [{"country": {"type": ["null", "string"]}}],
"date": [],
"device": [{"device": {"type": ["null", "string"]}}],
"page": [{"page": {"type": ["null", "string"]}}],
"query": [{"query": {"type": ["null", "string"]}}],
}
properties = {}
for dimension in sorted(self.dimensions):
fields = dimension_to_property_schema_map[dimension]
for field in fields:
properties = {**properties, **field}
return properties
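
The fallback schema in `get_json_schema` is a fixed set of metric fields plus one property per requested dimension. The sketch below isolates the dimension lookup as a standalone function. `DIMENSION_PROPERTIES` and `dimension_properties` are hypothetical names, and the explicit `ValueError` is a variation for illustration: as written in the commit, a dimension missing from the map would surface as a `KeyError` from the dictionary lookup.

```python
from typing import Any, Dict, List

# Mirrors the map in dimension_to_property_schema. "date" contributes nothing
# because the base schema already defines a "date" property.
DIMENSION_PROPERTIES: Dict[str, List[Dict[str, Any]]] = {
    "country": [{"country": {"type": ["null", "string"]}}],
    "date": [],
    "device": [{"device": {"type": ["null", "string"]}}],
    "page": [{"page": {"type": ["null", "string"]}}],
    "query": [{"query": {"type": ["null", "string"]}}],
}


def dimension_properties(dimensions: List[str]) -> Dict[str, Any]:
    """Return the schema properties contributed by the given dimensions."""
    properties: Dict[str, Any] = {}
    for dimension in sorted(dimensions):
        if dimension not in DIMENSION_PROPERTIES:
            # Hypothetical guard: fail with a message that names the dimension.
            raise ValueError(f"unsupported dimension: {dimension!r}")
        for field in DIMENSION_PROPERTIES[dimension]:
            properties.update(field)
    return properties


props = dimension_properties(["query", "date", "country"])
```

Sorting the dimensions first keeps the resulting property order stable regardless of the order given in the report definition.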
36 changes: 20 additions & 16 deletions docs/integrations/sources/google-search-console.md
@@ -66,14 +66,16 @@ At the end of this process, you should have JSON credentials to this Google Serv
4. Click Authenticate your account to sign in with Google and authorize your account.
5. Fill in the `site_urls` field.
5. Fill in the `start date` field.
6. You should be ready to sync data.
6. Optionally, fill in the `custom reports` field with a JSON array of report objects in the format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...]}]`.
7. You should be ready to sync data.

### For Airbyte Open Source:

1. Fill in the `service_account_info` and `email` fields for authentication.
2. Fill in the `site_urls` field.
3. Fill in the `start date` field.
4. You should be ready to sync data.
4. Optionally, fill in the `custom reports` field with a JSON array of report objects in the format `[{"name": "<report-name>", "dimensions": ["<dimension-name>", ...]}]`.
5. You should be ready to sync data.
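
Because the spec declares `custom_reports` with type `string`, the field holds the JSON array as text rather than as a nested structure. A small sketch of producing that value from Python, where the report name and dimensions are example values and `config` is a hypothetical partial config showing only this key:

```python
import json

# Example report definitions; the name and dimensions are illustrative.
custom_reports = [
    {"name": "custom_dimensions", "dimensions": ["date", "country", "device"]},
]

# The field value is the JSON array serialized to a string.
config = {"custom_reports": json.dumps(custom_reports)}
```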


## Supported sync modes
@@ -98,6 +100,7 @@ The google search console source connector supports the following [sync modes](h
* [Analytics report by device](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* [Analytics report by page](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* [Analytics report by query](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query)
* Analytics report by custom dimensions


## Performance considerations
@@ -117,18 +120,19 @@ This connector attempts to back off gracefully when it hits Reports API's rate l

## Changelog

| Version | Date | Pull Request | Subject |
|:---------|:-----------| :--- |:------------------------------------------------------------|
| `0.1.14` | 2022-09-08 | [16433](https://github.com/airbytehq/airbyte/pull/16433) | Add custom analytics stream. |
| `0.1.13` | 2022-07-21 | [14924](https://github.com/airbytehq/airbyte/pull/14924) | Remove `additionalProperties` field from specs |
| `0.1.12` | 2022-05-04 | [12482](https://github.com/airbytehq/airbyte/pull/12482) | Update input configuration copy |
| `0.1.11` | 2022-01-05 | [9186](https://github.com/airbytehq/airbyte/pull/9186) [9194](https://github.com/airbytehq/airbyte/pull/9194) | Fix incremental sync: keep all urls in state object |
| `0.1.10` | 2021-12-23 | [9073](https://github.com/airbytehq/airbyte/pull/9073) | Add slicing by date range |
| `0.1.9` | 2021-12-22 | [9047](https://github.com/airbytehq/airbyte/pull/9047) | Add 'order' to spec.json props |
| `0.1.8` | 2021-12-21 | [8248](https://github.com/airbytehq/airbyte/pull/8248) | Enable Sentry for performance and errors tracking |
| `0.1.7` | 2021-11-26 | [7431](https://github.com/airbytehq/airbyte/pull/7431) | Add default `end_date` param value |
| `0.1.6` | 2021-09-27 | [6460](https://github.com/airbytehq/airbyte/pull/6460) | Update OAuth Spec File |
| `0.1.4` | 2021-09-23 | [6394](https://github.com/airbytehq/airbyte/pull/6394) | Update Doc link Spec File |
| `0.1.3` | 2021-09-23 | [6405](https://github.com/airbytehq/airbyte/pull/6405) | Correct Spec File |
| `0.1.2` | 2021-09-17 | [6222](https://github.com/airbytehq/airbyte/pull/6222) | Correct Spec File |
| `0.1.1` | 2021-09-22 | [6315](https://github.com/airbytehq/airbyte/pull/6315) | Verify access to all sites when performing connection check |
| `0.1.0` | 2021-09-03 | [5350](https://github.com/airbytehq/airbyte/pull/5350) | Initial Release |