🎉 New Source: Facebook Pages #5158
/test connector=connectors/source-facebook-pages
|
# extra_fields: no
# exact_order: no
# extra_records: yes
# incremental: # TODO if your connector does not implement incremental sync, remove this block

Please add a comment explaining why this block is commented out / not supported (complex state, or something else?).
This is a fairly complicated task; it needs further investigation into how to implement it using graphql endpoints.
dependencies {
    implementation files(project(':airbyte-integrations:bases:source-acceptance-test').airbyteDocker.outputs)
    implementation files(project(':airbyte-integrations:bases:base-python').airbyteDocker.outputs)

This line (the 13th) is deprecated but still exists in the template generator; it needs to be removed (see other connectors for an example).

"todo-stream-name": {
    "todo-field-name": "value"
}
}

The fields should be similar to those in your state.json, not todo-x.

    "access_token": config["access_token"],
}

response = requests.get("https://graph.facebook.com/v11.0/me", params=params)

Lately a stream is usually used here instead (examples: https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-klaviyo/source_klaviyo/source.py#L65, https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-posthog/source_posthog/source.py#L55, etc.).
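A minimal sketch of what this suggestion could look like: the connection check reads one record through a stream instead of calling requests.get directly. FakePageStream is a hypothetical stand-in so the snippet runs without network access or the CDK; its read_records method and the (ok, error) return shape are assumptions modelled on the linked connectors, not code from this PR.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple


class FakePageStream:
    """Hypothetical stand-in for a connector stream (no network access)."""

    def __init__(self, access_token: str, page_id: str):
        self._access_token = access_token
        self._page_id = page_id

    def read_records(self, sync_mode: str = "full_refresh") -> Iterable[Mapping[str, Any]]:
        # A real stream would call the API here; this stub only validates its inputs.
        if not self._access_token:
            raise ValueError("access_token is empty")
        yield {"id": self._page_id}


def check_connection(config: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if one record can be read, else (False, error message)."""
    try:
        stream = FakePageStream(config["access_token"], config["page_id"])
        # Pulling a single record exercises auth, URL building and parsing together.
        next(iter(stream.read_records()), None)
        return True, None
    except Exception as error:
        return False, repr(error)
```

Reusing a stream for the check means the check and the sync share the same request logic, so they cannot drift apart.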
from .metrics import PAGE_FIELDS, PAGE_METRICS, POST_FIELDS, POST_METRICS


class FacebookStream(HttpStream, ABC):

FacebookPagesStream is probably a clearer name. What do you think?
        return f"{self._page_id}/posts"

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        records = response.json().get(self.data_field) or []

records = response.json().get(self.data_field, [])
If the value for self.data_field is empty (None, False, etc.), it's better to have [] as the result.
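The difference between the two spellings discussed here can be shown with plain dictionaries. This is only an illustration; the function names and sample payloads are hypothetical.

```python
from typing import Any, Mapping


def records_with_default(payload: Mapping[str, Any], data_field: str) -> Any:
    # The default is used only when the key is absent.
    return payload.get(data_field, [])


def records_with_or(payload: Mapping[str, Any], data_field: str) -> Any:
    # Any falsy value (missing key, None, False, empty list) becomes [].
    return payload.get(data_field) or []


missing_key = {}
null_value = {"data": None}

print(records_with_default(missing_key, "data"))  # []
print(records_with_default(null_value, "data"))   # None
print(records_with_or(null_value, "data"))        # []
```

With `or []`, a later `for record in records` loop stays safe even when the API returns an explicit null for the field.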
""" | ||

    def path(self, **kwargs) -> str:
        if self._page_id:

Isn't this parameter mandatory? What happens if it is not provided?
|
/test connector=connectors/source-facebook-pages
|
/test connector=connectors/source-facebook-pages
|
/test connector=connectors/source-facebook-pages
|
/test connector=connectors/source-facebook-pages
|
/test connector=connectors/source-facebook-marketing
|
/test connector=connectors/source-facebook-marketing
|
@@ -89,7 +89,7 @@ jobs:
DRIFT_INTEGRATION_TEST_CREDS: ${{ secrets.DRIFT_INTEGRATION_TEST_CREDS }}
EXCHANGE_RATES_TEST_CREDS: ${{ secrets.EXCHANGE_RATES_TEST_CREDS }}
FACEBOOK_MARKETING_TEST_INTEGRATION_CREDS: ${{ secrets.FACEBOOK_MARKETING_TEST_INTEGRATION_CREDS }}
FACEBOOK_MARKETING_API_TEST_INTEGRATION_CREDS: ${{ secrets.FACEBOOK_MARKETING_API_TEST_INTEGRATION_CREDS }}

Was this one unused?
It's a merged change made by @keu, commit #24f310de.

class Post(FacebookPagesStream):
    """
    API docs: https://developers.facebook.com/docs/graph-api/reference/post/,

This link mentions /{post-id} but doesn't mention {page-id}/posts. Is that endpoint documented anywhere? Is it the same as feed? If so, we need to handle pagination; it also seems to be paginated.
Yes, it's the same as the feed endpoint.
Updated; pagination added.
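A minimal sketch of cursor-based pagination for such an endpoint, assuming the response carries a "paging" object with a "next" link and a "cursors.after" value. The function is standalone here and the payloads in the usage note are hypothetical; this is not the code that was added in the PR.

```python
from typing import Any, Mapping, Optional


def next_page_token(payload: Mapping[str, Any]) -> Optional[Mapping[str, str]]:
    """Return the cursor for the next page, or None when there are no more pages."""
    paging = payload.get("paging") or {}
    # Assumption: a "next" link is present only while more pages remain.
    if not paging.get("next"):
        return None
    after = (paging.get("cursors") or {}).get("after")
    return {"after": after} if after else None
```

For example, a payload whose paging object has both "next" and cursors.after set to "abc" gives {"after": "abc"}, which can be merged into the next request's query parameters; a payload without a "next" link gives None and stops the loop.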
    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        records = response.json().get(self.data_field) or []

        for insights in records:

data contains insightResult nodes, which don't seem to have an insights field. Why do we have this block?
insightResult contains a list of dicts with an "insights" field, which contains the data.
Apologies if I'm being silly here, but I don't see this in the insightResults documentation. Where is it defined?
    def path(self, **kwargs) -> str:
        return f'{self._page_id}/posts/?fields=insights.metric({",".join(POST_METRICS)})'

    def request_params(

Keep this DRY: it should be defined in the base class.
Updated
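One way to read the DRY suggestion, as a sketch only: the shared request_params lives in a base class and a subclass extends it when it needs extra parameters. BaseStream, METRICS and the metric names are hypothetical; the real base class would extend the CDK's HttpStream.

```python
from typing import Any, Mapping, MutableMapping, Optional


class BaseStream:
    """Hypothetical base class holding logic shared by every stream."""

    def __init__(self, access_token: str, page_id: str):
        self._access_token = access_token
        self._page_id = page_id

    def request_params(
        self, next_page_token: Optional[Mapping[str, Any]] = None
    ) -> MutableMapping[str, Any]:
        # Shared by every stream: the token plus any pagination cursor.
        params: MutableMapping[str, Any] = {"access_token": self._access_token}
        if next_page_token:
            params.update(next_page_token)
        return params


class Post(BaseStream):
    def path(self) -> str:
        return f"{self._page_id}/posts"


class PostInsights(BaseStream):
    METRICS = ["metric_a", "metric_b"]  # hypothetical metric names

    def path(self) -> str:
        return f"{self._page_id}/posts"

    def request_params(
        self, next_page_token: Optional[Mapping[str, Any]] = None
    ) -> MutableMapping[str, Any]:
        # Reuse the shared parameters and add only what this stream needs.
        params = super().request_params(next_page_token)
        params["fields"] = f'insights.metric({",".join(self.METRICS)})'
        return params
```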
        return {"access_token": self._access_token}

    def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
        data = response.json().get("insights")

The logic around data_field can be significantly simplified and DRY'd at the base class level. data_field should be the "path" in the record where the data can be found. The top-level parse_response should then have a consistent implementation that recursively descends into the record until it reaches the desired field.
Updated.
The PostInsights class has different logic for data_field, so the parse_response method is overridden for this class.
    ):
        super().__init__(**kwargs)
        self._access_token = access_token
        self._start_date = start_date

We are not using this parameter.
Updated
            yield response.json()

        else:
            data_fields = self.data_field.split("_")

Why don't we make data_field an array instead of doing string parsing? When reading the code, it's not clear that data_field = "insights_data" actually refers to a nested field. It's also problematic if the data field ever has an underscore in its name.
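A sketch of the list-based alternative raised in this comment: data_field becomes a sequence of keys and one shared helper walks the payload along it. extract_records and the sample payloads are hypothetical, not the connector's actual implementation.

```python
from typing import Any, Iterable, Mapping, Sequence


def extract_records(payload: Any, data_field: Sequence[str]) -> Iterable[Mapping[str, Any]]:
    """Walk payload along the keys in data_field and yield the records found there."""
    node = payload
    for key in data_field:
        if not isinstance(node, dict) or key not in node:
            return  # the path is absent: yield nothing
        node = node[key]
    if isinstance(node, list):
        yield from node
    elif node is not None:
        yield node


payload = {"insights": {"data": [{"name": "m1"}, {"name": "m2"}]}}
print(list(extract_records(payload, ["insights", "data"])))

# A key that contains an underscore stays unambiguous with a list path.
print(list(extract_records({"page_info": {"id": "1"}}, ["page_info"])))
```

An empty path yields the whole payload, which covers streams whose response is itself the record.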
I think this was not addressed.
In insightResults, we can see the result for one metric. So, if we have one metric and one post, the result will be:
But the Post Insights request covers many posts and many metrics, so the INSIGHTS.DATA field contains a list of insightsResult objects, one per metric, and the DATA field contains a list of all posts.
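The nesting described above can be illustrated with a small sketch that flattens it into one record per post and metric. The field names follow the description (data, insights, data); the helper name, the post_id key and any payload passed to it are hypothetical and do not represent the connector's actual parse_response.

```python
from typing import Any, Iterable, Mapping


def flatten_post_insights(payload: Mapping[str, Any]) -> Iterable[Mapping[str, Any]]:
    """Yield one record per (post, metric) pair from a nested response."""
    for post in payload.get("data") or []:
        insights = (post.get("insights") or {}).get("data") or []
        for insight in insights:
            # Keep the post id so each metric can be traced back to its post.
            yield {"post_id": post.get("id"), **insight}
```

A post without an insights field simply contributes no records, so the outer loop does not need a special case for it.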
@gaart looks good. Can you address the last comment about the data field and run the tests?
/test connector=connectors/source-facebook-pages
|
/test connector=connectors/source-facebook-pages
|
@sherifnada done
/test connector=connectors/source-facebook-pages
|
/publish connector=connectors/source-facebook-pages
|
@@ -0,0 +1,21 @@
{
  "documentationUrl": "https://docs.airbyte.io/integrations/sources/facebook-pages",

By the way, this documentation is missing.
I added a placeholder doc in #5847. Please populate the content later on.
What
Facebook Pages connector.
#1247
Pre-merge Checklist
Expand the checklist which is relevant for this PR.
Connector checklist
airbyte_secret in the connector's spec
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
/test connector=connectors/<name> command as documented here is passing.
README.md
docs/SUMMARY.md if it's a new connector
docs/integrations/<source or destination>/<name>.
docs/integrations/... See changelog example
docs/integrations/README.md contains a reference to the new connector
/publish command described here
Connector Generator checklist
-scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes