Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 New Source: Facebook Pages #5158

Merged
merged 17 commits into from
Sep 1, 2021
Merged

🎉 New Source: Facebook Pages #5158

merged 17 commits into from
Sep 1, 2021

Conversation

gaart
Copy link
Contributor

@gaart gaart commented Aug 3, 2021

What

Facebook Pages connector.
#1247

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in the connector's spec
  • Credentials added to Github CI if needed and not already present. instructions for injecting secrets into CI.
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • README.md
    • docs/SUMMARY.md if it's a new connector
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • docs/integrations/README.md contains a reference to the new connector
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added the area/connectors Connector related issues label Aug 3, 2021
@gaart
Copy link
Contributor Author

gaart commented Aug 3, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1094300626
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1094300626

@gaart gaart linked an issue Aug 5, 2021 that may be closed by this pull request
# extra_fields: no
# exact_order: no
# extra_records: yes
# incremental: # TODO if your connector does not implement incremental sync, remove this block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment why it is commented / not supported ( complex state or something else?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit complicated task, need further investigation how to implement it using graphql endpoints


dependencies {
implementation files(project(':airbyte-integrations:bases:source-acceptance-test').airbyteDocker.outputs)
implementation files(project(':airbyte-integrations:bases:base-python').airbyteDocker.outputs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line (13th) is deprecated but still exists in template-generator, need to be removed (look on another connectors for example)

"todo-stream-name": {
"todo-field-name": "value"
}
}
Copy link
Contributor

@vovavovavovavova vovavovavovavova Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields should be similar to your state.json, not todo-x

"access_token": config["access_token"],
}

response = requests.get("https://graph.facebook.com/v11.0/me", params=params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from .metrics import PAGE_FIELDS, PAGE_METRICS, POST_FIELDS, POST_METRICS


class FacebookStream(HttpStream, ABC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably FacebookPagesStream is more clear, what do you think

return f"{self._page_id}/posts"

def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
records = response.json().get(self.data_field) or []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

records = response.json().get(self.data_field, [])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case self.data_field is empty (None, False, etc) - it's better to have [] as a result

"""

def path(self, **kwargs) -> str:
if self._page_id:
Copy link
Contributor

@vovavovavovavova vovavovavovavova Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not this parameter mandatory or what happen if it would not be provided?

@vovavovavovavova
Copy link
Contributor

vovavovavovavova commented Aug 5, 2021

  1. there are few questions in comments
  2. you need to ask Eugene / (anyone else with rights) to add credentials from lastpass to github to avoid missing secrets, like git CI shows now.
  3. you will need to upgrade source-definitions. See this comment and files changed ( 4 first files changed) as an example 🎉 Posthog: New Source  #3768 (review)
  4. also you will need to upgrade documentation (details are present in connector checklist and examples)

@vovavovavovavova
Copy link
Contributor

vovavovavovavova commented Aug 10, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1117793779
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1117793779

@gaart
Copy link
Contributor Author

gaart commented Aug 11, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1121468832
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1121468832

@yevhenii-ldv
Copy link
Contributor

yevhenii-ldv commented Aug 11, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1121696239
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1121696239

@gaart
Copy link
Contributor Author

gaart commented Aug 12, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1122967786
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1122967786

@yevhenii-ldv
Copy link
Contributor

yevhenii-ldv commented Aug 12, 2021

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1123974721
❌ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1123974721

@gaart
Copy link
Contributor Author

gaart commented Aug 13, 2021

/test connector=connectors/source-facebook-marketing

🕑 connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1126684195
❌ connectors/source-facebook-marketing https://github.com/airbytehq/airbyte/actions/runs/1126684195

@gaart gaart requested review from Phlair and tuliren August 25, 2021 10:45
@@ -89,7 +89,7 @@ jobs:
DRIFT_INTEGRATION_TEST_CREDS: ${{ secrets.DRIFT_INTEGRATION_TEST_CREDS }}
EXCHANGE_RATES_TEST_CREDS: ${{ secrets.EXCHANGE_RATES_TEST_CREDS }}
FACEBOOK_MARKETING_TEST_INTEGRATION_CREDS: ${{ secrets.FACEBOOK_MARKETING_TEST_INTEGRATION_CREDS }}
FACEBOOK_MARKETING_API_TEST_INTEGRATION_CREDS: ${{ secrets.FACEBOOK_MARKETING_API_TEST_INTEGRATION_CREDS }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one was unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a merged change made by @keu , commit #24f310de


class Post(FacebookPagesStream):
"""
API docs: https://developers.facebook.com/docs/graph-api/reference/post/,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this link mentions /{post-id} but doesn't mention {page-id}/posts - is that link documented anywhere? is it same as feed? if so then we need to handle pagination

Also seems to be a paginated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's the same as the feed endpoint.
Updated, pagination added

def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
records = response.json().get(self.data_field) or []

for insights in records:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data contains insightResult nodes which don't seem to have an insights field. Why do we have this block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

insightResult contains a list of dicts with "insights" field which contains data

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies if I'm being silly here but I don't see this in the insightResults documentation. Where is it defined?

def path(self, **kwargs) -> str:
return f'{self._page_id}/posts/?fields=insights.metric({",".join(POST_METRICS)})'

def request_params(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use DRY, this should be defined at the base class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

return {"access_token": self._access_token}

def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
data = response.json().get("insights")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic around data_field can be significantly simplified and DRY'd at the base class level. Data field should be the "path" in the record where the data field can be found. Then the top level parse_response should have a consistent impl that recursively goes into the record until it gets the desired field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.
PostInsigths class has a different logic for data_field, so for this class we have the parse_response method overridden

):
super().__init__(**kwargs)
self._access_token = access_token
self._start_date = start_date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are not using this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]:
records = response.json().get(self.data_field) or []

for insights in records:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apologies if I'm being silly here but I don't see this in the insightResults documentation. Where is it defined?

yield response.json()

else:
data_fields = self.data_field.split("_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we make data_field an array instead of doing string parsing? while reading the code it's not clear that data_field = "insights_data" is actually referring to a nested field. It's also problematic if the data field ever has an underscore in the name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this was not addressed

@gaart
Copy link
Contributor Author

gaart commented Aug 30, 2021

@sherifnada

apologies if I'm being silly here but I don't see this in the insightResults documentation. Where is it defined?

In insightResults, we can see the result for 1 metric. So, if we have 1 metric and 1 post, the result will be:

{
  "data": [
    {
      "insights":
        {
          "data": [insightResults]
          "paging": [],
          "id": "112704783733939_361771145493967"
    }

]

But in the request for Post Insights we have many posts and many metrics, so the INSIGHTS.DATA field contains a list of insightsResult objects for each metric, and the DATA field contains a list of all posts.

@gaart gaart requested a review from sherifnada August 30, 2021 14:34
@sherifnada
Copy link
Contributor

@gaart looks good, can you address the last comment about data field and run tests?

@gaart
Copy link
Contributor Author

gaart commented Aug 31, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1185032948
❌ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1185032948

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 06:31 Inactive
@gaart
Copy link
Contributor Author

gaart commented Aug 31, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1185706769
✅ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1185706769

@jrhizor jrhizor temporarily deployed to more-secrets August 31, 2021 10:15 Inactive
@gaart
Copy link
Contributor Author

gaart commented Aug 31, 2021

@sherifnada done

@gaart
Copy link
Contributor Author

gaart commented Sep 1, 2021

/test connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1190847203
✅ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1190847203

@jrhizor jrhizor temporarily deployed to more-secrets September 1, 2021 15:47 Inactive
@gaart
Copy link
Contributor Author

gaart commented Sep 1, 2021

/publish connector=connectors/source-facebook-pages

🕑 connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1190885083
✅ connectors/source-facebook-pages https://github.com/airbytehq/airbyte/actions/runs/1190885083

@jrhizor jrhizor temporarily deployed to more-secrets September 1, 2021 15:58 Inactive
@gaart gaart merged commit 7361263 into master Sep 1, 2021
@gaart gaart deleted the gaart/1247-facebook-pages branch September 1, 2021 16:11
@@ -0,0 +1,21 @@
{
"documentationUrl": "https://docs.airbyte.io/integrations/sources/facebook-pages",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, this documentation is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a placeholder doc in #5847. Please populate the content later on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Facebook Pages
9 participants