Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connector Facebook-Marketing: update insights streams with custom entries for fields, breakdowns and action_breakdowns #4864

Conversation

vladimir-remar
Copy link
Contributor

What

In our integration with the Facebook-Marketing connector when we trying to synchronize the API data from insights all attempts were unsuccessful (see logs on Notes). After experiencing the same type of behavior in Stitch, the solution is to request less fields to Facebook API. For thar reason, it is proposed is to add custom dynamic schemas to reduce the number of fields that are sent in each request. This means, users will be able to choose the fields that are requested and obtained in the response. After seeing the positive results both in Stitch and in a custom Facebook-Marketing connector created by us, we propose the following alternative.

How

The solution is inspired by the connector of Google-Analytics-Singer that adds "custom_reports" that are configured in the config. With this idea we propose the following:

  1. Add to the config an optional field custom_insights_fields that contains the fields separated by commas that we want to send in each request to the insights api.
  2. In the Streams file, create new classes (CustomAdsInsights, CustomAdsInsightsAgeAndGender, CustomAdsInsightsCountry, CustomAdsInsightsRegion, CustomAdsInsightsDma, CustomAdsInsightsPlatformAndDevice) with the prefix "Custom" to differentiate them from the rest and that these classes inherit the behavior of the master class AdsInsights and only varying the final resulting schema and fields in these new classes.
  3. In the Source file modify the streams method so that it is updated if the new custom_insights_fields field is configured.

Notes

I'm adding some logs and some captures.
error.log
Screenshot from 2021-07-20 17-45-24
Screenshot from 2021-07-20 17-46-15

I acknowledge this first commit is missing the Documentation updates and the docker version bump.

Pre-merge Checklist

Expand the checklist which is relevant for this PR.

Connector checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • Secrets are annotated with airbyte_secret in the connector's spec
  • Credentials added to Github CI if needed and not already present. instructions for injecting secrets into CI.
  • Unit & integration tests added as appropriate (and are passing)
    • Community members: please provide proof of this succeeding locally e.g: screenshot or copy-paste acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • /test connector=connectors/<name> command as documented here is passing.
    • Community members can skip this, Airbyters will run this for you.
  • Code reviews completed
  • Documentation updated
    • README.md
    • docs/SUMMARY.md if it's a new connector
    • Created or updated reference docs in docs/integrations/<source or destination>/<name>.
    • Changelog in the appropriate page in docs/integrations/.... See changelog example
    • docs/integrations/README.md contains a reference to the new connector
    • Build status added to build page
  • Build is successful
  • Connector version bumped like described here
  • New Connector version released on Dockerhub by running the /publish command described here
  • No major blockers
  • PR merged into master branch
  • Follow up tickets have been created
  • Associated tickets have been closed & stakeholders notified

Connector Generator checklist

  • Issue acceptance criteria met
  • PR name follows PR naming conventions
  • If adding a new generator, add it to the list of scaffold modules being tested
  • The generator test modules (all connectors with -scaffold in their name) have been updated with the latest scaffold by running ./gradlew :airbyte-integrations:connector-templates:generator:testScaffoldTemplates then checking in your changes
  • Documentation which references the generator is updated as needed.

@github-actions github-actions bot added the area/connectors Connector related issues label Jul 20, 2021
@sherifnada
Copy link
Contributor

@vladimir-remar would this be supplanted by the solution in #2227 ?

@vladimir-remar
Copy link
Contributor Author

@sherifnada Yes, but, depending on the implementation, the Facebook-Marketing connector will need some additional changes as the request to the insights API has to explicitly include the limited number of fields.
Since we do not know when solution to #2227 will be implemented, does it make sense to merge the solution I propose in the PR? The only way to synchronize the Facebook-Marketing data with our destination, as of now, is to implement a custom Facebook connector.

@vladimir-remar
Copy link
Contributor Author

@sherifnada any thoughts or suggestions?

@marcosmarxm marcosmarxm requested a review from keu August 24, 2021 17:33
@marcosmarxm
Copy link
Member

@keu is possible to review this contribution?

@marcosmarxm marcosmarxm self-assigned this Aug 27, 2021
@vladimir-remar
Copy link
Contributor Author

vladimir-remar commented Sep 20, 2021

@marcosmarxm Hi again.
After this time it makes sense to think of an approach more similar to the one proposed in this PR, as I said the only way we can synchronize the data from facebook is using the custom streams (at least for now).
I also take into account the #2227 but I don't know the status.

@keu
Copy link
Contributor

keu commented Sep 20, 2021

@vladimir-remar Insights is already pretty custom streams. As I remember after my last refactor it should be easy to make it fully custom. The only reason why I didn't is that it breaks backward compatibility.

You just need to move all hardcoded attributes to config and build instances of the InsightsStream in streams method.
So the end result will replace all insights streams with a list of configuration objects in the config.

{
"insights":
 [
   {
    "name": "text",
    "fields": [],
    "breakdowns": [],
    "action_breakdowns": [],
   }
 ]
}

@vladimir-remar
Copy link
Contributor Author

vladimir-remar commented Sep 20, 2021

@vladimir-remar Insights is already pretty custom streams. As I remember after my last refactor it should be easy to make it fully custom. The only reason why I didn't is that it breaks backward compatibility.

You just need to move all hardcoded attributes to config and build instances of the InsightsStream in streams method.
So the end result will replace all insights streams with a list of configuration objects in the config.

{
"insights":
 [
   {
    "name": "text",
    "fields": [],
    "breakdowns": [],
    "action_breakdowns": [],
   }
 ]
}

@keu Ok, for each Insights stream we will have a different configuration and if some insights were not in the json object we would have to define default values for the streams that are not included in this object, right?

@keu
Copy link
Contributor

keu commented Sep 21, 2021

@keu Ok, for each Insights stream we will have a different configuration and if some insights were not in the json object we would have to define default values for the streams that are not included in this object, right?

I see two options to keep existing insights streams:

  1. define existing insights streams as default streams in the streams method. Don't know how to handle naming conflict, probably with the last guy wins strategy.
  2. make a default value for an array of streams in the spec.json. Not sure if it is possible.

Plus come up with a good auto-naming strategy (breakdowns joined with _ probably)

@vladimir-remar
Copy link
Contributor Author

@keu Hello, I did some changes in order to improve the new approach, can you take a look?

@CLAassistant
Copy link

CLAassistant commented Sep 27, 2021

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@keu keu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job in general, just a few comments

@vladimir-remar vladimir-remar marked this pull request as ready for review September 29, 2021 11:13
@vladimir-remar vladimir-remar changed the title Connector Facebook-Marketing: update streams with custom streams Connector Facebook-Marketing: update insights streams with custom entries for fields, breakdowns and action_breakdowns Sep 29, 2021
@marcosmarxm
Copy link
Member

@vladimir-remar awesome work! thanks so much for this contribution. Small change, could you solve the conflict in source.py?

@keu can you final review this contribution?

@vladimir-remar
Copy link
Contributor Author

@vladimir-remar awesome work! thanks so much for this contribution. Small change, could you solve the conflict in source.py?

@marcosmarxm done!

Copy link
Contributor

@keu keu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@keu
Copy link
Contributor

keu commented Oct 4, 2021

@vladimir-remar don't forget to run ./gradlew format and tests

@vladimir-remar
Copy link
Contributor Author

@vladimir-remar Thanks. No need to bump, I will do it myself, just please add CHANGELOG to the facebook-marketing.md file.

@keu done!

@keu keu requested a review from sherifnada October 5, 2021 15:43
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Oct 5, 2021
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimir-remar thanks for the contribution! I certainly understand the value of having something like this that works well. Some points I'd like to make:

I'm wary of introducing a completely custom insights stream because the Facebook API is very volatile. Specifically: it is difficult to predict which inputs of fields/breakdowns/action_breakdowns are going to cause the insights sync to succeed or fail. For all of our current insights streams, we've done meticulous testing to make sure those requests succeed consistently. So the issue I'm worried about is a user might see this feature, try to use it, get a failure, and then assume the FB connector is unreliable and file support requests with Airbyte when really the issue is from the FB side. The error messages provided by the FB API tend to be pretty opaque too and unhelpful, e.g: Job Failed without any extra information.

This is all to say that, I think all of this should be made very very clear in the connector's documentation (docs/integrations/sources/facebook-marketing.md), that this is a "use with caution" or "experimental" feature. We should add a section about these custom insights streams in the docs describing all the above caveats.

Additionally, it would be great to have a way to tell the user ahead of time if their particular selection will work or not. For example, right now if they choose an incorrect field name, they will only know that problem at sync time not connector setup time.

WDYT?

insights_custom_streams = list()

for insight in insights:
args["name"] = insight.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this mutating the same config over and over, so in the end all the streams have the same config? can we copy the args instead? e.g:

arg_copy = copy.deepcopy(args)
arg_copy["fields"] = ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada the streams being created each time, so config never shared

@@ -126,3 +148,28 @@ def spec(self, *args, **kwargs) -> ConnectorSpecification:
),
),
)

def _update_insights_streams(self, insights, args, streams) -> List[Type[Stream]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest the following approach for these custom streams:

  1. The "standard" streams offered by the connector are always available (if a user doesn't want them they can always just deselect them)
  2. If the user inputs any custom streams, they are named as custom_<user-input-name> and appended to the list of streams

This way it is very very obvious to the user what is happening. This is especially important as the connector's config is updated over time e.g: a user might call a stream ads_insights today, then remove it next week, and now the data represented in that stream is mixed with the "standard" ads insight stream and the "custom" insight stream. Making the suggestion above to make such situations much less likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the change in the lastest commit

@vladimir-remar
Copy link
Contributor Author

To @keu thank you for help me in the spec limitation, was very helpful.
To @sherifnada I added some changes.

  • A way to verify the insights before the sync, in the check part.
  • An update to the documentation, warning the use for users.

return True


def expand_local_ref(schema, resolver=None, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada thanks to @keu
With this, we can set properly the insights in the UI.

@vladimir-remar
Copy link
Contributor Author

@sherifnada after the latest changes, the user will only be allowed to configure the fields provided by Facebook, but the combination of these will be left to the user's choice.
WDYT?

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimir-remar thanks for adding the validations! just a couple of comments and I think we are good to go

@@ -26,6 +35,17 @@
)


class InsightConfig(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vladimir-remar did you validate that the json output by spec works with the UI via the instructions here? Just checking because I don't remember if we support a list of objects

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada we do actually :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sherifnada thanks to @keu, I attached some images from UI
Screenshot from 2021-10-14 11-40-23
It will be filled like this
Screenshot from 2021-10-14 11-42-16
And it will look like this with two elements
Screenshot from 2021-10-14 11-42-50
and from destination side
Screenshot from 2021-10-14 11-44-12

Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @vladimir-remar !

@keu could you assign to someone on the team for publishing?

@yevhenii-ldv
Copy link
Contributor

Hi @vladimir-remar, your changes was merged and published in PR #7158

@zestyping
Copy link
Contributor

This is awesome!! Thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants