-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source Facebook Marketing: improve sleeps time in rate limit handler #10698
🎉 Source Facebook Marketing: improve sleeps time in rate limit handler #10698
Conversation
@vladimir-remar it's possible to run a sync comparing the current method and the method proposed by you? Or, there is any docs in Facebook maybe explaining about the best approach to do rate limit waiting? |
Hello @marcosmarxm Thanks for the answer. We did a sync with (ads_insights, ads_insights_country and ads_insights_platform_and_device), I attached the log, as you will see, with my suggestion the sync will keeps around 90%-94% of usage pausing 1min between requests until usage is low. This increase the success chances otherwise it will fail. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vladimir-remar, thank you for this contribution.
I made several suggestions 😄, let me know what you think!
Could you please also:
- bump the connector version in the dockerfile and in
airbyte-config/init/src/main/resources/seed/source_definitions.yaml
- update the changelog in
docs/integrations/sources/facebook-marketing.md
call_rate_threshold = 95 # maximum percentage of call limit utilization | ||
min_call_rate_threslold = 90 # minimum percentage of call limit utilization | ||
pause_interval_minimum = pendulum.duration(minutes=1) # default pause interval if reached or close to call rate limit | ||
pause_interval_minimum_at_max_usage = pendulum.duration(minutes=5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call_rate_threshold = 95 # maximum percentage of call limit utilization | |
min_call_rate_threslold = 90 # minimum percentage of call limit utilization | |
pause_interval_minimum = pendulum.duration(minutes=1) # default pause interval if reached or close to call rate limit | |
pause_interval_minimum_at_max_usage = pendulum.duration(minutes=5) | |
MAX_RATE, MAX_PAUSE_INTERVAL = (95, pendulum.duration(minutes=5)) | |
MIN_RATE, MAX_PAUSE_INTERVAL = (90, pendulum.duration(minutes=1) |
@@ -104,17 +112,22 @@ def handle_call_rate_limit(self, response, params): | |||
max_usage = max(max_usage, usage) | |||
max_pause_interval = max(max_pause_interval, pause_interval) | |||
|
|||
if max_usage > self.call_rate_threshold: | |||
if max_usage >= self.call_rate_threshold: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if max_usage >= self.call_rate_threshold: | |
max_rate, max_sleep_duration = self. min_max_rate_sleep_time["max"] | |
min_rate, min_sleep_duration = self. min_max_rate_sleep_time["min"] | |
if max_usage >= max_rate: |
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
@@ -87,6 +89,12 @@ def _parse_call_rate_header(headers): | |||
|
|||
return usage, pause_interval | |||
|
|||
@staticmethod | |||
def sleep_time_handler(usage, pause_interval: pendulum.duration): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the definition of the custom pause interval according to usage
should be determined in this function and not in handle_call_rate_limit
.
I'd rename it to compute_pause_interval
and make it return the total seconds to sleep. Then I'd call sleep with this value from handle_call_rate_limit
. I think it would greatly improve the readability and maintainability of this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere Thanks, I did some changes, tell me what you think.
Hello @alafanechere let me know if there is something I can do to improve this PR, this change would be a big step for us. |
/test connector=connectors/source-facebook-marketing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @vladimir-remar for the changes, I still have an important suggestion. Could you please also bump the version in the Dockerfile?
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
@vladimir-remar could you please fix the conflicts and share the output of your acceptance tests: |
Hello @alafanechere the problem here is the quota from the credentials that I have is exhauted, but even with quota I can't inccour in a full refresh run. I'm open to suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small fix and it's ready to merge!
airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/api.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the changes you made @vladimir-remar. I'm going to run the acceptance tests and merge if they pass.
usage = 0 | ||
pause_interval = self.MIN_PAUSE_INTERVAL | ||
|
||
if "batch" in params: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
usage = 0 | |
pause_interval = self.MIN_PAUSE_INTERVAL | |
if "batch" in params: | |
if "batch" in params: | |
usage = 0 | |
pause_interval = self.MIN_PAUSE_INTERVAL |
/test connector=connectors/source-facebook-marketing
|
I'm requesting reviews from @sherifnada and @misteryeo because this connector is targeted for GA. |
just a comment, this change solves the problem. I saw some clients getting their sync finished because of 100% quota limit and not respecting the correct backoff time. The only drawback is: increase in time to finish the sync. thanks for this contribution @vladimir-remar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vladimir-remar could you add unit tests for this functionality please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @vladimir-remar I wrote some unit tests for the functions you implemented.
Feel free to review these tests and add more cases if needed.
I have a remaining question about the _compute_pause_interval
function that I added in the review comment.
"""The sleep time will be calculated based on usage consumed.""" | ||
if usage >= self.MAX_RATE: | ||
return max(self.MAX_PAUSE_INTERVAL, pause_interval) | ||
return self.MIN_PAUSE_INTERVAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return self.MIN_PAUSE_INTERVAL | |
return max(self.MIN_PAUSE_INTERVAL, pause_interval) |
If the usage is lower than the max rate but the pause interval is higher than the default minimum one: don't we want to be conservative and pause for the time the API requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the usage is lower than the max rate but the pause interval is higher than the default minimum one: don't we want to be conservative and pause for the time the API requests?
for sure, the idea is to pause the requests taking into account the time interval that the response returns.
/test connector=connectors/source-facebook-marketing
|
@alafanechere thanks for add the unitests |
@sherifnada thanks to @alafanechere who already did. |
/test connector=connectors/source-facebook-marketing
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your quick reply @vladimir-remar ! I'd love a final review from our connector team.
/publish connector=connectors/source-facebook-marketing
|
Thank you for your contribution and patience @vladimir-remar ! |
What
In our Facebook Marketing connection when we want to do a full refresh sync, this probably will take more than 1 day , the connector reach the rate limite often and it seems the default sleep time it is not enough.
I attached a image when the connector reaches the 100%
How
Set sleeps time to 1min between 90%-95% of usage and set to 5min when the usage is greater than or equal 95%
🚨 User Impact 🚨
Increase of time in connection runs, mostly in a full refresh sync mode, due to this new approach.
Pre-merge Checklist
Expand the relevant checklist and delete the others.
Updating a connector
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampleAirbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing/publish
command described hereTests
Unit
Put your unit tests output here.
Integration
Put your integration tests output here.
Acceptance
Put your acceptance tests output here.