Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source Facebook Marketing: improve sleeps time in rate limit handler #10698

Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -12,5 +12,5 @@ RUN pip install .
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]

LABEL io.airbyte.version=0.2.39
LABEL io.airbyte.version=0.2.40
LABEL io.airbyte.name=airbyte/source-facebook-marketing
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,8 @@ class FacebookAPIException(Exception):
class MyFacebookAdsApi(FacebookAdsApi):
"""Custom Facebook API class to intercept all API calls and handle call rate limits"""

call_rate_threshold = 95 # maximum percentage of call limit utilization
pause_interval_minimum = pendulum.duration(minutes=1) # default pause interval if reached or close to call rate limit
MAX_RATE, MAX_PAUSE_INTERVAL = (95, pendulum.duration(minutes=5))
MIN_RATE, MIN_PAUSE_INTERVAL = (90, pendulum.duration(minutes=1))

@dataclass
class Throttle:
Expand Down Expand Up @@ -87,11 +87,17 @@ def _parse_call_rate_header(headers):

return usage, pause_interval

def compute_pause_interval(self, usage, pause_interval):
"""The sleep time will be calculated based on usage consumed."""
if usage >= self.MAX_RATE:
return max(self.MAX_PAUSE_INTERVAL, pause_interval)
return self.MIN_PAUSE_INTERVAL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return self.MIN_PAUSE_INTERVAL
return max(self.MIN_PAUSE_INTERVAL, pause_interval)

If the usage is lower than the max rate but the pause interval is higher than the default minimum one: don't we want to be conservative and pause for the time the API requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the usage is lower than the max rate but the pause interval is higher than the default minimum one: don't we want to be conservative and pause for the time the API requests?

for sure, the idea is to pause the requests taking into account the time interval that the response returns.


def handle_call_rate_limit(self, response, params):
if "batch" in params:
max_usage = 0
max_pause_interval = self.pause_interval_minimum
usage = 0
pause_interval = self.MIN_PAUSE_INTERVAL

if "batch" in params:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
usage = 0
pause_interval = self.MIN_PAUSE_INTERVAL
if "batch" in params:
if "batch" in params:
usage = 0
pause_interval = self.MIN_PAUSE_INTERVAL

for record in response.json():
# there are two types of failures:
# 1. no response (we execute batch until all inner requests has response)
Expand All @@ -100,21 +106,17 @@ def handle_call_rate_limit(self, response, params):
if "headers" not in record:
continue
headers = {header["name"].lower(): header["value"] for header in record["headers"]}
usage, pause_interval = self._parse_call_rate_header(headers)
max_usage = max(max_usage, usage)
max_pause_interval = max(max_pause_interval, pause_interval)

if max_usage > self.call_rate_threshold:
max_pause_interval = max(max_pause_interval, self.pause_interval_minimum)
logger.warning(f"Utilization is too high ({max_usage})%, pausing for {max_pause_interval}")
sleep(max_pause_interval.total_seconds())
usage_from_response, pause_interval_from_response = self._parse_call_rate_header(headers)
usage = max(usage, usage_from_response)
pause_interval = max(pause_interval_from_response, pause_interval)
else:
headers = response.headers()
usage, pause_interval = self._parse_call_rate_header(headers)
if usage > self.call_rate_threshold or pause_interval:
pause_interval = max(pause_interval, self.pause_interval_minimum)
logger.warning(f"Utilization is too high ({usage})%, pausing for {pause_interval}")
sleep(pause_interval.total_seconds())

if usage >= self.MIN_RATE:
sleep_time = self.compute_pause_interval(usage=usage, pause_interval=pause_interval)
logger.warning(f"Utilization is too high ({usage})%, pausing for {sleep_time}")
sleep(sleep_time).total_seconds()
vladimir-remar marked this conversation as resolved.
Show resolved Hide resolved

def _update_insights_throttle_limit(self, response: FacebookResponse):
"""
Expand Down
1 change: 1 addition & 0 deletions docs/integrations/sources/facebook-marketing.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,6 +104,7 @@ As a summary, custom insights allows to replicate only some fields, resulting in

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
| 0.2.40 | 2022-02-28 | [10698](https://github.com/airbytehq/airbyte/pull/10698) | Improve sleeps time in rate limit handler |
| 0.2.39 | 2022-03-09 | [10917](https://github.com/airbytehq/airbyte/pull/10917) | Retry connections when FB API returns error code 2 (temporary oauth error) |
| 0.2.38 | 2022-03-08 | [10531](https://github.com/airbytehq/airbyte/pull/10531) | Add `time_increment` parameter to custom insights |
| 0.2.37 | 2022-02-28 | [10655](https://github.com/airbytehq/airbyte/pull/10655) | Add Activities stream |
Expand Down