Add throttling counters in gcsio. #31584
Conversation
Force-pushed from 4fddba3 to 2735987
Force-pushed from 5de78a2 to aae6bc6
Force-pushed from aae6bc6 to f6979cc
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Thanks, left a few comments
predicate=_should_retry, on_error=ThrottlingHandler())
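For context, a minimal sketch of how this wiring looks with google.api_core.retry.Retry; the ThrottlingHandler stub and its body here are placeholders, not the PR's actual code:

```python
from google.api_core import retry as api_retry
from google.cloud.storage.retry import _should_retry


class ThrottlingHandler(object):
  """Placeholder for the handler added by this PR."""
  def __call__(self, exc):
    # Retry invokes this with the exception each time a retryable
    # error occurs; a real handler would record throttling here.
    pass


throttling_retry = api_retry.Retry(
    predicate=_should_retry,       # decides whether an error is retryable
    on_error=ThrottlingHandler(),  # observes each retryable error
)
```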
def get_retry(pipeline_options):
I don't think we need a branch here, and I don't think we need to introduce another pipeline option.
If users want to opt out of reactive throttling they can disable autoscaling, and I'm checking whether there is a Dataflow service option that controls this.
In general it's good to constrain the number of pipeline options, especially the tuning ones.
There may be a Dataflow service flag to opt out of throttling-based reactive autoscaling; I'm confirming with the internal team.
I see.
I think the feature flag could be useful here, because users may hit a performance regression and want to go back to the previous code path, which does not have throttling counters.
Yeah, this is a valid concern. Fine with the flag. However, because the throttling metrics are targeted to be added to more IOs, we can figure out a generic solution later (I would say backend support via a Dataflow service option is the best place).
self._num_retries += 1
sleep_seconds = self._get_next_wait_time()
ThrottlingHandler._THROTTLED_SECS.inc(math.ceil(sleep_seconds))
time.sleep(sleep_seconds)
__call__ here is executed as a callback. It is not the place to handle the actual sleep.
The Retry object itself already has internal exponential backoff. See the cloud-storage client's DEFAULT_RETRY: it only assigns a predicate to Retry, and the latter handles the actual backoff when a retry happens.
on_error is just a callback. If it is invoked synchronously, the current change will cause a double sleep; if it runs on a separate non-blocking thread, it does not reflect the actual retry time.
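For reference, the default retry in google-cloud-storage (google/cloud/storage/retry.py) is essentially just this; the Retry object computes and performs the exponential-backoff sleeps itself:

```python
from google.api_core import retry
from google.cloud.storage.retry import _should_retry

# Only a predicate is supplied; Retry sleeps between attempts on its
# own, so an on_error callback that also sleeps doubles the wait.
DEFAULT_RETRY = retry.Retry(predicate=_should_retry)
```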
All we need is the actual sleep time at each retry. Is it possible to obtain it from the callback (similar to here)?
beam/sdks/python/apache_beam/io/gcp/bigquery.py, line 1293 in 30cce44:
def retry_delay_callback(delay):
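The pattern being referenced, sketched with a placeholder counter namespace/name (the BigQuery sink uses its own counter): the retry loop reports the delay it is about to sleep, and the callback only records it.

```python
import math

from apache_beam.metrics.metric import Metrics

# Placeholder namespace/name, for illustration only.
_THROTTLED_SECS = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')


def retry_delay_callback(delay):
  # Called with the delay the retry loop is about to sleep; we record
  # it and never sleep here ourselves.
  _THROTTLED_SECS.inc(math.ceil(delay))
```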
I don't think the retry implementation in google-cloud-storage or google-api-core supports such a callback. If we really need to accurately capture the retry time for throttling purposes, we may need to implement our own retry class.
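A minimal sketch of that idea, built on google.api_core's exponential_sleep_generator; every name and parameter here is hypothetical, not the PR's code:

```python
import functools
import math
import time

from apache_beam.metrics.metric import Metrics
from google.api_core.exceptions import TooManyRequests
from google.api_core.retry import exponential_sleep_generator

# Placeholder counter for illustration.
_THROTTLED_SECS = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')


def retry_with_throttling_counter(
    func, initial=1.0, maximum=64.0, multiplier=2.0, max_attempts=10):
  """Retries func on 429s and records the actual slept seconds."""
  @functools.wraps(func)
  def wrapper(*args, **kwargs):
    sleeps = exponential_sleep_generator(initial, maximum, multiplier)
    for attempt in range(max_attempts):
      try:
        return func(*args, **kwargs)
      except TooManyRequests:
        if attempt == max_attempts - 1:
          raise
        delay = next(sleeps)
        # We own the sleep, so the counter reflects the real backoff.
        _THROTTLED_SECS.inc(math.ceil(delay))
        time.sleep(delay)
  return wrapper
```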
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
Reimplement throttling counters after the gcsio migration to the google-cloud-storage client library.
The previous implementation is at beam/sdks/python/apache_beam/io/gcp/gcsio_overrides.py, line 54 in bb044f4.
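From memory, that apitools-era implementation hooked gcsio's retry function and looked roughly like this (details may differ from the pinned revision):

```python
import math
import time

from apache_beam.metrics.metric import Metrics
from apitools.base.py import exceptions
from apitools.base.py import http_wrapper


class GcsIOOverrides(object):
  """Functions for overriding the Google Cloud Storage I/O client."""

  _THROTTLED_SECS = Metrics.counter('StorageV1', 'cumulativeThrottlingSeconds')

  @classmethod
  def retry_func(cls, retry_args):
    # On a 429, sleep for the suggested wait time and record it so
    # Dataflow's autoscaler can see the throttling.
    if (isinstance(retry_args.exc, exceptions.BadStatusCodeError) and
        retry_args.exc.status_code == http_wrapper.TOO_MANY_REQUESTS):
      http_wrapper.RebuildHttpConnections(retry_args.http)
      time.sleep(retry_args.total_wait_sec)
      cls._THROTTLED_SECS.inc(math.ceil(retry_args.total_wait_sec))
    else:
      http_wrapper.HandleExceptionsAndRebuildHttpConnections(retry_args)
```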