
Add throttling counters in gcsio. #31584

Closed
shunping wants to merge 4 commits from the gcsio-throttling-counter branch

Conversation

Contributor

@shunping shunping commented Jun 13, 2024

Reimplement throttling counters after gcsio migration to google-cloud-storage client library.

The previous implementation was:

cls._THROTTLED_SECS.inc(math.ceil(sleep_seconds))
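
For illustration only (not the PR's code), the pre-migration approach amounted to bumping a Beam counter by the backoff that is about to be slept inside the retry loop, so the runner can attribute idle time to quota throttling. The counter namespace/name and retry parameters below are assumptions for this sketch.

import math
import time

from apache_beam.metrics.metric import Metrics


class _ThrottlingAwareRetry(object):
  # Namespace and counter name are illustrative; the actual Beam counter may differ.
  _THROTTLED_SECS = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')

  def call_with_backoff(self, fn, retriable, max_retries=10, initial_delay=1.0):
    delay = initial_delay
    for attempt in range(max_retries):
      try:
        return fn()
      except Exception as e:
        if attempt == max_retries - 1 or not retriable(e):
          raise
        # Record the time we are about to spend sleeping as throttled seconds.
        self._THROTTLED_SECS.inc(math.ceil(delay))
        time.sleep(delay)
        delay *= 2  # exponential backoff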

@shunping shunping marked this pull request as draft June 13, 2024 00:43
@shunping shunping force-pushed the gcsio-throttling-counter branch 3 times, most recently from 4fddba3 to 2735987 on June 13, 2024 01:42
@shunping shunping force-pushed the gcsio-throttling-counter branch 2 times, most recently from 5de78a2 to aae6bc6 on June 13, 2024 02:44
@shunping
Contributor Author

R: @Abacn and andrewsg

Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@shunping shunping marked this pull request as ready for review June 13, 2024 15:22
Contributor

@Abacn Abacn left a comment


Thanks, left a few comments

predicate=_should_retry, on_error=ThrottlingHandler())


def get_retry(pipeline_options):
Contributor

@Abacn Abacn Jun 13, 2024


I don't think we need a branch here, and I don't think we need to introduce another pipeline option.

If users want to opt out of reactive throttling they can disable autoscaling, and I'm checking whether there is a Dataflow service option that controls this.

In general it's good to constrain the number of pipeline options, especially the tuning ones.

Contributor


There may be a Dataflow service flag to opt out of throttling-based reactive autoscaling; confirming with the internal team.

Contributor Author


I see.

I think the feature flag could be useful here, because users may find a performance regression and would like to go back to the previous code path, where there are throttling counters.

Contributor


Yeah, this is a valid concern. Fine with the flag. However, because the throttling metrics are targeted to be added to more IOs, we can figure out a generic solution later (I would say backend support via a Dataflow service option is the best place).
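
For reference, a minimal sketch of what such an opt-out pipeline option could look like; the option class and flag name below are hypothetical, not necessarily what this PR adds.

from apache_beam.options.pipeline_options import PipelineOptions


class GcsIOOptions(PipelineOptions):  # hypothetical options class
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_argument(
        '--no_gcsio_throttling_counter',  # hypothetical flag name
        default=False,
        action='store_true',
        help='Opt out of the GCS I/O throttling counter and fall back to '
        'the retry behavior without throttling-based signals.')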

self._num_retries += 1
sleep_seconds = self._get_next_wait_time()
ThrottlingHandler._THROTTLED_SECS.inc(math.ceil(sleep_seconds))
time.sleep(sleep_seconds)
Contributor


__call__ here is executed in a callback thread. It is not the place to handle the actual sleep.

The Retry object itself already has internal exponential backoff. See cloud-storage-client's DEFAULT_RETRY: it only assigns a predicate to Retry, and the latter handles the actual backoff when a retry happens.

on_error is just a callback. If it blocks, the current change will cause a double sleep; if it does not block, it does not reflect the actual retry time.
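
For context, a hedged sketch of how the client library's Retry is typically configured: only the predicate decides whether an error is retryable, the Retry wrapper itself performs the exponential-backoff sleeps, and on_error is a notification hook that should not sleep again. Parameter values here are illustrative.

from google.api_core import exceptions
from google.api_core.retry import Retry, if_exception_type

_should_retry = if_exception_type(
    exceptions.TooManyRequests,     # HTTP 429
    exceptions.ServiceUnavailable,  # HTTP 503
)


def _on_retryable_error(exc):
  # Notification only: the backoff sleep happens inside Retry after this
  # callback returns, so sleeping here would double the wait.
  pass


retry = Retry(
    predicate=_should_retry,
    initial=1.0,     # first backoff in seconds
    maximum=60.0,    # cap on any single backoff
    multiplier=2.0,  # exponential growth factor
    on_error=_on_retryable_error,
)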

Contributor


All we need is the actual sleep time at each retry. Is it possible to obtain it from the callback (similar to here:

def retry_delay_callback(delay):

)?
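
Roughly, the pattern being referred to is a callback that receives the number of seconds the retry logic is about to sleep, so the caller only records it. A hedged sketch, with an illustrative counter name:

import math

from apache_beam.metrics.metric import Metrics

_THROTTLED_SECS = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')


def retry_delay_callback(delay):
  # Invoked with the upcoming sleep duration; the retry logic still does the
  # sleeping, we only account for the throttled time.
  _THROTTLED_SECS.inc(math.ceil(delay))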

Contributor Author


I don't think the retry implementation in google-cloud-storage or google-api-core supports such a callback. If we really need to accurately capture the retry time for throttling purposes, we may need to implement our own retry class.
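
A hedged sketch of what "our own retry class" could look like: it mirrors api_core-style exponential backoff but reports each sleep to a Beam counter, so the throttled time is exactly what was slept. Names and parameters are illustrative, not the PR's implementation.

import math
import random
import time

from apache_beam.metrics.metric import Metrics


class CountingRetry(object):
  # Illustrative counter; the real namespace/name may differ.
  _THROTTLED_SECS = Metrics.counter('gcsio', 'cumulativeThrottlingSeconds')

  def __init__(self, predicate, initial=1.0, maximum=60.0, multiplier=2.0,
               deadline=600.0):
    self._predicate = predicate
    self._initial = initial
    self._maximum = maximum
    self._multiplier = multiplier
    self._deadline = deadline

  def __call__(self, func):
    def wrapped(*args, **kwargs):
      delay = self._initial
      deadline = time.monotonic() + self._deadline
      while True:
        try:
          return func(*args, **kwargs)
        except Exception as exc:
          if not self._predicate(exc) or time.monotonic() >= deadline:
            raise
          sleep = min(delay * random.uniform(0.5, 1.5), self._maximum)
          # The counter sees exactly what we sleep, so throttled time is
          # reported accurately.
          self._THROTTLED_SECS.inc(math.ceil(sleep))
          time.sleep(sleep)
          delay *= self._multiplier
    return wrapped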

@shunping shunping marked this pull request as draft June 15, 2024 00:29
Contributor

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Aug 14, 2024
Contributor

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
