
fix: retry RESOURCE_EXHAUSTED errors in read_rows #366

Merged
2 commits merged into googleapis:main on Jan 12, 2022

Conversation

esert-g (Contributor) commented Dec 31, 2021

The BigQuery Storage Read API will start returning retryable RESOURCE_EXHAUSTED errors in 2022 when certain concurrency limits are hit, so this PR adds code to handle them.

Tested with unit tests and system tests. System tests ran successfully on a test project that intentionally returns retryable RESOURCE_EXHAUSTED errors.
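
For context, a minimal sketch of what RetryInfo-driven retrying looks like. This is illustrative, not this PR's implementation; it assumes google-api-core >= 2.2.0 (where `ResourceExhausted` exposes parsed error details via `details`), and `retry_delay_seconds` / `read_rows_with_retry` are hypothetical names:

```python
# Illustrative sketch only, not this PR's code. Assumes google-api-core
# >= 2.2.0, where ResourceExhausted exposes parsed details via `details`.
import time

from google.api_core import exceptions
from google.rpc import error_details_pb2


def retry_delay_seconds(exc):
    """Return the server-suggested retry delay, or None if not retryable."""
    for detail in exc.details or []:
        if isinstance(detail, error_details_pb2.RetryInfo):
            delay = detail.retry_delay  # a protobuf Duration
            return delay.seconds + delay.nanos / 1e9
    return None


def read_rows_with_retry(client, name, offset=0, **read_rows_kwargs):
    """Call client.read_rows, sleeping and retrying on retryable quota errors."""
    while True:
        try:
            return client.read_rows(name, offset=offset, **read_rows_kwargs)
        except exceptions.ResourceExhausted as exc:
            delay = retry_delay_seconds(exc)
            if delay is None:
                raise  # No RetryInfo attached: treat as non-retryable.
            time.sleep(delay)
```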

@esert-g esert-g requested a review from a team December 31, 2021 00:30
@esert-g esert-g requested a review from a team as a code owner December 31, 2021 00:30
@product-auto-label product-auto-label bot added the `api: bigquerystorage` label Dec 31, 2021
@esert-g esert-g force-pushed the resource_exhausted branch 5 times, most recently from 167e3fc to 96caee6 on January 5, 2022 23:11
@tswast tswast self-requested a review January 5, 2022 23:54
@esert-g esert-g force-pushed the resource_exhausted branch 4 times, most recently from 96caee6 to f6cbd60 on January 6, 2022 21:25
google/cloud/bigquery_storage_v1/reader.py (outdated)
@@ -81,14 +83,12 @@ class ReadRowsStream(object):
         method to parse all messages into a :class:`pandas.DataFrame`.
     """

-    def __init__(self, wrapped, client, name, offset, read_rows_kwargs):
+    def __init__(
+        self, client, name, offset, read_rows_kwargs, retry_delay_callback=None
Contributor:

Technically a breaking change, though I don't expect anyone to be constructing a ReadRowsStream directly, except perhaps in unit tests.

We should have had this before, but please add a comment to this class's docstring like the following:

    This object should not be created directly, but is returned by other
    methods in this library.

(Pulled from https://github.com/googleapis/python-pubsub/blob/main/google/cloud/pubsub_v1/futures.py)

Contributor:

Also, it's not clear to me why we need to make this breaking change to begin with.

Contributor (Author):

Added the comment.

I removed the wrapped argument because ReadRowsStream assumed it would always be given a valid stream. However, a RESOURCE_EXHAUSTED error can be the very first thing returned by the BigQuery Storage API server. If we kept the old API, we would need to handle RESOURCE_EXHAUSTED errors in two different places; with this change we only need to handle them in one place.
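
To illustrate the "one place" point, here is a rough sketch (hypothetical shape, not the PR's exact code) of a stream that creates its own wrapped stream, so the first connection and every reconnect share the same handling; it reuses the `retry_delay_seconds` helper from the earlier sketch:

```python
import time

from google.api_core import exceptions


class ReadRowsStream(object):
    def __init__(self, client, name, offset, read_rows_kwargs,
                 retry_delay_callback=None):
        self._client = client
        self._name = name
        self._offset = offset
        self._read_rows_kwargs = read_rows_kwargs
        self._retry_delay_callback = retry_delay_callback
        self._wrapped = None

    def _reconnect(self):
        # Single choke point: a retryable RESOURCE_EXHAUSTED is handled here
        # whether it occurs on the first connection or on a mid-stream retry.
        while True:
            try:
                self._wrapped = self._client.read_rows(
                    self._name, offset=self._offset, **self._read_rows_kwargs
                )
                return
            except exceptions.ResourceExhausted as exc:
                delay = retry_delay_seconds(exc)  # helper from earlier sketch
                if delay is None:
                    raise
                if self._retry_delay_callback:
                    self._retry_delay_callback(delay)
                time.sleep(delay)
```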

@@ -106,6 +106,12 @@ def __init__(self, wrapped, client, name, offset, read_rows_kwargs):
             read_rows_kwargs (dict):
                 Keyword arguments to use when reconnecting to a ReadRows
                 stream.
+            retry_delay_callback (Optional[Callable[[float], None]]):
Contributor:

At first glance, I'm a bit confused why this parameter is necessary. Who needs a notification that we're going to sleep?

Contributor:

Is it just for unit testing? In that case, please remove this parameter and use the freezegun library instead.

Contributor (Author):

Users of the library may choose to provide this callback to be aware of delayed retries. When users are aware of delayed retry attempts, they can adjust their autoscaling algorithms. Apache Beam already does this with the Java SDK; my plan is to do the same with its Python SDK.
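
For example, a consumer could feed the delays into a scaling signal. This is a hypothetical usage sketch, assuming `read_rows` forwards the callback as this PR's client changes suggest; `client`, `stream_name`, `session`, and `process` are placeholders:

```python
# The callback receives the delay in seconds (a float), matching the
# Optional[Callable[[float], None]] signature documented above.
backoff_seconds_total = 0.0

def on_retry_delay(delay):
    global backoff_seconds_total
    backoff_seconds_total += delay  # e.g. export to an autoscaler as a signal

reader = client.read_rows(stream_name, retry_delay_callback=on_retry_delay)
for row in reader.rows(session):
    process(row)
```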

     def _reconnect(self):
         """Reconnect to the ReadRows stream using the most recent offset."""
-        self._wrapped = self._client.read_rows(
+        return self._client.read_rows(
Contributor:

Won't this create a new ReadRowsStream?

Contributor (Author):

Yes, this function doesn't do anything differently than before. It just returns the stream instead of assigning it to a member variable.

@esert-g esert-g force-pushed the resource_exhausted branch 2 times, most recently from d2dddeb to 869914a on January 6, 2022 23:04
@esert-g esert-g requested a review from tswast January 6, 2022 23:05
@esert-g esert-g force-pushed the resource_exhausted branch 2 times, most recently from 41169fb to b207dc1 on January 7, 2022 23:24
@@ -123,19 +130,12 @@ def read_rows(
             ValueError: If the parameters are invalid.
         """
         gapic_client = super(BigQueryReadClient, self)
-        stream = gapic_client.read_rows(
Contributor:

Note: A side effect of this change is that some non-retryable errors won't happen until the user starts iterating through rows.

I would consider this a breaking change even more so than the ReadRowsStream constructor change, as users who were expecting an exception will now get it later on.

Contributor (Author):

I changed the code so that read_rows is called during ReadRowsStream construction, and any exception besides a retryable RESOURCE_EXHAUSTED is propagated.

Comment on lines 240 to 243

 # Don't reconnect on DeadlineException. This allows user-specified timeouts
 # to be respected.
-mock_gapic_client.read_rows.assert_not_called()
+mock_gapic_client.read_rows.assert_called_once()
Contributor:

The comment is out of date now. I believe this is the breaking-change side effect I mentioned earlier.

I'd prefer we don't break this behavior and instead find a way to implement the reconnect logic in BigQueryReadClient.read_rows and ReadRowsStream. Perhaps a helper function could be created to keep it DRY?

Contributor (Author):

Removed comment. As mentioned above, behavior is the same as before.

@esert-g esert-g force-pushed the resource_exhausted branch 3 times, most recently from a69b1ee to 3b97341 on January 12, 2022 02:08
@esert-g esert-g requested a review from tswast January 12, 2022 02:35
Comment on lines -214 to -216

-# Don't reconnect on DeadlineException. This allows user-specified timeouts
-# to be respected.
-mock_gapic_client.read_rows.assert_not_called()
Contributor:

This was a rather important comment. I'd like to make sure we're still testing this behavior somehow (no reconnect on DeadlineException).

Contributor:

https://docs.python.org/3/library/unittest.mock.html#unittest.mock.Mock.reset_mock before line 211 may be helpful here, though possibly unnecessary if we move _reconnect out of the constructor.
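
A sketch of that suggestion; `make_stream` and `consume_until_deadline` are hypothetical test helpers standing in for the real test's setup and iteration:

```python
from unittest import mock

mock_gapic_client = mock.Mock()
stream = make_stream(mock_gapic_client)  # hypothetical setup; calls read_rows once

# Forget the construction-time call so later assertions only count
# calls made while iterating.
mock_gapic_client.read_rows.reset_mock()

consume_until_deadline(stream)  # hypothetical: iterate until DeadlineException
# No reconnect on DeadlineException, so user-specified timeouts are respected.
mock_gapic_client.read_rows.assert_not_called()
```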

Contributor (Author):

read_rows will always be called once, because that's what throws the DeadlineException. I updated the comment, so I hope the intention is clear now.

        self._client = client
        self._name = name
        self._offset = offset
        self._read_rows_kwargs = read_rows_kwargs
        self._retry_delay_callback = retry_delay_callback
        self._reconnect()
Contributor:

Doing heavy work during construction/initialization time is a bit of an OO anti-pattern. I'd prefer we find another way to do this.

Contributor (Author):

Removed this from construction; the reconnect call is now an additional statement at each place where a ReadRowsStream was previously constructed.
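
So a call site would presumably look something like this. A simplified sketch of BigQueryReadClient.read_rows under that pattern (signature trimmed; the exact call-site shape is an assumption):

```python
def read_rows(self, name, offset=0, retry_delay_callback=None, **read_rows_kwargs):
    # __init__ stays free of network I/O; the initial connection happens
    # explicitly right after construction.
    gapic_client = super(BigQueryReadClient, self)
    stream = reader.ReadRowsStream(
        gapic_client,
        name,
        offset,
        read_rows_kwargs,
        retry_delay_callback=retry_delay_callback,
    )
    stream._reconnect()  # connect now; may retry RESOURCE_EXHAUSTED per RetryInfo
    return stream
```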

Comment on lines 186 to 192
# ResourceExhausted errors are only retried if a valid
# RetryInfo is provided with the error.
# ResourceExhausted doesn't seem to have details/_details
# fields by default when it is generated by Python 3.6 unit
# tests, so we have to work around that.
# TODO: to remove this logic when we require
# google-api-core >= 2.2.0
Contributor:

Suggested change:

-# ResourceExhausted errors are only retried if a valid
-# RetryInfo is provided with the error.
-# ResourceExhausted doesn't seem to have details/_details
-# fields by default when it is generated by Python 3.6 unit
-# tests, so we have to work around that.
-# TODO: to remove this logic when we require
-# google-api-core >= 2.2.0
+# ResourceExhausted errors are only retried if a valid
+# RetryInfo is provided with the error.
+#
+# TODO: Remove hasattr logic when we require google-api-core >= 2.2.0.
+# ResourceExhausted added details/_details in google-api-core 2.2.0.

Contributor (Author):

Updated the comment as suggested.
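
For reference, a sketch of what the hasattr workaround could look like (assumed shape with a hypothetical `_retry_info` helper, not the PR's exact code):

```python
from google.rpc import error_details_pb2


def _retry_info(exc):
    """Extract RetryInfo from an error's details, tolerating older api-core."""
    # Older google-api-core versions may not define details/_details on
    # ResourceExhausted (see the TODO above), hence the hasattr checks.
    if hasattr(exc, "details"):
        details = exc.details
    elif hasattr(exc, "_details"):
        details = exc._details
    else:
        details = None
    for detail in details or []:
        if isinstance(detail, error_details_pb2.RetryInfo):
            return detail
    return None
```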

@tswast tswast added the `automerge` and `kokoro:force-run` labels Jan 12, 2022
@tswast tswast changed the title from "Retryable RESOURCE_EXHAUSTED handling" to "fix: retry RESOURCE_EXHAUSTED errors in read_rows" Jan 12, 2022
@yoshi-kokoro yoshi-kokoro removed the `kokoro:force-run` label Jan 12, 2022
@tswast tswast merged commit 33757d8 into googleapis:main Jan 12, 2022
@gcf-merge-on-green gcf-merge-on-green bot removed the `automerge` label Jan 12, 2022