-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: retry failed query jobs in result()
#837
Changes from 22 commits
cc62448
61e1c38
093f9b4
b2ce74b
d930c83
bf7b4c6
0598197
c9644e9
2d0e067
0dcac01
026edba
0e764d2
c96d8b3
5058cb4
3c53172
eb51432
d6d958f
a05f75f
32e7050
3361a82
4c6ef5b
25b44a0
9ab84e4
d2bf840
4c004a5
be49c4b
9e4a011
204641b
bf051b0
b47244f
2377a1b
6b790e8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -56,3 +56,15 @@ def _should_retry(exc): | |
on ``DEFAULT_RETRY``. For example, to change the deadline to 30 seconds, | ||
pass ``retry=bigquery.DEFAULT_RETRY.with_deadline(30)``. | ||
""" | ||
|
||
|
||
@retry.Retry | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Default deadline is only 2 minutes. I expect we'll want more than that. Especially since There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you want to use 4 minutes? 5? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's do 10 minutes since that was the recommendation the backend folks gave me. Should give us time for at least 2 tries retry, possibly 3. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
def DEFAULT_JOB_RETRY(exc): | ||
""" | ||
The default job retry object. | ||
""" | ||
if not hasattr(exc, "errors") or len(exc.errors) == 0: | ||
return False | ||
|
||
reason = exc.errors[0]["reason"] | ||
return reason == "rateLimitExceeded" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Per: #707 (comment) there are certain I wonder if we should check if that's in the string representation of the exception? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at the internal source code: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you mean There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can disregard the exception type and just look at the I mean the kind of job failure that customer encountered in #707 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
# Copyright 2021 Google LLC | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# https://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import contextlib | ||
import threading | ||
import time | ||
|
||
import google.api_core.exceptions | ||
import google.cloud.bigquery | ||
import pytest | ||
|
||
|
||
def thread(func): | ||
thread = threading.Thread(target=func, daemon=True) | ||
thread.start() | ||
return thread | ||
|
||
|
||
@pytest.mark.parametrize("job_retry_on_query", [True, False]) | ||
def test_query_retry_539(bigquery_client, dataset_id, job_retry_on_query): | ||
""" | ||
Test job_retry | ||
|
||
See: https://github.com/googleapis/python-bigquery/issues/539 | ||
""" | ||
from google.api_core import exceptions | ||
from google.api_core.retry import if_exception_type, Retry | ||
|
||
table_name = f"{dataset_id}.t539" | ||
|
||
# Without a custom retry, we fail: | ||
with pytest.raises(google.api_core.exceptions.NotFound): | ||
bigquery_client.query(f"select count(*) from {table_name}").result() | ||
|
||
retry_notfound = Retry(predicate=if_exception_type(exceptions.NotFound)) | ||
|
||
job_retry = dict(job_retry=retry_notfound) if job_retry_on_query else {} | ||
job = bigquery_client.query(f"select count(*) from {table_name}", **job_retry) | ||
job_id = job.job_id | ||
|
||
# We can already know that the job failed, but we're not supposed | ||
# to find out until we call result, which is where retry happend | ||
assert job.done() | ||
assert job.exception() is not None | ||
|
||
@thread | ||
def create_table(): | ||
time.sleep(1) # Give the first retry attempt time to fail. | ||
with contextlib.closing(google.cloud.bigquery.Client()) as client: | ||
client.query(f"create table {table_name} (id int64)").result() | ||
|
||
job_retry = {} if job_retry_on_query else dict(job_retry=retry_notfound) | ||
[[count]] = list(job.result(**job_retry)) | ||
assert count == 0 | ||
|
||
# The job was retried, and thus got a new job id | ||
assert job.job_id != job_id | ||
|
||
# Make sure we don't leave a thread behind: | ||
create_table.join() | ||
bigquery_client.query(f"drop table {table_name}").result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want
None
to be a way to explicitly disable retries? It could be confusing if someone doesn't want jobs to retry for some reason.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll make None work to disable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done