BigQuery: raise a custom exception if 400 BadRequest is encountered due to "internal error during execution" #23

Closed
bencaine1 opened this issue Feb 12, 2019 · 14 comments
Labels
api: bigquery (Issues related to the googleapis/python-bigquery API.)
priority: p3 (Desirable enhancement or fix. May not be included in next release.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
wontfix (This will not be worked on.)

Comments

@bencaine1

OS: Linux dc32b7e8763a 4.9.0-6-amd64 #1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02) x86_64 x86_64 x86_64 GNU/Linux
Python version: Python 2.7.6
google-cloud-bigquery: 1.8.0

We're getting flaky 400 BadRequest errors on our query jobs. We've been seeing this issue for a while on and off, but last night starting at around 7pm we saw a spike in these failures.

These errors are not caught by the default Retry objects because 400 usually signifies a malformed query or a missing table, rather than a transient error.

A fix might be to add a clause catching 400s with this exact error message to _should_retry at https://github.com/googleapis/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/retry.py#L30 and/or RETRY_PREDICATE at https://github.com/googleapis/google-cloud-python/blob/master/api_core/google/api_core/future/polling.py#L32.
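
A minimal sketch of what such a clause might look like. It uses a stand-in BadRequest class so that it runs without google-api-core installed. The helper names is_internal_error_400 and should_retry, and the choice to match on the message text, are assumptions for illustration and not the library's behavior.

```python
# Stand-in for google.api_core.exceptions.BadRequest so the sketch runs
# without the library installed (assumption: only str(exc) is inspected).
class BadRequest(Exception):
    pass


# Message fragment taken from the stack trace in this report.
INTERNAL_ERROR_TEXT = "internal error during execution"


def is_internal_error_400(exc):
    """Hypothetical predicate: True for a 400 carrying the internal-error text."""
    return isinstance(exc, BadRequest) and INTERNAL_ERROR_TEXT in str(exc)


def should_retry(exc, base_predicate):
    """Combine an existing retry predicate with the extra 400 clause."""
    return base_predicate(exc) or is_internal_error_400(exc)
```

With google-api-core installed, a predicate of this shape could be passed as the predicate argument of google.api_core.retry.Retry.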

Code example

from google.api_core.future import polling
from google.cloud.bigquery import retry as bq_retry

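# Excerpt from a wrapper method: query, config, and max_wait_secs come from the caller.
query_job = self.gclient.query(
    query,
    job_config=config,
    retry=bq_retry.DEFAULT_RETRY.with_deadline(max_wait_secs),
)
# Override the retry used while polling for job completion.
query_job._retry = polling.DEFAULT_RETRY.with_deadline(max_wait_secs)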
return query_job.result(timeout=max_wait_secs)

Stack trace

One example:

  File "/opt/conda/lib/python2.7/site-packages/verily/bigquery_wrapper/bq.py", line 108, in _wait_for_job
    return query_job.result(timeout=max_wait_secs)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2762, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 703, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 122, in result
    self._blocking_poll(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2736, in _blocking_poll
    super(QueryJob, self)._blocking_poll(timeout=timeout)
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 101, in _blocking_poll
    retry_(self._done_or_raise)()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/future/polling.py", line 80, in _done_or_raise
    if not self.done():
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2723, in done
    location=self.location,
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 672, in _get_query_results
    retry, method="GET", path=path, query_params=extra_params
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/bigquery/client.py", line 382, in _call_api
    return call()
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 270, in retry_wrapped_func
    on_error=on_error,
  File "/opt/conda/lib/python2.7/site-packages/google/api_core/retry.py", line 179, in retry_target
    return target()
  File "/opt/conda/lib/python2.7/site-packages/google/cloud/_http.py", line 319, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.BadRequest: 400 GET https://www.googleapis.com/bigquery/v2/projects/packard-campbell-synth/queries/9bcea2cb-1747-4a1e-9ac8-e1de40f00d08?timeoutMs=10000&location=US&maxResults=0: The job encountered an internal error during execution and was unable to complete successfully.
@tseaver
Contributor

tseaver commented Feb 12, 2019

@tswast I'm pretty sure that this is a back-end issue: the API shouldn't be returning '400 Bad Request' for internal server errors. Can you confirm?

@tswast
Contributor

tswast commented Feb 12, 2019

@shollyman Is this related to another BigQuery backend rollout?

@tswast
Contributor

tswast commented Feb 12, 2019

I agree that 400 Bad Request is the wrong response code for this error.

@tswast
Contributor

tswast commented Feb 12, 2019

I've filed bug 124319762 internally to track this issue. I see several similar reports internally, so this is likely not new behavior.

@bencaine1 If you have a support plan, I recommend filing a ticket with them to raise the priority of this issue on the BigQuery backend.

@tseaver
Contributor

tseaver commented Feb 12, 2019

@tswast Do you want to leave this issue open (i.e., do you imagine we will be making changes here to work around the 400)?

@tswast
Contributor

tswast commented Feb 12, 2019

Let's close this. The client workaround would be to look for certain text in the response body and ignore the response code, which I'd prefer not to do if we can avoid it.

@tswast tswast closed this as completed Feb 12, 2019
@barrywhart

I saw this error again today. Has the underlying BigQuery issue been fixed? Is there another new issue with the same symptom?

If this bug continues to recur with various BigQuery bugs, I think there is (sadly) a case for having the client retry, because otherwise the non-Google customer application becomes responsible for retrying. That seems ... worse.

@tswast tswast reopened this Nov 22, 2019
@tswast
Contributor

tswast commented Nov 22, 2019

Backend issue was closed as infeasible. Backend engineers say:

Connection error is the only retryable error returned as a job result, so you only need to add the logic to retry jobs.insert() if jobs.getQueryResults() returns 400 and the error reason is set to "jobBackendError" (which means that the job failed with a connection error), but I don't think you can reuse the job ID in this case.

Since the job ID cannot be reused, this error is one that requires the whole job to be retried from the beginning. I think it's reasonable to do this, though it will likely be a bit difficult to do, as the failure won't be discovered until .result() is called.
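
The check the backend engineers describe could be sketched roughly as follows. It assumes the failed response exposes its errors as a list of dicts with a "reason" key. The function name and the payload shape are illustrative assumptions, not a confirmed library interface.

```python
def is_job_backend_error(status_code, errors):
    """Return True when a 400 response carries the "jobBackendError" reason.

    status_code: HTTP status of the failed get-query-results call.
    errors: list of dicts such as {"reason": "...", "message": "..."}
            (assumed shape of the error payload).
    """
    if status_code != 400:
        return False
    return any(err.get("reason") == "jobBackendError" for err in errors)
```

Only when this returns True would the caller start the job over. Any other 400 still signals a genuinely bad request.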

@tswast tswast changed the title from 'BigQuery: flaky 400 BadRequest errors' to 'BigQuery: retry queries from the beginning if 400 BadRequest is encountered due to "internal error during execution"' Nov 22, 2019
@plamut plamut transferred this issue from googleapis/google-cloud-python Feb 4, 2020
@product-auto-label product-auto-label bot added the "api: bigquery" label Feb 4, 2020
@plamut plamut added the "type: feature request" label Feb 4, 2020
@pietrodn

If the error reason is "jobBackendError", it should definitely be included in the BigQuery error table, so that developers can deal with it appropriately.

@pietrodn

If the job ID is not reusable, and it is not possible to retry the job from within the library, google-cloud-bigquery could catch this BadRequest exception and re-raise it as an InternalServerError exception, so that the application code using the library can easily retry the whole class of server errors transparently.

@HemangChothani
Contributor

it is not possible to retry the job from within the library

@pietrodn It is possible to retry the job from within the library. To do that, use the client.create_job method, which creates a new job ID on every retry.

def create_job(self, job_config, retry=DEFAULT_RETRY, timeout=None):
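
A rough sketch of the retry-with-a-new-job-ID idea. It does not call the BigQuery client at all: submit_and_wait is a hypothetical callable standing in for "create the job under this ID and wait for its result", and JobBackendError is a stand-in for the retryable 400 failure.

```python
import uuid


class JobBackendError(Exception):
    """Stand-in for the 400 "jobBackendError" failure (hypothetical)."""


def run_with_fresh_job_ids(submit_and_wait, max_attempts=3):
    """Call submit_and_wait(job_id) with a new job ID on every attempt.

    submit_and_wait is a hypothetical callable that starts a query job under
    the given ID, blocks until it finishes, and returns its result.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        job_id = str(uuid.uuid4())  # never reuse the failed job's ID
        try:
            return submit_and_wait(job_id)
        except JobBackendError as exc:
            last_error = exc  # retryable: loop and start a brand-new job
    raise last_error
```

Other exception types propagate immediately, so a malformed query still fails on the first attempt.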

@tswast
Contributor

tswast commented Dec 1, 2020

google-cloud-bigquery could catch this BadRequest exception and re-raise it as an InternalServerError exception,

I think this is a reasonable feature request.

@tswast tswast changed the title from 'BigQuery: retry queries from the beginning if 400 BadRequest is encountered due to "internal error during execution"' to 'BigQuery: raise a custom exception if 400 BadRequest is encountered due to "internal error during execution"' Dec 1, 2020
@tswast
Contributor

tswast commented Dec 2, 2020

Some requirements for a custom exception:

  • Backwards compatible -- inherit from the Google API error base class
  • Preserves stacktrace -- use the exception wrapping mechanism to preserve all the context from the original exception.
  • Clear that it's a BigQuery-related custom exception from the name. Example: BigQueryServerError.
  • The message of this exception includes a code snippet on how to recreate the query job (to retry it) with Client.create_job.
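
The requirements above might translate into something like the following sketch. The GoogleAPIError and BadRequest classes here are stand-ins so the example runs without google-api-core installed (in the real library the base class would be google.api_core.exceptions.GoogleAPIError). BigQueryServerError, RETRY_HINT, and reraise_internal_error are hypothetical names, and the hint text is only an illustration of the kind of snippet the message could carry.

```python
class GoogleAPIError(Exception):
    """Stand-in for the Google API error base class (assumption)."""


class BadRequest(GoogleAPIError):
    """Stand-in for the 400 error raised today."""


# Illustrative hint; the exact wording and snippet are assumptions.
RETRY_HINT = (
    "The query job failed with a backend error and cannot be resumed. "
    "Recreate it under a new job ID, for example:\n"
    "    new_job = client.create_job(job_config)"
)


class BigQueryServerError(GoogleAPIError):
    """Hypothetical BigQuery-specific error; existing handlers for the base class still match."""

    def __init__(self, message):
        super().__init__(f"{message}\n{RETRY_HINT}")


def reraise_internal_error(call):
    """Run call() and convert the internal-error 400 into BigQueryServerError."""
    try:
        return call()
    except BadRequest as exc:
        if "internal error during execution" not in str(exc):
            raise  # genuine bad requests propagate unchanged
        # "raise ... from" keeps the original exception as __cause__,
        # so the full traceback context is preserved.
        raise BigQueryServerError(str(exc)) from exc
```

Application code could then catch BigQueryServerError specifically and resubmit the query, while existing handlers for the base error class keep working.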

@shollyman shollyman added the "priority: p3" label Aug 29, 2022
@chalmerlowe chalmerlowe added the "wontfix" label Nov 2, 2022
@chalmerlowe
Collaborator

At this point, I'm going to close this item as "Will not fix" due to competing priorities.
