[CT-248] Implement retryable errors for Spark/Databricks #293
Comments
Hey @grindheim, is there a reason this issue has been closed? I'm currently using
@pgoslatara No, there was just no feedback at all for 26 days, so I figured I'd close it and possibly reopen it for dbt-databricks. I'm reopening it now.
Thanks @grindheim! I'd really like to see this implemented. I'm not sure I have the time or the knowledge to undertake this, but keeping this issue open for now may allow someone else to jump in with a solution.
@grindheim Thanks for opening the issue, and @pgoslatara thanks for the prompt to keep it open! Apologies for the delay in response from us. We're revamping the way we triage and maintain adapter plugin repositories. I think the topic is a good one; intermittent errors frustrate many users, and implementing retry on the adapter/connection level is the right way to go. The question is whether these errors are cropping up while:
We already have "naive" retry implemented for initial connection opening, in two different ways, if
dbt-spark/dbt/adapters/spark/connections.py Lines 534 to 543 in d7f1d38
We don't have any retry implemented during query execution. I think adding a set of retryable errors is a good idea, if we can confirm that they are consistently intermittent for all users. If that guarantee of consistency proves impossible, we could also pursue the approach recommended in dbt-labs/dbt-core#3303, whereby users can "bring their own" retryable error statuses (list of exceptions defined in
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Describe the bug
When using dbt-spark, we randomly but somewhat frequently experience some errors that could be handled by detecting them as retryable errors.
Based on our logs, I believe at least the following errors could be considered retryable. The affected models usually run successfully the next time they're run.
They're listed in order from most frequently experienced to least:
Ideally one could check whether the error message contains any of the above strings, and if so retry the query a number of times like in the BigQuery implementation - see comment from jtcohen6 linking to the implementation for BigQuery:
dbt-msft/dbt-sqlserver#119 (comment)
Steps To Reproduce
Seeing that these issues happen randomly, it's difficult to list a set of steps that will consistently reproduce the issues.
Expected behavior
If any of the listed errors happen, the connector should retry the given model X number of times, where ideally X is defined in the profile like for the BigQuery adapter (https://docs.getdbt.com/reference/warehouse-profiles/bigquery-profile/#retries).
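To make the request concrete, here is a minimal sketch of the kind of behavior I have in mind. It is not taken from dbt-spark or from the BigQuery adapter. The names `RETRYABLE_MESSAGES`, `is_retryable` and `run_with_retries` are hypothetical, the message strings are placeholders rather than the real errors from our logs, and `retries` stands in for whatever value would be read from the profile.

```python
import time

# Hypothetical placeholders: these are NOT the real error strings from our
# logs, they only illustrate substring matching on the error message.
RETRYABLE_MESSAGES = ("example transient error A", "example transient error B")


def is_retryable(error, patterns=RETRYABLE_MESSAGES):
    """Return True if the error message contains any retryable substring."""
    message = str(error)
    return any(pattern in message for pattern in patterns)


def run_with_retries(execute, retries=3, delay_seconds=0.0,
                     patterns=RETRYABLE_MESSAGES):
    """Call execute(); on a retryable error, try again up to `retries` times.

    `retries` is assumed to come from the profile. Errors that do not match
    any pattern are re-raised immediately, as is the last retryable error
    once the retry budget is used up.
    """
    attempt = 0
    while True:
        try:
            return execute()
        except Exception as error:
            if attempt >= retries or not is_retryable(error, patterns):
                raise
            attempt += 1
            time.sleep(delay_seconds)
```

With `retries=3`, a query is attempted at most four times in total: the initial run plus three retries. Matching on substrings of the message is the simplest option, but it only makes sense if the messages are stable across Spark/Databricks versions.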
Screenshots and log output
N/A
System information
The output of dbt --version:
The operating system you're using:
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
The output of python --version:
Python 3.9.10
Additional context
N/A