Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry model on connection errors #119

Closed
infused-kim opened this issue Feb 24, 2021 · 18 comments
Closed

Retry model on connection errors #119

infused-kim opened this issue Feb 24, 2021 · 18 comments

Comments

@infused-kim
Copy link

Hey Guys,

Is it possible to implement re-trying of queries if the query fails due to non-model related errors?

For example, sometimes Azure SQL fails with errors like this:

[2021-02-24 05:50:02,777] {dbt_rpc.py:83} INFO - Failed model `base_signups`: Database Error in model base_thrive_leads_signups (models/02_base/base_signups.sql)
  ('08S01', '[08S01] [Microsoft][ODBC Driver 17 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)')
  compiled SQL at target/run/my_project/models/02_base/base_signups.sql

This is particularly frustrating in daily, automated runs through airlfow. I have solved it for now by adding retry logic through airflow that re-runs only failed and skipped models, but perhaps it would be worthwhile solving this at the dbt-sqlserver level for everyone?

Thanks, kim

@mikaelene
Copy link
Collaborator

Interesting. I have never had issues with that. Sounds like something to put in dbt-core? @jtcohen6 ?

@jtcohen6
Copy link
Contributor

@mikaelene There is retry logic within the connection object for some adapters, notably dbt-bigquery. Here we define which errors are intermittent, unlikely to recur, and thereby "retryable". Here is where those errors are caught, handled, and retried, up to the number of retries (docs) the user has defined:

I don't know what TCP Provider: Error code 0x68 is, what would cause it, or why it comes up intermittently, but all of that would be good to sort out before trying to handle it.

@majidaldo
Copy link

i was looking for a 'timeout' to wait until a connection is established.

@dataders
Copy link
Collaborator

i was looking for a 'timeout' to wait until a connection is established.

@majidaldo can you tell me more about what you're looking for with a timeout?

@majidaldo
Copy link

i was looking for a 'timeout' to wait until a connection is established.

@majidaldo can you tell me more about what you're looking for with a timeout?

On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.

@dataders
Copy link
Collaborator

i was looking for a 'timeout' to wait until a connection is established.

@majidaldo can you tell me more about what you're looking for with a timeout?

On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.

great point! this is something we've been thinking about w/ azure sql serverless as well. do you mind opening a separate issue for that scenario? I believe your ask is different than @infused-kim's.

@majidaldo
Copy link

i was looking for a 'timeout' to wait until a connection is established.

@majidaldo can you tell me more about what you're looking for with a timeout?

On Azure SQL, you can have a db 'sleep'. It can take a few minutes to resume. In the meantime, connection attempts usually fail.

great point! this is something we've been thinking about w/ azure sql serverless as well. do you mind opening a separate issue for that scenario? I believe your ask is different than @infused-kim's.

#162

@yduan-polo
Copy link

yduan-polo commented Jun 29, 2022

We've seen some intermittent database connection error with Redshift, something like

connection to server at "redshift cluster domain name", port 5439 failed: timeout expired

If there is way to configure retry for a specified times, that would be great!

@dataders
Copy link
Collaborator

@yduan-polo I think you want to bring up your error as an issue on the dbt-redshift adapter repo!

@jtcohen6
Copy link
Contributor

I think we've got one for just the thing: dbt-labs/dbt-redshift#96

(though the fix would more likely come in dbt-labs/dbt-core#5022)

@nysthee
Copy link

nysthee commented Aug 22, 2022

We've been encountering regular transient network issues as well which would be solved by a retry.
Are there any plans to support this as a profile parameter (like in Postgres / redshift) or not?

@dataders
Copy link
Collaborator

We've been encountering regular transient network issues as well which would be solved by a retry. Are there any plans to support this as a profile parameter (like in Postgres / redshift) or not?

we could! can you share the error messages you're seeing, as well as the SQL Server product that you're using (on-prem, Azure SQL, etc)

@nysthee
Copy link

nysthee commented Aug 23, 2022

Hi
This is azure sql.

Encountered an error while running operation: Database Error 123 ('28000', "[28000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Login failed for user ''. (18456) (SQLDriverConnect)")

@dataders
Copy link
Collaborator

Hi This is azure sql.

are you using serverless Azure SQL that auto-pauses? while frustrating, this is behavior & message we expect to see. Normally, ~30-60s later the database is resumed and the error message goes away.

Would you like for dbt to retry a connection every few seconds until the error is resumed. Also, if you do for example have the wrong password, would you like dbt to assume that the error is transient and keep trying every few seconds to re-connect?

not trying to be critical, i'm just narrowing down the use case

@nysthee
Copy link

nysthee commented Aug 25, 2022 via email

@sdebruyn
Copy link
Member

sdebruyn commented Oct 7, 2022

The dbt-sqlserver 1.2.0 release includes an option retries in the target configuration which should mitigate these kinds of issues. Let me know if that doesn't work for you so that I can reopen this issue.

@sdebruyn sdebruyn closed this as completed Oct 7, 2022
@alittlesliceoftom
Copy link

For reference for others finding this issue.

Code: https://github.com/dbt-msft/dbt-sqlserver/blob/master/dbt/adapters/sqlserver/sql_server_connection_manager.py

Inherited by synapse: https://github.com/microsoft/dbt-synapse/blob/master/dbt/adapters/synapse/synapse_connection_manager.py

Documented here: https://docs.getdbt.com/docs/core/connect-data-platform/mssql-setup

Does anyone have guidance on recommended number of retries? I was going to default to 3, but is there a tradeoff here?

@ka-weihe
Copy link

ka-weihe commented Jul 2, 2024

Does it actually retry after the initial connection for you guys?
#507

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants