-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement retries in BQ adapter #1963
Conversation
Could use a hand with the commented out assertion that the retry handling correctly logs. Not sure if it has to do with the test environment, but I'm getting: I also noticed that since 0.14.2 or 0.14.3, there are a few extra hoops to jump through in initiating a BigqueryConnectionsManager (helpful for unit testing). Where before it was possible to initiate a connection manager with an empty dict, or a very simple credentials object, it seems like now code in query_headers breaks trying to access 'query_comment' through dot notation, possibly because no reasonable default is set (https://github.com/fishtown-analytics/dbt/blob/dev/0.15.1/core/dbt/adapters/base/query_headers.py#L95) Curious if there is a reasonable way to get back to easier connection manager creation, potentially making it possible to add unit tests for the connection manager. Tagging @beckjake since the query_headers code was written by him. |
Hey @kconvey - sorry, I've been away for a bit. I don't quite follow the problem here, but your mock isn't quite right. The object you pass to the connection manager
In our unit tests we tend to just use dicts and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this really great first pass @kconvey!
Couple of things:
-
I think retries like this are really well suited for a
decorator
. I don't have particularly informed thoughts about which retry lib would be good to use here, or if we should write this decorator ourselves, but I largely think it would be cleaner and clearer to decorate retryable methods than to pass around function closures like you've done here. We can do this because each of these methods is idempotent and, with the exception of actually affecting some change in the database, these methods don't have any side effects. More info on this approach here: https://www.calazan.com/retry-decorator-for-python-3/ -
I want to remove the
create_view
,create_bigquery_table
, andcreate_date_partitioned_table
methods from the BigQuery plugin. These are vestiges of a time before BigQuery supportedcreate table|view .. as ()
statements and column partitioning! I think we should revert the changes in these methods and explicitly not support retries. We can additionally add a deprecation warning to show that these methods should not be used anymore and will be removed in a future release (maybe in a separate PR). Let me know if you feel strongly that we should not do that. -
I don't think we actually want to enforce the
query_timeout
here -- the default is 300s which will cause a lot of BigQuery projects to start failing for no reason. I don't really buy that per-model timeouts are a good idea, and I don't think I'd be in favor of implementing them for other databases that dbt supports. Instead, I think timeouts like these are better handled by orchestration tools at the level of a dbt invocation. If there is sufficient interest in supporting per-model timeouts, I'd rather support it via a model config and not a profile config. Thetimeout_seconds
config as it exists today is pretty heavy-handed! So, I'd be in favor of retaining the previous behavior, inconsistent as it may be.
I just threw a lot at you here - let me know what you think about all of it :)
Co-Authored-By: Drew Banin <drew@fishtownanalytics.com>
I went ahead and implemented the small changes you suggested. I'm also comfortable deferring improving timeout for later. Happy to add a deprecation warning in a follow up PR (curious what form it would be best in: a simple comment, logger warning, or something else, but can sort that out later). I did want to push back on the larger suggestion to refactor this as a decorator (for now), although I had initially been thinking along the same lines before the current proposed implementation. The problem(s) I see with doing this as a retry decorator are that:
Exception handling seems like it should occur outside of retrying, when you're ready to handle the final exception after retrying. If retrying is done at a decorator level, I would think exception handling would then be another, outermost decorator to ensure it takes place after retrying. Doubling down on decorators seems less intelligible than the current function closure, and might require more complicated changes to exception handling. To me, it makes more sense to maintain the granularity / status quo of the current exception handler (which has its advantages), and get this feature in, deferring some cleanup of both exception handling and retrying for later. Curious what you think! |
@beckjake I guess I'm wondering if it is possible to add a default Based on my attempt to trace through this: -In test, what you're passing to adapter as config is Is there any reason the query_comment field can't default to |
@kconvey I buy that! I think you'll want to rebase this one against dev/0.15.1 :) Adding Jake to review and follow up on query headers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kconvey sorry, I never saw that question!
I suppose it's ok to add =None
there, although it seems a little funky - I don't think you're supposed to mock out Protocol
s!
I have an alternative suggestion that only involves changing the unit tests, which I think resolves the issue more completely. Let me know what you think.
I also had a couple suggestions for the unit tests that I found while I was making sure my suggestion wasn't crazy!
Pylint also has a number of complaints about indentation and assigning lambdas to things. I know it's clunky, but can you just appease the beast?
Clean up retries unit test's connection manager mocking Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
Tried to get all of the formatting changes, but may have missed some because we're using different linters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kicked off tests again, I also suggested the 3 changes that pylint is still failing over. We'll see how the bigquery tests go on azure, at least.
test/unit/test_bigquery_adapter.py
Outdated
# with self.assertLogs(logger.name) as logs: | ||
with self.assertRaises(DummyException): | ||
self.connections._retry_and_handle( | ||
"some sql", {'credentials': {'retries': 8}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a mock credentials object now, instead of a dict. Probably something like Mock(credentials=Mock(retries=8))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
test/unit/test_bigquery_adapter.py
Outdated
# self.assertIn( | ||
# 'WARNING:dbt:Retry attempt 1 of 8 after error: DummyException()', | ||
# logs.output) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this commented-out code? You can use pytest's stdout capture stuff if you can get it working in the tests instead, but otherwise I wouldn't bother too much about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed it.
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming this is the last reason tests fail, this will be good to go.
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the integration tests are still failing
Needed to add return statements when using defs instead of lambdas. Oops. Hopefully this is passing now. Thanks for bearing with me! |
/azp run |
1 similar comment
/azp run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's up with azure here, but if this last try doesn't fix it I'm just going to merge this anyway.
Thanks for your contribution @kconvey, I'm excited to have this in dbt finally!
Uses the google.api_core.retry library to retry exceptions within the context of the exception handler, raising an exception to the handler once the error is unretryable, or configured/default retry quota has been exhausted.
As a side-effect of this, timeout is now correctly enforced.
#1579