Implement retries in BQ adapter #1963

Merged
merged 26 commits into from
Dec 12, 2019

kconvey
Contributor

@kconvey kconvey commented Nov 27, 2019

Uses the google.api_core.retry library to retry exceptions within the context of the exception handler, raising an exception to the handler once the error is unretryable or the configured (or default) retry quota has been exhausted.

As a side effect of this, the timeout is now correctly enforced.

#1579
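
For readers following along, here is a minimal sketch of the pattern described above: retry inside the exception handler, and let the handler see the error only once it is unretryable or the retry quota is used up. It deliberately uses only the standard library as a stand-in for google.api_core.retry, and every name in it (TransientError, DatabaseError, exception_handler, call_with_retries, retry_and_handle, make_flaky) is hypothetical rather than taken from the PR.

```python
import time
from contextlib import contextmanager


class TransientError(Exception):
    """Hypothetical stand-in for a retryable client error."""


class DatabaseError(Exception):
    """Hypothetical stand-in for the error the handler raises to callers."""


@contextmanager
def exception_handler(sql):
    # Translate whatever finally escapes the retry loop into one error type.
    try:
        yield
    except Exception as exc:
        raise DatabaseError(f"{type(exc).__name__} while running {sql!r}") from exc


def call_with_retries(fn, is_retryable, max_retries, delay=0.0):
    """Call fn, retrying while is_retryable(exc) holds and quota remains."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if not is_retryable(exc) or attempt >= max_retries:
                raise  # unretryable or quota exhausted: hand it to the handler
            attempt += 1
            time.sleep(delay)


def retry_and_handle(sql, fn, max_retries=1):
    # Retrying happens inside the handler, so the handler only ever sees
    # the final exception.
    with exception_handler(sql):
        return call_with_retries(
            fn, lambda exc: isinstance(exc, TransientError), max_retries
        )


def make_flaky(failures):
    """Return a callable that fails `failures` times, then returns 'ok'."""
    state = {"left": failures}

    def call():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("try again")
        return "ok"

    return call
```

With this sketch, retry_and_handle("select 1", make_flaky(2), max_retries=5) returns "ok", while a callable that keeps failing past the quota surfaces as a DatabaseError whose __cause__ is the original TransientError.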

@cla-bot cla-bot bot added the cla:yes label Nov 27, 2019
@kconvey
Contributor Author

kconvey commented Dec 2, 2019

Could use a hand with the commented-out assertion that the retry handling logs correctly. Not sure if it has to do with the test environment, but I'm getting:
E AssertionError: no logs of level INFO or higher triggered on dbt
These unit tests are passing for me locally.

I also noticed that since 0.14.2 or 0.14.3, there are a few extra hoops to jump through when instantiating a BigQueryConnectionManager (helpful for unit testing). Where before it was possible to instantiate a connection manager with an empty dict or a very simple credentials object, it seems that code in query_headers now breaks when it tries to access 'query_comment' through dot notation, possibly because no reasonable default is set (https://github.com/fishtown-analytics/dbt/blob/dev/0.15.1/core/dbt/adapters/base/query_headers.py#L95). Is there a reasonable way to get back to easier connection manager creation, which could make it possible to add unit tests for the connection manager?

Tagging @beckjake since the query_headers code was written by him.

@beckjake
Contributor

beckjake commented Dec 3, 2019

Hey @kconvey - sorry, I've been away for a bit. I don't quite follow the problem here, but your mock isn't quite right. The object you pass to the connection manager __init__ should have at least two attributes - credentials and query_comment. Instead, you are passing in a credentials mock and giving it a query_comment attribute. So this might work better:

profile = Mock(credentials=credentials, query_comment=None)
self.connections = BigQueryConnectionManager(profile)

In our unit tests we tend to just use dicts and .from_dict() methods on these things to do this - see BaseTestBigQueryAdapter.get_adapter, for example.
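
To make the shape of that mock concrete, here is a small self-contained sketch. FakeConnectionManager is a hypothetical stand-in that only reads the two attributes mentioned above; it is not dbt's BigQueryConnectionManager, and the retries field on the credentials mock is just an illustrative value.

```python
from unittest.mock import Mock


class FakeConnectionManager:
    """Hypothetical stand-in that reads two attributes from its profile."""

    def __init__(self, profile):
        self.credentials = profile.credentials
        self.query_comment = profile.query_comment


credentials = Mock(retries=1)  # illustrative credentials object

# The profile owns both attributes; credentials is one of them.
profile = Mock(credentials=credentials, query_comment=None)
manager = FakeConnectionManager(profile)

# Passing the credentials mock itself (with query_comment bolted on) does not
# fail loudly: Mock auto-creates a child for the missing .credentials attribute.
misplaced = Mock(retries=1, query_comment=None)
wrong_manager = FakeConnectionManager(misplaced)
```

In the first case manager.credentials is the credentials object that was configured. In the second, wrong_manager.credentials is an auto-created Mock, which can hide the mistake until something inspects it.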

Contributor

@drewbanin drewbanin left a comment

Thanks for this really great first pass @kconvey!

Couple of things:

  1. I think retries like this are really well suited for a decorator. I don't have particularly informed thoughts about which retry lib would be good to use here, or if we should write this decorator ourselves, but I largely think it would be cleaner and clearer to decorate retryable methods than to pass around function closures like you've done here. We can do this because each of these methods is idempotent and, with the exception of actually effecting some change in the database, these methods don't have any side effects. More info on this approach here: https://www.calazan.com/retry-decorator-for-python-3/

  2. I want to remove the create_view, create_bigquery_table, and create_date_partitioned_table methods from the BigQuery plugin. These are vestiges of a time before BigQuery supported create table|view .. as () statements and column partitioning! I think we should revert the changes in these methods and explicitly not support retries. We can additionally add a deprecation warning to show that these methods should not be used anymore and will be removed in a future release (maybe in a separate PR). Let me know if you feel strongly that we should not do that.

  3. I don't think we actually want to enforce the query_timeout here -- the default is 300s which will cause a lot of BigQuery projects to start failing for no reason. I don't really buy that per-model timeouts are a good idea, and I don't think I'd be in favor of implementing them for other databases that dbt supports. Instead, I think timeouts like these are better handled by orchestration tools at the level of a dbt invocation. If there is sufficient interest in supporting per-model timeouts, I'd rather support it via a model config and not a profile config. The timeout_seconds config as it exists today is pretty heavy-handed! So, I'd be in favor of retaining the previous behavior, inconsistent as it may be.

I just threw a lot at you here - let me know what you think about all of it :)
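
As a rough illustration of the decorator approach in point 1, here is a hand-written sketch. It is not the code from the linked article or from any particular retry library, and the names retry, TransientError, and run_query are assumptions made for the example.

```python
import functools
import time


class TransientError(Exception):
    """Hypothetical retryable error."""


def retry(exceptions, tries=3, delay=0.0):
    """Retry the decorated function when it raises one of `exceptions`."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: let the caller see it
                    time.sleep(delay)

        return wrapper

    return decorator


attempts = []


@retry(TransientError, tries=3)
def run_query(sql):
    # Fails twice, then succeeds, to exercise the retry path.
    attempts.append(sql)
    if len(attempts) < 3:
        raise TransientError("transient failure")
    return f"ran {sql}"
```

Note that the whole decorated function body runs again on each attempt, which is only safe when the function is idempotent, as the comment above points out.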

Co-Authored-By: Drew Banin <drew@fishtownanalytics.com>
@kconvey
Contributor Author

kconvey commented Dec 3, 2019

@drewbanin

I went ahead and implemented the small changes you suggested. I'm also comfortable deferring the timeout improvements until later. Happy to add a deprecation warning in a follow-up PR (curious what form it would be best in: a simple comment, logger warning, or something else, but can sort that out later).

I did want to push back on the larger suggestion to refactor this as a decorator (for now), although I had initially been thinking along the same lines before the current proposed implementation. The problem(s) I see with doing this as a retry decorator are that:

  • You're retrying the entire method, which isn't necessary since the errors you want to retry are just coming from the code that touches the bigquery client & polls for results. The proposed solution is more granular in what it retries, which wastes less time retrying unrelated code (acquiring connection, etc.), and makes it more clear what you are retrying. This granularity is already the status quo by only running part of these methods within the exception handler.

  • You're retrying based on the exception that gets raised by the exception handler, not the bq client. This adds difficulty in determining whether the error raised was a retryable error before being filtered by the exception handler, while still having to raise the error-handled exception. In general this doesn't seem like an intuitive order in which to do retrying and error handling.

Exception handling seems like it should occur outside of retrying, when you're ready to handle the final exception after retrying. If retrying is done at a decorator level, I would think exception handling would then be another, outermost decorator to ensure it takes place after retrying. Doubling down on decorators seems less intelligible than the current function closure, and might require more complicated changes to exception handling.

To me, it makes more sense to maintain the granularity / status quo of the current exception handler (which has its advantages), and get this feature in, deferring some cleanup of both exception handling and retrying for later.

Curious what you think!
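
The ordering concern in the second bullet can be sketched in a few lines. This is only an illustration under assumed names (TransientError, HandledError, handle, retry, make_flaky); it does not reproduce dbt's exception handler or the BigQuery client's error types.

```python
class TransientError(Exception):
    """Hypothetical retryable error raised by the client."""


class HandledError(Exception):
    """Hypothetical error type produced by the exception handler."""


def is_retryable(exc):
    return isinstance(exc, TransientError)


def handle(fn):
    """Run fn, translating any failure the way an exception handler might."""
    try:
        return fn()
    except Exception as exc:
        raise HandledError(str(exc)) from exc


def retry(fn, tries):
    """Call fn up to `tries` times, retrying only retryable errors."""
    for attempt in range(1, tries + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == tries or not is_retryable(exc):
                raise


def make_flaky(failures):
    """Return a callable that fails `failures` times, then returns 'ok'."""
    state = {"left": failures}

    def call():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("backend hiccup")
        return "ok"

    return call


# Retry inside the handler: the predicate sees the raw TransientError.
first = make_flaky(2)
inner_result = handle(lambda: retry(first, tries=3))

# Retry outside the handler: the predicate only sees HandledError and gives up.
second = make_flaky(2)
try:
    outer_result = retry(lambda: handle(second), tries=3)
except HandledError:
    outer_result = "gave up after first failure"
```

Here inner_result is "ok", while the outer ordering stops at the first failure because the translated error no longer matches the retry predicate. A decorator could work around this by inspecting the original cause, at the cost of the extra complexity described above.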

@kconvey
Contributor Author

kconvey commented Dec 3, 2019

@beckjake I guess I'm wondering if it is possible to add a default None to query_comment somewhere like https://github.com/fishtown-analytics/dbt/blob/e51c942e91a94936f68f2965963d3b46f1257658/core/dbt/contracts/connection.py#L136

Based on my attempt to trace through this:
- Adapter gets init'd with a config
- That config gets passed to the ConnectionManager, where it becomes the profile field

- In test, what you're passing to adapter as config is config_from_parts_or_dicts()
- This config / profile has a credentials field and a query_comment field

Is there any reason the query_comment field can't default to None somewhere? It isn't clear to me where the contract for the config / profile specifies that it needs a query_comment field.
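
To illustrate what a None default would buy in general terms, here is a sketch using a plain dataclass. ProfileWithDefault, BareProfile, and comment_for are hypothetical names for this example only; this is not dbt's actual contract type or the code at the linked line.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ProfileWithDefault:
    """Hypothetical profile type with a defaulted query_comment."""

    credentials: Any
    query_comment: Optional[str] = None


class BareProfile:
    """Hypothetical minimal object that only carries credentials."""

    def __init__(self, credentials):
        self.credentials = credentials


def comment_for(profile):
    # Dot access: works only if the attribute exists on the object.
    return profile.query_comment


with_default = comment_for(ProfileWithDefault(credentials={"retries": 1}))

try:
    comment_for(BareProfile(credentials={"retries": 1}))
    bare_error = None
except AttributeError as exc:
    bare_error = exc
```

With the default in place, callers that never set query_comment get None back; without it, the same dot access raises AttributeError.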

@kconvey kconvey requested a review from drewbanin December 3, 2019 23:09
@dbt-labs dbt-labs deleted a comment from Sherm4nLC Dec 4, 2019
@drewbanin drewbanin requested review from beckjake and removed request for drewbanin December 9, 2019 19:02
@drewbanin
Contributor

drewbanin commented Dec 9, 2019

@kconvey I buy that! I think you'll want to rebase this one against dev/0.15.1 :)

Adding Jake to review and follow up on query headers

Contributor

@beckjake beckjake left a comment

@kconvey sorry, I never saw that question!

I suppose it's ok to add =None there, although it seems a little funky - I don't think you're supposed to mock out Protocols!

I have an alternative suggestion that only involves changing the unit tests, which I think resolves the issue more completely. Let me know what you think.

I also had a couple suggestions for the unit tests that I found while I was making sure my suggestion wasn't crazy!

Pylint also has a number of complaints about indentation and assigning lambdas to things. I know it's clunky, but can you just appease the beast?

@kconvey kconvey changed the base branch from dev/louisa-may-alcott to dev/0.15.1 December 9, 2019 21:48
kconvey and others added 2 commits December 9, 2019 17:09
Clean up retries unit test's connection manager mocking

Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
@kconvey
Contributor Author

kconvey commented Dec 9, 2019

Tried to get all of the formatting changes, but may have missed some because we're using different linters.

@kconvey kconvey requested a review from beckjake December 9, 2019 23:48
Contributor

@beckjake beckjake left a comment

I've kicked off tests again, and I also suggested the 3 changes that pylint is still failing over. We'll see how the bigquery tests go on azure, at least.

        # with self.assertLogs(logger.name) as logs:
        with self.assertRaises(DummyException):
            self.connections._retry_and_handle(
                "some sql", {'credentials': {'retries': 8}},
Contributor

@beckjake beckjake Dec 10, 2019

This should be a mock credentials object now, instead of a dict. Probably something like Mock(credentials=Mock(retries=8))

Contributor Author

Good catch.
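
A quick sketch of why the dict stops working once the code reads its argument with attribute access. retries_for is a hypothetical helper written for this example; it only mimics the access pattern and is not the PR's _retry_and_handle.

```python
from unittest.mock import Mock


def retries_for(conn):
    # Hypothetical helper: reads the setting via attributes, not keys.
    return conn.credentials.retries


as_mock = Mock(credentials=Mock(retries=8))
as_dict = {'credentials': {'retries': 8}}

mock_retries = retries_for(as_mock)

try:
    retries_for(as_dict)
    dict_error = None
except AttributeError as exc:
    # A dict exposes its contents through keys, so .credentials is missing.
    dict_error = exc
```

The nested Mock yields 8, while the dict raises AttributeError on the first attribute lookup.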

Comment on lines 335 to 338
        # self.assertIn(
        #     'WARNING:dbt:Retry attempt 1 of 8 after error: DummyException()',
        #     logs.output)

Contributor

Can you remove this commented-out code? You can use pytest's stdout capture stuff if you can get it working in the tests instead, but otherwise I wouldn't bother too much about it.

Contributor Author

Removed it.
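
For reference, this is roughly what a passing version of that kind of assertion looks like when the code under test logs through the standard library's logging module. The logger name example_retry and the log_retry helper are hypothetical, and the thread does not establish why the original assertion saw no records, so treat this as a sketch of the assertLogs mechanics only.

```python
import logging
import unittest

logger = logging.getLogger("example_retry")  # hypothetical logger name


def log_retry(attempt, total, error):
    # Hypothetical helper that emits one warning per retry attempt.
    logger.warning(
        "Retry attempt %s of %s after error: %r", attempt, total, error
    )


class RetryLoggingTest(unittest.TestCase):
    def test_logs_retry(self):
        # assertLogs captures records sent to the named stdlib logger.
        with self.assertLogs(logger.name, level="WARNING") as logs:
            log_retry(1, 8, ValueError("boom"))
        self.assertIn(
            "WARNING:example_retry:Retry attempt 1 of 8 "
            "after error: ValueError('boom')",
            logs.output,
        )
```

assertLogs only sees records that reach the named standard-library logger, so it raises the "no logs ... triggered" error whenever the messages are emitted some other way.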

kconvey and others added 4 commits December 10, 2019 12:55
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
@kconvey kconvey requested a review from beckjake December 11, 2019 15:55
Contributor

@beckjake beckjake left a comment

Assuming this is the last reason tests fail, this will be good to go.

Co-Authored-By: Jacob Beck <beckjake@users.noreply.github.com>
@kconvey kconvey requested a review from beckjake December 11, 2019 17:47
Contributor

@beckjake beckjake left a comment

It looks like the integration tests are still failing

@kconvey kconvey requested a review from beckjake December 11, 2019 20:22
@kconvey
Contributor Author

kconvey commented Dec 11, 2019

Needed to add return statements when using defs instead of lambdas. Oops. Hopefully this is passing now. Thanks for bearing with me!
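
The slip described above is easy to reproduce in isolation. In this sketch, compute and the three wrappers are hypothetical names; the point is only that a lambda returns its expression implicitly while a def needs an explicit return.

```python
def compute():
    return 42


# A lambda returns the value of its expression implicitly.
as_lambda = lambda: compute()  # the form pylint objects to


def as_def_missing_return():
    compute()  # value is discarded, so the caller gets None


def as_def():
    return compute()  # explicit return restores the lambda's behaviour
```

Calling as_def_missing_return() yields None, so any caller that expects the wrapped call's result silently receives nothing.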

@beckjake
Contributor

/azp run

Contributor

@beckjake beckjake left a comment

I'm not sure what's up with azure here, but if this last try doesn't fix it I'm just going to merge this anyway.
Thanks for your contribution @kconvey, I'm excited to have this in dbt finally!
