Fix bigquery case sensitive caching issue (#1810) #1881

Merged: beckjake merged 5 commits into dev/louisa-may-alcott from fix/bigquery-case-sensitive on Nov 4, 2019

@beckjake beckjake commented Oct 31, 2019

Fixes #1810
Fixes #1887

Refactor Relations vs. InformationSchemas to handle BigQuery better.
Fix a bug where BigQuery cached uppercase schema names incorrectly.
Use the exists_ok, not_found_ok, and delete_contents flags on BigQuery dataset operations instead of implementing them manually.

@cla-bot cla-bot bot added the cla:yes label Oct 31, 2019
@beckjake beckjake changed the title from "Fix/bigquery case sensitive" to "Fix bigquery case sensitive caching issue (#1810)" Oct 31, 2019
@beckjake beckjake force-pushed the fix/bigquery-case-sensitive branch 2 times, most recently from 1e9978c to 13c68c1 Compare November 1, 2019 17:01
@beckjake beckjake requested a review from drewbanin November 1, 2019 17:26

@drewbanin drewbanin left a comment

This looks great! In testing, I noticed that mixed-case Relation identifiers are also not correctly identified in the cache lookup. Can we extend the logic here to also account for those? I tested this with a model named models/cAsE.sql
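
One way to picture the request above: normalize every part of a relation's name before it is used as a cache key, so that a model such as models/cAsE.sql and its lowercased form land on the same entry. The sketch below is an illustration only; `RelationKey`, `make_key`, and `RelationCache` are hypothetical names, not dbt's actual cache classes.

```python
from typing import NamedTuple, Optional, Set


class RelationKey(NamedTuple):
    """Hypothetical cache key: every part is stored lowercased."""
    database: Optional[str]
    schema: Optional[str]
    identifier: Optional[str]


def _lower(value: Optional[str]) -> Optional[str]:
    # Leave missing parts as None; lowercase everything else.
    return None if value is None else value.lower()


def make_key(database: Optional[str], schema: Optional[str],
             identifier: Optional[str]) -> RelationKey:
    return RelationKey(_lower(database), _lower(schema), _lower(identifier))


class RelationCache:
    """Toy cache that compares relation names case-insensitively."""

    def __init__(self) -> None:
        self._keys: Set[RelationKey] = set()

    def add(self, database, schema, identifier) -> None:
        self._keys.add(make_key(database, schema, identifier))

    def has(self, database, schema, identifier) -> bool:
        return make_key(database, schema, identifier) in self._keys
```

With this shape, a relation added as `("proj", "MySchema", "cAsE")` is found again when looked up as `("PROJ", "myschema", "case")`, because both sides go through the same normalization.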

def check_schema_exists(self, database, schema):
    return super().check_schema_exists(database, schema)
@available.parse(lambda *a, **k: False)
def check_schema_exists(self, database: str, schema: str) -> bool:

this is super clever!
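
For readers unfamiliar with the pattern in the snippet above: the decorator appears to register a stand-in that is used while the project is only being parsed, so no warehouse call happens until execution. The following is a rough, self-contained illustration of that idea. `parse_stub`, the `parsing` attribute, and `Adapter` are hypothetical names, and this is not how dbt's `available.parse` is actually implemented.

```python
import functools
from typing import Any, Callable, Iterable


def parse_stub(replacement: Callable[..., Any]) -> Callable:
    """Hypothetical decorator: call `replacement` while the owner is parsing."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(self, *args: Any, **kwargs: Any) -> Any:
            if self.parsing:
                # Parse time: return a cheap placeholder, skip the real work.
                return replacement(self, *args, **kwargs)
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class Adapter:
    """Toy adapter; a set of names stands in for the warehouse."""

    def __init__(self, existing_schemas: Iterable[str]) -> None:
        self.parsing = True
        self._schemas = {name.lower() for name in existing_schemas}

    @parse_stub(lambda *a, **k: False)
    def check_schema_exists(self, database: str, schema: str) -> bool:
        return schema.lower() in self._schemas
```

While `parsing` is true the method always answers `False` without consulting the stored schemas; once it is set to false, the real lookup runs.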

@classmethod
def get_include_policy(cls, relation, information_schema_view):
    schema = True
    if information_schema_view in ('SCHEMATA', 'SCHEMATA_OPTIONS', None):

this is so good!
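
As background for the snippet above: in BigQuery, some INFORMATION_SCHEMA views (such as SCHEMATA) are addressed at the project level, while others (such as TABLES) are addressed through a dataset, so whether the schema belongs in the rendered path depends on the view. A minimal sketch of that decision follows, assuming the truncated branch turns the schema off. `include_schema` and `render_information_schema` are hypothetical helpers, not the PR's code, and quoting is omitted for brevity.

```python
from typing import Optional

# Views assumed to be addressed without a dataset; None means "no view given".
PROJECT_LEVEL_VIEWS = ('SCHEMATA', 'SCHEMATA_OPTIONS', None)


def include_schema(view: Optional[str]) -> bool:
    """Return True when the dataset name should appear in the path."""
    return view not in PROJECT_LEVEL_VIEWS


def render_information_schema(project: str, dataset: str,
                              view: Optional[str] = None) -> str:
    parts = [project]
    if include_schema(view):
        parts.append(dataset)
    parts.append('INFORMATION_SCHEMA')
    if view is not None:
        parts.append(view)
    return '.'.join(parts)
```

Under these assumptions, `SCHEMATA` renders as `project.INFORMATION_SCHEMA.SCHEMATA`, while `TABLES` renders as `project.dataset.INFORMATION_SCHEMA.TABLES`.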

@beckjake beckjake force-pushed the fix/bigquery-case-sensitive branch 2 times, most recently from adfed29 to d5c7b0c Compare November 4, 2019 16:03

@drewbanin drewbanin left a comment

One minor comment (let me know what you think) as I was scrolling through the code, but overall this looks great to me. Nice work!

@@ -294,7 +294,14 @@ def drop_dataset(self, database, schema):
        client = conn.handle

        with self.exception_handler('drop dataset'):
            for table in client.list_tables(dataset):
                try:

Can we replace this with the following, if the version of google-cloud-bigquery that we use supports it?

client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)

If not, we can tackle it in a subsequent PR / release.

via the bq docs
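
A minimal sketch of what that replacement could look like as a standalone helper. `drop_dataset` here is a hypothetical free function, not the adapter method from the PR; `client` is assumed to be a `google.cloud.bigquery.Client` (or any object exposing the same `delete_dataset` method), and the two flags are only accepted by library versions that support them.

```python
def drop_dataset(client, dataset_ref) -> None:
    """Drop a dataset and its contents; do nothing if it is already gone.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    """
    client.delete_dataset(
        dataset_ref,
        delete_contents=True,  # also delete the tables inside the dataset
        not_found_ok=True,     # a missing dataset is not treated as an error
    )
```

Compared with listing the tables and deleting them one at a time, this leaves both the cleanup and the "already deleted" case to the client library.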

Contributor Author

It looks like we can set a minimum of 1.15.0 and be sure we have that flag available - it's hard to find any information about earlier versions.
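
To make the version floor concrete, here is a small sketch of comparing a version string against the 1.15.0 minimum mentioned above. `meets_minimum` is a hypothetical helper for illustration; in practice the floor would more likely be expressed as a version specifier in the package's dependency list.

```python
import re
from typing import Tuple

MINIMUM: Tuple[int, int, int] = (1, 15, 0)  # floor discussed in this thread


def meets_minimum(version: str,
                  minimum: Tuple[int, int, int] = MINIMUM) -> bool:
    """Compare the leading major.minor.patch of `version` numerically."""
    match = re.match(r'(\d+)\.(\d+)\.(\d+)', version)
    if match is None:
        raise ValueError(f'unrecognized version string: {version!r}')
    found = tuple(int(part) for part in match.groups())
    return found >= minimum
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.9.0" sorts after "1.15.0", which would wrongly pass the check.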

@drewbanin drewbanin Nov 4, 2019

ok - let's make that change in this PR then.

@beckjake beckjake requested a review from drewbanin November 4, 2019 18:03

@drewbanin drewbanin left a comment

:shipit:

Jacob Beck added 5 commits November 4, 2019 12:23
Implement more things via macros
Refactor Relations vs InformationSchemas to handle BQ better
Fix a bug where bigquery cached uppercase schema names wrong
 - by using information_schema this just goes away :)
@beckjake beckjake force-pushed the fix/bigquery-case-sensitive branch from 1ea3cd7 to 670c26b Compare November 4, 2019 19:23

beckjake commented Nov 4, 2019

I had to rebase onto louisa-may-alcott and drop the setup.py changes - no actual code changes - because the query comments branch merged and it also had that change.

@beckjake beckjake force-pushed the fix/bigquery-case-sensitive branch from ffcf7af to 670c26b Compare November 4, 2019 20:41
@beckjake beckjake merged commit 31ca9a1 into dev/louisa-may-alcott Nov 4, 2019
@beckjake beckjake deleted the fix/bigquery-case-sensitive branch November 4, 2019 20:41

Successfully merging this pull request may close these issues.

Use client.delete_dataset on BigQuery to drop datasets
Make BigQuery cache lookups case-insensitive
2 participants