-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bigquery case sensitive caching issue (#1810) #1881
Conversation
1e9978c
to
13c68c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! In testing, I noticed that mixed-case Relation identifiers are also not correctly identified in the cache lookup. Can we extend the logic here to also account for those? I tested this with a model named models/cAsE.sql
def check_schema_exists(self, database, schema): | ||
return super().check_schema_exists(database, schema) | ||
@available.parse(lambda *a, **k: False) | ||
def check_schema_exists(self, database: str, schema: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is super clever!
@classmethod | ||
def get_include_policy(cls, relation, information_schema_view): | ||
schema = True | ||
if information_schema_view in ('SCHEMATA', 'SCHEMATA_OPTIONS', None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so good!
adfed29
to
d5c7b0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor comment (let me know what you think) as I was scrolling through the code, but overall this looks great to me. Nice work!
@@ -294,7 +294,14 @@ def drop_dataset(self, database, schema): | |||
client = conn.handle | |||
|
|||
with self.exception_handler('drop dataset'): | |||
for table in client.list_tables(dataset): | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we replace this with:
client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)
If the version of google-cloud-bigquery
that we use supports it? If not, we can tackle it in a subsequent PR / release.
via the bq docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we can set a minimum of 1.15.0
and be sure we have that flag available - it's hard to find any information about earlier versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - let's make that change in this PR then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implement more things via macros Refactor Relations vs InformationSchemas to handle BQ better Fix a bug where bigquery cached uppercase schema names wrong - by using information_schema this just goes away :)
1ea3cd7
to
670c26b
Compare
I had to rebase onto louisa-may-alcott and drop the |
ffcf7af
to
670c26b
Compare
Fixes #1810
Fixes #1887
Refactor Relations vs InformationSchemas to handle BQ better
Fix a bug where bigquery cached uppercase schema names wrong.
Use
exists_ok
,not_found_ok
, anddelete_contents
flags on bigquery dataset operations instead of implementing them manually.