Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse query comment and use as bigquery job labels. #3145

Merged
merged 10 commits into from
Mar 22, 2021
1 change: 1 addition & 0 deletions core/dbt/contracts/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,7 @@ def to_target_dict(self):
class QueryComment(dbtClassMixin):
comment: str = DEFAULT_QUERY_COMMENT
append: bool = False
job_label: bool = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config is query-comment, rather than query_comment (docs). Not exactly sure why we did this, but I guess kebab casing is common in dbt_project.yml / project-level configs. I think this should be job-label (instead of job_label) for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I switched the base class to HyphenatedDbtClassMixin.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I may not understand what this base class does, since that change broke tests that I thought were unrelated. Reverted for now, but let me know if you have thoughts about the right way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look at this, and I think I found a subtle bug in the casing logic. Should be fixed now.



class AdapterRequiredConfig(HasCredentials, Protocol):
Expand Down
33 changes: 29 additions & 4 deletions plugins/bigquery/dbt/adapters/bigquery/connections.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
import json
import re
from contextlib import contextmanager
from dataclasses import dataclass
from functools import lru_cache
Expand Down Expand Up @@ -305,12 +307,24 @@ def raw_execute(self, sql, fetch=False, *, use_legacy_sql=False):

logger.debug('On {}: {}', conn.name, sql)

job_params = {'use_legacy_sql': use_legacy_sql}
labels = {}

if active_user:
job_params['labels'] = {
'dbt_invocation_id': active_user.invocation_id
}
labels['dbt_invocation_id'] = active_user.invocation_id

if self.profile.query_comment.job_label:
try:
comment_labels = json.loads(
self.query_header.comment.query_comment
)
labels.update({
_sanitize_label(key): _sanitize_label(str(value))
for key, value in comment_labels.items()
})
except (TypeError, ValueError):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the comment isn't valid json, should we still include it as a single job label (called query_comment), truncated to 128 characters if need be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I changed it.


job_params = {'use_legacy_sql': use_legacy_sql, 'labels': labels}

priority = conn.credentials.priority
if priority == Priority.Batch:
Expand Down Expand Up @@ -573,3 +587,14 @@ def _is_retryable(error):
e['reason'] == 'rateLimitExceeded' for e in error.errors):
return True
return False


_SANITIZE_LABEL_PATTERN = re.compile(r"[^a-z0-9_-]")


def _sanitize_label(value: str, max_length: int = 63) -> str:
"""Return a legal value for a BigQuery label."""
value = value.lower()
value = _SANITIZE_LABEL_PATTERN.sub("_", value)
value = value[: max_length - 1]
return value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anecdotally, this has the effect of appending an extra underscore at the end of my non-dict comment, e.g. comment: whatever in dbt_project.yml becomes query_comment: whatever_ in BigQuery, and comment: something else becomes query_comment: something_else_. Is that intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this is happening--there may be something upstream of this code adding trailing whitespace, because I don't get that behavior testing the function directly. I tried adding a strip(). Does that help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the strip() fixed it!

Screen Shot 2021-03-09 at 10 39 33 AM

(The numerous _ in the top example are all special characters.)