Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inject query comments (#1643) #1864

Merged
merged 8 commits into from
Nov 4, 2019
Merged

Conversation

beckjake
Copy link
Contributor

@beckjake beckjake commented Oct 24, 2019

Fixes #1643

In a user's dbt_project.yml, a new string field is accepted: query-comment. The string is used as the body of a macro that is evaluated with the following context:

  • dbt_version
  • env_var
  • modules
  • run_started_at
  • invocation_id
  • return
  • fromjson
  • tojson
  • log
  • var
  • target
  • all the macros that would be available in a model (even ones that will not work with the given context)!

It will be provided with two arguments:

  • connection_name, a string
  • node, which will either be null or a compiled/parsed node object.

The parsed macro is not available in any other context, and has no access to any macros.

Cleaned up contexts so things have a sensible inheritance pattern

  • it's probably not perfect but at least it's defined

query generator is an attribute on the connection manager

  • has a thread-local comment str
  • when acquiring a connection, set the comment str appropriately
    new 'connection_for' context manager: like connection_named, except also use the node to set the query string
    connection_named sets the query string using the connection name only.
  • has a special context

@cla-bot cla-bot bot added the cla:yes label Oct 24, 2019
@beckjake beckjake force-pushed the feature/query-comments branch 4 times, most recently from 82bd794 to fdef2f3 Compare October 25, 2019 16:10
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! The core query comment feature works great, and I feel so good about the changes you made to how the context is built up. Super nice work!

I have a couple of comments here, and the big one is around moving this config from profiles.yml to dbt_project.yml. My instinct is that this isn't a huge change, but definitely let me know if it's more involved than I'm picturing.

from dbt.contracts.graph.compiled import CompileResultNode


default_query_comment = '''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we push this into the global project? Or is that a bad idea?

Also, let's make the default super simple. Can we just include:

  • app='dbt'
  • dbt_version
  • node_id=node.unique_id
  • target_name=target.name

These other fields are going to be really helpful in specific use-cases, but I don't think they need to be provided in the default implementation.

Copy link
Contributor Author

@beckjake beckjake Oct 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we push this into the global project? Or is that a bad idea?

I think we can realistically do this, or we can move it to dbt_project.yml, but not really both.

self.query_comment: str = initial

def add(self, sql: str) -> str:
# Make sure there are no trailing newlines.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could equivalently place this text inside of a block comment (/* ... */) to remove this logic. I'm not super concerned that someone would provide a closing block comment in one of these comment attributes. I think a block comment makes more sense here because the resulting json block in the SQL query will still be machine-readable.


class QueryStringSetter:
def __init__(self, config: HasCredentials):
if config.config.query_comment is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we grab the query comment string from a config called query_comment defined in the dbt_project.yml file instead of profiles.yml? My thinking here is twofold:

  • I imagine this is something users will want to version control for all dbt users (both human and machine).
  • If a user has multiple profiles defined in their dbt_project.yml file, then they'd be unable to use different query_comment configs for each profile.

@@ -255,7 +255,10 @@ def from_raw_profile_info(cls, raw_profile, profile_name, cli_vars,
target could not be found
:returns Profile: The new Profile object.
"""
# user_cfg is not rendered since it only contains booleans.
# user_cfg is not rendered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't totally follow what's going on here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  • user_cfg doesn't just contain booleans, so I fixed the comment
  • this method is called directly by some unit tests, so it needs to try to extract the config from the raw_profile to get valid results.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this have changed when we moved the query comment from the profile into the project? If this is fine, then it does not offend me, just want to make sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it was wrong before! user_cfg still contains non-booleans (the printer width, for example).

run_started_at = None
invocation_id = None

if dbt.tracking.active_user is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could have sworn we made a change a while back to include invocation_id and run_started_at even if users have opted out of anonymous event tracking. Is it a substantial change to make that the case here? I imagine invocation_id and run_started_at will both be super useful in the context of query comments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did, and we do. active_user is only None early on during initialization.


def get_target(self) -> Dict[str, Any]:
target = dict(
self.config.credentials.connection_info(with_aliases=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super smart to use connection_info here - nice call

@@ -45,7 +45,8 @@ def type(self):
return 'bigquery'

def _connection_keys(self):
return ('method', 'database', 'schema', 'location')
return ('method', 'database', 'schema', 'location', 'priority',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@drewbanin
Copy link
Contributor

drewbanin commented Oct 29, 2019 via email

@beckjake beckjake force-pushed the feature/query-comments branch from fdef2f3 to ae58199 Compare October 29, 2019 18:17
@beckjake beckjake requested a review from drewbanin October 29, 2019 18:47
@beckjake beckjake force-pushed the feature/query-comments branch 2 times, most recently from 3c377b3 to e6daba3 Compare October 30, 2019 17:18
Jacob Beck added 7 commits November 4, 2019 09:01
Make a fake "macro" that we parse specially with a single global context
Macro takes an argument (the node, may be none)
Users supply the text of the macro in their 'user_config' under a new 'query_comment'
No macros available
query generator is an attribute on the connection manager
 - has a thread-local comment str
 - when acquiring a connection, set the comment str
new 'connection_for' context manager: like connection_named, except also use the node to set the query string
Updated unit tests to account for query comments
Added a hacky, brittle integration test
  - log to a custom stream and read that
Trim down the "target" context value to use the opt-in connection_info
 - Make sure it contains a superset of the documented stuff
 - Make sure it does not contain any blacklisted items
Change some asserts to raise InternalExceptions because assert error messages in threads are useless
Add special handling to 'dbt debug' for this behavior
Rework the dependencies/requirements for adapters since they now require more of a config object
tests...
@beckjake beckjake force-pushed the feature/query-comments branch from df68f85 to ab9fcb4 Compare November 4, 2019 16:01
Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ship it!

@beckjake beckjake merged commit c4cd4fc into dev/louisa-may-alcott Nov 4, 2019
@beckjake beckjake deleted the feature/query-comments branch November 4, 2019 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Configurable query comments
2 participants