dbt init Interactive profile creation #3625

Merged
merged 22 commits into from
Oct 20, 2021

Conversation

NiallRees
Contributor

@NiallRees NiallRees commented Jul 25, 2021

Resolves #3462

Overhaul dbt init to interactively create a profile within profiles.yml based on:

  1. profile_template.yml if configured within the project
  2. The chosen target's target_options.yml and user input
  3. The chosen target's sample_profiles.yml profile

in descending order of preference. The existing profiles.yml is updated using the current dbt_project.yml's profile name in the case of 2. and 3.
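
The order of preference above could be sketched roughly as follows. This is only an illustration, assuming each source is a file that either exists or does not; `choose_profile_source`, `project_dir`, and `adapter_dir` are hypothetical names, not the functions used in this PR.

```python
from pathlib import Path
from typing import Optional


def choose_profile_source(project_dir: Path, adapter_dir: Path) -> Optional[str]:
    """Return the name of the first available profile source, or None."""
    candidates = [
        # 1. A template checked into the project itself.
        ("profile_template.yml", project_dir / "profile_template.yml"),
        # 2. The adapter's prompt definitions, combined with user input.
        ("target_options.yml", adapter_dir / "target_options.yml"),
        # 3. The adapter's sample profile, as a last resort.
        ("sample_profiles.yml", adapter_dir / "sample_profiles.yml"),
    ]
    for name, path in candidates:
        if path.is_file():
            return name
    return None
```

Returning `None` rather than raising leaves the caller free to skip profile creation when nothing is available.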

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@cla-bot cla-bot bot added the cla:yes label Jul 25, 2021
@NiallRees NiallRees temporarily deployed to Redshift July 25, 2021 17:17 Inactive
@NiallRees NiallRees temporarily deployed to Bigquery July 25, 2021 17:17 Inactive
@NiallRees NiallRees temporarily deployed to Snowflake July 25, 2021 17:17 Inactive
@NiallRees NiallRees changed the title Initial dbt init Interactive profile creation Jul 25, 2021
@NiallRees NiallRees marked this pull request as draft July 25, 2021 17:17
@NiallRees
Contributor Author

This is very much draft @jtcohen6 but would nevertheless appreciate your review on A. the behaviour and B. the implementation approach. I'm struggling to work out how to easily run the debug task from within the init task, if you have any ideas on that too! The task interfaces seem to be designed only to be run from the main.py context.

Contributor

@jtcohen6 jtcohen6 left a comment

@NiallRees This is slick!!

I left some comments below where I was working through each step in my mind. Then I took a step back, thought through the ideal flow, and threw together a quick flowchart (whimsical link):

Screen Shot 2021-08-10 at 10 31 49 AM

What do you think? In particular, I'd love to:

  • Prefer click.prompt() over CLI flags/args wherever possible
  • Include backwards-compatible behavior for the old way (sample_profiles.yml), while adapter plugin maintainers upgrade to the new way (target_options.yml)
  • Ensure that dbt init never fails or raises an error. It's fine if it needs to skip steps due to missing information

@NiallRees
Contributor Author

@NiallRees This is slick!!

I left some comments below where I was working through each step in my mind. Then I took a step back, thought through the ideal flow, and threw together a quick flowchart (whimsical link):

Screen Shot 2021-08-10 at 10 31 49 AM

What do you think? In particular, I'd love to:

  • Prefer click.prompt() over CLI flags/args wherever possible
  • Include backwards-compatible behavior for the old way (sample_profiles.yml), while adapter plugin maintainers upgrade to the new way (target_options.yml)
  • Ensure that dbt init never fails or raises an error. It's fine if it needs to skip steps due to missing information

I think yes to all

I believe this addresses all three of your points. dbt init now takes no CLI arguments/flags and prompts as needed. If a target_options.yml is unavailable, sample_profiles.yml will be used. One key change on the latter is that it is no longer copied wholesale, but can be added to an existing profiles.yml. Curious for your thoughts on that.

@@ -1,17 +1,16 @@
import dbt.exceptions
from typing import Any, Dict, Optional
import yaml
import yaml.scanner
import oyaml as yaml
Contributor Author

I replaced yaml with oyaml in order to retain ordering when prompting for user input. Rather than keep both I just replaced every reference with oyaml. It may well be preferred that we leave all other imports as-is, let me know.

Contributor

@kwigley do you have any thoughts here?

Contributor

@jtcohen6 jtcohen6 Oct 5, 2021

It looks like the switch from yaml to oyaml has broken one highly specific integration test, which checks the sorting behavior of the toyaml Jinja context method. The sort_keys argument is not being respected by oyaml.safe_dump. Glad we have a test for it!

https://github.com/dbt-labs/dbt/blob/3789acc5a7b3f71b4e333ac6e235c62ee0c957f5/test/integration/013_context_var_tests/tests/to_yaml.sql#L5

https://github.com/dbt-labs/dbt/blob/3789acc5a7b3f71b4e333ac6e235c62ee0c957f5/core/dbt/context/base.py#L416

I think my preference would probably be to avoid switching from yaml to oyaml wherever possible. I'm also wondering if there's another way we can preserve prompt order for target_options.yml, even if it requires an extra attribute

Contributor Author

It would seem my reasoning for using oyaml was misplaced - I've removed it and apart from some changes to the order of dumped yaml in profiles.yml the rest of the behaviour is identical. So the order of the questions to the user is still the order of the keys in the target_options.yml. Hooray!
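
A minimal sketch of why the built-in behaviour can be enough: Python dicts preserve insertion order (guaranteed since Python 3.7), so, assuming the YAML loader fills a plain dict in document order, iterating over the loaded mapping visits the keys in file order. `collect_answers` and the shape of `target_options` are assumptions for illustration; `ask` stands in for something like `click.prompt`.

```python
from typing import Callable, Dict


def collect_answers(
    target_options: Dict[str, dict],
    ask: Callable[[str], str],
) -> Dict[str, str]:
    """Prompt once per key, in the order the keys appear in the mapping."""
    answers = {}
    for key in target_options:  # dict iteration follows insertion order
        answers[key] = ask(key)
    return answers
```

Passing the prompt function in as an argument also makes the ordering easy to check in a test without a terminal.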

@NiallRees NiallRees requested a review from jtcohen6 August 15, 2021 21:38
@NiallRees NiallRees marked this pull request as ready for review August 15, 2021 21:39
@NiallRees
Contributor Author

NiallRees commented Aug 15, 2021

Ready for your 👀 again @jtcohen6. Once I've got a thumbs up on what this is doing I'll get some integration tests written up. Still curious for your thoughts on running dbt debug automatically following the profile setup, and your ideas on the best way to do that.

@jtcohen6
Contributor

Great work here @NiallRees! It's so cool to see this coming together, and in a way that's honestly fun to use.

Some misc feedback:

  • Let's overwrite the name and profile in dbt_project.yml with the input to "What is the desired project name?" Otherwise, the profile created for new projects is always called default. (It will also resolve "Use init argument as project name in dbt_project.yml" #3677, by the by.) Simplest would be to move dbt_project.yml out of the starter project file and into a templated f-string; I'm sure there's a better way that still manages to preserve the comments/hints.
  • Numeric profile fields (threads, port) are wrapped in single quotes, so these targets fail on validation. Is there a way to avoid this? Check the type of user inputs?
  • I'm unsure about appending the contents of sample_profiles.yml to profiles.yml, for adapters that do not yet support target_options.yml. It messes up the YAML formatting right now (I'm sure that's fixable), but also, many sample profiles make use of comments (e.g. dbt-presto's, spark's), and those would not be copied, leaving entries like method: none with no explanation, or both entries from an either/or choice.
  • The desired behavior around init+debug is basically just dbt init && dbt debug, right? I'm sure you could lightly refactor the debug task's methods, so that the init task could import and call test_connection()... but what do you think about an even simpler solution? In our log messages to say Profile ... written to ..., we could also include a note like, Run "dbt debug" to validate your new connection profile.
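
On the quoting point: values read from a prompt arrive as strings, and a YAML dumper quotes strings that look like numbers so that they round-trip as strings. One hedged way around it is to convert each answer before writing the profile. The `type_name` attribute and `coerce_answer` below are hypothetical; `click.prompt` also accepts a `type` argument that may do the same job.

```python
def coerce_answer(raw: str, type_name: str = "string"):
    """Convert a raw prompt answer to the declared type; fall back to the string."""
    converters = {"int": int, "float": float, "string": str}
    try:
        return converters.get(type_name, str)(raw.strip())
    except ValueError:
        # Keep the value as typed rather than failing the whole init run.
        return raw
```

With this, an answer of "4" for threads would be stored as the integer 4, which a YAML dumper would then write without quotes.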

@leahwicz I'd be curious to get your take on this, since you and I did some work on init recently. Namely:

  • We'll need to make an engineering judgment on whether the yaml → oyaml replacement across the board is something we'd be comfortable with; oyaml describes itself as a drop-in replacement for PyYAML that preserves dict ordering.
  • Let's think about the "programmatic" entry-points to init, e.g. for the dbt Cloud IDE (which doesn't use the real init task today, but for consistency's sake, I'd feel better if it could). Do we need to preserve a "streamlined," CLI flag-based version of init that just scaffolds the basic file structure, using a provided project name, and doesn't worry about profiles.yml at all?

@NiallRees
Contributor Author

  • Let's think about the "programmatic" entry-points to init, e.g. for the dbt Cloud IDE (which doesn't use the real init task today, but for consistency's sake, I'd feel better if it could). Do we need to preserve a "streamlined," CLI flag-based version of init that just scaffolds the basic file structure, using a provided project name, and doesn't worry about profiles.yml at all?

@jtcohen6 @leahwicz Just so I know where to head - should this PR need to concern itself with this, given that you've said dbt Cloud doesn't use the init task currently?

@NiallRees NiallRees requested a review from jtcohen6 October 4, 2021 12:42
Contributor

@jtcohen6 jtcohen6 left a comment

@NiallRees Amazing work :)

I only managed to find one bug, which is (I think) a quick fix in the create_profile_using_profile_template method.

@leahwicz I think this is ready for review from an engineer on the Core team! The big question for me is around leveraging dbt.clients.system methods in lieu of direct file access.

else:
return True

def create_profile_using_profile_template(self):
Contributor

  • I believe this method needs profile_name passed as an argument, from the run entry point
  • This method will raise an exception if the template file is improperly formatted (e.g. missing a top-level key). It's tough to debug, since dbt doesn't log anything. What do you think of putting the call to create_profile_using_profile_template in a try/except that falls back to standard profile prompting/creation if it fails for any reason?
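
The try/except suggested in the second point might look something like the sketch below. It assumes the two creation paths can be passed in as plain callables; `create_profile_with_fallback`, `from_template`, and `from_prompts` are hypothetical names, not methods from this PR.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def create_profile_with_fallback(
    from_template: Callable[[], None],
    from_prompts: Callable[[], None],
) -> str:
    """Try the template path first; on any error, log it and prompt instead."""
    try:
        from_template()
        return "template"
    except Exception as exc:  # deliberately broad: init should not crash here
        logger.debug("Invalid profile_template.yml, falling back: %s", exc)
        from_prompts()
        return "prompts"
```

Logging the exception at debug level would also address the concern that a malformed template is hard to diagnose.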

Contributor Author

  1. It retrieves the profile_name from profile_template.yml, I'm not sure it needs to be passed from the run entry point - there is a possibility that profile_template.yml's profile name differs from that of the project.
  2. Good idea

@@ -2,12 +2,12 @@
# Name your project! Project names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
-name: 'my_new_project'
+name: '{project_name}'
Contributor

🎉

to connect to your database. You can find this file by running:

{open_cmd} {profiles_path}
Your new dbt project "{project_name}" was created!
Contributor

(low priority)

This message feels like a lot of text after the interactive bite-sized chunks. I wonder if we can do something cool with click, to either:

  • clear the terminal, leaving only the welcome message
  • prompt with each item one by one, so that the user "acknowledges" each: project created!, here's a link to the docs, need help?, happy modeling!

return False
logger.debug(f"No sample profile found for {adapter}.")
else:
with open(sample_profiles_path, "r") as f:
Contributor

Here and elsewhere, we probably want to lean on the dbt.clients.system module for cross-OS support / error handling. I'm thinking about load_file_contents and write_file in particular... though we'll need to adjust that method to support the r+ and a modes you're using here.

cc @leahwicz: I'm pretty fuzzy on this stuff, so you should correct me if I'm wrong here. My sense is, the more we can lean on clients.system methods for all file operations, the better-served we'll be in a world with storage adapters etc. Alternatively, we could simply say that since init is a CLI-only task, we don't care about needing to support it outside of the local file system.

Contributor

So two things being unpacked here:

1: Should we use dbt.clients.system for file read/writes that will work well across platforms?
This makes sense, although I fear that we kind of re-invented the wheel when we made that module. Python 3.4+ contains the pathlib module which does the same thing (and a lot more). Assuming you only need basic read/writes I would highly suggest using that instead of dbt.clients.system

2: Does it make sense to support SAs here once they're fully available?
Probably not. I guess there might be an edge case where a user would want to bootstrap an adapter skeleton into a remote filesystem or something... but it's the edgiest of edge cases.

TL;DR pathlib FTW!
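
For basic reads and appends, a pathlib-only version could be as small as the sketch below. `append_profile` is a hypothetical helper; it does a plain text append rather than a YAML merge, and it assumes the rendered profile is already a valid top-level YAML entry.

```python
from pathlib import Path


def append_profile(profiles_path: Path, profile_yaml: str) -> None:
    """Append a rendered profile to profiles.yml, creating the file if needed."""
    # Create the profiles directory if it does not exist yet.
    profiles_path.parent.mkdir(parents=True, exist_ok=True)
    existing = profiles_path.read_text() if profiles_path.exists() else ""
    # Keep entries separated by a newline when the file does not end with one.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with profiles_path.open("a") as f:
        f.write(separator + profile_yaml)
```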

Comment on lines +307 to +316
    self.copy_starter_repo(project_name)
    os.chdir(project_name)
    with open("dbt_project.yml", "r+") as f:
        content = f"{f.read()}".format(
            project_name=project_name,
            profile_name=project_name
        )
        f.seek(0)
        f.write(content)
        f.truncate()
Contributor

@jtcohen6 jtcohen6 Oct 5, 2021

Let's think about the "programmatic" entry-points to init, e.g. for the dbt Cloud IDE (which doesn't use the real init task today, but for consistency's sake, I'd feel better if it could). Do we need to preserve a "streamlined," CLI flag-based version of init that just scaffolds the basic file structure, using a provided project name, and doesn't worry about profiles.yml at all?

@jtcohen6 @leahwicz Just so I know where to head - should this PR need to concern itself with this, given that you've said dbt Cloud doesn't use the init task currently?

I think you've done it! This snippet of code is exactly what we'd want in the "programmatic" version. I could imagine wrapping this into a create_project_files method, re-adding a flag to init like --starter-project-only, and then supporting this workflow as:

    def run(self):
        """Entry point for the init task."""
        if self.args.starter_project_only:
            project_name = self.args.starter_project_only
            self.create_project_files(project_name)
            return
            
        # ... otherwise proceed interactively ...

    dbt init --starter-project-only my_cool_project_name

Or, the equivalent interface into the dbt-core API ;)

@leahwicz What do you think? I don't think we need to make that change in this PR, but I'm satisfied knowing it would be a small lift.
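
For reference, the templating step inside such a create_project_files method might reduce to something like this sketch. `render_project_file` is a hypothetical name, and it assumes the starter dbt_project.yml contains only the `{project_name}` and `{profile_name}` placeholders; since `str.format` treats every brace as a placeholder, any literal braces in the template would need to be doubled.

```python
from pathlib import Path


def render_project_file(project_dir: Path, project_name: str) -> str:
    """Fill the {project_name} and {profile_name} placeholders in dbt_project.yml."""
    path = project_dir / "dbt_project.yml"
    content = path.read_text().format(
        project_name=project_name,
        profile_name=project_name,  # the profile is named after the project
    )
    path.write_text(content)
    return content
```

Taking the project directory as an argument avoids the os.chdir call, which may make the helper easier to reuse from a non-CLI entry point.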

@NiallRees NiallRees requested a review from jtcohen6 October 6, 2021 21:45
Contributor

@jtcohen6 jtcohen6 left a comment

This is good to go from my perspective. Amazing work, @NiallRees—we simply must have this in v1.0. The next big undertaking will be writing really clear documentation :)

@leahwicz Could we have a member of the Core team give this a code review? I'm particularly interested in acceptable use of native python file interaction, vs. accessing all local files via dbt.clients.system module methods. I'll feel much more comfortable giving final approval + merging after an engineer has taken a look.

@kwigley kwigley self-requested a review October 12, 2021 12:57
Contributor

@jtcohen6 jtcohen6 left a comment

@NiallRees It's a pleasure and a privilege to approve this PR. On behalf of a few thousand future initializers — thank you!

@jtcohen6 jtcohen6 merged commit 11436fe into dbt-labs:main Oct 20, 2021
@NiallRees NiallRees deleted the nw/interactive_profile_creation branch October 20, 2021 16:41
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Initial

* Further dev

* Make mypy happy

* Further dev

* Existing tests passing

* Functioning integration test

* Passing integration test

* Integration tests

* Add changelog entry

* Add integration test for init outside of project

* Fall back to target_options.yml when invalid profile_template.yml is provided

* Use built-in yaml with exception of in init

* Remove oyaml and fix tests

* Update dbt_project.yml in test comparison

* Create the profiles directory if it doesn't exist

* Use safe_load

* Update integration test

Co-authored-by: Jeremy Cohen <jeremy@dbtlabs.com>

automatic commit by git-black, original commits:
  11436fe

Successfully merging this pull request may close these issues.

Add command to auto-populate profiles.yml
4 participants