Add command to auto-populate profiles.yml #3462

Closed
NiallRees opened this issue Jun 16, 2021 · 5 comments · Fixed by #3625
Labels
enhancement (New feature or request) · good_first_issue (Straightforward + self-contained changes, good for new contributors!) · init (Issues related to initializing the dbt starter project)

Comments

@NiallRees
Contributor

NiallRees commented Jun 16, 2021

Describe the feature

This may have been suggested already but I couldn't find a duplicate issue from a quick search.

Configuring the profiles.yml is non-beginner friendly, and creating instructions on how to do so is complicated especially due to the differences between Mac and Windows environments. Introducing a dbt command which would add a profile to profiles.yml (and create the file if needed) using the name from the dbt_project.yml would be extremely useful. In addition, it is typical that only the schema, username and password are unique between individual profiles.ymls, so a default set of values specified in e.g. dbt_project.yml could reduce the number of options required to be entered.

Describe alternatives you've considered

Creating my own script to do this.

Who will this benefit?

Anyone who introduces dbt to new users frequently and struggles to write a simple and comprehensive set of instructions for configuring a profiles.yml.

Are you interested in contributing this feature?

With some guidance, sure.

NiallRees added the enhancement and triage labels Jun 16, 2021
@jtcohen6
Contributor

jtcohen6 commented Jun 21, 2021

Introducing a dbt command which would add a profile to profiles.yml (and create the file if needed) using the name from the dbt_project.yml would be extremely useful

Right on—I think this is what dbt init should do!

In fact, it already does a few of these things:

To get this where we'd want it to be (perhaps more controversial):

  • init already takes a project_name as its argument (dbt init newproj), but today it only uses it to name the new file directory. In addition, we might find a way to pass the project name to both the name and profile parameters of dbt_project.yml. (Today, the former is always my_new_project and the latter is always default. Not good!)
  • init should add a new profile, also named using project_name, to profiles.yml, even if the file already exists, so long as a profile by that name doesn't yet exist.
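A minimal sketch of these two behaviours, operating on already-parsed YAML as plain Python dicts. The function names (scaffold_project_config, add_profile_if_missing) and the single "dev" target are hypothetical, not part of dbt; only the name/profile keys of dbt_project.yml and the target/outputs layout of profiles.yml follow the existing file formats.

```python
from typing import Any, Dict


def scaffold_project_config(project_name: str) -> Dict[str, Any]:
    """Hypothetical: use the argument to `dbt init <project_name>` for both
    `name` and `profile` in dbt_project.yml (instead of my_new_project/default)."""
    return {"name": project_name, "profile": project_name}


def add_profile_if_missing(
    profiles: Dict[str, Any], profile_name: str, dev_target: Dict[str, Any]
) -> bool:
    """Hypothetical: add a profile to the parsed contents of profiles.yml only if
    no profile with that name exists yet. Returns True if a profile was added."""
    if profile_name in profiles:
        return False  # never overwrite a profile the user already has
    profiles[profile_name] = {"target": "dev", "outputs": {"dev": dict(dev_target)}}
    return True


# Usage: profiles.yml already exists and holds another project's profile.
profiles = {"other_project": {"target": "dev", "outputs": {}}}
config = scaffold_project_config("newproj")
add_profile_if_missing(profiles, config["profile"], {"type": "postgres"})
```

Because the check is keyed on the profile name, running init a second time for the same project is a no-op rather than an overwrite.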

I'm curious to get your thoughts on those pieces in particular.

Adjacent to the above:

Configuring the profiles.yml is non-beginner friendly

I agree! We're planning to give the syntax here another look (#1958) ahead of releasing v1.0 later this year. My plan would be a more intuitive syntax going forward, with backwards compatibility for the current syntax, of course.

In addition, it is typical that only the schema, username and password are unique between individual profiles.ymls, so a default set of values specified in e.g. dbt_project.yml could reduce the number of options required to be entered.

I don't think dbt_project.yml is the right place to store any credential-y information, since it's checked into version control. I do think part of our profile syntax rethink should include the ability to set default profile-level values, and override them per target. Today, that's sort of possible with YAML anchors, but it's more peer-to-peer than true hierarchical inheritance—and talk about a tricky syntax.
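To make "true hierarchical inheritance" concrete: a target would be resolved by layering its own values over profile-level defaults, with the target winning on conflicts. A minimal sketch of that resolution rule, assuming a hypothetical profile-level defaults key (this is not current dbt syntax) and placeholder connection values:

```python
from typing import Any, Dict


def resolve_target(profile: Dict[str, Any], target_name: str) -> Dict[str, Any]:
    """Hypothetical: merge profile-level `defaults` with one target's settings.
    Keys set on the target override the profile-level defaults."""
    defaults = profile.get("defaults", {})
    overrides = profile["outputs"][target_name]
    return {**defaults, **overrides}


profile = {
    "target": "dev",
    "defaults": {"type": "snowflake", "account": "abc123", "threads": 4},
    "outputs": {
        "dev": {"schema": "dbt_alice"},
        "prod": {"schema": "analytics", "threads": 16},
    },
}

print(resolve_target(profile, "prod"))
# {'type': 'snowflake', 'account': 'abc123', 'threads': 16, 'schema': 'analytics'}
```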

jtcohen6 added the init label and removed the triage label Jun 21, 2021
@NiallRees
Contributor Author

NiallRees commented Jun 22, 2021

Thanks for your always thorough response @jtcohen6.

To be clear (and apologies if you've understood this already), I see this as a separate function from creating a dbt project. The use case is where a dbt project already exists and we're onboarding a new user. As I currently see it, this is a distinct function from dbt init.

The workflow I'm imagining:

  1. The user cds into their newly cloned dbt project from their organisation
  2. They run a magical command dbt setup_my_profile (but more imaginatively named!)
  3. They are prompted only for the parameters unique to them, e.g. username and password for Snowflake, or OAuth for BigQuery
  4. Something similar to dbt debug runs automatically to confirm their connection
  5. They can dbt away and not even know that profiles.yml exists.
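The workflow above can be sketched as a single function. This is a hedged illustration, not a dbt API: setup_my_profile is the placeholder name from step 2, the required keys are supplied by the caller, and check_connection is a hypothetical stand-in for running something like dbt debug in step 4. Prompting and connection checking are injectable so the flow can be exercised without a terminal or a warehouse.

```python
from getpass import getpass
from typing import Any, Callable, Dict, Iterable, Optional


def setup_my_profile(
    project_name: str,
    defaults: Dict[str, Any],
    required_keys: Iterable[str],
    secret_keys: Iterable[str] = ("password",),
    ask: Callable[[str], str] = input,
    ask_secret: Callable[[str], str] = getpass,
    check_connection: Optional[Callable[[Dict[str, Any]], bool]] = None,
) -> Dict[str, Any]:
    """Hypothetical onboarding flow for an already-cloned project.

    Values from shared defaults are reused as-is; the user is prompted only
    for required keys the defaults don't supply. Secret keys are read without
    echoing them to the terminal.
    """
    secrets = set(secret_keys)
    target = dict(defaults)
    for key in required_keys:
        if key in target:
            continue  # already supplied by the shared defaults
        prompt = ask_secret if key in secrets else ask
        target[key] = prompt(f"{key}: ")
    # Stand-in for step 4: something like `dbt debug` would verify the connection.
    if check_connection is not None and not check_connection(target):
        raise RuntimeError("connection check failed")
    # Same top-level layout as profiles.yml: profile -> target + outputs.
    return {project_name: {"target": "dev", "outputs": {"dev": target}}}
```

With a Snowflake type and account already provided as defaults, a new user would only be asked for user and password.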

Are we on the same page?

If these profile defaults aren't stored in dbt_project.yml, then perhaps we could introduce a new, committed file in the project that holds default profile parameters common to all users, such as adapter type, account name, etc. I understand the concern about committing credentials, though I think we could still allow users to set any profile configuration in this new defaults file if they wish, even if we advise against it.
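One way to allow any key in such a committed defaults file while still advising against credentials is to warn when credential-like keys appear. A minimal sketch; the defaults file itself, the CREDENTIAL_KEYS list, and check_committed_defaults are all hypothetical:

```python
import warnings
from typing import Dict, List

# Hypothetical list of key names that look like secrets; dbt defines no such list.
CREDENTIAL_KEYS = {"password", "token", "private_key", "private_key_passphrase", "keyfile_json"}


def check_committed_defaults(defaults: Dict[str, object]) -> List[str]:
    """Return (and warn about) credential-like keys in a committed defaults mapping.
    The keys are still accepted; the warning only makes the risk visible."""
    found = sorted(key for key in defaults if key.lower() in CREDENTIAL_KEYS)
    if found:
        warnings.warn(
            f"committed profile defaults contain credential-like keys {found}; "
            "these values are visible to anyone with access to the repository",
            UserWarning,
        )
    return found
```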

@jtcohen6
Contributor

jtcohen6 commented Jun 24, 2021

To be clear (and apologies if you've understood this already), I see this as a separate function from creating a dbt project. The use case is where a dbt project already exists and we're onboarding a new user.

Ah, I totally missed this piece. Ok, I'm with you - thanks for clarifying.

As I currently see it, this is a distinct function from dbt init.

This is what I'm less sure about. In my view, dbt init is actually two things today:

  • A way to scaffold file structures for new projects
  • A way to set up local machines for using dbt, i.e. by creating ~/.dbt/profiles.yml

So I do see both of these within the scope of a (fancier, more powerful, more interactive) dbt init command, which would include prompts for:

  • Do you want to create a new project or use an existing one? (perhaps inferred from whether dbt_project.yml is in the current working directory)
  • Do you want to add a profile to ~/.dbt/profiles.yml? (whether or not the file already exists; infer the profile name from the preexisting or just-created project name)
  • What's your account ID, username, password, etc?
  • Do you want to test the connection? (i.e. run dbt debug)
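The inference in the first two prompts (new vs. existing project, and which profile name to use) can be sketched as follows. plan_init and its return values are hypothetical, and the line scan is deliberately naive: it reads only unindented name: and profile: keys, where real code should use a YAML parser.

```python
from pathlib import Path
from typing import Optional, Tuple


def _top_level_value(text: str, key: str) -> Optional[str]:
    """Naive scan for an unindented `key: value` line; strips comments and quotes."""
    for line in text.splitlines():
        if line.startswith(f"{key}:"):
            value = line.split(":", 1)[1].split("#", 1)[0].strip().strip("'\"")
            return value or None
    return None


def plan_init(project_dir: Path, project_name: str) -> Tuple[str, str]:
    """Hypothetical: decide whether init creates a new project or uses the one
    in project_dir, and which profile name to add to ~/.dbt/profiles.yml."""
    project_file = project_dir / "dbt_project.yml"
    if not project_file.is_file():
        return "create", project_name
    text = project_file.read_text()
    # Prefer the project's declared profile, then its name, then the CLI argument.
    profile = _top_level_value(text, "profile") or _top_level_value(text, "name")
    return "existing", profile or project_name
```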

To accomplish this, we could rework dbt init to use a more rigorous library for interactive setup, e.g. cookiecutter. I think I've had this conversation four times in the past week :) (cc @iknox-fa @kwigley)

Two reservations:

  • We should be cautious about hiding from users the fact that their password is being stored, in plaintext, in a local file. @leahwicz made this point, and it's a really good one. I'm not convinced that "not even know that profiles.yml exists" should be one of our goals here; we should seek to make it as easy and streamlined as possible, but not entirely invisible.
  • I'm still really hesitant to have dbt cross the Rubicon of storing credential information outside profiles.yml / inside canonically version-controlled files. I totally get the convenience of a committed file/structured object that stores "standard" connection parameters so they can be consistent across colleagues (Redshift host, Snowflake account, etc). I've done a manual version of this before by means of some README instructions, a checked-in samples.profiles.yml, or by means of environment variables.
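The environment-variable approach also addresses the plaintext-password reservation: dbt already supports the env_var Jinja function inside profiles.yml, so a generated profile could reference secrets rather than contain them. A minimal sketch under stated assumptions: the DBT_<PROFILE>_<KEY> naming convention, SECRET_KEYS, and both functions are hypothetical, and the tiny emitter handles only nested dicts of strings, numbers, and booleans.

```python
import json
from typing import Any, Dict, List

SECRET_KEYS = {"password"}  # hypothetical: which keys to keep out of the file


def env_var_name(profile_name: str, key: str) -> str:
    # Hypothetical naming convention, e.g. DBT_JAFFLE_SHOP_PASSWORD.
    return f"DBT_{profile_name}_{key}".upper()


def to_yaml(data: Dict[str, Any], indent: int = 0) -> List[str]:
    """Tiny emitter for nested dicts of scalars. json.dumps produces double-quoted
    strings and true/false/numbers, all of which are valid YAML scalars."""
    lines = []
    pad = "  " * indent
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {json.dumps(value)}")
    return lines


def render_profile(profile_name: str, target: Dict[str, Any]) -> str:
    """Render a profiles.yml entry where secrets become env_var() references."""
    rendered = {}
    for key, value in target.items():
        if key in SECRET_KEYS:
            rendered[key] = "{{ env_var('" + env_var_name(profile_name, key) + "') }}"
        else:
            rendered[key] = value
    profile = {profile_name: {"target": "dev", "outputs": {"dev": rendered}}}
    return "\n".join(to_yaml(profile)) + "\n"
```

The trade-off is that the user must then export the variable in every shell that runs dbt, which moves the setup burden rather than removing it.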

@NiallRees
Contributor Author

NiallRees commented Jun 25, 2021

So I do see both of these within the scope of a (fancier, more powerful, more interactive) dbt init command, which would include prompts for:

Do you want to create a new project or use an existing one? (perhaps inferred from whether dbt_project.yml is in the current working directory)
Do you want to add a profile to ~/.dbt/profiles.yml? (whether or not the file already exists; infer the profile name from the preexisting or just-created project name)
What's your account ID, username, password, etc?
Do you want to test the connection? (i.e. run dbt debug)

This would be great.

On reservation 1: great point. Interestingly, AWS doesn't do this with aws configure; only the documentation, not the CLI output, explains where your credentials are stored. Explicit is good, though, and a CLI message along the lines of "Your profile for <dbt_project_name> has been configured in <file_path>." sounds good.
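A minimal sketch of writing the profile and reporting its location with the message proposed above. write_profile is hypothetical; ~/.dbt/profiles.yml is dbt's default location, but the duplicate check here is a naive line comparison rather than a YAML parse, and the permission change is an extra precaution suggested by the plaintext concern, not existing dbt behaviour.

```python
import os
from pathlib import Path


def write_profile(profile_name: str, profile_yaml: str, profiles_path: Path) -> str:
    """Append a rendered profile to profiles.yml (creating the file if needed)
    and return a confirmation message stating exactly where it was written."""
    profiles_path.parent.mkdir(parents=True, exist_ok=True)
    existing = profiles_path.read_text() if profiles_path.exists() else ""
    # Naive duplicate check: an unindented "<name>:" line means the profile exists.
    if any(line.rstrip() == f"{profile_name}:" for line in existing.splitlines()):
        raise ValueError(f"profile {profile_name!r} already exists in {profiles_path}")
    if existing and not existing.endswith("\n"):
        existing += "\n"
    profiles_path.write_text(existing + profile_yaml)
    # The file may hold plaintext credentials: restrict it to the current user.
    # (On Windows, os.chmod can only toggle the read-only flag.)
    os.chmod(profiles_path, 0o600)
    return f"Your profile for {profile_name} has been configured in {profiles_path}."


# Usage (writes to the real default location):
# print(write_profile("jaffle_shop", rendered_yaml, Path.home() / ".dbt" / "profiles.yml"))
```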

On reservation 2: I'm not suggesting we should commit credentials, only that standard parameters should be configurable in a committed file. Going a step beyond a samples.profiles.yml or a list of values in the README, and instead having them auto-populated into profiles.yml from a committed file when running our new dbt init command, would be excellent.

Is this likely to get on the roadmap in the next couple of months given you've discussed this several times now? If so - I'll leave it with you, otherwise I'd be interested to take it on as a contribution.

@jtcohen6
Contributor

jtcohen6 commented Jul 6, 2021

Is this likely to get on the roadmap in the next couple of months given you've discussed this several times now?

All yours! We've done everything init-related that we're considering must-have ahead of v1.0. I'd love to see you have a go with cookiecutter, if you're up for it :)

jtcohen6 added the good_first_issue label Jul 6, 2021
2 participants