dbt init Interactive profile creation #3625

Merged
merged 22 commits into from
Oct 20, 2021
Changes from 4 commits
7 changes: 3 additions & 4 deletions core/dbt/clients/yaml_helper.py
@@ -1,17 +1,16 @@
 import dbt.exceptions
 from typing import Any, Dict, Optional
-import yaml
-import yaml.scanner
+import oyaml as yaml
Contributor Author

I replaced yaml with oyaml to retain key ordering when prompting for user input. Rather than keep both, I replaced every reference with oyaml. It may well be preferable to leave all other imports as-is; let me know.

Contributor

@kwigley do you have any thoughts here?

Contributor

@jtcohen6, Oct 5, 2021


It looks like the switch from yaml to oyaml has broken one highly specific integration test, which checks the sorting behavior of the toyaml Jinja context method. The sort_keys argument is not being respected by oyaml.safe_dump. Glad we have a test for it!

https://github.com/dbt-labs/dbt/blob/3789acc5a7b3f71b4e333ac6e235c62ee0c957f5/test/integration/013_context_var_tests/tests/to_yaml.sql#L5

https://github.com/dbt-labs/dbt/blob/3789acc5a7b3f71b4e333ac6e235c62ee0c957f5/core/dbt/context/base.py#L416

I think my preference would probably be to avoid switching from yaml to oyaml wherever possible. I'm also wondering if there's another way we can preserve prompt order for target_options.yml, even if it requires an extra attribute.

Contributor Author

It seems my reasoning for using oyaml was misplaced. I've removed it, and apart from some changes to the order of the dumped YAML in profiles.yml, the behaviour is identical: the order of the questions to the user is still the order of the keys in target_options.yml. Hooray!
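
A minimal sketch of why the prompt order can survive without oyaml. Since Python 3.7 a plain dict keeps insertion order, so a loader that fills a dict in document order hands the keys back in file order, while sorting is a separate, dump-time decision. The standard-library json module is used here purely as a stand-in for a YAML loader (an assumption of this sketch, not something the PR does); the document and key names are made up for illustration.

```python
import json

# Hypothetical document; stands in for an adapter's target_options.yml.
DOC = '{"type": "bigquery", "project": "demo", "dataset": "analytics"}'

loaded = json.loads(DOC)

# Iterating the loaded mapping yields keys in the order they appear in the file,
# which is the order a prompt loop would ask about them.
prompt_order = list(loaded)

# Sorting only happens if the dump step asks for it.
dumped_sorted = json.dumps(loaded, sort_keys=True)
dumped_as_is = json.dumps(loaded)
```

Under that assumption, only the dumped profiles.yml would change order when the dumper sorts keys, which matches what the author reports above.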


 # the C version is faster, but it doesn't always exist
 try:
-    from yaml import (
+    from oyaml import (
         CLoader as Loader,
         CSafeLoader as SafeLoader,
         CDumper as Dumper
     )
 except ImportError:
-    from yaml import (  # type: ignore  # noqa: F401
+    from oyaml import (  # type: ignore  # noqa: F401
         Loader, SafeLoader, Dumper
     )

15 changes: 0 additions & 15 deletions core/dbt/main.py
@@ -256,7 +256,6 @@ def run_from_args(parsed):

     with track_run(task):
         results = task.run()
-
     return task, results


@@ -360,20 +359,6 @@ def _build_init_subparser(subparsers, base_subparser):
         Initialize a new DBT project.
         '''
     )
-    sub.add_argument(
-        'project_name',
-        type=str,
-        help='''
-        Name of the new project
-        ''',
-    )
-    sub.add_argument(
-        '--adapter',
-        type=str,
-        help='''
-        Write sample profiles.yml for which adapter
-        ''',
-    )
     sub.set_defaults(cls=init_task.InitTask, which='init', rpc_method=None)
     return sub

256 changes: 214 additions & 42 deletions core/dbt/task/init.py
@@ -1,5 +1,11 @@
+import copy
 import os
 import shutil
+from typing import Tuple
+
+import oyaml as yaml
+import click
+from jinja2 import Template

 import dbt.config
 import dbt.clients.system
@@ -10,7 +16,7 @@

 from dbt.include.starter_project import PACKAGE_PATH as starter_project_directory

-from dbt.task.base import BaseTask
+from dbt.task.base import BaseTask, move_to_nearest_project_dir

 DOCS_URL = 'https://docs.getdbt.com/docs/configure-your-profile'
 SLACK_URL = 'https://community.getdbt.com/'
@@ -19,11 +25,7 @@
 IGNORE_FILES = ["__init__.py", "__pycache__"]

 ON_COMPLETE_MESSAGE = """
-Your new dbt project "{project_name}" was created! If this is your first time
-using dbt, you'll need to set up your profiles.yml file -- this file will tell dbt how
-to connect to your database. You can find this file by running:
-
-  {open_cmd} {profiles_path}
+Your new dbt project "{project_name}" was created!
Contributor

(low priority)

This message feels like a lot of text after the interactive bite-sized chunks. I wonder if we can do something cool with click, to either:

  • clear the terminal, leaving only the welcome message
  • prompt with each item one by one, so that the user "acknowledges" each: project created!, here's a link to the docs, need help?, happy modeling!
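
A rough sketch of the second idea, showing the closing message one item at a time and waiting for an acknowledgement after each. The function name and its `echo`/`pause` parameters are hypothetical and not part of this PR; they default to the built-ins so the snippet runs without click installed.

```python
from typing import Callable, Iterable


def show_one_by_one(
    messages: Iterable[str],
    echo: Callable[[str], object] = print,
    pause: Callable[[str], object] = input,
) -> int:
    """Print each message, then wait for the user before showing the next.

    Returns the number of messages shown.
    """
    shown = 0
    for message in messages:
        echo(message)
        # Block until the user acknowledges this item.
        pause("Press Enter to continue...")
        shown += 1
    return shown
```

With click available, passing `echo=click.echo` and `pause=click.pause` should give a similar flow, and calling `click.clear()` beforehand would cover the first idea of clearing the terminal.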


 For more information on how to configure the profiles.yml file,
 please consult the dbt documentation here:
@@ -46,33 +48,36 @@ def copy_starter_repo(self, project_name):
         shutil.copytree(starter_project_directory, project_name,
                         ignore=shutil.ignore_patterns(*IGNORE_FILES))

-    def create_profiles_dir(self, profiles_dir):
+    def create_profiles_dir(self, profiles_dir: str) -> bool:
+        """Create the user's profiles directory if it doesn't already exist."""
         if not os.path.exists(profiles_dir):
             msg = "Creating dbt configuration folder at {}"
             logger.info(msg.format(profiles_dir))
             dbt.clients.system.make_directory(profiles_dir)
             return True
         return False

-    def create_profiles_file(self, profiles_file, sample_adapter):
+    def create_profile_from_sample(self, adapter: str):
+        """Create a profile entry using the adapter's sample_profiles.yml"""
         # Line below raises an exception if the specified adapter is not found
-        load_plugin(sample_adapter)
-        adapter_path = get_include_paths(sample_adapter)[0]
-        sample_profiles_path = adapter_path / 'sample_profiles.yml'
+        load_plugin(adapter)
+        adapter_path = get_include_paths(adapter)[0]
+        sample_profiles_path = adapter_path / "sample_profiles.yml"

         if not sample_profiles_path.exists():
-            logger.debug(f"No sample profile found for {sample_adapter}, skipping")
-            return False
-
-        if not os.path.exists(profiles_file):
-            msg = "With sample profiles.yml for {}"
-            logger.info(msg.format(sample_adapter))
-            shutil.copyfile(sample_profiles_path, profiles_file)
-            return True
-
-        return False
+            logger.debug(f"No sample profile found for {adapter}.")
+        else:
+            with open(sample_profiles_path, "r") as f:
Contributor

Here and elsewhere, we probably want to lean on the dbt.clients.system module for cross-OS support / error handling. I'm thinking about load_file_contents and write_file in particular... though we'll need to adjust that method to support the r+ and a modes you're using here.

cc @leahwicz: I'm pretty fuzzy on this stuff, so you should correct me if I'm wrong here. My sense is, the more we can lean on clients.system methods for all file operations, the better-served we'll be in a world with storage adapters etc. Alternatively, we could simply say that since init is a CLI-only task, we don't care about needing to support it outside of the local file system.

Contributor

So two things being unpacked here:

1: Should we use dbt.clients.system for file read/writes that will work well across platforms?
This makes sense, although I fear that we kind of re-invented the wheel when we made that module. Python 3.4+ contains the pathlib module which does the same thing (and a lot more). Assuming you only need basic read/writes I would highly suggest using that instead of dbt.clients.system

2: Does it make sense to support SAs here once they're fully available?
Probably not. I guess there might be an edge case where a user would want to bootstrap an adapter skeleton into a remote filesystem or something... but it's the edgiest of edge cases.

TL;DR pathlib FTW!
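
For basic reads and writes, the pathlib route suggested above might look roughly like this. Both helper names are hypothetical and not part of dbt; the sketch only assumes local-filesystem access, in line with point 2.

```python
from pathlib import Path


def read_text_or_default(path: Path, default: str = "") -> str:
    """Return the file's contents, or `default` when the file does not exist."""
    return path.read_text(encoding="utf-8") if path.exists() else default


def write_text(path: Path, contents: str) -> None:
    """Write `contents`, creating missing parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(contents, encoding="utf-8")
```

Reading the whole file, modifying the parsed result, and writing it back in full would also avoid the need for the r+ and a modes mentioned earlier in the thread.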

+                # Ignore the name given in the sample_profiles.yml
+                profile = list(yaml.load(f).values())[0]
+                profiles_filepath, profile_name = self.write_profile(profile)
+                logger.info(
+                    f"Profile {profile_name} written to {profiles_filepath} "
+                    "using sample configuration. Once updated "
+                    "you'll be able to start developing with dbt."
+                )

-    def get_addendum(self, project_name, profiles_path):
+    def get_addendum(self, project_name: str, profiles_path: str) -> str:
         open_cmd = dbt.clients.system.open_dir_cmd()

         return ON_COMPLETE_MESSAGE.format(
@@ -83,29 +88,196 @@ def get_addendum(self, project_name, profiles_path):
             slack_url=SLACK_URL
         )

-    def run(self):
-        project_dir = self.args.project_name
-        sample_adapter = self.args.adapter
-        if not sample_adapter:
-            try:
-                # pick first one available, often postgres
-                sample_adapter = next(_get_adapter_plugin_names())
-            except StopIteration:
-                logger.debug("No adapters installed, skipping")
+    def generate_target_from_input(
+        self,
+        target_options: dict,
+        target: dict = {}
+    ) -> dict:
+        """Generate a target configuration from target_options and user input.
+        """
+        target_options_local = copy.deepcopy(target_options)
+        for key, value in target_options_local.items():
+            if key.startswith("_choose"):
+                choice_type = key[8:].replace("_", " ")
+                option_list = list(value.keys())
+                options_msg = "\n".join([
+                    f"[{n+1}] {v}" for n, v in enumerate(option_list)
+                ])
+                click.echo(options_msg)
+                numeric_choice = click.prompt(
+                    f"Desired {choice_type} option (enter a number)", type=click.INT
+                )
+                choice = option_list[numeric_choice - 1]
+                # Complete the chosen option's values in a recursive call
+                target = self.generate_target_from_input(
+                    target_options_local[key][choice], target
+                )
+            else:
+                if key.startswith("_fixed"):
+                    # _fixed prefixed keys are not presented to the user
+                    target[key[7:]] = value
+                elif isinstance(value, str) and (value[0] + value[-1] == "[]"):
+                    # A square bracketed value is used as a hint
+                    hide_input = key == "password"
+                    target[key] = click.prompt(
+                        f"{key} ({value[1:-1]})", hide_input=hide_input
+                    )
+                elif isinstance(value, list):
+                    # A list can be used to provide both a hint and a default
+                    target[key] = click.prompt(
+                        f"{key} ({value[0]})", default=value[1]
+                    )
+                else:
+                    # All other values are used as defaults
+                    target[key] = click.prompt(
+                        key, default=target_options_local[key]
+                    )
+        return target

-        profiles_dir = dbt.config.PROFILES_DIR
-        profiles_file = os.path.join(profiles_dir, 'profiles.yml')
+    def get_profile_name_from_current_project(self) -> str:
+        """Reads dbt_project.yml in the current directory to retrieve the
+        profile name.
+        """
+        with open("dbt_project.yml") as f:
+            dbt_project = yaml.load(f)
+        return dbt_project["profile"]
+
+    def write_profile(
+        self, profile: dict, profile_name: str = None
+    ) -> Tuple[str, str]:
+        """Given a profile, write it to the current project's profiles.yml.
+        This will overwrite any profile with a matching name."""
+        profiles_file = os.path.join(dbt.config.PROFILES_DIR, "profiles.yml")
+        profile_name = (
+            profile_name or self.get_profile_name_from_current_project()
+        )
+        if os.path.exists(profiles_file):
+            with open(profiles_file, "r+") as f:
+                profiles = yaml.load(f) or {}
+                profiles[profile_name] = profile
+                f.seek(0)
+                yaml.dump(profiles, f)
+                f.truncate()
+        else:
+            profiles = {profile_name: profile}
+            with open(profiles_file, "w") as f:
+                yaml.dump(profiles, f)
+        return profiles_file, profile_name
+
+    def create_profile_from_target_options(self, target_options: dict):
+        """Create and write a profile using the supplied target_options."""
+        target = self.generate_target_from_input(target_options)
+        profile = {
+            "outputs": {
+                "dev": target
+            },
+            "target": "dev"
+        }
+        profiles_filepath, profile_name = self.write_profile(profile)
+        logger.info(
+            f"Profile {profile_name} written to {profiles_filepath} using "
+            "your supplied values."
+        )
+
+    def create_profile_from_scratch(self, adapter: str):
+        """Create a profile without defaults using target_options.yml if available, or
+        sample_profiles.yml as a fallback."""
+        # Line below raises an exception if the specified adapter is not found
+        load_plugin(adapter)
+        adapter_path = get_include_paths(adapter)[0]
+        target_options_path = adapter_path / "target_options.yml"
+
+        if target_options_path.exists():
+            with open(target_options_path) as f:
+                target_options = yaml.load(f)
+            self.create_profile_from_target_options(target_options)
+        else:
+            # For adapters without a target_options.yml defined, fallback on
+            # sample_profiles.yml
+            self.create_profile_from_sample(adapter)
+
+    def check_if_can_write_profile(self, profile_name: str = None) -> bool:
+        profiles_file = os.path.join(dbt.config.PROFILES_DIR, "profiles.yml")
+        if not os.path.exists(profiles_file):
+            return True
+        profile_name = (
+            profile_name or self.get_profile_name_from_current_project()
+        )
+        with open(profiles_file, "r") as f:
+            profiles = yaml.load(f) or {}
+        if profile_name in profiles.keys():
+            response = click.confirm(
+                f"The profile {profile_name} already exists in "
+                f"{profiles_file}. Continue and overwrite it?"
+            )
+            return response
+        else:
+            return True
+
+    def create_profile_using_profile_template(self):
Contributor

  • I believe this method needs profile_name passed as an argument, from the run entry point
  • This method will raise an exception if the template file is improperly formatted (e.g. missing a top-level key). It's tough to debug, since dbt doesn't log anything. What do you think of putting the call to create_profile_using_profile_template in a try/except that falls back to standard profile prompting/creation if it fails for any reason?

Contributor Author

  1. It retrieves the profile_name from profile_template.yml, so I'm not sure it needs to be passed from the run entry point. There is a possibility that profile_template.yml's profile name differs from that of the project.
  2. Good idea
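
The try/except fallback proposed in this thread could be sketched along these lines. The function name and its two callable parameters are hypothetical; in the task they would presumably wrap create_profile_using_profile_template and the standard prompting path. Catching a broad Exception is a deliberate choice here so that any template problem falls through to the prompts, and the failure is logged so it is no longer silent.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def create_profile_with_fallback(
    from_template: Callable[[], None],
    from_prompts: Callable[[], None],
) -> str:
    """Try the template-based flow first; fall back to prompting on any error.

    Returns "template" or "prompts" to indicate which path produced the profile.
    """
    try:
        from_template()
        return "template"
    except Exception as exc:
        # Log the reason so a malformed profile_template.yml can be debugged.
        logger.warning(
            "Could not use profile_template.yml (%r); falling back to prompts.", exc
        )
        from_prompts()
        return "prompts"
```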

"""Create a profile using profile_template.yml"""
+        with open("profile_template.yml") as f:
+            profile_template = yaml.load(f)
+        profile_name = list(profile_template["profile"].keys())[0]
+        self.check_if_can_write_profile(profile_name)
+        render_vars = {}
+        for template_variable in profile_template["vars"]:
+            render_vars[template_variable] = click.prompt(template_variable)
+        profile = profile_template["profile"][profile_name]
+        profile_str = yaml.dump(profile)
+        profile_str = Template(profile_str).render(vars=render_vars)
+        profile = yaml.load(profile_str)
+        profiles_filepath, _ = self.write_profile(profile, profile_name)
+        logger.info(
+            f"Profile {profile_name} written to {profiles_filepath} using "
+            "profile_template.yml and your supplied values."
+        )
+
+    def ask_for_adapter_choice(self) -> str:
+        """Ask the user which adapter (database) they'd like to use."""
+        click.echo("Which database would you like to use?")
+        available_adapters = list(_get_adapter_plugin_names())
+        click.echo("\n".join([
+            f"[{n+1}] {v}" for n, v in enumerate(available_adapters)
+        ]))
+        numeric_choice = click.prompt("Enter a number", type=click.INT)
+        return available_adapters[numeric_choice - 1]
NiallRees marked this conversation as resolved.

+    def run(self):
+        profiles_dir = dbt.config.PROFILES_DIR
         self.create_profiles_dir(profiles_dir)
-        if sample_adapter:
-            self.create_profiles_file(profiles_file, sample_adapter)

-        if os.path.exists(project_dir):
-            raise RuntimeError("directory {} already exists!".format(
-                project_dir
-            ))
+        try:
+            move_to_nearest_project_dir(self.args)
+            in_project = True
+        except dbt.exceptions.RuntimeException:
+            in_project = False

-        self.copy_starter_repo(project_dir)
+        if in_project:
+            logger.info("Setting up your profile.")
+            if os.path.exists("profile_template.yml"):
+                self.create_profile_using_profile_template()
+            else:
+                if not self.check_if_can_write_profile():
+                    return
+                adapter = self.ask_for_adapter_choice()
+                self.create_profile_from_scratch(
+                    adapter
+                )
NiallRees marked this conversation as resolved.
+        else:
+            project_dir = click.prompt("What is the desired project name?")
+            if os.path.exists(project_dir):
+                logger.info(
+                    f"Existing project found at directory {project_dir}"
+                )
+                return

-        addendum = self.get_addendum(project_dir, profiles_dir)
-        logger.info(addendum)
+            self.copy_starter_repo(project_dir)
+            os.chdir(project_dir)
+            if not self.check_if_can_write_profile():
+                return
+            adapter = self.ask_for_adapter_choice()
+            self.create_profile_from_scratch(
+                adapter
+            )
+            logger.info(self.get_addendum(project_dir, profiles_dir))
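
The write_profile method in this diff rewrites an existing profiles.yml in place using "r+", seek(0), a dump, and truncate(). A small standalone sketch of why the truncate() call matters when the new contents are shorter than the old ones; the helper names and file contents are illustrative only and plain text stands in for YAML.

```python
import os
import tempfile
from typing import Tuple


def rewrite_in_place(path: str, new_contents: str, truncate: bool = True) -> str:
    """Overwrite an existing file opened with "r+" and return what is on disk."""
    with open(path, "r+") as f:
        f.read()               # read the current contents first, as write_profile does
        f.seek(0)              # rewind to the start before writing
        f.write(new_contents)
        if truncate:
            f.truncate()       # cut off whatever remains of the old, longer contents
    with open(path) as f:
        return f.read()


def demo() -> Tuple[str, str]:
    """Run the rewrite twice on the same starting file, without and with truncate."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "profiles.yml")
        results = []
        for truncate in (False, True):
            with open(path, "w") as f:
                f.write("old: aaaaaaaaaa")
            results.append(rewrite_in_place(path, "new: b", truncate=truncate))
    return results[0], results[1]
```

Without truncate(), the tail of the old file is left behind after the new text, which for YAML would typically yield a corrupt or misleading file.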
2 changes: 1 addition & 1 deletion core/scripts/upgrade_dbt_schema_tests_v1_to_v2.py
@@ -5,7 +5,7 @@
 import os
 import re
 import sys
-import yaml
+import oyaml as yaml

 LOGGER = logging.getLogger('upgrade_dbt_schema')
 LOGFILE = 'upgrade_dbt_schema_tests_v1_to_v2.txt'
3 changes: 2 additions & 1 deletion core/setup.py
@@ -50,8 +50,9 @@ def read(fname):
     ],
     install_requires=[
         'Jinja2==2.11.3',
-        'PyYAML>=3.11',
+        'oyaml>=1.0',
         'agate>=1.6,<1.6.2',
+        'click>=8,<9',
         'colorama>=0.3.9,<0.4.5',
         'dataclasses>=0.6,<0.9;python_version<"3.7"',
         'hologram==0.0.14',
3 changes: 3 additions & 0 deletions mypy.ini
@@ -1,3 +1,6 @@
 [mypy]
 mypy_path = ./third-party-stubs
 namespace_packages = True
+
+[mypy-oyaml.*]
+ignore_missing_imports = True
14 changes: 14 additions & 0 deletions plugins/bigquery/dbt/include/bigquery/target_options.yml
@@ -0,0 +1,14 @@
+_fixed_type: bigquery
+_choose_authentication_method:
+  oauth:
+    _fixed_method: oauth
+  service_account:
+    _fixed_method: service-account
+    keyfile: '[/path/to/bigquery/keyfile.json]'
+project: '[GCP project id]'
+dataset: '[the name of your dbt dataset]'
+threads: '[1 or more]'
+timeout_seconds: 300
+location: '[one of US or EU]'
+_fixed_priority: interactive
+_fixed_retries: 1
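
To illustrate how the conventions in this file (`_fixed_` keys, `_choose_` groups, bracketed hints, plain defaults) turn into a target, here is a simplified, non-interactive sketch. It is not the PR's generate_target_from_input: the names `build_target` and `scripted` are hypothetical, prompting is replaced by an injected `ask` callable, the list form (hint plus default) is omitted, and the accumulator defaults to None rather than `{}` so that separate calls never share a dict.

```python
from typing import Any, Callable, Dict, Optional

# The BigQuery options above, written out as the dict a YAML loader would produce.
BIGQUERY_OPTIONS: Dict[str, Any] = {
    "_fixed_type": "bigquery",
    "_choose_authentication_method": {
        "oauth": {"_fixed_method": "oauth"},
        "service_account": {
            "_fixed_method": "service-account",
            "keyfile": "[/path/to/bigquery/keyfile.json]",
        },
    },
    "project": "[GCP project id]",
    "dataset": "[the name of your dbt dataset]",
    "threads": "[1 or more]",
    "timeout_seconds": 300,
    "location": "[one of US or EU]",
    "_fixed_priority": "interactive",
    "_fixed_retries": 1,
}

Ask = Callable[[str, Any, Any], Any]  # (name, hint, default) -> answer


def build_target(options: Dict[str, Any], ask: Ask,
                 target: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Walk the options in file order and collect answers into a target dict."""
    target = {} if target is None else target
    for key, value in options.items():
        if key.startswith("_choose_"):
            # Ask which branch to take, then fill in that branch's own keys.
            choice = ask(key[len("_choose_"):], list(value), None)
            build_target(value[choice], ask, target)
        elif key.startswith("_fixed_"):
            # Fixed values are written without asking.
            target[key[len("_fixed_"):]] = value
        elif isinstance(value, str) and value.startswith("[") and value.endswith("]"):
            # A bracketed string is only a hint; there is no default.
            target[key] = ask(key, value[1:-1], None)
        else:
            # Anything else is offered as the default answer.
            target[key] = ask(key, None, value)
    return target


def scripted(answers: Dict[str, Any]) -> Ask:
    """Build an `ask` that replays canned answers, falling back to the default."""
    def ask(name: str, hint: Any, default: Any) -> Any:
        return answers.get(name, default)
    return ask
```

Called with canned answers such as `{"authentication_method": "service_account", ...}`, the walker fills the target in the same key order as the file, which is the property the oyaml discussion earlier in this review was concerned with.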