package management (#542)
* semver resolution

* cleanup

* remove unnecessary comment

* add test for multiples on both sides

* add resolve_to_specific_version

* local registry

* hacking out deps

* Buck pkg mgmt (#645)

* only load hooks and archives once (#540)

* sets schema for node before parsing raw sql (#541)

* Fix/env vars (#543)

* fix for bad env_var exception

* overwrite target with compiled values

* fixes env vars, adds test. Auto-compile profile/target args

* improvements for code that runs in hooks (#544)

* improvements for code that runs in hooks

* fix error message note

* typo

* Update CHANGELOG.md

* bump version (#546)

* add scope to service account json creds initializer (#547)

* bump 0.9.0a3 --> 0.9.0a4 (#548)

* Fix README links (#554)

* Update README.md

* handle empty profiles.yml file (#555)

* return empty string (instead of None) to avoid polluting rendered sql (#566)

* tojson was added in jinja 2.9 (#563)

* tojson was added in jinja 2.9

* requirements

* fix package-defined schema test macros (#562)

* fix package-defined schema test macros

* create a dummy Relation in parsing

* fix for bq quoting (#565)

* bump snowflake, remove pyasn1 (#570)

* bump snowflake, remove pyasn1

* change requirements.txt

* allow macros to return non-text values (#571)

* revert jinja version, implement tojson hack (#572)

* bump to 090a5

* update changelog

* bump (#574)

* 090 docs (#575)

* 090 docs

* Update CHANGELOG.md

* Update CHANGELOG.md

* Raise CompilationException on duplicate model (#568)

* Raise CompilationException on duplicate model

Extend tests

* Ignore disabled models in parse_sql_nodes

Extend tests for duplicate model

* Fix preexisting models

* Use double quotes consistently

Rename model-1 to model-disabled

* Fix unit tests

* Raise exception on duplicate model across packages

Extend tests

* Make run_started_at timezone aware (#553) (#556)

* Make run_started_at timezone aware

Set run_started_at timezone to UTC
Enable timezone change in models
Extend requirements
Extend tests

* Address comments from code review

Create modules namespace to context
Move pytz to modules
Add new dependencies to setup.py

* Add warning for missing constraints. Fixes #592 (#600)

* Add warning for missing constraints. Fixes #592

* fix unit tests

* fix schema tests used in, or defined in packages (#599)

* fix schema tests used in, or defined in packages

* don't hardcode dbt test namespace

* fix/actually run tests

* rm junk

* run hooks in correct order, fixes #590 (#601)

* run hooks in correct order, fixes #590

* add tests

* fix tests

* pep8

* change req for snowflake to fix crypto install issue (#612)

From cffi callback <function _verify_callback at 0x06BF2978>:
Traceback (most recent call last):
  File "c:\projects\dbt\.tox\pywin\lib\site-packages\OpenSSL\SSL.py", line 313, in wrapper
    _lib.X509_up_ref(x509)
AttributeError: module 'lib' has no attribute 'X509_up_ref'
From cffi callback <function _verify_callback at 0x06B8CF60>:

* Update python version in Makefile from 3.5 to 3.6 (#613)

* Fix/snowflake custom schema (#626)

* Fixes already opened transaction issue

For #602

* Fixes #621

* Create schema in archival flow (#625)

* Fix for pre-hooks outside of transactions (#623)

* Fix for pre-hooks outside of transactions #576

* improve tests

* Fixes already opened transaction issue (#622)

For #602

* Accept string for postgres port number (#583) (#624)

* Accept string for postgres port number (#583)

* s/str/basestring/g

* print correct run time (include hooks) (#607)

* add support for late binding views (Redshift) (#614)

* add support for late binding views (Redshift)

* fix bind logic

* wip for get_columns_in_table

* fix get_columns_in_table

* fix for default value in bind config

* pep8

* skip tests that depend on nonexistent or disabled models (#617)

* skip tests that depend on nonexistent or disabled models

* pep8, Fixes #616

* refactor

* fix for adapter macro called within packages (#630)

* fix for adapter macro called within packages

* better error message

* Update CHANGELOG.md (#632)

* Update CHANGELOG.md

* Update CHANGELOG.md

* Bump version: 0.9.0 → 0.9.1

* more helpful exception for registry funcs

* Rework deps to support local & git

* pylint and cleanup

* make modules directory first

* Refactor registry client for cleanliness and better error handling

* init converter script

* create modules directory only if non-existent

* Only check the hub registry for registry packages

* Incorporate changes from Drew's branch

Diff of original changes:
https://github.com/fishtown-analytics/dbt/pull/591/files

* lint

* include a portion of the actual name in destination directory

* Install dependencies using actual name; better exceptions

* Error if two dependencies have same name

* Process dependencies one level at a time

Included in this change is a refactor of the deps run function for
clarity.

Also I changed the resolve_version function to update the object in
place. I prefer the immutability of this function as it was, but the
rest of the code doesn't really operate that way. And I ran into some
bugs due to this discrepancy.

* update var name

* Provide support for repositories in project yml

* Download files in a temp directory

The downloads directory causes problems with the run command because
this directory is not a dbt project. Need to download it elsewhere.

* pin some versions

* pep8-ify

* some PR feedback changes around logging

* PR feedback round 2

* Fix for redshift varchar bug (#647)

* Fix for redshift varchar bug

* pep8 on a sql string, smh

* Set global variable overrides on the command line with --vars (#640)

* Set global variable overrides on the command line with --vars

* pep8

* integration tests for cli vars

* Seed rewrite (#618)

* loader for seed data files

* Functioning rework of seed task

* Make CompilerRunner fns private and impl. SeedRunner.compile

Trying to distinguish between the public/private interface for this
class. And the SeedRunner doesn't need the functionality in the compile
function, it just needs a compile function to exist for use in the
compilation process.

* Test changes and fixes

* make the DB setup script usable locally

* convert simple copy test to use seed

* Fixes to get Snowflake working

* New seed flag and make it non-destructive by default

* Convert update SQL script to another seed

* cleanup

* implement bigquery csv load

* context handling of StringIO

* Better typing

* strip seeder and csvkit dependency

* update bigquery to use new data typing and to fix unicode issue

* update seed test

* fix abstract functions in base adapter

* support time type

* try pinning crypto, pyopenssl versions

* remove unnecessary version pins

* insert all at once, rather than one query per row

* do not quote field names on creation

* bad

* quiet down parsedatetime logger

* pep8

* UI updates + node conformity for seed nodes

* add seed to list of resource types, cleanup

* show option for CSVs

* typo

* pep8

* move agate import to avoid strange warnings

* deprecation warning for --drop-existing

* quote column names in seed files

* revert quoting change (breaks Snowflake). Hush warnings

* use hub url

* Show installed version, silence semver regex warnings

* sort versions to make tests deterministic. Prefer higher versions

* pep8, fix comparison functions for py3

* make compare function return value in {-1, 0, 1}

* fix for deleting git dirs on windows?

* use system client rmdir instead of shutil directly

* debug logging to identify appveyor issue

* less restrictive error retry

* rm debug logging

* s/version/revision for git packages

* more s/version/revision, deprecation cleanup

* remove unused semver codepath

* plus symlinks!!!

* get rid of reference to removed function
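Several bullets above deal with resolving package versions. They cover semver resolution, resolving a range to a specific version, sorting so that higher versions are preferred, and a compare function that returns a value in {-1, 0, 1} because Python 3 has no `cmp`. The sketch below illustrates that idea. It assumes plain `MAJOR.MINOR.PATCH` strings and specifiers such as `>=0.1.0`. The helper names are hypothetical, and this is not the resolver the commit adds.

```python
import functools
import re

# Hypothetical helpers: plain MAJOR.MINOR.PATCH only, no pre-release tags.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

# Each operator maps compare_versions(candidate, bound) to "satisfied?".
# Longer operators come first so that ">=" is not mistaken for ">".
_OPERATORS = [
    (">=", lambda c: c >= 0),
    ("<=", lambda c: c <= 0),
    (">", lambda c: c > 0),
    ("<", lambda c: c < 0),
    ("=", lambda c: c == 0),
]


def parse_version(text):
    """'0.9.1' -> (0, 9, 1)."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError("not a semantic version: {!r}".format(text))
    return tuple(int(part) for part in match.groups())


def compare_versions(a, b):
    """Return -1, 0 or 1, since Python 3 has no built-in cmp()."""
    left, right = parse_version(a), parse_version(b)
    return (left > right) - (left < right)


def satisfies(version, spec):
    """Check one specifier such as '>=0.1.0'; a bare version means '='."""
    for op, test in _OPERATORS:
        if spec.startswith(op):
            return test(compare_versions(version, spec[len(op):]))
    return compare_versions(version, spec) == 0


def resolve_to_highest(available, specs):
    """Highest available version that satisfies every specifier, or None."""
    candidates = [v for v in available
                  if all(satisfies(v, s) for s in specs)]
    if not candidates:
        return None
    # Sort for determinism; the last element is the highest version.
    ordered = sorted(candidates, key=functools.cmp_to_key(compare_versions))
    return ordered[-1]


print(resolve_to_highest(["0.1.0", "0.2.1", "0.1.5", "0.2.0"],
                         [">=0.1.0", "<0.2.0"]))  # 0.1.5
```

Specifiers from several dependents are concatenated into one list, so requirements that come from both sides of a dependency are intersected in a single pass. Comparing integer tuples also ranks `0.10.0` above `0.9.1`, which a plain string sort would get wrong.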
cmcarthur authored and drewbanin committed Feb 27, 2018
1 parent 6783966 commit 5fbcd12
Showing 13 changed files with 1,231 additions and 124 deletions.
73 changes: 73 additions & 0 deletions converter.py
@@ -0,0 +1,73 @@
+#!/usr/bin/env python
+import json
+import yaml
+import sys
+import argparse
+from datetime import datetime, timezone
+import dbt.clients.registry as registry
+
+
+def yaml_type(fname):
+    with open(fname) as f:
+        return yaml.safe_load(f)
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--project", type=yaml_type, default="dbt_project.yml")
+    parser.add_argument("--namespace", required=True)
+    return parser.parse_args()
+
+
+def get_full_name(args):
+    return "{}/{}".format(args.namespace, args.project["name"])
+
+
+def init_project_in_packages(args, packages):
+    full_name = get_full_name(args)
+    if full_name not in packages:
+        packages[full_name] = {
+            "name": args.project["name"],
+            "namespace": args.namespace,
+            "latest": args.project["version"],
+            "assets": {},
+            "versions": {},
+        }
+    return packages[full_name]
+
+
+def add_version_to_package(args, project_json):
+    project_json["versions"][args.project["version"]] = {
+        "id": "{}/{}".format(get_full_name(args), args.project["version"]),
+        "name": args.project["name"],
+        "version": args.project["version"],
+        "description": "",
+        "published_at": datetime.now(timezone.utc).astimezone().isoformat(),
+        "packages": args.project.get("packages") or [],
+        "works_with": [],
+        "_source": {
+            "type": "github",
+            "url": "",
+            "readme": "",
+        },
+        "downloads": {
+            "tarball": "",
+            "format": "tgz",
+            "sha1": "",
+        },
+    }
+
+
+def main():
+    args = parse_args()
+    packages = registry.packages()
+    project_json = init_project_in_packages(args, packages)
+    if args.project["version"] in project_json["versions"]:
+        print("Version {} already in packages JSON"
+              .format(args.project["version"]), file=sys.stderr)
+        sys.exit(1)
+    add_version_to_package(args, project_json)
+    print(json.dumps(packages, indent=2))
+
+if __name__ == "__main__":
+    main()
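`--project` is declared with `type=yaml_type` and a string default. argparse also runs string defaults through the `type` callable, so `args.project` is always the parsed project dict rather than a file name. The sketch below shows that behaviour. It uses `json` in place of PyYAML so that it needs only the standard library. `json_file` and `parse` are hypothetical stand-ins for `yaml_type` and `parse_args`.

```python
import argparse
import json
import os
import tempfile


def json_file(fname):
    """Hypothetical stand-in for yaml_type: load a project file into a dict."""
    with open(fname) as f:
        return json.load(f)


def parse(argv, default_project):
    parser = argparse.ArgumentParser()
    # A string default is passed through `type`, just like a CLI value.
    parser.add_argument("--project", type=json_file, default=default_project)
    parser.add_argument("--namespace", required=True)
    return parser.parse_args(argv)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dbt_project.json")
    with open(path, "w") as f:
        json.dump({"name": "my_package", "version": "0.1.0"}, f)
    args = parse(["--namespace", "my-org"], path)

# Same "<namespace>/<name>" key that get_full_name builds.
full_name = "{}/{}".format(args.namespace, args.project["name"])
print(full_name)  # my-org/my_package
```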
6 changes: 6 additions & 0 deletions dbt/adapters/bigquery.py
@@ -372,6 +372,12 @@ def get_columns_in_table(cls, profile, schema_name, table_name,
         raise dbt.exceptions.NotImplementedException(
             '`get_columns_in_table` is not implemented for this adapter!')

+    @classmethod
+    def get_columns_in_table(cls, profile, schema_name, table_name,
+                             model_name=None):
+        raise dbt.exceptions.NotImplementedException(
+            '`get_columns_in_table` is not implemented for this adapter!')
+
     @classmethod
     def check_schema_exists(cls, profile, schema, model_name=None):
         conn = cls.get_connection(profile, model_name)
32 changes: 31 additions & 1 deletion dbt/clients/git.py
@@ -1,3 +1,4 @@
+import re
 import os.path

 from dbt.clients.system import run_cmd, rmdir
@@ -29,7 +30,7 @@ def checkout(cwd, repo, branch=None):
     if branch is None:
         branch = 'master'

-    logger.info(' Checking out branch {}.'.format(branch))
+    logger.debug(' Checking out branch {}.'.format(branch))

     run_cmd(cwd, ['git', 'remote', 'set-branches', 'origin', branch])
     run_cmd(cwd, ['git', 'fetch', '--tags', '--depth', '1', 'origin', branch])
@@ -59,3 +60,32 @@ def get_current_sha(cwd):

 def remove_remote(cwd):
     return run_cmd(cwd, ['git', 'remote', 'rm', 'origin'])
+
+
+def clone_and_checkout(repo, cwd, dirname=None, remove_git_dir=False,
+                       branch=None):
+    _, err = clone(repo, cwd, dirname=dirname, remove_git_dir=remove_git_dir)
+    exists = re.match("fatal: destination path '(.+)' already exists",
+                      err.decode('utf-8'))
+    directory = None
+    start_sha = None
+    if exists:
+        directory = exists.group(1)
+        logger.debug('Updating existing dependency %s.', directory)
+    else:
+        matches = re.match("Cloning into '(.+)'", err.decode('utf-8'))
+        directory = matches.group(1)
+        logger.debug('Pulling new dependency %s.', directory)
+    full_path = os.path.join(cwd, directory)
+    start_sha = get_current_sha(full_path)
+    checkout(full_path, repo, branch)
+    end_sha = get_current_sha(full_path)
+    if exists:
+        if start_sha == end_sha:
+            logger.debug(' Already at %s, nothing to do.', start_sha[:7])
+        else:
+            logger.debug(' Updated checkout from %s to %s.',
+                         start_sha[:7], end_sha[:7])
+    else:
+        logger.debug(' Checked out at %s.', end_sha[:7])
+    return directory
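`run_cmd` returns only stdout and stderr. As a result, `clone_and_checkout` tells a fresh clone from an existing checkout by matching git's stderr text. The hypothetical helper below isolates that step and assumes English git output, because git can print translated messages under other locales. Unlike the code in the diff, it raises a clear error when neither message matches. The diff's version would instead fail with `AttributeError` on `matches.group(1)`.

```python
import re

# Patterns copied from clone_and_checkout; they assume English git output.
EXISTS_RE = re.compile(r"fatal: destination path '(.+)' already exists")
CLONING_RE = re.compile(r"Cloning into '(.+)'")


def parse_clone_stderr(stderr):
    """Return (directory, already_existed) from `git clone` stderr bytes."""
    text = stderr.decode("utf-8")
    exists = EXISTS_RE.match(text)
    if exists:
        return exists.group(1), True
    cloning = CLONING_RE.match(text)
    if cloning is None:
        raise ValueError("unrecognized git clone output: {!r}".format(text))
    return cloning.group(1), False


print(parse_clone_stderr(b"Cloning into 'dbt-utils'...\n"))
# ('dbt-utils', False)
```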
61 changes: 61 additions & 0 deletions dbt/clients/registry.py
@@ -0,0 +1,61 @@
+from functools import wraps
+import six
+import requests
+from dbt.exceptions import RegistryException
+from dbt.utils import memoized
+import os
+
+if os.getenv('DBT_PACKAGE_HUB_URL'):
+    DEFAULT_REGISTRY_BASE_URL = os.getenv('DBT_PACKAGE_HUB_URL')
+else:
+    DEFAULT_REGISTRY_BASE_URL = 'https://hub.getdbt.com/'
+
+
+def _get_url(url, registry_base_url=None):
+    if registry_base_url is None:
+        registry_base_url = DEFAULT_REGISTRY_BASE_URL
+
+    return '{}{}'.format(registry_base_url, url)
+
+
+def _wrap_exceptions(fn):
+    @wraps(fn)
+    def wrapper(*args, **kwargs):
+        try:
+            return fn(*args, **kwargs)
+        except requests.exceptions.ConnectionError as e:
+            six.raise_from(
+                RegistryException('Unable to connect to registry hub'), e)
+    return wrapper
+
+
+@_wrap_exceptions
+def _get(path, registry_base_url=None):
+    url = _get_url(path, registry_base_url)
+    resp = requests.get(url)
+    resp.raise_for_status()
+    return resp.json()
+
+
+def index(registry_base_url=None):
+    return _get('api/v1/index.json', registry_base_url)
+
+
+index_cached = memoized(index)
+
+
+def packages(registry_base_url=None):
+    return _get('api/v1/packages.json', registry_base_url)
+
+
+def package(name, registry_base_url=None):
+    return _get('api/v1/{}.json'.format(name), registry_base_url)
+
+
+def package_version(name, version, registry_base_url=None):
+    return _get('api/v1/{}/{}.json'.format(name, version), registry_base_url)
+
+
+def get_available_versions(name):
+    response = package(name)
+    return list(response['versions'])
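`_wrap_exceptions` converts a low-level connection failure into dbt's own `RegistryException`, and `six.raise_from` keeps the original error attached as the cause. On Python 3 alone, the same pattern is written `raise ... from ...`. The self-contained sketch below uses a local `RegistryException` class in place of dbt's. It also uses the built-in `ConnectionError` as a stand-in for `requests.exceptions.ConnectionError`, which is not a subclass of the built-in.

```python
import functools


class RegistryException(Exception):
    """Local stand-in for dbt.exceptions.RegistryException."""


def wrap_exceptions(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except ConnectionError as exc:
            # Python 3 equivalent of six.raise_from(new_exc, exc).
            raise RegistryException(
                "Unable to connect to registry hub") from exc
    return wrapper


@wrap_exceptions
def fetch_index():
    """Hypothetical fetch that fails as if the hub were unreachable."""
    raise ConnectionError("connection refused")


try:
    fetch_index()
except RegistryException as err:
    print(err, "| caused by:", repr(err.__cause__))
```

Callers then handle a single dbt-specific exception type, while the traceback still shows the underlying network error.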
76 changes: 73 additions & 3 deletions dbt/clients/system.py
@@ -5,8 +5,12 @@
 import shutil
 import subprocess
 import sys
+import tarfile
+import requests
+import stat

 import dbt.compat
+import dbt.exceptions

 from dbt.logger import GLOBAL_LOGGER as logger

@@ -92,19 +96,61 @@ def make_file(path, contents='', overwrite=False):
     return False


+def make_symlink(source, link_path):
+    """
+    Create a symlink at `link_path` referring to `source`.
+    """
+    if not supports_symlinks():
+        dbt.exceptions.system_error('create a symbolic link')
+
+    return os.symlink(source, link_path)
+
+
+def supports_symlinks():
+    return getattr(os, "symlink", None) is not None
+
+
 def write_file(path, contents=''):
     make_directory(os.path.dirname(path))
     dbt.compat.write_file(path, contents)

     return True


+def _windows_rmdir_readonly(func, path, exc):
+    exception_val = exc[1]
+    if exception_val.errno == errno.EACCES:
+        os.chmod(path, stat.S_IWUSR)
+        func(path)
+    else:
+        raise
+
+
 def rmdir(path):
     """
-    Make a file at `path` assuming that the directory it resides in already
-    exists. The file is saved with contents `contents`
+    Recursively deletes a directory. Includes an error handler to retry with
+    different permissions on Windows. Otherwise, removing directories (e.g.
+    cloned via git) can cause rmtree to throw a PermissionError exception
     """
-    return shutil.rmtree(path)
+    logger.debug("DEBUG** Window rmdir sys.platform: {}".format(sys.platform))
+    if sys.platform == 'win32':
+        onerror = _windows_rmdir_readonly
+    else:
+        onerror = None
+
+    return shutil.rmtree(path, onerror=onerror)


+def remove_file(path):
+    return os.remove(path)
+
+
+def path_exists(path):
+    return os.path.lexists(path)
+
+
+def path_is_symlink(path):
+    return os.path.islink(path)
+
+
 def open_dir_cmd():
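`rmdir` passes `_windows_rmdir_readonly` to `shutil.rmtree`. On Windows, a read-only file (git stores its object files read-only) then gets its write bit set, and the failed call is retried. The standalone sketch below implements the same handler. It uses nothing from dbt and runs the handler on every platform rather than only on `win32`. On Python 3.12+ it switches to the `onexc` parameter, because `onerror` is deprecated there and the handler receives the exception itself rather than an `exc_info` tuple.

```python
import errno
import os
import shutil
import stat
import sys
import tempfile


def _clear_readonly_and_retry(func, path, exc):
    """rmtree error handler: make `path` writable, then retry `func`."""
    # onexc (3.12+) passes the exception; onerror passes an exc_info tuple.
    err = exc if isinstance(exc, BaseException) else exc[1]
    if isinstance(err, OSError) and err.errno == errno.EACCES:
        os.chmod(path, stat.S_IWUSR)
        func(path)
    else:
        raise err


def remove_tree(path):
    """Recursively delete `path`, retrying read-only entries once."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_clear_readonly_and_retry)
    else:
        shutil.rmtree(path, onerror=_clear_readonly_and_retry)


# Usage: a directory holding a read-only file, like git's object store.
root = tempfile.mkdtemp()
locked = os.path.join(root, "pack.idx")
with open(locked, "w") as fh:
    fh.write("data")
os.chmod(locked, stat.S_IREAD)
remove_tree(root)
```

On POSIX systems a read-only file in a writable directory can be unlinked directly, so the handler is usually only exercised on Windows.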
@@ -133,3 +179,27 @@ def run_cmd(cwd, cmd):
     logger.debug('STDERR: "{}"'.format(err))

     return out, err
+
+
+def download(url, path):
+    response = requests.get(url)
+    with open(path, 'wb') as handle:
+        for block in response.iter_content(1024*64):
+            handle.write(block)
+
+
+def rename(from_path, to_path, force=False):
+    if os.path.exists(to_path) and force:
+        rmdir(to_path)
+    os.rename(from_path, to_path)
+
+
+def untar_package(tar_path, dest_dir, rename_to=None):
+    tar_dir_name = None
+    with tarfile.open(tar_path, 'r') as tarball:
+        tarball.extractall(dest_dir)
+        tar_dir_name = os.path.commonprefix(tarball.getnames())
+    if rename_to:
+        downloaded_path = os.path.join(dest_dir, tar_dir_name)
+        desired_path = os.path.join(dest_dir, rename_to)
+        rename(downloaded_path, desired_path, force=True)
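`untar_package` finds the extracted folder with `os.path.commonprefix`, which compares names character by character rather than path component by path component. That works when the archive contains an entry for its top-level directory. If the archive lists only files under that directory, the prefix can end partway through a name. The sketch below contrasts it with `posixpath.commonpath` (tar member names use `/` separators). `build_tarball` and `top_level_dir` are hypothetical helpers, not part of dbt.

```python
import io
import posixpath
import tarfile


def build_tarball(names):
    """Create an in-memory .tar.gz with a small file at each name."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            data = b"select 1\n"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def top_level_dir(fileobj):
    """Directory shared by every member, compared per path component."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        names = tar.getnames()
    return posixpath.commonpath(names)


names = ["dbt-utils-0.1.0/models/a.sql", "dbt-utils-0.1.0/macros/b.sql"]
print(posixpath.commonprefix(names))        # dbt-utils-0.1.0/m
print(top_level_dir(build_tarball(names)))  # dbt-utils-0.1.0
```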
1 change: 1 addition & 0 deletions dbt/context/common.py
@@ -1,5 +1,6 @@
 import json
 import os
+import pytz
 import voluptuous

 from dbt.adapters.factory import get_adapter
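Per the commit message, `run_started_at` became timezone aware (UTC), and `pytz` is exposed to models through a `modules` namespace so that the value can be converted. The sketch below shows the underlying datetime behaviour. It uses the standard library's fixed-offset `timezone` instead of a pytz zone, so it has no dependencies. The timestamp and offset are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone

# An aware UTC timestamp, standing in for run_started_at.
run_started_at = datetime(2018, 2, 27, 15, 30, tzinfo=timezone.utc)

# Converting an aware datetime changes the wall-clock fields, not the instant.
utc_minus_5 = timezone(timedelta(hours=-5))
local = run_started_at.astimezone(utc_minus_5)

print(local.isoformat())        # 2018-02-27T10:30:00-05:00
print(local == run_started_at)  # True: aware datetimes compare as instants
```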
21 changes: 10 additions & 11 deletions dbt/deprecations.py
@@ -11,21 +11,19 @@ def show(self, *args, **kwargs):
             logger.info("* Deprecation Warning: {}\n".format(desc))
             active_deprecations.add(self.name)

-# Leaving this as an example. Make sure to add new ones to deprecations_list
-# - Connor
-#
-# class DBTRunTargetDeprecation(DBTDeprecation):
-# name = 'run-target'
-# description = """profiles.yml configuration option 'run-target' is
-# deprecated. Please use 'target' instead. The 'run-target' option will be
-# removed (in favor of 'target') in DBT version 0.7.0"""

+class DBTRepositoriesDeprecation(DBTDeprecation):
+    name = "repositories"
+    description = """dbt_project.yml configuration option 'repositories' is
+    deprecated. Please use 'packages' instead. The 'repositories' option will
+    be removed in a later version of DBT."""
+

 class SeedDropExistingDeprecation(DBTDeprecation):
     name = 'drop-existing'
-    description = """The --drop-existing argument has been deprecated. Please
-    use --full-refresh instead. The --drop-existing option will be removed in a
-    future version of dbt."""
+    description = """The --drop-existing argument to `dbt seed` has been
+    deprecated. Please use --full-refresh instead. The --drop-existing option
+    will be removed in a future version of dbt."""


 def warn(name, *args, **kwargs):
@@ -44,6 +42,7 @@ def warn(name, *args, **kwargs):
 active_deprecations = set()

 deprecations_list = [
+    DBTRepositoriesDeprecation(),
     SeedDropExistingDeprecation()
 ]
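The deprecation registry lists one instance per deprecation in `deprecations_list`. It records each name in `active_deprecations` the first time that deprecation is shown, so a given warning appears at most once per run. The simplified, self-contained sketch below follows that pattern. The body of `warn` is not visible in this diff, so the dictionary lookup below is an assumption. Collecting messages in a list instead of logging them is also a change made only for the sketch.

```python
# Messages are collected in a list here instead of being logged.
emitted = []
active_deprecations = set()


class Deprecation(object):
    name = None
    description = None

    def show(self, **kwargs):
        # Only the first call for a given name produces output.
        if self.name not in active_deprecations:
            desc = self.description.format(**kwargs)
            emitted.append("* Deprecation Warning: {}".format(desc))
            active_deprecations.add(self.name)


class RepositoriesDeprecation(Deprecation):
    name = "repositories"
    description = ("dbt_project.yml configuration option 'repositories' is "
                   "deprecated. Please use 'packages' instead.")


deprecations_list = [RepositoriesDeprecation()]
deprecations = {d.name: d for d in deprecations_list}


def warn(name, **kwargs):
    """Hypothetical lookup: show the named deprecation, once per run."""
    if name not in deprecations:
        raise KeyError("unknown deprecation: {}".format(name))
    deprecations[name].show(**kwargs)


warn("repositories")
warn("repositories")
```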
