diff --git a/.gitignore b/.gitignore
index 1dbc687de01..37c536b9cec 100644
--- a/.gitignore
+++ b/.gitignore
@@ -23,6 +23,7 @@ var/
 *.egg-info/
 .installed.cfg
 *.egg
+logs/
 
 # PyInstaller
 # Usually these files are written by a python script from a template
@@ -60,3 +61,6 @@ target/
 
 #Ipython Notebook
 .ipynb_checkpoints
+
+#Emacs
+*~
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b3ed1a5b1e2..16c3a649fa9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,9 @@
 ## dbt 0.6.1 (unreleased)
 
+#### Bugfixes
+
+- respect `config` options in profiles.yml ([#255](https://github.com/analyst-collective/dbt/pull/255))
+
 #### Changes
 
 - add `--debug` flag, replace calls to `print()` with a global logger ([#256](https://github.com/analyst-collective/dbt/pull/256))
@@ -62,7 +66,7 @@ Use `{{ target }}` to interpolate profile variables into your model definitions
 
 ```sql
 -- only use the last week of data in development
-select * from events 
+select * from events
 
 {% if target.name == 'dev' %}
 where created_at > getdate() - interval '1 week'
@@ -227,7 +231,7 @@ As `dbt` has grown, we found this implementation to be a little unwieldy and har
 
 The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/analyst-collective/dbt/milestone/16 .
 
-As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns: 
+As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns:
 
 
 
@@ -244,7 +248,7 @@ See https://github.com/analyst-collective/dbt/releases/tag/v0.5.1
 
 ## dbt release 0.5.1
 
-### 0. tl;dr 
+### 0. tl;dr
 
 1. Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
 2. Column type resolution for incremental models (no more `Value too long for character type` errors)
@@ -281,7 +285,7 @@ The archived tables will mirror the schema of the source tables they're generate
 
 1. `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
 1. `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record!
-1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row). 
+1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).
 
 dbt models can be built on top of these archived tables. The most recent record for a given `unique_key` is the one where `valid_to` is `null`.
 
@@ -289,7 +293,7 @@ To run this archive process, use the command `dbt archive`. After testing and co
 
 ### 2. Incremental column expansion https://github.com/analyst-collective/dbt/issues/175
 
-Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field. 
+Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field.
 
 In practice, this error looks like:
 ```
@@ -485,7 +489,7 @@ models:
       post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"
 ```
 
-Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks. 
+Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks.
 
 Finally, note that the `grant` statement uses the (hopefully familiar) `{{this}}` syntax whereas the `insert` statement uses the `{{this.name}}` syntax. When DBT creates a model:
 - A temp table is created
@@ -516,7 +520,7 @@ config:
 
 ![windows](https://pbs.twimg.com/profile_images/571398080688181248/57UKydQS.png)
 
---- 
+---
 
 dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.
 
@@ -540,7 +544,7 @@ pip install -U dbt
 
 # To run models
 dbt run # same as before
-# to dry-run models 
+# to dry-run models
 dbt run --dry # previously dbt test
 
 # to run schema tests
@@ -553,10 +557,10 @@ Previously, dbt calculated "new" incremental records to insert by querying for r
 User 1 Session 1 Event 1 @ 12:00
 User 1 Session 1 Event 2 @ 12:01
 
--- dbt run -- 
+-- dbt run --
 
 User 1 Session 1 Event 3 @ 12:02
 
-In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate! 
+In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!
 
 With this release, you can now add a `unique_key` expression to an incremental model config. Records matching the `unique_key` will be `delete`d from the incremental table, then `insert`ed as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
@@ -570,7 +574,7 @@ sessions:
 
 ### 3. Run schema validations concurrently https://github.com/analyst-collective/dbt/issues/100
 
-The `threads` run-target config now applies to schema validations too. Try it with `dbt test` 
+The `threads` run-target config now applies to schema validations too. Try it with `dbt test`
 
 ### 4. Connect to database over ssh https://github.com/analyst-collective/dbt/issues/93
 
@@ -588,10 +592,10 @@ warehouse:
       dbname: my-db
       schema: dbt_dbanin
      threads: 8
-      ssh-host: ssh-host-name # <------ Add this line 
+      ssh-host: ssh-host-name # <------ Add this line
   run-target: dev
 ```
- 
+
 ### Remove the model-defaults config https://github.com/analyst-collective/dbt/issues/111
 
 The `model-defaults` config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
@@ -688,12 +692,12 @@ from users
 where email not in (select email from __dbt__CTE__employees)
 ```
 
-Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity. 
+Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.
 
 
 ### 4. Feature: In-model configs https://github.com/analyst-collective/dbt/issues/88
 
-Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file. 
+Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.
 
 An in-model-config looks like this:
 
@@ -703,7 +707,7 @@ An in-model-config looks like this:
 -- python function syntax
 {{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}
 -- OR json syntax
-{{ 
+{{
   config({"materialized:" "incremental", "sql_where" : "id > (select max(id) from {{this}})"})
 }}
diff --git a/Makefile b/Makefile
index 1460f8d7f4b..502b9fd975f 100644
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,16 @@
-.PHONY: test
+.PHONY: test test-unit test-integration
 
 changed_tests := `git status --porcelain | grep '^\(M\| M\|A\| A\)' | awk '{ print $$2 }' | grep '\/test_[a-zA-Z_\-\.]\+.py'`
 
-test:
-	@echo "Test run starting..."
-	@docker-compose run test /usr/src/app/test/runner.sh
+test: test-unit test-integration
+
+test-unit:
+	@echo "Unit test run starting..."
+	tox -e unit-py27,unit-py35
+
+test-integration:
+	@echo "Integration test run starting..."
+	@docker-compose run test /usr/src/app/test/integration.sh
 
 test-new:
 	@echo "Test run starting..."
diff --git a/dbt/config.py b/dbt/config.py
new file mode 100644
index 00000000000..8bbe77e220b
--- /dev/null
+++ b/dbt/config.py
@@ -0,0 +1,25 @@
+import os.path
+import yaml
+
+import dbt.project as project
+
+
+def read_config(profiles_dir):
+    # TODO: validate profiles_dir
+    path = os.path.join(profiles_dir, 'profiles.yml')
+
+    if os.path.isfile(path):
+        with open(path, 'r') as f:
+            profile = yaml.safe_load(f)
+            return profile.get('config', {})
+
+    return {}
+
+
+def send_anonymous_usage_stats(profiles_dir):
+    config = read_config(profiles_dir)
+
+    if config is not None and config.get("send_anonymous_usage_stats") == False:
+        return False
+
+    return True
diff --git a/dbt/main.py b/dbt/main.py
index 11467b98e05..bdcffa158d3 100644
--- a/dbt/main.py
+++ b/dbt/main.py
@@ -18,17 +18,7 @@ import dbt.task.test as test_task
 import dbt.task.archive as archive_task
 import dbt.tracking
-
-
-def is_opted_out(profiles_dir):
-    profiles = project.read_profiles(profiles_dir)
-
-    if profiles is None or profiles.get("config") is None:
-        return False
-    elif profiles['config'].get("send_anonymous_usage_stats") == False:
-        return True
-    else:
-        return False
+import dbt.config as config
 
 
 def main(args=None):
     if args is None:
@@ -48,7 +38,7 @@ def handle(args):
     initialize_logger(parsed.debug)
 
     # this needs to happen after args are parsed so we can determine the correct profiles.yml file
-    if is_opted_out(parsed.profiles_dir):
+    if not config.send_anonymous_usage_stats(parsed.profiles_dir):
         dbt.tracking.do_not_track()
 
     res = run_from_args(parsed)
diff --git a/test/integration.sh b/test/integration.sh
new file mode 100755
index 00000000000..a6a6876af56
--- /dev/null
+++ b/test/integration.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+. /usr/src/app/test/setup.sh
+workon dbt
+
+cd /usr/src/app
+tox -e integration-py27,integration-py35
diff --git a/test/runner.sh b/test/runner.sh
deleted file mode 100755
index c11905771fd..00000000000
--- a/test/runner.sh
+++ /dev/null
@@ -1,14 +0,0 @@
-#!/bin/bash
-
-. /usr/local/bin/virtualenvwrapper.sh
-workon dbt
-
-cd /usr/src/app
-
-if [ $# = 0 ]; then
-    echo "Running all tests"
-    tox
-else
-    echo "Running specified tests"
-    DBT_INVOCATION_ENV="ci-local" nosetests -v --nocapture --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov $@
-fi
diff --git a/test/unit/test_config.py b/test/unit/test_config.py
new file mode 100644
index 00000000000..67556b6353a
--- /dev/null
+++ b/test/unit/test_config.py
@@ -0,0 +1,48 @@
+import os
+import unittest
+import yaml
+
+import dbt.config
+
+if os.name == 'nt':
+    TMPDIR = 'c:/Windows/TEMP'
+else:
+    TMPDIR = '/tmp'
+
+class ConfigTest(unittest.TestCase):
+
+    def set_up_empty_config(self):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        with open(profiles_path, 'w') as f:
+            f.write(yaml.dump({}))
+
+    def set_up_config_options(self, send_anonymous_usage_stats=False):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        with open(profiles_path, 'w') as f:
+            f.write(yaml.dump({
+                'config': {
+                    'send_anonymous_usage_stats': send_anonymous_usage_stats
+                }
+            }))
+
+    def tearDown(self):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        try:
+            os.remove(profiles_path)
+        except:
+            pass
+
+    def test__implicit_opt_in(self):
+        self.set_up_empty_config()
+        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
+
+    def test__explicit_opt_out(self):
+        self.set_up_config_options(send_anonymous_usage_stats=False)
+        self.assertFalse(dbt.config.send_anonymous_usage_stats(TMPDIR))
+
+    def test__explicit_opt_in(self):
+        self.set_up_config_options(send_anonymous_usage_stats=True)
+        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
diff --git a/tox.ini b/tox.ini
index 81e65b8b070..ef84fef34b7 100644
--- a/tox.ini
+++ b/tox.ini
@@ -1,17 +1,33 @@
-# Tox (http://tox.testrun.org/) is a tool for running tests
-# in multiple virtualenvs. This configuration file will run the
-# test suite on all supported python versions. To use it, "pip install tox"
-# and then run "tox" from this directory.
-
 [tox]
-envlist = py27, py35
+envlist = unit-py27, unit-py35, integration-py27, integration-py35
+
+[testenv:unit-py27]
+basepython = python2.7
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
+
+[testenv:unit-py35]
+basepython = python3.5
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
 
-[testenv]
-commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/unit test/integration/*'
+[testenv:integration-py27]
+basepython = python2.7
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
 deps =
     -rrequirements.txt
     -rdev_requirements.txt
 
+[testenv:integration-py35]
+basepython = python3.5
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
+
 [testenv:pywin]
 basepython = {env:PYTHON:}\python.exe
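
For reference, a minimal sketch of the opt-out behavior this patch introduces, mirroring the unit tests in `test/unit/test_config.py` above. The temporary directory and standalone-script framing are illustrative assumptions for the example; in normal use dbt resolves the profiles directory itself (typically `~/.dbt/`) and `main.py` performs this check before tracking starts.

```python
# Illustrative sketch only: exercises dbt.config.send_anonymous_usage_stats
# as added in this diff. The tempfile-based profiles directory is an
# assumption for the example, not how dbt locates profiles.yml in practice.
import os
import tempfile

import yaml

import dbt.config

profiles_dir = tempfile.mkdtemp()
profiles_path = os.path.join(profiles_dir, 'profiles.yml')

# With no `config` block at all, usage stats default to on (implicit opt-in).
with open(profiles_path, 'w') as f:
    f.write(yaml.dump({}))
assert dbt.config.send_anonymous_usage_stats(profiles_dir) is True

# An explicit `send_anonymous_usage_stats: false` in the `config` block opts
# out; main.py then calls dbt.tracking.do_not_track().
with open(profiles_path, 'w') as f:
    f.write(yaml.dump({'config': {'send_anonymous_usage_stats': False}}))
assert dbt.config.send_anonymous_usage_stats(profiles_dir) is False
```

Moving this check out of `main.py` and into a small `dbt.config` module is what makes the opt-in/opt-out logic unit-testable without a full project context, which is exactly what the new `test/unit/test_config.py` exercises.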