bugfix: respect config options in dbt_project.yml (#255)

* respect config options in dbt_project.yml
* add unit test harness
dbt-labs · Dec 28, 2016 · 2fe3758
1 parent a9161cf
Showing 9 changed files with 140 additions and 54 deletions.
diff --git a/.gitignore b/.gitignore
@@ -23,6 +23,7 @@ var/
 *.egg-info/
 .installed.cfg
 *.egg
+logs/
 
 # PyInstaller
 #  Usually these files are written by a python script from a template
@@ -60,3 +61,6 @@ target/
 
 #Ipython Notebook
 .ipynb_checkpoints
+
+#Emacs
+*~
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,9 @@
 ## dbt 0.6.1 (unreleased)
 
+#### Bugfixes
+
+- respect `config` options in profiles.yml ([#255](https://github.com/analyst-collective/dbt/pull/255))
+
 #### Changes
 
 - add `--debug` flag, replace calls to `print()` with a global logger ([#256](https://github.com/analyst-collective/dbt/pull/256))
@@ -62,7 +66,7 @@ Use `{{ target }}` to interpolate profile variables into your model definitions.
 
 ```sql
 -- only use the last week of data in development
-select * from events 
+select * from events
 
 {% if target.name == 'dev' %}
 where created_at > getdate() - interval '1 week'
@@ -227,7 +231,7 @@ As `dbt` has grown, we found this implementation to be a little unwieldy and har
 
 The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/analyst-collective/dbt/milestone/16 .
 
-As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns: 
+As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns:
 
 
 
@@ -244,7 +248,7 @@ See https://github.com/analyst-collective/dbt/releases/tag/v0.5.1
 
 ## dbt release 0.5.1
 
-### 0. tl;dr 
+### 0. tl;dr
 
 1. Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
 2. Column type resolution for incremental models (no more `Value too long for character type` errors)
@@ -281,15 +285,15 @@ The archived tables will mirror the schema of the source tables they're generate
 
 1. `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
 1. `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record!
-1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row). 
+1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).
 
 dbt models can be built on top of these archived tables. The most recent record for a given `unique_key` is the one where `valid_to` is `null`.
 
 To run this archive process, use the command `dbt archive`. After testing and confirming that the archival works, you should schedule this process through cron (or similar).
 
 ### 2. Incremental column expansion https://github.com/analyst-collective/dbt/issues/175
 
-Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field. 
+Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field.
 In practice, this error looks like:
 
 ```
@@ -485,7 +489,7 @@ models:
       post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"
 ```
 
-Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks. 
+Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks.
 
 Finally, note that the `grant` statement uses the (hopefully familiar) `{{this}}` syntax whereas the `insert` statement uses the `{{this.name}}` syntax. When DBT creates a model:
  - A temp table is created
@@ -516,7 +520,7 @@ config:
 
 ![windows](https://pbs.twimg.com/profile_images/571398080688181248/57UKydQS.png)
 
---- 
+---
 
 dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.
 
@@ -540,7 +544,7 @@ pip install -U dbt
 # To run models
 dbt run # same as before
 
-# to dry-run models 
+# to dry-run models
 dbt run --dry # previously dbt test
 
 # to run schema tests
@@ -553,10 +557,10 @@ Previously, dbt calculated "new" incremental records to insert by querying for r
 
 User 1 Session 1 Event 1 @ 12:00
 User 1 Session 1 Event 2 @ 12:01
--- dbt run -- 
+-- dbt run --
 User 1 Session 1 Event 3 @ 12:02
 
-In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate! 
+In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!
 
 With this release, you can now add a `unique_key` expression to an incremental model config. Records matching the `unique_key` will be `delete`d from the incremental table, then `insert`ed as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
 
@@ -570,7 +574,7 @@ sessions:
 
 ### 3. Run schema validations concurrently https://github.com/analyst-collective/dbt/issues/100
 
-The `threads` run-target config now applies to schema validations too. Try it with `dbt test` 
+The `threads` run-target config now applies to schema validations too. Try it with `dbt test`
 
 ### 4. Connect to database over ssh https://github.com/analyst-collective/dbt/issues/93
 
@@ -588,10 +592,10 @@ warehouse:
       dbname: my-db
       schema: dbt_dbanin
       threads: 8
-      ssh-host: ssh-host-name  # <------ Add this line 
+      ssh-host: ssh-host-name  # <------ Add this line
   run-target: dev
 ```
- 
+
 ### Remove the model-defaults config https://github.com/analyst-collective/dbt/issues/111
 
 The `model-defaults` config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
@@ -688,12 +692,12 @@ from users
 where email not in (select email from __dbt__CTE__employees)
 ```
 
-Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity. 
+Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.
 
 
 ### 4. Feature: In-model configs https://github.com/analyst-collective/dbt/issues/88
 
-Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file. 
+Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.
 
 An in-model-config looks like this:
 
@@ -703,7 +707,7 @@ An in-model-config looks like this:
 -- python function syntax
 {{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}
 -- OR json syntax
-{{ 
+{{
     config({"materialized:" "incremental", "sql_where" : "id > (select max(id) from {{this}})"})
 }}
 

diff --git a/Makefile b/Makefile
@@ -1,10 +1,16 @@
-.PHONY: test
+.PHONY: test test-unit test-integration
 
 changed_tests := `git status --porcelain | grep '^\(M\| M\|A\| A\)' | awk '{ print $$2 }' | grep '\/test_[a-zA-Z_\-\.]\+.py'`
 
-test:
-	@echo "Test run starting..."
-	@docker-compose run test /usr/src/app/test/runner.sh
+test: test-unit test-integration
+
+test-unit:
+	@echo "Unit test run starting..."
+	tox -e unit-py27,unit-py35
+
+test-integration:
+	@echo "Integration test run starting..."
+	@docker-compose run test /usr/src/app/test/integration.sh
 
 test-new:
 	@echo "Test run starting..."

diff --git a/dbt/config.py b/dbt/config.py
@@ -0,0 +1,25 @@
+import os.path
+import yaml
+
+import dbt.project as project
+
+
+def read_config(profiles_dir):
+    # TODO: validate profiles_dir
+    path = os.path.join(profiles_dir, 'profiles.yml')
+
+    if os.path.isfile(path):
+        with open(path, 'r') as f:
+            profile = yaml.safe_load(f)
+            return profile.get('config', {})
+
+    return {}
+
+
+def send_anonymous_usage_stats(profiles_dir):
+    config = read_config(profiles_dir)
+
+    if config is not None and config.get("send_anonymous_usage_stats") == False:
+        return False
+
+    return True
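
Review note on `read_config` above: `yaml.safe_load` returns `None` for an empty file, so `profile.get('config', {})` would raise `AttributeError` if `profiles.yml` exists but is empty. The unit tests in this commit write `{}`, which parses to a dict, so they do not exercise that case. Below is a minimal sketch of a more defensive lookup. It works on the already-parsed value, so it needs no PyYAML import. The helper name `config_from_profile`, and passing the parsed profile in as an argument, are assumptions for illustration and are not part of this commit.

```python
def config_from_profile(profile):
    """Return the `config` mapping from a parsed profiles.yml, or {}.

    `profile` is whatever the YAML loader returned. For an empty file
    that is None rather than a dict, so guard before calling .get().
    (Hypothetical helper, not part of dbt.)
    """
    if not isinstance(profile, dict):
        return {}
    config = profile.get('config')
    # A bare `config:` key with no value also parses to None.
    return config if isinstance(config, dict) else {}


def send_anonymous_usage_stats(profile):
    """Mirror the opt-out rule in the diff: only an explicit False opts out."""
    flag = config_from_profile(profile).get('send_anonymous_usage_stats')
    if flag == False:  # noqa: E712 (same comparison as the commit uses)
        return False
    return True


if __name__ == '__main__':
    print(send_anonymous_usage_stats(None))
    print(send_anonymous_usage_stats({'config': {'send_anonymous_usage_stats': False}}))
```

With this shape, `read_config` could pass the result of `yaml.safe_load(f)` straight through, and an empty or malformed top-level value would be treated as "no config" instead of raising.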
diff --git a/dbt/main.py b/dbt/main.py
@@ -18,17 +18,7 @@
 import dbt.task.test as test_task
 import dbt.task.archive as archive_task
 import dbt.tracking
-
-
-def is_opted_out(profiles_dir):
-    profiles = project.read_profiles(profiles_dir)
-
-    if profiles is None or profiles.get("config") is None:
-        return False
-    elif profiles['config'].get("send_anonymous_usage_stats") == False:
-        return True
-    else:
-        return False
+import dbt.config as config
 
 def main(args=None):
     if args is None:
@@ -48,7 +38,7 @@ def handle(args):
     initialize_logger(parsed.debug)
 
     # this needs to happen after args are parsed so we can determine the correct profiles.yml file
-    if is_opted_out(parsed.profiles_dir):
+    if not config.send_anonymous_usage_stats(parsed.profiles_dir):
         dbt.tracking.do_not_track()
 
     res = run_from_args(parsed)

diff --git a/test/integration.sh b/test/integration.sh
@@ -0,0 +1,7 @@
+#!/bin/bash
+
+. /usr/src/app/test/setup.sh
+workon dbt
+
+cd /usr/src/app
+tox -e integration-py27,integration-py35
diff --git a/test/runner.sh b/test/runner.sh
diff --git a/test/unit/test_config.py b/test/unit/test_config.py
@@ -0,0 +1,48 @@
+import os
+import unittest
+import yaml
+
+import dbt.config
+
+if os.name == 'nt':
+    TMPDIR = 'c:/Windows/TEMP'
+else:
+    TMPDIR = '/tmp'
+
+class ConfigTest(unittest.TestCase):
+
+    def set_up_empty_config(self):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        with open(profiles_path, 'w') as f:
+            f.write(yaml.dump({}))
+
+    def set_up_config_options(self, send_anonymous_usage_stats=False):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        with open(profiles_path, 'w') as f:
+            f.write(yaml.dump({
+                'config': {
+                    'send_anonymous_usage_stats': send_anonymous_usage_stats
+                }
+            }))
+
+    def tearDown(self):
+        profiles_path = '{}/profiles.yml'.format(TMPDIR)
+
+        try:
+            os.remove(profiles_path)
+        except:
+            pass
+
+    def test__implicit_opt_in(self):
+        self.set_up_empty_config()
+        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
+
+    def test__explicit_opt_out(self):
+        self.set_up_config_options(send_anonymous_usage_stats=False)
+        self.assertFalse(dbt.config.send_anonymous_usage_stats(TMPDIR))
+
+    def test__explicit_opt_in(self):
+        self.set_up_config_options(send_anonymous_usage_stats=True)
+        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
diff --git a/tox.ini b/tox.ini
@@ -1,17 +1,33 @@
-# Tox (http://tox.testrun.org/) is a tool for running tests
-# in multiple virtualenvs. This configuration file will run the
-# test suite on all supported python versions. To use it, "pip install tox"
-# and then run "tox" from this directory.
-
 [tox]
-envlist = py27, py35
+envlist = unit-py27, unit-py35, integration-py27, integration-py35
+
+[testenv:unit-py27]
+basepython = python2.7
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
+
+[testenv:unit-py35]
+basepython = python3.5
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
 
-[testenv]
-commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/unit test/integration/*'
+[testenv:integration-py27]
+basepython = python2.7
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
 deps =
     -rrequirements.txt
     -rdev_requirements.txt
 
+[testenv:integration-py35]
+basepython = python3.5
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt
 
 [testenv:pywin]
 basepython = {env:PYTHON:}\python.exe