bugfix: respect config options in dbt_project.yml (#255)
* respect config options in dbt_project.yml
* add unit test harness
cmcarthur authored Dec 28, 2016
1 parent a9161cf commit 2fe3758
Showing 9 changed files with 140 additions and 54 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -23,6 +23,7 @@ var/
 *.egg-info/
 .installed.cfg
 *.egg
+logs/

 # PyInstaller
 # Usually these files are written by a python script from a template
@@ -60,3 +61,6 @@ target/

 #Ipython Notebook
 .ipynb_checkpoints
+
+#Emacs
+*~
36 changes: 20 additions & 16 deletions CHANGELOG.md
@@ -1,5 +1,9 @@
## dbt 0.6.1 (unreleased)

#### Bugfixes

- respect `config` options in profiles.yml ([#255](https://github.com/analyst-collective/dbt/pull/255))

#### Changes

- add `--debug` flag, replace calls to `print()` with a global logger ([#256](https://github.com/analyst-collective/dbt/pull/256))
@@ -62,7 +66,7 @@ Use `{{ target }}` to interpolate profile variables into your model definitions.

```sql
-- only use the last week of data in development
select * from events

{% if target.name == 'dev' %}
where created_at > getdate() - interval '1 week'
@@ -227,7 +231,7 @@ As `dbt` has grown, we found this implementation to be a little unwieldy and har

The additions of automated testing and a more comprehensive manual testing process will go a long way to ensuring the future stability of dbt. We're going to get started on these tasks soon, and you can follow our progress here: https://github.com/analyst-collective/dbt/milestone/16 .

As always, feel free to [reach out to us on Slack](http://ac-slackin.herokuapp.com/) with any questions or concerns:



@@ -244,7 +248,7 @@ See https://github.com/analyst-collective/dbt/releases/tag/v0.5.1

## dbt release 0.5.1

### 0. tl;dr

1. Raiders of the Lost Archive -- version your raw data to make historical queries more accurate
2. Column type resolution for incremental models (no more `Value too long for character type` errors)
@@ -281,15 +285,15 @@ The archived tables will mirror the schema of the source tables they're generate
1. `valid_from`: The timestamp when this archived row was inserted (and first considered valid)
1. `valid_to`: The timestamp when this archived row became invalidated. The first archived record for a given `unique_key` has `valid_to = NULL`. When newer data is archived for that `unique_key`, the `valid_to` field of the old record is set to the `valid_from` field of the new record!
1. `scd_id`: A unique key generated for each archive record. Scd = [Slowly Changing Dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row).

dbt models can be built on top of these archived tables. The most recent record for a given `unique_key` is the one where `valid_to` is `null`.
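
The `valid_from` / `valid_to` bookkeeping described above can be sketched in a few lines of Python. This is an in-memory illustration only, not how dbt implements archival (dbt does this work in SQL against the warehouse); the names `archive_row` and `current_record` are hypothetical and exist just for this example.

```python
from datetime import datetime


def archive_row(archive, unique_key, data, now):
    """Append a new version of a row and close out the previous version."""
    for record in archive:
        if record["unique_key"] == unique_key and record["valid_to"] is None:
            # the old record's valid_to becomes the new record's valid_from
            record["valid_to"] = now
    archive.append({
        "unique_key": unique_key,
        "data": data,
        "valid_from": now,
        "valid_to": None,  # None marks the most recent record
    })


def current_record(archive, unique_key):
    """Return the most recent record: the one whose valid_to is None."""
    for record in archive:
        if record["unique_key"] == unique_key and record["valid_to"] is None:
            return record
    return None


archive = []
archive_row(archive, 1, "first version", datetime(2016, 12, 1))
archive_row(archive, 1, "second version", datetime(2016, 12, 2))
```

After the second call, the first record is closed out with `valid_to` equal to the second record's `valid_from`, and `current_record(archive, 1)` returns the second version.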

To run this archive process, use the command `dbt archive`. After testing and confirming that the archival works, you should schedule this process through cron (or similar).

### 2. Incremental column expansion https://github.com/analyst-collective/dbt/issues/175

Incremental tables are a powerful dbt feature, but there was at least one edge case which makes working with them difficult. During the first run of an incremental model, Redshift will infer a type for every column in the table. Subsequent runs can insert new data which does not conform to the expected type. One example is a `varchar(16)` field which is inserted into a `varchar(8)` field.
In practice, this error looks like:

```
@@ -485,7 +489,7 @@ models:
post-hook: "insert into my_audit_table (model_name, run_at) values ({{this.name}}, getdate())"
```

Hooks are recursively appended, so the `my_model` model will only receive the `grant select...` hook, whereas the `some_model` model will receive _both_ the `grant select...` and `insert into...` hooks.

Finally, note that the `grant` statement uses the (hopefully familiar) `{{this}}` syntax whereas the `insert` statement uses the `{{this.name}}` syntax. When DBT creates a model:
- A temp table is created
@@ -516,7 +520,7 @@ config:

![windows](https://pbs.twimg.com/profile_images/571398080688181248/57UKydQS.png)

---

dbt v0.4.1 provides improvements to incremental models, performance improvements, and ssh support for db connections.

@@ -540,7 +544,7 @@ pip install -U dbt
# To run models
dbt run # same as before
# to dry-run models
dbt run --dry # previously dbt test
# to run schema tests
@@ -553,10 +557,10 @@ Previously, dbt calculated "new" incremental records to insert by querying for r

User 1 Session 1 Event 1 @ 12:00
User 1 Session 1 Event 2 @ 12:01
-- dbt run --
User 1 Session 1 Event 3 @ 12:02

In this scenario, there are two possible outcomes depending on the `sql_where` chosen: 1) Event 3 does not get included in the Session 1 record for User 1 (bad), or 2) Session 1 is duplicated in the sessions table (bad). Both of these outcomes are inadequate!

With this release, you can now add a `unique_key` expression to an incremental model config. Records matching the `unique_key` will be `delete`d from the incremental table, then `insert`ed as usual. This makes it possible to maintain data accuracy without recalculating the entire table on every run.
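
As a rough illustration of the delete-then-insert behaviour, here is a minimal in-memory sketch using the session example above. It is not dbt's implementation (dbt issues `delete` and `insert` statements against the warehouse); `incremental_update` and the row layout are assumptions made for this example.

```python
def incremental_update(table, new_rows, unique_key):
    """Replace rows whose unique_key matches an incoming row, then append."""
    incoming = {row[unique_key] for row in new_rows}
    # "delete" step: drop existing rows that share a key with incoming data
    kept = [row for row in table if row[unique_key] not in incoming]
    # "insert" step: add the freshly computed rows
    return kept + list(new_rows)


# Session 1 was built from two events; a third event arrives after the run.
sessions = [{"session_id": "user1-session1", "events": 2}]
recomputed = [{"session_id": "user1-session1", "events": 3}]
sessions = incremental_update(sessions, recomputed, "session_id")
```

The session appears once, with all three events counted, which avoids both of the bad outcomes listed above.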

@@ -570,7 +574,7 @@ sessions:

### 3. Run schema validations concurrently https://github.com/analyst-collective/dbt/issues/100

The `threads` run-target config now applies to schema validations too. Try it with `dbt test`

### 4. Connect to database over ssh https://github.com/analyst-collective/dbt/issues/93

@@ -588,10 +592,10 @@ warehouse:
dbname: my-db
schema: dbt_dbanin
threads: 8
ssh-host: ssh-host-name # <------ Add this line
run-target: dev
```

### Remove the model-defaults config https://github.com/analyst-collective/dbt/issues/111

The `model-defaults` config doesn't make sense in a dbt world with dependencies. To apply default configs to your package, add the configs immediately under the package definition:
@@ -688,12 +692,12 @@ from users
where email not in (select email from __dbt__CTE__employees)
```

Ephemeral models play nice with other ephemeral models, incremental models, and regular table/view models. Feel free to mix and match different materialization options to optimize for performance and simplicity.


### 4. Feature: In-model configs https://github.com/analyst-collective/dbt/issues/88

Configurations can now be specified directly inside of models. These in-model configs work exactly the same as configs inside of the dbt_project.yml file.

An in-model-config looks like this:

@@ -703,7 +707,7 @@ An in-model-config looks like this:
-- python function syntax
{{ config(materialized="incremental", sql_where="id > (select max(id) from {{this}})") }}
-- OR json syntax
{{
config({"materialized": "incremental", "sql_where" : "id > (select max(id) from {{this}})"})
}}
14 changes: 10 additions & 4 deletions Makefile
@@ -1,10 +1,16 @@
-.PHONY: test
+.PHONY: test test-unit test-integration

 changed_tests := `git status --porcelain | grep '^\(M\| M\|A\| A\)' | awk '{ print $$2 }' | grep '\/test_[a-zA-Z_\-\.]\+.py'`

-test:
-	@echo "Test run starting..."
-	@docker-compose run test /usr/src/app/test/runner.sh
+test: test-unit test-integration
+
+test-unit:
+	@echo "Unit test run starting..."
+	tox -e unit-py27,unit-py35
+
+test-integration:
+	@echo "Integration test run starting..."
+	@docker-compose run test /usr/src/app/test/integration.sh

 test-new:
 	@echo "Test run starting..."
25 changes: 25 additions & 0 deletions dbt/config.py
@@ -0,0 +1,25 @@
import os.path
import yaml

import dbt.project as project


def read_config(profiles_dir):
    # TODO: validate profiles_dir
    path = os.path.join(profiles_dir, 'profiles.yml')

    if os.path.isfile(path):
        with open(path, 'r') as f:
            profile = yaml.safe_load(f)
            return profile.get('config', {})

    return {}


def send_anonymous_usage_stats(profiles_dir):
    config = read_config(profiles_dir)

    if config is not None and config.get("send_anonymous_usage_stats") == False:
        return False

    return True
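
Review note: `yaml.safe_load` returns `None` for an empty file, in which case `profile.get` in `read_config` would raise `AttributeError`. A minimal sketch of a guard follows, written against an already-parsed profile so that it needs no YAML library. `config_from_profile` is a hypothetical helper, not part of this commit, and this `send_anonymous_usage_stats` takes the parsed profile rather than a directory.

```python
def config_from_profile(profile):
    """Return the `config` mapping from an already-parsed profiles document."""
    # An empty profiles.yml parses to None, so guard before calling .get()
    if not isinstance(profile, dict):
        return {}
    # A bare `config:` key parses to None; normalize that to an empty dict
    return profile.get('config') or {}


def send_anonymous_usage_stats(profile):
    config = config_from_profile(profile)
    # Same comparison as the diff: only an explicit false value opts out
    return config.get('send_anonymous_usage_stats') != False
```

With this shape, an empty file, a missing `config` key, and a bare `config:` key all fall through to the opt-in default.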
14 changes: 2 additions & 12 deletions dbt/main.py
@@ -18,17 +18,7 @@
 import dbt.task.test as test_task
 import dbt.task.archive as archive_task
 import dbt.tracking
-
-
-def is_opted_out(profiles_dir):
-    profiles = project.read_profiles(profiles_dir)
-
-    if profiles is None or profiles.get("config") is None:
-        return False
-    elif profiles['config'].get("send_anonymous_usage_stats") == False:
-        return True
-    else:
-        return False
+import dbt.config as config

 def main(args=None):
     if args is None:
@@ -48,7 +38,7 @@ def handle(args):
     initialize_logger(parsed.debug)

     # this needs to happen after args are parsed so we can determine the correct profiles.yml file
-    if is_opted_out(parsed.profiles_dir):
+    if not config.send_anonymous_usage_stats(parsed.profiles_dir):
         dbt.tracking.do_not_track()

     res = run_from_args(parsed)
7 changes: 7 additions & 0 deletions test/integration.sh
@@ -0,0 +1,7 @@
#!/bin/bash

. /usr/src/app/test/setup.sh
workon dbt

cd /usr/src/app
tox -e integration-py27,integration-py35
14 changes: 0 additions & 14 deletions test/runner.sh

This file was deleted.

48 changes: 48 additions & 0 deletions test/unit/test_config.py
@@ -0,0 +1,48 @@
import os
import unittest
import yaml

import dbt.config

if os.name == 'nt':
    TMPDIR = 'c:/Windows/TEMP'
else:
    TMPDIR = '/tmp'

class ConfigTest(unittest.TestCase):

    def set_up_empty_config(self):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        with open(profiles_path, 'w') as f:
            f.write(yaml.dump({}))

    def set_up_config_options(self, send_anonymous_usage_stats=False):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        with open(profiles_path, 'w') as f:
            f.write(yaml.dump({
                'config': {
                    'send_anonymous_usage_stats': send_anonymous_usage_stats
                }
            }))

    def tearDown(self):
        profiles_path = '{}/profiles.yml'.format(TMPDIR)

        try:
            os.remove(profiles_path)
        except:
            pass

    def test__implicit_opt_in(self):
        self.set_up_empty_config()
        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))

    def test__explicit_opt_out(self):
        self.set_up_config_options(send_anonymous_usage_stats=False)
        self.assertFalse(dbt.config.send_anonymous_usage_stats(TMPDIR))

    def test__explicit_opt_in(self):
        self.set_up_config_options(send_anonymous_usage_stats=True)
        self.assertTrue(dbt.config.send_anonymous_usage_stats(TMPDIR))
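
Review note: the harness writes to a fixed `profiles.yml` path under a shared temp directory, so concurrent runs (for example, the two tox environments) could overwrite each other's file. One possible alternative, sketched under the assumption that a per-test directory is acceptable, uses `tempfile.mkdtemp`, which is available on both Python 2.7 and 3.5. The class name `TempProfilesDirTest` and the helper `write_profiles` are hypothetical, and the sketch leaves out the call into `dbt.config` so that it runs on its own.

```python
import os
import shutil
import tempfile
import unittest


class TempProfilesDirTest(unittest.TestCase):
    """Hypothetical harness: each test gets its own profiles directory."""

    def setUp(self):
        # A fresh directory per test avoids clashes on a shared /tmp path
        self.profiles_dir = tempfile.mkdtemp()
        self.profiles_path = os.path.join(self.profiles_dir, 'profiles.yml')

    def tearDown(self):
        # Remove the directory and anything the test wrote into it
        shutil.rmtree(self.profiles_dir, ignore_errors=True)

    def write_profiles(self, text):
        with open(self.profiles_path, 'w') as f:
            f.write(text)

    def test__directory_starts_empty(self):
        self.assertEqual(os.listdir(self.profiles_dir), [])

    def test__profiles_file_is_written(self):
        self.write_profiles('config:\n  send_anonymous_usage_stats: false\n')
        self.assertTrue(os.path.isfile(self.profiles_path))
```

In the real suite, each test would pass `self.profiles_dir` to `dbt.config.send_anonymous_usage_stats` in place of `TMPDIR`.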
32 changes: 24 additions & 8 deletions tox.ini
@@ -1,17 +1,33 @@
-# Tox (http://tox.testrun.org/) is a tool for running tests
-# in multiple virtualenvs. This configuration file will run the
-# test suite on all supported python versions. To use it, "pip install tox"
-# and then run "tox" from this directory.

 [tox]
-envlist = py27, py35
+envlist = unit-py27, unit-py35, integration-py27, integration-py35

+[testenv:unit-py27]
+basepython = python2.7
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

+[testenv:unit-py35]
+basepython = python3.5
+commands = /bin/bash -c '$(which nosetests) -v test/unit'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

-[testenv]
-commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/unit test/integration/*'
+[testenv:integration-py27]
+basepython = python2.7
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
 deps =
     -rrequirements.txt
     -rdev_requirements.txt

+[testenv:integration-py35]
+basepython = python3.5
+commands = /bin/bash -c 'HOME=/root/ DBT_INVOCATION_ENV=ci-circle {envpython} $(which nosetests) -v --with-coverage --cover-branches --cover-html --cover-html-dir=htmlcov test/integration/*'
+deps =
+    -rrequirements.txt
+    -rdev_requirements.txt

 [testenv:pywin]
 basepython = {env:PYTHON:}\python.exe
