0.7.0 into main (dbt-labs#372)
* Tidy up changelog

* Add 0.7.0 entry to changelog

* Add order_by argument to get_column_values (dbt-labs#349)

* Add slugify macro to utils, use in pivot macro (dbt-labs#314)

* 0.20.0 compatibility (dbt-labs#371)

* Explicitly redefine Redshift -> default

* Upgrade generic tests

* Rm namespaces macro. New dispatch syntax

* Run tests with 0.20.0rc1

* Update changelog, readme

Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>

* Simplify concat (dbt-labs#373)

* Postgres also has an alternative concat binary operation (dbt-labs#296)

* Update default implementation of concat macro

Co-authored-by: Christophe Duong <christophe.duong@gmail.com>

Co-authored-by: Jeremy Cohen <jeremy@fishtownanalytics.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
3 people authored Jun 6, 2021
1 parent 0e00fb5 commit a729044
Showing 77 changed files with 419 additions and 333 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -5,7 +5,7 @@ jobs:

integration-postgres:
docker:
- image: circleci/python:3.6.3-stretch
- image: circleci/python:3.6.13-stretch
- image: circleci/postgres:9.6.5-alpine-ram

steps:
67 changes: 58 additions & 9 deletions CHANGELOG.md
@@ -1,27 +1,76 @@
# dbt-utils v0.7.0 (unreleased)
## Breaking changes

### 🚨 New dbt version

dbt v0.20.0 or greater is required for this release. If you are not ready to upgrade, consider using a previous release of this package. In accordance with the version upgrade, this package release includes breaking changes to:
- Generic (schema) tests
- `dispatch` functionality

### 🚨 get_column_values
The order of (optional) arguments has changed in the `get_column_values` macro.

Before:
```jinja
{% macro get_column_values(table, column, max_records=none, default=none) -%}
...
{% endmacro %}
```

After:
```jinja
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}
...
{% endmacro %}
```
If you were relying on position to match up your optional arguments, this may be a breaking change. In general, we recommend that you explicitly declare any optional arguments (if not all of your arguments!).
```jinja
-- before: this works on previous versions of dbt-utils, but on 0.7.0 the `50` would be passed through as the `order_by` argument
{% set payment_methods = dbt_utils.get_column_values(
ref('stg_payments'),
'payment_method',
50
) %}
-- after
{% set payment_methods = dbt_utils.get_column_values(
ref('stg_payments'),
'payment_method',
max_records=50
) %}
```
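Jinja macro arguments bind by position and by keyword much as Python function arguments do, so the pitfall can be reproduced with two plain Python stand-ins. The functions below are hypothetical illustrations that only echo back how their arguments were bound; they are not dbt-utils code.

```python
def get_column_values_v06(table, column, max_records=None, default=None):
    """Hypothetical stand-in with the pre-0.7.0 argument order."""
    return {"max_records": max_records, "default": default}


def get_column_values_v07(table, column, order_by="count(*) desc",
                          max_records=None, default=None):
    """Hypothetical stand-in with the 0.7.0 argument order."""
    return {"order_by": order_by, "max_records": max_records, "default": default}


# Passing 50 positionally binds it to max_records before 0.7.0...
print(get_column_values_v06("stg_payments", "payment_method", 50))
# ...but on 0.7.0 the same call silently binds it to order_by.
print(get_column_values_v07("stg_payments", "payment_method", 50))
# Naming the argument keeps the intended meaning on both versions.
print(get_column_values_v07("stg_payments", "payment_method", max_records=50))
```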

## Features
* Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/))
* Add `slugify` macro, and use it in the pivot macro. :rotating_light: This macro uses the `re` module, which is only available in dbt v0.19.0+. As a result, this feature introduces a breaking change. ([#314](https://github.com/fishtown-analytics/dbt-utils/pull/314))

## Under the hood
* Update the default implementation of the `concat` macro to use the `||` operator ([#373](https://github.com/fishtown-analytics/dbt-utils/pull/373) [@ChristopheDuong](https://github.com/ChristopheDuong)). Note that this may be a breaking change for Spark users.

# dbt-utils v0.6.6

## Fixes

- Make the `sequential_values` schema test use `dbt_utils.type_timestamp()` for compatibility with databases that lack a `timestamp` data type. [#376](https://github.com/fishtown-analytics/dbt-utils/pull/376) from [@swanderz](https://github.com/swanderz)

# dbt-utils v0.6.5
## Features
* Add new `accepted_range` test ([#276](https://github.com/fishtown-analytics/dbt-utils/pull/276) [@joellabes](https://github.com/joellabes))
* Make `expression_is_true` work as a column test (code originally in [#226](https://github.com/fishtown-analytics/dbt-utils/pull/226/) from [@elliottohara](https://github.com/elliottohara), merged via [#313])
* Make `expression_is_true` work as a column test (code originally in [#226](https://github.com/fishtown-analytics/dbt-utils/pull/226/) from [@elliottohara](https://github.com/elliottohara), merged via [#313](https://github.com/fishtown-analytics/dbt-utils/pull/313/))
* Add new schema test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton))
* Support a new argument, `zero_length_range_allowed` in the `mutually_exclusive_ranges` test ([#307](https://github.com/fishtown-analytics/dbt-utils/pull/307) [@zemekeng](https://github.com/zemekeneng))
* Support a new argument, `zero_length_range_allowed` in the `mutually_exclusive_ranges` test ([#307](https://github.com/fishtown-analytics/dbt-utils/pull/307) [@zemekeneng](https://github.com/zemekeneng))
* Add new schema test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt))
* Support `quarter` in the `postgres__last_day` macro ([#333](https://github.com/fishtown-analytics/dbt-utils/pull/333/files), [@seunghanhong](https://github.com/seunghanhong))
* Add new argument, `unit`, to `haversine_distance` [#340](https://github.com/fishtown-analytics/dbt-utils/pull/340) [@bastienboutonnet](https://github.com/bastienboutonnet)
* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343])

* Support `quarter` in the `postgres__last_day` macro ([#333](https://github.com/fishtown-analytics/dbt-utils/pull/333/files) [@seunghanhong](https://github.com/seunghanhong))
* Add new argument, `unit`, to `haversine_distance` ([#340](https://github.com/fishtown-analytics/dbt-utils/pull/340) [@bastienboutonnet](https://github.com/bastienboutonnet))
* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/))

## Fixes
* Handle booleans gracefully in the unpivot macro ([#305](https://github.com/fishtown-analytics/dbt-utils/pull/305) [@avishalom](https://github.com/avishalom))
* Fix a bug in `get_relation_by_prefix` that happens with Snowflake external tables. Now the macro will retrieve tables that match the prefix which are external tables ([#350](https://github.com/fishtown-analytics/dbt-utils/issues/350))
* Fix `cardinality_equality` test when the two tables' column names differed ([#334](https://github.com/fishtown-analytics/dbt-utils/pull/334)) [@joellabes](https://github.com/joellabes)
* Fix a bug in `get_relation_by_prefix` that happens with Snowflake external tables. Now the macro will retrieve tables that match the prefix which are external tables ([#351](https://github.com/fishtown-analytics/dbt-utils/pull/351))
* Fix `cardinality_equality` test when the two tables' column names differed ([#334](https://github.com/fishtown-analytics/dbt-utils/pull/334) [@joellabes](https://github.com/joellabes))

## Under the hood
* Fix Markdown formatting for hub rendering ([#336](https://github.com/fishtown-analytics/dbt-utils/issues/350), [@coapacetic](https://github.com/coapacetic))
* Fix Markdown formatting for hub rendering ([#336](https://github.com/fishtown-analytics/dbt-utils/pull/336) [@coapacetic](https://github.com/coapacetic))
* Reorder readme and improve docs

# dbt-utils v0.6.4
101 changes: 79 additions & 22 deletions README.md
@@ -510,24 +510,50 @@ These macros run a query and return the results of the query as objects. They ar


#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation).
It takes an options `default` argument for compiling when the relation does not already exist.
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

Arguments:
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column whose unique values you want to retrieve
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this to `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
- `max_records` (optional, default=`none`): The maximum number of column values you want to return
- `default` (optional, default=`[]`): The results this macro should return if the relation has not yet been created (and therefore has no column values).
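To make the `order_by` and `max_records` semantics concrete, here is a rough Python sketch run against an in-memory SQLite table. It assumes the macro issues a `group by` query of roughly this shape; the real macro's SQL may differ, and the `default` fallback for a missing relation is left out. The `stg_payments` rows are made-up sample data.

```python
import sqlite3


def build_column_values_query(table, column, order_by="count(*) desc",
                              max_records=None):
    # Assumed query shape: one row per distinct value, ordered by the
    # caller-supplied SQL expression, optionally capped with LIMIT.
    sql = (f"select {column} as value from {table} "
           f"group by {column} order by {order_by}")
    if max_records is not None:
        sql += f" limit {int(max_records)}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("create table stg_payments (payment_method text, created_at text)")
conn.executemany("insert into stg_payments values (?, ?)", [
    ("credit_card", "2021-01-01"), ("credit_card", "2021-01-02"),
    ("credit_card", "2021-01-03"), ("bank_transfer", "2021-01-02"),
    ("bank_transfer", "2021-01-04"), ("coupon", "2021-01-05"),
])


def fetch_values(**kwargs):
    sql = build_column_values_query("stg_payments", "payment_method", **kwargs)
    return [row[0] for row in conn.execute(sql)]


print(fetch_values())                           # most frequent first
print(fetch_values(order_by="payment_method"))  # alphabetical
print(fetch_values(order_by="max(created_at) desc", max_records=2))
```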


**Usage:**
```sql
-- Returns a list of the payment_methods in the stg_payments model
{% set payment_methods = dbt_utils.get_column_values(table=ref('stg_payments'), column='payment_method') %}
{% for state in states %}
{% for payment_method in payment_methods %}
...
{% endfor %}
...
```

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
```sql
-- Returns the list sorted alphabetically
{% set payment_methods = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='payment_method'
) %}
```

```sql
-- Returns the list sorted by most recently observed
{% set payment_methods = dbt_utils.get_column_values(
table=ref('stg_payments'),
column='payment_method',
order_by='max(created_at) desc',
max_records=50,
default=['bank_transfer', 'coupon', 'credit_card']
) %}
...
```

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.
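As a rough illustration of pattern matching, the Python sketch below filters an in-memory list of `(schema, table)` pairs. It assumes SQL `like`-style wildcards (`%` for any run of characters, `_` for exactly one) matched case-insensitively; the function and parameter names are hypothetical, and the real macro does its filtering in SQL against the warehouse rather than in Python.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a case-insensitive regex."""
    pieces = []
    for ch in pattern:
        if ch == "%":
            pieces.append(".*")  # any run of characters
        elif ch == "_":
            pieces.append(".")   # exactly one character
        else:
            pieces.append(re.escape(ch))
    return re.compile("".join(pieces), re.IGNORECASE)


def relations_by_pattern(relations, schema_pattern, table_pattern, exclude=""):
    """Keep (schema, table) pairs matching both patterns and not `exclude`."""
    schema_re = like_to_regex(schema_pattern)
    table_re = like_to_regex(table_pattern)
    exclude_re = like_to_regex(exclude) if exclude else None
    return [
        (schema, table)
        for schema, table in relations
        if schema_re.fullmatch(schema)
        and table_re.fullmatch(table)
        and not (exclude_re and exclude_re.fullmatch(table))
    ]


relations = [("sales", "events_2020"), ("sales", "events_2021"),
             ("sales", "orders"), ("marketing", "events_2021")]
print(relations_by_pattern(relations, "sales", "events_%"))
print(relations_by_pattern(relations, "sales", "events_%", exclude="%2020"))
```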

@@ -948,8 +974,8 @@ When an expression falls outside the range, the function returns:


---
### Logger
#### pretty_time ([source](macros/logger/pretty_time.sql))
### Jinja Helpers
#### pretty_time ([source](macros/jinja_helpers/pretty_time.sql))
This macro returns a string of the current timestamp, optionally taking a datestring format.
```sql
{#- This will return a string like '14:50:34' -#}
@@ -959,7 +985,7 @@ This macro returns a string of the current timestamp, optionally taking a datest
{{ dbt_utils.pretty_time(format='%Y-%m-%d %H:%M:%S') }}
```
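Judging from the examples above, the macro amounts to formatting the current time with a `strftime`-style pattern that defaults to `%H:%M:%S`. A minimal Python equivalent, under that assumption:

```python
from datetime import datetime


def pretty_time(format="%H:%M:%S"):
    # `format` mirrors the macro's keyword argument (it shadows the builtin
    # only inside this function). Uses the local clock via datetime.now().
    return datetime.now().strftime(format)


print(pretty_time())                            # time as HH:MM:SS
print(pretty_time(format="%Y-%m-%d %H:%M:%S"))  # date plus time
```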

#### pretty_log_format ([source](macros/logger/pretty_log_format.sql))
#### pretty_log_format ([source](macros/jinja_helpers/pretty_log_format.sql))
This macro formats the input in a way that will print nicely to the command line when you `log` it.
```sql
{#- This will return a string like:
@@ -968,7 +994,7 @@ This macro formats the input in a way that will print nicely to the command line

{{ dbt_utils.pretty_log_format("my pretty message") }}
```
#### log_info ([source](macros/logger/log_info.sql))
#### log_info ([source](macros/jinja_helpers/log_info.sql))
This macro logs a formatted message (with a timestamp) to the command line.
```sql
{{ dbt_utils.log_info("my pretty message") }}
@@ -979,6 +1005,40 @@ This macro logs a formatted message (with a timestamp) to the command line.
11:07:31 + my pretty message
```
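The sample output suggests that `pretty_log_format` prefixes the message with the current `HH:MM:SS` time and a ` + ` separator, and that `log_info` sends that string to the console. A minimal Python sketch under those assumptions, with `print()` standing in for dbt's `log()`:

```python
from datetime import datetime


def pretty_log_format(message):
    # Assumed layout "<HH:MM:SS> + <message>", matching the sample output.
    return f"{datetime.now().strftime('%H:%M:%S')} + {message}"


def log_info(message):
    # Stand-in for writing the formatted line to the command line.
    print(pretty_log_format(message))


log_info("my pretty message")
```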

#### slugify ([source](macros/jinja_helpers/slugify.sql))
This macro transforms Jinja strings into "slugs". It is useful when using a Jinja object as a column name, especially when that object is not hardcoded.

For this example, let's pretend that we have payment methods in our payments table like `['venmo App', 'ca$h-money']`, which we can't use as column names due to the spaces and special characters. This macro does its best to strip those out in a sensible way: `['venmo_app', 'cah_money']`.

```sql
{%- set payment_methods = dbt_utils.get_column_values(
table=ref('raw_payments'),
column='payment_method'
) -%}

select
order_id,
{%- for payment_method in payment_methods %}
sum(case when payment_method = '{{ payment_method }}' then amount end)
as {{ dbt_utils.slugify(payment_method) }}_amount,

{% endfor %}
...
```

This compiles to something like:

```sql
select
order_id,

sum(case when payment_method = 'venmo App' then amount end)
as venmo_app_amount,

sum(case when payment_method = 'ca$h-money' then amount end)
as cah_money_amount,
...
```
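The transformation can be approximated in Python with three regular-expression steps. These rules (lowercase, collapse runs of spaces and dashes into one underscore, drop anything that is not a letter, digit, or underscore) are an assumption inferred from the examples above and from the `test_slugify.sql` expectation in this commit; the macro source may differ in detail.

```python
import re


def slugify(string):
    # 1. Lower-case the input.
    string = string.lower()
    # 2. Collapse runs of spaces and dashes into a single underscore.
    string = re.sub(r"[ -]+", "_", string)
    # 3. Drop anything that is not a lowercase letter, digit, or underscore.
    return re.sub(r"[^a-z0-9_]+", "", string)


print([slugify(m) for m in ["venmo App", "ca$h-money"]])  # ['venmo_app', 'cah_money']
print(slugify("!Hell0 world-hi"))                         # hell0_world_hi
```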

### Materializations
#### insert_by_period ([source](macros/materializations/insert_by_period_materialization.sql))
`insert_by_period` allows dbt to insert records into a table one period (e.g. day, week) at a time.
@@ -1046,29 +1106,26 @@ We welcome contributions to this repo! To contribute a new feature or a fix, ple
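The core idea, splitting a time range into consecutive periods and loading each one with its own insert, can be sketched in Python. `period_windows` is a hypothetical helper for illustration only; the real materialization's logic and configuration differ.

```python
from datetime import date, timedelta


def period_windows(start, stop, period_days=1):
    """Split [start, stop) into consecutive half-open windows."""
    step = timedelta(days=period_days)
    windows = []
    current = start
    while current < stop:
        window_end = min(current + step, stop)
        windows.append((current, window_end))
        current = window_end
    return windows


# Each window would correspond to one insert, for example:
#   insert into target select ... where ts >= window_start and ts < window_end
for window_start, window_end in period_windows(date(2021, 6, 1), date(2021, 6, 4)):
    print(window_start, window_end)
```

Using half-open windows means a row stamped exactly on a boundary is loaded once, in the later period.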
**Note:** This is primarily relevant to:
- Users and maintainers of community-supported [adapter plugins](https://docs.getdbt.com/docs/available-adapters)
- Users who wish to override a low-lying `dbt_utils` macro with a custom implementation, and have that implementation used by other `dbt_utils` macros

If you use Postgres, Redshift, Snowflake, or BigQuery, this likely does not apply to you.

dbt v0.18.0 introduces `adapter.dispatch()`, a reliable way to define different implementations of the same macro
across different databases.
dbt v0.18.0 introduced [`adapter.dispatch()`](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter#dispatch), a reliable way to define different implementations of the same macro across different databases.

All dispatched macros in `dbt_utils` have an override setting: a `var` named
`dbt_utils_dispatch_list` that accepts a list of package names. If you set this
variable in your project, when dbt searches for implementations of a dispatched
`dbt_utils` macro, it will search through your listed packages _before_ using
the implementations defined in `dbt_utils`.
dbt v0.20.0 introduced a new project-level `dispatch` config that enables an "override" setting for all dispatched macros. If you set this config in your project, when dbt searches for implementations of a macro in the `dbt_utils` namespace, it will search through your list of packages instead of just looking in the `dbt_utils` package.

Set a variable in your `dbt_project.yml`:
Set the config in `dbt_project.yml`:
```yml
vars:
dbt_utils_dispatch_list:
- first_package_to_search # likely the name of your root project (only the root folder)
- second_package_to_search # likely an "add-on" package, such as spark_utils
# dbt_utils is always the last place searched
dispatch:
- macro_namespace: dbt_utils
search_order:
- first_package_to_search # likely the name of your root project
- second_package_to_search # could be a "shim" package, such as spark_utils
- dbt_utils # always include dbt_utils as the last place to search
```

If overriding a dispatched macro with a custom implementation in your own project's `macros/` directory, you must name your custom macro with a prefix: either `default__` (note the two underscores), or the name of your adapter followed by two underscores. For example, if you're running on Postgres and wish to override the behavior of `dbt_utils.datediff` (such that `dbt_utils.date_spine` will use your version instead), you can do this by defining a macro called either `default__datediff` or `postgres__datediff`.
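The lookup that the `dispatch` config sets up can be modelled as a nested loop: for each package in `search_order`, try the adapter-prefixed macro, then the `default__` one, and use the first that exists. The Python sketch below only enumerates candidate names; it is a simplification that ignores adapter inheritance (an adapter falling back to its parent adapter's implementations) and is not dbt's own code.

```python
def dispatch_candidates(macro_name, adapter_type, search_order):
    """Return candidate macro names in the order they would be tried."""
    candidates = []
    for package in search_order:
        # Adapter-specific implementation first, then the generic default.
        for prefix in (adapter_type, "default"):
            candidates.append(f"{package}.{prefix}__{macro_name}")
    return candidates


search_order = ["first_package_to_search", "second_package_to_search", "dbt_utils"]
for name in dispatch_candidates("datediff", "spark", search_order):
    print(name)
```

Keeping `dbt_utils` last in `search_order` is what lets a project's own `spark__datediff` or `default__datediff` take precedence over the package's version.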

When running on Spark, if dbt needs to dispatch `dbt_utils.datediff`, it will search for the following in order:
Let's say we have the config defined above, and we're running on Spark. When dbt goes to dispatch `dbt_utils.datediff`, it will search for the following macros in order:
```
first_package_to_search.spark__datediff
first_package_to_search.default__datediff
```
3 changes: 2 additions & 1 deletion dbt_project.yml
@@ -1,7 +1,8 @@
name: 'dbt_utils'
version: '0.1.0'

require-dbt-version: [">=0.18.0", "<0.20.0"]
require-dbt-version: [">=0.20.0", "<0.21.0"]

config-version: 2

target-path: "target"
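`require-dbt-version` is a list of constraints that must all hold for the running dbt version. A simplified Python sketch of that range check, handling plain `X.Y.Z` versions only; it ignores prerelease tags such as the `0.20.0rc1` used in testing and is not dbt's own version logic.

```python
import operator

# Longer symbols first so ">=" is not mistaken for ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le),
             (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def parse_version(text):
    # Plain "X.Y.Z" only; prerelease suffixes are not handled here.
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraints):
    current = parse_version(version)
    for constraint in constraints:
        for symbol, compare in OPERATORS:
            if constraint.startswith(symbol):
                if not compare(current, parse_version(constraint[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported constraint: {constraint}")
    return True


required = [">=0.20.0", "<0.21.0"]
print(satisfies("0.19.1", required))  # False
print(satisfies("0.20.0", required))  # True
print(satisfies("0.21.0", required))  # False
```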
5 changes: 3 additions & 2 deletions integration_tests/dbt_project.yml
@@ -19,8 +19,9 @@ clean-targets: # directories to be removed by `dbt clean`
- "target"
- "dbt_modules"

vars:
dbt_utils_dispatch_list: ['dbt_utils_integration_tests']
dispatch:
- macro_namespace: 'dbt_utils'
search_order: ['dbt_utils_integration_tests', 'dbt_utils']

seeds:

6 changes: 5 additions & 1 deletion integration_tests/macros/limit_zero.sql
@@ -1,5 +1,9 @@
{% macro my_custom_macro() %}
whatever
{% endmacro %}

{% macro limit_zero() %}
{{ return(adapter.dispatch('limit_zero', dbt_utils._get_utils_namespaces())()) }}
{{ return(adapter.dispatch('limit_zero', 'dbt_utils')()) }}
{% endmacro %}

{% macro default__limit_zero() %}
14 changes: 6 additions & 8 deletions integration_tests/macros/tests.sql
@@ -1,14 +1,12 @@

{% macro test_assert_equal(model, actual, expected) %}
select count(*) from {{ model }} where {{ actual }} != {{ expected }}
{% test assert_equal(model, actual, expected) %}
select * from {{ model }} where {{ actual }} != {{ expected }}

{% endmacro %}
{% endtest %}


{% macro test_not_empty_string(model, arg) %}
{% test not_empty_string(model, column_name) %}

{% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}
select * from {{ model }} where {{ column_name }} = ''

select count(*) from {{ model }} where {{ column_name }} = ''

{% endmacro %}
{% endtest %}
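The rewritten tests above follow the dbt 0.20 convention for `{% test %}` blocks: the query selects the failing rows (`select *`) instead of returning a `count(*)`, and the test passes when no rows come back. The `test_slugify.sql` data test in this commit uses the same rule. A small Python sketch of that pass/fail rule with SQLite; the `my_model` table and its rows are made up for illustration.

```python
import sqlite3


def run_test(conn, failing_rows_sql):
    # dbt 0.20-style semantics: the query returns failing rows;
    # zero rows means the test passes.
    failures = conn.execute(failing_rows_sql).fetchall()
    return len(failures) == 0, failures


conn = sqlite3.connect(":memory:")
conn.execute("create table my_model (id integer, actual integer, expected integer, name text)")
conn.executemany("insert into my_model values (?, ?, ?, ?)",
                 [(1, 10, 10, "a"), (2, 5, 6, "b"), (3, 7, 7, "c")])

# Roughly what assert_equal(model, actual='actual', expected='expected') compiles to.
print(run_test(conn, "select * from my_model where actual != expected"))
# Roughly what not_empty_string(model, column_name='name') compiles to.
print(run_test(conn, "select * from my_model where name = ''"))
```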
1 change: 0 additions & 1 deletion integration_tests/models/schema_tests/schema.yml
@@ -86,7 +86,6 @@ models:
- name: id
tests:
- dbt_utils.relationships_where:
from: id
to: ref('data_test_relationships_where_table_1')
field: id
from_condition: id <> 4
2 changes: 1 addition & 1 deletion integration_tests/models/sql/schema.yml
@@ -4,7 +4,7 @@ models:
- name: test_generate_series
tests:
- dbt_utils.equality:
arg: ref('data_generate_series')
compare_model: ref('data_generate_series')

- name: test_get_column_values
columns:
2 changes: 1 addition & 1 deletion integration_tests/models/sql/test_get_column_values.sql
@@ -1,5 +1,5 @@

{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default = []) %}
{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default=[], order_by="field") %}


{% if target.type == 'snowflake' %}
7 changes: 7 additions & 0 deletions integration_tests/tests/jinja_helpers/test_slugify.sql
@@ -0,0 +1,7 @@
{% if dbt_utils.slugify('!Hell0 world-hi') == 'hell0_world_hi' %}
{# Return 0 rows for the test to pass #}
select 1 limit 0
{% else %}
{# Return >0 rows for the test to fail #}
select 1
{% endif %}
4 changes: 0 additions & 4 deletions macros/cross_db_utils/_get_utils_namespaces.sql

This file was deleted.

2 changes: 1 addition & 1 deletion macros/cross_db_utils/cast_bool_to_text.sql
@@ -1,5 +1,5 @@
{% macro cast_bool_to_text(field) %}
{{ adapter.dispatch('cast_bool_to_text', packages = dbt_utils._get_utils_namespaces()) (field) }}
{{ adapter.dispatch('cast_bool_to_text', 'dbt_utils') (field) }}
{% endmacro %}


18 changes: 2 additions & 16 deletions macros/cross_db_utils/concat.sql
@@ -1,23 +1,9 @@


{% macro concat(fields) -%}
{{ return(adapter.dispatch('concat', packages = dbt_utils._get_utils_namespaces())(fields)) }}
{{ return(adapter.dispatch('concat', 'dbt_utils')(fields)) }}
{%- endmacro %}

{% macro default__concat(fields) -%}
concat({{ fields|join(', ') }})
{%- endmacro %}

{% macro alternative_concat(fields) %}
{{ fields|join(' || ') }}
{% endmacro %}


{% macro redshift__concat(fields) %}
{{ dbt_utils.alternative_concat(fields) }}
{% endmacro %}


{% macro snowflake__concat(fields) %}
{{ dbt_utils.alternative_concat(fields) }}
{% endmacro %}
{%- endmacro %}
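With the Redshift and Snowflake overrides removed, the remaining difference is in `default__concat`: the removed version wrapped the fields in a `concat(...)` call, while the new one joins them with `||`. The Python functions below simply mirror the two Jinja `join` filters to show the rendered SQL; they are illustrations, not package code.

```python
def removed_default_concat(fields):
    # Mirrors: concat({{ fields|join(', ') }})
    return f"concat({', '.join(fields)})"


def new_default_concat(fields):
    # Mirrors: {{ fields|join(' || ') }}
    return " || ".join(fields)


fields = ["first_name", "' '", "last_name"]
print(removed_default_concat(fields))  # concat(first_name, ' ', last_name)
print(new_default_concat(fields))      # first_name || ' ' || last_name
```

One behavioral difference worth checking on your warehouse: in Postgres, `concat()` ignores NULL arguments, whereas `||` returns NULL when any operand is NULL, so models that concatenate nullable columns may produce different results after this change.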