Support kms in haversine_distance macro (#340)
bastienboutonnet authored and clrcrl committed May 18, 2021
1 parent 6c1abd3 commit 4b58de9
Showing 10 changed files with 136 additions and 22 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
* Support a new argument, `zero_length_range_allowed` in the `mutually_exclusive_ranges` test ([#307](https://github.com/fishtown-analytics/dbt-utils/pull/307) [@zemekeng](https://github.com/zemekeneng))
* Add new schema test, `sequential_values` ([#318](https://github.com/fishtown-analytics/dbt-utils/pull/318), inspired by [@hundredwatt](https://github.com/hundredwatt))
* Support `quarter` in the `postgres__last_day` macro ([#333](https://github.com/fishtown-analytics/dbt-utils/pull/333/files), [@seunghanhong](https://github.com/seunghanhong))
* Add new argument, `unit`, to `haversine_distance` ([#340](https://github.com/fishtown-analytics/dbt-utils/pull/340), [@bastienboutonnet](https://github.com/bastienboutonnet))


## Fixes
29 changes: 16 additions & 13 deletions README.md
@@ -95,7 +95,7 @@ Usage:
---
### Date/Time
#### date_spine ([source](macros/datetime/date_spine.sql))
This macro returns the sql required to build a date spine. The spine will include the `start_date` (if it is aligned to the `datepart`), but it will not include the `end_date`.

Usage:
```
@@ -111,9 +111,12 @@ Usage:
#### haversine_distance ([source](macros/geo/haversine_distance.sql))
This macro calculates the [haversine distance](http://daynebatten.com/2015/09/latitude-longitude-distance-sql/) between a pair of x/y coordinates.

-Usage:
+Optionally takes a `unit` string parameter ('km' or 'mi') which defaults to miles (imperial system).
+
+**Usage:**
+
```
-{{ dbt_utils.haversine_distance(lat1=<float>,lon1=<float>,lat2=<float>,lon2=<float>) }}
+{{ dbt_utils.haversine_distance(lat1=<float>,lon1=<float>,lat2=<float>,lon2=<float>, unit='mi'<string>) }}
```
---
### Schema Tests
@@ -181,13 +184,13 @@ models:

```

This macro can also be used at the column level. When this is done, the `expression` is evaluated against the column.

```yaml
version: 2
models:
- name: model_name
columns:
- name: col_a
tests:
- dbt_utils.expression_is_true:
@@ -197,7 +200,7 @@ models:
- dbt_utils.expression_is_true:
expression: '= 1'
condition: col_a = 1

```


@@ -361,7 +364,7 @@ models:
upper_bound_column: ended_at
partition_by: customer_id
gaps: required
# test that each customer can have subscriptions that start and end on the same date
- name: subscriptions
tests:
@@ -512,9 +515,9 @@ An optional `quote_columns` parameter (`default=false`) can also be used if a co


#### accepted_range ([source](macros/schema_tests/accepted_range.sql))
This test checks that a column's values fall inside an expected range. Any combination of `min_value` and `max_value` is allowed, and the range can be inclusive or exclusive. Provide a `where` argument to filter to specific records only.

In addition to comparisons to a scalar value, you can also compare to another column's values. Any data type that supports the `>` or `<` operators can be compared, so you could also run tests like checking that all order dates are in the past.

Usage:
```yaml
@@ -528,19 +531,19 @@ models:
- dbt_utils.accepted_range:
min_value: 0
inclusive: false
- name: account_created_at
tests:
- dbt_utils.accepted_range:
max_value: "getdate()"
#inclusive is true by default
- name: num_returned_orders
tests:
- dbt_utils.accepted_range:
min_value: 0
max_value: "num_orders"
- name: num_web_sessions
tests:
- dbt_utils.accepted_range:
2 changes: 2 additions & 0 deletions integration_tests/data/geo/data_haversine_km.csv
@@ -0,0 +1,2 @@
lat_1,lon_1,lat_2,lon_2,output
48.864716,2.349014,52.379189,4.899431,430
2 changes: 2 additions & 0 deletions integration_tests/data/geo/data_haversine_mi.csv
@@ -0,0 +1,2 @@
lat_1,lon_1,lat_2,lon_2,output
48.864716,2.349014,52.379189,4.899431,267
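
The expected values in the two seed files can be sanity-checked outside the warehouse. The following is a minimal Python sketch, not part of the commit: it assumes the same constants the macro uses (an Earth radius of 3961 miles and 1.60934 km per mile), and the function name `haversine` is a hypothetical stand-in chosen for illustration.

```python
import math

EARTH_RADIUS_MI = 3961   # radius constant used by the macro
KM_PER_MI = 1.60934      # conversion rate the macro applies for unit='km'


def haversine(lat1, lon1, lat2, lon2, unit="mi"):
    """Hypothetical Python mirror of the macro's formula (inputs in degrees)."""
    if unit not in ("mi", "km"):
        raise ValueError(f"unit input must be one of 'mi' or 'km'. Got {unit}")
    a = (
        math.sin(math.radians(lat2 - lat1) / 2) ** 2
        + math.cos(math.radians(lat1))
        * math.cos(math.radians(lat2))
        * math.sin(math.radians(lon2 - lon1) / 2) ** 2
    )
    miles = 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))
    return miles * (KM_PER_MI if unit == "km" else 1)


# The single seed row shared by data_haversine_km.csv and data_haversine_mi.csv
row = (48.864716, 2.349014, 52.379189, 4.899431)
km = round(haversine(*row, unit="km"))
mi = round(haversine(*row, unit="mi"))
```

Rounded to whole units, as the test models do with `round(actual,0)`, this gives 430 for kilometres and 267 for miles, matching the `output` column of each seed.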
4 changes: 2 additions & 2 deletions integration_tests/dbt_project.yml
@@ -53,8 +53,8 @@ seeds:
sql:
data_events_20180103:
+schema: events

schema_tests:
data_test_sequential_timestamps:
+column_types:
my_timestamp: timestamp
1 change: 0 additions & 1 deletion integration_tests/macros/tests.sql
@@ -1,6 +1,5 @@

{% macro test_assert_equal(model, actual, expected) %}

select count(*) from {{ model }} where {{ actual }} != {{ expected }}

{% endmacro %}
13 changes: 13 additions & 0 deletions integration_tests/models/geo/schema.yml
@@ -0,0 +1,13 @@
version: 2

models:
- name: test_haversine_distance_km
tests:
- assert_equal:
actual: actual
expected: expected
- name: test_haversine_distance_mi
tests:
- assert_equal:
actual: actual
expected: expected
23 changes: 23 additions & 0 deletions integration_tests/models/geo/test_haversine_distance_km.sql
@@ -0,0 +1,23 @@
with data as (
select * from {{ ref('data_haversine_km') }}
),
final as (
select
output as expected,
cast(
{{
dbt_utils.haversine_distance(
lat1='lat_1',
lon1='lon_1',
lat2='lat_2',
lon2='lon_2',
unit='km'
)
}} as numeric
) as actual
from data
)
select
expected,
round(actual,0) as actual
from final
39 changes: 39 additions & 0 deletions integration_tests/models/geo/test_haversine_distance_mi.sql
@@ -0,0 +1,39 @@
with data as (
select * from {{ ref('data_haversine_mi') }}
),
final as (
select
output as expected,
cast(
{{
dbt_utils.haversine_distance(
lat1='lat_1',
lon1='lon_1',
lat2='lat_2',
lon2='lon_2',
unit='mi'
)
}} as numeric
) as actual
from data

union all

select
output as expected,
cast(
{{
dbt_utils.haversine_distance(
lat1='lat_1',
lon1='lon_1',
lat2='lat_2',
lon2='lon_2',
)
}} as numeric
) as actual
from data
)
select
expected,
round(actual,0) as actual
from final
44 changes: 38 additions & 6 deletions macros/geo/haversine_distance.sql
@@ -3,17 +3,49 @@ This calculates the distance between two sets of latitude and longitude.
The formula is from the following blog post:
http://daynebatten.com/2015/09/latitude-longitude-distance-sql/

The arguments should be float type.
#}

-{% macro haversine_distance(lat1,lon1,lat2,lon2) -%}
-{{ return(adapter.dispatch('haversine_distance', packages = dbt_utils._get_utils_namespaces())(lat1,lon1,lat2,lon2)) }}
+{% macro degrees_to_radians(degrees) -%}
+acos(-1) * {{degrees}} / 180
+{%- endmacro %}

+{% macro haversine_distance(lat1, lon1, lat2, lon2, unit='mi') -%}
+{{ return(adapter.dispatch('haversine_distance', packages = dbt_utils._get_utils_namespaces())(lat1,lon1,lat2,lon2,unit)) }}
{% endmacro %}

-{% macro default__haversine_distance(lat1,lon1,lat2,lon2) -%}
+{% macro default__haversine_distance(lat1, lon1, lat2, lon2, unit='mi') -%}
+{%- if unit == 'mi' %}
+{% set conversion_rate = 1 %}
+{% elif unit == 'km' %}
+{% set conversion_rate = 1.60934 %}
+{% else %}
+{{ exceptions.raise_compiler_error("unit input must be one of 'mi' or 'km'. Got " ~ unit) }}
+{% endif %}

-2 * 3961 * asin(sqrt((sin(radians(({{lat2}} - {{lat1}}) / 2))) ^ 2 +
+2 * 3961 * asin(sqrt(pow((sin(radians(({{ lat2 }} - {{ lat1 }}) / 2))), 2) +
cos(radians({{lat1}})) * cos(radians({{lat2}})) *
-(sin(radians(({{lon2}} - {{lon1}}) / 2))) ^ 2))
+pow((sin(radians(({{ lon2 }} - {{ lon1 }}) / 2))), 2))) * {{ conversion_rate }}

{%- endmacro %}



{% macro bigquery__haversine_distance(lat1, lon1, lat2, lon2, unit='mi') -%}
{% set radians_lat1 = dbt_utils.degrees_to_radians(lat1) %}
{% set radians_lat2 = dbt_utils.degrees_to_radians(lat2) %}
{% set radians_lon1 = dbt_utils.degrees_to_radians(lon1) %}
{% set radians_lon2 = dbt_utils.degrees_to_radians(lon2) %}
{%- if unit == 'mi' %}
{% set conversion_rate = 1 %}
{% elif unit == 'km' %}
{% set conversion_rate = 1.60934 %}
{% else %}
{{ exceptions.raise_compiler_error("unit input must be one of 'mi' or 'km'. Got " ~ unit) }}
{% endif %}
2 * 3961 * asin(sqrt(pow(sin(({{ radians_lat2 }} - {{ radians_lat1 }}) / 2), 2) +
cos({{ radians_lat1 }}) * cos({{ radians_lat2 }}) *
pow(sin(({{ radians_lon2 }} - {{ radians_lon1 }}) / 2), 2))) * {{ conversion_rate }}

{%- endmacro %}
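
The BigQuery variant converts degrees with the `degrees_to_radians` helper (`acos(-1) * degrees / 180`) rather than calling `radians()`. Because `acos(-1)` equals π, both routes should yield the same angle. The snippet below is an illustrative Python check of that identity only; the Python function borrows the macro's name for readability and is not part of dbt_utils.

```python
import math


def degrees_to_radians(degrees):
    """Mirror of the macro helper: acos(-1) is pi, so this is degrees * pi / 180."""
    return math.acos(-1) * degrees / 180


# Compare the helper with the built-in conversion on the seed coordinates
# plus two round-number angles.
samples = [48.864716, 2.349014, 52.379189, 4.899431, 180.0, -90.0]
max_diff = max(abs(degrees_to_radians(d) - math.radians(d)) for d in samples)
```

Any difference between the two conversions should be at the level of floating-point rounding, so the `default__` and `bigquery__` implementations are expected to agree after the `round(actual,0)` applied in the test models.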
