Add iasWorld dbt tests and namespace all tests #236
Conversation
    @@ -109,7 +109,7 @@ models:
        tests:
          - unique_combination_of_columns:
              name: vw_card_res_char_unique_by_pin_card_and_year
All changes in `default/` are related to namespace changes. See PR body for more details.
    @@ -11,14 +11,14 @@
     {%- set columns_csv = additional_select_columns | join(", ") %}

     {%- set length_columns = [] %}
    -{% for column in columns %}
    +{%- for column in columns %}
The changes here are just to control whitespace in the compiled SQL output.
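As a rough illustration (a hypothetical model body, not code from this PR): the only functional difference is the `-` inside the Jinja delimiters, which strips the newline and indentation that would otherwise be copied into the compiled SQL on every loop iteration.

```sql
-- Hypothetical dbt model used only to illustrate Jinja whitespace control.
-- With "{% for %}", each iteration emits the newline and indentation around
-- the tag into the compiled SQL; "{%- for %}" trims that leading whitespace,
-- so the compiled query has no stray blank lines.
select
    parid
    {%- for col in ["rmbed", "rmtot"] %}
    , length({{ col }}) as {{ col }}_length
    {%- endfor %}
from {{ source("iasworld", "dweldat") }}
```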
    -where not ({{ expression }})
    +where not ({{ column_name }} {{ expression }})
I added this in order to use `expression_is_true` in column-level tests, i.e.:

    - name: rmbed
      tests:
        - expression_is_true:
            name: iasworld_dweldat_rmbed_lte_rmtot
            expression: <= rmtot
            select_columns:
              - parid
              - taxyr
              - card
It still works fine in the original context of passing a full expression.
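For reference, the `rmbed` example above should compile to roughly the following (a sketch; the table reference and select list are illustrative):

```sql
-- Approximate compiled output for the column-level example above. The test
-- returns failing rows, so it passes when this query returns zero rows.
select
    parid,
    taxyr,
    card,
    rmbed
from iasworld.dweldat
where not (rmbed <= rmtot)
```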
The changes in here are to facilitate testing for leading and trailing whitespace in strings such as addresses, i.e. we want `123 FISH STREET` to pass but ` 123 FISH STREET ` (the same address with stray leading or trailing whitespace) to fail.
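In other words, the intent is a predicate along these lines (a sketch, not the exact macro output; the table/column pairing is illustrative):

```sql
-- Illustrative only: flag address strings with leading or trailing whitespace.
-- '123 FISH STREET' passes; ' 123 FISH STREET' and '123 FISH STREET ' fail.
select parid, taxyr, addr1
from iasworld.pardat  -- illustrative table name
where not (addr1 = trim(addr1))
```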
    {%- if "." in column -%} {%- set model_col = column -%}
    {%- else -%} {%- set model_col = "model" ~ "." ~ column -%}
    {%- endif -%}

    {%- if "." in external_column -%} {%- set external_model_col = external_column -%}
    {%- else -%}
    {%- set external_model_col = "external_model" ~ "." ~ external_column -%}
    {%- endif -%}
The yucky formatting here is a result of `sqlfmt` 🤷
This test is almost explicitly for testing that the `class` column matches across different tables, i.e. that a PIN's class in `PARDAT` matches its class in `LEGDAT`. One subtlety is that this will not return a row if any row has a matching class after the join. So, if a `PARDAT` PIN is class 211 and it joins to `DWELDAT` records for two buildings on the same PIN (one class 211, one class 204), then no records are returned (since one `DWELDAT` record has a 211).
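A sketch of that behavior, assuming the test compiles to roughly this anti-join shape (table and column names follow the example above):

```sql
-- A PIN/year only appears as a failure if NONE of its joined rows share its
-- class. For the example above, the class-211 DWELDAT card matches the
-- class-211 PARDAT row, so the NOT EXISTS filters the parcel out even though
-- a class-204 card also exists on the same PIN.
select p.parid, p.taxyr, p.class
from iasworld.pardat as p
where not exists (
    select 1
    from iasworld.dweldat as d
    where d.parid = p.parid
      and d.taxyr = p.taxyr
      and d.class = p.class
)
```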
    name: reporting_ratio_stats_no_nulls
    expression: |
      year IS NOT NULL
      AND triad IS NOT NULL
      AND geography_type IS NOT NULL
      AND property_group IS NOT NULL
      AND assessment_stage IS NOT NULL
      AND sale_year IS NOT NULL
All the deindenting and changes to `|` are just to get a cleaner compiled output, i.e. one with line breaks.
    tests:
      - relationships:
          name: iasworld_addn_class_in_ccao_class_dict
          to: source('ccao', 'class_dict')
          field: class_code
          config:
            where: |
              taxyr >= '2022'
              AND class != 'EX'
              AND cur = 'Y'
              AND deactivat IS NULL
This test is on basically every `class` column in iasWorld now and is designed to catch issues like people adding a dash to classes, e.g. `2-12`.
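For context, a dbt `relationships` test compiles to roughly the following shape (a sketch with illustrative table names). A malformed class like `2-12` fails because it has no matching `class_code` in `class_dict`:

```sql
-- Approximate shape of the relationships test above: class values in ADDN
-- (subject to the where-config) that have no matching class_code in the
-- ccao.class_dict parent table are returned as failures.
with child as (
    select class as from_field
    from iasworld.addn
    where class is not null
      and taxyr >= '2022'
      and class != 'EX'
      and cur = 'Y'
      and deactivat is null
),
parent as (
    select class_code as to_field
    from ccao.class_dict
)
select child.from_field
from child
left join parent
    on child.from_field = parent.to_field
where parent.to_field is null
```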
    config: &unique-conditions
      where: cur = 'Y' AND deactivat IS NULL
Since many tests require the same conditionals, I used YAML anchors a lot to repeat the same conditions across tests in the same schema file.
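For instance, a later test in the same file can reuse the anchored config via a YAML alias (a minimal sketch; the test name and column list are hypothetical, but the alias matches the `&unique-conditions` anchor above):

```yaml
# Illustrative reuse of the anchor: "*unique-conditions" expands to the same
# mapping, i.e. where: cur = 'Y' AND deactivat IS NULL.
- unique_combination_of_columns:
    name: iasworld_pardat_unique_by_parid_taxyr  # hypothetical test name
    combination_of_columns:
      - parid
      - taxyr
    config: *unique-conditions
```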
    where: |
      taxyr >= '2022'
      AND class NOT IN ('EX', 'RR', '999')
      AND NOT REGEXP_LIKE(class, '[0-9]{3}[A|B]')
This excludes any subclasses such as `693A` or similar.
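Concretely (illustrative values only), the regex matches three digits followed by a letter suffix, so the `NOT REGEXP_LIKE` drops those rows from the tested population:

```sql
-- Only '211' survives the filter; '693A' and '211B' are treated as subclasses
-- and excluded. (Athena/Presto regexp_like, matching the where-config above.)
select class
from (values ('693A'), ('211B'), ('211')) as t (class)
where not regexp_like(class, '[0-9]{3}[A|B]')
```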
    where: |
      cur = 'Y'
      AND deactivat IS NULL
      AND class NOT IN ('201', '213', '218', '219', '220', '221', '224', '225', '236', '240', '241', '290', '294', '297')
`DWELDAT` has a bunch of non-regression classes that don't actually have any characteristic data, hence the exclusion here.
    expression: |
      CASE
        WHEN LENGTH(REGEXP_EXTRACT(addr1, '^[N|S|E|W]{0,2}\s{0,1}([0-9]{1,5})',1)) <= 5 THEN TRUE
        WHEN REGEXP_LIKE(UPPER(addr1), '^[PO BOX|P\.O\. BOX|P O BOX|BOX|P O BX|PO BX|PO BX|P.O.BOX]') THEN TRUE
        WHEN REGEXP_LIKE(UPPER(addr1), '^[C\/O|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE]') THEN TRUE
        ELSE FALSE
      END
Basically just testing whether the address starts with a street number of at most five digits, or is a PO box.
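To make the first branch concrete (illustrative values; Athena/Presto `regexp_extract`):

```sql
-- The first CASE branch extracts the leading street number, optionally
-- preceded by up to two direction letters (e.g. 'N' or 'SW'); the branch is
-- TRUE when a number is found and the extracted value is at most five
-- characters long. The PO box row returns NULL here and is handled by the
-- later REGEXP_LIKE branches instead.
select
    addr1,
    regexp_extract(addr1, '^[N|S|E|W]{0,2}\s{0,1}([0-9]{1,5})', 1) as street_num
from (values ('123 FISH STREET'), ('N123 FISH STREET'), ('PO BOX 55')) as t (addr1)
-- street_num: '123', '123', NULL
```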
Awesome job with these tests! I'm surprised they execute so quickly (8m9s in the test run you linked); not bad for 294 queries.
> I'm not exactly sure what the process should be to make corrections/decisions, given there are so many failures. We could:
>
> 1. Append an `error_if` threshold to every failing test, then slowly work our way through them ourselves.
> 2. Collect all the failures in a spreadsheet (or multiple sheets), then send the sheet to Valuations.
> 3. Use this PR to work through all the failing tests, and try to fix/address each one.
So far we've been following approach 1, but I don't necessarily think that means it's the best option. Ideally we would do 2 or 3, but I'm concerned about spamming ourselves with failures for the duration of the time it takes us to correct mistakes, particularly given that tests might end up running on PRs if someone were to edit an underlying model. (I suppose that's not an issue for the iasWorld tests, though, since we don't actually edit those tables directly.)
In an ideal world, I think the approach we discussed last week is probably the right one:

- Tag all unit tests that are intended to test that our transformations do what we expect
- Update the `build_and_test_dbt` workflow to only run unit tests (e.g. by selecting on a tag, as sketched below)
- Allow integration tests to fail while we fix them, but insist that unit tests pass
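For what it's worth, the tagging piece might look something like this (a sketch; the tag name and selector are hypothetical, not something already in the repo), with the workflow then running `dbt test --select tag:unit_test`:

```yaml
# Illustrative only: tag a test so CI can select it by tag.
- expression_is_true:
    name: iasworld_dweldat_rmbed_lte_rmtot
    expression: <= rmtot
    config:
      tags: ["unit_test"]  # hypothetical tag name
```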
I don't think this work would be particularly difficult, but I'm not actually sure which of our existing tests (if any) correspond to unit tests, so I don't have high confidence that this would make sense for our current set of tests.
All that being said, I'm comfortable with all of the options you outlined, as long as we all understand which approach we're deciding to take.
    # - FP Checklist - Non-EX, RR parcels with 0 land value
    # - FP Checklist - Non-EX, RR parcels with 0 value
    tests:
      - dbt_utils.accepted_range: &test-non-negative
[Praise] Nice use of anchors here!
    # Note that this test is NOT actually the unique primary key of
    # this table, since there doesn't seem to BE a unique combination
    # of identifying columns
[Question, non-blocking] Hm, does that mean we should just remove this test then?
I think it's still useful, since an error firing would indicate an increasing number of duplicate values.
[Question, non-blocking] Any particular meaning to these newlines?
Fixed in 70b5902!
Okay, I think I've landed on a good approach based on this. I tagged all QC/source tests and updated the `build_and_test_dbt` workflow accordingly. Thoughts @jeancochrane? Here are the commits since last time if you want to review.
This PR adds 250+ dbt tests covering major iasWorld tables and columns. Test definitions are drawn from internal QC queries (via Inquire) where possible. However, most tests check "common sense" things: no assessed values less than 0, all PINs in a table exist in `PARDAT`, etc.

Many of the tests in this PR are currently failing (99 out of 294). This is by design. In most cases, the failures reveal some kind of data error that needs a correction or decision. Some of them may be false positives, but I've checked many of them by hand to ensure they work.
I'm not exactly sure what the process should be to make corrections/decisions, given there are so many failures. We could:

1. Append an `error_if` threshold to every failing test (see the config sketch after this list), then slowly work our way through them ourselves.
2. Collect all the failures in a spreadsheet (or multiple sheets), then send the sheet to Valuations.
3. Use this PR to work through all the failing tests, and try to fix/address each one.

Curious to hear thoughts @jeancochrane @wrridgeway.
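The `error_if` approach in option 1 would look roughly like this per test (a sketch; the threshold value is hypothetical):

```yaml
# Illustrative only: let a known-failing test warn instead of erroring until
# the underlying data issue is corrected, by raising the error threshold.
- expression_is_true:
    name: iasworld_dweldat_rmbed_lte_rmtot
    expression: <= rmtot
    config:
      warn_if: ">0"
      error_if: ">100"  # hypothetical threshold
```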
Note

This PR also adds namespacing to test names in the form of `$SCHEMA_$TABLE_$COLUMN_$TESTNAME`. This makes the test output easier to parse. I also truncated the test names so they don't overflow the default test output width. All file changes to `location/`, `reporting/`, and `default/` are related to this change and can be largely ignored.

I apologize in advance for the massive diff. I should've PR'd this sooner but got carried away. I don't think it's actually necessary to review this whole PR; happy to chat about how to manage it.
Link to full test output