Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vw_pin_appeal unique by pin, year, case_no #674

Merged
merged 22 commits into from
Dec 10, 2024
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 22 additions & 8 deletions dbt/models/default/default.vw_pin_appeal.sql
Original file line number Diff line number Diff line change
@@ -1,10 +1,20 @@
-- View containing appeals by PIN

-- This CTE exists only so that we can join reason descriptions onto cleaned
-- reason codes.
-- CTE so that we can join reason descriptions onto cleaned reason codes and
-- drop some dupes from htpar.
WITH reasons AS (
SELECT
htpar.*,
SELECT DISTINCT
wrridgeway marked this conversation as resolved.
Show resolved Hide resolved
htpar.parid,
htpar.taxyr,
htpar.caseno,
htpar.user38,
htpar.user104,
htpar.resact,
htpar.hrstatus,
htpar.cpatty,
-- user125 isn't used in the view, but it's FilingID and we want to
-- include it to make sure we're only dropping true dupes using DISTINCT
htpar.user125,
jeancochrane marked this conversation as resolved.
Show resolved Hide resolved
-- Reason codes come from different columns before and after 2020
CASE
WHEN htpar.taxyr <= '2020'
Expand Down Expand Up @@ -37,6 +47,11 @@ WITH reasons AS (
THEN REGEXP_REPLACE(htpar.user101, '[^[:alnum:]]', '')
END AS reason_code3
FROM {{ source('iasworld', 'htpar') }} AS htpar
WHERE htpar.cur = 'Y'
AND htpar.deactivat IS NULL
AND htpar.caseno IS NOT NULL
AND htpar.heartyp IN ('A', 'C')
Copy link
Member Author

@wrridgeway wrridgeway Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional means we'll only get strictly defined appeals - I'm not entirely positive that's what we want, but it is what we suggest our view is. If we allow in other values for this column we need to address three appeals in 2022 that have both "E" and "A" heartyp for the same pin, case_no, year.


)

SELECT
Expand Down Expand Up @@ -97,7 +112,9 @@ SELECT
WHEN reasons.hrstatus = 'X' THEN 'closed pending c of e'
END AS status
FROM reasons
LEFT JOIN {{ source('iasworld', 'pardat') }} AS pardat
-- This is an INNER JOIN since there are apparently parcels in htpar that are
-- not in pardat. See 31051000511064, 2008.
INNER JOIN {{ source('iasworld', 'pardat') }} AS pardat
Copy link
Member Author

@wrridgeway wrridgeway Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some godforsaken reason there are PINs in htpar that are not in pardat.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question, non-blocking] Why remove these PINs, when the only side effect of their inclusion would be some appeals with null classes? We could just document that artifact in the view? I guess I don't feel super confident that in these cases the mistake lies in extraneous appeals, rather than missing rows from pardat. But if you do, I'm happy to move forward with this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that we usually depend on pardat or the parcel shapefile as our parcel universe. Once we start letting pins from outside of those two sources into views our universe is going to be inconsistent across tables, views, etc. Personally, I trust pardat more than htpar.

ON reasons.parid = pardat.parid
AND reasons.taxyr = pardat.taxyr
AND pardat.cur = 'Y'
Expand All @@ -120,6 +137,3 @@ LEFT JOIN {{ ref('ccao.htpar_reascd') }} AS reascd2
ON reasons.reason_code2 = reascd2.reascd
LEFT JOIN {{ ref('ccao.htpar_reascd') }} AS reascd3
ON reasons.reason_code3 = reascd3.reascd
WHERE reasons.cur = 'Y'
AND reasons.caseno IS NOT NULL
AND reasons.deactivat IS NULL
1 change: 1 addition & 0 deletions dbt/models/default/exposures.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ exposures:
test_row_count: true
asset_id: y282-6ig3
primary_key:
- case_no
- pin
- year
description: |
Expand Down
10 changes: 8 additions & 2 deletions dbt/models/default/schema/default.vw_pin_appeal.yml
Original file line number Diff line number Diff line change
Expand Up @@ -45,7 +45,13 @@ models:
data_tests:
- row_count:
name: default_vw_pin_appeal_row_count
above: 8815299 # as of 2024-11-14
above: 8755522 # as of 2024-12-09
- unique_combination_of_columns:
name: default_vw_pin_appeal_unique_by_14_digit_pin_year_caseno
combination_of_columns:
- pin
- year
- case_no

unit_tests:
- name: default_vw_pin_appeal_class_strips_non_alphanumerics
Expand All @@ -59,7 +65,7 @@ unit_tests:
- {parid: "123", taxyr: "2024", cur: "Y", deactivat: null, class: "2.1-1)A"}
- input: source("iasworld", "htpar")
rows:
- {parid: "123", taxyr: "2024", cur: "Y", deactivat: null, cpatty: "1", caseno: "1"}
- {parid: "123", taxyr: "2024", cur: "Y", deactivat: null, cpatty: "1", caseno: "1", heartyp: "A", user125: "A", user38: "CC", user104: "A", resact: "A", hrstatus: "A", user100: "A", user101: "A", user89: "A"}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeancochrane this works, but I'm not sure if I've included extraneous columns.

Copy link
Contributor

@jeancochrane jeancochrane Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work with only heartyp, since that's the only new column that's being used in a table filter or join. The rest of the new columns here after heartyp should be unnecessary, since we're not testing their output and the default that dbt chooses (null) won't affect joins or filters. Does that work as expected?

- input: source("iasworld", "legdat")
rows:
- {parid: "123", taxyr: "2024", cur: "Y", deactivat: null}
Expand Down
Loading