Add provisional assessment stages to PIN value views #641

Merged

@jeancochrane (Contributor) commented Nov 13, 2024

Background

This PR updates the default.vw_pin_value and reporting.vw_pin_value_long views using our new knowledge of null procnames to add two new assessment stages:

  • PRE-MAILED: Provisional values that will be mailed once first-pass desk review completes
  • ASSESSOR PRE-CERTIFIED: Provisional values that will be set for second-pass once appeals complete

Closes #640.

Open questions

  • It's important that these values not be published in any public-facing assets, since they are not final and so are subject to change. Are there other models that we need to change to make sure we don't accidentally publish these values? For example, reporting.vw_assessment_roll_muni filters by stage_name such that these new stages will not be included, but reporting.vw_assessment_roll does not. Do we need to update the latter to filter out provisional stages?
  • As far as I know, we can only determine provisional values for open towns in the current assessment year; this is because our queries for provisional values rely on the cur = 'Y' condition, which automatically becomes false once the record gets stamped with a procname. (This doesn't stop us from determining the values for past assessment stages, however, because we can use the valclass IS NULL condition in those cases to find the final stamped record for each stage.) Is it acceptable that provisional values are only available for currently open towns?
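
As a rough illustration of the first question, a downstream query could exclude the two new stage names explicitly. This is only a sketch: the view and stage names come from this PR, but whether any public-facing model would be filtered exactly this way is an assumption.

```sql
-- Sketch only: keep provisional stages out of a downstream query.
-- The two stage names are the ones added in this PR.
select *
from reporting.vw_pin_value_long
where stage_name not in ('PRE-MAILED', 'ASSESSOR PRE-CERTIFIED')
```

Note that `not in` also drops rows where `stage_name` is null, which may or may not be desirable depending on the consuming model.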

Testing

I added a few unit tests in this PR to cover obvious problems with the new assessment stages, but I also ran some one-off QC queries to make sure that this PR doesn't accidentally change any non-provisional values that are already in prod. Expand each of the sections below to see the checks I ran in detail.

Make sure total counts of records with non-provisional assessment stages match between prod and dev

Check default.vw_pin_value:

with dev_count as (
    select 'dev' as source, count(*) as count
    from z_dev_jecochr_default.vw_pin_value
    where stage_name in ('MAILED', 'ASSESSOR CERTIFIED', 'BOARD CERTIFIED')
),

prod_count as (
    select 'prod' as source, count(*) as count
    from default.vw_pin_value
)

select * from dev_count
union
select * from prod_count

Check reporting.vw_pin_value_long:

with dev_count as (
    select 'dev' as source, count(*) as count
    from z_dev_jecochr_reporting.vw_pin_value_long
    where stage_name in ('MAILED', 'ASSESSOR CERTIFIED', 'BOARD CERTIFIED')
),

prod_count as (
    select 'prod' as source, count(*) as count
    from reporting.vw_pin_value_long
)

select * from dev_count
union
select * from prod_count

Make sure counts of records in each assessment stage match between prod and dev

Check default.vw_pin_value:

with dev_values as (
    select
        stage_name,
        'dev' as source,
        count(*) as count
    from z_dev_jecochr_default.vw_pin_value
    group by stage_name
    order by stage_name desc
),

dev_total as (
    select
        'TOTAL DEV' as stage_name,
        'dev' as source,
        count(*) as count
    from z_dev_jecochr_default.vw_pin_value
),

prod_values as (
    select
        stage_name,
        'prod' as source,
        count(*) as count
    from default.vw_pin_value
    group by stage_name
    order by stage_name desc
),

prod_total as (
    select
        'TOTAL PROD' as stage_name,
        'prod' as source,
        count(*) as count
    from default.vw_pin_value
)

select *
from dev_values
union
select * from dev_total
union
select *
from prod_values
union
select *
from prod_total
order by stage_name

Check reporting.vw_pin_value_long:

with dev_values as (
    select
        stage_name,
        'dev' as source,
        count(*) as count
    from z_dev_jecochr_reporting.vw_pin_value_long
    group by stage_name
    order by stage_name desc
),

dev_total as (
    select
        'TOTAL DEV' as stage_name,
        'dev' as source,
        count(*) as count
    from z_dev_jecochr_reporting.vw_pin_value_long
),

prod_values as (
    select
        stage_name,
        'prod' as source,
        count(*) as count
    from reporting.vw_pin_value_long
    group by stage_name
    order by stage_name desc
),

prod_total as (
    select
        'TOTAL PROD' as stage_name,
        'prod' as source,
        count(*) as count
    from reporting.vw_pin_value_long
)

select *
from dev_values
union
select * from dev_total
union
select *
from prod_values
union
select *
from prod_total
order by stage_name
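
The per-stage counts above can also be lined up side by side, which makes a mismatch visible in a single row. The following is a sketch of that alternative, not a query that was run for this PR; it reuses the dev and prod view names from the checks above, and the output column names are made up for illustration.

```sql
-- Sketch: one row per stage, with dev and prod counts in adjacent columns
with dev_values as (
    select stage_name, count(*) as dev_count
    from z_dev_jecochr_default.vw_pin_value
    group by stage_name
),

prod_values as (
    select stage_name, count(*) as prod_count
    from default.vw_pin_value
    group by stage_name
)

select
    coalesce(dev_values.stage_name, prod_values.stage_name) as stage_name,
    dev_values.dev_count,
    prod_values.prod_count
from dev_values
-- Full outer join so stages that exist on only one side still appear
full outer join prod_values
    on dev_values.stage_name = prod_values.stage_name
order by 1
```

Stages that exist only in dev (the new provisional ones) would show a null `prod_count`. Rows with a null `stage_name` do not join to each other, so they would appear as two separate rows.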

Make sure all values for records with non-provisional assessment stages match between prod and dev

Check default.vw_pin_value:

with vw_pin_value_mismatches as (
    select
        dev.pin,
        dev.year,
        dev.mailed_class as dev_mailed_class,
        prod.mailed_class as prod_mailed_class,
        case when (
            (dev.mailed_class is not null and prod.mailed_class is null)
            or (dev.mailed_class is null and prod.mailed_class is not null)
            or (dev.mailed_class != prod.mailed_class)
        ) then false else true end as mailed_class_matches,
        dev.mailed_bldg as dev_mailed_bldg,
        prod.mailed_bldg as prod_mailed_bldg,
        case when (
            (dev.mailed_bldg is not null and prod.mailed_bldg is null)
            or (dev.mailed_bldg is null and prod.mailed_bldg is not null)
            or (dev.mailed_bldg != prod.mailed_bldg)
        ) then false else true end as mailed_bldg_matches,
        dev.mailed_land as dev_mailed_land,
        prod.mailed_land as prod_mailed_land,
        case when (
            (dev.mailed_land is not null and prod.mailed_land is null)
            or (dev.mailed_land is null and prod.mailed_land is not null)
            or (dev.mailed_land != prod.mailed_land)
        ) then false else true end as mailed_land_matches,
        dev.mailed_tot as dev_mailed_tot,
        prod.mailed_tot as prod_mailed_tot,
        case when (
            (dev.mailed_tot is not null and prod.mailed_tot is null)
            or (dev.mailed_tot is null and prod.mailed_tot is not null)
            or (dev.mailed_tot != prod.mailed_tot)
        ) then false else true end as mailed_tot_matches,
        dev.mailed_bldg_mv as dev_mailed_bldg_mv,
        prod.mailed_bldg_mv as prod_mailed_bldg_mv,
        case when (
            (dev.mailed_bldg_mv is not null and prod.mailed_bldg_mv is null)
            or (dev.mailed_bldg_mv is null and prod.mailed_bldg_mv is not null)
            or (dev.mailed_bldg_mv != prod.mailed_bldg_mv)
        ) then false else true end as mailed_bldg_mv_matches,
        dev.mailed_land_mv as dev_mailed_land_mv,
        prod.mailed_land_mv as prod_mailed_land_mv,
        case when (
            (dev.mailed_land_mv is not null and prod.mailed_land_mv is null)
            or (dev.mailed_land_mv is null and prod.mailed_land_mv is not null)
            or (dev.mailed_land_mv != prod.mailed_land_mv)
        ) then false else true end as mailed_land_mv_matches,
        dev.mailed_tot_mv as dev_mailed_tot_mv,
        prod.mailed_tot_mv as prod_mailed_tot_mv,
        case when (
            (dev.mailed_tot_mv is not null and prod.mailed_tot_mv is null)
            or (dev.mailed_tot_mv is null and prod.mailed_tot_mv is not null)
            or (dev.mailed_tot_mv != prod.mailed_tot_mv)
        ) then false else true end as mailed_tot_mv_matches,
    
        dev.certified_class as dev_certified_class,
        prod.certified_class as prod_certified_class,
        case when (
            (dev.certified_class is not null and prod.certified_class is null)
            or (dev.certified_class is null and prod.certified_class is not null)
            or (dev.certified_class != prod.certified_class)
        ) then false else true end as certified_class_matches,
        dev.certified_bldg as dev_certified_bldg,
        prod.certified_bldg as prod_certified_bldg,
        case when (
            (dev.certified_bldg is not null and prod.certified_bldg is null)
            or (dev.certified_bldg is null and prod.certified_bldg is not null)
            or (dev.certified_bldg != prod.certified_bldg)
        ) then false else true end as certified_bldg_matches,
        dev.certified_land as dev_certified_land,
        prod.certified_land as prod_certified_land,
        case when (
            (dev.certified_land is not null and prod.certified_land is null)
            or (dev.certified_land is null and prod.certified_land is not null)
            or (dev.certified_land != prod.certified_land)
        ) then false else true end as certified_land_matches,
        dev.certified_tot as dev_certified_tot,
        prod.certified_tot as prod_certified_tot,
        case when (
            (dev.certified_tot is not null and prod.certified_tot is null)
            or (dev.certified_tot is null and prod.certified_tot is not null)
            or (dev.certified_tot != prod.certified_tot)
        ) then false else true end as certified_tot_matches,
        dev.certified_bldg_mv as dev_certified_bldg_mv,
        prod.certified_bldg_mv as prod_certified_bldg_mv,
        case when (
            (dev.certified_bldg_mv is not null and prod.certified_bldg_mv is null)
            or (dev.certified_bldg_mv is null and prod.certified_bldg_mv is not null)
            or (dev.certified_bldg_mv != prod.certified_bldg_mv)
        ) then false else true end as certified_bldg_mv_matches,
        dev.certified_land_mv as dev_certified_land_mv,
        prod.certified_land_mv as prod_certified_land_mv,
        case when (
            (dev.certified_land_mv is not null and prod.certified_land_mv is null)
            or (dev.certified_land_mv is null and prod.certified_land_mv is not null)
            or (dev.certified_land_mv != prod.certified_land_mv)
        ) then false else true end as certified_land_mv_matches,
        dev.certified_tot_mv as dev_certified_tot_mv,
        prod.certified_tot_mv as prod_certified_tot_mv,
        case when (
            (dev.certified_tot_mv is not null and prod.certified_tot_mv is null)
            or (dev.certified_tot_mv is null and prod.certified_tot_mv is not null)
            or (dev.certified_tot_mv != prod.certified_tot_mv)
        ) then false else true end as certified_tot_mv_matches,
    
        dev.board_class as dev_board_class,
        prod.board_class as prod_board_class,
        case when (
            (dev.board_class is not null and prod.board_class is null)
            or (dev.board_class is null and prod.board_class is not null)
            or (dev.board_class != prod.board_class)
        ) then false else true end as board_class_matches,
        dev.board_bldg as dev_board_bldg,
        prod.board_bldg as prod_board_bldg,
        case when (
            (dev.board_bldg is not null and prod.board_bldg is null)
            or (dev.board_bldg is null and prod.board_bldg is not null)
            or (dev.board_bldg != prod.board_bldg)
        ) then false else true end as board_bldg_matches,
        dev.board_land as dev_board_land,
        prod.board_land as prod_board_land,
        case when (
            (dev.board_land is not null and prod.board_land is null)
            or (dev.board_land is null and prod.board_land is not null)
            or (dev.board_land != prod.board_land)
        ) then false else true end as board_land_matches,
        dev.board_tot as dev_board_tot,
        prod.board_tot as prod_board_tot,
        case when (
            (dev.board_tot is not null and prod.board_tot is null)
            or (dev.board_tot is null and prod.board_tot is not null)
            or (dev.board_tot != prod.board_tot)
        ) then false else true end as board_tot_matches,
        dev.board_bldg_mv as dev_board_bldg_mv,
        prod.board_bldg_mv as prod_board_bldg_mv,
        case when (
            (dev.board_bldg_mv is not null and prod.board_bldg_mv is null)
            or (dev.board_bldg_mv is null and prod.board_bldg_mv is not null)
            or (dev.board_bldg_mv != prod.board_bldg_mv)
        ) then false else true end as board_bldg_mv_matches,
        dev.board_land_mv as dev_board_land_mv,
        prod.board_land_mv as prod_board_land_mv,
        case when (
            (dev.board_land_mv is not null and prod.board_land_mv is null)
            or (dev.board_land_mv is null and prod.board_land_mv is not null)
            or (dev.board_land_mv != prod.board_land_mv)
        ) then false else true end as board_land_mv_matches,
        dev.board_tot_mv as dev_board_tot_mv,
        prod.board_tot_mv as prod_board_tot_mv,
        case when (
            (dev.board_tot_mv is not null and prod.board_tot_mv is null)
            or (dev.board_tot_mv is null and prod.board_tot_mv is not null)
            or (dev.board_tot_mv != prod.board_tot_mv)
        ) then false else true end as board_tot_mv_matches
    
    from z_dev_jecochr_default.vw_pin_value as dev
    left join default.vw_pin_value as prod
        on dev.pin = prod.pin
        and dev.year = prod.year
)

select *
from vw_pin_value_mismatches
where
    not mailed_class_matches
    or not mailed_bldg_matches
    or not mailed_land_matches
    or not mailed_tot_matches
    or not mailed_bldg_mv_matches
    or not mailed_land_mv_matches
    or not mailed_tot_mv_matches

    or not certified_class_matches
    or not certified_bldg_matches
    or not certified_land_matches
    or not certified_tot_matches
    or not certified_bldg_mv_matches
    or not certified_land_mv_matches
    or not certified_tot_mv_matches

    or not board_class_matches
    or not board_bldg_matches
    or not board_land_matches
    or not board_tot_matches
    or not board_bldg_mv_matches
    or not board_land_mv_matches
    or not board_tot_mv_matches

Check reporting.vw_pin_value_long:

with reporting_vw_pin_value_long_mismatches as (
    select
        dev.pin,
        dev.year,
        dev.stage_name,
        dev.class as dev_class,
        prod.class as prod_class,
        case when (
            (dev.class is not null and prod.class is null)
            or (dev.class is null and prod.class is not null)
            or (dev.class != prod.class)
        ) then false else true end as class_matches,
        dev.major_class as dev_major_class,
        prod.major_class as prod_major_class,
        case when (
            (dev.major_class is not null and prod.major_class is null)
            or (dev.major_class is null and prod.major_class is not null)
            or (dev.major_class != prod.major_class)
        ) then false else true end as major_class_matches,
        dev.property_group as dev_property_group,
        prod.property_group as prod_property_group,
        case when (
            (dev.property_group is not null and prod.property_group is null)
            or (dev.property_group is null and prod.property_group is not null)
            or (dev.property_group != prod.property_group)
        ) then false else true end as property_group_matches,
        dev.stage_num as dev_stage_num,
        prod.stage_num as prod_stage_num,
        case when (
            (dev.stage_num is not null and prod.stage_num is null)
            or (dev.stage_num is null and prod.stage_num is not null)
            or (dev.stage_num != prod.stage_num)
        ) then false else true end as stage_num_matches,
        dev.bldg as dev_bldg,
        prod.bldg as prod_bldg,
        case when (
            (dev.bldg is not null and prod.bldg is null)
            or (dev.bldg is null and prod.bldg is not null)
            or (dev.bldg != prod.bldg)
        ) then false else true end as bldg_matches,
        dev.land as dev_land,
        prod.land as prod_land,
        case when (
            (dev.land is not null and prod.land is null)
            or (dev.land is null and prod.land is not null)
            or (dev.land != prod.land)
        ) then false else true end as land_matches,
        dev.tot as dev_tot,
        prod.tot as prod_tot,
        case when (
            (dev.tot is not null and prod.tot is null)
            or (dev.tot is null and prod.tot is not null)
            or (dev.tot != prod.tot)
        ) then false else true end as tot_matches,
        dev.bldg_mv as dev_bldg_mv,
        prod.bldg_mv as prod_bldg_mv,
        case when (
            (dev.bldg_mv is not null and prod.bldg_mv is null)
            or (dev.bldg_mv is null and prod.bldg_mv is not null)
            or (dev.bldg_mv != prod.bldg_mv)
        ) then false else true end as bldg_mv_matches,
        dev.land_mv as dev_land_mv,
        prod.land_mv as prod_land_mv,
        case when (
            (dev.land_mv is not null and prod.land_mv is null)
            or (dev.land_mv is null and prod.land_mv is not null)
            or (dev.land_mv != prod.land_mv)
        ) then false else true end as land_mv_matches,
        dev.tot_mv as dev_tot_mv,
        prod.tot_mv as prod_tot_mv,
        case when (
            (dev.tot_mv is not null and prod.tot_mv is null)
            or (dev.tot_mv is null and prod.tot_mv is not null)
            or (dev.tot_mv != prod.tot_mv)
        ) then false else true end as tot_mv_matches
    from z_dev_jecochr_reporting.vw_pin_value_long as dev
    left join reporting.vw_pin_value_long as prod
        on dev.pin = prod.pin
        and dev.year = prod.year
        and dev.stage_name = prod.stage_name
)

select *
from reporting_vw_pin_value_long_mismatches
where
    not class_matches
    or not major_class_matches
    or not property_group_matches
    or not stage_num_matches
    or not bldg_matches
    or not land_matches
    or not tot_matches
    or not bldg_mv_matches
    or not land_mv_matches
    or not tot_mv_matches
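
The three-branch null checks in the mismatch queries above can be expressed more compactly with `IS DISTINCT FROM`, which treats two nulls as equal. This is an alternative sketch, not what was run for this PR, and it assumes a Trino/Presto-compatible engine (as the `CARDINALITY` and `DATE_FORMAT` calls in the diff suggest). The inline rows and pin values are hypothetical stand-ins for the dev and prod views.

```sql
-- Sketch: null-safe comparison with inline rows standing in for the views
with dev as (
    select *
    from (
        values ('01', 100), ('02', null), ('03', 300), ('04', null)
    ) as t (pin, tot)
),

prod as (
    select *
    from (
        values ('01', 100), ('02', null), ('03', 350), ('04', 400)
    ) as t (pin, tot)
)

select
    dev.pin,
    dev.tot as dev_tot,
    prod.tot as prod_tot,
    -- True when both sides are equal or both are null
    dev.tot is not distinct from prod.tot as tot_matches
from dev
left join prod
    on dev.pin = prod.pin
-- Keep only rows where the values differ, counting null vs. non-null as a difference
where dev.tot is distinct from prod.tot
```

With these sample rows, only pins '03' (300 vs. 350) and '04' (null vs. 400) are returned; pin '02' is null on both sides and so counts as a match.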

@jeancochrane linked an issue Nov 13, 2024 that may be closed by this pull request
@jeancochrane changed the title from "Add provisional assessment stages to default.vw_pin_value and reporting.vw_pin_value_long" to "Add provisional assessment stages to PIN value views" Nov 13, 2024
Comment on lines +397 to +404
-- If the PIN has no stages but its year is not the current
-- assessment year, it is likely a data error from a prior
-- year that we don't want to include in our results. In
-- contrast, if the PIN is in the current year but has no
-- stages, it is most likely a provisional value for a PIN
-- that has not mailed yet
CARDINALITY(stages.procnames) != 0
OR asmt.taxyr = DATE_FORMAT(NOW(), '%Y')
Contributor Author

Does this logic seem correct to you? It made sense in my QC, but it's hard to know for sure given that the only provisional values that currently exist in our database are pre-certified.

Member

This does make perfect sense to me. The only thing I'll mention is that because of these conditional constraints you probably don't need to have and {{ tablename }}.cur = 'Y' in your first macro, but better safe than sorry.
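
To make the behavior of the condition under review concrete, here is a small self-contained sketch. The column names `taxyr` and `procnames` and the filter itself come from the diff above; the inline rows, pin labels, and the procname value are hypothetical, and the real view joins actual tables rather than a `VALUES` list.

```sql
-- Sketch: which rows survive the filter under review
with asmt as (
    select *
    from (
        values
            -- Prior year, at least one stage: kept
            ('pin-a', '2020', cast(array['SOME_PROCNAME'] as array(varchar))),
            -- Prior year, no stages: treated as a likely data error and dropped
            ('pin-b', '2020', cast(array[] as array(varchar))),
            -- Current year, no stages: likely a provisional value, kept
            ('pin-c', date_format(now(), '%Y'), cast(array[] as array(varchar)))
    ) as t (pin, taxyr, procnames)
)

select
    pin,
    taxyr
from asmt
where cardinality(procnames) != 0
    or taxyr = date_format(now(), '%Y')
```

Run in any year after 2020, this keeps `pin-a` and `pin-c` and drops `pin-b`.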

@jeancochrane jeancochrane marked this pull request as ready for review November 13, 2024 23:17
@jeancochrane jeancochrane requested a review from a team as a code owner November 13, 2024 23:17
Comment on lines 438 to 440
WHEN stage_values.pre_certified_tot IS NOT NULL THEN 1.5
WHEN stage_values.mailed_tot IS NOT NULL THEN 1
WHEN stage_values.pre_mailed_tot IS NOT NULL THEN 0
Contributor Author

I think 0 makes a lot of sense for the pre-mail stage number, but 1.5 feels kind of weird for pre-certified. Is it risky to increment these so that they run 1-5 (or 0-4)?

Member

I think it's clear from 0, 1, 1.5, 2, 3 which stages are traditional and which are not. I think it's better to have pre-certified be a weird number than to make it more difficult to be able to quickly divine which are the main stages. pre-certified isn't stage 2, it's a temporary stage that only exists in certain limited contexts.

Member

If we're going by that logic, can we make pre_mailed stage 0.5?
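
For reference, the numbering discussed in this thread can be sketched as a single `CASE` expression. Only the 1.5, 1, and 0 branches appear in the quoted diff; the values 3 and 2 for the board and certified stages are an assumption inferred from the "0, 1, 1.5, 2, 3" sequence mentioned above, and the inline rows are hypothetical.

```sql
-- Sketch: assign a stage number based on the latest stage that has a value
with stage_values as (
    select *
    from (
        values
            ('pin-a', 100, null, null, null, null),
            ('pin-b', null, 100, 110, null, null),
            ('pin-c', null, 100, null, 110, 105)
    ) as t (
        pin,
        pre_mailed_tot,
        mailed_tot,
        pre_certified_tot,
        certified_tot,
        board_tot
    )
)

select
    pin,
    case
        -- Branch order matters: the latest stage is checked first
        when board_tot is not null then 3              -- assumed value
        when certified_tot is not null then 2          -- assumed value
        when pre_certified_tot is not null then 1.5
        when mailed_tot is not null then 1
        when pre_mailed_tot is not null then 0
    end as stage_num
from stage_values
```

Because the first matching branch wins, `pin-b` resolves to 1.5 even though it also has a mailed value, and `pin-c` resolves to 3.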

@wrridgeway (Member) left a comment

This is awesome Jean. Thanks for whipping reporting.vw_pin_value_long into shape.

WHEN stage_values.mailed_tot IS NOT NULL THEN 1
WHEN stage_values.pre_mailed_tot IS NOT NULL THEN 0
END AS stage_num
FROM stage_values
),

change_reasons AS (
Member

This is something I should have done when I originally made this view 😅, but perhaps add a quick comment explaining why clean_values and change_reasons exist? Just something like "Assign numeric values to each stage" or "Gather change reason codes for each stage".

-- the difference for legacy compatibility
'BOR CERTIFIED'
],
ARRAY[0, 1, 1.5, 2, 3],
Member

I wonder if there's a way to pull these stage number values from vw_pin_value, but I'm sure you already tried.
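
For context, two parallel arrays like the ones in the diff can be paired positionally with `UNNEST`. This is a minimal sketch of that pairing, not the view's actual code: only 'BOR CERTIFIED' and the number array appear in the quoted diff, so the other four names and their order are assumptions based on the stage names used elsewhere in this PR.

```sql
-- Sketch: zip stage names with stage numbers by position
select
    stages.stage_name,
    stages.stage_num
from unnest(
    array[
        'PRE-MAILED',               -- assumed position
        'MAILED',                   -- assumed position
        'ASSESSOR PRE-CERTIFIED',   -- assumed position
        'ASSESSOR CERTIFIED',       -- assumed position
        'BOR CERTIFIED'
    ],
    array[0, 1, 1.5, 2, 3]
) as stages (stage_name, stage_num)
```

Keeping the two arrays adjacent in one place at least makes the hard-coded mapping easy to audit if the numbers are not pulled from vw_pin_value.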

@dfsnow (Member) left a comment

I concur with @wrridgeway, this is amazing work. @ccao-jardine should be pleased we can finally report on pre-mailed/certified values.

Member

issue (non-blocking): It looks like column names/descriptions are missing for the new pre_ columns. It's probably worth adding them for the sake of completeness and clarity.

@ccao-jardine (Member)

All makes sense. I am indeed excited for this, and appreciate the attention to detail.

For example, reporting.vw_assessment_roll_muni filters by stage_name such that these new stages will not be included, but reporting.vw_assessment_roll does not.

For what it's worth, I think we are safe here. Our public reporting assets that rely on those views are pretty robust: none of them automatically update, and we can apply filters on stage_name to future-proof them as a second layer of protection.

@jeancochrane jeancochrane merged commit 6a2a3f9 into master Nov 18, 2024
8 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/640-ingest-pre-procname-data-into-lake branch November 18, 2024 21:10
Development

Successfully merging this pull request may close these issues.

Ingest pre-procname data into lake
4 participants