Add test for 'a has fewer rows than b' #221

Closed
wants to merge 2 commits into from


@dmarts dmarts commented May 13, 2020

Description & motivation

Add a fewer_rows_than test.

This lets an analyst confirm that a model (A) has fewer rows than some reference model (B). It's useful for confirming that the row count has actually decreased after applying a filter or inner join in model A.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)

@dmarts dmarts marked this pull request as ready for review May 13, 2020 14:16
@dmarts dmarts requested a review from clrcrl as a code owner May 13, 2020 14:17
@dmarts dmarts changed the title from "Macro and test case" to "feature/fewer_rows_than" May 13, 2020
@dmarts dmarts changed the title from "feature/fewer_rows_than" to "Add test for 'a has fewer rows than b'" May 13, 2020
Contributor

clrcrl commented May 18, 2020

Hi Dan! Thanks for the PR! I'm a little behind on a few other things, so just to set expectations: I probably won't look at this until late this week, or potentially later than that 😅

Author

dmarts commented May 18, 2020

You do your thing 👍

@clrcrl clrcrl changed the base branch from master to dev/0.7.0 December 23, 2020 16:02
Contributor

@clrcrl clrcrl left a comment

One question about the design of this test, then I'll review fully! (Trying to get these done before Christmas haha!)

select
case
when count_model > count_comparison then count_model - count_comparison + 1
when count_model = count_comparison then count_model - count_comparison

Contributor

I've spent more minutes looking at this line than I'd care to admit.

In this case, the two tables have the same number of rows, right? Do we expect that to be a passing, or failing case?

Author

Who wrote this? Someone in the dim and distant past. Yikes. Anyway, we're trying to find the number of errors so line 24+ is supposed to find the number of excess rows:

  • If count_model = count_comparison, we have 1 too many rows, so return 1
  • If count_model > count_comparison, we have count_model - count_comparison + 1 too many rows, so return that number
  • If count_model < count_comparison, we're good, so return 0

So I think L27 should be:
when count_model = count_comparison then 1

That's one bug squashed. Is it clearer now? If it's failed the very first review by an experienced user then it probably needs some more work 👍
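
The three cases above can be sanity-checked without a warehouse. The following is only a sketch, not the macro from this PR: it assumes SQLite as a stand-in database, binds the two counts as parameters instead of computing them from dbt models, and the names `EXCESS_ROWS_SQL` and `excess_rows` are hypothetical.

```python
import sqlite3
from contextlib import closing

# Hypothetical stand-in for the test query: the two counts are bound as
# parameters rather than selected from the model and the comparison model.
EXCESS_ROWS_SQL = """
with counts as (
    select ? as count_model, ? as count_comparison
)
select
    case
        -- model has more rows than the comparison: fail with the excess
        when count_model > count_comparison then count_model - count_comparison + 1
        -- same number of rows: one row too many, so fail with 1
        when count_model = count_comparison then 1
        -- model has fewer rows: pass
        when count_model < count_comparison then 0
    end as excess_rows
from counts
"""


def excess_rows(count_model: int, count_comparison: int) -> int:
    """Return the number of excess rows; 0 means the test would pass."""
    with closing(sqlite3.connect(":memory:")) as conn:
        row = conn.execute(EXCESS_ROWS_SQL, (count_model, count_comparison)).fetchone()
    return row[0]
```

Calling `excess_rows(3, 5)` gives 0 (pass), `excess_rows(5, 5)` gives 1, and `excess_rows(7, 5)` gives 3, matching the bullets.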

Contributor

Ah got it! (Welcome back from the Christmas break 😉 )

I think one thing that is throwing me is "count model" and "count comparison" — these terms are generic so it makes it a little challenging to follow the logic. I think we can make these more explicit, and maybe adjust the case logic to be a little more verbose.

What do you think of something like this?

select
  (count_model_with_more_rows - count_model_with_fewer_rows) as row_count_delta,

   case
     -- pass the test if the delta is positive (i.e. return the number 0)
     when row_count_delta > 0 then 0
     -- fail the test if they are the same number
     when row_count_delta = 0 then 1
     -- fail the test for negative numbers
     when row_count_delta < 0 then abs(row_count_delta)
   end as excess_rows
from counts
)
select excess_rows from final

Contributor

I just realized that this may not work on Postgres. Most cloud warehouses support lateral column aliasing (i.e. referencing a calculated column by its alias within the same select), but I suspect Postgres does not, so we might need to move the "case" statement into another CTE! 😬
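
Moving the case statement into its own CTE might look roughly like the sketch below, so that `row_count_delta` is an ordinary column by the time it is referenced. This is an illustration only and not the version committed to the PR: the `deltas` CTE name, the `ROW_COUNT_DELTA_SQL` constant and the `excess_rows_from_delta` helper are hypothetical, the counts are bound as parameters, and SQLite stands in for the warehouse.

```python
import sqlite3
from contextlib import closing

# Hypothetical restructuring: compute the delta in one CTE and apply the
# case logic in the next, so no alias is referenced in the select that
# defines it.
ROW_COUNT_DELTA_SQL = """
with counts as (
    select
        ? as count_model_with_fewer_rows,
        ? as count_model_with_more_rows
),
deltas as (
    select
        count_model_with_more_rows - count_model_with_fewer_rows as row_count_delta
    from counts
),
final as (
    select
        case
            -- pass the test if the delta is positive (i.e. return the number 0)
            when row_count_delta > 0 then 0
            -- fail the test if they are the same number
            when row_count_delta = 0 then 1
            -- fail the test for negative numbers
            when row_count_delta < 0 then abs(row_count_delta)
        end as excess_rows
    from deltas
)
select excess_rows from final
"""


def excess_rows_from_delta(count_fewer: int, count_more: int) -> int:
    """Run the sketch for one pair of row counts; 0 means the test would pass."""
    with closing(sqlite3.connect(":memory:")) as conn:
        row = conn.execute(ROW_COUNT_DELTA_SQL, (count_fewer, count_more)).fetchone()
    return row[0]
```

With this shape, `excess_rows_from_delta(3, 5)` returns 0, `excess_rows_from_delta(5, 5)` returns 1, and `excess_rows_from_delta(7, 5)` returns 2.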

Author

Your suggestion looks good! I didn't know that about postgres but the very verbose CTE-heavy version I've just committed might solve it...

@clrcrl clrcrl force-pushed the dev/0.7.0 branch 5 times, most recently from bbba960 to 60a3b3c Compare January 11, 2021 15:52
@clrcrl clrcrl mentioned this pull request Mar 10, 2021
3 tasks
Contributor

clrcrl commented Mar 10, 2021

Reopened as #343 (to integrate upstream changes)

@clrcrl clrcrl closed this Mar 10, 2021