Update dim_user_levels to include content_area and topic_tags #244

Merged: 4 commits merged into main from feature/update_dim_user_levels_content_fields on Nov 26, 2024


nataliazm99 (Contributor)

There are two changes in this PR:

  1. Adding content_area and topic_tags to dim_user_levels
  2. Changing the staging scripts layer to define content_area as 'other' when it is null, consistent with the logic we use for course_names, where nulls are also defined as 'other'

Jira ticket(s): DATAOPS-1082

nataliazm99 (Contributor, Author)

QA'd by:

  1. Comparing the test schema against prod for:
    a. dim_active_students --> no difference
    b. dim_user_course_activity --> a difference of 1 csf student in 2022-23

  2. Validating that user counts for combinations of course, content area, and topic tags make sense.

Code used for QA:

1.a

with 
test as (
select 
school_year
, is_active_student
, has_user_level_activity
, count(distinct student_id) n_students
from dev.dbt_natalia.dim_active_students s
group by 1,2,3
)
--
, prod as (
select 
school_year
, is_active_student
, has_user_level_activity
, count(distinct student_id) n_students
from dev.analytics.dim_active_students s
group by 1,2,3
)
--
select 
  p.school_year
, p.is_active_student
, p.has_user_level_activity
, p.n_students
, t.n_students
from prod p
left join  test t 
on p.school_year = t.school_year
and p.is_active_student = t.is_active_student
and p.has_user_level_activity = t.has_user_level_activity
where p.n_students <> t.n_students
order by 1,2,3
;
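One caveat on the comparison pattern used in 1.a (and 1.b): after a left join, t.n_students is NULL for any group that exists in prod but not in test, and NULL <> x never evaluates to true, so the where clause silently drops exactly the groups that are missing from test. The sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse, with made-up tables reduced to a single school_year key; coalescing the missing count to 0 keeps those groups in the output.

```python
import sqlite3

# Toy stand-ins for the prod and test aggregates (hypothetical data):
# the 2023-24 group exists in prod but is missing from test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table prod (school_year text, n_students integer);
    create table test (school_year text, n_students integer);
    insert into prod values ('2022-23', 100), ('2023-24', 50);
    insert into test values ('2022-23', 100);
""")

# Original pattern: for the unmatched row, 50 <> NULL is NULL (not true),
# so the missing 2023-24 group is filtered out.
original = conn.execute("""
    select p.school_year, p.n_students, t.n_students
    from prod p
    left join test t on p.school_year = t.school_year
    where p.n_students <> t.n_students
""").fetchall()

# NULL-safe variant: treat a group missing from test as 0 students.
null_safe = conn.execute("""
    select p.school_year, p.n_students, coalesce(t.n_students, 0)
    from prod p
    left join test t on p.school_year = t.school_year
    where p.n_students <> coalesce(t.n_students, 0)
""").fetchall()

print(original)   # []
print(null_safe)  # [('2023-24', 50, 0)]
```

A FULL OUTER JOIN on the same keys would additionally surface groups that exist only in test; it is left out of the sketch because SQLite only supports it from version 3.39.0 onward.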

1.b

with 
test as (
select 
 school_year
, course_name
, user_type
, count(distinct user_id) n_users
from 
dev.dbt_natalia.dim_user_course_activity
group by 1,2,3
)
--
, prod as (
select 
 school_year
, course_name
, user_type
, count(distinct user_id) n_users
from 
dev.analytics.dim_user_course_activity
group by 1,2,3
)
--
select  
  p.school_year
, p.course_name
, p.user_type
, p.n_users
, t.n_users
, (p.n_users - t.n_users) diff_n_users
from prod p
left join test t 
on p.school_year = t.school_year
and p.course_name = t.course_name
and p.user_type = t.user_type
where p.n_users <> t.n_users
order by 1,2,3
;
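To find which user accounts for a small count difference, such as the single csf student in 2022-23, one option is to diff user-level keys in both directions with EXCEPT instead of comparing aggregates. This is a minimal sketch with sqlite3 and hypothetical rows; against the warehouse, prod and test would be the two dim_user_course_activity tables queried in 1.b, and the approach assumes the warehouse supports EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user-level rows: user 3 is present in prod only.
conn.executescript("""
    create table prod (school_year text, course_name text, user_id integer);
    create table test (school_year text, course_name text, user_id integer);
    insert into prod values ('2022-23', 'csf', 1), ('2022-23', 'csf', 2),
                            ('2022-23', 'csf', 3);
    insert into test values ('2022-23', 'csf', 1), ('2022-23', 'csf', 2);
""")

# Keys present in prod but not in test, and vice versa.
only_in_prod = conn.execute("""
    select school_year, course_name, user_id from prod
    except
    select school_year, course_name, user_id from test
""").fetchall()
only_in_test = conn.execute("""
    select school_year, course_name, user_id from test
    except
    select school_year, course_name, user_id from prod
""").fetchall()

print(only_in_prod)  # [('2022-23', 'csf', 3)]
print(only_in_test)  # []
```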
2.

select
school_year
, content_area
, course_name 
, topic_tags
--, script_name
, count(distinct user_id) n_users
from dev.dbt_natalia.dim_user_levels
group by 1,2,3,4--,5
order by 1,2,3,4--,5
;
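Alongside eyeballing the step 2 counts, the two rules introduced by this PR can be checked directly: no row should have a NULL or blank content_area, and hoc rows should have content_area 'hoc'. The sketch below is a hypothetical check (the function name and toy rows are illustrative, not part of the repo), run against an in-memory SQLite table that deliberately includes one blank content_area.

```python
import sqlite3

def content_area_violations(conn):
    """Count rows breaking the two content_area rules (hypothetical QA helper)."""
    # Rule 1: content_area should never be NULL or blank after the change.
    missing = conn.execute("""
        select count(*) from dim_user_levels
        where content_area is null or content_area = ''
    """).fetchone()[0]
    # Rule 2: hoc courses should carry the 'hoc' content area.
    hoc_mismatch = conn.execute("""
        select count(*) from dim_user_levels
        where course_name = 'hoc' and content_area <> 'hoc'
    """).fetchone()[0]
    return {"missing_content_area": missing, "hoc_mismatch": hoc_mismatch}

conn = sqlite3.connect(":memory:")
# Toy rows; the 'csd' row has a blank content_area on purpose.
conn.executescript("""
    create table dim_user_levels (course_name text, content_area text, user_id integer);
    insert into dim_user_levels values
        ('hoc', 'hoc', 1), ('csf', 'other', 2), ('csd', '', 3);
""")

print(content_area_violations(conn))  # {'missing_content_area': 1, 'hoc_mismatch': 0}
```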

case
when course_name = 'hoc'
then 'hoc' -- If course_name is HOC, content area is HOC too
when nullif(content_area,'') is null then 'other' -- If content area is null then 'other' to align with course_name
A collaborator commented on these lines:

Probably a question for @jordan-springer, but is there any difference between this statement and when content_area = ''?
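On the reviewer's question: the two predicates differ only for NULLs. nullif(content_area, '') turns '' into NULL, so the is null test catches both NULL and empty strings, whereas content_area = '' evaluates to NULL (not true) when content_area is NULL, so NULL rows would fall through to the next branch. A minimal sketch with sqlite3 and toy rows; the else content_area branch is an assumption, since the excerpt above does not show how the case expression ends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy staging rows: a NULL, an empty string, and a real value ('math' is made up).
conn.executescript("""
    create table staging (course_name text, content_area text);
    insert into staging values ('csf', NULL), ('csf', ''), ('csf', 'math');
""")

def classify(predicate):
    # Mirrors the PR's case expression with the null check swapped for `predicate`.
    # The predicate is a trusted literal here, so string formatting is acceptable.
    return conn.execute(f"""
        select case
            when course_name = 'hoc' then 'hoc'
            when {predicate} then 'other'
            else content_area  -- assumed else branch
        end
        from staging
        order by rowid
    """).fetchall()

print(classify("nullif(content_area, '') is null"))  # [('other',), ('other',), ('math',)]
print(classify("content_area = ''"))                 # [(None,), ('other',), ('math',)]
```

Only the first form maps the NULL row to 'other', which is what the PR description asks for.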

allison-code-dot-org (Collaborator) left a comment:

LGTM! The CI run was successful; I then updated the branch, which kicked off another CI run. Approving now, assuming that run is also successful.

nataliazm99 (Contributor, Author):

@jordan-springer, ready for you to review.

jordan-springer (Collaborator) left a comment:

LGTM

jordan-springer merged commit b4f55ac into main on Nov 26, 2024
1 check passed
jordan-springer deleted the feature/update_dim_user_levels_content_fields branch on November 26, 2024 22:17
jordan-springer restored the feature/update_dim_user_levels_content_fields branch on November 27, 2024 14:55