Feature/user levels upstream update content fields #245

Merged
coryamanda merged 3 commits into main from feature/user_levels_upstream_update_content_fields
Dec 1, 2024

nataliazm99
Contributor

This PR changes two models, adding the new fields content_area and topic_tags to each:

1. dim_student_script_level_activity
Changes:
a. Included content_area and topic_tags
b. Filtered the CTE for dim_course_structure to pull only student-facing content: content_area like 'curriculum%' or content_area in ('hoc')
c. Changed the left join from user_levels to course_structure to an inner join, so that the model is actually limited to student-facing content. The left join was also producing a significant number of records with nulls in the fields that label the curriculum (course_name, script_name, etc.).

2. dim_self_paced_pd_activity
Changes:
a. Included content_area and topic_tags
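
The effect of changes 1b and 1c can be illustrated with a small self-contained sketch. Everything in it is invented for illustration: the rows, the `level_id` join key, and the content_area values other than 'hoc' are assumptions, not taken from the models in this PR. Only the filter predicate is copied from 1b.

```sql
-- Toy data: all rows and the level_id join key are hypothetical.
with user_levels as (
    select 1 as student_id, 10 as level_id
    union all select 2, 20
    union all select 3, 30
)
, course_structure as (
    select 10 as level_id, 'curriculum_k5' as content_area, 'csf' as course_name
    union all select 20, 'hoc', 'hourofcode'
    union all select 30, 'self_paced_pl', 'pd_course'
)
, student_facing as (
    -- Filter from change 1b: keep only student-facing content
    select *
    from course_structure
    where content_area like 'curriculum%'
       or content_area in ('hoc')
)
-- Left join keeps student 3 with null curriculum labels
select 'left join' as join_type, ul.student_id, sf.content_area, sf.course_name
from user_levels ul
left join student_facing sf
  on ul.level_id = sf.level_id
union all
-- Inner join (change 1c) drops student 3 entirely
select 'inner join', ul.student_id, sf.content_area, sf.course_name
from user_levels ul
join student_facing sf
  on ul.level_id = sf.level_id
order by 1, 2
;
```

With the left join, student 3 survives with null content_area and course_name because the filtered CTE has no matching row. With the inner join, only students 1 and 2 remain, which is the behavior described in 1c.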

Validation

The following queries were used to validate that the changes worked and that there were no changes to upstream models.

1. dim_student_script_level_activity

a. No impact on dim_user_course_activity: the only difference in student counts by course and school year between the prod and test schemas is -1, for a CSF student in 2022-23.

with 
test as (
select 
 school_year
, course_name
, user_type
, count(distinct user_id) n_users
from 
dev.dbt_natalia.dim_user_course_activity
group by 1,2,3
)
--
, prod as (
select 
 school_year
, course_name
, user_type
, count(distinct user_id) n_users
from 
dev.analytics.dim_user_course_activity
group by 1,2,3
)
--
select  
  p.school_year
, p.course_name
, p.user_type
, p.n_users
, t.n_users
, (p.n_users - t.n_users) diff_n_users
from prod p
left join test t 
on p.school_year = t.school_year
and p.course_name = t.course_name
and p.user_type = t.user_type
where p.n_users <> t.n_users
order by 1,2,3
;
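
One caveat with the comparison pattern above: the filter `p.n_users <> t.n_users` evaluates to null when `t.n_users` is null, so a group that exists in prod but is missing from test is dropped from the result, and a group that exists only in test is never considered. A stricter variant, sketched below on invented toy counts (not data from this PR), uses a full outer join and `coalesce` so that a missing group counts as zero. This is a suggestion, not the query that was run for this validation.

```sql
-- Toy aggregates standing in for the prod and test CTEs; all values are hypothetical.
with prod as (
    select '2022-23' as school_year, 'csf' as course_name, 100 as n_users
    union all select '2022-23', 'csd', 50
    union all select '2023-24', 'csf', 120
)
, test as (
    select '2022-23' as school_year, 'csf' as course_name, 101 as n_users
    union all select '2023-24', 'csf', 120
    union all select '2023-24', 'csp', 7
)
select
  coalesce(p.school_year, t.school_year) as school_year
, coalesce(p.course_name, t.course_name) as course_name
, p.n_users as prod_n_users
, t.n_users as test_n_users
  -- Treat a group missing on one side as a count of zero
, coalesce(p.n_users, 0) - coalesce(t.n_users, 0) as diff_n_users
from prod p
full outer join test t
  on p.school_year = t.school_year
 and p.course_name = t.course_name
where coalesce(p.n_users, 0) <> coalesce(t.n_users, 0)
order by 1, 2
;
```

On this toy data the query returns three rows: the csd group missing from test, the -1 difference for csf in 2022-23, and the csp group present only in test. The left-join pattern would report only the -1 row.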

b. No impact on dim_active_students: the difference in student counts by school year between the prod and test schemas is null.

with 
test as (
select 
school_year
, is_active_student
, has_user_level_activity
, count(distinct student_id) n_students
from dev.dbt_natalia.dim_active_students s
group by 1,2,3
)
--
, prod as (
select 
school_year
, is_active_student
, has_user_level_activity
, count(distinct student_id) n_students
from dev.analytics.dim_active_students s
group by 1,2,3
)
--
select 
  p.school_year
, p.is_active_student
, p.has_user_level_activity
, p.n_students
, t.n_students
from prod p
left join  test t 
on p.school_year = t.school_year
and p.is_active_student = t.is_active_student
and p.has_user_level_activity = t.has_user_level_activity
where p.n_students <> t.n_students
order by 1,2,3
;

c. Impact on dim_student_script_level_activity: the difference in student counts by course and school year between the prod and test schemas is null.

with 
test as (
select 
 school_year
, course_name
, user_type
, count(distinct student_id) n_users
from 
dev.dbt_natalia.dim_student_script_level_activity sla 
group by 1,2,3
)
--
, prod as (
select 
 school_year
, course_name
, user_type
, count(distinct student_id) n_users
from 
dev.analytics.dim_student_script_level_activity sla 
group by 1,2,3
)
--
select  
  p.school_year
, p.course_name
, p.user_type
, p.n_users
, t.n_users
, (p.n_users - t.n_users) diff_n_users
from prod p
left join test t 
on p.school_year = t.school_year
and p.course_name = t.course_name
and p.user_type = t.user_type
where p.n_users <> t.n_users
order by 1,2,3
;

d. Data including content_area and topic_tags displays as expected:

select 
school_year
, content_area
, course_name
, script_name
, topic_tags
, count(distinct student_id) n_users
from 
dev.dbt_natalia.dim_student_script_level_activity sla 
where school_year = '2024-25'
group by 1,2,3,4,5
order by 1,2,3,4,5
;

e. No data with null course_name:

select distinct  
 content_area
, course_name 
, script_name 
from dev.dbt_natalia.dim_course_structure cs
where 
 course_name is null
order by 1;

2. dim_self_paced_pd_activity

a. Difference in teacher counts by course and school year between the prod and test schemas is null:

with 
test as (
select 
level_created_school_year as school_year
, course_name
, count(distinct teacher_id) n_users
from 
dev.dbt_natalia.dim_self_paced_pd_activity
group by 1,2
)
--
, prod as (
select 
level_created_school_year as school_year
, course_name
, count(distinct teacher_id) n_users
from 
dev.analytics.dim_self_paced_pd_activity
group by 1,2
)
--
select  
 p.school_year
, p.course_name
, p.n_users
, t.n_users
, (p.n_users - t.n_users) diff_n_users
from prod p
left join test t 
on p.school_year = t.school_year
and p.course_name = t.course_name
where p.n_users <> t.n_users
order by 1,2,3
;

b. Data including content_area and topic_tags displays as expected:

select 
course_name
, content_area
, topic_tags
, count(distinct teacher_id) n_users
from 
dev.dbt_natalia.dim_self_paced_pd_activity
where level_created_school_year in ('2023-24', '2024-25')
group by 1,2,3
order by 1,2,3

Links

Jira ticket(s): DATAOPS-1084

Collaborator

allison-code-dot-org left a comment

all makes sense to me and build succeeded. Thanks for laying out the QA so nicely!

Collaborator

coryamanda left a comment

looks good to me too. Thank you so much for the detailed QC before submission, Natalia!

coryamanda merged commit 55246db into main on Dec 1, 2024
1 check passed
coryamanda deleted the feature/user_levels_upstream_update_content_fields branch on December 1, 2024 05:10