Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/fct school acquisition metrics #52

Merged
merged 20 commits into from
Jan 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
d437482
added school_weeks and made it work with fct_weekly_school_acquisition
bakerfranke Jan 15, 2024
043fc7d
Merge branch 'main' of https://github.com/code-dot-org/analytics into…
bakerfranke Jan 15, 2024
799f13d
submitting changes to fct_school_acquisition -- see int_school_weeks
bakerfranke Jan 16, 2024
73d4c4a
edit school years typo
bakerfranke Jan 16, 2024
d3775e6
Update int_school_weeks.sql
bakerfranke Jan 16, 2024
435228d
minor changes to formatting, adding tests/documentation, etc
allison-code-dot-org Jan 16, 2024
34d7796
small formatting + naming changes
allison-code-dot-org Jan 16, 2024
a4e91c6
fixed int_school_weeks timestamps so ended_at goes up through 23:59:5…
bakerfranke Jan 17, 2024
d3ab096
fixed bug that caused a full day gap between ended_at and next starte…
bakerfranke Jan 17, 2024
eb46044
attempt 2 to commit - fixed bug that caused a full day gap between en…
bakerfranke Jan 17, 2024
f43265e
temporary commit for branch switch
bakerfranke Jan 17, 2024
bfba4f2
fixed conflicts
bakerfranke Jan 19, 2024
0428347
addressed issues raised in PR
bakerfranke Jan 19, 2024
b009efd
fixed date diff
bakerfranke Jan 19, 2024
d973f75
added comment
bakerfranke Jan 19, 2024
7fff4fa
removed order by in int_school_weeks
bakerfranke Jan 19, 2024
92f0b36
removed order by from int_school_weeks
bakerfranke Jan 19, 2024
689851d
Merge branch 'main' of https://github.com/code-dot-org/analytics into…
jordan-springer Jan 20, 2024
c804885
Merge branch 'main' of https://github.com/code-dot-org/analytics into…
bakerfranke Jan 21, 2024
9d57f8e
Merge branch 'main' into feature/fct_school_acquisition_metrics
jordan-springer Jan 21, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 9 additions & 1 deletion dbt/models/intermediate/_intermediate__models.yml
Original file line number Diff line number Diff line change
Expand Up @@ -35,4 +35,12 @@ models:
combination_of_columns:
- user_id
- level_id
- script_id
- script_id
- name: int_school_weeks
description: This model allows us to report metrics on a weekly basis based on timestamps. We want to do this using ISO weeks to the greatest extent possible because our school year is defined to start on July 1 and end June 30. This means that the start of the school year can start either in ISO week 26 or 27 depending on the year. Some years have 52 weeks and some bleed into a 53rd week. This table numbers the "school year weeks" with week 1 possibly being a fragment (less than 7 days) of a week that starts with July 1, and stops at the first ISO week boundary that comes after july 1. Weeks 2 through 51 fall on standard ISO week boundaries. Week 52 (or 53, depending on the year) is another possible fragment of a week running from the ISO week boundary that is closest to June 30, and running up through and including June 30. As a result, for example, the end of one school year (june 30) and the start of another (july 1) may fall entirely within ISO week 27, but in this table you might see ISO week 27 split between "school year week" 52 (being 4 days) and school year week 1 of the next year being 3 days.
tests:
- dbt_utils.unique_combination_of_columns:
combination_of_columns:
- school_year
- iso_week
- school_year_week
133 changes: 133 additions & 0 deletions dbt/models/intermediate/int_school_weeks.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
{# NOTES:
bakerfranke marked this conversation as resolved.
Show resolved Hide resolved

This creates a corallary to int_school_years, but for weeks of the school year in SCHOOL YEAR ORDER.

We frequently need to Report metrics on a weekly basis based on timestamps. We want to do this using
ISO weeks to the greatest extent possible, however, because our school year is defined to start on July 1 and end June 30.
It means that the start of the school year can start either in ISO week 26 or 27 depending on the year.
Furthermore, some years have 52 weeks and some bleed into a 53rd week.

When reporting YoY metrics based on school year, we want to line up the beginning of the year with July 1, but
adhere to standard iso week boundaries as much as possible, since it's likely that other metrics might be grouped the same way.

The SOLUTION.

1. This table numbers the "school year weeks" with week 1 possibly being a fragment (less than 7 days) of a week that starts with July 1,
and stops at the frist ISO week boundary that comes after july 1.

2. Weeks 2 through 51 fall on standard ISO week boundaries.

3. Week 52 (or 53, depending on the year) is another possible fragment of a week running from the ISO week boundary that is closest to June 30,
and running up through and including June 30.

As a result, for example, the end of one school year (june 30) and the start of another (july 1) may fall entirely within ISO week 27, but in
this table you might see ISO week 27 split between "school year week" 52 (being 4 days) and school year week 1 of the next year being 3 days.

The final table includes `started_at` and `ended_at` fields for joining (to align to our traditional practice with int_school_years).
And for each school year week it also includes an ISO week reference. Again, ISO week boundaries match up 1:1 with school_year week boundaries
EXCEPT for the school_year week 52/week 1 boundaries.

METHOD:
1. Generate all possible dates (literally all the days) and their ISO weeks from the beginning of Code.org time (~July 1, 2013)
2. Flag all the weeks where either the ISO week changes between days OR the school year changes
3. The "school year week" is the sum/count of all the flags up to that date WITHIN (partitioned by) that school year.

#}

-- Next 3 CTEs are SQL-y way of generating all the days (dates) from the beginning of code.org
-- time, culminating in the CTE 'date_range'
with

school_years as (
select *
from {{ ref('seed_school_years') }}
),

numbers_small as (
select 0 as num
union all
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
union all
select 6
union all
select 7
union all
select 8
union all
select 9
),

numbers_large as (
select a.num + 10 * b.num + 100 * c.num + 1000 * d.num + 10000 * e.num as num
from numbers_small as a
cross join numbers_small as b
cross join numbers_small as c
cross join numbers_small as d
cross join numbers_small as e
),

date_range as (
select
sy.school_year,
'2013-07-01'::date + num as date,
date_part(week, date) as iso_week
from numbers_large
left join school_years sy
on
'2013-07-01'::date
+ num between sy.started_at::date and sy.ended_at::date
where '2013-07-01'::date + num <= '2030-06-30'::date
),

flagged_week_changes as (
select
dr.*,
lag(iso_week) over (order by date) as prev_iso_week,

-- 'flag' dates where an iso week changes, or the school year changes (july 1)
case
when
iso_week != prev_iso_week
or (date_part(month, date) = 7 and date_part(day, date) = 1)
then 'flag'
end as week_change
from date_range dr
),

school_week_calc as (
select
*,
-- the school year week is the sum of all the "flagged" iso_week changes, including july 1, partition by school year so counting starts over each july 1
sum(
case
when week_change = 'flag'
then 1
else 0
end)
over (
partition by school_year
order by date rows between unbounded preceding and current row
) as school_year_week
from flagged_week_changes
)

select
school_year,
iso_week::integer,
school_year_week::integer,
min(date)::timestamp as started_at, --0:0:0 on start date
dateadd(s, -1, max(date)+1) as ended_at, --subtract 1 second from the next day to give 23:59:59 on end date
datediff('day',started_at,ended_at)+1 as days_interval
from school_week_calc
group by
school_year,
iso_week,
school_year_week
4 changes: 2 additions & 2 deletions dbt/models/intermediate/int_school_years.sql
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ school_years as (
from {{ ref('seed_school_years') }}
)

select
*

select *
from school_years
where current_date >= started_at
89 changes: 47 additions & 42 deletions dbt/models/marts/metrics/fct_weekly_school_acquisition_metrics.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@

with
school_status as (
select *
Expand All @@ -16,102 +17,106 @@ school_status_sy as (
school_status.status,
school_status.active_courses,
dim_schools.school_level_simple,
dim_schools.is_stage_el,
dim_schools.is_stage_mi,
dim_schools.is_stage_hi,
school_status.school_started_at
from school_status
left join dim_schools
on school_status.school_id = dim_schools.school_id
),
)
, school_weeks as (
select * FROM {{ref('int_school_weeks')}}
)

active_schools_by_week as (
, active_schools_by_week as (
select
school_year,
school_level_simple,
status,
date_part(week, school_started_at) as start_week,
(start_week + 26) % 52 as sy_week_order, -- hardcoded for now, make dynamic later
min(school_started_at) as week_of,
sssy.school_year,
sssy.school_level_simple,
sssy.is_stage_el,
sssy.is_stage_mi,
sssy.is_stage_hi,
sssy.status,
sw.iso_week start_week, --keeping start_week alias for now, even though the output gets re-aliased as iso_week
sw.school_year_week,
sw.started_at week_of,
count(distinct school_id) as num_schools
from school_status_sy
from school_status_sy sssy
left join school_weeks sw
on school_started_at between sw.started_at and sw.ended_at
where status like 'active %'
group by 1,2,3,4,5

),
group by 1,2,3,4,5,6,7,8,9

running_totals_by_week as (
)
, running_totals_by_week as (
select
school_year,
status,
start_week,
sy_week_order,
school_year_week,
bakerfranke marked this conversation as resolved.
Show resolved Hide resolved
min(week_of)::date week_of,
sum(case when school_level_simple like '%el%' then num_schools else 0 end) as el_schools,
sum(case when school_level_simple like '%mi%' then num_schools else 0 end) as mi_schools,
sum(case when school_level_simple like '%hi%' then num_schools else 0 end) as hi_schools,
sum(case when is_stage_el=1 then num_schools else 0 end) as el_schools,
sum(case when is_stage_mi=1 then num_schools else 0 end) as mi_schools,
sum(case when is_stage_hi=1 then num_schools else 0 end) as hi_schools,
sum(el_schools) over (
partition by school_year, status order by sy_week_order
partition by school_year, status order by school_year_week
rows between unbounded preceding and current row
) el_running_total,

sum(mi_schools) over (
partition by school_year, status order by sy_week_order
partition by school_year, status order by school_year_week
rows between unbounded preceding and current row
) mi_running_total,

sum(hi_schools) over (
partition by school_year, status order by sy_week_order
partition by school_year, status order by school_year_week
rows between unbounded preceding and current row
) hi_running_total
from active_schools_by_week
group by 1,2,3,4
order by status, sy_week_order
order by status, school_year_week
),

report_by_week as (
select
'elementary' as school_level,
'elementary' as school_level,
school_year,
status,
start_week,
sy_week_order,
start_week as iso_week,
school_year_week,
week_of,
el_schools as num_schools_this_week,
el_running_total as num_schools_running_total
el_schools as num_schools_this_week,
el_running_total as num_schools_running_total

from running_totals_by_week


union all

select
'middle' as school_level,
'middle' as school_level,
school_year,
status,
start_week,
sy_week_order,
start_week as iso_week,
school_year_week,
week_of,
mi_schools as num_schools_this_week,
mi_running_total as num_schools_running_total
mi_schools as num_schools_this_week,
mi_running_total as num_schools_running_total
from running_totals_by_week

union all

select
'high' as school_level,
'high' as school_level,
school_year,
status,
start_week,
sy_week_order,
start_week as iso_week,
school_year_week,
week_of,
hi_schools as num_schools_this_week,
hi_running_total as num_schools_running_total
hi_schools as num_schools_this_week,
hi_running_total as num_schools_running_total
from running_totals_by_week
)

select *
from report_by_week