feature/dim_student_script_level_activity #170
… feature/dim_student_course_activity
Small change to add `script_name`, but otherwise good to go!
@@ -182,7 +182,7 @@ models:
     config:
       tags: ['released']

-  - name: dim_student_script_level_activity
+  - name: dim_student_course_activity
I think naming it `course_activity` could be confusing, since it's not by course but rather by level. Are you thinking `dim_student_script_level_activity` is too long a name and we need to go simpler?
Description
This pull request seeks to evaluate the efficiency gains of rewriting the various models that produce our current `dim_student_script_level_activity` model. While developing this model, I also removed `dim_student_projects`, to be redeployed later once we have optimized.

Future Work
- `int_daily_user_level_summary`
- `dim_user_course_activity`
- `dim_active_students`
Goals:
i. Reduce overall runtime
   Last successful production run: ~1 hr 48 min
   Current runtime of `dim_student_script_level_activity`:
Design
Essentially, there are two major groups of data that need to be combined. How do we do that as quickly as possible, given all of the other models that will have started before this one?
Such as:
- `date`

In some way, shape, or form, I think this will need incremental loading.
The current feature has been running for … and has so far returned 10.5B rows.
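One way to get the incremental loading described above is dbt's `incremental` materialization, configured in the model's `.yml` entry. This is a minimal sketch. It assumes the grain is one row per student, script, level, and day. The columns `user_id`, `script_id`, `level_id`, and `activity_date` are hypothetical placeholders and are not confirmed by this PR. The model's SQL would also need an `is_incremental()` block so that later runs select only rows newer than what is already in `{{ this }}`. That filter is not shown here.

```yaml
models:
  - name: dim_student_script_level_activity
    config:
      tags: ['released']
      # First run builds the full table; later runs only process new rows.
      materialized: incremental
      # Hypothetical grain columns (assumption, not from this PR). Incoming rows
      # that match an existing row on all of these replace it instead of duplicating it.
      # A list-valued unique_key requires a reasonably recent dbt version.
      unique_key: ['user_id', 'script_id', 'level_id', 'activity_date']
```

With this setup, the full 10.5B-row build runs only once. Each production run afterwards should touch only the newest dates. The actual savings depend on how selective the `is_incremental()` filter is.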
Links
Jira ticket(s): dataops-922
Testing story
I tested the model primarily by using timestamps and running some checkpoints (see below*), e.g.:
- `not_null`
- `unique`
- `dbt_utils.unique_combination_of_columns: ["value", "value", "value", ...]`
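The checks listed above can be declared as dbt tests in the model's `.yml` entry. This is a minimal sketch. It again uses the hypothetical grain columns `user_id`, `script_id`, `level_id`, and `activity_date`, which are assumptions and not confirmed by this PR. It also requires the `dbt_utils` package to be installed.

```yaml
models:
  - name: dim_student_script_level_activity
    tests:
      # Model-level test: no two rows share the same combination of grain columns
      # (hypothetical column names).
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - user_id
            - script_id
            - level_id
            - activity_date
    columns:
      # Column-level tests: grain columns must always be populated.
      - name: user_id
        tests:
          - not_null
      - name: activity_date
        tests:
          - not_null
```

A single-column `unique` test only fits a column that identifies a row on its own, such as a surrogate key if the model has one. At a multi-column grain, the combination test fills that role. Newer dbt versions also accept the key `data_tests:` in place of `tests:`.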
Note: when submitting a new model for review, please make sure the following have been tested:
- `dbt build -m 'your_model'` (or: has the dbt Cloud job succeeded?)
- `dbt run -m 'your_model'`
- `select 1 from 'your_model'`

Privacy
i.
ii.
iii.
PR Checklist:
Note: if these are not all checked, the PR will be sent back.
- … `.yml`.
- … (did `dbt docs generate` succeed?)
- `dbt docs` has been updated successfully on GitHub Pages
- … (`chore/`, `feature/`, `fix/`)

*Redshift testing file: