-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update interpret
pipeline stage to optionally generate leaf node comps for all properties
#106
Update interpret
pipeline stage to optionally generate leaf node comps for all properties
#106
Conversation
…tput-to-res-and-condo-models-04-interpret-step
05cbea9
to
93ba2ce
Compare
93ba2ce
to
05cbea9
Compare
…tput-to-res-and-condo-models-04-interpret-step
47c87c9
to
e27581f
Compare
…ter error logging
renv.lock
Outdated
@@ -760,14 +765,14 @@ | |||
}, | |||
"glue": { | |||
"Package": "glue", | |||
"Version": "1.6.2", | |||
"Version": "1.7.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few R dependencies got updated as part of the lightsnip update (ccao-data/lightsnip#10), which ended up suggesting an upgrade to the latest lightgbm release (4.3.0). I went ahead and allowed the upgrade and it doesn't seem like any breaking changes were introduced that affect us.
pipeline/04-interpret.R
Outdated
# Make sure that the leaf node tibbles are all integers, which is what | ||
# the comps algorithm expects | ||
leaf_nodes <- leaf_nodes %>% mutate_all(as.integer) | ||
training_leaf_nodes <- training_leaf_nodes %>% mutate_all(as.integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block ended up fixing the float-index slicing error that I was debugging yesterday. It seems like somehow the transformations we tweaked above ended up causing reticulate to interpret all numeric tibble values as floats rather than integers; I'm not 100% sure which transformation caused that change in behavior, but explicitly casting the tibbles to integers here resolves the issue.
@dfsnow The changes in response to your comments are officially working as of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @jeancochrane, just a few super minor fixes left + the one blocking change. I can follow all your python here since it's well-commented and don't see any obvious issues.
pipeline/01-train.R
Outdated
@@ -145,7 +145,7 @@ lgbm_model <- parsnip::boost_tree( | |||
# using floor(log2(num_leaves)) + add_to_linked_depth. Useful since | |||
# otherwise Bayesian opt spends time exploring irrelevant parameter space | |||
link_max_depth = params$model$parameter$link_max_depth, | |||
|
|||
save_tree_error = comp_enable, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (blocking): You'll actually need to move this out of the overall model spec and move it to the set_args()
call that produces lgbm_model_final
. That way it's shut off during CV but is on for final fitting.
Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeancochrane The most recent run failed with an obscure error. After some debugging, I'm pretty sure it's related to R not natively supporting 64-bit integers. If the 64-bit dtype is used in python then reticulate
doesn't know what type to cast it back to unless the add-on bit64
R package is loaded.
I'm not sure how this actually succeeded before, but using 32-bit dtypes seems to solve all conversion issues.
…tput-to-res-and-condo-models-04-interpret-step
@@ -355,7 +355,7 @@ lgbm_model_final <- lgbm_model %>% | |||
stop_iter = NULL, | |||
validation = 0, | |||
trees = lgbm_final_params$num_iterations, | |||
save_tree_errors = comp_enable | |||
save_tree_error = comp_enable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jk @jeancochrane, this was the real issue. The save_tree_error
arg here had a misspelling.
…tput-to-res-and-condo-models-04-interpret-step
Alright we've got a full run with (seemingly) all bugs worked out. Final log output here. Gonna go ahead and merge! |
This PR updates the pipeline to optionally generate comps for all properties using leaf node assignments and upload them to S3 based on a new
comp_enable
param. This experimental approach to comp calculation is explained in this lightsnip vignette and uses the fastest algorithm described in this blog post.The approach in this PR depends on changes in a companion branch of ccao-data/lightnsip. Before this branch gets merged, that branch should be merged into the main branch of lightsnip and the renv lockfile in this repo should be updated to point back to the main lightsnip branch.
Connects #41.