-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
{simulist} v0.3.0 full package review #117
Conversation
* Add DOI to README * Add CITATION file with DOI * Update CITATION.cff * Automatic readme update * Add more info * Update CITATION.cff * Fix code syntax * Update CITATION.cff * Update CITATION.cff * Automatic readme update --------- Co-authored-by: GitHub Action <action@github.com>
…ew arguments (mean_contacts, contact_interval, prob_infect)
This pull request:
(Note that results may be inacurrate if you branched from an outdated version of the target branch.) |
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
When I originally opened this PR the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given I previously approved simulist 0.2.0, I tried to focus on the changes for this review. I initially forgot about my previous review so pardon if I'm opening up old discussions - I tried to cross-check as much as I could.
Happy to approve again 💯 I left some comments for your consideration, but in no way are they blockers from my end.
if (is.data.frame(hosp_risk)) { | ||
hosp_risk <- .check_risk_df( | ||
hosp_risk, | ||
age_range = age_range | ||
) | ||
} | ||
if (is.data.frame(hosp_death_risk)) { | ||
hosp_death_risk <- .check_risk_df( | ||
hosp_death_risk, | ||
age_range = age_range | ||
) | ||
} | ||
if (is.data.frame(non_hosp_death_risk)) { | ||
non_hosp_death_risk <- .check_risk_df( | ||
non_hosp_death_risk, | ||
age_range = age_range | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems repetitive and a good candidate for a refactor into a generalized (internal) function.
function (object, range) {
if (is.data.frame(object)) {
object <- .check_risk_df(object, range)
}
}
# OR even more generic
function (object, fn, range) {
if (is.data.frame(object)) {
object <- fn(object, range)
}
}
This way you don't have to repeat yourself here and any future changes need to also only be done once.
The second option does not check the function arguments so that may prove a hurdle down the line (regrettably R and strong types is not the best match).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this looks like a code smell. Thanks for the suggestions on how to refactor this chunk. I've made some changes from these suggestions on a branch called refactor_df_checks
.
However, I'm not sure that I prefer the new structure. Although ideally we'd be able to remove the chain of if
statements all performing the same check, when implemented in it's own function the risks need to be bundled into a list, the new function adds more code to the package, function documentation and unit tests (yet to write) are required, and then the checked objects require unpacking in the sim_*()
functions.
Note the .check_risk_df()
does some <data.frame>
formatting and so is not called purely for side-effects of warnings or errors. Therefore returning the objects and reassigning them to variables in required.
Please let me know your thoughts on this refactor and which you prefer, or if the refactor I've implemented is different from what you had in mind. I will leave the refactor_df_checks
branch open on the package so if we decide to make this change it can be merged after the v0.3.0 release.
names_mf | ||
} | ||
|
||
#' Anonymise names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I notice I expect this to be .anonymize
and I realize that we do not have any conventions for naming within Epiverse. I don't want to start a language discussion as I know this is personal preference, but wonder what your thoughts are on this? Do you expect functions to be named in en-GB, en-US or not do you have no implicit assumptions for yourself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've chosen to use en-GB as the language for the package: https://github.com/epiverse-trace/simulist/blob/main/DESCRIPTION#L50. This impacts the spell checking we perform in the package testing.
There is no Epiverse policy on which language to pick (see epiverse-trace/packagetemplate#103). There was also a discussion on the internal slack which I can send you if you're interested.
My personal policy is that if the function is exported and there is a difference in spelling between US and GB spelling then I export both. See https://github.com/epiverse-trace/epiparameter/blob/main/R/epidist.R#L674-L680. (Tidyverse do the same). For internal functions like .anonymise()
I only have a GB spelling, but would not have a problem if other developers wanted their internal functions to use en-US.
#' | ||
#' @return A vector of `character` strings of equal length to the input. | ||
#' @keywords internal | ||
.anonymise <- function(x, string_len = 10) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I sometimes have a bit of friction with our minimal dependency requirement, as I have here - it seems like a lot of work to create a md5sum/shasum for strings, which does not seem possible in base R. It makes sense under minimal dependencies to create a custom function.
digest provides this functionality as well and could save you these lines of code. It also seems quite widely used and may meet the availability expectations (although certainly not as available as ggplot2).
Yet your implementation seems pretty good as well, so I am not going to argue for adding a dependency. Surfacing my thoughts on this so you can check-in on them if you'd like. Happy to discuss, but not a hard want 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick search of packages that offer hashing and several looked appropriate. I decided to write an internal implementation as the package does not need all of the complexity offered by these packages and the complexity of the function was not that difficult/time consuming to write, and I don't think it adds much of a maintenance burden to the package.
To me this kind of thing can go either way without much issue, either keep the internal function or import {digest} or equivalent package. I'll leave the current implementation for now as it seems to work fine, but we can reopen this discussion if things change. 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening @joshwlambert - I've added only a few small technical suggestions in the files below; I think apart from one regarding regexes, the others don't prevent a release. I haven't been able to look at the tests or vignettes in much detail as there are quite a few files, so hopefully others can pitch in to review those.
https://epiverse-trace.github.io/simulist/ | ||
BugReports: https://github.com/epiverse-trace/simulist/issues | ||
Depends: | ||
R (>= 3.6.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've forgotten why there's an R version dependency; could you add this to the design decisions vignette?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember exactly why this version was initially chosen. It does (roughly) match up with the Epiverse policy to support the last 4 R versions. It also seems from some discussions found online that working out which is the minimum R version that the package supports is not easy (without setting up many CI runs on many older versions of R).
For now I'll leave the dependency on R >= 3.6.0 as is, and will consider adding a note to the dependencies section of the design principles vignette, although not sure we want to replicate info from blueprints in every package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some distinct logical blocks in this function - suggest adding comments that explain what the blocks do and why they are separated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I see the logical blocks you mention. Could you please explain where you think the comments should be added? I don't mind if you do this here or as an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There were a number of if
statements which seemed to be handling different cases, and it would be good to add short comments to explain them. It's not a priority and could be added in a future release.
A big thank you to both reviewers for providing many helpful comments and suggestions. As always these reviews have improved the package. Most comments have been responded to, linking to commit hashes or PRs that implement the change. Some changes that require more time and thought have been logged as issues and will be tackled in development for the next version. Some comments are left unresolved so that anyone can chip in and share their opinions. I will now close this full package review PR and move on to the release checklist and release v0.3.0. 🚀 Thanks again for the speedy and enjoyable review! 😄 |
This PR is to provide a platform to review the entirety of the package.
Once this review concludes I will release v0.3.0 on GitHub.
Please see the
NEWS.md
file for an overview of changes between v0.2.0 and v0.3.0.This PR is unconventional as it is not intended for merging or for additional commits (unless minor) and instead comments will be converted to issues and these will be addressed in their own PRs.