{simulist} v0.3.0 full package review #117

joshwlambert · 2024-05-22T11:35:39Z

This PR is to provide a platform to review the entirety of the package.

Once this review concludes I will release v0.3.0 on GitHub.

Please see the NEWS.md file for an overview of changes between v0.2.0 and v0.3.0.

This PR is unconventional: it is not intended for merging or for additional commits (unless minor). Instead, comments will be converted to issues, which will be addressed in their own PRs.



github-actions · 2024-05-22T11:37:13Z

This pull request:

Adds 45 new dependencies (direct and indirect)
Adds 6 new system dependencies
Removes 0 existing dependencies (direct and indirect)
Removes 0 existing system dependencies

(Note that results may be inaccurate if you branched from an outdated version of the target branch.)

github-actions · 2024-05-23T08:33:36Z

This pull request:

Adds 49 new dependencies (direct and indirect)
Adds 7 new system dependencies
Removes 0 existing dependencies (direct and indirect)
Removes 0 existing system dependencies

Reach out on slack (#code-review or #help channels) to double check if there are base R alternatives to the new dependencies.

(Note that results may be inaccurate if you branched from an outdated version of the target branch.)

joshwlambert · 2024-05-23T08:35:17Z

When I originally opened this PR, the review branch was many commits behind main and I did not notice. I think this was because I had a review branch for the last full package review (#73), and the branch did not update even though I had deleted it previously. I've now fixed this by rebasing review onto main, so this PR contains all the changes up to the HEAD of main. Apologies if this has caused any inconvenience.

chartgerink

Given I previously approved simulist 0.2.0, I tried to focus on the changes for this review. I initially forgot about my previous review so pardon if I'm opening up old discussions - I tried to cross-check as much as I could.

Happy to approve again 💯 I left some comments for your consideration, but in no way are they blockers from my end.

tests/testthat/testdata/README.md

R/sim_linelist.R

chartgerink · 2024-05-28T07:58:38Z

R/sim_linelist.R

+  if (is.data.frame(hosp_risk)) {
+    hosp_risk <- .check_risk_df(
+      hosp_risk,
+      age_range = age_range
+    )
+  }
+  if (is.data.frame(hosp_death_risk)) {
+    hosp_death_risk <- .check_risk_df(
+      hosp_death_risk,
+      age_range = age_range
+    )
+  }
+  if (is.data.frame(non_hosp_death_risk)) {
+    non_hosp_death_risk <- .check_risk_df(
+      non_hosp_death_risk,
+      age_range = age_range
+    )
+  }


This seems repetitive and a good candidate for a refactor into a generalized (internal) function.

function(object, range) {
  if (is.data.frame(object)) {
    object <- .check_risk_df(object, range)
  }
  # return the (possibly reformatted) object so non-data.frame input passes through
  object
}

# OR even more generic
function(object, fn, range) {
  if (is.data.frame(object)) {
    object <- fn(object, range)
  }
  object
}

This way you don't have to repeat yourself here, and any future changes only need to be made once.

The second option does not check the function arguments, so that may prove a hurdle down the line (regrettably, R and strong types are not the best match).

joshwlambert · 2024-05-31

I agree that this looks like a code smell. Thanks for the suggestions on how to refactor this chunk. I've made some changes from these suggestions on a branch called refactor_df_checks.

However, I'm not sure that I prefer the new structure. Ideally we'd remove the chain of if statements that all perform the same check, but moving the check into its own function has costs: the risks need to be bundled into a list, the new function adds more code to the package, it requires function documentation and unit tests (yet to be written), and the checked objects then need unpacking in the sim_*() functions.

Note that .check_risk_df() does some <data.frame> formatting, so it is not called purely for its side effects (warnings or errors). Returning the objects and reassigning them to variables is therefore required.

Please let me know your thoughts on this refactor and which you prefer, or if the refactor I've implemented is different from what you had in mind. I will leave the refactor_df_checks branch open on the package so if we decide to make this change it can be merged after the v0.3.0 release.
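For readers following this thread, the list-based shape being weighed up could look roughly like the sketch below. It is not the code on the refactor_df_checks branch: `check_risks()` is a hypothetical helper name, `check_risk_df_stub()` is a stand-in for the internal `.check_risk_df()` (whose real checks and formatting are not reproduced), and the column names in the example data frame are illustrative only.

```r
# Stand-in for the package-internal .check_risk_df(); the real function's
# checks and formatting are not shown here (assumption for illustration).
check_risk_df_stub <- function(x, age_range) {
  stopifnot(is.data.frame(x), length(age_range) == 2)
  x
}

# Hypothetical helper: check every <data.frame> risk in a named list and
# return non-data.frame risks (e.g. a single numeric) unchanged.
check_risks <- function(risks, age_range, fn = check_risk_df_stub) {
  lapply(risks, function(risk) {
    if (is.data.frame(risk)) {
      risk <- fn(risk, age_range = age_range)
    }
    risk
  })
}

# Usage: bundle the risks, check them, then unpack them again.
risks <- check_risks(
  list(
    hosp_risk = data.frame(age_limit = c(1, 40), risk = c(0.1, 0.3)),
    hosp_death_risk = 0.5,
    non_hosp_death_risk = 0.05
  ),
  age_range = c(1, 90)
)
hosp_risk <- risks$hosp_risk
hosp_death_risk <- risks$hosp_death_risk
non_hosp_death_risk <- risks$non_hosp_death_risk
```

The sketch makes the trade-off visible: the three if statements collapse into one `lapply()` call, but the caller pays for it with the bundling and unpacking steps at the end.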

R/sim_linelist.R

README.Rmd

chartgerink · 2024-05-28T08:41:13Z

R/utils.R

+  names_mf
+}
+
+#' Anonymise names


I notice I expected this to be .anonymize, and I realize that we do not have any naming conventions within Epiverse. I don't want to start a language discussion, as I know this is personal preference, but I wonder what your thoughts are on this. Do you expect functions to be named in en-GB or en-US, or do you have no implicit assumptions yourself?

joshwlambert · 2024-05-30

I've chosen to use en-GB as the language for the package: https://github.com/epiverse-trace/simulist/blob/main/DESCRIPTION#L50. This impacts the spell checking we perform in the package testing.

There is no Epiverse policy on which language to pick (see epiverse-trace/packagetemplate#103). There was also a discussion on the internal Slack, which I can send you if you're interested.

My personal policy is that if the function is exported and there is a difference in spelling between US and GB spelling then I export both. See https://github.com/epiverse-trace/epiparameter/blob/main/R/epidist.R#L674-L680. (Tidyverse do the same). For internal functions like .anonymise() I only have a GB spelling, but would not have a problem if other developers wanted their internal functions to use en-US.
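A minimal sketch of the "export both spellings" approach, assuming a plain alias is used; the function names here are hypothetical and this is not necessarily how the linked epiparameter code does it:

```r
# Hypothetical exported function using the en-GB spelling.
summarise_outbreak <- function(x) {
  c(n = length(x), mean = mean(x))
}

# en-US alias bound to the same function object, so both spellings share
# one implementation. In a package, both names would need exporting.
summarize_outbreak <- summarise_outbreak

summarise_outbreak(1:4)
summarize_outbreak(1:4)
```

Because the alias is just a second binding to the same function, there is nothing extra to maintain beyond documenting and exporting the second name.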

chartgerink · 2024-05-28T08:47:53Z

R/utils.R

+#'
+#' @return A vector of `character` strings of equal length to the input.
+#' @keywords internal
+.anonymise <- function(x, string_len = 10) {


I sometimes have a bit of friction with our minimal dependency requirement, as I have here - it seems like a lot of work to create a md5sum/shasum for strings, which does not seem possible in base R. It makes sense under minimal dependencies to create a custom function.

digest provides this functionality as well and could save you these lines of code. It also seems quite widely used and may meet the availability expectations (although certainly not as available as ggplot2).

Yet your implementation seems pretty good as well, so I am not going to argue for adding a dependency. Surfacing my thoughts on this so you can check-in on them if you'd like. Happy to discuss, but not a hard want 😊

joshwlambert · 2024-05-30

I did a quick search of packages that offer hashing and several looked appropriate. I decided to write an internal implementation because the package does not need all of the complexity offered by these packages, the function was not that difficult or time-consuming to write, and I don't think it adds much of a maintenance burden to the package.

To me this kind of thing can go either way without much issue, either keep the internal function or import {digest} or equivalent package. I'll leave the current implementation for now as it seems to work fine, but we can reopen this discussion if things change. 😄
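To give a sense of scale for the "internal function versus dependency" question, here is a base-R sketch of an anonymiser with the same signature and return contract as the one under review (a character vector of equal length to the input). It is an assumption-laden illustration, not the package's `.anonymise()`: `anonymise_sketch()` is a hypothetical name, and it swaps each string for a random alphanumeric string rather than computing a hash.

```r
# Hypothetical sketch, not the package's .anonymise(): replace each input
# string with a random alphanumeric string of length string_len.
anonymise_sketch <- function(x, string_len = 10) {
  stopifnot(is.character(x), string_len >= 1)
  chars <- c(letters, LETTERS, 0:9)
  vapply(
    x,
    function(name) {
      paste(sample(chars, size = string_len, replace = TRUE), collapse = "")
    },
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
}

set.seed(1)
anon <- anonymise_sketch(c("Jane Doe", "John Smith"))
```

Unlike a hash from a package such as {digest}, this sketch is not deterministic: the same input maps to a different string on each call unless the seed is fixed.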

pratikunterwegs

Thanks for opening @joshwlambert - I've added only a few small technical suggestions in the files below; I think apart from one regarding regexes, the others don't prevent a release. I haven't been able to look at the tests or vignettes in much detail as there are quite a few files, so hopefully others can pitch in to review those.

DESCRIPTION

pratikunterwegs · 2024-05-28T10:39:37Z

DESCRIPTION

+    https://epiverse-trace.github.io/simulist/
+BugReports: https://github.com/epiverse-trace/simulist/issues
+Depends: 
+    R (>= 3.6.0)


I've forgotten why there's an R version dependency; could you add this to the design decisions vignette?

joshwlambert · 2024-05-31

I don't remember exactly why this version was initially chosen. It does (roughly) match up with the Epiverse policy to support the last 4 R versions. From some discussions found online, it also seems that working out the minimum R version a package supports is not easy (without setting up many CI runs on many older versions of R).

For now I'll leave the dependency on R >= 3.6.0 as is, and will consider adding a note to the dependencies section of the design principles vignette, although not sure we want to replicate info from blueprints in every package.

LICENSE

R/add_cols.R

R/checkers.R

pratikunterwegs · 2024-05-28T12:28:40Z

R/sim_internal.R

There are some distinct logical blocks in this function - suggest adding comments that explain what the blocks do and why they are separated.

joshwlambert · 2024-05-31

I'm not sure I see the logical blocks you mention. Could you please explain where you think the comments should be added? I don't mind if you do this here or as an issue.

pratikunterwegs · 2024-05-31

There were a number of if statements which seemed to be handling different cases, and it would be good to add short comments to explain them. It's not a priority and could be added in a future release.

inst/CITATION

joshwlambert · 2024-05-31T14:31:53Z

A big thank you to both reviewers for providing many helpful comments and suggestions. As always these reviews have improved the package.

Most comments have been responded to, linking to commit hashes or PRs that implement the change. Some changes that require more time and thought have been logged as issues and will be tackled in development for the next version. Some comments are left unresolved so that anyone can chip in and share their opinions.

I will now close this full package review PR and move on to the release checklist and release v0.3.0. 🚀

Thanks again for the speedy and enjoyable review! 😄

actions-user and others added 30 commits January 9, 2024 17:06

Update CITATION.cff

bdc3fe4

add WIP .sim_network_bp function

64ee558

fix sampling contacts and bookkeeping ancestors

effcdf1

fixed sampling contact distribution with replacement

329b3bb

increase size of allocated vectors

e23372a

fixed issue counting chain length

68ce45a

simplified vector indexing in .sim_network_bp

fb7c2b6

simplified next_gen_size calculation in .sim_network_bp

e351faa

remove preallocation of id vec and add vector growing to .sim_network_bp

41377ab

remove input checking from .sim_network_bp as it is an internal function

92cd518

updated .check_sim_input with new model arguments

acfac02

updated .sim_contacts_tbl for new simulation model

9b48081

updated exported simulation function to use new model arguments

756e11b

updated .sim_bp_linelist to use .sim_network_bp

d333197

updated add_cols function to sample infected individuals

1332929

removed contact_distribution from functions that no longer use it

5152677

removed documentation for old arguments and added documentation for new arguments (mean_contacts, contact_interval, prob_infect)

ccde142

added details to .sim_network_bp documentation

83619cb

updated man files for updated functions

78310fc

fix typo converting contact_interval to function in sim_contacts

f62a1f9

fixed examples and inherit doc in .sim_network_bp

2f851c2

added tests for .sim_network_bp

f608936

added linelist row filtering to sim_outbreak

5abcb12

reduced maximum iterations for simulation conditioning

aac0ae1

updated tests to pass with new arguments

33cf465

pass add_names argument to .sim_network_bp

0b616cf

update simulation functions to use new arguments in vignettes

c84df98

remove mention of bpmodels from the package

bf46d38

Update CITATION.cff

b9bec96

joshwlambert and others added 12 commits May 20, 2024 16:11

update spelling and WORDLIST

5cf8b76

Automatic readme update

3bdc344

add pkgdown favicons to resolve pkgdown CI failure

6d5f8b7

update package architecture figure

13b9022

update NEWS.md with v0.3.0 changes

955f92f

update WORDLIST

9cb4340

reduce image size of package architecture diagram

741adbf

update GHA workflows

cabd31a

add update-copyright-year workflow

a9819a5

add contributor point to release_bullets

63d0bde

update check.env

535cfb6

update .gitignore

e41cabb

joshwlambert added the pkg review Full package review label May 22, 2024

chartgerink self-requested a review May 23, 2024 07:09

chartgerink approved these changes May 28, 2024


pratikunterwegs self-assigned this May 28, 2024

pratikunterwegs reviewed May 28, 2024


joshwlambert closed this May 31, 2024

joshwlambert deleted the review branch May 31, 2024 14:32


