Changes to generate_sim_data
#169
Comments
It's fine @ben18785, I also think we can improve these functions a lot to make them easier and more intuitive to use. Just to clarify some points. As stated in the
This is the red line in the corresponding example on the vignettes. Another important point to note is the age structure of the serosurvey:
At the time I generated the simulated datasets I didn't realize that the ages were not saved correctly; it should read:
This doesn't affect the rest of the calculation because when computing the age group markers On the other hand, note that this serosurvey is grouped by age. As you pointed out, we don't have a direct way to simulate heterogeneous age cohorts. The way we bypass this is by simulating data for each age between
And then:
which returns a prepared serosurvey with the characteristics we wanted (in retrospect, I think we shouldn't use
Bearing this in mind:
I agree on changing the names of the functions and variables as you suggest. Some of them may be more troublesome than others, since we have to change them as well in the preloaded datasets in order for the R-CMD checks to pass (just as we need to do for #112). I suggest we open a separate issue and PR to address these changes of names.
This was recently addressed by @jpavlich in #168. I merged it today, so please rebase your branch before continuing.
I think we should keep at least |
As discussed in the in-person meeting, we will change this to be of the form:

    library(tidyr)  # expand_grid()
    library(dplyr)  # mutate(), %>%

    # Survey design: one row per age group
    feature_df <- data.frame(
      age_min = c(1, 6, 11),
      age_max = c(5, 10, 20),
      sample_size = c(10, 15, 20),
      year_survey = c(2010, 2010, 2010)
    )

    # Time-varying FOI: one value per year
    serosurvey_time_example <- simulate_serosurvey(
      model = "time",
      foi = data.frame(year = c(1990, 1991, ..., 2009), foi = c(0.1, 0.2, ..., 0.3)),
      seroreversion = 0,
      survey_features = feature_df
    )

    # Age-varying FOI: one value per age
    serosurvey_age_example <- simulate_serosurvey(
      model = "age",
      foi = data.frame(age = c(1, 2, ..., 20), foi = c(0.1, 0.2, ..., 0.3)),
      seroreversion = 0.2,
      survey_features = feature_df
    )

    # Age- and time-varying FOI: one value per (year, age) combination
    ages <- seq(1, 10, 1)
    years <- seq(1990, 2000, 1)
    foi_age_and_time <- expand_grid(year = years, age = ages) %>%
      mutate(foi = 0.1)
    serosurvey_age_and_time_example <- simulate_serosurvey(
      model = "age-time",
      foi = foi_age_and_time,
      seroreversion = 0.2,
      survey_features = feature_df
    )

But internally and externally we will have e.g. |
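To make the proposed interface concrete, here is a minimal sketch of what the time-varying case might do internally. It is not the package's implementation. It assumes no seroreversion, a FOI that is constant within each calendar year, that someone aged `a` in the survey year was exposed during the `a` years before the survey, and that each sampled individual's age is uniform within its age bin. The helper name `sketch_simulate_time_serosurvey`, the FOI values and the output column names are all illustrative.

```r
# Hypothetical helper, NOT part of serofoi: illustrates the proposed interface only.
sketch_simulate_time_serosurvey <- function(foi, survey_features) {
  stopifnot(all(c("year", "foi") %in% names(foi)))
  rows <- lapply(seq_len(nrow(survey_features)), function(i) {
    feat <- survey_features[i, ]
    ages <- seq(feat$age_min, feat$age_max)
    # Probability of being seropositive at each single year of age:
    # 1 - exp(-cumulative FOI over the years lived before the survey)
    p_by_age <- vapply(ages, function(a) {
      years_lived <- seq(feat$year_survey - a, feat$year_survey - 1)
      stopifnot(all(years_lived %in% foi$year))
      1 - exp(-sum(foi$foi[foi$year %in% years_lived]))
    }, numeric(1))
    # Assumption: ages are uniformly distributed within the bin,
    # so the bin-level probability is the mean over its ages
    p_bin <- mean(p_by_age)
    data.frame(
      age_min = feat$age_min,
      age_max = feat$age_max,
      year_survey = feat$year_survey,
      n_sample = feat$sample_size,
      n_seropositive = rbinom(1, size = feat$sample_size, prob = p_bin)
    )
  })
  do.call(rbind, rows)
}

feature_df <- data.frame(
  age_min = c(1, 6, 11),
  age_max = c(5, 10, 20),
  sample_size = c(10, 15, 20),
  year_survey = c(2010, 2010, 2010)
)
# Illustrative FOI values covering every year lived by the oldest cohort
foi_df <- data.frame(year = 1990:2009, foi = rep(0.05, 20))

set.seed(1)
sim <- sketch_simulate_time_serosurvey(foi_df, feature_df)
```

Because the bins are handled by averaging over single years of age, the user never has to simulate each age separately and regroup afterwards.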
Sorry @ntorresd -- I'm revisiting this one as I've genuinely found it hard to use this function.
To give an example, I am trying to replace the reliance on the `simdata_large_epi` file in a vignette with a simulated dataset, as per #160. I tried:

    df <- generate_sim_data(
      sim_data = data.frame(
        age_mean_f = c(2, 7, 12, 17, 22, 27, 32, 37, 42, 47),
        tsur = 2050
      ),
      foi = rep(1.5, 47),
      sample_size_by_age = c(20, 25, 25, 25, 25, 25, 25, 25, 25, 30)
    )

which I thought should work since it allows 47 FOIs (which seems sensible...from 1 up until a max age of 47). From reading the function documentation, it's still not clear to me how to get this to work, and I think we should make various changes.
`generate_sim_data` is clunky and vague; `simulate_serosurvey` uses full words and is immediately understandable, and I propose this change of name.

In addition, I think we should consider changes to its arguments; current arguments for this include:

- `sim_data`, which is a vague name and I would propose changing it to `serosurvey_characteristics`. This currently has a column called `age_mean_f` which specifies "Age group markers" -- I am not sure what this means. I'd suggest we change this column to be named `ages_surveyed`; I would suggest changing `tsur` to `year_survey`.
- `sample_size_by_age` would actually work better as a column in this data frame since then it's guaranteed to be of the right length, and I would suggest renaming it `n_sample`.
- `foi` works fine as it is for either time- or age-varying FOIs, but it won't generalise to time- and age-varying FOIs. To handle this, I propose a change to it: we require that users supply a data frame with columns: `age` and `foi` for age-varying FOIs; `year` and `foi` for time-varying FOIs; and `age`, `year` and `foi` for age- and time-varying FOIs. This has the added benefit that we can check the users are supplying the right inputs for whichever type of model they have.

We also need to check the inputs to the function and give the user better error handling when they have supplied inappropriate arguments:
- `sim_data` argument)

I also propose changes to the output of the function:

- `age_mean_f`, `age_min` and `age_max` since, to the best of my knowledge, we don't have a way of simulating data for heterogeneous age cohorts (e.g. a group of individuals with ages between 10 and 20); we probably should have this
- `tsur` could be `year_survey`
- `counts` could be `n_seropositive`
- `total` should be `n_sample` to mirror the function inputs
- `survey` can just be removed (and I'd suggest removing it from the inputs)
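
The column requirements proposed above for `foi` lend themselves to an up-front check. The following is only a sketch of how that could look: the function name `check_foi_columns`, the `model` labels, the requirement that the columns match exactly, and the error wording are all assumptions for illustration rather than anything that exists in the package.

```r
# Hypothetical validator: names and messages are illustrative, not the package's API.
check_foi_columns <- function(foi, model = c("age", "time", "age-time")) {
  model <- match.arg(model)
  if (!is.data.frame(foi)) {
    stop("`foi` must be a data frame.", call. = FALSE)
  }
  # Columns required for each type of model, as proposed in this issue
  required <- switch(model,
    "age" = c("age", "foi"),
    "time" = c("year", "foi"),
    "age-time" = c("age", "year", "foi")
  )
  # Assumption: the columns must match exactly (no extras, none missing)
  if (!setequal(names(foi), required)) {
    stop(
      "For model = \"", model, "\", `foi` must have exactly the columns: ",
      paste(required, collapse = ", "), ".",
      call. = FALSE
    )
  }
  if (any(is.na(foi$foi)) || any(foi$foi < 0)) {
    stop("`foi$foi` must contain non-negative, non-missing values.", call. = FALSE)
  }
  invisible(TRUE)
}

# A correctly shaped age-varying FOI passes silently
check_foi_columns(data.frame(age = 1:20, foi = 0.1), model = "age")

# An age-indexed FOI passed to the time model is rejected with a readable message
result <- tryCatch(
  check_foi_columns(data.frame(age = 1:20, foi = 0.1), model = "time"),
  error = function(e) conditionMessage(e)
)
```

Failing early like this would have flagged the 47-element `foi` vector in the example above with a message about the expected shape, instead of leaving the user to guess.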