refactor prepare_serodata #191

zmcucunuba · 2024-05-15T15:27:35Z

At the moment, prepare_serodata only produces age_mean_f and adds some columns, such as binomial confidence intervals. These could remain. However, we probably don't need the sample_size column, which simply stores the computed total sample size and may be misleading. In the short term, we must remove the sample_size column.

In the medium to long term, we need prepare_serodata to quality-control the data for the user.


ben18785 · 2024-05-16T15:15:56Z

A few thoughts:

We already check the user inputs within the fit_seromodel function, through the validate_prepared_serodata function. To me, this means that it's not necessary to force users to go through a pre-fitting quality check.
prepare_serodata outputs a data frame with lots of columns in it. Only a small subset of these is necessary for fitting the model: total, counts, age_mean_f (I think just renamed age_middle) and tsur (this only being needed if there are multiple serosurveys).
The other columns output by prepare_serodata are useful for plotting.
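
To illustrate the plotting-only columns, here is a minimal sketch that adds an observed seroprevalence and an exact (Clopper-Pearson) binomial confidence interval per row, using base R's binom.test. The column names n_tested and n_seropositive are the ones proposed later in this thread; add_binomial_ci, prev_obs, prev_obs_lower and prev_obs_upper are hypothetical names, and the interval method that prepare_serodata actually uses may differ.

```r
# Hypothetical helper (not part of the package): add observed prevalence and
# exact binomial confidence limits, which are only needed for plotting.
add_binomial_ci <- function(sero_data, conf_level = 0.95) {
  # One column of (lower, upper) limits per row of sero_data
  ci <- vapply(
    seq_len(nrow(sero_data)),
    function(i) {
      as.numeric(
        binom.test(
          sero_data$n_seropositive[i],
          sero_data$n_tested[i],
          conf.level = conf_level
        )$conf.int
      )
    },
    numeric(2)
  )
  sero_data$prev_obs <- sero_data$n_seropositive / sero_data$n_tested
  sero_data$prev_obs_lower <- ci[1, ]
  sero_data$prev_obs_upper <- ci[2, ]
  sero_data
}

# Made-up example data, for illustration only
example_data <- data.frame(
  n_tested = c(50, 80),
  n_seropositive = c(5, 40)
)
plot_data <- add_binomial_ci(example_data)
```

Keeping these columns in a separate plotting helper means the data frame passed to fitting stays minimal.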

A proposal:

prepare_serodata is kept but perhaps renamed to prepare_serodata_for_plotting. I would suggest modifying it to change column names as per the below.
For fitting via fit_seromodel, we require a user only to supply a data frame with age_min, age_max, n_tested, n_seropositive and year_survey. We make it clear within the function documentation that we just use the mean of age_min and age_max for the age when performing inference. This gives us flexibility for the future when we may want to account for the uncertainty in the distribution of ages within the bins when doing inference. This would mean that we would need to change validate_prepared_serodata (perhaps renamed as validate_serodata) to handle the new inputs.
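
A minimal sketch of what the proposal above could look like, assuming the five column names listed. The body of validate_serodata, the helper age_for_inference and the specific checks are assumptions made for illustration, not the package's implementation; whether the midpoint is rounded is not stated in this thread, so the plain mean is used here.

```r
# Hypothetical sketch of the renamed validator: checks that the five
# proposed columns exist and hold plausible values.
validate_serodata <- function(serosurvey) {
  required_cols <- c(
    "age_min", "age_max", "n_tested", "n_seropositive", "year_survey"
  )
  missing_cols <- setdiff(required_cols, names(serosurvey))
  if (length(missing_cols) > 0) {
    stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    is.numeric(serosurvey$age_min),
    is.numeric(serosurvey$age_max),
    is.numeric(serosurvey$n_tested),
    is.numeric(serosurvey$n_seropositive),
    all(serosurvey$age_min <= serosurvey$age_max),
    all(serosurvey$n_seropositive <= serosurvey$n_tested)
  )
  invisible(serosurvey)
}

# Hypothetical helper: the age used for inference is the mean of the bin limits.
age_for_inference <- function(serosurvey) {
  (serosurvey$age_min + serosurvey$age_max) / 2
}

# Made-up example data, for illustration only
serosurvey <- data.frame(
  age_min = c(1, 5),
  age_max = c(4, 9),
  n_tested = c(50, 80),
  n_seropositive = c(5, 40),
  year_survey = 2015
)
validate_serodata(serosurvey)
ages <- age_for_inference(serosurvey)
```

Computing the age inside the fitting code, rather than storing it in the input, leaves room to replace the midpoint later with something that accounts for the distribution of ages within each bin.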

zmcucunuba · 2024-05-17T19:07:21Z

Yes, I agree—many thanks. We agreed with Sumali to have something like:

prepare_sero_data <- function(my_raw_sero_data) {
  # Sketch only: each of the five columns below is to be derived
  # from my_raw_sero_data.
  sero_data <- data.frame(n_tested, n_seropositive, year_survey, age_min, age_max)
  return(sero_data)
}

sero_data <- prepare_sero_data(my_raw_sero_data)

sero_data will be used in other functions such as set_sero_model and fit_sero_model, and in others related to #190.
