Modify `get_exposure_matrix` to allow multiple serosurveys #72

ben18785 · 2023-06-16T11:42:17Z

We may have data from multiple years. The current approach assumes a single survey year. This is linked to #70

ntorresd · 2023-06-20T20:27:07Z

Hi @ben18785. You're right: the package currently requires a single serological survey as input. A user could also supply a dataframe containing several serological surveys, labelled by different values of `survey`.

I think S3 classes could also address this issue (see #66). serofoi could identify whether a dataset contains one or several serological surveys by checking the unique values of `survey` or `tsur`, and automatically create a separate S3 object for each one. This way the user could easily build a pipeline that applies various models and analyses to several serological surveys in one run, without having to provide a separate dataframe for each.
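The detection step described above could look roughly like the following. This is only a sketch in base R: the toy values are made up, the column names (`survey`, `tsur`, `age_mean_f`) are the ones mentioned in this thread, and it does not create any S3 objects.

```r
# Toy data: two surveys identified by their survey year (tsur).
# Values are invented for illustration only.
serodata <- data.frame(
  survey = c("A", "A", "B", "B", "B"),
  tsur = c(2010, 2010, 2015, 2015, 2015),
  age_mean_f = c(5, 20, 5, 20, 40)
)

# split() returns a named list with one data frame per unique survey year
serodata_by_survey <- split(serodata, serodata$tsur)

# More than one element means the input holds several serosurveys
n_surveys <- length(serodata_by_survey)
has_multiple_surveys <- n_surveys > 1
```

Each element of `serodata_by_survey` could then be passed to the existing single-survey functions, or wrapped in whatever class ends up being chosen.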

ben18785 · 2023-06-20T21:59:55Z

Thanks @ntorresd. I agree that we can use the survey and year to determine whether a user has supplied a single dataset or multiple. (I'm less sure about the need for S3 classes though? :) )

ekamau · 2023-06-21T20:20:45Z

What is in the exposure matrix? i.e., in the rows and columns?

ben18785 · 2023-06-22T07:44:51Z

From memory, a row represents an age-group for which we have measurements; the columns are all the possible ages in the sample. Each row has a load of 1s and 0s -- the 1s are used to pick out the FOIs to which that row's age group was exposed, where we start from the oldest FOI. E.g. R code mimicking it:

ages <- seq(1, 80, 1)
measured_ages <- c(10, 20, 75)
m_exposure <- matrix(nrow = length(measured_ages),
                     ncol = length(ages))
for(i in seq_along(measured_ages)) {
  age <- measured_ages[i]
  n_zeros <- length(ages) - age
  n_ones <- age
  m_exposure[i, ] <- c(rep(0, n_zeros), rep(1, n_ones))
}

ekamau · 2023-06-22T21:19:16Z

I get it .. more like counts for cumulative exposure over time ..
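To make the "cumulative exposure" reading concrete, here is a small sketch that rebuilds the matrix from the snippet above and multiplies it by a vector of FOI values. The FOI values are hypothetical (a constant 0.02 per column) and are there purely to show the mechanics; nothing here is taken from the package itself.

```r
ages <- seq(1, 80, 1)
measured_ages <- c(10, 20, 75)

# Same construction as in the snippet above: each row ends in `age` ones
m_exposure <- matrix(nrow = length(measured_ages), ncol = length(ages))
for (i in seq_along(measured_ages)) {
  age <- measured_ages[i]
  m_exposure[i, ] <- c(rep(0, length(ages) - age), rep(1, age))
}

# The number of ones in each row equals the age of that group
exposure_counts <- rowSums(m_exposure)

# Hypothetical FOI values, one per column (illustrative only)
foi <- rep(0.02, length(ages))

# Each entry sums the FOIs picked out by the 1s in the corresponding row
cumulative_foi <- as.vector(m_exposure %*% foi)
```

With a constant FOI the result is just the FOI times the age of each group; with FOIs that vary by column, the 1s select which values contribute to each row's sum.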

sumalibajaj · 2024-03-13T10:08:35Z

Hi @ntorresd, I just had a chat with @ben18785 and think we should find a way to code this so that, even with multiple serosurveys, we still get a single set of FOIs rather than one set per serosurvey. I think we had discussed fitting each serosurvey separately as my task (apologies if I misunderstood!), so I'll put that on hold for now.

sumalibajaj · 2024-04-15T16:01:01Z

Hi @ntorresd,

The following code takes in a dataframe with multiple serosurveys, splits it by year of serosurvey, creates an exposure matrix for each survey using your original code, pads older surveys with zeros on the right, pads on the left so that all matrices have the same number of columns, and appends (`rbind` here) all the exposure matrices. Let me know what you think:

# Extracting exposure matrix - with multiple surveys 
get_exposure_matrix_multiplesurveys <- function(serodata_multiple) {
  tsur_unique <- unique(serodata_multiple$tsur) %>% sort() # make sure it is increasing
  latest_tsur <- max(tsur_unique)

  # list to store survey specific exposure matrices
  exposure_list <- list()
  
  # create an exposure matrix for each survey
  for(i in seq_along(tsur_unique)){
    t <- tsur_unique[i]
    # survey specific data
    serodata_temp <- serodata_multiple %>%
      filter(tsur == t)
    
    # from original code to create exposure matrix
    age_class <- serodata_temp$age_mean_f
    cohort_ages <- get_cohort_ages(serodata = serodata_temp)
    ly <- nrow(cohort_ages)
    exposure <- matrix(0, nrow = length(age_class), ncol = ly)
    for (k in seq_along(age_class)) {
      exposure[k, (ly - age_class[k] + 1):ly] <- 1
    }
    
    # pad an older survey with no exposure for time till latest survey on the RIGHT
    if (t < latest_tsur) {
      no_exposure_mat <- matrix(0, nrow = length(age_class), ncol = latest_tsur - t)
      exposure <- cbind(exposure, no_exposure_mat)
    }

    # save the exposure matrix for the survey
    exposure_list[[i]] <- exposure
  }

  # make no. of cols the same for every survey by adding 0s on the LEFT;
  # doing this after the loop also works with a single survey, or with
  # surveys whose matrices have different widths
  n_cols <- max(vapply(exposure_list, ncol, integer(1)))
  exposure_list <- lapply(exposure_list, function(m) {
    cbind(matrix(0, nrow = nrow(m), ncol = n_cols - ncol(m)), m)
  })

  # combine all surveys
  exposure_output <- do.call(rbind, exposure_list)
  return(exposure_output)
}
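As a toy illustration of the padding idea, the sketch below aligns two small hand-written exposure matrices on a common year axis. `pad_columns` is a hypothetical helper, not part of serofoi, and the convention that columns are consecutive years ending at the survey year is an assumption made only for this example.

```r
# Hypothetical helper (not part of serofoi): add zero columns on either side
pad_columns <- function(m, n_left = 0, n_right = 0) {
  cbind(
    matrix(0, nrow = nrow(m), ncol = n_left),
    m,
    matrix(0, nrow = nrow(m), ncol = n_right)
  )
}

# Survey in 2010, columns 2008-2010, age groups 1 and 3
exposure_2010 <- rbind(c(0, 0, 1),
                       c(1, 1, 1))
# Survey in 2012, columns 2009-2012, age groups 2 and 4
exposure_2012 <- rbind(c(0, 0, 1, 1),
                       c(1, 1, 1, 1))

# Common axis is 2008-2012: the older survey gains 2012 - 2010 = 2 columns on
# the right, and the latest survey gains 1 column on the left to match widths
padded_2010 <- pad_columns(exposure_2010, n_right = 2)
padded_2012 <- pad_columns(exposure_2012, n_left = 1)

exposure_all <- rbind(padded_2010, padded_2012)
```

Here `exposure_all` has one row per age group across both surveys and one column per year from 2008 to 2012, so a single set of FOIs can be indexed by column.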

sumalibajaj · 2024-04-16T17:25:47Z

Hi @ntorresd. Updated code:

get_exposure_matrix_multiplesurveys <- function(serodata_multiple) {
  tsur_unique <- unique(serodata_multiple$tsur) 

  # list to store survey specific exposure matrices
  exposure_list <- list()
  
  # create an exposure matrix for each survey
  for(i in seq_along(tsur_unique)){
    t <- tsur_unique[i]
    
    # survey specific data
    serodata_temp <- serodata_multiple %>%
      filter(tsur == t)
    
    # from original code to create exposure matrix
    # and add years as colnames
    exposure <- get_exposure_matrix(serodata_temp)
    cohort_ages_temp <- get_cohort_ages(serodata = serodata_temp)
    colnames(exposure) <- cohort_ages_temp[ ,1]
    
    # save the exposure matrix for the survey
    exposure_list[[i]] <- exposure %>% as.data.frame()
  }
  
  # combine all surveys (columns are matched by year name, then sorted by
  # increasing year; years are compared as numbers rather than as strings)
  exposure_output <- bind_rows(exposure_list)
  exposure_output[is.na(exposure_output)] <- 0
  exposure_output <- exposure_output %>% select(order(as.numeric(colnames(exposure_output))))
  return(exposure_output)
}
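The key idea in the updated version is that columns are matched by year name rather than by position. The same idea can be sketched without dplyr, as below. `align_by_year` is a hypothetical helper written for this illustration (it is not part of serofoi), and the toy matrices assume that column names are calendar years.

```r
# Hypothetical helper (not part of serofoi): stack matrices by matching
# their column names (years), filling years a survey does not cover with 0
align_by_year <- function(exposure_list) {
  all_years <- sort(unique(as.numeric(unlist(lapply(exposure_list, colnames)))))
  aligned <- lapply(exposure_list, function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(all_years),
                  dimnames = list(NULL, as.character(all_years)))
    out[, colnames(m)] <- m
    out
  })
  do.call(rbind, aligned)
}

# Toy matrices with years as column names
exposure_2010 <- matrix(c(0, 0, 1,
                          1, 1, 1),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(NULL, c("2008", "2009", "2010")))
exposure_2012 <- matrix(c(0, 0, 1, 1,
                          1, 1, 1, 1),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(NULL, c("2009", "2010", "2011", "2012")))

combined <- align_by_year(list(exposure_2010, exposure_2012))
```

Matching by name means the surveys do not need to be sorted beforehand and no explicit left or right padding is computed; the trade-off is that every matrix must carry year column names.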

ntorresd added the enhancement (New feature or request) label Jul 31, 2023

ntorresd self-assigned this Jul 31, 2023

sumalibajaj self-assigned this Apr 15, 2024
