Modify get_exposure_matrix to allow multiple serosurveys #72

Open
ben18785 opened this issue Jun 16, 2023 · 8 comments
Labels: enhancement (New feature or request)

Comments

@ben18785
Collaborator

We may have data from multiple years, but the current approach assumes a single survey year. This is linked to #70.

@ntorresd
Member

Hi @ben18785. You're right, the package currently requires the user to supply a single serological survey as input. It could also happen that the user supplies a dataframe containing several serological surveys, labelled by different values of survey.

I think the use of S3 classes can also address this issue (see #66). serofoi may be able to identify whether a given dataset contains one or several serological surveys by checking the unique values of survey or tsur, and automatically create a separate S3 object for each one. This way the user could easily build a pipeline that applies various models and analyses to several serological surveys in one run, without needing to provide a different dataframe for each.
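
As a rough illustration of the detection step, the sketch below checks how many distinct survey years a dataset contains and splits it into one dataframe per survey using base R. The toy `serodata` values are made up for the example; only the `survey`, `tsur`, and `age_mean_f` column names come from this thread, and whether serofoi would wrap each piece in an S3 object is left open.

```r
# Toy dataset (hypothetical values) containing two serosurveys
serodata <- data.frame(
  survey = c("A", "A", "B"),
  tsur = c(2010, 2010, 2015),
  age_mean_f = c(5, 20, 10)
)

# More than one distinct survey year means multiple serosurveys
has_multiple_surveys <- length(unique(serodata$tsur)) > 1

# One dataframe per survey year, named by year
serodata_by_survey <- split(serodata, serodata$tsur)
```

Each element of `serodata_by_survey` could then be passed to the existing single-survey functions in turn.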

@ben18785
Collaborator Author

Thanks @ntorresd, I agree that we can use the survey and year to determine whether a user has supplied a single dataset or multiple. (I'm less sure about the need for S3 classes, though? :) )

@ekamau
Collaborator

ekamau commented Jun 21, 2023

What is in the exposure matrix? i.e., in the rows and columns?

@ben18785
Collaborator Author

From memory, a row represents an age group for which we have measurements; the columns are all the possible ages in the sample. Each row is a run of 0s followed by a run of 1s: the 1s pick out the FOIs to which that row's age group was exposed, with the columns ordered starting from the oldest FOI. E.g., R code mimicking it:

ages <- seq(1, 80, 1)
measured_ages <- c(10, 20, 75)
m_exposure <- matrix(nrow = length(measured_ages),
                     ncol = length(ages))
for(i in seq_along(measured_ages)) {
  age <- measured_ages[i]
  n_zeros <- length(ages) - age
  n_ones <- age
  m_exposure[i, ] <- c(rep(0, n_zeros), rep(1, n_ones))
}
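
To see what the 1s are for, here is a minimal sketch that multiplies this kind of matrix by a vector of FOIs. The constant FOI of 0.02 is an assumed value for illustration, and the seroprevalence formula is the standard catalytic model without seroreversion, which may differ from what serofoi actually fits.

```r
# Rebuild the toy exposure matrix from the example above
ages <- seq(1, 80, 1)
measured_ages <- c(10, 20, 75)
m_exposure <- t(vapply(
  measured_ages,
  function(age) c(rep(0, length(ages) - age), rep(1, age)),
  numeric(length(ages))
))

# Assumed: one FOI per column, constant at 0.02 (hypothetical value)
foi <- rep(0.02, length(ages))

# Each row sums the FOIs that age group was exposed to
cumulative_foi <- as.vector(m_exposure %*% foi)

# Catalytic model without seroreversion (an assumption here)
seroprevalence <- 1 - exp(-cumulative_foi)
```

With a constant FOI, `cumulative_foi` is simply the FOI multiplied by each measured age.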

@ekamau
Collaborator

ekamau commented Jun 22, 2023

I get it .. more like counts for cumulative exposure over time ..

@ntorresd ntorresd added the enhancement New feature or request label Jul 31, 2023
@ntorresd ntorresd self-assigned this Jul 31, 2023
@sumalibajaj
Collaborator

Hi @ntorresd, I just had a chat with @ben18785, and we think we should find a way to code this so that, even with multiple serosurveys, we still get a single set of FOIs rather than one set per serosurvey. I think we discussed fitting each serosurvey separately as my task (apologies if I misunderstood!), so I'll put that on hold for now.

@sumalibajaj sumalibajaj self-assigned this Apr 15, 2024
@sumalibajaj
Collaborator

Hi @ntorresd,

The following code takes a dataframe with multiple serosurveys, splits it by survey year, creates an exposure matrix for each survey using your original code, pads older surveys with zeros on the right (up to the latest survey year), pads on the left where needed so the columns line up, and stacks all the exposure matrices (with rbind here). Let me know what you think:

library(dplyr)

# Extracting exposure matrix - with multiple surveys
# (get_cohort_ages() is the existing serofoi helper)
get_exposure_matrix_multiplesurveys <- function(serodata_multiple) {
  tsur_unique <- sort(unique(serodata_multiple$tsur)) # increasing order
  latest_tsur <- max(tsur_unique)

  # list to store survey-specific exposure matrices
  exposure_list <- vector("list", length(tsur_unique))

  # create an exposure matrix for each survey
  for (i in seq_along(tsur_unique)) {
    t <- tsur_unique[i]
    # survey-specific data
    serodata_temp <- serodata_multiple %>%
      filter(tsur == t)

    # from original code to create exposure matrix
    age_class <- serodata_temp$age_mean_f
    cohort_ages <- get_cohort_ages(serodata = serodata_temp)
    ly <- nrow(cohort_ages)
    exposure <- matrix(0, nrow = length(age_class), ncol = ly)
    for (k in seq_along(age_class)) {
      exposure[k, (ly - age_class[k] + 1):ly] <- 1
    }

    # pad an older survey with no exposure for the time until the latest
    # survey on the RIGHT (zero columns are added for the latest survey)
    no_exposure_mat <- matrix(0, nrow = length(age_class), ncol = latest_tsur - t)
    exposure_list[[i]] <- cbind(exposure, no_exposure_mat)
  }

  # make the no. of cols the same for all surveys by adding 0s on the LEFT;
  # doing this after the loop avoids relying on a width set by an earlier
  # iteration, which failed when only one survey was supplied
  n_cols <- max(vapply(exposure_list, ncol, integer(1)))
  exposure_list <- lapply(exposure_list, function(m) {
    cbind(matrix(0, nrow = nrow(m), ncol = n_cols - ncol(m)), m)
  })

  # combine all surveys
  exposure_output <- do.call(rbind, exposure_list)
  return(exposure_output)
}

@sumalibajaj
Collaborator

sumalibajaj commented Apr 16, 2024

Hi @ntorresd. Updated code:

```r
library(dplyr)

# get_exposure_matrix() and get_cohort_ages() are the existing serofoi helpers
get_exposure_matrix_multiplesurveys <- function(serodata_multiple) {
  tsur_unique <- unique(serodata_multiple$tsur)

  # list to store survey-specific exposure matrices
  exposure_list <- vector("list", length(tsur_unique))

  # create an exposure matrix for each survey
  for (i in seq_along(tsur_unique)) {
    t <- tsur_unique[i]

    # survey-specific data
    serodata_temp <- serodata_multiple %>%
      filter(tsur == t)

    # from original code to create exposure matrix
    # and add years as colnames
    exposure <- get_exposure_matrix(serodata_temp)
    cohort_ages_temp <- get_cohort_ages(serodata = serodata_temp)
    colnames(exposure) <- cohort_ages_temp[, 1]

    # save the exposure matrix for the survey
    exposure_list[[i]] <- as.data.frame(exposure)
  }

  # combine all surveys: bind_rows() matches columns by name, so years
  # missing from a survey come back as NA and are set to 0 (no exposure)
  exposure_output <- bind_rows(exposure_list)
  exposure_output[is.na(exposure_output)] <- 0

  # order columns by increasing year (numerically, not alphabetically)
  year_order <- order(as.numeric(colnames(exposure_output)))
  exposure_output <- exposure_output[, year_order, drop = FALSE]
  return(exposure_output)
}
```
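
The align-by-year idea can be checked on toy data without serofoi. The sketch below is a base R stand-in, not the package code: `exposure_for_survey()` and `align_exposures()` are hypothetical helpers, and it assumes that someone aged `a` at survey year `tsur` was exposed during the years `tsur - a` to `tsur - 1`.

```r
# Hypothetical helper: exposure matrix with calendar years as column names
exposure_for_survey <- function(ages, tsur) {
  years <- (tsur - max(ages)):(tsur - 1)
  m <- matrix(0, nrow = length(ages), ncol = length(years),
              dimnames = list(NULL, as.character(years)))
  for (k in seq_along(ages)) {
    m[k, years >= tsur - ages[k]] <- 1
  }
  m
}

# Hypothetical helper: place every matrix on the union of years, then stack
align_exposures <- function(mats) {
  all_years <- sort(unique(unlist(
    lapply(mats, function(m) as.numeric(colnames(m)))
  )))
  padded <- lapply(mats, function(m) {
    out <- matrix(0, nrow = nrow(m), ncol = length(all_years),
                  dimnames = list(NULL, as.character(all_years)))
    out[, colnames(m)] <- m  # years not covered by this survey stay 0
    out
  })
  do.call(rbind, padded)
}

e1 <- exposure_for_survey(c(2, 4), tsur = 2005) # columns 2001 to 2004
e2 <- exposure_for_survey(c(1, 3), tsur = 2008) # columns 2005 to 2007
combined <- align_exposures(list(e1, e2))

# A single set of FOIs (assumed constant here) is shared by both surveys
foi <- rep(0.1, ncol(combined))
cumulative_foi <- as.vector(combined %*% foi)
```

Because every row sits on the same year axis, one FOI vector serves all surveys, which is the single-set-of-FOIs behaviour asked for earlier in the thread.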
