MultiCCA clustering and tumor samples #1

galadrielbriere · 2019-04-30T14:37:03Z

Hello,

I'm trying to run some parts of your benchmark and I have some questions about your code and some of the choices you made.

First, I have a question about the MultiCCA run:

```r
cca.ret = PMA::MultiCCA(omics.transposed, ncomponents = MAX.NUM.CLUSTERS)
sample.rep = omics.transposed[[1]] %*% cca.ret$ws[[1]]
```

It seems that only the first omic dataset is used to compute sample.rep, by projecting it onto the canonical vectors (weights) found for that dataset. sample.rep is then used for the clustering. Why did you choose the first omic? Could another dataset be used instead? For example:

```r
sample.rep = omics.transposed[[2]] %*% cca.ret$ws[[2]]
```

What would the consequences be for the results?
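One way to gauge the effect of this choice empirically might be to build the representation from every view and compare the resulting clusterings. The sketch below assumes `omics.transposed` is a list of samples x features matrices (same samples, same row order) and that `ws` has the shape of `cca.ret$ws` (one features x components matrix per view). The helpers `view.representations`, `adjusted.rand.index` and `compare.view.clusterings` are hypothetical names, not part of the benchmark or of PMA, and only base R is used.

```r
# Hypothetical helper: project each omic onto its own MultiCCA canonical
# vectors. Assumes omics.transposed[[i]] is samples x features and ws[[i]]
# is features x components (e.g. cca.ret$ws from PMA::MultiCCA).
view.representations <- function(omics.transposed, ws) {
  stopifnot(length(omics.transposed) == length(ws))
  lapply(seq_along(omics.transposed),
         function(i) omics.transposed[[i]] %*% ws[[i]])
}

# Adjusted Rand index between two label vectors:
# 1 = identical partitions (up to relabeling), about 0 = chance agreement.
adjusted.rand.index <- function(a, b) {
  tab <- table(a, b)
  sum.cells <- sum(choose(tab, 2))
  sum.rows <- sum(choose(rowSums(tab), 2))
  sum.cols <- sum(choose(colSums(tab), 2))
  expected <- sum.rows * sum.cols / choose(length(a), 2)
  max.index <- (sum.rows + sum.cols) / 2
  (sum.cells - expected) / (max.index - expected)
}

# Cluster every view's representation with the same number of clusters and
# return the pairwise ARI matrix between views (diagonal set to 1).
compare.view.clusterings <- function(reps, num.clusters, nstart = 30) {
  clusterings <- lapply(reps, function(r)
    kmeans(r, num.clusters, iter.max = 100, nstart = nstart)$cluster)
  n <- length(clusterings)
  ari <- matrix(1, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j) {
        ari[i, j] <- adjusted.rand.index(clusterings[[i]], clusterings[[j]])
      }
    }
  }
  ari
}
```

Off-diagonal values close to 1 would suggest that picking `ws[[1]]` or `ws[[2]]` barely changes the partition; low values would suggest the choice of view matters.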

Second, in the same MultiCCA run, the average silhouette width of each candidate clustering is computed in order to choose the number of clusters:

```r
sils = c()
clustering.per.num.clusters = list()
for (num.clusters in 2:MAX.NUM.CLUSTERS) {
  cur.clustering = kmeans(sample.rep, num.clusters, iter.max=100, nstart=30)$cluster
  sil = get.clustering.silhouette(list(t(sample.rep)), cur.clustering)
  sils = c(sils, sil)
  clustering.per.num.clusters[[num.clusters - 1]] = cur.clustering
}
cca.clustering = clustering.per.num.clusters[[which.min(sils)]]
```

I don't understand the last line of this code: why did you choose the minimum average silhouette width? I thought that the higher the silhouette value, the better the clustering. Shouldn't it be which.max(sils) instead?
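For comparison, here is a minimal sketch of the selection loop using the maximum average silhouette width. The definition of `get.clustering.silhouette` is not shown in this issue, so the sketch assumes it returns an average silhouette width and substitutes `cluster::silhouette()` on Euclidean distances between the rows of `sample.rep`. The function name `choose.clustering.by.silhouette` is hypothetical.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Hypothetical helper: run k-means for k = 2..max.num.clusters and keep the
# clustering with the LARGEST average silhouette width.
# Assumption: Euclidean distances on sample.rep stand in for whatever
# get.clustering.silhouette() computes in the benchmark.
choose.clustering.by.silhouette <- function(sample.rep, max.num.clusters,
                                            nstart = 30) {
  d <- dist(sample.rep)
  sils <- numeric(0)
  clustering.per.num.clusters <- list()
  for (num.clusters in 2:max.num.clusters) {
    cur.clustering <- kmeans(sample.rep, num.clusters, iter.max = 100,
                             nstart = nstart)$cluster
    sil <- silhouette(cur.clustering, d)
    sils <- c(sils, mean(sil[, "sil_width"]))
    clustering.per.num.clusters[[num.clusters - 1]] <- cur.clustering
  }
  best <- which.max(sils)  # higher average silhouette = better separated
  list(clustering = clustering.per.num.clusters[[best]],
       num.clusters = best + 1,
       sils = sils)
}
```

On data with well-separated groups, this version should recover the true number of groups, whereas `which.min` would tend to pick one of the poorest partitions.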

Finally, my last question is about the choice of removing some samples from the datasets:

```r
filter.non.tumor.samples <- function(raw.datum, only.primary=only.primary) {
  # 01 is primary, 06 is metastatic, 03 is blood derived cancer
  if (!only.primary)
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01', '03', '06')])
  else
    return(raw.datum[,substring(colnames(raw.datum), 14, 15) %in% c('01')])
}
```

Why did you choose to keep only primary tumors for some cancers and discard other sample types, such as metastatic or recurrent tumors? Would it be coherent to discard only "normal" samples and keep the sample-type information (i.e. not run the fix.patient.names function), so that the clusters also take this information into account?
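As an illustration of the alternative, here is a minimal sketch that drops only normal and control samples and keeps the sample-type code alongside the data, so it can be cross-tabulated with the clusters afterwards. It assumes TCGA barcodes as column names, with the sample-type code at characters 14-15 (as in the function above) and the TCGA convention that codes 01-09 are tumor, 10-19 normal and 20-29 control. The name `filter.normal.samples` is hypothetical and the barcodes in the usage example are made up.

```r
# Hypothetical helper: keep every tumor sample type (codes 01-09), drop
# normal (10-19) and control (20-29) samples, and return the sample-type
# codes of the kept columns for later inspection.
filter.normal.samples <- function(raw.datum) {
  sample.type <- substring(colnames(raw.datum), 14, 15)
  # Malformed codes become NA in as.integer() and are dropped by %in%.
  is.tumor <- suppressWarnings(as.integer(sample.type)) %in% 1:9
  list(data = raw.datum[, is.tumor, drop = FALSE],
       sample.type = sample.type[is.tumor])
}
```

After clustering, `table(res$sample.type, clustering)` would show whether, for example, metastatic samples concentrate in particular clusters. Note that a patient contributing both a primary and a metastatic sample would then appear twice.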

I hope my questions are clear,
Thank you in advance!
Galadriel
