---
title: "SUMMIT 'basket' trial biomarker genomic report"
author: "Manuel Duval / GeneCreek"
date: "Nov. 04th, 2017"
output: 
  html_document:
    theme: spacelab
    highlight: espresso
    code_folding: hide
    toc: true
    toc_depth: 3
    toc_float: true
---
version#: `r Sys.time()`  

# Scopes  

**Describes ErbB2 somatic variations via lolliplots.**  
**Reports co-occurrence of somatic variations in oncogenes and tumor suppressor genes with oncoplots.**  
**Renders change from baseline and PFS via waterfall and swim plots aligned with oncoplots.**  

# Data   
**Biomarker log data cut**   

## Methods  
- The report is generated in R, the environment for statistical computing and graphics.  
- The R Markdown script, referred to as SUMMITBioMarkerGenomicReport.Rmd, reads the ErbB2 genotype data as well as the genotypes of MSK-IMPACT and non-MSK-IMPACT targeted genes from the Excel input file SUMMITBiomarkerLogDataCut20-OCT-2017.xlsx.  
- In addition to general-purpose packages such as dplyr and DT, the two main specialized packages used by the script are [trackViewer](https://bioconductor.org/packages/release/bioc/html/trackViewer.html) and [ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html), which generate the lollipop plots and the oncoplots respectively. Detailed package version information is reported at the end of the document.  

## New (11/03/2017)
- Oct. 20 data cut.  

## Next 
- Once fully parsed, render the list of variants (a QC step).   

<hr />

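```{r, echo = FALSE, warning = FALSE}
library(knitr)
opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)

#load the libjvm library required by the Java-based XLConnect package;
#the path is machine-specific, so it is loaded only when the file is present
libjvm <- '/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/lib/server/libjvm.dylib'
if (file.exists(libjvm)) dyn.load(libjvm)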
```


```{r chunk_packages, echo = TRUE, results = 'hide', message = FALSE, warning = FALSE}
#Dependencies.  
sapply(c("XLConnect", "XLConnectJars", "knitr", "ggplot2", "GenomicRanges", "rtracklayer", "trackViewer", "plyr", "dplyr", "DT", "stringr", "tidyr", "magrittr","ComplexHeatmap", "GetoptLong", "gridExtra", "cowplot", "grid"), library, character.only = TRUE)
```


```{r chunk_DataLoad, echo = TRUE, results = 'hide', message = FALSE, warning = FALSE,  tidy = TRUE}
#Loading data and setting variables.  

#loading the workbook
SummitBioMarker0818 <- loadWorkbook(file.path(getwd(), "data", "SUMMITBiomarkerLogDataCut20-OCT-2017.xlsx"))

###~~~~setting some global variables

#instantiating the Erbb2Features GRanges object, which holds the ErbB2 protein domains drawn under the lolliplots. 
Erbb2Features <- GRanges("ERBB2", IRanges(c(1, 652, 675, 720, 1003),  width = c(651, 23, 45, 285, 405), names = c("Extracellular", "TM", "JM", "Kinase", "Tail")))
Erbb2Features$fill <- c("bisque", "coral2", "darkseagreen1","skyblue", "chartreuse2")
Erbb2Features$height <- c(0.02, 0.05, 0.035, 0.04, 0.03)

#Oct. 3rd, 2017: adjusting height of the ERBB2 domain in the context of enlarging the lolliplot
Erbb2Features2 <- Erbb2Features
Erbb2Features2$height <- c(0.01, 0.025, 0.017, 0.02, 0.015)

#instantiating the GRanges object that holds the ErbB2 kinase domain only; the name is set on the IRanges, as for Erbb2Features. 
Erbb2KinaseFeature <- GRanges("ERBB2", IRanges(c(1), width = c(200), names = c("Kinase")))
Erbb2KinaseFeature$fill <- c("skyblue")
Erbb2KinaseFeature$height <- c(0.05)

#setting the Best Overall Response legend for the lolliplots
response <- c("ND", "PD", "SD", "PR", "CR")
response.color.set <- as.list(as.data.frame(rbind(c("gray", "black", "yellow", "green", "red"), "#FFFFFFFF"), stringsAsFactors = FALSE))
names(response.color.set) <- response
###~~~~

#drawing functions for the oncoplot: one function per alteration type, plus the background cell
alter_fun = list(
    background = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "#CCCCCC", col = NA))
    },
    Missense = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "green", col = NA))
    },
    DeepDel = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "blue", col = NA))
    },
    AMP = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "red", col = NA))
    },
    Promoter = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "purple", col = NA))
    },
    Nonsense = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "black", col = NA))
    },    
    MissenseAmp = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h*0.33, gp = gpar(fill = "cyan", col = NA))
    },
    fusionAmp = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h*0.33, gp = gpar(fill = "cadetblue1", col = NA))
    },
    splice = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "coral3", col = NA))
    },
    Gain = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "darksalmon", col = NA))
    }, 
    rearrangement = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.8, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "khaki", col = NA))
    }, 
    Indel = function(x, y, w, h) {
        grid.rect(x, y, w-unit(0.5, "mm"), h-unit(0.1, "mm"), gp = gpar(fill = "darkgreen", col = NA))
    }
)
col = c("Missense" = "green", "AMP" = "red", "Promoter" = "purple", "DeepDel" = "blue", "Nonsense" = "black", "MissenseAmp" = "cyan", "Indel" = "darkgreen", "splice" = "coral3", "Gain" = "darksalmon", "rearrangement" = "khaki", "fusionAmp" = "cadetblue1")
```
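The global variables set above can be checked with a few lines of base R. The following is a minimal standalone sketch: the `domains` and `color_set` names are hypothetical, and the values are copied from the chunk above. It converts each domain's start and width into an end coordinate and rebuilds the response colour list in the same way as `response.color.set`.

```r
# Domain layout copied from Erbb2Features: end = start + width - 1.
domains <- data.frame(
  name  = c("Extracellular", "TM", "JM", "Kinase", "Tail"),
  start = c(1, 652, 675, 720, 1003),
  width = c(651, 23, 45, 285, 405)
)
domains$end <- domains$start + domains$width - 1

# A domain overlaps its predecessor when it starts at or before the previous end.
domains$overlaps_previous <- c(FALSE, domains$start[-1] <= domains$end[-nrow(domains)])
print(domains)

# Same construction as response.color.set: one element per response code,
# each holding the fill colour followed by white ("#FFFFFFFF").
response  <- c("ND", "PD", "SD", "PR", "CR")
color_set <- as.list(as.data.frame(
  rbind(c("gray", "black", "yellow", "green", "red"), "#FFFFFFFF"),
  stringsAsFactors = FALSE
))
names(color_set) <- response
color_set$PR
```

With these values the check flags the Tail range, which starts at 1003 while the Kinase range ends at 1004. Whether that two-residue overlap is intended is not stated in the script.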


# Biliary  
## Distributions of ErbB2 somatic variants w.r.t. clinical endpoints.
```{r chunk_biliary, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading biliary enrolment data set
biliary <- readWorksheet(SummitBioMarker0818, sheet = "Biliary tract", endCol = 17, endRow = 14)
biliary[biliary == ""]  <- NA
#tabular views
kable(t(as.matrix(table(biliary$Mutation.code))), caption = "ErbB2 somatic variant occurrences")
kable(table(biliary$Mutation.code, biliary$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(biliary$Mutation.code, biliary$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
biliary$Best.Overall.Response <- as.factor(str_sub(biliary$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
biliary$Best.Overall.Response <- factor(biliary$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
#reporting it via a table
BestOverallResPerVar <-  as.data.frame.matrix(table(biliary$Mutation.code, biliary$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(biliary$Mutation.code, biliary$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#biliary$Clinical.Benefit <- revalue(biliary$Clinical.Benefit, c("pending" = "NA"))
```
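The tabulation above relies on truncating the response strings to a two-letter code and turning them into an ordered factor. A minimal base R sketch of that step follows; the toy values and the names `raw_response`, `mutation`, `bor` and `tab` are assumptions for illustration, not trial data, and `substr()` stands in for `str_sub()`.

```r
# Toy response strings and mutation codes (assumed values, not trial data).
raw_response <- c("PR (confirmed)", "SD", "PD", NA, "CR", "SD")
mutation     <- c("S310F", "S310F", "L755S", "V777L", "L755S", "V777L")

# Keep the first two characters, as str_sub(x, 1, 2) does in the chunk above.
code <- substr(raw_response, 1, 2)

# Ordered factor with the levels CR, PR, SD, PD, NA.
# A missing value stays NA; it is not the same thing as the level "NA".
bor <- factor(code, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)

# Variant-by-response contingency table, one column per level (empty ones included).
tab <- as.data.frame.matrix(table(mutation, bor))
tab
```

In this sketch the subject with a missing response is left out of the table, and the `"NA"` column stays at zero because no string equals the literal text `"NA"`.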
<hr /> 

## Lolliplot: ErbB2 somatic variants annotated with Best Overall Response values  
```{r chunk_biliarySNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa sequence
biliary$ErBb2SomaticVarCoord <- as.numeric(str_extract(biliary$Mutation.code, "\\d+"))
#adding two additional variables: the wild type aa and the substituted aa
biliary$ErBb2WildTypeAA <- str_sub(biliary$Mutation.code, 1, 1)
biliary$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", biliary$Mutation.code, perl = TRUE)

#reordering the dataframe according to the aa coordinate
biliary <- biliary[order(biliary$ErBb2SomaticVarCoord),]

biliaryDf1 <- data.frame(biliary$ErBb2SomaticVarCoord, paste0(biliary$ErBb2WildTypeAA, biliary$ErBb2SomaticVarCoord), biliary$ErBb2SubsAA, biliary$Best.Overall.Response)
#the biliaryDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(biliaryDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
biliaryDf2 <- as.data.frame(unique(biliaryDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
biliaryDf2$SNV <- paste0(biliaryDf2$WTaaCoord, biliaryDf2$Varalleles)
#adding it to the initial biliaryDf1 df
biliaryDf1 <- merge(biliaryDf1, biliaryDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
biliaryDf1$VarRes <- paste0(biliaryDf1$Response, biliaryDf1$SNV)

# replacing NA with the 'ND' string value for Not Determined
biliaryDf1$Response <- gsub("NA", "ND", biliaryDf1$Response)

# prefix the clinical response values with a letter so that they sort alphabetically as ND, PD, SD, PR, CR
biliaryDf1$Response <- str_replace_all(str_c(biliaryDf1$Response), c(CR = "E-CR", PR = "D-PR", SD = "C-SD", PD = "B-PD", ND = "A-ND"))
# order the df
biliaryDf1 <- biliaryDf1[order(biliaryDf1$coord, biliaryDf1$Response), ]

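#keeping a copy of the per-subject rows (before aggregation), used below to build the stack.factor variable
biliaryDf3 <- biliaryDf1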

biliaryDf1 <- unique(ddply(biliaryDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
biliaryDf1 <- biliaryDf1[order(biliaryDf1$coord, biliaryDf1$VarRes),]

#instantiating the GRanges object with the coordinates and names of somatic variations
biliary.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(biliaryDf1)[1], function(i) rep(biliaryDf1$coord[i], biliaryDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(biliaryDf1)[1], function(i) rep(biliaryDf1$SNV[i], biliaryDf1$score[i])))))

#adding the stack.factor attribute
biliary.gr$stack.factor <- unlist(sapply(1:dim(biliaryDf1)[1], function(i) paste0(biliaryDf1$Response[i], "_", seq(1:biliaryDf1$score[i]))))
biliary.gr$value1 <- 100
biliary.gr$value2 <- 100 - biliary.gr$value1

biliary.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", biliary.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

#editing the gr object's stack.factor variable
biliaryDf3 <- ddply(biliaryDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
biliaryDf3 <- biliaryDf3[order(biliaryDf3$coord, biliaryDf3$Response),]

biliaryDf3$idx <- str_replace_all(str_c(biliaryDf3$idx), c(`1` = "A", `2` = "B", `3` = "C", `4` = "D"))
biliary.gr$stack.factor <- as.character(biliaryDf3$idx)

#plotting
lolliplot(biliary.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7)
```
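The parsing of the mutation codes in the chunk above can be reproduced on a few toy codes with base R only. This is a minimal sketch: the codes and the names `codes`, `parsed` and `labels` are assumptions, and `regmatches()`/`sub()` stand in for the stringr calls.

```r
# Toy mutation codes (assumed), including one deletion-insertion.
codes <- c("S310F", "L755S", "S310Y", "L755_E757delinsS")

# First run of digits = residue coordinate.
coord <- as.numeric(regmatches(codes, regexpr("[0-9]+", codes)))
# First character = wild-type amino acid.
wt <- substr(codes, 1, 1)
# Everything after "<letter><digits>" = substituted residue(s).
subs <- sub("^\\w[0-9]+", "", codes)

parsed <- data.frame(codes, coord, wt, subs, stringsAsFactors = FALSE)
parsed

# Single-residue substitutions observed at the same position are joined
# with "/" so that one label describes the whole lollipop.
simple <- nchar(subs) == 1
labels <- tapply(subs[simple], paste0(wt, coord)[simple], paste, collapse = "/")
labels
```

A code such as `L755_E757delinsS` does not fit the single-substitution pattern: the generic regular expression leaves `_E757delinsS` as the "substituted" part, which is why the Breast mono section relabels that variant by hand.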


## Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkBiliaryOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7, fig.height = 6.2}
#load co-occurring somatic variants
biliarySomatics <- readWorksheet(SummitBioMarker0818, sheet = "Biliary tract", endCol = 7, startRow = 19,  endRow = 111)
#Nov. 3: filling blank subject identifiers with the last non-blank value above (tidyr::fill).
biliarySomatics  <- biliarySomatics  %>% fill(Subject.Identifier.for.the.Study)
#IMPACT data
biliarySomaticsImpact <- biliarySomatics[(biliarySomatics$Gene != "not done"),]
#subsetting
biliarySomaticsImpact <- biliarySomaticsImpact[,c(1,3,4)]
#filtering out duplicated rows
biliarySomaticsImpact <- unique(biliarySomaticsImpact)
#set Subject.id to character
biliarySomaticsImpact$Subject.Identifier.for.the.Study <- as.character(biliarySomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
biliarySomaticsImpact <- na.omit(biliarySomaticsImpact)
#concatenating the variation types whenever several occur in a given gene for a given subject
biliarySomaticsImpact <- biliarySomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(biliarySomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
biliarySomaticsNonImpact <- biliarySomatics[(biliarySomatics$Gene == "not done"),]
biliarySomaticsNonImpact <- biliarySomaticsNonImpact[,c(1,6,7)]
biliarySomaticsNonImpact <- unique(biliarySomaticsNonImpact)
biliarySomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(biliarySomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
biliarySomaticsNonImpact <- na.omit(biliarySomaticsNonImpact)
biliarySomaticsNonImpact <- biliarySomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(biliarySomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for biliary tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
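The reshaping from one row per call to a gene-by-subject matrix, followed by the `get_type` splitter, can be sketched in base R. The toy data and the names `calls` and `mat_toy` are assumptions, and `tapply()` stands in for the `group_by()`/`do()`/`spread()` pipeline. The sketch joins multiple alterations with `";"`, which is also an assumption.

```r
# Toy long-format calls (assumed values): one row per subject/gene/alteration.
calls <- data.frame(
  subject    = c("101", "101", "101", "102", "103*"),
  gene       = c("TP53", "ERBB2", "ERBB2", "TP53", "PIK3CA"),
  alteration = c("Missense", "Missense", "AMP", "Nonsense", "Missense"),
  stringsAsFactors = FALSE
)

# Gene x subject matrix; several alterations in one cell are joined with ";"
# (an assumption of this sketch) so that the splitter below can separate them.
mat_toy <- tapply(calls$alteration, list(calls$gene, calls$subject),
                  paste, collapse = ";")
mat_toy

# Same splitter as the get_type argument passed to oncoPrint above.
get_type <- function(x) strsplit(x, ";")[[1]]
get_type(mat_toy["ERBB2", "101"])
```

The chunk above concatenates with `collapse = ''` instead. That splits correctly only if each Alteration value in the workbook already ends with a `;` separator, which cannot be verified from the script alone.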


## Waterfall and swim plots with oncoplot
```{r BiliaryWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align='left', fig.height = 4, fig.width = 5.3}
#Waterfalls
biliaryWF <- biliary[,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT or non-IMPACT genotype data
biliaryWF <- biliaryWF[(biliaryWF$Subject.Identifier.for.the.Study %in% intersect(biliaryWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

#biliaryWF <- biliaryWF[complete.cases(biliaryWF$Percentage.Change.of.Tumor.Measurement),]
biliaryWF <- biliaryWF[order(-biliaryWF$Percentage.Change.of.Tumor.Measurement),]

biliaryWF$Subject.Identifier.for.the.Study <- as.character(biliaryWF$Subject.Identifier.for.the.Study)

biliaryWF$Best.Overall.Response <- factor(biliaryWF$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"))

g1 <- ggplot(biliaryWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 5), legend.title = element_text(size = 7), legend.position = "top", plot.margin = unit(c(0,0,0,1.6), "cm"))

#+ scale_y_continuous(position = "right")

g2 <- ggplot(biliaryWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill = "none") 

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r BiliaryOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width= 6.9, fig.height = 6.0, fig.align = 'center'}
#subjects without waterfall data (e.g. missing %change of tumor measurement) are dropped from the mat matrix.
d1 <- setdiff(gsub("\\*", "", colnames(mat)), biliaryWF$Subject.Identifier.for.the.Study)
#exact match on the identifier without its "*" suffix; a regular expression built from d1 would match every column when d1 is empty
mat2 <- mat[, !(gsub("\\*", "", colnames(mat)) %in% d1), drop = FALSE]
#creating a df for the purpose of reordering the mat object passed to the oncoPrint function
df1 <- biliaryWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat2))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)


oncoPrint(mat2[c(as.character(df3$subjid))], get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
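Aligning the oncoplot columns with the waterfall bars comes down to ordering the matrix columns by the waterfall order while ignoring the `*` suffix. Here is a minimal base R sketch with toy values. The names `wf`, `m`, `bare` and `m_ordered` are hypothetical, and the sketch assumes that every subject in the waterfall table has a column in the matrix.

```r
# Toy waterfall table (assumed values) and a toy alteration matrix whose
# column names carry a "*" suffix for non-IMPACT subjects.
wf <- data.frame(
  id  = c("12", "112", "7"),
  pct = c(-30, 15, -60),
  stringsAsFactors = FALSE
)
m <- matrix(c("Missense", NA, "AMP", NA, "Nonsense", NA, NA, "Gain"), nrow = 2,
            dimnames = list(c("ERBB2", "TP53"), c("7", "12*", "112", "999")))

# Waterfall order: largest increase first, as in reorder(id, -pct).
wf <- wf[order(-wf$pct), ]

# Match on the bare identifier (exact match, not a regular expression),
# which drops "999" and cannot confuse "12" with "112".
bare <- sub("\\*$", "", colnames(m))
m_ordered <- m[, match(wf$id, bare), drop = FALSE]
colnames(m_ordered)
```

Exact matching is the design choice worth noting: a pattern such as `"12"` used as a regular expression would also hit `"112"`.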


<hr />
<hr />  

# Bladder  
## Distributions of ErbB2 somatic variants w.r.t. clinical endpoints.
```{r bladderDf, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading bladder enrolment data set
bladder <- readWorksheet(SummitBioMarker0818, sheet = "bladder", endCol = 17, endRow = 17)
bladder[bladder == ""]  <- NA
#tabular views
kable(t(as.matrix(table(bladder$Mutation.code))), caption = "ErbB2 somatic variant occurrences")
kable(table(bladder$Mutation.code, bladder$Cancer.type), caption = "ErbB2 mutation by Cancer Type")
kable(table(bladder$Mutation.code, bladder$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
bladder$Best.Overall.Response <- as.factor(str_sub(bladder$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
bladder$Best.Overall.Response <- factor(bladder$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(bladder$Mutation.code, bladder$Best.Overall.Response))

datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(bladder$Mutation.code, bladder$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

bladder$Clinical.Benefit <- revalue(bladder$Clinical.Benefit, c("pending"="N/A"))

bladder$Clinical.Benefit <- factor(bladder$Clinical.Benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)

```
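The recoding of the Clinical.Benefit variable at the end of the chunk above has a direct base R equivalent. This minimal sketch uses toy values (assumed), and the name `benefit` is hypothetical.

```r
# Toy clinical-benefit values (assumed).
benefit <- c("YES", "pending", "NO", "YES")

# Base R equivalent of plyr::revalue(x, c("pending" = "N/A")).
benefit[benefit == "pending"] <- "N/A"

# Ordered factor so that tables and legends list YES, NO, N/A in that order.
benefit <- factor(benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)
table(benefit)
```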
<hr /> 

## Lolliplot: Annotated ErbB2 somatic variants
```{r bladderLolliplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
bladder$ErBb2SomaticVarCoord <- as.numeric(str_extract(bladder$Mutation.code, "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
bladder$ErBb2WildTypeAA <- str_sub(bladder$Mutation.code, 1, 1)
bladder$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", bladder$Mutation.code, perl = T)

# reordering the dataframe according to the aa coordinate
bladder <- bladder[order(bladder$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
bladderDf1 <- data.frame(bladder$ErBb2SomaticVarCoord, paste0(bladder$ErBb2WildTypeAA, bladder$ErBb2SomaticVarCoord), bladder$ErBb2SubsAA, bladder$Best.Overall.Response)

#the bladderDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(bladderDf1) <- c("coord", "WTaaCoord", "Var", "Response")

#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
bladderDf2 <- as.data.frame(unique(bladderDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))

#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
bladderDf2$SNV <- paste0(bladderDf2$WTaaCoord, bladderDf2$Varalleles)

#adding it to the initial bladderDf1 df
bladderDf1 <- merge(bladderDf1, bladderDf2[,-c(2,3)], by = "WTaaCoord")

#adding one attribute: concatenation of the clinical response and the variant
bladderDf1$VarRes <- paste0(bladderDf1$Response, bladderDf1$SNV)

# replacing NA with the 'ND' string value for Not Determined
bladderDf1$Response <- gsub("NA", "ND",bladderDf1$Response)

#prefix the clinical response values with a letter so that they sort alphabetically as ND, PD, SD, PR, CR
bladderDf1$Response <- str_replace_all(str_c(bladderDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
bladderDf1 <- bladderDf1[order(bladderDf1$coord, bladderDf1$Response),]

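#keeping a copy of the per-subject rows (before aggregation), used below to build the stack.factor variable
bladderDf3 <- bladderDf1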

bladderDf1 <- unique(ddply(bladderDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
bladderDf1 <- bladderDf1[order(bladderDf1$coord, bladderDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
bladder.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(bladderDf1)[1], function(i) rep(bladderDf1$coord[i], bladderDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(bladderDf1)[1], function(i) rep(bladderDf1$SNV[i], bladderDf1$score[i])))))

#adding the stack.factor attribute
bladder.gr$stack.factor <- unlist(sapply(1:dim(bladderDf1)[1], function(i) paste0(bladderDf1$Response[i], "_", seq(1:bladderDf1$score[i]))))
bladder.gr$value1 <- 100
bladder.gr$value2 <- 100 - bladder.gr$value1

bladder.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", bladder.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

bladderDf3 <- ddply(bladderDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
bladderDf3 <- bladderDf3[order(bladderDf3$coord, bladderDf3$Response),]
bladderDf3$idx <- str_replace_all(str_c(bladderDf3$idx), c("10" = "J", "11" = "K", "1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E", "6" = "F", "7" = "G", "8" = "H", "9" = "I"))
bladder.gr$stack.factor <- as.character(bladderDf3$idx)

#plotting
lolliplot(bladder.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7)
```
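In the chunk above the replacement vector lists `"10"` and `"11"` before the single digits. The sketch below shows why that order matters, assuming the replacements are applied one pattern after another. It uses base R only: the helper `replace_seq()` is hypothetical and mimics that sequential behaviour with literal `gsub()` calls.

```r
idx <- c("1", "2", "10", "11")

# Sequential literal replacement: each pattern is applied to the result
# of the previous one.
replace_seq <- function(x, map) {
  for (pattern in names(map)) x <- gsub(pattern, map[[pattern]], x, fixed = TRUE)
  x
}

# Two-digit patterns first: "10" and "11" are consumed before "1" is tried.
good <- replace_seq(idx, c("10" = "J", "11" = "K", "1" = "A", "2" = "B"))
# One-digit patterns first: "10" becomes "A0" and "11" becomes "AA".
bad <- replace_seq(idx, c("1" = "A", "2" = "B", "10" = "J", "11" = "K"))

# Order-independent alternative: index the built-in LETTERS constant.
via_letters <- LETTERS[as.integer(idx)]
```

Indexing `LETTERS` avoids the ordering concern altogether and extends to any index up to 26.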

## Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkBladderOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7.1, fig.height = 10.4}
#load co-occurring somatic variants
bladderSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Bladder", endCol = 7, startRow = 21,  endRow = 270)
#IMPACT data
bladderSomaticsImpact <- bladderSomatics[(bladderSomatics$Gene != "not done"),]
#subsetting
bladderSomaticsImpact <- bladderSomaticsImpact[,c(1,3,4)]
#filtering out duplicated rows
bladderSomaticsImpact <- unique(bladderSomaticsImpact)
#set Subject.id to character
bladderSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(bladderSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
bladderSomaticsImpact <- na.omit(bladderSomaticsImpact)
#concatenating the variation types whenever several occur in a given gene for a given subject
bladderSomaticsImpact <- bladderSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(bladderSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
bladderSomaticsNonImpact <- bladderSomatics[(bladderSomatics$Gene == "not done"),]
bladderSomaticsNonImpact <- bladderSomaticsNonImpact[,c(1,6,7)]
bladderSomaticsNonImpact <- unique(bladderSomaticsNonImpact)
bladderSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(bladderSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
bladderSomaticsNonImpact <- na.omit(bladderSomaticsNonImpact)
bladderSomaticsNonImpact <- bladderSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(bladderSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for bladder tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```

## Waterfall and swim plots with oncoplot
```{r BladderWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align='left', fig.height= 4, fig.width= 4.8}
#Waterfalls
bladderWF <- bladder[,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT and Non-IMPACT genotype data
bladderWF <- bladderWF[(bladderWF$Subject.Identifier.for.the.Study %in% intersect(bladderWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

#bladderWF <- bladderWF[complete.cases(bladderWF$Percentage.Change.of.Tumor.Measurement),]
bladderWF <- bladderWF[order(-bladderWF$Percentage.Change.of.Tumor.Measurement),]

bladderWF$Subject.Identifier.for.the.Study <- as.character(bladderWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(bladderWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top")

g2 <- ggplot(bladderWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.position = "bottom") + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r bladderOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width= 6.8, fig.height = 10, fig.align = 'center'}
#subjects without waterfall data (e.g. missing %change of tumor measurement) are dropped from the mat matrix.
d1 <- setdiff(gsub("\\*", "", colnames(mat)), bladderWF$Subject.Identifier.for.the.Study)
#exact match on the identifier without its "*" suffix; a regular expression built from d1 would match every column when d1 is empty
mat2 <- mat[, !(gsub("\\*", "", colnames(mat)) %in% d1), drop = FALSE]
#creating a df for the purpose of reordering the mat object passed to the oncoPrint function
df1 <- bladderWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat2))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat2 <- mat2[c(as.character(df3$subjid))]
mat2 <- mat2[rowSums(is.na(mat2)) < dim(mat2)[2],]
oncoPrint(mat2, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
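Once columns have been removed, some genes may have no alteration left in any remaining subject. The `rowSums(is.na(...))` filter in the chunk above drops those rows. A minimal base R sketch with toy values follows; the names `m` and `kept` are hypothetical.

```r
# Toy gene x subject matrix (assumed values); NA means "no alteration reported".
m <- matrix(c("Missense", NA, NA,
              NA,         NA, NA,
              "AMP",      NA, "Gain"),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("ERBB2", "KRAS", "TP53"), c("7", "12*", "112")))

# A row is kept when it has fewer NAs than there are columns,
# i.e. when at least one subject carries an alteration in that gene.
kept <- m[rowSums(is.na(m)) < ncol(m), , drop = FALSE]
rownames(kept)
```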
<hr />
<hr />

# Breast mono 
## Distributions of ErbB2 somatic variants w.r.t. clinical endpoints.
```{r breast_mono, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading Breast mono enrolment data set
Breast_mono <- readWorksheet(SummitBioMarker0818, sheet = "Breast mono", endCol = 18, endRow = 34)
Breast_mono[Breast_mono == ""]  <- NA
#tabular views
kable(t(as.matrix(table(Breast_mono$Mutation.code))), caption = "ErbB2 somatic variant occurrences")
kable(table(Breast_mono$Mutation.code, Breast_mono$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(Breast_mono$Mutation.code, Breast_mono$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
Breast_mono$Best.Overall.Response <- as.factor(str_sub(Breast_mono$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
Breast_mono$Best.Overall.Response <- factor(Breast_mono$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(Breast_mono$Mutation.code, Breast_mono$Best.Overall.Response))
#kable(table(Breast_mono$Mutation.code, Breast_mono$Best.Overall.Response), caption = "ErbB2 mutation by Best Overall Response")
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(Breast_mono$Mutation.code, Breast_mono$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#Breast_mono$Clinical.Benefit <- revalue(Breast_mono$Clinical.Benefit, c("pending"="N/A"))
```
<hr /> 

## Lolliplot: ErbB2 somatic variants annotated with Best Overall Response values  
```{r Breast_monoSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.height = 10, fig.width = 10}
#adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa sequence
Breast_mono$ErBb2SomaticVarCoord <- as.numeric(str_extract(Breast_mono$Mutation.code, "\\d+"))
#adding two additional variables: the wild type aa and the substituted aa
Breast_mono$ErBb2WildTypeAA <- str_sub(Breast_mono$Mutation.code, 1, 1)
Breast_mono$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", Breast_mono$Mutation.code, perl = TRUE)

#reordering the dataframe according to the aa coordinate
Breast_mono <- Breast_mono[order(Breast_mono$ErBb2SomaticVarCoord),]

#setting the L755_E757delinsS variant to a neighbor coordinate for lolliplot rendering
Breast_mono[13, 19] <- 754
#for Subject.Identifier.for.the.Study #1489 with 2 variants, set the 2nd one to a neighbor coordinate for lolliplot rendering
Breast_mono[27, 19] <- 779

#creating the data structure for the GRanges() GenomicRanges package function
Breast_monoDf1 <- data.frame(Breast_mono$ErBb2SomaticVarCoord, paste0(Breast_mono$ErBb2WildTypeAA, Breast_mono$ErBb2SomaticVarCoord), Breast_mono$ErBb2SubsAA, Breast_mono$Best.Overall.Response)
#the Breast_monoDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(Breast_monoDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
Breast_monoDf2 <- as.data.frame(unique(Breast_monoDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
Breast_monoDf2$SNV <- paste0(Breast_monoDf2$WTaaCoord, Breast_monoDf2$Varalleles)
#adding it to the initial Breast_monoDf1 df
Breast_monoDf1 <- merge(Breast_monoDf1, Breast_monoDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
Breast_monoDf1$VarRes <- paste0(Breast_monoDf1$Response, Breast_monoDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
Breast_monoDf1$Response <- gsub("NA", "ND",Breast_monoDf1$Response)
#set order for clinical response value to bottom CR, PR, SD, PD, ND
Breast_monoDf1$Response <- str_replace_all(str_c(Breast_monoDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
Breast_monoDf1 <- Breast_monoDf1[order(Breast_monoDf1$coord, Breast_monoDf1$Response),]

#duplicated the Breast_monoDf1 Df object for later plotting on the kinase domain 
Breast_monoDf1_2 <- Breast_monoDf1
##
Breast_monoDf3 <- Breast_monoDf1
##

Breast_monoDf1 <- unique(ddply(Breast_monoDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
Breast_monoDf1 <- Breast_monoDf1[order(Breast_monoDf1$coord, Breast_monoDf1$Response),]

#restore the full labels of the two variants whose coordinates were shifted above for lolliplot rendering
Breast_monoDf1[5, 4] <- "L755_E757delinsS"
Breast_monoDf1[20, 4] <- "V777L and D769Y"


#instantiating the GRanges object with the coordinates and names of somatic variations
Breast_mono.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(Breast_monoDf1)[1], function(i) rep(Breast_monoDf1$coord[i], Breast_monoDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(Breast_monoDf1)[1], function(i) rep(Breast_monoDf1$SNV[i], Breast_monoDf1$score[i])))))

#adding the stack.factor attribute
Breast_mono.gr$stack.factor <- unlist(sapply(1:dim(Breast_monoDf1)[1], function(i) paste0(Breast_monoDf1$Response[i], "_", seq(1:Breast_monoDf1$score[i]))))
Breast_mono.gr$value1 <- 100
Breast_mono.gr$value2 <- 100 - Breast_mono.gr$value1

Breast_mono.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", Breast_mono.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

Breast_monoDf3 <- ddply(Breast_monoDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
Breast_monoDf3 <- Breast_monoDf3[order(Breast_monoDf3$coord, Breast_monoDf3$Response),]
Breast_monoDf3$idx <- str_replace_all(str_c(Breast_monoDf3$idx), c("10" = "J", "1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E", "6" = "F", "7" = "G", "8" = "H", "9" = "I"))
Breast_mono.gr$stack.factor <- as.character(Breast_monoDf3$idx)

Breast_monoDf3$SNPsideID <- c(rep("top", 4), rep("bottom", 1), rep("top", 9), rep("bottom", 3), rep("top", 3), rep("bottom", 1), rep("top", 5), rep("bottom", 1), rep("top", 1), rep("bottom", 1), rep("top", 1), rep("bottom", 1), rep("top", 2))

Breast_mono.gr$SNPsideID <- Breast_monoDf3$SNPsideID

#plotting
lolliplot(Breast_mono.gr, Erbb2Features2, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7)
```
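
The parsing step at the top of the chunk above can be checked on a few codes in isolation. This is a minimal sketch, assuming single-letter wild-type residues; the `codes` vector is illustrative (variant names quoted in this report), it is not read from the data cut, and `parsed` is a hypothetical name.

```r
library(stringr)

# illustrative mutation codes, all quoted elsewhere in this report
codes <- c("L755S", "V777L", "D769Y", "L755_E757delinsS")

parsed <- data.frame(
  code  = codes,
  # first run of digits = amino-acid coordinate
  coord = as.numeric(str_extract(codes, "\\d+")),
  # first character = wild-type residue
  wt    = str_sub(codes, 1, 1),
  # whatever follows "<letter><digits>" = substituted residue(s)
  subs  = gsub("^\\w{1}\\d+", "", codes, perl = TRUE),
  stringsAsFactors = FALSE
)
parsed
```

For the indel the remainder is `_E757delinsS` rather than a single residue, which is consistent with its label being set by hand in the chunk.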


##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r BreastMonoOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7.2, fig.height = 10.3}
#load co-occurring somatic variants
breast_monoSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Breast mono", endCol = 7, startRow = 38,  endRow = 302)
#IMPACT data
breast_monoSomaticsImpact <- breast_monoSomatics[(breast_monoSomatics$Gene != "not done"),]
#subsetting
breast_monoSomaticsImpact <- breast_monoSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
breast_monoSomaticsImpact <- unique(breast_monoSomaticsImpact)
#set Subject.id to character
breast_monoSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(breast_monoSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
breast_monoSomaticsImpact <- na.omit(breast_monoSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
breast_monoSomaticsImpact <- breast_monoSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(breast_monoSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
breast_monoSomaticsNonImpact <- breast_monoSomatics[(breast_monoSomatics$Gene == "not done"),]
breast_monoSomaticsNonImpact <- breast_monoSomaticsNonImpact[,c(1,6,7)]
breast_monoSomaticsNonImpact <- unique(breast_monoSomaticsNonImpact)
breast_monoSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(breast_monoSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
breast_monoSomaticsNonImpact <- na.omit(breast_monoSomaticsNonImpact)
breast_monoSomaticsNonImpact <- breast_monoSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(breast_monoSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]


oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for breast tumor (mono)", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
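
`oncoPrint()` receives each cell of `mat` as one string and relies on `get_type` to split it into alteration types. The sketch below runs the same splitter on hypothetical cell values. Since the chunk pastes alterations with `collapse = ''`, whether two alterations in one gene end up as separate types or as one fused label (such as the `MissenseAmp` legend entry) depends on whether the pasted values themselves contain `;`.

```r
# same splitter as the get_type argument passed to oncoPrint() above
get_type <- function(x) strsplit(x, ";")[[1]]

# hypothetical cell values, for illustration only
get_type("Missense;AMP;")  # separators present: two alteration types
get_type("MissenseAMP")    # no separator: handled as a single type
```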

##Waterfall and swim plots with oncoplot
```{r breast_monoWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align ='left', fig.height = 4, fig.width= 5.2}
#Waterfalls
breast_monoWF <- Breast_mono[-27, c(1, 12, 13, 17)]
#breast_monoWF <- Breast_mono[, c(1, 12, 13, 17)]
#selecting records for which there are IMPACT and Non-IMPACT genotype data
breast_monoWF <- breast_monoWF[(breast_monoWF$Subject.Identifier.for.the.Study %in% intersect(breast_monoWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

breast_monoWF1 <- breast_monoWF[complete.cases(breast_monoWF$Percentage.Change.of.Tumor.Measurement),]

#breast_monoWF <- breast_monoWF[order(-breast_monoWF$Percentage.Change.of.Tumor.Measurement),]

breast_monoWF1$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(breast_monoWF1$Percentage.Change.of.Tumor.Measurement))

breast_monoWF1$Progression.Free.Survival.Time..Months. <- as.numeric(as.character(breast_monoWF1$Progression.Free.Survival.Time..Months.))

breast_monoWF1 <- breast_monoWF1[order(-breast_monoWF1$Percentage.Change.of.Tumor.Measurement),]

breast_monoWF1$Subject.Identifier.for.the.Study <- as.character(breast_monoWF1$Subject.Identifier.for.the.Study)

breast_monoWF2 <- rbind(breast_monoWF1, breast_monoWF[is.na(breast_monoWF$Percentage.Change.of.Tumor.Measurement),])
breast_monoWF2$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(breast_monoWF2$Percentage.Change.of.Tumor.Measurement))

breast_monoWF2$Subject.Identifier.for.the.Study <- as.character(breast_monoWF2$Subject.Identifier.for.the.Study)

g1 <- ggplot(breast_monoWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-101, 101) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top") 

g2 <- ggplot(breast_monoWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill=FALSE)

plot_grid(g1, g2, nrow = 2, align = "v")
```
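
The chunk above wraps every conversion in `as.numeric(as.character(...))`. A short sketch with hypothetical values shows why this matters when a column has been read as a factor: `as.numeric()` alone returns the internal level codes.

```r
# hypothetical percentage changes stored as a factor
pct <- factor(c("10", "-30", "5"))

as.numeric(pct)                # internal level codes, not the measurements
as.numeric(as.character(pct))  # the measurements themselves

# ordering used for the waterfall: largest increase first
vals <- as.numeric(as.character(pct))
vals[order(-vals)]
```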

```{r breast_monoOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7, fig.height = 10.1, fig.align = "center"}
#order subjects by % change of tumor measurement (missing values last), then restrict the mat matrix to the subjects present in breast_monoWF2 so that the oncoplot columns line up with the waterfall and swim plots.
breast_monoWF2 <- breast_monoWF2[order(-breast_monoWF2$Percentage.Change.of.Tumor.Measurement, breast_monoWF2$Subject.Identifier.for.the.Study),]

#keep the columns of mat whose subject id (asterisk stripped) is present in breast_monoWF2; matching ids directly also covers the Non-IMPACT columns and the case where no column has to be dropped
keepCols <- gsub("\\*", "", colnames(mat)) %in% breast_monoWF2$Subject.Identifier.for.the.Study
mat2 <- mat[, keepCols, drop = FALSE]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- breast_monoWF2[,c(1,4)]
df2<- as.data.frame(colnames(mat2))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat2 <- mat2[,c(as.character(df3$subjid))]
mat2 <- mat2[rowSums(is.na(mat2)) < dim(mat2)[2],]

oncoPrint(mat2, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations",at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
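
The column alignment performed above with `merge(..., sort = F)` can also be sketched with `match()`. All names and values below (`wf_order`, `mat_toy`, the subject ids and genes) are hypothetical; the point is only that columns are picked in waterfall order after stripping the `*` marker.

```r
# hypothetical subject ids in waterfall order (largest increase first)
wf_order <- c("1003", "1001", "1002")

# hypothetical oncoprint matrix; "*" marks Non-IMPACT columns
mat_toy <- data.frame(
  "1001"  = c("Missense;", NA),
  "1002*" = c(NA, "AMP;"),
  "1003"  = c("AMP;", NA),
  "1004"  = c(NA, "Missense;"),
  row.names = c("TP53", "PIK3CA"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

ids <- gsub("\\*", "", colnames(mat_toy))  # strip the marker
idx <- match(wf_order, ids)                # column position of each ordered subject
mat_ordered <- mat_toy[, idx[!is.na(idx)], drop = FALSE]
colnames(mat_ordered)
```

Whether `column_order = NULL` preserves this input order depends on the installed ComplexHeatmap version; passing the order explicitly (for example `column_order = colnames(mat_ordered)`) is an assumption worth checking against the package documentation.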

<hr/><hr />
#Breast combo 
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r Breast_combo, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading breast combo enrolment data set
Breast_combo <- readWorksheet(SummitBioMarker0818, sheet = "Breast combo", endCol = 18, endRow = 37)
Breast_combo[Breast_combo == ""]  <- NA

Breast_combo[Breast_combo == "pending"]  <- "NA"
#removing leading and trailing white space
Breast_combo <- as.data.frame(apply(Breast_combo, 2, function(x) trimws(x)))
#tabular views
kable(t(as.matrix(table(Breast_combo$Mutation.code))), caption = "ErBb2 somatic variant occurrences")
kable(table(Breast_combo$Mutation.code, Breast_combo$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
#kable(table(Breast_combo$Mutation.code, Breast_combo$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
Breast_combo$Best.Overall.Response <- as.factor(str_sub(Breast_combo$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
Breast_combo$Best.Overall.Response <- factor(Breast_combo$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(Breast_combo$Mutation.code, Breast_combo$Best.Overall.Response))
#kable(table(Breast_combo$Mutation.code, Breast_combo$Best.Overall.Response), caption = "ErbB2 mutation by Best Overall Response")
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(Breast_combo$Mutation.code, Breast_combo$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#Breast_combo$Clinical.Benefit <- revalue(Breast_combo$Clinical.Benefit, c("pending" = "N/A", "n/a" = "N/A"))

#assessing the potential contribution of ErbB2 somatic variant on Best Overall Response
##chisq.test(table(na.omit(Breast_combo$Clinical.Benefit, Breast_combo$Mutation.code)))
```
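
The response tables above rely on an ordered factor so that every response level gets a column, in clinical order, even when no subject has it. A minimal sketch with hypothetical variants and responses:

```r
# hypothetical variants and responses, for illustration only
variant  <- c("L755S", "L755S", "V777L", "D769Y")
response <- factor(c("PR", "PD", "SD", "PR"),
                   levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)

# levels with no observation (here CR and NA) still get a column
tab <- as.data.frame.matrix(table(variant, response))
tab
```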
<hr /> 

##Lolliplot: ErBb2 somatic variants annotated with Best Overall Response values  
```{r Breast_comboSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.height = 10, fig.width = 10}
#adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa sequence
Breast_combo$ErBb2SomaticVarCoord <- as.numeric(str_extract(Breast_combo$Mutation.code, "\\d+"))
#adding two additional variables: the wild type aa and the substituted aa
Breast_combo$ErBb2WildTypeAA <- str_sub(Breast_combo$Mutation.code, 1, 1)
Breast_combo$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", Breast_combo$Mutation.code, perl = TRUE)

#reordering the dataframe according to the aa coordinate
Breast_combo <- Breast_combo[order(Breast_combo$ErBb2SomaticVarCoord),]

#for the subject with 2 variants, set the 2nd one to a neighbor coordinate so that it shows on the lolliplot
Breast_combo[27,19] <- 779

#creating the data structure for the GRanges() GenomicRanges package function
Breast_comboDf1 <- data.frame(Breast_combo$ErBb2SomaticVarCoord, paste0(Breast_combo$ErBb2WildTypeAA, Breast_combo$ErBb2SomaticVarCoord), Breast_combo$ErBb2SubsAA, Breast_combo$Best.Overall.Response)
#the Breast_comboDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(Breast_comboDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
Breast_comboDf2 <- as.data.frame(unique(Breast_comboDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
Breast_comboDf2$SNV <- paste0(Breast_comboDf2$WTaaCoord, Breast_comboDf2$Varalleles)
#adding it to the initial Breast_comboDf1 df
Breast_comboDf1 <- merge(Breast_comboDf1, Breast_comboDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
Breast_comboDf1$VarRes <- paste0(Breast_comboDf1$Response, Breast_comboDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
Breast_comboDf1$Response <- gsub("NA", "ND",Breast_comboDf1$Response)
#set order for clinical response value to bottom CR, PR, SD, PD, ND
Breast_comboDf1$Response <- str_replace_all(str_c(Breast_comboDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
Breast_comboDf1 <- Breast_comboDf1[order(Breast_comboDf1$coord, Breast_comboDf1$Response),]

#duplicated the Breast_comboDf1 Df object for later plotting on the kinase domain 
Breast_comboDf1_2 <- Breast_comboDf1

##
Breast_comboDf3 <- Breast_comboDf1
##

Breast_comboDf1 <- unique(ddply(Breast_comboDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
Breast_comboDf1 <- Breast_comboDf1[order(Breast_comboDf1$coord, Breast_comboDf1$Response),]

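#label the record of the subject carrying two variants (its coordinate was shifted above for rendering)
Breast_comboDf1[25,4] <- "V777L and L755S"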

#instantiating the GRanges object with the coordinates and names of somatic variations
Breast_combo.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(Breast_comboDf1)[1], function(i) rep(Breast_comboDf1$coord[i], Breast_comboDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(Breast_comboDf1)[1], function(i) rep(Breast_comboDf1$SNV[i], Breast_comboDf1$score[i])))))

#adding the stack.factor attribute
Breast_combo.gr$stack.factor <- unlist(sapply(1:dim(Breast_comboDf1)[1], function(i) paste0(Breast_comboDf1$Response[i], "_", seq(1:Breast_comboDf1$score[i]))))
Breast_combo.gr$value1 <- 100
Breast_combo.gr$value2 <- 100 - Breast_combo.gr$value1

Breast_combo.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", Breast_combo.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

Breast_comboDf3 <- ddply(Breast_comboDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
Breast_comboDf3 <- Breast_comboDf3[order(Breast_comboDf3$coord, Breast_comboDf3$Response),]
Breast_combo.gr$stack.factor <- as.character(Breast_comboDf3$idx)

Breast_comboDf3$SNPsideID <- c(rep("top", 9), rep("bottom", 1), rep("top", 9), rep("bottom", 1), rep("top", 2), rep("bottom", 1), rep("top", 1), rep("bottom", 5), rep("top", 4), rep("bottom", 1), rep("top", 1), rep("bottom", 1))

Breast_combo.gr$SNPsideID <- Breast_comboDf3$SNPsideID


#plotting
lolliplot(Breast_combo.gr, Erbb2Features2, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```
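
Positional assignments such as `Breast_combo[27,19] <- 779` depend on the row order and column count of the current data cut. A hedged alternative is to select the record by content; the toy data frame below is hypothetical and only borrows the column names used in this report.

```r
# hypothetical excerpt: subject "B" carries two variants
toy <- data.frame(
  Subject.Identifier.for.the.Study = c("A", "B", "B"),
  Mutation.code = c("L755S", "V777L", "L755S"),
  ErBb2SomaticVarCoord = c(755, 777, 755),
  stringsAsFactors = FALSE
)

# select the row by content rather than by position
second <- toy$Subject.Identifier.for.the.Study == "B" &
  toy$Mutation.code == "L755S"
stopifnot(sum(second) == 1)  # fail loudly if the data cut changed

# shift the second variant to a neighbor coordinate for rendering
toy$ErBb2SomaticVarCoord[second] <- 779
toy
```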


##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkBreastComboOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7.3, fig.height = 10}
#load co-occurring somatic variants
breast_comboSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Breast combo", endCol = 7, startRow = 42,  endRow = 320)
#IMPACT data
breast_comboSomaticsImpact <- breast_comboSomatics[(breast_comboSomatics$Gene != "not done"),]
#subsetting
breast_comboSomaticsImpact <- breast_comboSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
breast_comboSomaticsImpact <- unique(breast_comboSomaticsImpact)
#set Subject.id to character
breast_comboSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(breast_comboSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
breast_comboSomaticsImpact <- na.omit(breast_comboSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
breast_comboSomaticsImpact <- breast_comboSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(breast_comboSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
breast_comboSomaticsNonImpact <- breast_comboSomatics[(breast_comboSomatics$Gene == "not done"),]
breast_comboSomaticsNonImpact <- breast_comboSomaticsNonImpact[,c(1,6,7)]
breast_comboSomaticsNonImpact <- unique(breast_comboSomaticsNonImpact)
breast_comboSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(breast_comboSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
breast_comboSomaticsNonImpact <- na.omit(breast_comboSomaticsNonImpact)
breast_comboSomaticsNonImpact <- breast_comboSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(breast_comboSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for breast tumor (combo)", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
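
The long-to-wide step that produces `mat` can be tried on a toy table. This sketch swaps `summarise()` in for the `do()`/`unlist()` pair used above, which is an assumption about equivalence for this use, and all data and names (`long`, `wide`, `Subject`) are hypothetical.

```r
library(dplyr)
library(tidyr)

# hypothetical long-format calls: one row per subject, gene and alteration
long <- data.frame(
  Subject = c("1001", "1001", "1002", "1002"),
  Gene = c("TP53", "TP53", "TP53", "PIK3CA"),
  Alteration = c("Missense;", "AMP;", "Nonsense;", "Missense;"),
  stringsAsFactors = FALSE
)

wide <- long %>%
  group_by(Subject, Gene) %>%
  # one concatenated string per subject/gene pair
  summarise(Alteration = paste(Alteration, collapse = "")) %>%
  ungroup() %>%
  # one column per subject, one row per gene
  spread(Subject, Alteration) %>%
  as.data.frame()

row.names(wide) <- wide$Gene
wide <- wide[, -1, drop = FALSE]
wide
```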

##Waterfall and swim plots with oncoplot
```{r breast_comboWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align ='left', fig.height = 4, fig.width = 5.15}
#Waterfalls
breast_comboWF <- Breast_combo[-27,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT and Non-IMPACT genotype data
breast_comboWF <- breast_comboWF[(breast_comboWF$Subject.Identifier.for.the.Study %in% intersect(breast_comboWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

breast_comboWF1 <- breast_comboWF[complete.cases(breast_comboWF$Percentage.Change.of.Tumor.Measurement),]

breast_comboWF1$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(breast_comboWF1$Percentage.Change.of.Tumor.Measurement))

breast_comboWF1$Progression.Free.Survival.Time..Months. <- as.numeric(as.character(breast_comboWF1$Progression.Free.Survival.Time..Months.))

breast_comboWF1 <- breast_comboWF1[order(-breast_comboWF1$Percentage.Change.of.Tumor.Measurement),]

breast_comboWF1$Subject.Identifier.for.the.Study <- as.character(breast_comboWF1$Subject.Identifier.for.the.Study)

breast_comboWF2 <- rbind(breast_comboWF1, breast_comboWF[is.na(breast_comboWF$Percentage.Change.of.Tumor.Measurement),])
breast_comboWF2$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(breast_comboWF2$Percentage.Change.of.Tumor.Measurement))
breast_comboWF2$Progression.Free.Survival.Time..Months. <- as.numeric(as.character(breast_comboWF2$Progression.Free.Survival.Time..Months.))
g1 <- ggplot(breast_comboWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top") 

g2 <- ggplot(breast_comboWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill=FALSE)

plot_grid(g1, g2, nrow = 2, align = "v")
```
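
Both panels share the x ordering `reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement)`. The sketch below, on hypothetical values, shows how `reorder()` treats a subject without a tumor measurement: it is placed last on the axis, where `geom_col()` draws no waterfall bar for it while the PFS panel can still show one.

```r
# hypothetical subjects; "b" has no tumor measurement
subj <- c("a", "b", "c", "d")
pct  <- c(20, NA, -50, 5)

# levels sorted by decreasing pct, missing value last
lev <- levels(reorder(subj, -pct))
lev
```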

```{r breast_comboOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 6.9, fig.height = 10.1, fig.align = "center"}
#order subjects by % change of tumor measurement (missing values last), then restrict the mat matrix to the subjects present in breast_comboWF2 so that the oncoplot columns line up with the waterfall and swim plots.
breast_comboWF2 <- breast_comboWF2[order(-breast_comboWF2$Percentage.Change.of.Tumor.Measurement, breast_comboWF2$Subject.Identifier.for.the.Study),]

#keep the columns of mat whose subject id (asterisk stripped) is present in breast_comboWF2; matching ids directly also covers the Non-IMPACT columns and the case where no column has to be dropped
keepCols <- gsub("\\*", "", colnames(mat)) %in% breast_comboWF2$Subject.Identifier.for.the.Study
mat2 <- mat[, keepCols, drop = FALSE]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- breast_comboWF2[,c(1,4)]
df2<- as.data.frame(colnames(mat2))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat2 <- mat2[c(as.character(df3$subjid))]
mat2 <- mat2[rowSums(is.na(mat2)) < dim(mat2)[2],]

oncoPrint(mat2, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
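
After subsetting columns, some genes may have no call left in any retained subject; the `rowSums(is.na(mat2)) < dim(mat2)[2]` filter removes them. A small sketch on a hypothetical matrix `m`:

```r
# hypothetical matrix: KRAS has no call in either subject
m <- data.frame(
  s1 = c("AMP;", NA, NA),
  s2 = c(NA, NA, "Missense;"),
  row.names = c("ERBB3", "KRAS", "TP53"),
  stringsAsFactors = FALSE
)

# keep genes with at least one non-missing cell
kept <- m[rowSums(is.na(m)) < ncol(m), , drop = FALSE]
row.names(kept)
```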

<hr /><hr />  
#Cervical  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r chunk9, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading cervical enrolment data set
cervical <- readWorksheet(SummitBioMarker0818, sheet = "cervical", endCol = 17, endRow = 7)
cervical[cervical == ""]  <- NA
cervical[cervical == "pending"]  <- "NA"
#tabular views
kable(t(as.matrix(table(cervical$Mutation.code))), caption = "ErBb2 somatic variant occurrences")
kable(table(cervical$Mutation.code, cervical$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(cervical$Mutation.code, cervical$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
cervical$Best.Overall.Response <- as.factor(str_sub(cervical$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
cervical$Best.Overall.Response <- factor(cervical$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(cervical$Mutation.code, cervical$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")
#plotting

ClinicalBenefitPerVar <- as.data.frame.matrix(table(cervical$Mutation.code, cervical$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#cervical$Clinical.Benefit <- revalue(cervical$Clinical.Benefit, c("pending"="N/A"))
```
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r chunk10, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
cervical$ErBb2SomaticVarCoord <- as.numeric(str_extract(cervical$Mutation.code, "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
cervical$ErBb2WildTypeAA <- str_sub(cervical$Mutation.code, 1, 1)
cervical$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", cervical$Mutation.code, 
    perl = T)

# reordering the dataframe according to the aa coordinate
cervical <- cervical[order(cervical$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
cervicalDf1 <- data.frame(cervical$ErBb2SomaticVarCoord, paste0(cervical$ErBb2WildTypeAA, cervical$ErBb2SomaticVarCoord), cervical$ErBb2SubsAA, cervical$Best.Overall.Response)
#the cervicalDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(cervicalDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
cervicalDf2 <- as.data.frame(unique(cervicalDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
cervicalDf2$SNV <- paste0(cervicalDf2$WTaaCoord, cervicalDf2$Varalleles)
#adding it to the initial cervicalDf1 df
cervicalDf1 <- merge(cervicalDf1, cervicalDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
cervicalDf1$VarRes <- paste0(cervicalDf1$Response, cervicalDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
cervicalDf1$Response <- gsub("NA", "ND",cervicalDf1$Response)
#set order for clinical response value to bottom CR, PR, SD, PD, ND
cervicalDf1$Response <- str_replace_all(str_c(cervicalDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
cervicalDf1 <- cervicalDf1[order(cervicalDf1$coord, cervicalDf1$Response),]

##
cervicalDf3 <- cervicalDf1
##

cervicalDf1 <- unique(ddply(cervicalDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
cervicalDf1 <- cervicalDf1[order(cervicalDf1$coord, cervicalDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
cervical.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(cervicalDf1)[1], function(i) rep(cervicalDf1$coord[i], cervicalDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(cervicalDf1)[1], function(i) rep(cervicalDf1$SNV[i], cervicalDf1$score[i])))))

#adding the stack.factor attribute
cervical.gr$stack.factor <- unlist(sapply(1:dim(cervicalDf1)[1], function(i) paste0(cervicalDf1$Response[i], "_", seq(1:cervicalDf1$score[i]))))
cervical.gr$value1 <- 100
cervical.gr$value2 <- 100 - cervical.gr$value1

cervical.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", cervical.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

cervicalDf3 <- ddply(cervicalDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
cervicalDf3 <- cervicalDf3[order(cervicalDf3$coord, cervicalDf3$Response),]
cervical.gr$stack.factor <- as.character(cervicalDf3$idx)

#plotting
lolliplot(cervical.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")

```
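
The `GRanges` construction above expands each variant `score` times with `unlist(sapply(...))`. A vectorised form with `rep()` and `sequence()` gives the same expansion on the hypothetical table below, and avoids the case where `sapply()` simplifies to a matrix when all scores are equal.

```r
# hypothetical per-variant table: coordinate, label, number of subjects
df <- data.frame(
  coord = c(755, 769, 777),
  SNV   = c("L755S", "D769Y", "V777L"),
  score = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# loop form used in the chunk above
loop_coord <- unlist(sapply(1:dim(df)[1], function(i) rep(df$coord[i], df$score[i])))

# vectorised equivalents
vec_coord <- rep(df$coord, times = df$score)
vec_names <- rep(df$SNV, times = df$score)
vec_rank  <- sequence(df$score)  # counter restarting at 1 for each variant

identical(loop_coord, vec_coord)
```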

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r CervicalOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#load co-occurring somatic variants
CervicalSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Cervical", endCol = 7, startRow = 10,  endRow = 52)
#IMPACT data
CervicalSomaticsImpact <- CervicalSomatics[(CervicalSomatics$Gene != "not done"),]
#subsetting
CervicalSomaticsImpact <- CervicalSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
CervicalSomaticsImpact <- unique(CervicalSomaticsImpact)
#set Subject.id to character
CervicalSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(CervicalSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
CervicalSomaticsImpact <- na.omit(CervicalSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
CervicalSomaticsImpact <- CervicalSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(CervicalSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
CervicalSomaticsNonImpact <- CervicalSomatics[(CervicalSomatics$Gene == "not done"),]
CervicalSomaticsNonImpact <- CervicalSomaticsNonImpact[,c(1,6,7)]
CervicalSomaticsNonImpact <- unique(CervicalSomaticsNonImpact)
CervicalSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(CervicalSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
CervicalSomaticsNonImpact <- na.omit(CervicalSomaticsNonImpact)
CervicalSomaticsNonImpact <- CervicalSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(CervicalSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for cervical tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
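
The reshaping performed in the chunk above can be tried in isolation. The following is a minimal sketch on a toy long-format table: the subjects, genes and alterations are made up for illustration, and the `toy*` names and the `subject` column are hypothetical. It uses `summarise()` where the chunk uses `do()`, which should give the same one-string-per-pair result here.

```r
library(dplyr)
library(tidyr)

# Toy long-format genotype table (hypothetical subjects and alterations)
toy <- data.frame(
  subject = c("101", "101", "101", "102"),
  Gene = c("TP53", "ERBB2", "ERBB2", "TP53"),
  Alteration = c("Missense", "Missense", "Amp", "Nonsense"),
  stringsAsFactors = FALSE
)

# One concatenated string per subject/gene pair
toyCollapsed <- toy %>%
  group_by(subject, Gene) %>%
  summarise(Alteration = paste(Alteration, collapse = "")) %>%
  ungroup()

# Genes in rows, subjects in columns, NA where nothing was reported
toyMat <- as.data.frame(spread(toyCollapsed, subject, Alteration))
row.names(toyMat) <- toyMat$Gene
toyMat <- toyMat[, -1, drop = FALSE]
toyMat
```

Because the strings are pasted with an empty separator, a subject carrying two alteration types in one gene ends up with a single combined value (`MissenseAmp` in this toy case), which seems to be why the legend lists combined categories such as "Missense & Amp".
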
  
##Waterfall and swim plots with oncoplot
```{r cervicalWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align = 'left', fig.height= 4, fig.width= 5.1}
#Waterfalls
cervicalWF <- cervical[,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT and Non-IMPACT genotype data
cervicalWF <- cervicalWF[(cervicalWF$Subject.Identifier.for.the.Study %in% intersect(cervicalWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

#cervicalWF <- cervicalWF[complete.cases(cervicalWF$Percentage.Change.of.Tumor.Measurement),]
cervicalWF$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(cervicalWF$Percentage.Change.of.Tumor.Measurement))
cervicalWF <- cervicalWF[order(-cervicalWF$Percentage.Change.of.Tumor.Measurement),]

cervicalWF$Subject.Identifier.for.the.Study <- as.character(cervicalWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(cervicalWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top")

g2 <- ggplot(cervicalWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1)) + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r cervicalOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 6.8, fig.align = "center"}
#kept for reference: the two commented-out lines below drop from mat the subjects that are absent from cervicalWF (e.g. because their % change of tumor measurement is missing).
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), cervicalWF$Subject.Identifier.for.the.Study)
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- cervicalWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat <- mat[c(as.character(df3$subjid))]
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2],]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```  
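
The column reordering above relies on `merge(..., sort = FALSE)`, whose row order the base R documentation describes as unspecified. The sketch below shows one way to make the ordering explicit with a direct lookup instead. It is an illustrative alternative on made-up data (`wfIds`, `toyMat` and the subject identifiers are hypothetical), not the method used in this report.

```r
# Hypothetical waterfall order of subjects; "104" has no genotype data
wfIds <- c("103", "101", "102", "104")

# Hypothetical genotype matrix; "*" marks a non-IMPACT subject
toyMat <- data.frame(
  `101` = c("Missense", NA, NA),
  `102*` = c(NA, "AMP", NA),
  `103` = c("Nonsense", NA, NA),
  row.names = c("TP53", "ERBB2", "KRAS"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Strip the "*" suffix so that identifiers are comparable
matIds <- gsub("\\*", "", colnames(toyMat))

# For each subject, in waterfall order, collect the matching column(s)
idx <- unlist(lapply(wfIds, function(id) which(matIds == id)))
toyOrdered <- toyMat[idx]

# Drop the genes that have no alteration left in the retained subjects
toyOrdered <- toyOrdered[rowSums(is.na(toyOrdered)) < ncol(toyOrdered), , drop = FALSE]
toyOrdered
```

Looking up with `which()` rather than `match()` keeps both columns when a subject has an IMPACT column and a starred non-IMPACT column.
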
<hr /><hr />

#Colorectal  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r chunk_colorectal, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading colorectal enrolment data set
colorectal <- readWorksheet(SummitBioMarker0818, sheet = "Colorectal", endCol = 17, endRow = 13)
colorectal[colorectal == ""]  <- NA
#tabular views
kable(t(as.matrix(table(colorectal$Mutation.code))), caption = "ErBb2 somatic variant occurrences")
kable(table(colorectal$Mutation.code, colorectal$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(colorectal$Mutation.code, colorectal$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
colorectal$Best.Overall.Response <- as.factor(str_sub(colorectal$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
colorectal$Best.Overall.Response <- factor(colorectal$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(colorectal$Mutation.code, colorectal$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(colorectal$Mutation.code, colorectal$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#colorectal$Clinical.Benefit <- revalue(colorectal$Clinical.Benefit, c("pending"="N/A"))

colorectal$Clinical.Benefit <- factor(colorectal$Clinical.Benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)
```
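
How the response codes are truncated and tabulated can be checked on a small vector. This is a minimal sketch on made-up values (`toyResponse` and the other `toy*` names are hypothetical), using base `substr()` in place of `str_sub()`. It also illustrates that a genuinely missing value is not the same thing as the string level "NA".

```r
# Made-up response values: one verbose label, one missing value, one literal "NA"
toyResponse <- c("PR (confirmed)", "SD", "PD", NA, "CR", "NA")

# Keep the first two characters only
toyCodes <- substr(toyResponse, 1, 2)

# Ordered factor with the same levels as in the report
toyFactor <- factor(toyCodes, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)

# By default table() ignores missing values; useNA = "ifany" reports them
counts <- table(toyFactor)
countsWithMissing <- table(toyFactor, useNA = "ifany")
counts
countsWithMissing
```

In this toy case the literal string "NA" is counted under the "NA" level, while the missing value only appears when `useNA` is set.
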
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r chunk_colorectalSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
colorectal$ErBb2SomaticVarCoord <- as.numeric(str_extract(colorectal$Mutation.code, "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
colorectal$ErBb2WildTypeAA <- str_sub(colorectal$Mutation.code, 1, 1)
colorectal$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", colorectal$Mutation.code, perl = T)

# reordering the dataframe according to the aa coordinate
colorectal <- colorectal[order(colorectal$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
colorectalDf1 <- data.frame(colorectal$ErBb2SomaticVarCoord, paste0(colorectal$ErBb2WildTypeAA, colorectal$ErBb2SomaticVarCoord), colorectal$ErBb2SubsAA, colorectal$Best.Overall.Response)

#the colorectalDf1 df has 4 columns: the coordinates where the variations map, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(colorectalDf1) <- c("coord", "WTaaCoord", "Var", "Response")

#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
colorectalDf2 <- as.data.frame(unique(colorectalDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))

#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
colorectalDf2$SNV <- paste0(colorectalDf2$WTaaCoord, colorectalDf2$Varalleles)

#adding it to the initial colorectalDf1 df
colorectalDf1 <- merge(colorectalDf1, colorectalDf2[,-c(2,3)], by = "WTaaCoord")

#adding one attribute: concatenation of the clinical response and the variant
colorectalDf1$VarRes <- paste0(colorectalDf1$Response, colorectalDf1$SNV)

# replacing NA with the 'ND' string value for Not Determined
colorectalDf1$Response <- gsub("NA", "ND",colorectalDf1$Response)
#prefix the clinical response values with a letter so that they sort alphabetically as ND, PD, SD, PR, CR
colorectalDf1$Response <- str_replace_all(str_c(colorectalDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
colorectalDf1 <- colorectalDf1[order(colorectalDf1$coord, colorectalDf1$Response),]

##
colorectalDf3 <- colorectalDf1
##

colorectalDf1 <- unique(ddply(colorectalDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
colorectalDf1 <- colorectalDf1[order(colorectalDf1$coord, colorectalDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
colorectal.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(colorectalDf1)[1], function(i) rep(colorectalDf1$coord[i], colorectalDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(colorectalDf1)[1], function(i) rep(colorectalDf1$SNV[i], colorectalDf1$score[i])))))

#adding the stack.factor attribute
colorectal.gr$stack.factor <- unlist(sapply(1:dim(colorectalDf1)[1], function(i) paste0(colorectalDf1$Response[i], "_", seq(1:colorectalDf1$score[i]))))
colorectal.gr$value1 <- 100
colorectal.gr$value2 <- 100 - colorectal.gr$value1

colorectal.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", colorectal.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

colorectalDf3 <- ddply(colorectalDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
colorectalDf3 <- colorectalDf3[order(colorectalDf3$coord, colorectalDf3$Response),]
colorectal.gr$stack.factor <- as.character(colorectalDf3$idx)

#plotting
lolliplot(colorectal.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```
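
The parsing of the mutation codes in the chunk above can be sketched on a few example codes. This is a minimal illustration, assuming single-letter amino-acid substitution codes; the `codes` vector and the `parsed` and `labels` names are hypothetical, and `tapply()` stands in for the `group_by()`/`do()` step.

```r
library(stringr)

# Example substitution codes (illustrative values only)
codes <- c("S310F", "L755S", "S310Y", "V777L")

# Position on the protein sequence: the first run of digits
coord <- as.numeric(str_extract(codes, "\\d+"))
# Wild-type residue: the first character
wt <- str_sub(codes, 1, 1)
# Substituted residue: whatever follows the wild-type residue and the position
subs <- gsub("^\\w{1}\\d+", "", codes, perl = TRUE)

parsed <- data.frame(coord = coord, WTaaCoord = paste0(wt, coord), Var = subs,
                     stringsAsFactors = FALSE)

# One label per position, with the substitutions separated by "/"
labels <- tapply(parsed$Var, parsed$WTaaCoord, function(v) paste(unique(v), collapse = "/"))
paste0(names(labels), labels)
```

With these patterns a code that is not a simple substitution is split differently: for a value such as `A775_G776insYVMA`, the position would be 775 and everything after `A775` would be treated as the variant.
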

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkColorectalOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7, fig.height = 8}
#load co-occurring somatic variants
ColorectalSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Colorectal", endCol = 7, startRow = 18,  endRow = 118)
#IMPACT data
ColorectalSomaticsImpact <- ColorectalSomatics[(ColorectalSomatics$Gene != "not done"),]
#subsetting
ColorectalSomaticsImpact <- ColorectalSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
ColorectalSomaticsImpact <- unique(ColorectalSomaticsImpact)
#set Subject.id to character
ColorectalSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(ColorectalSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
ColorectalSomaticsImpact <- na.omit(ColorectalSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
ColorectalSomaticsImpact <- ColorectalSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(ColorectalSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]


ColorectalSomaticsNonImpact <- ColorectalSomatics[(ColorectalSomatics$Gene == "not done"),]
ColorectalSomaticsNonImpact <- ColorectalSomaticsNonImpact[,c(1,6,7)]
ColorectalSomaticsNonImpact <- unique(ColorectalSomaticsNonImpact)
ColorectalSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(ColorectalSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
ColorectalSomaticsNonImpact <- na.omit(ColorectalSomaticsNonImpact)
ColorectalSomaticsNonImpact <- ColorectalSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(ColorectalSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
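#this step assumes a single non-IMPACT gene/subject pair in the data cut: with one column left, mat2[,-1] returns a bare vector, hence the hard-coded row and column names below
mat2 <- as.data.frame(mat2[,-1])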
row.names(mat2) <-"ERBB2"
colnames(mat2) <- "1150*"


mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for colorectal tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp", "fusionAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp", "Fusion & Amp")), show_column_names = TRUE)
```
  
##Waterfall and swim plots with oncoplot
```{r colorectalWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align = 'left', fig.height = 4, fig.width = 5.1}
#Waterfalls
colorectalWF <- colorectal[,c(1, 12, 13, 17)]

#colorectalWF <- colorectalWF[complete.cases(colorectalWF$Percentage.Change.of.Tumor.Measurement),]
colorectalWF <- colorectalWF[order(-colorectalWF$Percentage.Change.of.Tumor.Measurement),]

colorectalWF$Subject.Identifier.for.the.Study <- as.character(colorectalWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(colorectalWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.8), "cm"), legend.position = "top") 

g2 <- ggplot(colorectalWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(), axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r colorectalOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7.1, fig.align = "center", fig.height = 8}
#kept for reference: the commented-out lines below drop from mat the subjects that are absent from colorectalWF (e.g. because their % change of tumor measurement is missing).
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), colorectalWF$Subject.Identifier.for.the.Study)
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
##mat2 <- as.data.frame(mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)])
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- colorectalWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat <- mat[c(as.character(df3$subjid))]
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2], ]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp", "fusionAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp", "Fusion & Amp")), show_column_names = TRUE, column_order = NULL)
```

<hr /><hr />

#Endometrial  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r chunk_endometrial, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading endometrial enrolment data set
endometrial <- readWorksheet(SummitBioMarker0818, sheet = "Endometrial", endCol = 17, endRow = 9)
endometrial[endometrial == ""]  <- NA
#tabular views
kable(t(as.matrix(table(endometrial$Mutation.code))), caption = "ErBb2 somatic variant occurrences")
kable(table(endometrial$Mutation.code, endometrial$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(endometrial$Mutation.code, endometrial$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
endometrial$Best.Overall.Response <- as.factor(str_sub(endometrial$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
endometrial$Best.Overall.Response <- factor(endometrial$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(endometrial$Mutation.code, endometrial$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(endometrial$Mutation.code, endometrial$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#endometrial$Clinical.Benefit <- revalue(endometrial$Clinical.Benefit, c("pending"="N/A"))

endometrial$Clinical.Benefit <- factor(endometrial$Clinical.Benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)
```
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r chunk_endometrialSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
endometrial$ErBb2SomaticVarCoord <- as.numeric(str_extract(endometrial$Mutation.code, 
    "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
endometrial$ErBb2WildTypeAA <- str_sub(endometrial$Mutation.code, 1, 1)
endometrial$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", endometrial$Mutation.code, perl = T)

# reordering the dataframe according to the aa coordinate
endometrial <- endometrial[order(endometrial$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
endometrialDf1 <- data.frame(endometrial$ErBb2SomaticVarCoord, paste0(endometrial$ErBb2WildTypeAA, endometrial$ErBb2SomaticVarCoord), endometrial$ErBb2SubsAA, endometrial$Best.Overall.Response)
#the endometrialDf1 df has 4 columns: the coordinates where the variations map, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(endometrialDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
endometrialDf2 <- as.data.frame(unique(endometrialDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
endometrialDf2$SNV <- paste0(endometrialDf2$WTaaCoord, endometrialDf2$Varalleles)
#adding it to the initial endometrialDf1 df
endometrialDf1 <- merge(endometrialDf1, endometrialDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
endometrialDf1$VarRes <- paste0(endometrialDf1$Response, endometrialDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
endometrialDf1$Response <- gsub("NA", "ND",endometrialDf1$Response)
#prefix the clinical response values with a letter so that they sort alphabetically as ND, PD, SD, PR, CR
endometrialDf1$Response <- str_replace_all(str_c(endometrialDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
endometrialDf1 <- endometrialDf1[order(endometrialDf1$coord, endometrialDf1$Response),]

##
endometrialDf3 <- endometrialDf1
##

endometrialDf1 <- unique(ddply(endometrialDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
endometrialDf1 <- endometrialDf1[order(endometrialDf1$coord, endometrialDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
endometrial.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(endometrialDf1)[1], function(i) rep(endometrialDf1$coord[i], endometrialDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(endometrialDf1)[1], function(i) rep(endometrialDf1$SNV[i], endometrialDf1$score[i])))))

#adding the stack.factor attribute
endometrial.gr$stack.factor <- unlist(sapply(1:dim(endometrialDf1)[1], function(i) paste0(endometrialDf1$Response[i], "_", seq(1:endometrialDf1$score[i]))))
endometrial.gr$value1 <- 100
endometrial.gr$value2 <- 100 - endometrial.gr$value1

endometrial.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", endometrial.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

endometrialDf3 <- ddply(endometrialDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
endometrialDf3 <- endometrialDf3[order(endometrialDf3$coord, endometrialDf3$Response),]
endometrial.gr$stack.factor <- as.character(endometrialDf3$idx)

#plotting
lolliplot(endometrial.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkEndometrialOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#load co-occurring somatic variants
endometrialSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Endometrial", endCol = 7, startRow = 11,  endRow = 66)
#IMPACT data
endometrialSomaticsImpact <- endometrialSomatics[(endometrialSomatics$Gene != "not done"),]
#subsetting
endometrialSomaticsImpact <- endometrialSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
endometrialSomaticsImpact <- unique(endometrialSomaticsImpact)
#set Subject.id to character
endometrialSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(endometrialSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
endometrialSomaticsImpact <- na.omit(endometrialSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
endometrialSomaticsImpact <- endometrialSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(endometrialSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
endometrialSomaticsNonImpact <- endometrialSomatics[(endometrialSomatics$Gene == "not done"),]
endometrialSomaticsNonImpact <- endometrialSomaticsNonImpact[,c(1,6,7)]
endometrialSomaticsNonImpact <- unique(endometrialSomaticsNonImpact)
endometrialSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(endometrialSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
endometrialSomaticsNonImpact <- na.omit(endometrialSomaticsNonImpact)
endometrialSomaticsNonImpact <- endometrialSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(endometrialSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for endometrial tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```

##Waterfall and swim plots with oncoplot
```{r endometrialWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align='left', fig.height= 4, fig.width= 4.8}
#Waterfalls
endometrialWF <- endometrial[,c(1, 12, 13, 17)]

#endometrialWF <- endometrialWF[complete.cases(endometrialWF$Percentage.Change.of.Tumor.Measurement),]
endometrialWF <- endometrialWF[order(-endometrialWF$Percentage.Change.of.Tumor.Measurement),]

endometrialWF$Subject.Identifier.for.the.Study <- as.character(endometrialWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(endometrialWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 201) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top") 

g2 <- ggplot(endometrialWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r endometrialOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width= 6.8, fig.align = "center"}
#kept for reference: the two commented-out lines below drop from mat the subjects that are absent from endometrialWF (e.g. because their % change of tumor measurement is missing).
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), endometrialWF$Subject.Identifier.for.the.Study)
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- endometrialWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat <- mat[c(as.character(df3$subjid))]
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2], ]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
  
<hr />  
#Gastroesophageal  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r chunk_gastroesophageal, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading gastroesophageal enrolment data set
gastroesophageal <- readWorksheet(SummitBioMarker0818, sheet = "Gastroesophageal", endCol = 17, endRow = 7)
gastroesophageal[gastroesophageal == ""]  <- NA
#tabular views
kable(t(as.matrix(table(gastroesophageal$Mutation.code))), caption = "ErBb2 somatic variant occurrences")
kable(table(gastroesophageal$Mutation.code, gastroesophageal$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(gastroesophageal$Mutation.code, gastroesophageal$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
gastroesophageal$Best.Overall.Response <- as.factor(str_sub(gastroesophageal$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
gastroesophageal$Best.Overall.Response <- factor(gastroesophageal$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(gastroesophageal$Mutation.code, gastroesophageal$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(gastroesophageal$Mutation.code, gastroesophageal$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#gastroesophageal$Clinical.Benefit <- revalue(gastroesophageal$Clinical.Benefit, c("pending"="N/A"))

gastroesophageal$Clinical.Benefit <- factor(gastroesophageal$Clinical.Benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)
```
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r chunk_gastroesophagealSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
gastroesophageal$ErBb2SomaticVarCoord <- as.numeric(str_extract(gastroesophageal$Mutation.code, 
    "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
gastroesophageal$ErBb2WildTypeAA <- str_sub(gastroesophageal$Mutation.code, 1, 1)
gastroesophageal$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", gastroesophageal$Mutation.code, 
    perl = T)

# reordering the dataframe according to the aa coordinate
gastroesophageal <- gastroesophageal[order(gastroesophageal$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
gastroesophagealDf1 <- data.frame(gastroesophageal$ErBb2SomaticVarCoord, paste0(gastroesophageal$ErBb2WildTypeAA, gastroesophageal$ErBb2SomaticVarCoord), gastroesophageal$ErBb2SubsAA, gastroesophageal$Best.Overall.Response)
#the gastroesophagealDf1 df has 4 columns: the coordinates where the variations map, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(gastroesophagealDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
gastroesophagealDf2 <- as.data.frame(unique(gastroesophagealDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
gastroesophagealDf2$SNV <- paste0(gastroesophagealDf2$WTaaCoord, gastroesophagealDf2$Varalleles)
#adding it to the initial gastroesophagealDf1 df
gastroesophagealDf1 <- merge(gastroesophagealDf1, gastroesophagealDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
gastroesophagealDf1$VarRes <- paste0(gastroesophagealDf1$Response, gastroesophagealDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
gastroesophagealDf1$Response <- gsub("NA", "ND",gastroesophagealDf1$Response)
#prefix the clinical response values with a letter so that they sort alphabetically as ND, PD, SD, PR, CR
gastroesophagealDf1$Response <- str_replace_all(str_c(gastroesophagealDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
gastroesophagealDf1 <- gastroesophagealDf1[order(gastroesophagealDf1$coord, gastroesophagealDf1$Response),]

##
gastroesophagealDf3 <- gastroesophagealDf1
##

gastroesophagealDf1 <- unique(ddply(gastroesophagealDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
gastroesophagealDf1 <- gastroesophagealDf1[order(gastroesophagealDf1$coord, gastroesophagealDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
gastroesophageal.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(gastroesophagealDf1)[1], function(i) rep(gastroesophagealDf1$coord[i], gastroesophagealDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(gastroesophagealDf1)[1], function(i) rep(gastroesophagealDf1$SNV[i], gastroesophagealDf1$score[i])))))

#adding the stack.factor attribute
gastroesophageal.gr$stack.factor <- unlist(sapply(seq_len(nrow(gastroesophagealDf1)), function(i) paste0(gastroesophagealDf1$Response[i], "_", seq_len(gastroesophagealDf1$score[i]))))
gastroesophageal.gr$value1 <- 100
gastroesophageal.gr$value2 <- 100 - gastroesophageal.gr$value1
#setting the Best Overall Response legend
response <- c("ND", "PD", "SD", "PR", "CR")
response.color.set <- as.list(as.data.frame(rbind(c("gray", "black", "yellow", "green", "red"), "#FFFFFFFF"), stringsAsFactors = FALSE))
names(response.color.set) <- response
gastroesophageal.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", gastroesophageal.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

gastroesophagealDf3 <- ddply(gastroesophagealDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
gastroesophagealDf3 <- gastroesophagealDf3[order(gastroesophagealDf3$coord, gastroesophagealDf3$Response),]
gastroesophageal.gr$stack.factor <- as.character(gastroesophagealDf3$idx)

#plotting
lolliplot(gastroesophageal.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```
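
The data preparation above collapses the substitutions seen at one residue into a single label and then counts subjects per label and response. A minimal base R sketch of those two steps follows, assuming a toy data frame with the same four columns; the values are illustrative, not trial data, and `aggregate()` and `table()` stand in for the `dplyr`/`plyr` calls used in the chunk.

```r
# Toy input, one row per subject (illustrative values, not trial data)
df <- data.frame(
  coord     = c(310, 310, 310, 755),
  WTaaCoord = c("S310", "S310", "S310", "L755"),
  Var       = c("F", "Y", "F", "S"),
  Response  = c("PR", "SD", "PR", "PD"),
  stringsAsFactors = FALSE
)

# one label per residue, listing every distinct substitution observed there
distinct <- unique(df[, c("coord", "WTaaCoord", "Var")])
alleles  <- aggregate(Var ~ WTaaCoord + coord, data = distinct,
                      FUN = paste, collapse = "/")
alleles$SNV <- paste0(alleles$WTaaCoord, alleles$Var)

# attach the label to each subject, then count subjects per (label, response)
df$SNV <- alleles$SNV[match(df$WTaaCoord, alleles$WTaaCoord)]
counts <- as.data.frame(table(SNV = df$SNV, Response = df$Response),
                        stringsAsFactors = FALSE)
counts <- counts[counts$Freq > 0, ]
```

Here `counts$Freq` plays the role of the `score` column, i.e. the number of lollipops stacked for a given variant and response.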

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r chunkGastroOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#load co-occurring somatic variants
gastroesophagealSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Gastroesophageal", endCol = 7, startRow = 9,  endRow = 75)
#IMPACT data
gastroesophagealSomaticsImpact <- gastroesophagealSomatics[(gastroesophagealSomatics$Gene != "not done"),]
#subsetting
gastroesophagealSomaticsImpact <- gastroesophagealSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
gastroesophagealSomaticsImpact <- unique(gastroesophagealSomaticsImpact)
#set Subject.id to character
gastroesophagealSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(gastroesophagealSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
gastroesophagealSomaticsImpact <- na.omit(gastroesophagealSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
gastroesophagealSomaticsImpact <- gastroesophagealSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(gastroesophagealSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
##gastroesophagealSomaticsNonImpact <- gastroesophagealSomatics[(gastroesophagealSomatics$Gene == "not done"),]
##gastroesophagealSomaticsNonImpact <- gastroesophagealSomaticsNonImpact[,c(1,6,7)]
##gastroesophagealSomaticsNonImpact <- unique(gastroesophagealSomaticsNonImpact)
##gastroesophagealSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(gastroesophagealSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
##gastroesophagealSomaticsNonImpact <- na.omit(gastroesophagealSomaticsNonImpact)
##gastroesophagealSomaticsNonImpact <- gastroesophagealSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
##mat2 <- as.data.frame(spread(gastroesophagealSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
##mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
##row.names(mat2) <- mat2$Gene.1
##mat2 <- mat2[,-1]

##mat <- merge(mat, mat2, by = "row.names", all = T)
##row.names(mat) <- mat$Row.names
##mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for gastroesophageal tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
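
The matrix passed to `oncoPrint()` has genes in rows and subjects in columns. A minimal base R sketch of that pivot follows, assuming a toy long table with hypothetical column names; `tapply()` is used here in place of `tidyr::spread()`.

```r
# Toy long table: one row per subject/gene/alteration (illustrative values)
long <- data.frame(
  subject    = c("01", "01", "01", "02"),
  Gene       = c("TP53", "TP53", "ERBB2", "TP53"),
  Alteration = c("Missense", "AMP", "Missense", "Nonsense"),
  stringsAsFactors = FALSE
)

# genes in rows, subjects in columns; several alterations in one cell are
# pasted together with an empty separator, as in the chunk above
wide <- tapply(long$Alteration, list(long$Gene, long$subject),
               paste, collapse = "")

# with no ";" in the cell, splitting on ";" returns a single composite type
types <- strsplit(unname(wide["TP53", "01"]), ";")[[1]]
```

With an empty separator the pasted value reaches `get_type` as one composite type, so the `alter_fun` and `col` objects defined elsewhere in the script presumably need an entry for each composite label that occurs in the data.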

##Waterfall and swim plots with oncoplot
```{r gastroesophagealWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align ='left', fig.height = 4, fig.width = 5.1}
#Waterfalls
gastroesophagealWF <- gastroesophageal[,c(1, 12, 13, 17)]

gastroesophagealWF <- gastroesophagealWF[complete.cases(gastroesophagealWF$Percentage.Change.of.Tumor.Measurement),]
gastroesophagealWF <- gastroesophagealWF[order(-gastroesophagealWF$Percentage.Change.of.Tumor.Measurement),]

gastroesophagealWF$Subject.Identifier.for.the.Study <- as.character(gastroesophagealWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(gastroesophagealWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top")

g2 <- ggplot(gastroesophagealWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```
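
Both panels share the same x axis order because `reorder()` sorts the subject identifiers by the negated percentage change. A minimal sketch of that ordering, on a toy data frame with hypothetical column names:

```r
# Toy waterfall data (illustrative values)
wf <- data.frame(id  = c("A", "B", "C"),
                 pct = c(-40, 25, -5),
                 stringsAsFactors = FALSE)

# levels are sorted by increasing -pct, i.e. by decreasing pct:
# largest growth on the left, deepest shrinkage on the right
x_order <- levels(reorder(wf$id, -wf$pct))
```

The factor levels, not the row order of the data frame, determine where each bar is drawn.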

```{r gastroesophagealOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width= 6.8, fig.align = "center"}
#since the %change of tumor measurement is missing for some subjects, drop from the mat matrix the columns of the subjects with missing values.
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), gastroesophagealWF$Subject.Identifier.for.the.Study)
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
##df1 <- gastroesophagealWF[,c(1,4)]
##df2<- as.data.frame(colnames(mat2))
##colnames(df2) <- "subjid"
##df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
##df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

##mat2 <- mat2[c(as.character(df3$subjid))]
##mat2 <- mat2[rowSums(is.na(mat2)) < dim(mat2)[2],]
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```

<hr /><hr />

#Lung  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r lung, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading lung enrolment data set
lung <- readWorksheet(SummitBioMarker0818, sheet = "Lung", endCol = 17, endRow = 28)
lung[lung == ""]  <- NA
#tabular views
kable(t(as.matrix(table(lung$Mutation.code))), caption = "ErBb2 somatic variants occurrences")
kable(table(lung$Mutation.code, lung$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(lung$Mutation.code, lung$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
lung$Best.Overall.Response <- as.factor(str_sub(lung$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
lung$Best.Overall.Response <- factor(lung$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(lung$Mutation.code, lung$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(lung$Mutation.code, lung$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#lung$Clinical.Benefit <- revalue(lung$Clinical.Benefit, c("pending"="N/A"))

#assessing the potential association between ErbB2 somatic variants and Clinical Benefit
##chisq.test(table(na.omit(lung$Clinical.Benefit, lung$Mutation.code)))
```
<hr /> 
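
The commented-out `chisq.test()` call passes two vectors to `na.omit()`, which only acts on its first argument. A minimal sketch of one way to test the association on complete cases follows. Fisher's exact test is swapped in here as an assumption, on the grounds that per-variant counts in a single cohort are small; the toy data frame and its column names are hypothetical.

```r
# Toy data (illustrative values, not trial data)
d <- data.frame(
  benefit = c("YES", "NO", "NO", "YES", NA, "NO"),
  mut     = c("S310F", "S310F", "L755S", "L755S", "L755S", NA),
  stringsAsFactors = FALSE
)

# keep only the rows where both variables are observed
cc  <- d[complete.cases(d), ]

# contingency table of clinical benefit by variant, then the exact test
tab <- table(cc$benefit, cc$mut)
res <- fisher.test(tab)
```

`res$p.value` holds the p-value; with so few subjects per variant it is best read as descriptive.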

##Lolliplot: Annotated ErBb2 somatic variants
```{r lungSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
lung$ErBb2SomaticVarCoord <- as.numeric(str_extract(lung$Mutation.code, "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
lung$ErBb2WildTypeAA <- str_sub(lung$Mutation.code, 1, 1)
lung$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", lung$Mutation.code, perl = T)

# reordering the dataframe according to the aa coordinate
lung <- lung[order(lung$ErBb2SomaticVarCoord), ]

# to show the 2nd variant
lung[4, 18] <- 259

#creating the data structure for the GRanges() GenomicRanges package function
lungDf1 <- data.frame(lung$ErBb2SomaticVarCoord, paste0(lung$ErBb2WildTypeAA, lung$ErBb2SomaticVarCoord), lung$ErBb2SubsAA, lung$Best.Overall.Response)
#the lungDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(lungDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
lungDf2 <- as.data.frame(unique(lungDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
lungDf2$SNV <- paste0(lungDf2$WTaaCoord, lungDf2$Varalleles)
#adding it to the initial lungDf1 df
lungDf1 <- merge(lungDf1, lungDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
lungDf1$VarRes <- paste0(lungDf1$Response, lungDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
lungDf1$Response <- gsub("NA", "ND",lungDf1$Response)
#prefix the clinical response values so that they sort, from top to bottom, as CR, PR, SD, PD, ND
lungDf1$Response <- str_replace_all(str_c(lungDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
lungDf1 <- lungDf1[order(lungDf1$coord, lungDf1$Response),]

##
lungDf3 <- lungDf1
##

lungDf1 <- unique(ddply(lungDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
lungDf1 <- lungDf1[order(lungDf1$coord, lungDf1$Response),]

lungDf1[4, 4] <- "S310F and F258L"

#instantiating the GRanges object with the coordinates and names of somatic variations
lung.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(lungDf1)[1], function(i) rep(lungDf1$coord[i], lungDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(lungDf1)[1], function(i) rep(lungDf1$SNV[i], lungDf1$score[i])))))

#adding the stack.factor attribute
lung.gr$stack.factor <- unlist(sapply(seq_len(nrow(lungDf1)), function(i) paste0(lungDf1$Response[i], "_", seq_len(lungDf1$score[i]))))
lung.gr$value1 <- 100
lung.gr$value2 <- 100 - lung.gr$value1

lung.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", lung.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

lungDf3 <- ddply(lungDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
lungDf3 <- lungDf3[order(lungDf3$coord, lungDf3$Response),]
lung.gr$stack.factor <- as.character(lungDf3$idx)

#lungDf3$SNPsideID <- c(rep("top", 17), rep("bottom", 5), rep("top", 5))

#lung.gr$SNPsideID <- lungDf3$SNPsideID

#plotting
lolliplot(lung.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```
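
The `GRanges` object needs one range per lollipop, so each coordinate and label is repeated `score` times. A minimal sketch of a vectorised equivalent of the `unlist(sapply(...))` expansion follows, assuming a toy data frame with the same column names; the values are illustrative.

```r
# Toy per-variant table (illustrative values)
d <- data.frame(
  coord    = c(310, 755, 777),
  SNV      = c("S310F", "L755S", "V777L"),
  Response = c("D-PR", "B-PD", "C-SD"),
  score    = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# vectorised expansion: repeat each row's value score times
pos <- rep(d$coord, times = d$score)
nm  <- rep(d$SNV,   times = d$score)
stk <- paste0(rep(d$Response, times = d$score), "_", sequence(d$score))

# loop form used in the chunk above, for comparison
pos_loop <- unlist(sapply(1:dim(d)[1], function(i) rep(d$coord[i], d$score[i])))
```

Besides being shorter, `rep(..., times = )` always returns a plain vector, whereas `sapply()` simplifies to a matrix when every `score` is the same value greater than one.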

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r LungOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7, fig.height = 8}
#load co-occurring somatic variants
LungSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Lung", endCol = 7, startRow = 31,  endRow = 183)
#IMPACT data
LungSomaticsImpact <- LungSomatics[(LungSomatics$Gene != "not done"),]
#subsetting
LungSomaticsImpact <- LungSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
LungSomaticsImpact <- unique(LungSomaticsImpact)
#set Subject.id to character
LungSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(LungSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
LungSomaticsImpact <- na.omit(LungSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
LungSomaticsImpact <- LungSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(LungSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
LungSomaticsNonImpact <- LungSomatics[(LungSomatics$Gene == "not done"),]
LungSomaticsNonImpact <- LungSomaticsNonImpact[,c(1,6,7)]
LungSomaticsNonImpact <- unique(LungSomaticsNonImpact)
LungSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(LungSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
LungSomaticsNonImpact <- na.omit(LungSomaticsNonImpact)
LungSomaticsNonImpact <- LungSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(LungSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for lung tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
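
The IMPACT and Non-IMPACT matrices are combined with a full outer join on the gene names. A minimal sketch of that merge on two toy matrices follows; the subject identifiers and alterations are illustrative, and the `*` suffix marks Non-IMPACT subjects as in the chunk above.

```r
# Toy gene x subject tables (illustrative values)
impact <- data.frame(`01` = c("Missense", "AMP"),
                     row.names = c("TP53", "ERBB2"),
                     check.names = FALSE, stringsAsFactors = FALSE)
nonimpact <- data.frame(`02*` = c("Nonsense", "Gain"),
                        row.names = c("TP53", "MYC"),
                        check.names = FALSE, stringsAsFactors = FALSE)

# all = TRUE keeps genes seen in either table; missing cells become NA
m <- merge(impact, nonimpact, by = "row.names", all = TRUE)

# restore the gene names as row names and drop the helper column by name
rownames(m) <- m$Row.names
m <- m[, setdiff(names(m), "Row.names"), drop = FALSE]
```

Genes reported in only one of the two panels remain in the merged matrix, with `NA` for the subjects of the other panel.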

##Waterfall and swim plots with oncoplot
```{r LungyWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align='left', fig.height= 4, fig.width= 5.0}
#Waterfalls
LungWF <- lung[-4, c(1, 12, 13, 17)]

LungWF1 <- LungWF[complete.cases(LungWF$Percentage.Change.of.Tumor.Measurement),]
LungWF1$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(LungWF1$Percentage.Change.of.Tumor.Measurement))
LungWF1$Progression.Free.Survival.Time..Months. <- as.numeric(as.character(LungWF1$Progression.Free.Survival.Time..Months.))
LungWF1 <- LungWF1[order(-LungWF1$Percentage.Change.of.Tumor.Measurement),]

LungWF1$Subject.Identifier.for.the.Study <- as.character(LungWF1$Subject.Identifier.for.the.Study)

LungWF2 <- rbind(LungWF1, LungWF[is.na(LungWF$Percentage.Change.of.Tumor.Measurement),])
LungWF2$Percentage.Change.of.Tumor.Measurement <- as.numeric(as.character(LungWF2$Percentage.Change.of.Tumor.Measurement))

LungWF2$Subject.Identifier.for.the.Study <- as.character(LungWF2$Subject.Identifier.for.the.Study)

g1 <- ggplot(LungWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top") 

g2 <- ggplot(LungWF2, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.position = "bottom")  + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```
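
The `as.numeric(as.character(...))` idiom above guards against columns that arrive as factors or as text. A minimal sketch of why the intermediate `as.character()` matters, on a toy factor with illustrative values:

```r
# Toy column read as a factor, with a textual "NA" among the values
f <- factor(c("-30.5", "12", "NA"))

# direct conversion returns the internal integer codes, not the measurements
codes <- as.numeric(f)

# converting through character recovers the values; "NA" becomes a real NA
vals <- suppressWarnings(as.numeric(as.character(f)))
```

The same call is harmless on a column that is already character or numeric, which makes it a safe default when the type returned by the spreadsheet reader is not known in advance.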

```{r LungOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 7.2, fig.height = 8.5, fig.align = "center"}
#since the %change of tumor measurement is missing for some subjects, drop from the mat matrix the columns of the subjects with missing values.
#d1 <- setdiff(gsub("\\*", "", colnames(mat)), LungWF$Subject.Identifier.for.the.Study)
#mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
#df1 <- LungWF[,c(1,4)]
#df2<- as.data.frame(colnames(mat2))
#colnames(df2) <- "subjid"
#df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
#df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

LungWF2 <- LungWF2[order(-LungWF2$Percentage.Change.of.Tumor.Measurement, LungWF2$Subject.Identifier.for.the.Study),]

df1 <- LungWF2[,c(1,4)]
df2<- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat <- mat[c(as.character(df3$subjid))]
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2], ]

#mat <- mat[rowSums(is.na(mat)) < dim(mat)[2],]
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
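
The oncoplot columns are reordered to follow the waterfall plot. The chunk above does this through `merge(..., sort = F)`, whose row order is documented as unspecified. A minimal sketch of an explicit alternative with `match()` follows, assuming a toy matrix and a hypothetical vector of subject identifiers in waterfall order.

```r
# Toy gene x subject table; "12*" is a Non-IMPACT subject (illustrative values)
mat_toy <- data.frame(`12*` = c("AMP", NA),
                      `07`  = c(NA, "Missense"),
                      `03`  = c("Indel", NA),
                      row.names = c("TP53", "KRAS"),
                      check.names = FALSE, stringsAsFactors = FALSE)

# subject identifiers in waterfall order; "99" has no genotype data
wf_order <- c("03", "12", "07", "99")

# strip the "*" marker, then look up each waterfall subject among the columns
ids <- gsub("\\*", "", colnames(mat_toy))
idx <- match(wf_order, ids)

# keep the matched columns, in waterfall order
mat_ordered <- mat_toy[, idx[!is.na(idx)], drop = FALSE]
```

Subjects absent from the genotype table are skipped, and the remaining columns follow the waterfall order exactly.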

<hr /><hr />

#Ovarian  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r ovarian, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading ovarian enrolment data set
ovarian <- readWorksheet(SummitBioMarker0818, sheet = "Ovarian", endCol = 17, endRow = 5)
ovarian[ovarian == ""]  <- NA
#tabular views
kable(t(as.matrix(table(ovarian$Mutation.code))), caption = "ErBb2 somatic variants occurrences")
kable(table(ovarian$Mutation.code, ovarian$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(ovarian$Mutation.code, ovarian$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
ovarian$Best.Overall.Response <- as.factor(str_sub(ovarian$Best.Overall.Response, 1, 2))
#setting values to the Best.Overall.Response attribute
ovarian$Best.Overall.Response <- factor(ovarian$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(ovarian$Mutation.code, ovarian$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(ovarian$Mutation.code, ovarian$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

ovarian$Clinical.Benefit <- revalue(ovarian$Clinical.Benefit, c("pending"="N/A"))

ovarian$Clinical.Benefit <- factor(ovarian$Clinical.Benefit, levels = c("YES", "NO", "N/A"), ordered = TRUE)

#assessing the potential association between ErbB2 somatic variants and Clinical Benefit
##chisq.test(table(na.omit(ovarian$Clinical.Benefit, ovarian$Mutation.code)))
```
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r ovarianSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
ovarian$ErBb2SomaticVarCoord <- as.numeric(str_extract(ovarian$Mutation.code, 
    "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
ovarian$ErBb2WildTypeAA <- str_sub(ovarian$Mutation.code, 1, 1)
ovarian$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", ovarian$Mutation.code, 
    perl = T)

# reordering the dataframe according to the aa coordinate
ovarian <- ovarian[order(ovarian$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
ovarianDf1 <- data.frame(ovarian$ErBb2SomaticVarCoord, paste0(ovarian$ErBb2WildTypeAA, ovarian$ErBb2SomaticVarCoord), ovarian$ErBb2SubsAA, ovarian$Best.Overall.Response)
#the ovarianDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(ovarianDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
ovarianDf2 <- as.data.frame(unique(ovarianDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
ovarianDf2$SNV <- paste0(ovarianDf2$WTaaCoord, ovarianDf2$Varalleles)
#adding it to the initial ovarianDf1 df
ovarianDf1 <- merge(ovarianDf1, ovarianDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
ovarianDf1$VarRes <- paste0(ovarianDf1$Response, ovarianDf1$SNV)
ovarianDf1 <- unique(ddply(ovarianDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
ovarianDf1 <- ovarianDf1[order(ovarianDf1$coord, ovarianDf1$VarRes),]
#replacing NA with the "ND" string value for Not Determined
ovarianDf1$Response <- replace(as.character(ovarianDf1$Response), is.na(ovarianDf1$Response), "ND")

#Order lolliplot from top to bottom CR, PR, SD, PD, ND
ovarianDf1$Response <- str_replace_all(str_c(ovarianDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "NA" = "A-ND"))

#instantiating the GRanges object with the coordinates and names of somatic variations
ovarian.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(ovarianDf1)[1], function(i) rep(ovarianDf1$coord[i], ovarianDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(ovarianDf1)[1], function(i) rep(ovarianDf1$SNV[i], ovarianDf1$score[i])))))

#adding the stack.factor attribute
ovarian.gr$stack.factor <- unlist(sapply(seq_len(nrow(ovarianDf1)), function(i) paste0(ovarianDf1$Response[i], "_", seq_len(ovarianDf1$score[i]))))
ovarian.gr$value1 <- 100
ovarian.gr$value2 <- 100 - ovarian.gr$value1

ovarian.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", ovarian.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

ovarianDf1 <- ddply(ovarianDf1, .(WTaaCoord), mutate, id = seq_along(coord))
ovarianDf1 <- ovarianDf1[order(ovarianDf1$coord),]
ovarian.gr$stack.factor <- as.character(ovarianDf1$id)

#plotting
lolliplot(ovarian.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7)
```
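
The lollipop colours are looked up by stripping the sort prefix and the stack index from each `stack.factor` label. A minimal sketch of that lookup follows; the named list is written out directly here, with the colours taken from the legend set up earlier in the script, and the labels are illustrative.

```r
# response -> c(fill colour, transparent white), written out by hand
colour_set <- list(
  ND = c("gray",   "#FFFFFFFF"),
  PD = c("black",  "#FFFFFFFF"),
  SD = c("yellow", "#FFFFFFFF"),
  PR = c("green",  "#FFFFFFFF"),
  CR = c("red",    "#FFFFFFFF")
)

# labels as built above: sort prefix, response code, stack index
labels <- c("E-CR_1", "A-ND_2", "C-SD_10")

# drop the "_<index>" suffix and the "<letter>-" prefix
keys <- gsub("_\\d*|\\w*-", "", labels)

# one colour pair per lollipop
cols <- colour_set[keys]
```

A label whose response code is not among the names of the colour list would yield a `NULL` entry, so it is worth checking `keys` against `names(colour_set)` after each data cut.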

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r OvarianOncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#load co-occurring somatic variants
OvarianSomatics <- readWorksheet(SummitBioMarker0818, sheet = "Ovarian", endCol = 7, startRow = 9,  endRow = 27)
#IMPACT data
OvarianSomaticsImpact <- OvarianSomatics[(OvarianSomatics$Gene != "not done"),]
#subsetting
OvarianSomaticsImpact <- OvarianSomaticsImpact[,c(1,3,4)]
#filtering out duplicate rows
OvarianSomaticsImpact <- unique(OvarianSomaticsImpact)
#set Subject.id to character
OvarianSomaticsImpact$Subject.Identifier.for.the.Study <- as.character(OvarianSomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
OvarianSomaticsImpact <- na.omit(OvarianSomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
OvarianSomaticsImpact <- OvarianSomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(OvarianSomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene

#mat <- as.data.frame(mat[,-1])
#colnames(mat) <- "53"

#Non IMPACT data
OvarianSomaticsNonImpact <- OvarianSomatics[(OvarianSomatics$Gene == "not done"),]
OvarianSomaticsNonImpact <- OvarianSomaticsNonImpact[,c(1,6,7)]
OvarianSomaticsNonImpact <- unique(OvarianSomaticsNonImpact)
OvarianSomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(OvarianSomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
OvarianSomaticsNonImpact <- na.omit(OvarianSomaticsNonImpact)
OvarianSomaticsNonImpact <- OvarianSomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(OvarianSomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
#mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = T)
row.names(mat) <- mat$Row.names

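#dropping the Row.names and Gene columns (1, 2) and column 5 by position; these positions depend on the number of subject columns in this data cut
mat <- mat[,-c(1,2,5)]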

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for ovarian tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
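
The helper columns left over from the merge are removed by position in the chunk above. A minimal sketch of removing them by name instead follows, assuming a toy merged table; the subject identifier and alterations are illustrative.

```r
# Toy merged table with one IMPACT and one Non-IMPACT subject column
m <- data.frame(Row.names = c("TP53", "MYC"),
                Gene      = c("TP53", NA),
                `53`      = c("Missense", NA),
                Gene.1    = c(NA, "MYC"),
                `53*`     = c(NA, "Gain"),
                check.names = FALSE, stringsAsFactors = FALSE)
rownames(m) <- m$Row.names

# keep every column that is not a merge key or a gene-name column
keep <- setdiff(names(m), c("Row.names", "Gene", "Gene.1"))
m2   <- m[, keep, drop = FALSE]
```

Selecting by name keeps working when a later data cut adds or removes subject columns, which shifts the positions.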

##Waterfall and swim plots with oncoplot
```{r ovarianyWF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align = 'left', fig.height = 4, fig.width = 5.0}
#Waterfalls
ovarianWF <- ovarian[,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT and Non-IMPACT genotype data
ovarianWF <- ovarianWF[(ovarianWF$Subject.Identifier.for.the.Study %in% intersect(ovarianWF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

#ovarianWF <- ovarianWF[complete.cases(ovarianWF$Percentage.Change.of.Tumor.Measurement),]
ovarianWF <- ovarianWF[order(-ovarianWF$Percentage.Change.of.Tumor.Measurement),]

ovarianWF$Subject.Identifier.for.the.Study <- as.character(ovarianWF$Subject.Identifier.for.the.Study)

g1 <- ggplot(ovarianWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 100) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top") 

g2 <- ggplot(ovarianWF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill = "none")

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r ovarianOncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 6.6, fig.align = "center"}
#since the %change of tumor measurement is missing for some subjects, drop from the mat matrix the columns of the subjects with missing values.
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), ovarianWF$Subject.Identifier.for.the.Study)
##d1 <- sapply(d1, function(x) paste("^",x, "$", sep = ""))
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- ovarianWF[,c(1,4)]
df2 <- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = F)

mat <- mat[c(as.character(df3$subjid))]
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2],]
oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
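
After the columns are subset, genes with no alteration left in any remaining subject are dropped. A minimal sketch of that filter on a toy table with illustrative values:

```r
# Toy gene x subject table; KRAS has no alteration in any subject
m <- data.frame(a = c("AMP", NA, NA),
                b = c(NA, NA, "Gain"),
                row.names = c("TP53", "KRAS", "MYC"),
                stringsAsFactors = FALSE)

# keep the rows where at least one cell is not NA
kept <- m[rowSums(is.na(m)) < ncol(m), , drop = FALSE]
```

`drop = FALSE` is an addition to the form used in the chunk above: without it, a table reduced to a single subject column is returned as a plain vector and loses its gene names.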

<hr /><hr />

#HER2_NOS  
##Distributions of ErBb2 somatic variants w.r.t. clinical endpoints.
```{r HER2_NOS, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
#reading HER2_NOS enrolment data set
HER2_NOS <- readWorksheet(SummitBioMarker0818, sheet = "HER2_NOS", endCol = 17, endRow = 18)
HER2_NOS[HER2_NOS == ""]  <- NA
#tabular views
kable(t(as.matrix(table(HER2_NOS$Mutation.code))), caption = "ErBb2 somatic variants occurrences")
##kable(table(HER2_NOS$Mutation.code, HER2_NOS$Primary.Cell.Type), caption = "ErbB2 mutation by Primary Cell Type")
kable(table(HER2_NOS$Mutation.code, HER2_NOS$Objective.Response.at.Week.8), caption = "ErbB2 mutation by Objective Response at Week 8")

#~Best overall response
#keep first 2 characters for the Best.Overall.Response's attribute value
HER2_NOS$Best.Overall.Response <- as.factor(str_sub(HER2_NOS$Best.Overall.Response, 1, 2))
#setting the ordered levels of the Best.Overall.Response attribute
HER2_NOS$Best.Overall.Response <- factor(HER2_NOS$Best.Overall.Response, levels = c("CR", "PR", "SD", "PD", "NA"), ordered = TRUE)
BestOverallResPerVar <-  as.data.frame.matrix(table(HER2_NOS$Mutation.code, HER2_NOS$Best.Overall.Response))
datatable(BestOverallResPerVar, caption = "ErbB2 mutation by Best Overall Response")

ClinicalBenefitPerVar <- as.data.frame.matrix(table(HER2_NOS$Mutation.code, HER2_NOS$Clinical.Benefit))
datatable(ClinicalBenefitPerVar, caption = "ErbB2 mutations by Clinical Benefit Status")

#HER2_NOS$Clinical.Benefit <- revalue(HER2_NOS$Clinical.Benefit, c("pending"="N/A"))

```
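The tabulations above rely on `Best.Overall.Response` being an ordered factor, so that the columns of each cross-table follow the clinical order rather than the alphabetical one. A minimal sketch of that idea on made-up data (the `toy` data frame and its values are hypothetical, not trial data; the block is static and is not evaluated when the report is knitted):

```r
# Hypothetical toy records; the values are invented for illustration only
toy <- data.frame(
  Mutation.code = c("S310F", "S310F", "L755S", "V777L"),
  Best.Overall.Response = c("PR (confirmed)", "SD", "PD", "PR"),
  stringsAsFactors = FALSE
)

# Keep the first two characters, then impose the clinical ordering of the levels
toy$Best.Overall.Response <- factor(
  substr(toy$Best.Overall.Response, 1, 2),
  levels = c("CR", "PR", "SD", "PD"),
  ordered = TRUE
)

# Rows are variants; columns follow the level order CR, PR, SD, PD
tab <- as.data.frame.matrix(table(toy$Mutation.code, toy$Best.Overall.Response))
print(tab)
```

Levels with no observation (here `CR`) still appear as all-zero columns, which keeps tables comparable across cohorts.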
<hr /> 

##Lolliplot: Annotated ErBb2 somatic variants
```{r HER2_NOSSNV, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE}
# adding one variable, i.e. the somatic variant coordinate w.r.t. ErbB2 aa
# sequence
HER2_NOS$ErBb2SomaticVarCoord <- as.numeric(str_extract(HER2_NOS$Mutation.code, 
    "\\d+"))
# adding two additional variables: the wild type aa and the substituted aa
HER2_NOS$ErBb2WildTypeAA <- str_sub(HER2_NOS$Mutation.code, 1, 1)
HER2_NOS$ErBb2SubsAA <- gsub("^\\w{1}\\d+", "", HER2_NOS$Mutation.code, perl = TRUE)

# reordering the dataframe according to the aa coordinate
HER2_NOS <- HER2_NOS[order(HER2_NOS$ErBb2SomaticVarCoord), ]

#creating the data structure for the GRanges() GenomicRanges package function
HER2_NOSDf1 <- data.frame(HER2_NOS$ErBb2SomaticVarCoord, paste0(HER2_NOS$ErBb2WildTypeAA, HER2_NOS$ErBb2SomaticVarCoord), HER2_NOS$ErBb2SubsAA, HER2_NOS$Best.Overall.Response)
#the HER2_NOSDf1 df has 4 columns: the coordinate where the variation maps, the WT aa with the coordinate, the variant aa and the clinical response.
colnames(HER2_NOSDf1) <- c("coord", "WTaaCoord", "Var", "Response")
#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
HER2_NOSDf2 <- as.data.frame(unique(HER2_NOSDf1[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
HER2_NOSDf2$SNV <- paste0(HER2_NOSDf2$WTaaCoord, HER2_NOSDf2$Varalleles)
#adding it to the initial HER2_NOSDf1 df
HER2_NOSDf1 <- merge(HER2_NOSDf1, HER2_NOSDf2[,-c(2,3)], by = "WTaaCoord")
#adding one attribute: concatenation of the clinical response and the variant
HER2_NOSDf1$VarRes <- paste0(HER2_NOSDf1$Response, HER2_NOSDf1$SNV)
# replacing NA with the 'ND' string value for Not Determined
HER2_NOSDf1$Response <- gsub("NA", "ND",HER2_NOSDf1$Response)
#prefix the clinical response values with a letter (E-CR, D-PR, C-SD, B-PD, A-ND) so that they sort alphabetically as ND, PD, SD, PR, CR
HER2_NOSDf1$Response <- str_replace_all(str_c(HER2_NOSDf1$Response), c("CR" = "E-CR", "PR" = "D-PR", "SD" = "C-SD", "PD" = "B-PD", "ND" = "A-ND"))
#order the df
HER2_NOSDf1 <- HER2_NOSDf1[order(HER2_NOSDf1$coord, HER2_NOSDf1$Response),]

##
HER2_NOSDf3 <- HER2_NOSDf1
##

HER2_NOSDf1 <- unique(ddply(HER2_NOSDf1[,-c(3)], "VarRes", mutate, score = length(VarRes)))
HER2_NOSDf1 <- HER2_NOSDf1[order(HER2_NOSDf1$coord, HER2_NOSDf1$Response),]

#instantiating the GRanges object with the coordinates and names of somatic variations
HER2_NOS.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(HER2_NOSDf1)[1], function(i) rep(HER2_NOSDf1$coord[i], HER2_NOSDf1$score[i]))), width = 1, names = unlist(sapply(1:dim(HER2_NOSDf1)[1], function(i) rep(HER2_NOSDf1$SNV[i], HER2_NOSDf1$score[i])))))

#adding the stack.factor attribute
HER2_NOS.gr$stack.factor <- unlist(sapply(1:dim(HER2_NOSDf1)[1], function(i) paste0(HER2_NOSDf1$Response[i], "_", seq(1:HER2_NOSDf1$score[i]))))
HER2_NOS.gr$value1 <- 100
HER2_NOS.gr$value2 <- 100 - HER2_NOS.gr$value1

HER2_NOS.gr$color <- response.color.set[gsub("_\\d*|\\w*-", "", HER2_NOS.gr$stack.factor)]
legend <- list(labels = response, col = "gray80", fill = sapply(response.color.set, `[`, 1))

HER2_NOSDf3 <- ddply(HER2_NOSDf3, .(WTaaCoord), mutate, idx = seq_along(coord))
HER2_NOSDf3 <- HER2_NOSDf3[order(HER2_NOSDf3$coord, HER2_NOSDf3$Response),]
HER2_NOS.gr$stack.factor <- as.character(HER2_NOSDf3$idx)

#plotting
lolliplot(HER2_NOS.gr, Erbb2Features, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .7, jitter = "label")
```
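The chunk above first splits each mutation code into wild-type residue, coordinate and substituted residue, then repeats every coordinate once per observed occurrence before building the `GRanges` object. A minimal base-R sketch of those two steps, on invented codes (`codes`, `counts`, `positions` and `labels` are illustrative names, and `rep(..., times = ...)` is used here in place of the `sapply()` loop of the report; the block is static and not evaluated):

```r
# Hypothetical mutation codes (wild-type aa, coordinate, substituted aa)
codes <- c("S310F", "S310Y", "L755S", "S310F")

coord <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", codes))  # first run of digits
wt    <- substr(codes, 1, 1)                                   # wild-type residue
subs  <- sub("^[A-Za-z][0-9]+", "", codes)                     # what follows the coordinate

# Count how often each distinct variant occurs
counts <- as.data.frame(table(code = codes), stringsAsFactors = FALSE)
counts$coord <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", counts$code))
counts <- counts[order(counts$coord, counts$code), ]

# One position (and one label) per observed occurrence
positions <- rep(counts$coord, times = counts$Freq)
labels    <- rep(counts$code,  times = counts$Freq)
```

`positions` and `labels` have one element per subject-level observation, which is the shape `IRanges(start, width = 1, names = ...)` expects.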

##Oncoplot: co-occurrence of somatic variations (both IMPACT and NON-IMPACT data sets) 
```{r HER2_NOS_Oncoplot, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 6.0, fig.height = 8.5}
#load co-occurring somatic variants
HER2Somatics <- readWorksheet(SummitBioMarker0818, sheet = "HER2_NOS", endCol = 7, startRow = 22,  endRow = 168)
#IMPACT data
HER2SomaticsImpact <- HER2Somatics[(HER2Somatics$Gene != "not done"),]
#subsetting
HER2SomaticsImpact <- HER2SomaticsImpact[,c(1,3,4)]
#filtering out duplicated rows
HER2SomaticsImpact <- unique(HER2SomaticsImpact)
#set Subject.id to character
HER2SomaticsImpact$Subject.Identifier.for.the.Study <- as.character(HER2SomaticsImpact$Subject.Identifier.for.the.Study)
#filtering out rows with missing values
HER2SomaticsImpact <- na.omit(HER2SomaticsImpact)
#concatenate the variation types whenever several occur in a given gene for a given subject
HER2SomaticsImpact <- HER2SomaticsImpact %>% group_by(Subject.Identifier.for.the.Study, Gene) %>% do(Alteration = paste(.$Alteration, collapse = '')) %>% ungroup() %>% mutate(Alteration = unlist(Alteration))
mat <- as.data.frame(spread(HER2SomaticsImpact, Subject.Identifier.for.the.Study, Alteration))
row.names(mat) <- mat$Gene
mat <- mat[,-1]

#Non IMPACT data
HER2SomaticsNonImpact <- HER2Somatics[(HER2Somatics$Gene == "not done"),]
HER2SomaticsNonImpact <- HER2SomaticsNonImpact[,c(1,6,7)]
HER2SomaticsNonImpact <- unique(HER2SomaticsNonImpact)
HER2SomaticsNonImpact$Subject.Identifier.for.the.Study <- as.character(paste(HER2SomaticsNonImpact$Subject.Identifier.for.the.Study, "*", sep = ""))
HER2SomaticsNonImpact <- na.omit(HER2SomaticsNonImpact)
HER2SomaticsNonImpact <- HER2SomaticsNonImpact %>% group_by(Subject.Identifier.for.the.Study, Gene.1) %>% do(Alteration.1 = paste(.$Alteration.1, collapse = '')) %>% ungroup() %>% mutate(Alteration.1 = unlist(Alteration.1))

#creating the mutations matrix for oncoprint
mat2 <- as.data.frame(spread(HER2SomaticsNonImpact, Subject.Identifier.for.the.Study, Alteration.1))
mat2 <- subset(mat2, select = colnames(mat2) != "NA*")
row.names(mat2) <- mat2$Gene.1
mat2 <- mat2[,-1]

mat <- merge(mat, mat2, by = "row.names", all = TRUE)
row.names(mat) <- mat$Row.names
mat <- mat[,-1]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), column_title = "Oncoplot for HER2-NOS tumor", heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE)
```
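The matrix passed to `oncoPrint()` above has genes in rows and subjects in columns, with all alterations found for a gene in a subject pasted into a single cell. As a rough illustration of that reshaping, the sketch below uses base `tapply()` rather than the `group_by()`/`spread()` pipeline of the report; the `calls` data frame is hypothetical and the block is static, not evaluated:

```r
# Hypothetical long-format calls: one row per subject, gene and alteration
calls <- data.frame(
  subject    = c("001", "001", "001", "002", "003*"),
  gene       = c("TP53", "ERBB2", "ERBB2", "TP53", "PIK3CA"),
  alteration = c("Missense", "Missense", "Amp", "Nonsense", "Missense"),
  stringsAsFactors = FALSE
)

# Genes in rows, subjects in columns; several alterations in one cell are
# pasted together without a separator, as in the chunk above
mat_toy <- tapply(calls$alteration, list(calls$gene, calls$subject),
                  paste, collapse = "")
print(mat_toy)
```

Cells with no call stay `NA`. Because no separator is used, a pasted value such as `MissenseAmp` is treated as one alteration type of its own by a `get_type` function that splits on `";"`.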
  
##Waterfall and swim plots with oncoplot
```{r HER2WF, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.align ='left', fig.height = 4, fig.width = 5.2}
#Waterfalls
HER2WF <- HER2_NOS[,c(1, 12, 13, 17)]

#selecting records for which there are IMPACT and Non-IMPACT genotype data
HER2WF <- HER2WF[(HER2WF$Subject.Identifier.for.the.Study %in% intersect(HER2WF$Subject.Identifier.for.the.Study, gsub("\\*", "", colnames(mat)))),]

#HER2WF <- HER2WF[complete.cases(HER2WF$Percentage.Change.of.Tumor.Measurement),]
HER2WF <- HER2WF[order(-HER2WF$Percentage.Change.of.Tumor.Measurement),]

HER2WF$Subject.Identifier.for.the.Study <- as.character(HER2WF$Subject.Identifier.for.the.Study)

g1 <- ggplot(HER2WF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Percentage.Change.of.Tumor.Measurement, fill = Best.Overall.Response)) + 
    geom_col() + ylab("Change from baseline (%)") + theme_bw() +  ylim(-100, 106) + theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6), legend.text = element_text(size = 6), legend.title = element_text(size = 8), plot.margin = unit(c(0,0,0,1.6), "cm"), legend.position = "top")

g2 <- ggplot(HER2WF, aes(x = reorder(Subject.Identifier.for.the.Study, -Percentage.Change.of.Tumor.Measurement), y = Progression.Free.Survival.Time..Months., fill = Best.Overall.Response)) + 
    geom_col() + ylab("PFS (mo)") + theme_bw() + theme(axis.title.x = element_blank(),  axis.text.x = element_text(size = 6, angle = 90, hjust = 1), axis.title.y = element_text(size = 9), axis.text.y = element_text(size = 6)) + guides(fill=FALSE)

plot_grid(g1, g2, nrow = 2, align = "v")
```

```{r HER2Oncoplot2, echo = TRUE, message = FALSE, warning = FALSE,  tidy = TRUE, fig.width = 6.8, fig.height= 8, fig.align = "center"}
#given the missing data for the %change of tumor measurement variable, the commented-out lines below subset the mat matrix, dropping the Subject.Identifier columns that have no waterfall record.
##d1 <- setdiff(gsub("\\*", "", colnames(mat)), HER2WF$Subject.Identifier.for.the.Study)
##d1 <- sapply(d1, function(x) paste("^",x, "$", sep = ""))
##mat2 <- mat[, !colnames(mat) %in% grep(paste(d1, collapse = "|"), colnames(mat), value = T)]
#creating a df for the purpose of reordering the mat object passed to the Oncoprint function
df1 <- HER2WF[,c(1,4)]
df2 <- as.data.frame(colnames(mat))
colnames(df2) <- "subjid"
df2$Subject.Identifier.for.the.Study <- gsub("\\*", "", df2$subjid)
df3 <- merge(df1, df2, by = "Subject.Identifier.for.the.Study", sort = FALSE)

mat <- mat[c(as.character(df3$subjid))]
#dropping the genes with no alteration among the retained subjects
mat <- mat[rowSums(is.na(mat)) < dim(mat)[2],]

oncoPrint(mat, get_type = function(x) strsplit(x, ";")[[1]], alter_fun = alter_fun, col = col, row_names_gp = gpar(fontsize = 6), pct_gp = gpar(fontsize = 6), heatmap_legend_param = list(title = "Alterations", at = c("Missense", "AMP", "Nonsense", "Indel", "splice", "Gain", "Promoter", "DeepDel", "rearrangement", "MissenseAmp"), labels = c("Missense", "Amplification", "Nonsense", "Indel", "splice", "Gain", "Promoter", "Deep Del", "rearrangement","Missense & Amp")), show_column_names = TRUE, column_order = NULL)
```
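The chunk above aligns the oncoplot columns with the subject order of the waterfall plot and then discards genes left without any alteration. A minimal sketch of both steps on a made-up matrix follows; `mat_toy` and `wf_order` are hypothetical, `match()` stands in for the `merge(..., sort = FALSE)` used in the report, and the block is static, not evaluated:

```r
# Hypothetical alteration matrix: genes in rows, subjects in columns
mat_toy <- data.frame(
  "001"  = c("Missense", NA, NA),
  "002*" = c(NA, "Amp", NA),
  "003"  = c("Nonsense", NA, NA),
  row.names = c("TP53", "ERBB2", "KRAS"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical subject order, e.g. taken from a waterfall plot (no asterisks)
wf_order <- c("003", "002")

# Match on asterisk-free identifiers, but keep the original column names
clean <- gsub("\\*", "", colnames(mat_toy))
keep  <- colnames(mat_toy)[match(wf_order, clean)]
mat_toy <- mat_toy[, keep, drop = FALSE]

# Drop genes with no alteration in any remaining subject
mat_toy <- mat_toy[rowSums(is.na(mat_toy)) < ncol(mat_toy), , drop = FALSE]
print(mat_toy)
```

The `NA` count is computed on the very matrix being filtered, so the logical index lines up with its rows.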

  
#Lolliplot of all tumor types irrespective of the clinical response value.  
```{r alltumortypes, echo = TRUE, message = FALSE, warning = FALSE,  fig.height = 14, tidy = TRUE, fig.width = 10}
bladderDfTumType <- data.frame(bladder$ErBb2SomaticVarCoord, paste0(bladder$ErBb2WildTypeAA, bladder$ErBb2SomaticVarCoord), bladder$ErBb2SubsAA)
colnames(bladderDfTumType) <- c("coord", "WTaaCoord", "Var")
bladderDfTumType$TumorType <- "Bladder"

biliaryDfTumType <- data.frame(biliary$ErBb2SomaticVarCoord, paste0(biliary$ErBb2WildTypeAA, biliary$ErBb2SomaticVarCoord), biliary$ErBb2SubsAA)
colnames(biliaryDfTumType) <- c("coord", "WTaaCoord", "Var")
biliaryDfTumType$TumorType <- "Biliary"

breastmonoTumType <- data.frame(Breast_mono$ErBb2SomaticVarCoord, paste0(Breast_mono$ErBb2WildTypeAA, Breast_mono$ErBb2SomaticVarCoord), Breast_mono$ErBb2SubsAA)
colnames(breastmonoTumType) <- c("coord", "WTaaCoord", "Var")
breastmonoTumType$TumorType <- "Breast"

breastcomboTumType <- data.frame(Breast_combo$ErBb2SomaticVarCoord, paste0(Breast_combo$ErBb2WildTypeAA, Breast_combo$ErBb2SomaticVarCoord),Breast_combo$ErBb2SubsAA)
colnames(breastcomboTumType) <- c("coord", "WTaaCoord", "Var")
breastcomboTumType$TumorType <- "Breast"

cervicalTumType <- data.frame(cervical$ErBb2SomaticVarCoord, paste0(cervical$ErBb2WildTypeAA, cervical$ErBb2SomaticVarCoord),cervical$ErBb2SubsAA)
colnames(cervicalTumType) <- c("coord", "WTaaCoord", "Var")
cervicalTumType$TumorType <- "Cervical"

colorectalTumType <- data.frame(colorectal$ErBb2SomaticVarCoord, paste0(colorectal$ErBb2WildTypeAA, colorectal$ErBb2SomaticVarCoord),colorectal$ErBb2SubsAA)
colnames(colorectalTumType) <- c("coord", "WTaaCoord", "Var")
colorectalTumType$TumorType <- "Colorectal"

endometrialTumType <- data.frame(endometrial$ErBb2SomaticVarCoord, paste0(endometrial$ErBb2WildTypeAA, endometrial$ErBb2SomaticVarCoord), endometrial$ErBb2SubsAA)
colnames(endometrialTumType) <- c("coord", "WTaaCoord", "Var")
endometrialTumType$TumorType <- "Endometrial"

gastroesophagealTumType <- data.frame(gastroesophageal$ErBb2SomaticVarCoord, paste0(gastroesophageal$ErBb2WildTypeAA, gastroesophageal$ErBb2SomaticVarCoord), gastroesophageal$ErBb2SubsAA)
colnames(gastroesophagealTumType) <- c("coord", "WTaaCoord", "Var")
gastroesophagealTumType$TumorType <- "Gastroesophageal"

lungTumType <- data.frame(lung$ErBb2SomaticVarCoord, paste0(lung$ErBb2WildTypeAA, lung$ErBb2SomaticVarCoord), lung$ErBb2SubsAA)
colnames(lungTumType) <- c("coord", "WTaaCoord", "Var")
lungTumType$TumorType <- "Lung"

ovarianTumType <- data.frame(ovarian$ErBb2SomaticVarCoord, paste0(ovarian$ErBb2WildTypeAA, ovarian$ErBb2SomaticVarCoord), ovarian$ErBb2SubsAA)
colnames(ovarianTumType) <- c("coord", "WTaaCoord", "Var")
ovarianTumType$TumorType <- "Ovarian"

HER2_NOSTumType <- data.frame(HER2_NOS$ErBb2SomaticVarCoord, paste0(HER2_NOS$ErBb2WildTypeAA, HER2_NOS$ErBb2SomaticVarCoord), HER2_NOS$ErBb2SubsAA)
colnames(HER2_NOSTumType) <- c("coord", "WTaaCoord", "Var")
HER2_NOSTumType$TumorType <- "HER2NOS"

AllTumorTypes <- rbind(biliaryDfTumType, bladderDfTumType, breastmonoTumType, breastcomboTumType, cervicalTumType, colorectalTumType, endometrialTumType, gastroesophagealTumType, lungTumType, ovarianTumType, HER2_NOSTumType)

AllTumorTypes <- AllTumorTypes[order(AllTumorTypes$coord), ]

AllTumorTypes <- AllTumorTypes[(AllTumorTypes$coord != 259 & AllTumorTypes$coord != 779),]

#creating a 2nd df in order to concatenate the substitutions mapping to the same coordinate
AllTumorTypes2 <- as.data.frame(unique(AllTumorTypes[,-4]) %>% group_by(WTaaCoord, coord) %>% do(Varalleles = paste(.$Var, collapse = "/")) %>% ungroup() %>% mutate(Varalleles = unlist(Varalleles)))
#adding one variable referred to as SNV which reads the WT aa, the coordinate and the substituted aa
AllTumorTypes2$SNV <- paste0(AllTumorTypes2$WTaaCoord, AllTumorTypes2$Varalleles)

AllTumorTypes <- merge(AllTumorTypes, AllTumorTypes2[,-c(2,3)], by = "WTaaCoord")

#adding one attribute: concatenation of the tumor type and the variant
AllTumorTypes$VarRes <- paste0(AllTumorTypes$TumorType, AllTumorTypes$SNV)

AllTumorTypes <- AllTumorTypes[order(AllTumorTypes$coord, AllTumorTypes$TumorType),]

#duplicating the AllTumorTypes df object for later plotting on the kinase domain
AllTumorTypes_2 <- AllTumorTypes

AllTumorTypes3 <- AllTumorTypes

AllTumorTypes <- unique(ddply(AllTumorTypes[,-c(3)], "VarRes", mutate, score = length(VarRes)))
AllTumorTypes <- AllTumorTypes[order(AllTumorTypes$coord, AllTumorTypes$TumorType),]

#instantiating the GRanges object with the coordinates and names of somatic variations
AllTumorTypes.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(AllTumorTypes)[1], function(i) rep(AllTumorTypes$coord[i], AllTumorTypes$score[i]))), width = 1, names = unlist(sapply(1:dim(AllTumorTypes)[1], function(i) rep(AllTumorTypes$SNV[i], AllTumorTypes$score[i])))))

#adding the stack.factor attribute
AllTumorTypes.gr$stack.factor <- unlist(sapply(1:dim(AllTumorTypes)[1], function(i) paste0(AllTumorTypes$TumorType[i], "_", seq(1:AllTumorTypes$score[i]))))

AllTumorTypes.gr$value1 <- 100
AllTumorTypes.gr$value2 <- 100 - AllTumorTypes.gr$value1

tumor <- c("Biliary", "Bladder", "Breast", "Cervical", "Colorectal", "Endometrial", "Gastroesophageal", "Lung", "Ovarian", "HER2NOS")

tumor.color.set <- as.list(as.data.frame(rbind(c("red", "blue", "yellow", "green", "azure", "darkorange", "cyan", "darkmagenta", "black", "bisque"), "#FFFFFFFF"), stringsAsFactors = FALSE))
names(tumor.color.set) <- tumor

AllTumorTypes.gr$color <- tumor.color.set[gsub("_\\d*|\\w*-", "", AllTumorTypes.gr$stack.factor)]
legend <- list(labels = tumor, col = "gray80", fill = sapply(tumor.color.set, `[`, 1))


AllTumorTypes3 <- ddply(AllTumorTypes3, .(WTaaCoord), mutate, idx = seq_along(coord))
AllTumorTypes3 <- AllTumorTypes3[order(AllTumorTypes3$coord, AllTumorTypes3$TumorType),]


AllTumorTypes3$idx <- str_replace_all(str_c(AllTumorTypes3$idx), c("37" = "ZK", "36" = "ZJ", "35" = "ZI", "34" = "ZH", "33" = "ZG", "32" = "ZF", "31" = "ZE", "30" = "ZD", "29" = "ZC", "28" = "ZB", "27" = "ZA", "26" = "Z", "25" = "Y",  "24" = "X", "23" = "W", "22" = "V", "21" = "U", "20" = "T", "19" = "S", "18" = "R", "17" = "Q", "16" = "P", "15" = "O", "14" = "N", "13" = "M", "12" = "L", "11" = "K", "10" = "J","1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E", "6" = "F", "7" = "G", "8" = "H", "9" = "I"))

AllTumorTypes.gr$stack.factor <- as.character(AllTumorTypes3$idx)


#AllTumorTypes3$SNPsideID <- c(rep("top", 7), rep("bottom", 1), rep("top", 50), rep("bottom", 1), rep("top", 24), rep("bottom", 1), rep("top", 1), rep("bottom", 9), rep("top", 15), rep("bottom", 1), rep("top", 1), rep("bottom", 8), rep("top", 21), rep("bottom", 10), rep("top", 1), rep("bottom", 1), rep("top", 1), rep("bottom", 1), rep("top", 1), rep("bottom", 2), rep("top", 6), rep("bottom", 2),rep("top", 4), rep("bottom", 2), rep("top", 1))

#AllTumorTypes.gr$SNPsideID <- AllTumorTypes3$SNPsideID

#plotting
trackViewer::lolliplot(AllTumorTypes.gr, Erbb2Features2, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .4)
```
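The long `str_replace_all()` mapping in the chunk above converts the per-position index into letter codes (1 to "A", 26 to "Z", 27 to "ZA", up to 37 to "ZK"), presumably so that the stack labels sort as text in the intended order. The same mapping can be computed rather than enumerated; the helper below is a hypothetical sketch (`idx_to_label` is not part of the report or of any package) that covers indices 1 to 52, and the block is static, not evaluated:

```r
# Hypothetical helper: 1..26 -> "A".."Z", 27..52 -> "ZA".."ZZ"
idx_to_label <- function(i) {
  stopifnot(all(i >= 1), all(i <= 52))
  prefix <- ifelse(i > 26, "Z", "")           # second "alphabet" gets a Z prefix
  paste0(prefix, LETTERS[(i - 1) %% 26 + 1])  # wrap the index back into 1..26
}

idx_to_label(c(1, 9, 10, 26, 27, 37))
# "A"  "I"  "J"  "Z"  "ZA" "ZK"
```

Unlike a fixed lookup, the helper fails loudly when an index exceeds the supported range, so a larger data cut cannot silently produce mangled labels.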


#Lolliplot of all tumor types irrespective of the clinical response in the Erbb2 kinase domain
```{r alltumortypesErbb2kinase, echo = TRUE, message = FALSE, warning = FALSE, fig.height = 10, tidy = TRUE}

AllTumorTypes_2 <- AllTumorTypes_2[(AllTumorTypes_2$coord > 674 & AllTumorTypes_2$coord < 1000), ]
AllTumorTypes_2$coord <- AllTumorTypes_2$coord - 674

AllTumorTypes3_2 <- AllTumorTypes_2

AllTumorTypes_2 <- unique(ddply(AllTumorTypes_2[,-c(3)], "VarRes", mutate, score = length(VarRes)))
AllTumorTypes_2 <- AllTumorTypes_2[order(AllTumorTypes_2$coord, AllTumorTypes_2$TumorType),]

# instantiating the GRanges object with the coordinates and names of somatic
# variations
AllTumorTypes2.gr <- GRanges("ERBB2", IRanges(unlist(sapply(1:dim(AllTumorTypes_2)[1], 
    function(i) rep(AllTumorTypes_2$coord[i], AllTumorTypes_2$score[i]))), width = 1, 
    names = unlist(sapply(1:dim(AllTumorTypes_2)[1], function(i) rep(AllTumorTypes_2$SNV[i], 
        AllTumorTypes_2$score[i])))))

#adding the stack.factor attribute
AllTumorTypes2.gr$stack.factor <- unlist(sapply(1:dim(AllTumorTypes_2)[1], function(i) paste0(AllTumorTypes_2$TumorType[i], "_", seq(1:AllTumorTypes_2$score[i]))))

AllTumorTypes2.gr$value1 <- 100
AllTumorTypes2.gr$value2 <- 100 - AllTumorTypes2.gr$value1

AllTumorTypes2.gr$color <- tumor.color.set[gsub("_\\d*|\\w*-", "", AllTumorTypes2.gr$stack.factor)]
legend <- list(labels = tumor, col = "gray80", fill = sapply(tumor.color.set, `[`, 1))

AllTumorTypes3_2 <- ddply(AllTumorTypes3_2, .(WTaaCoord), mutate, idx = seq_along(coord))
AllTumorTypes3_2 <- AllTumorTypes3_2[order(AllTumorTypes3_2$coord, AllTumorTypes3_2$TumorType),]


AllTumorTypes3_2$idx <- str_replace_all(str_c(AllTumorTypes3_2$idx), c("25" = "Y", "24" = "X", "23" = "W", "22" = "V", "21" = "U", "20" = "T", "19" = "S", "18" = "R", "17" = "Q", "16" = "P", "15" = "O", "14" = "N", "13" = "M", "12" = "L", "11" = "K", "10" = "J", "1" = "A", "2" = "B", "3" = "C", "4" = "D", "5" = "E", "6" = "F", "7" = "G", "8" = "H", "9" = "I"))

AllTumorTypes2.gr$stack.factor <- as.character(AllTumorTypes3_2$idx)

#plotting
trackViewer::lolliplot(AllTumorTypes2.gr, Erbb2KinaseFeature, type = "pie.stack", legend = legend, dashline.col = "gray", cex = .6)
```
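The chunk above restricts the variants to coordinates 675 to 999 and subtracts 674, so that position 675 is plotted at 1 on the kinase-domain track. A minimal sketch of that re-basing (`rebase_coords` is a hypothetical helper, with the bounds of the chunk as defaults; the block is static, not evaluated):

```r
# Hypothetical helper: keep positions strictly inside (lower, upper) and
# express them relative to the lower bound
rebase_coords <- function(coord, lower = 674, upper = 1000) {
  inside <- !is.na(coord) & coord > lower & coord < upper
  coord[inside] - lower
}

rebase_coords(c(310, 675, 755, 999, 1000))
# 1  81 325
```

Positions outside the window, and missing coordinates, are dropped rather than shifted.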


#Citations  
```{r citations}  
sapply(c("XLConnect", "knitr", "ggplot2", "GenomicRanges", "rtracklayer", "trackViewer", "plyr", "dplyr", "DT", "stringr", "tidyr", "magrittr","ComplexHeatmap", "GetoptLong"), citation)
```

```{r CloseUp}
sessionInfo()
```