GitHub - akatrib/bio-visual: Collection of bio-themed data visuals

Bio-Themed Visuals

Collection of major bio-themed data visuals by Amal Katrib

(1) Violin Plot

A box plot and kernel density plot hybrid that shows summary statistics as well as the full distribution of the data

Below is sample code that can be used to generate violin plots in R:

# load the appropriate packages
library(ggplot2)

# generate one plot per gene
# `data` is a long-format data frame with columns geneName, group1, group2, value
plot_list = list()
k = 1
for (i in unique(data$geneName)) {
  p = ggplot(data[data$geneName %in% i, ], aes(x = group1, y = value)) +
        geom_violin(scale = "count", position = position_dodge(width = 1), trim = FALSE) +
        geom_boxplot(aes(x = group1, y = value), notch = FALSE) +
        geom_point(position = position_jitterdodge(jitter.width = 0.5), aes(color = group2)) +
        # optional vertical reference lines: set x and y to the desired x-axis positions
        # geom_vline(xintercept = c(x, y)) +
        labs(x = "", y = "")

  plot_list[[k]] = p
  k = k + 1  }
names(plot_list) = unique(data$geneName)

# save plots, one file per gene (a single fixed file name would be overwritten each time)
invisible(lapply(seq_along(plot_list), function(i) {
           png(paste0("violinPlot_", names(plot_list)[i], ".png"), width = 5, height = 5, res = 300, units = "in")
           print(plot_list[[i]])
           dev.off() }))
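
The plotting code above assumes a long-format data frame named data with the columns geneName, group1, group2, and value. As a minimal sketch, a simulated data set of that shape could be built as follows. The gene names, group labels, and values here are invented purely for illustration and are not part of the original repository.

```r
# simulated long-format data (all names and values are hypothetical)
set.seed(1)
data <- expand.grid(
  replicate = 1:10,
  group2    = c("batchA", "batchB"),
  group1    = c("control", "treated"),
  geneName  = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)
data$value <- rnorm(nrow(data), mean = 5, sd = 1)

# one row per measurement; each gene gets its own violin plot
str(data)
table(data$geneName, data$group1)
```

With real data, only the column names need to match (or the aes() mappings can be renamed to match your own columns).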

(2) Heatmap

A hierarchical clustering visual with a color-scale rendition of numerical data that helps reveal underlying patterns

I recommend using the heatmap.3() function in R so you can include multiple row and column side bars with added sample and gene info. Data inputs, and their corresponding formats, include:

"data"	matrix	log- or variance-stabilization-transformed normalized read counts (when used with next-gen sequencing data)
"clab"	matrix	color mapping of the sample info matrix (one row per sample, one column per annotation)

	sample 1	sample 2	sample 3	sample 4
gene 1	3	10	9	5
gene 2	9	4	6	10
gene 3	3	6	6	9
gene 4	8	6	8	10

	infoColor 1	infoColor 2	infoColor 3	infoColor 4
sample 1	red	yellow	orange	darkblue
sample 2	red	green	black	darkred
sample 3	blue	yellow	orange	darkblue
sample 4	blue	yellow	black	darkblue
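
As a rough sketch of how these two inputs could be assembled in R, the snippet below rebuilds the "data" matrix from the first table and derives one "clab" column from a sample annotation. The annotation name (condition) and its levels are assumptions made for illustration; only the resulting colors match the first infoColor column above.

```r
# expression matrix: genes in rows, samples in columns (values from the table above)
data <- matrix(
  c(3, 10, 9,  5,
    9,  4, 6, 10,
    3,  6, 6,  9,
    8,  6, 8, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("gene", 1:4), paste("sample", 1:4))
)

# hypothetical sample annotation; the column name and levels are assumptions
info <- data.frame(condition = c("case", "case", "control", "control"),
                   row.names = colnames(data))

# map each annotation level to a color, giving one row per sample
condition_colors <- c(case = "red", control = "blue")
clab <- cbind(infoColor1 = unname(condition_colors[as.character(info$condition)]))
rownames(clab) <- rownames(info)
```

Further annotations can be added as extra columns with cbind(), as long as the rows stay in the same order as the columns of data.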

Below is sample code that can be used to generate heatmaps in R:

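# select a data-representative color palette (must be defined before it is used below)
palette <- colorRampPalette(c("yellow3","white","darkblue"))

# cluster rows (genes) and columns (samples) using 1 - Pearson correlation as the distance
hr <- hclust(as.dist(1-cor(t(data), method="pearson")), method="average")
hc <- hclust(as.dist(1-cor(data, method="pearson")), method="average")

# heatmap.3() is not part of base R; load the script or package that provides it first
# palette(100) expands the ramp into 100 colors (the number of steps is an arbitrary choice)
heatmap.3(data,
                  Rowv = as.dendrogram(hr), Colv = as.dendrogram(hc),
                  dendrogram = "both", col = palette(100), ColSideColors = clab, key = TRUE)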

Depending on what you intend to visualize, each row (gene) can be scaled to mean = 0 and standard deviation = 1 either by:

Setting the scale parameter in the heatmap function using heatmap.3(scale = "row")
Directly scaling the matrix content using t(scale(t(data)))
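
The direct-scaling option can be checked on a small matrix. This is a minimal sketch in base R that reuses the first three genes of the example "data" matrix; the variable name scaled is an illustrative choice.

```r
# small matrix: genes in rows, samples in columns
data <- matrix(c(3, 10, 9,  5,
                 9,  4, 6, 10,
                 3,  6, 6,  9),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# scale() standardizes columns, so transpose, scale, and transpose back
scaled <- t(scale(t(data)))

# every row should now have mean 0 and standard deviation 1
round(rowMeans(scaled), 10)
apply(scaled, 1, sd)
```

If the matrix is scaled this way before plotting, it is probably safest to leave the heatmap function's own scale argument at "none" so the rows are not scaled twice.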

Re-arrange columns in the heatmap to best convey your message, either by:

Maintaining the original sample order

Using unsupervised hierarchical clustering of samples
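
The two column-ordering options can be sketched in base R as follows. The matrix is simulated (hypothetical values and names), and the distance and linkage choices simply mirror the heatmap code earlier in this section.

```r
set.seed(42)
# simulated matrix (hypothetical values): 20 genes x 6 samples
data <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

# option 1: keep the original sample order (no column clustering)
original_order <- colnames(data)

# option 2: order samples by unsupervised hierarchical clustering,
# using 1 - Pearson correlation as the distance and average linkage
hc <- hclust(as.dist(1 - cor(data, method = "pearson")), method = "average")
clustered_order <- colnames(data)[hc$order]

original_order
clustered_order
```

In heatmap.2-style functions, the original order is usually kept with Colv = FALSE and dendrogram = "row", while Colv = as.dendrogram(hc) applies the clustered order. Whether your copy of heatmap.3() accepts exactly the same arguments is an assumption worth checking against its source.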

Pay attention to the color scheme:

Use Diverging Palettes such as red-blue or yellow-blue if you want to have 2 contrasting colors that represent variation from a reference value. This is often used in heatmaps when representing differential analysis results

Use Sequential Palettes such as white-lightgrey-darkgrey-black if you want to represent sequential (increasing / decreasing) data such as age and height

Use Categorical Palettes such as red-black-yellow-orange if you want to represent categorical data such as gender and disease state

Select a color scheme that color-blind individuals can readily see. AVOID RED-GREEN

Avoid excessive inclusion of colors so as to not confuse your audience

Consider the well-perceived "viridis" color scale: install.packages("viridis")
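
The palette types above can be generated in base R along these lines. This is only a sketch: the number of color steps and the category labels are arbitrary assumptions, and hcl.colors() (available in R 3.6 and later) is used here as a base-R approximation of the viridis scale rather than the viridis package itself.

```r
# diverging: two contrasting hues around a neutral midpoint
diverging <- colorRampPalette(c("yellow3", "white", "darkblue"))(101)

# sequential: a single direction from light to dark
sequential <- colorRampPalette(c("white", "lightgrey", "darkgrey", "black"))(100)

# categorical: distinct colors keyed by hypothetical category labels
categorical <- c(groupA = "red", groupB = "black", groupC = "yellow", groupD = "orange")

# viridis-like scale from base R; with the viridis package installed,
# viridis::viridis(100) gives the package's own version
viridis_like <- hcl.colors(100, palette = "viridis")

head(diverging, 3)
head(viridis_like, 3)
```

Any of the resulting color vectors can be passed to the col argument of the heatmap call shown earlier.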
