Collection of major bio-themed data visuals by Amal Katrib
A box plot and kernel density plot hybrid that shows summary statistics as well as the full distribution of the data
Below is sample code that can be used to generate violin plots in R:
# load the appropriate packages
library(ggplot2)
# generate one plot per gene
# assumes "data" is a long-format data frame with columns geneName, group1, group2, and value
plist = list()
for (i in unique(data$geneName)) {
  p = ggplot(data[data$geneName %in% i, ], aes(x = group1, y = value)) +
    geom_violin(scale = "count", position = position_dodge(width = 1), trim = FALSE) +
    geom_boxplot(notch = FALSE) +
    geom_point(position = position_jitterdodge(jitter.width = 0.5), aes(color = group2)) +
    # to add vertical reference lines, append geom_vline(xintercept = ...) with positions you define
    labs(x = "", y = "")
  # store each plot under its gene name
  plist[[as.character(i)]] = p
}
# save plots, one file per gene so that earlier files are not overwritten
lapply(names(plist), function(i) {
  png(paste0("violinPlot_", i, ".png"), 5, 5, res = 300, units = "in")
  print(plist[[i]])
  dev.off() })
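The plotting code above expects a long-format data frame named data, with one row per measurement. As a minimal sketch, a toy input of that shape could be simulated as follows. The gene names, group labels, replicate count, and random values are all hypothetical and serve only to illustrate the expected columns.

```r
# Toy long-format input for the violin plot code (all names and values are made up)
set.seed(1)  # make the simulated values reproducible

data <- expand.grid(
  geneName  = c("geneA", "geneB"),      # hypothetical gene names (one plot per gene)
  group1    = c("control", "treated"),  # hypothetical grouping shown on the x-axis
  group2    = c("batch1", "batch2"),    # hypothetical grouping used to color the points
  replicate = 1:10,                     # 10 simulated measurements per combination
  stringsAsFactors = FALSE
)

# simulated measurement plotted on the y-axis
data$value <- rnorm(nrow(data), mean = 5, sd = 1)

head(data)
```

With real data, the same four columns (geneName, group1, group2, value) would come from reshaping an expression table into long format.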
A hierarchical clustering visual with a color scale-rendition of numerical data to help reveal underlying patterns
I recommend using the heatmap.3() function in R so you can include multiple row and column side bars with added sample and gene info. Data inputs, and their corresponding formats, include:
- "data" matrix: log- or variance-stabilization-transformed normalized read counts (when used in next-gen seq)
- "clab" matrix: color mapping of the sample info matrix
Example "data" matrix:

| | sample 1 | sample 2 | sample 3 | sample 4 |
|---|---|---|---|---|
| gene 1 | 3 | 10 | 9 | 5 |
| gene 2 | 9 | 4 | 6 | 10 |
| gene 3 | 3 | 6 | 6 | 9 |
| gene 4 | 8 | 6 | 8 | 10 |

Example "clab" matrix:

| | infoColor 1 | infoColor 2 | infoColor 3 | infoColor 4 |
|---|---|---|---|---|
| sample 1 | red | yellow | orange | darkblue |
| sample 2 | red | green | black | darkred |
| sample 3 | blue | yellow | orange | darkblue |
| sample 4 | blue | yellow | black | darkblue |
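As a rough sketch, the two example inputs shown above could be built in R like this. The values are copied from the tables. The final check reflects an assumption that the rows of clab should line up with the columns of data, since clab annotates samples.

```r
# "data": genes in rows, samples in columns (values copied from the example table)
data <- matrix(
  c(3, 10, 9,  5,
    9,  4, 6, 10,
    3,  6, 6,  9,
    8,  6, 8, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("gene", 1:4), paste("sample", 1:4))
)

# "clab": samples in rows, one column of colors per sample annotation
clab <- matrix(
  c("red",  "yellow", "orange", "darkblue",
    "red",  "green",  "black",  "darkred",
    "blue", "yellow", "orange", "darkblue",
    "blue", "yellow", "black",  "darkblue"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("sample", 1:4), paste("infoColor", 1:4))
)

# assumed sanity check: sample order in clab matches the column order of data
stopifnot(identical(rownames(clab), colnames(data)))
```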
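Below is sample code that can be used to generate heatmaps in R:
# heatmap.3() is not part of base R; this assumes it has already been loaded into the session
# select a data-representative color palette (defined before it is used in the heatmap call)
palette <- colorRampPalette(c("yellow3","white","darkblue"))(100)
# cluster genes (rows) and samples (columns) using Pearson correlation distance and average linkage
hr <- hclust(as.dist(1-cor(t(data), method="pearson")), method="average")
hc <- hclust(as.dist(1-cor(data, method="pearson")), method="average")
heatmap.3(data,
  Rowv = as.dendrogram(hr), Colv = as.dendrogram(hc),
  dendrogram = "both", col = palette, ColSideColors = clab, key = TRUE)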
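If heatmap.3() is not at hand, the same clustering steps can be tried with base R's heatmap() from the stats package as a rough stand-in. This is a swapped-in function, not the recommended one: it accepts only a single vector for ColSideColors, so the sketch below uses just the first annotation column from the example, and it draws no color key. The variable names pal and sample_colors are illustrative.

```r
# Example matrix (genes in rows, samples in columns), values from the example table
data <- matrix(
  c(3, 10, 9,  5,
    9,  4, 6, 10,
    3,  6, 6,  9,
    8,  6, 8, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("gene", 1:4), paste("sample", 1:4))
)

# one color per sample (the "infoColor 1" column of the example clab matrix)
sample_colors <- c("red", "red", "blue", "blue")

# Pearson correlation distance with average linkage, for rows and columns
hr <- hclust(as.dist(1 - cor(t(data), method = "pearson")), method = "average")
hc <- hclust(as.dist(1 - cor(data, method = "pearson")), method = "average")

# vector of 100 interpolated colors
pal <- colorRampPalette(c("yellow3", "white", "darkblue"))(100)

# base heatmap(); scale = "none" leaves the values as supplied
res <- heatmap(data,
               Rowv = as.dendrogram(hr), Colv = as.dendrogram(hc),
               col = pal, ColSideColors = sample_colors, scale = "none")

# row and column order as drawn, taken from the dendrograms
res$rowInd
res$colInd
```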
Depending on what you intend to visualize, data can be scaled to mean = 0 and standard deviation = 1 either by:
- Setting the scale parameter in the heatmap function, using heatmap.3(scale = "row")
- Directly scaling the matrix content, using t(scale(t(data)))
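To see what the second option does, here is a small sketch using the example matrix. Because scale() standardizes columns, the matrix is transposed first so that each gene becomes a column, then transposed back. The name scaled is illustrative.

```r
# Example matrix (genes in rows, samples in columns)
data <- matrix(
  c(3, 10, 9,  5,
    9,  4, 6, 10,
    3,  6, 6,  9,
    8,  6, 8, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("gene", 1:4), paste("sample", 1:4))
)

# scale() works column-wise, so transpose, scale, and transpose back
scaled <- t(scale(t(data)))

# each gene (row) now has mean 0 and standard deviation 1, up to rounding
round(rowMeans(scaled), 10)
round(apply(scaled, 1, sd), 10)
```

Row-wise scaling of this kind puts genes with very different expression levels on a common color scale, at the cost of hiding absolute differences between genes.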
- Re-arrange columns in the heatmap to best convey your message, either by:
  - Maintaining the original sample order
  - Using unsupervised hierarchical clustering of samples
- Pay attention to the color scheme:
  - Use diverging palettes such as red-blue or yellow-blue if you want 2 contrasting colors that represent variation from a reference value. This is often used in heatmaps when representing differential analysis results
  - Use sequential palettes such as white-lightgrey-darkgrey-black if you want to represent sequential (increasing / decreasing) data such as age and height
  - Use categorical palettes such as red-black-yellow-orange if you want to represent categorical data such as gender and disease state
  - Select a color scheme that color-blind individuals can readily see. AVOID RED-GREEN
  - Avoid excessive inclusion of colors so as not to confuse your audience
  - Consider the well-perceived "viridis" color scale:
install.packages("viridis")
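The palette types mentioned above can be generated as color vectors in R. The sketch below is one possible way to do it, with illustrative variable names. It uses viridis::viridis() when the package is installed and otherwise falls back to the viridis-like palette that ships with base R (hcl.colors(), available from R 3.6.0), which is an assumption about your R version.

```r
n_colors <- 100  # number of color steps; an arbitrary choice

# viridis scale: use the package if installed, else the base R approximation
if (requireNamespace("viridis", quietly = TRUE)) {
  vir_palette <- viridis::viridis(n_colors)
} else {
  vir_palette <- hcl.colors(n_colors, palette = "viridis")
}

# diverging palette: two contrasting colors around a neutral midpoint
div_palette <- colorRampPalette(c("blue", "white", "red"))(n_colors)

# sequential palette: light to dark for increasing values
seq_palette <- colorRampPalette(c("white", "lightgrey", "darkgrey", "black"))(n_colors)

# quick visual check of the three palettes as horizontal color strips
image(matrix(seq_len(n_colors)), col = vir_palette, axes = FALSE, main = "viridis")
image(matrix(seq_len(n_colors)), col = div_palette, axes = FALSE, main = "diverging")
image(matrix(seq_len(n_colors)), col = seq_palette, axes = FALSE, main = "sequential")
```

Any of these vectors can be supplied as the col argument in the heatmap code.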