IMPORTANT: this package is abandoned and will be deleted soon. Current development happens in https://github.com/JetiLab/blotIt
The present package is a rewritten version of blotIt2 by Daniel Kaschek. The aim of this toolbox is to scale biological replicate data to a common scale, making the quantitative data of different gels comparable.
Please note that blotIt3 and blotIt2 can be used in parallel. All functions have different names, so they can not only be installed but also loaded and used simultaneously (great for double checking).
blotIt3 requires the R packages `utils`, `MASS`, `data.table`, `ggplot2`, `rootSolve` and `trust`. Additionally, the package `devtools` is needed to install blotIt3 from GitHub. If not already done, the required packages can be installed by executing
install.packages(c("utils", "MASS", "data.table", "ggplot2", "rootSolve", "trust", "devtools"))
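If some of these packages are already present, it can be convenient to check which ones are missing first. The following is only a sketch using base R; the variable names `required`, `is_installed` and `missing_packages` are chosen here for illustration and are not part of blotIt3. The actual installation call is left commented out so that running the snippet has no side effects beyond loading namespaces.

```r
# Packages named in the installation instructions above.
required <- c(
  "utils", "MASS", "data.table", "ggplot2",
  "rootSolve", "trust", "devtools"
)

# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_packages <- required[!is_installed]

if (length(missing_packages) > 0) {
  message("Missing packages: ", paste(missing_packages, collapse = ", "))
  # Uncomment to install only what is missing:
  # install.packages(missing_packages)
} else {
  message("All required packages are already installed.")
}
```

Because `utils` ships with every R installation, it should never show up in `missing_packages`.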
blotIt3 is then installed via `devtools`:
devtools::install_github("SeverinBang/blotIt3")
First, the package is loaded:
library(blotIt3)
A .csv file is imported and formatted by the function `read_wide`. An example data file is supplied. Its path can be obtained by
example_data_path <- system.file(
"extdata", "sim_data_wide.csv",
package = "blotIt3"
)
This looks up the example file shipped with the installed package and stores its full path in `example_data_path`.
The example file is structured as follows
time | condition | ID | pAKT | pEPOR | pJAK2 | ... |
---|---|---|---|---|---|---|
0 | 0Uml Epo | 1.1 | 116.838271399017 | 295.836863524109 | ... | |
5 | 0Uml Epo | 1.1 | 138.808500374087 | 245.229971713582 | ... | |
... | ... | ... | ... | ... | ... | ... |
0 | 0Uml Epo | 2 | 94.4670174938645 | 293.604761934545 | ... | |
5 | 0Uml Epo | 2 | 398.958892340432 | ... | ||
... | ... | ... | ... | ... | ... | ... |
The first three columns contain description data: time points, measurement conditions and IDs (e.g. the IDs of the different gels). All following columns contain the measurements of different targets, with the first row containing the names and the following the measurement values corresponding to the time, condition and ID stated in the first columns.
The information about which columns contain descriptions has to be passed to `read_wide`:
imported_data <- read_wide(
  file = example_data_path, # path to the example file
  description = seq(1, 3),  # indices of the columns containing the description
  sep = ",",                # character separating the columns
  dec = "."                 # decimal sign
)
The result is then a long table of the form
| | time | condition | ID | name | value |
---|---|---|---|---|---|
pAKT1 | 0 | 0Uml Epo | 1 | pAKT | 116.83827 |
pAKT2 | 5 | 0Uml Epo | 1 | pAKT | 138.80850 |
pAKT3 | 10 | 0Uml Epo | 1 | pAKT | 99.09068 |
pAKT4 | 20 | 0Uml Epo | 1 | pAKT | 106.68584 |
pAKT5 | 30 | 0Uml Epo | 1 | pAKT | 115.02805 |
pAKT6 | 60 | 0Uml Epo | 1 | pAKT | 111.91323 |
pAKT7 | 240 | 0Uml Epo | 1 | pAKT | 132.56618 |
... | ... | ... | ... | ... | ... |
The first (nameless) column just contains unique row names. The columns `name` and `value` are new: the former holds the column names of the original file, the latter the respective values.
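To make the wide-to-long conversion concrete, here is a minimal base R sketch of the same reshaping idea. It does not show how `read_wide` is implemented; the miniature data frame `wide` and its shortened values are made up for illustration.

```r
# Hypothetical miniature of the wide example file (values shortened).
wide <- data.frame(
  time = c(0, 5),
  condition = c("0Uml Epo", "0Uml Epo"),
  ID = c(1, 1),
  pAKT = c(116.8, 138.8),
  pEPOR = c(295.8, 245.2)
)

description <- 1:3                     # indices of the description columns
targets <- names(wide)[-description]   # all remaining columns are targets

# Build one block of rows per target column: the column name goes into
# `name`, the measurements go into `value`. Then stack the blocks.
long <- do.call(rbind, lapply(targets, function(target) {
  cbind(
    wide[, description],
    name = target,
    value = wide[[target]]
  )
}))

print(long)
```

Each target column contributes as many rows as the wide table has, so two targets measured at two time points yield four rows in `long`.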
The data.frame `imported_data` can now be passed to the main function.
The full function call is
scaled_data <- align_me(
data = imported_data,
model = "yi / sj",
error_model = "value * sigmaR",
biological = yi ~ name + time + condition,
scaling = sj ~ name + ID,
error = sigmaR ~ name + 1,
parameter_fit_scale = "log",
normalize = TRUE,
average_techn_rep = FALSE,
verbose = FALSE,
normalize_input = TRUE
)
We will now go through the parameters individually:
- `data`: A long table, usually the output of `read_wide`.
- `model`: A formula-like string describing the model used for aligning. The present one, `yi / sj`, means that the measured values `Y_i` are the real values `yi` scaled by scaling factors `sj`. The model is therefore the real value divided by the corresponding scaling factor.
- `error_model`: A description of which errors affect the data. Here, only a relative error is present, where the parameter `sigmaR` is scaled by the respective `value`.
- `biological`: Defines which parameter (left-hand side of the tilde) is determined by which columns (right-hand side of the tilde), i.e. which columns contain the "biological effects". In the present example, the model states that the real value is represented by `yi`, which is the left-hand side of the present `biological` entry. The right-hand side is "name", "time" and "condition". In short: we state that the columns "name", "time" and "condition" contain real, biological differences.
- `scaling`: Same as above, but here it is defined which columns identify different scalings. Here these are "name" and "ID", meaning that measurements which differ in these effects (but have the same `biological` effects) are scaled onto one another.
- `error`: Describes how the error affects the values individually. The present formulation means that the error parameter is not individually adjusted.
- `parameter_fit_scale`: Describes the scale on which the parameters are fitted. `align_me()` accepts "linear", "log", "log2" and "log10". The default is "linear".
- `average_techn_rep`: A logical parameter that indicates whether technical replicates should be averaged before the scaling.
- `verbose`: If set to `TRUE`, additional information will be printed in the console.
- `normalize_input`: If set to `TRUE`, the data will be normalized before the actual scaling. This means that the raw input is brought to a common order of magnitude before the scaling parameters are calculated. This is only a computational aid to avoid a rare failure of convergence when the values differ by many orders of magnitude. Setting this to `TRUE` only makes sense (and is only supported) for `parameter_fit_scale = "linear"`.

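The meaning of `model = "yi / sj"` together with `error_model = "value * sigmaR"` can be illustrated with a small simulation. This is only a sketch in base R under made-up assumptions: the true values, scaling factors and the value of `sigmaR` below are hypothetical, and the code does not reproduce the fitting done by `align_me()`.

```r
set.seed(1)

# Hypothetical "true" values yi for one target at three time points.
yi <- c(100, 150, 120)

# Hypothetical scaling factors sj, one per gel (ID).
sj <- c(gel1 = 1, gel2 = 2.5)

# Hypothetical relative error parameter, as in "value * sigmaR".
sigmaR <- 0.05

# model = "yi / sj": every gel sees the same yi, divided by its own sj.
# Rows correspond to time points, columns to gels.
prediction <- outer(yi, sj, FUN = "/")

# Relative error: the standard deviation is proportional to the value.
sigma <- prediction * sigmaR
measured <- prediction + rnorm(length(prediction), mean = 0, sd = sigma)

# Multiplying each column by its sj brings the gels back onto a common scale.
rescaled <- sweep(measured, 2, sj, FUN = "*")
print(round(rescaled, 1))
```

After rescaling, both columns scatter around the same `yi`, which is the situation the alignment aims to recover when `sj` is not known in advance.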
The result of `align_me()` is a list with the entries

- `aligned`: A `data.frame` with the columns containing the biological effects as well as the columns `value`, containing the "estimated true values", and `sigma`, containing the uncertainty of the fits. Both are on common scale.
- `scaled`: The original data but with the values scaled to common scale and errors from the evaluation of the error model, also scaled to common scale (obeying Gaussian error propagation).
- `prediction`: The scales and sigma are from the evaluation of the respective models (on original scale).
- `original`: Just the original parameters.
- `original_with_parameters`: As above but with additional columns for the estimated parameters.
- `biological`: Names of the columns defined to contain the `biological` effects.
- `scaling`: Names of the columns defined to contain the `scaling` effects.

`blotIt3` provides one plotting function, `plot_align_me()`. Which data set is plotted can be specified via its parameters:
plot_align_me(
out_list = scaled_data,
plot_points = "aligned",
plot_line = "aligned",
spline = FALSE,
scales = "free",
align_zeros = TRUE,
plot_caption = TRUE,
ncol = NULL,
my_colors = NULL,
duplicate_zero_points = FALSE,
my_order = NULL
)
The parameters again are:
- `out_list`: The result of `align_me()`.
- `plot_points`: It can be specified separately which data sets are plotted as dots and as a line. Here the data set for the dots is defined. It can be one of `original`, `scaled`, `prediction` or `aligned`.
- `plot_line`: Same as above, but for the line.
- `spline`: Logical parameter. If set to `TRUE`, the plotted line does not consist of straight segments connecting the points but is a smooth spline.
- `scales`: String passed as the `scales` argument to `facet_wrap`.
- `align_zeros`: Logical parameter. If set to `TRUE`, the zero ticks are aligned throughout all subplots, although the axes can have different scales.
- `plot_caption`: Logical parameter indicating whether a caption describing which data is plotted should be added to the plot.
- `ncol`: Number passed as the `ncol` argument to `facet_wrap`.
- `my_colors`: List of custom color values as taken by the `values` argument of the `scale_color_manual` method for `ggplot` objects. If not set, the default `ggplot` color scheme is used.
- `duplicate_zero_points`: Logical. If set to `TRUE`, all zero time points are assumed to belong to the first condition, e.g. when the different conditions consist of treatments added at time zero. Default is `FALSE`.
- `my_order`: Optional list of target names in the custom order that will be used for faceting.
- `...`: Logical expression used for subsetting the data frames, e.g. `name == "pAKT" & time < 60`.
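
To see what a subsetting expression such as `name == "pAKT" & time < 60` selects, it can be tried on a long table with base R's `subset()`. This sketch only illustrates the expression itself; the small data frame is made up, and how `plot_align_me()` applies the expression internally is not shown here.

```r
# Hypothetical long table with the columns used in the expression.
long <- data.frame(
  time = c(0, 30, 60, 0, 30),
  name = c("pAKT", "pAKT", "pAKT", "pEPOR", "pEPOR"),
  value = c(116.8, 115.0, 111.9, 295.8, 240.1)
)

# The condition is evaluated inside the data frame, so column names can be
# used directly. Only pAKT rows measured before minute 60 are kept.
selected <- subset(long, name == "pAKT" & time < 60)
print(selected)
```

Here the pAKT measurement at `time == 60` is dropped because the comparison is strict, and all pEPOR rows are dropped because of the condition on `name`.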