Running SAFER
- Make sure you have gone through the setup steps.
- Pull the latest version of SAFER from the main branch.
- Open Terminal and type 'R', or open RStudio (this will start R).
- Build the package by typing
devtools::document('../GitHub/SAFER')
Note: change the filepath to your local clone of the repository.
Note: if warnings suggest running an rm() command on a function, execute those commands and then rerun the build command.
You should see the following output:
(image: expected console output after building the package)
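The build step can be wrapped in a small guard so that a wrong path fails early instead of producing a confusing devtools error. This is only a sketch, not part of SAFER: build_safer is a hypothetical helper name, and the DESCRIPTION check assumes the clone is laid out like a standard R package. The devtools::document() call is the same one shown above.

```r
# Hypothetical helper (not part of SAFER): validate the clone path, then build.
build_safer <- function(repo_path) {
  # Fail early if the path does not point at a directory.
  if (!dir.exists(repo_path)) {
    stop("Repository path not found: ", repo_path)
  }
  # Assumption: a valid clone has a DESCRIPTION file at its root,
  # like any R package.
  if (!file.exists(file.path(repo_path, "DESCRIPTION"))) {
    stop("No DESCRIPTION file in ", repo_path, "; is this the SAFER clone?")
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("devtools is not installed; run install.packages('devtools') first")
  }
  devtools::document(repo_path)
}

# Usage (replace with the path to your local clone):
# build_safer("../GitHub/SAFER")
```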
- Open a previously used param.yaml file or see parameter file setup.
  - Check to make sure tmp.dir is set to a location that exists or can be created (no nested folders). For example, say you want to create a directory for this run in …/Documents called current_run. Set tmp.dir to …/Documents/current_run. A timestamped directory will be created here for each run you do on the machine you're using.
  - Check that the study parameters are correct.
  - Check that the following are set to full filepaths on your machine:
    - …/lib.data.RDS: this file should match your dataset spectrometer frequency as closely as possible, and can be pulled from our list of GISSMO slices here: http://ftp.ebi.ac.uk/pub/databases/metabolights/studies/mariana/gissmo_ref/
    - …/spectral.matrix.RDS: options for this include:
      - supply your own spectral matrix (see specifications)
      - pull from MetaboLights via a study page
      - use one of the pre-converted MTBLS study matrices: http://ftp.ebi.ac.uk/pub/databases/metabolights/studies/mariana/spectral_matrices/
  - Check the corrpocket params.
Note: If not enough features are found, try the following (in order):
- open up the half.window by 50% (from the default 0.03 ppm to ~0.05 ppm),
- raise the noise.percentile to ~0.99, or
- lower the r.cutoff to as little as ~0.6.
If half.window is too small, peak pairs may be getting missed, but if it is opened too wide, computational demands increase and inter-peak relationships may be captured. In general, the maximum expected J-coupling observed in a multiplet should be a good starting place, and this parameter shouldn't affect the results too much, as it is just a seed for STORM. The noise.percentile may also be too low (too strict). This should be near the top of this graph (here, set to 0.95):
(image: graph referenced above, with noise.percentile set to 0.95)
More on this in the FSE description. As a last effort, you could lower r.cutoff for STORM if the dataset is inherently very misaligned. Note that this will decrease the specificity of features, however, as it is a STOCSY threshold. You may need more samples to get misaligned features.
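The escalation order in the note above can be written out as a small R sketch to make the arithmetic explicit. Several things here are assumptions: relax_params is a hypothetical helper, the starting r.cutoff of 0.8 is a placeholder rather than a documented default, and the steps are treated as cumulative (each step keeps the earlier changes), which the note does not state explicitly. The values would still need to be copied into your param.yaml by hand.

```r
# Hypothetical starting values. 0.03 and 0.95 come from the text above;
# the r.cutoff value of 0.8 is an assumed placeholder, not a documented default.
start <- list(half.window = 0.03, noise.percentile = 0.95, r.cutoff = 0.8)

# Return the settings to try at a given escalation step (1, 2, or 3).
# Assumption: each step keeps the changes made by the earlier steps.
relax_params <- function(params, step) {
  stopifnot(step %in% 1:3)
  # Step 1: open half.window by 50% (0.03 ppm becomes 0.045 ppm).
  params$half.window <- params$half.window * 1.5
  # Step 2: raise noise.percentile to ~0.99.
  if (step >= 2) params$noise.percentile <- 0.99
  # Step 3: drop r.cutoff to the lowest suggested value (~0.6).
  if (step >= 3) params$r.cutoff <- min(params$r.cutoff, 0.6)
  params
}

# Print the candidate settings for each step.
for (s in 1:3) {
  p <- relax_params(start, s)
  cat(sprintf("step %d: half.window=%.3f noise.percentile=%.2f r.cutoff=%.2f\n",
              s, p$half.window, p$noise.percentile, p$r.cutoff))
}
```

Trying one change at a time and rerunning makes it easier to see which parameter was limiting the feature count.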
- Run the pipeline:
pipeline(params_loc = '…/param_template.yaml')
Note: change the filepath to match the param.yaml file you modified above! This file will be copied to the tmp.dir, so it can be anywhere on your machine.
You should see the pipeline log scripts begin to print, starting with the FSE module:
(image: pipeline log output, starting with the FSE module)
That's it! For an average dataset of 250 spectra x 130K points, about 2-10K features will be generated. Expected runtime is ~1 h on 50 cores.
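Because a full run is long, it may be worth checking the tmp.dir and filepath settings described earlier before launching. The sketch below is one way to do that in R; check_run_inputs is a hypothetical helper (not part of SAFER), it takes the values directly rather than reading param.yaml, and it assumes that "no nested folders" means the parent of tmp.dir must already exist.

```r
# Hypothetical pre-flight check (not part of SAFER).
# Arguments mirror the param.yaml entries discussed earlier; pass them by hand.
check_run_inputs <- function(tmp.dir, lib.data, spectral.matrix) {
  problems <- character(0)
  # Assumption: "no nested folders" means the parent of tmp.dir must already
  # exist, so that tmp.dir itself can be created in a single step.
  if (!dir.exists(tmp.dir) && !dir.exists(dirname(tmp.dir))) {
    problems <- c(problems, paste("tmp.dir cannot be created:", tmp.dir))
  }
  # Both data files must exist at the given paths.
  for (f in c(lib.data, spectral.matrix)) {
    if (!file.exists(f)) {
      problems <- c(problems, paste("file not found:", f))
    }
  }
  problems  # an empty character vector means all checks passed
}

# Usage (fill in your own full paths):
# check_run_inputs("…/Documents/current_run",
#                  "…/lib.data.RDS",
#                  "…/spectral.matrix.RDS")
```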