Improvements for translational efficiency analysis with anota2seq #90
In general, the module was written with structure copied over from other related modules. Not all options were 'plumbed in' to the workflow during the initial development.
Maybe file a separate feature request for this to increase workflow flexibility, useful for someone to do in a future release.
This is (or should be) passed through from the pair column in the contrast file, as per the documentation. Please file a separate bug if that's not happening.
Since you say the pipeline crashes without this set to true, let's just hard-code it.
Maybe file a separate feature request for this, useful for someone to do in a future release.
As for the pair column, this is (or should be) passed through from the pair column in the contrast file, as per the documentation. Please file a separate bug if that's not happening.
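For context on why an explicit pairing column matters: as far as I understand anota2seq, it expects the translated-mRNA matrix (here Ribo-seq, `dataP` in `anota2seqDataSetFromMatrix()`) and the total-mRNA matrix (RNA-seq, `dataT`) to list their samples in matching column order. Relying on sample-sheet order works only if the sheet happens to be ordered correctly. Below is a minimal sketch of aligning the two sets by a pair column. The sample sheet, its `sample`/`type`/`pair` columns, and the `pairing_col` variable are all hypothetical illustrations, not the pipeline's actual schema.

```r
# Hypothetical sample sheet: one Ribo-seq and one RNA-seq library per pair,
# deliberately listed out of order
samples <- data.frame(
  sample = c("R2", "R1", "T1", "T2"),
  type = c("riboseq", "riboseq", "rnaseq", "rnaseq"),
  pair = c("p2", "p1", "p1", "p2"),
  stringsAsFactors = FALSE
)
pairing_col <- "pair"  # hypothetical name of the pairing column

ribo <- samples[samples$type == "riboseq", , drop = FALSE]
rna <- samples[samples$type == "rnaseq", , drop = FALSE]

# Sort Ribo-seq samples by pair ID, then pull RNA-seq samples in the same pair order
ribo <- ribo[order(ribo[[pairing_col]]), , drop = FALSE]
rna <- rna[match(ribo[[pairing_col]], rna[[pairing_col]]), , drop = FALSE]

# Fail loudly if any Ribo-seq sample lacks an RNA-seq partner
if (anyNA(rna$sample)) stop("unpaired Ribo-seq sample(s) found")

# Column i of each matrix now comes from the same biological pair
translated_cols <- ribo$sample
total_cols <- rna$sample
```

Here `translated_cols` and `total_cols` would be used to order the columns of the Ribo-seq and RNA-seq count matrices before handing them to anota2seq.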
Description of feature
Background: We used a public dataset with matched Ribo-seq and RNA-seq data and tried to get the `anota2seq` step to work (by "we" I mostly mean my colleague @naiarabediaga). As the workflow failed when run through the ribo-seq pipeline, we tried to get it to work by downloading all relevant files and running it locally. As the contrast file, we used:
Script: anota2seqrun.r
We noticed that several options are missing from the `opt` list (lines 111-140), and some of them are crucial for the pipeline to run correctly. I assume some of these can be passed via the `extra_anota2seq_run_args` parameter, but that is currently a little obscure. Maybe the important ones could be exposed and/or mentioned more explicitly?

Here are the missing options Naiara identified, which have prevented the pipeline from running smoothly:
1. `opt$gene_id_col`: This option defines the `row.names` when creating the `count.table` in line 198. I assume this option may have been carried over from previous scripts, but I don't see it defined. As a temporary workaround, I manually set `row.names` to `"gene_id"`, but it would be better if this option were properly included.
2. `opt$samples_pairing_col`: This doesn't seem to be a major issue (in its absence, the script uses the order in the sample sheet), but it would still be useful to be able to specify the sample-pairing column explicitly. If this parameter were added to the `opt` list, the script could take sample pairing into account during processing (line 300), which would improve the analysis.
3. `opt$subset_to_contrast_samples`: This is a real issue. This variable is set to `FALSE` by default. If you don't subset both the counts table and the sample sheet to the samples involved in the contrast, the pipeline crashes. The subsetting code is already there (lines 264-269), but since the variable is set to `FALSE`, that branch is never executed. Since we can currently run only a single contrast, but will likely have merged Salmon matrices, the pipeline currently crashes whenever the run contains additional samples. Exposing this value as a boolean switch (perhaps defaulting to `TRUE`) should solve this issue.
4. `opt$exclude_samples_col`: This option is meant to remove samples with specified values in a given field (largely complementary to the outcome of 3). The comments say "probably don't use this (4.) as well as the above (3.)". Exposing this variable would allow excluding samples, e.g. if QC steps indicate failure.
5. `opt$samples_batch_col`: This option is set to `NULL`, and thus seems to be missing from the script. Like `opt$samples_pairing_col`, it hasn't caused the script to crash, but it would be beneficial to be able to define a batch column for batch-effect correction or related purposes. The conditional for this option is in lines 321-323.

Most of these changes don't appear to be major, but they currently prevent the `ANOTA2SEQ` process from completing successfully with real-world data.

Again, many thanks!
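To make items 1 and 3 concrete, here is a minimal sketch of what exposing the subsetting switch could look like: a default `opt` list, an override taken from an extra-arguments string, and the row-name and contrast-subsetting steps. It assumes, hypothetically, that extra arguments arrive as a single `--key value` string; the `parse_extra_args()` helper, the toy count matrix, the sample sheet columns and the `contrast` list are illustrative only and are not the actual code of `anota2seqrun.r`.

```r
# Hypothetical defaults; as described in item 3, subsetting is off by default
opt <- list(
  gene_id_col = "gene_id",
  subset_to_contrast_samples = FALSE
)

# Hypothetical helper: turn "--key value --key2 value2" into a named list,
# converting "TRUE"/"FALSE" (any case) to logicals
parse_extra_args <- function(x) {
  tokens <- strsplit(trimws(x), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  if (length(tokens) %% 2 != 0) stop("expected '--key value' pairs")
  keys <- sub("^--", "", tokens[c(TRUE, FALSE)])
  vals <- lapply(tokens[c(FALSE, TRUE)], function(v) {
    if (toupper(v) %in% c("TRUE", "FALSE")) as.logical(toupper(v)) else v
  })
  setNames(vals, keys)
}

# Override the defaults from an extra-args string
overrides <- parse_extra_args("--subset_to_contrast_samples TRUE")
opt[names(overrides)] <- overrides

# Toy merged count matrix that includes a sample (S5) outside the contrast
counts <- data.frame(
  gene_id = c("g1", "g2", "g3"),
  S1 = c(10, 0, 5), S2 = c(12, 1, 4), S3 = c(8, 2, 6), S4 = c(9, 3, 7),
  S5 = c(100, 50, 20),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  sample = c("S1", "S2", "S3", "S4", "S5"),
  condition = c("ctrl", "ctrl", "treated", "treated", "other"),
  stringsAsFactors = FALSE
)
contrast <- list(variable = "condition", reference = "ctrl", target = "treated")

# Item 1: gene IDs become row names; only sample columns stay in the matrix
count_table <- counts[, setdiff(colnames(counts), opt$gene_id_col), drop = FALSE]
rownames(count_table) <- counts[[opt$gene_id_col]]

# Item 3: keep only the samples that belong to the contrast
if (isTRUE(opt$subset_to_contrast_samples)) {
  in_contrast <- samples[[contrast$variable]] %in% c(contrast$reference, contrast$target)
  samples <- samples[in_contrast, , drop = FALSE]
}

# Select count columns in sample-sheet order so matrix and sheet stay in sync
count_table <- count_table[, samples$sample, drop = FALSE]
```

With the switch on, S5 is dropped from both the sample sheet and the count table. With the default `FALSE`, both would keep all five samples, which is the kind of mismatch between the merged matrix and the contrast that item 3 describes.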