Host read removal with Bowtie 2 #49
…stom content file
Bowtie2 host removal is fine.
Ok thanks, will do!
main.nf
zcat ${reads[0]} | echo "Read pairs before removal: \$((`wc -l`/4))" >>${name}_remove_host.log
zcat ${name}_host_unmapped_1.fastq.gz | echo "Read pairs after removal: \$((`wc -l`/4))" >>${name}_remove_host.log
Is this just for debugging? I can't see where ${name}_remove_host.log is used elsewhere.
Might be a bit of an expensive operation if so, zcat on a big FastQ file can take quite a while. And I guess we get this info from your bowtie logs MultiQC module anyway?
True, I originally copied it from the remove_phix process, but it can be removed here.
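For reference, if a read count were still wanted outside of the Bowtie 2 log, the counting step discussed above could be isolated into a small helper. This is only a sketch: the function name count_fastq_reads is hypothetical, `gzip -dc` is used as a portable stand-in for `zcat`, and it assumes well-formed FASTQ with exactly four lines per record. It still decompresses the whole file once, so the cost concern raised in the review applies unchanged.

```shell
# Hypothetical helper (not part of the pipeline): count the reads in a
# gzipped FASTQ file, assuming exactly four lines per record.
count_fastq_reads() {
    fastq_gz="$1"
    # One full decompression pass; this is the expensive part on large files.
    n_lines=$(gzip -dc "$fastq_gz" | wc -l)
    echo $((n_lines / 4))
}

# Example usage (file name is illustrative):
#   count_fastq_reads sample_host_unmapped_1.fastq.gz
```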
script:
def sensitivity = params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive"
if ( !params.single_end ) {
To avoid duplicating most of the process code for single-end and paired-end reads, you can define an input parameter, as at https://github.com/nf-core/mag/blob/master/main.nf#L672
ok, but in this case there are multiple parts in the process code affected:
-1 "${reads[0]}" -2 "${reads[1]}"
--un-conc-gz ${name}_host_unmapped_%.fastq.gz \
--al-conc-gz ${name}_host_mapped_%.fastq.gz \
zcat ${name}_host_mapped_1.fastq.gz | awk '{if(NR%4==1) print substr(\$0, 2)}' > ${name}_host_mapped_1.read_ids.txt
zcat ${name}_host_mapped_2.fastq.gz | awk '{if(NR%4==1) print substr(\$0, 2)}' > ${name}_host_mapped_2.read_ids.txt
Maybe in this case it doesn't necessarily get cleaner if solved like this?
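To make the trade-off concrete, here is a rough sketch of what collecting the mode-dependent Bowtie 2 arguments in one place could look like. The helper name host_removal_args and the single-end output file names are assumptions for illustration only; the paired-end arguments mirror the lines quoted above, and `-U`, `--un-gz` and `--al-gz` are the unpaired counterparts in Bowtie 2. The read ID extraction would still need its own branch, which is the point made in the comment above.

```shell
# Hypothetical helper: print the Bowtie 2 input/output arguments for one sample.
# Usage: host_removal_args <name> <single_end: true|false> <reads1> [reads2]
host_removal_args() {
    name="$1"; single_end="$2"; r1="$3"; r2="${4:-}"
    if [ "$single_end" = "true" ]; then
        # Unpaired input; output file names here are assumed, not from the PR.
        printf '%s\n' "-U $r1 --un-gz ${name}_host_unmapped.fastq.gz --al-gz ${name}_host_mapped.fastq.gz"
    else
        # Paired input; Bowtie 2 replaces % with 1 or 2 in the output names.
        printf '%s\n' "-1 $r1 -2 $r2 --un-conc-gz ${name}_host_unmapped_%.fastq.gz --al-conc-gz ${name}_host_mapped_%.fastq.gz"
    fi
}

# Example usage:
#   host_removal_args sample1 false sample1_R1.fastq.gz sample1_R2.fastq.gz
```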
OK, I now added a test using the Currently this host removal only works for short reads. If |
In the MAG |
Hi @ewels or @apeltzer, could one of you tell me by any chance what the purpose of the line |
Link to line in question: Line 18 in 4c2f61c
Nothing to do with me I'm afraid. Looks like it was added by @HadrienG
Phil
(but I agree, I can't see anything obvious that it is doing, and I suspect that it can be removed from both config files)
OK, thanks @ewels !
I think that's fine. Using proper settings for ONT qc (e.g. |
Ok, will change it then so that the already filtered short reads will be used to filter the long reads.
…ost removal is run in combination with long reads.
560b7c3 to e3db180
Hi @d4straub, thanks for your input. The following points were added/changed now:
I tested this locally for a very basic example, and with
Best, |
If I am not mistaken there are 3 tests now:
wouldn't it be good to test also
My reasoning is that there are now some channels that are not tested otherwise. And further changes breaking processes that use these channels might go undetected when the test is missing. What do you think? edit: layout
Ok, I added a corresponding test, but it is more for channel testing, as I didn't add host reads to the long read dataset.
Great, thanks! Looks good!
I made the saving of the host read ids optional and added some sorting, since the order of the Bowtie 2 results is not reproducible.
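As an illustration of the sorting idea, the read ID extraction quoted earlier in this thread can be combined with a locale-independent sort so that the output no longer depends on the order in which Bowtie 2 writes reads. This is a sketch rather than the exact code in the PR: the function name sorted_read_ids is hypothetical, `gzip -dc` stands in for `zcat`, and `$0` is written unescaped because this is plain shell and not a Nextflow script block.

```shell
# Hypothetical helper: print the read IDs of a gzipped FASTQ file in a
# reproducible order, assuming four lines per record.
sorted_read_ids() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 1 { print substr($0, 2) }' \
        | LC_ALL=C sort   # fixed locale so the order is the same everywhere
}

# Example usage (file names are illustrative):
#   sorted_read_ids sample_host_mapped_1.fastq.gz > sample_host_mapped_1.read_ids.txt
```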
LGTM
Requested changes were either addressed or reasonably dismissed
Here is a suggestion for host read removal using Bowtie 2. For the host reference sequence, either iGenomes or a user-specified FASTA reference file can be used.
I also added a MultiQC section that displays which fraction of reads maps against the host reference and is thus filtered out. It uses a custom content file, as suggested by @ewels and @drpatelh (MultiQC/MultiQC#1199), instead of the standard Bowtie 2 MultiQC format (with its multiple different mapping categories). Moreover, I separated the MultiQC FastQC sections for before and after preprocessing.
If this Bowtie 2 based strategy is fine for you, I can add a test.
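For readers unfamiliar with MultiQC custom content, the following is a minimal sketch of how such a file could be written for one sample. Everything specific here is an assumption for illustration: the helper name write_host_removal_mqc, the section id and title, and the column names are not taken from this PR. The commented header keys (id, section_name, description, plot_type) follow the MultiQC custom content convention as far as I know, and the exact layout should be checked against the MultiQC documentation.

```shell
# Hypothetical helper: write a MultiQC custom content bar graph file.
# Usage: write_host_removal_mqc <sample> <mapped_reads> <unmapped_reads> <out_file>
write_host_removal_mqc() {
    sample="$1"; mapped="$2"; unmapped="$3"; out="$4"
    {
        # Commented header lines configure the section (names are assumed).
        printf "# id: 'host_removal'\n"
        printf "# section_name: 'Bowtie 2: host read removal'\n"
        printf "# description: 'Reads mapped against the host reference are filtered out.'\n"
        printf "# plot_type: 'bargraph'\n"
        # Tab-separated data: one header row, then one row per sample.
        printf 'Sample\thost reads mapped\thost reads unmapped\n'
        printf '%s\t%s\t%s\n' "$sample" "$mapped" "$unmapped"
    } > "$out"
}

# Example usage (a file name ending in _mqc.tsv lets MultiQC pick it up):
#   write_host_removal_mqc sample1 1200 98800 sample1_host_removal_mqc.tsv
```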
PR checklist
Ensure the test suite passes (nextflow run . -profile test,docker).
Make sure your code lints (nf-core lint .).
Documentation in docs is updated
CHANGELOG.md is updated
README.md is updated
Learn more about contributing: https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md