[Feat] Add UMI Handling to the pipeline #164
Reverted changes to a non-linted version and added the umitools modules.
Added the umitools workflow and integrated it into the smrnaseq workflow
Add additional documentation to use UMI tools as part of the pipeline. Most of the documentation has been copied from nf-core/rnaseq.
The bam2fq module is necessary to convert the deduplicated BAM files back into FASTQ format so they can be fed into the existing pipeline.
Added the umitools extract modules.config lines from nf-core/rnaseq to this pipeline.
Added configurations for umi deduplication.
Initial commit of the UMI dedup subworkflow. The workflow combines existing modules of the pipeline with nf-core modules to deduplicate the reads by mapping them to the species genome and converting them back to FASTQ after deduplication.
Includes the optional umitools deduplication step after the read QC.
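As a rough illustration of what "deduplicating based on the mapping" means, the sketch below groups aligned reads by mapping coordinate and UMI and keeps one read per group. It is a simplification under stated assumptions: the read dictionaries and their keys (`ref`, `pos`, `strand`, `umi`) are hypothetical, and it only collapses exact UMI matches, whereas UMI-tools can also account for sequencing errors in the UMI. It is not the code used in this PR.

```python
from collections import defaultdict


def dedup_by_position_and_umi(reads):
    """Keep one read per (reference, position, strand, UMI) group.

    `reads` is an iterable of dicts with the hypothetical keys
    "name", "ref", "pos", "strand" and "umi".
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["ref"], read["pos"], read["strand"], read["umi"])
        groups[key].append(read)
    # The first read seen in each group is kept as its representative.
    return [members[0] for members in groups.values()]


# Made-up example reads: r1 and r2 share position and UMI, so only r1 is kept.
example_reads = [
    {"name": "r1", "ref": "chr1", "pos": 100, "strand": "+", "umi": "ACGTACGTACGT"},
    {"name": "r2", "ref": "chr1", "pos": 100, "strand": "+", "umi": "ACGTACGTACGT"},
    {"name": "r3", "ref": "chr1", "pos": 100, "strand": "+", "umi": "TTTTACGTACGT"},
    {"name": "r4", "ref": "chr2", "pos": 100, "strand": "+", "umi": "ACGTACGTACGT"},
]
kept_names = [read["name"] for read in dedup_by_position_and_umi(example_reads)]
```

Reads that do not align have no coordinate to group on, which is why unmapped reads need separate handling after this step.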
Added additional configuration to change the output file name of samtools sort.
Added the documentation detailing the output files of the UMI-tools deduplication step.
After deduplication the reads that remained unaligned to the provided reference genome are merged with the set of deduplicated reads to enable the use of the full spectrum of reads, independent of potential reference bias. This behaviour can be deactivated by setting --umi_merge_unmapped false
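The merge described above can be pictured as a plain concatenation of two FASTQ files. The sketch below assumes gzipped FASTQ inputs and relies on the fact that concatenated gzip members form a valid gzip stream. The function names and file names are hypothetical, and the thread does not show how the pipeline itself performs the merge.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(inputs, merged):
    """Concatenate gzipped FASTQ files byte-wise into one gzipped file."""
    with open(merged, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_records(path):
    """Count FASTQ records (four lines each) in a gzipped file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


# Demo with tiny made-up files in a temporary directory.
workdir = Path(tempfile.mkdtemp())
dedup = workdir / "sample.dedup.fastq.gz"        # hypothetical file name
unmapped = workdir / "sample.unmapped.fastq.gz"  # hypothetical file name
merged = workdir / "sample.merged.fastq.gz"      # hypothetical file name

with gzip.open(dedup, "wt") as fh:
    fh.write("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
with gzip.open(unmapped, "wt") as fh:
    fh.write("@r3\nGGCC\n+\nIIII\n")

merge_fastq_gz([dedup, unmapped], merged)
merged_count = count_records(merged)
```

Note that the unmapped reads in such a merged file have not been deduplicated, since they carry no mapping position to group on.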
Information on the new --umi_merge_unmapped option was added to both the CHANGELOG and the output markdown file.
@apeltzer there is still a problem with nf-core lint and prettier. The |
Can you resolve the conflicts with |
|
Done! Fixed the prettier vs. linting issue as well. |
Hi, has anybody run this on a real dataset, at least without UMIs, to make sure you get the same results? I don't quite follow whether the trimming will be exactly the same. Where is the params.protocol variable now synced with the parameters for nf-core/trimgalore? Before, it was in local/trimgalore. I can try to run this on some of our samples next week to check that we get the same results when we don't use the UMI option. |
@apeltzer I'm currently working with miRNA-seq data using UMIs and would love for this feature to get merged into dev -- is there anything I can use to vet this functionality on the datasets I'm using? |
Hi, I think this feature is really needed and useful. I just want to reactivate this thread again. I will run a test in the coming week. |
You can run using the |
Hi @apeltzer Thanks. But I cannot run it as you suggested.
And the error message I got:
Any idea? We used QIAseq™ miRNA Library QC Spike-Ins. |
Should it be the below?
|
Thanks, @sean-at-tessera it works now. I will report after it is done. |
Yes sorry, mistakenly thought the branch is already here in smrnaseq. |
@chaochungkuo I also tried to look at resolving the merge conflicts, but I think it would take your insight to do quickly. It looks like enough has changed since you implemented |
@chaochungkuo can you document what |
I am not the one who implemented this. It was done by @CKComputomics. I also have single-end reads with QIAseq™ miRNA Library QC Spike-Ins. These parameters are specific to this kit:
I got my results, but it seems the run didn't go through to the end. The umitools processes take too much memory and time. The error messages are:
I am not sure of the root cause of this issue. However, I will increase the limit and run it again. Any advice is appreciated. |
Hi, the --bc-pattern error seems to originate from UMItools directly. The corresponding parameter in the nextflow run would be I hope this helps you to solve the issue. If not I will do some digging and see if I can figure out what is going wrong. @chaochungkuo using more memory sounds like a reasonable option. I have only ever tested the pipeline with small datasets and thus have no idea how this scales on full sets. |
When I run umitools directly outside of nfcore/smrnaseq, I use the following command:
It works fine and I got the trimmed FASTQs I want. When I pass these parameters to this branch, I modify them as follows:
I thought the name of the parameter is changed from |
@CKComputomics I have the same problem as @chaochungkuo; specifying Looking at the file for the extract command, I'm guessing |
@sean-at-tessera
However, I still get this error message:
No idea yet... |
Here is the exact error I received. Could someone help me to diagnose? Thanks.
|
@chaochungkuo exit status 137 generally indicates a memory error. Could you allocate more memory and try again? |
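For context on why 137 points to memory: shells report a process killed by a signal as 128 plus the signal number, and 137 - 128 = 9 is SIGKILL, which is what the Linux out-of-memory killer (or a scheduler enforcing a memory limit) sends. A small sketch of that arithmetic, with a hypothetical helper name, assuming a POSIX system:

```python
import signal


def signal_for_exit_status(status):
    """Return the signal name for shell-style exit statuses above 128, else None."""
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number exists on this platform.
        return None


killed_by = signal_for_exit_status(137)
print(killed_by)  # SIGKILL on Linux and macOS
```

SIGKILL can have other causes, such as a job exceeding its wall-time limit, so the exit status alone is a strong hint but not proof of a memory problem.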
I was able to almost run the The pipeline only crashed on one stage in the edgeR step. This is because this PR doesn't include the fix incorporated in this pull request. @CKComputomics , could you please resolve the merge conflicts with |
Hi guys, just want to activate this thread again, wondering when the UMI handling feature will be added to the repo? |
OK, I will give this a go now that more people have requested it. My hope was that someone would be quicker at this, but that seems not to be the case ;-) |
I will pull in upstream changes, then try to resolve conflicts and merge it |
Adds the option to use UMIs directly in the pipeline. This can be activated by setting --with_umi.
For the extraction step, the nf-core subworkflow has been imported. The deduplication step had to be implemented in a new subworkflow. It uses the existing bowtie modules to map the reads to a reference genome and deduplicates based on this mapping. The deduplicated reads are merged with the unmapped reads into one file. This behavior can be deactivated by setting --umi_merge_unmapped false.
Using UMIs can result in FASTQ files with very few reads. Too few reads can cause mirtop to fail. This needs to be considered when using this feature.
PR checklist
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).