Nanocompore SampComp stuck at 0% #222
Comments
Hi @sidizhao, It looks like you ran Nanocompore on a genomic reference instead of a transcriptomic reference. Nanocompore can sometimes stall when the reference sequences are very long (greater than 50 kb), and this is likely why you are seeing such a long execution time. You can either kill the process and start it again to see whether it gets past the stall (this sometimes works, and we don't know why), or start the whole pipeline over, aligning to a transcriptome reference fasta. Given you have all the data together, it might be worth simply restarting it and seeing if that works, but I suspect that redoing the pipeline with a transcriptome reference is the better option. If you provide SampComp a bed file, it will do an internal liftover from transcriptome reference coordinates to genome reference coordinates. I hope this helps, Logan |
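A quick way to check whether a reference falls into the "very long sequences" case described above is to scan the fasta for records longer than 50 kb. The following is a minimal sketch in plain Python, not part of Nanocompore; the function names and the demo records are made up for illustration, and the 50 kb threshold is simply the figure quoted in the comment.

```python
import io

# Threshold taken from the comment above (references longer than 50 kb).
LONG_REFERENCE_BP = 50_000


def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA text stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Use the first whitespace-delimited token as the record name.
            name, length = (parts[0] if parts else ""), 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def long_references(handle, threshold=LONG_REFERENCE_BP):
    """Return the records whose sequence is longer than the threshold."""
    return [(n, size) for n, size in fasta_lengths(handle) if size > threshold]


# Hypothetical demo input: one short transcript and one chromosome-sized record.
demo = io.StringIO(">tx1 example\nACGT\nACGT\n>chr1\n" + "A" * 60_000 + "\n")
print(long_references(demo))
```

If most records in the reference are reported, it is probably a genome fasta rather than a transcriptome fasta. For a real file, pass `open("reference.fa")` (a placeholder path) instead of the `io.StringIO` demo.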
Hi,
Thank you for the prompt response. By transcriptomic reference, do you mean
only the exonic regions of the fasta file? Or if I were to provide a bed
file, what should the bed file contain? Just trying to clarify.
|
Hi @sidizhao, Yes, by transcriptomic reference I mean a reference fasta file of each contiguous transcript isoform with no introns present, and one reference sequence per isoform. Alternatively, you can create a reference transcriptome fasta from the reference genome and a gtf file using bedtools getfasta. And by bed file, I mean a bed file that describes the sequences of the transcriptome reference fasta file in genomic coordinates. So you can use something like bedparse (https://github.com/tleonardi/bedparse) to convert a gtf file to bed12 format. You can find the GENCODE reference gtf file on the GENCODE home page. Does this make sense? |
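To illustrate what a bed12 record "in genomic coordinates" encodes, here is a rough sketch of a transcript-to-genome liftover written in plain Python. It is not Nanocompore's internal implementation; the function names and the example record are hypothetical, and it assumes standard 0-based BED12 fields (blockSizes and blockStarts in columns 11 and 12).

```python
def parse_bed12(line):
    """Parse the BED12 fields needed for a liftover into a dict."""
    f = line.rstrip("\n").split("\t")
    return {
        "chrom": f[0],
        "start": int(f[1]),  # 0-based genomic start of the transcript
        "name": f[3],
        "strand": f[5],
        # Block (exon) sizes and starts relative to "start"; trailing commas allowed.
        "sizes": [int(x) for x in f[10].rstrip(",").split(",")],
        "starts": [int(x) for x in f[11].rstrip(",").split(",")],
    }


def transcript_to_genome(bed, tx_pos):
    """Map a 0-based transcript position (5'->3') to a 0-based genomic position."""
    total = sum(bed["sizes"])
    if not 0 <= tx_pos < total:
        raise ValueError(f"position {tx_pos} outside transcript of length {total}")
    # On the minus strand, transcript position 0 is the last genomic base.
    offset = tx_pos if bed["strand"] == "+" else total - 1 - tx_pos
    for size, rel_start in zip(bed["sizes"], bed["starts"]):
        if offset < size:
            return bed["chrom"], bed["start"] + rel_start + offset
        offset -= size


# Hypothetical two-exon transcript: exons at chr1:1000-1100 and chr1:1300-1500.
record = parse_bed12("chr1\t1000\t1500\ttx1\t0\t+\t1000\t1500\t0\t2\t100,200,\t0,300,")
print(transcript_to_genome(record, 100))  # first base of the second exon
```

The point of the sketch is that the bed12 file has to describe exactly the same isoforms as the transcriptome fasta, with matching names, otherwise transcript positions cannot be placed back on the genome.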
Yes. Thank you. Would it work if I keep the current genomic fasta file but add a bed12 file of only the transcripts? Or do I necessarily need to download the transcripts only fasta? |
You essentially need to start over from the minimap2 step using the transcriptome reference fasta instead of the genome reference fasta. This will require that you redo eventalign and eventalign collapse as well from this new bam file. Importantly, you do not want to align in splice-aware mode when using a transcriptome reference fasta. You can find more detailed instructions here (https://doi.org/10.1002/cpz1.683) or here (https://nanocompore.rna.rocks/) if you want a more in-depth breakdown of the steps. Let me know if you have more questions. Logan |
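As a rough sketch of the realignment step described above, the snippet below assembles a minimap2 command that uses the non-spliced `map-ont` preset (rather than `splice`), followed by samtools sort and index. The file names, the thread count, and the helper function are placeholders chosen for illustration; the protocol paper and the Nanocompore documentation linked above remain the reference for the exact options, including any read filtering.

```python
import shlex


def transcriptome_alignment_commands(transcriptome_fa, reads_fq, out_bam, threads=4):
    """Build argument lists for a non-spliced alignment to a transcriptome."""
    # "-x map-ont" is minimap2's unspliced Nanopore preset; "-a" requests SAM output.
    minimap2 = ["minimap2", "-a", "-x", "map-ont", "-t", str(threads),
                transcriptome_fa, reads_fq]
    # Sort the SAM stream read from stdin ("-") into a BAM file, then index it.
    sort = ["samtools", "sort", "-o", out_bam, "-"]
    index = ["samtools", "index", out_bam]
    return minimap2, sort, index


# Placeholder file names; replace them with your own paths.
align, sort, index = transcriptome_alignment_commands(
    "transcriptome.fa", "reads.fastq", "aligned.sorted.bam"
)
print(shlex.join(align), "|", shlex.join(sort))
print(shlex.join(index))
```

The printed lines can be pasted into a shell script. The new bam file is then the input for rerunning eventalign and eventalign collapse.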
Oh wow, I see. That is going to take a while since these Direct RNA-seq files take a long time on nanopolish. I will come back with more questions if it still doesn't work. Thank you so much. |
You can try using f5c instead of nanopolish. It is a C implementation of nanopolish and is roughly 10 times faster. There are a few flags you need to use that are unique to f5c and are not used by nanopolish. The protocol paper I posted earlier goes through all the necessary differences between f5c and nanopolish if you decide to give it a try. Briefly, you need to use --rna --min-mapq=0 --secondary=yes in addition to all the normal nanopolish options. But I'm doing this from memory, so double check the help messages to make sure I have the spelling correct!!! Logan |
Thank you so much! We’ve been using slow5tools to process the fast5 files
first before nanopolish, and it’s been decently fast.
|
Describe the bug
Hi, I've been trying to run SampComp on 6 samples of ONT Direct RNA-seq on a METTL3 KD cell line for some time, and have yet to get past the "parse transcript" step. I have 512G of RAM requested to run this and it just gets stuck for multiple days. Here's the log:
To Reproduce
I ran a bash script on our Linux computing cluster using the docker image quay.io/biocontainers/nanocompore:1.0.4--pyhdfd78af_0. Here's the command:
Would you be able to help me? Thank you.