Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

low alignment rate with MeRanGh #4

Open
Vishvak2000 opened this issue Dec 18, 2023 · 13 comments
Open

low alignment rate with MeRanGh #4

Vishvak2000 opened this issue Dec 18, 2023 · 13 comments

Comments

@Vishvak2000
Copy link

Hello, I have been trying to use MeRanTK to align my paired end BS-RNA-Seq reads but I am getting extremely low alignment % (somewhere around ~0.05%). Here is my call:

~/software/meRanTK/meRanGh.pl align \                                                                                                                                              ─╯
            -o test/meRanGhResult \
            -f RB_WT1_S1_L001_R1_001.fastq \
            -r RB_WT1_S1_L001_R2_001.fastq \
            -t 10 \
            -S test_RNA_BSseq.sam \
            -GTF gencode.v44.annotation.gtf \
            -id transcriptome_Gh_hisat_built \
            -dt \

When I use hisat3n to align my reads, I get much better alignment ~60%. If I try to use the bam retrieved from hisat3n on meRanCall, I seem to be stuck at the processing stage:

processing 706 sequences: [KI270757.1 - 100.00%] [overall - 100.00%] done ...

@riederd
Copy link
Member

riederd commented Dec 18, 2023

Is your library reverse stranded?
You might try to reverse complement your fastq files first using for e.g. fastx_reverse_complement

@Vishvak2000
Copy link
Author

Reverse complementing works - getting much higher alignment rate. However, when I run meRanCall, I seem to be stuck at this step:

Working on: test_RNA_BSseq_sorted.bam
No region BED file specified: calling m5Cs on entire alignment file: test_RNA_BSseq_sorted.bam ...
Starting to process: 549 targets on 20 CPUs ...
processing 549 sequences: [KI270755.1 - 100.00%] [overall - 100.00%] done ...

Here is my function call:
-fasta ../../GRCh38.p14.genome.fa \ -bam test_RNA_BSseq_sorted.bam \ -o meRanCall_result.txt \ -genomeDBref \ -p 20

Using top, I can see that the processes are still running, however, it is taking a suspiciously long time - wondering if this is expected behavior.

@Vishvak2000
Copy link
Author

Hello! Just following up on this - it is still stuck on the above step.

@riederd
Copy link
Member

riederd commented Dec 19, 2023

You did not pass a GTF file to meRanCall so it might take quite long.
Was the bam test_RNA_BSseq_sorted.bam generated with meRanGs or meRanGh ? as you are passing -genomeDBref

@Vishvak2000
Copy link
Author

It was generated with meRanGh, should I be passing -tref?

@riederd
Copy link
Member

riederd commented Dec 19, 2023

No, but a GTF should speed up things

@riederd
Copy link
Member

riederd commented Dec 20, 2023

processing 549 sequences: [KI270755.1 - 100.00%] [overall - 100.00%] done ...

however, means that it is finished. Did you get any result files?

@Vishvak2000
Copy link
Author

Vishvak2000 commented Dec 20, 2023

I only get the header of the text file in my output. I also don’t get the regular summary output of # of Cs analyzed, etc

@riederd
Copy link
Member

riederd commented Dec 21, 2023

can you try to run with -debug and see what you get?

@Vishvak2000
Copy link
Author

thanks for the suggestion:

chr21:9997359:+ QB:G Cs: 10 : 37

chr21:9997360:+ QB:G Cs: 10 : 37

chr21:9997361:+ QB:T Cs: 10 : 37
processing 549 sequences: [KI270755.1 - 100.00%] [overall - 100.00%] done ...
Done...

Analyzed: test/meRanGhResult/test_RNA_BSseq_sorted.bam

Summary:
Analysis of Cs with minimum coverage of 20

Total duplicate reads filtered: 0

Total analyzed Cs on reference: 0
Total analyzed methylated Cs (<= 80% C->T conversion) on reference: 0
Total analyzed unconverted Cs from queries: 0
Total analyzed unconverted Cs from mutation: 0
Total analyzed C to T conversion rate: 0

Summary over all Cs:

Total Cs on reference covered by seq data: 578988
Total Cs from queries unconverted: 1012141
Total C to T conversion rate estimated: 0.9299

Result file: meRanCall_result.txt

The file returns empty

@riederd
Copy link
Member

riederd commented Jan 9, 2024

Would it be possible for to share the fastq file? Or let's say the first 1 M reads:

cat RB_WT1_S1_L001_R1_001.fastq | head -4000000 | gzip -c > test_R1_sub.fastq.gz
cat RB_WT1_S1_L001_R2_001.fastq | head -4000000 | gzip -c > test_R2_sub.fastq.gz

@Vishvak2000
Copy link
Author

Here are the test files. These are already reverse complemented:

https://app.box.com/folder/243307876541?s=wz09z7wcbjcm5wmidajqq79uw1sxtabv

@Vishvak2000
Copy link
Author

Hello! Just following up to see if there are any issues with the fastqs themselves. Appreciate the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants