Map non noisy ONT #1127

Closed
Axze-rgb opened this issue Nov 8, 2023 · 21 comments

Comments

@Axze-rgb

Axze-rgb commented Nov 8, 2023

Hello,

I have a question: according to Oxford Nanopore, their latest flow cells produce very accurate reads. Is "map-ont" still the best setting for mapping those reads? I am asking because the manual still refers to "long noisy reads". Thanks for minimap2 and for your time.

@Axze-rgb
Author

Sorry, I hadn't seen issue #1030 (comment).

So, I understand that Dorado is now accounted for in the map-ont settings?

Thanks for all the work you are doing.
Alex

@iiSeymour
Contributor

@Axze-rgb The dorado aligner has not yet changed any of the index settings; when we do, we would like them upstreamed here.

@lh3
Owner

lh3 commented Nov 10, 2023

For now, use map-ont. You can try -x map-hifi -w10 (HiFi scoring and k-mer length with more seeds) for Q20 reads but you need to have a way to evaluate whether that gives better results.

I hope I can find some time in the next several months to improve minimap2 a little bit. Along this I will be testing alternative scoring for v14 data.

@lh3
Owner

lh3 commented Nov 10, 2023

@iiSeymour When you find more appropriate parameters for aligning Q20 reads, I will be happy to add a new preset for that. This will also save me some time. Thanks!

@Checunmily

> For now, use map-ont. You can try -x map-hifi -w10 (HiFi scoring and k-mer length with more seeds) for Q20 reads but you need to have a way to evaluate whether that gives better results.
>
> I hope I can find some time in the next several months to improve minimap2 a little bit. Along this I will be testing alternative scoring for v14 data.

Hello, I've recently been working with some R10 data. Are there any plans to improve minimap2 for ONT R10 data in the next few months, or any new suggestions for R10 data?

@iiSeymour
Contributor

@lh3 From our internal benchmarking, we find that speed and downstream accuracy are maximized with -x map-ont -k19 -w 19 -U50,500 -g10k.

@Mon3trK

Mon3trK commented Mar 5, 2024

> For now, use map-ont. You can try -x map-hifi -w10 (HiFi scoring and k-mer length with more seeds) for Q20 reads but you need to have a way to evaluate whether that gives better results.
>
> I hope I can find some time in the next several months to improve minimap2 a little bit. Along this I will be testing alternative scoring for v14 data.

Hi @lh3, the accuracy of ONT sequencing has improved a lot with duplex reads and the R10.4 pore. Is there any plan to provide different presets for R9 and R10 nanopore data? Also, different basecallers have a significant impact on sequencing accuracy, so it seems inappropriate to lump everything together under -x map-ont.

@lh3
Owner

lh3 commented Mar 5, 2024

> from our internal benchmarking we find speed and downstream accuracy are maximized with -x map-ont -k19 -w 19 -U50,500 -g10k.

-x map-hifi is equivalent to -x map-ont -k19 -w 19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200. The main difference here is the scoring. How does the scoring affect the downstream tools? If the map-hifi scoring also works, I can add an alias to map-hifi, something like lr:hq.
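
To see at a glance which options separate the two settings discussed here, the option strings quoted in this comment can be compared mechanically. The following is a minimal sketch, not part of minimap2: the `short_options` helper is hypothetical, and both option strings are copied from this thread rather than read from the tool itself.

```python
import re

# Option strings as quoted in this thread (not read from minimap2 itself).
ONT_SUGGESTED = "-k19 -w 19 -U50,500 -g10k"
HIFI_EQUIVALENT = "-k19 -w 19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200"


def short_options(option_string):
    """Map each single-letter option to its value, accepting '-k19' and '-w 19'."""
    return dict(re.findall(r"-([A-Za-z])\s*([^\s-]\S*)", option_string))


ont = short_options(ONT_SUGGESTED)
hifi = short_options(HIFI_EQUIVALENT)

# Options present in both strings with the same value.
shared = {name: value for name, value in ont.items() if hifi.get(name) == value}
# Options that appear only in the map-hifi expansion.
hifi_only = {name: value for name, value in hifi.items() if name not in ont}

print("shared:", shared)
print("only in the map-hifi expansion:", hifi_only)
```

The options left over on the map-hifi side are -A, -B, -O, -E and -s, which matches the remark above that the two settings share their seeding and chaining options and differ mainly in scoring.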

> also different basecallers have significant impact on sequencing accuracy

That is why it is more appropriate to choose a conservative setting that can give you good results on input of varying quality.

@iiSeymour
Contributor

> If the map-hifi scoring also works

Unfortunately not, the map-hifi scoring leads to both fewer mapped reads (~3%) and small regressions in SNP/INDEL calling. It's possible these regressions could be recovered from new models trained on updated scoring parameters but it seems -x map-ont -k19 -w 19 -U50,500 -g10k is the sweet spot.

lh3 added a commit that referenced this issue Mar 10, 2024
 * Added the lr:hq preset suggested by Nanopore developers (#1127)
 * Fixed transition scoring. It did not work with presets.
 * Cleaned up preset documentation
@lh3
Owner

lh3 commented Mar 12, 2024

The next release will have a lr:hq preset for -k19 -w 19 -U50,500 -g10k.
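
For scripts that must run against both older and newer minimap2 builds, one option is to fall back to the explicit options when the preset is not available. The sketch below only assembles a command line and does not run anything. It assumes that lr:hq is available from version 2.27 on (as reported later in this thread) and that the version string has the form printed by `minimap2 --version`; the function names and file names are hypothetical.

```python
import re
import shlex

# Explicit options that this thread says the lr:hq preset stands for.
LR_HQ_EXPLICIT = ["-x", "map-ont", "-k19", "-w19", "-U50,500", "-g10k"]


def high_quality_ont_args(version):
    """Return preset arguments for a version string such as '2.27-r1193'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Assumption: lr:hq exists from 2.27 on, per a later comment in this thread.
    if major_minor >= (2, 27):
        return ["-x", "lr:hq"]
    return list(LR_HQ_EXPLICIT)


def build_command(version, reference, reads):
    """Assemble a minimap2 command line that writes SAM to stdout (-a)."""
    return ["minimap2", "-a", *high_quality_ont_args(version), reference, reads]


# Hypothetical file names, for illustration only.
print(shlex.join(build_command("2.26", "ref.fa", "reads.fq.gz")))
print(shlex.join(build_command("2.27-r1193", "ref.fa", "reads.fq.gz")))
```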

@lh3 lh3 closed this as completed Mar 12, 2024
@bepoli

bepoli commented Mar 13, 2024

Thanks @lh3 !
I understand that the new preset lr:hq is not meant for spliced alignment.
Should I use the existing preset splice:hq with highly accurate Nanopore cDNA reads (average quality >= 20)?

@lh3
Owner

lh3 commented Mar 13, 2024

Yes

@lh3
Owner

lh3 commented Mar 13, 2024

I will hijack the thread and ask a question here: are there public Q20 cDNA-seq data? Perhaps because the SQK-PCS114 kit is still at the early-access stage, most cDNA reads in papers were produced with R9 or older kits.

@dolittle007

Hi @lh3, I have PacBio HiFi Iso-Seq data. Should I use the existing preset splice:hq along with the new preset lr:hq, or can I just use -k19 -w 19 -U50,500 -g10k -xsplice -C5 -O6,24 -B4?
Thanks a lot.

@jelber2

jelber2 commented Mar 14, 2024

> The next release will have a lr:hq preset for -k19 -w 19 -U50,500 -g10k.

Shouldn't it be -x map-ont -k19 -w 19 -U50,500 -g10k, according to @iiSeymour?

@FatYuanBao

@iiSeymour I noticed that the latest Minimap2-2.27 (r1193) includes an updated lr:hq preset. I ran a small benchmark comparing this new preset with the old map-ont preset on a human R10.4.1 dataset basecalled with dorado 0.4.1 in HAC mode.

For -x map-ont:

19072496 + 0 mapped (99.93% : N/A)
12791592 + 0 primary mapped (99.90% : N/A)

For -x lr:hq:

18636130 + 0 mapped (99.79% : N/A)
12765068 + 0 primary mapped (99.69% : N/A)

It appears that the proportion of mapped reads is slightly lower (by ~0.14 percentage points) with the new lr:hq preset. Given the relatively high coverage (>50X) of this data, this difference could be significant.
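
The comparison above can be reproduced from the four reported lines. This is a minimal sketch that assumes the lines come from `samtools flagstat` (the thread does not say so, but the format matches); the `parse_flagstat` helper is hypothetical and only handles the two categories shown here.

```python
import re

# Matches lines such as "19072496 + 0 mapped (99.93% : N/A)".
FLAGSTAT_LINE = re.compile(r"(\d+) \+ (\d+) (primary mapped|mapped) \(([\d.]+)%")


def parse_flagstat(text):
    """Return {category: (qc_passed_count, percent)} for the mapped lines."""
    stats = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            stats[match.group(3)] = (int(match.group(1)), float(match.group(4)))
    return stats


# Numbers copied verbatim from the comment above.
MAP_ONT = """\
19072496 + 0 mapped (99.93% : N/A)
12791592 + 0 primary mapped (99.90% : N/A)
"""
LR_HQ = """\
18636130 + 0 mapped (99.79% : N/A)
12765068 + 0 primary mapped (99.69% : N/A)
"""

old, new = parse_flagstat(MAP_ONT), parse_flagstat(LR_HQ)
drops = {}
for category in ("mapped", "primary mapped"):
    count_drop = old[category][0] - new[category][0]
    point_drop = round(old[category][1] - new[category][1], 2)
    drops[category] = (count_drop, point_drop)
    print(f"{category}: {count_drop} fewer records, {point_drop} percentage points lower")
```

If these are indeed flagstat lines, "mapped" counts every mapped alignment record, including secondary and supplementary ones, while "primary mapped" counts primary alignments only. Under that reading, most of the drop in the "mapped" line comes from non-primary records rather than from reads that failed to map.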

@lh3
Owner

lh3 commented Mar 21, 2024

Read count-based metrics are often misleading. The difference mostly comes from short reads and low-quality reads, which, on the contrary, may interfere with analyses. PS: also, not all reads are supposed to get mapped to a reference genome.

@dolittle007

> The next release will have a lr:hq preset for -k19 -w 19 -U50,500 -g10k.
>
> Shouldn't it be -x map-ont -k19 -w 19 -U50,500 -g10k ? According to @iiSeymour

Thanks a lot. @jelber2 splice:hq works for RNA and lr:hq works for DNA.

preset lr:hq => -x map-ont -k19 -w 19 -U50,500 -g10k
preset splice:hq => -x splice -C5 -O6,24 -B4
preset splice => -x map-ont -k15 -w5 --splice -g2k -G200k -U10,1000000 -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes

So parameters from lr:hq and splice:hq will cause conflicts.
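
The overlap behind this remark can be listed explicitly. The sketch below parses the preset expansions exactly as written in this comment (they are not read from minimap2, and the `parse_opts` helper is hypothetical) and reports the options that both presets set to different values.

```python
def parse_opts(option_string):
    """Map each option to its value; handles '-k19', '-w 19', '--name=value', '--flag'."""
    opts = {}
    tokens = option_string.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            flag, _, value = tok.partition("=")
        else:
            flag, value = tok[:2], tok[2:]
            # A bare short flag may take its value from the next token.
            if not value and i + 1 < len(tokens) and not tokens[i + 1].startswith("-"):
                i += 1
                value = tokens[i]
        opts[flag] = value
        i += 1
    return opts


# Expansions as listed in this comment.
LR_HQ = "-k19 -w 19 -U50,500 -g10k"
SPLICE = ("-k15 -w5 --splice -g2k -G200k -U10,1000000 -A1 -B2 -O2,32 -E1,0 "
          "-b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes")
SPLICE_HQ_EXTRA = "-C5 -O6,24 -B4"

lr_hq = parse_opts(LR_HQ)
# splice:hq is described as splice plus a few overrides.
splice_hq = {**parse_opts(SPLICE), **parse_opts(SPLICE_HQ_EXTRA)}

conflicts = {
    flag: (lr_hq[flag], splice_hq[flag])
    for flag in lr_hq
    if flag in splice_hq and lr_hq[flag] != splice_hq[flag]
}
for flag, (dna_value, rna_value) in conflicts.items():
    print(f"{flag}: lr:hq={dna_value} vs splice:hq={rna_value}")
```

Going by these expansions, every option that lr:hq sets (-k, -w, -U and -g) is also set by the splice presets to a different value.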

@camillaugolini-iit

Hello,

@lh3 and @iiSeymour, as far as I understand, splice:hq is the best option for R10 Nanopore cDNA reads.
Would it also be optimal for the new RNA004?
In other words, which setting would you use to optimally align reads from the new RNA pore to a genomic and a transcriptomic reference?

Thank you for your time

@camillaugolini-iit

Also, if a --junc-bed file is provided, would it conflict with the splice:hq options?

@dolittle007

@camillaugolini-iit Using the --junc-bed option, minimap2 prioritizes splicing events based on the provided annotations. It will not cause any conflict with splice:hq options.
