Feature Request: Allow longer minimum overlaps than 10kb. #736

Closed
JohnUrban opened this issue Oct 18, 2024 · 5 comments

Comments

@JohnUrban

JohnUrban commented Oct 18, 2024

The title says it all.

I have high-accuracy nanopore data with a read N50 of >100 kb (representing >30-50X coverage), and I would like to try minimum overlaps of 25 kb and 50 kb, but I get this error:

flye: error: argument -m/--min-overlap: value should be in the range [1000, 10000]

It looks like this could be as simple as changing the min and max values in the argument parser around line 624 in flye/main.py:

    623     parser.add_argument("-m", "--min-overlap", dest="min_overlap", metavar="int",
    624                         type=lambda v: check_int_range(v, 1000, 10000),
    625                         default=None, help="minimum overlap between reads [auto]")

...but I don't know how that would affect anything downstream that may assume a maximum of 10 kb.
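For illustration, here is a minimal, self-contained sketch of what widening that range could look like. The body of `check_int_range` is an assumed stand-in inferred from the error message above, not copied from Flye, and `MAX_MIN_OVERLAP`, `build_parser`, and the 100000 ceiling are hypothetical names and values chosen only for this example. It touches argument parsing only and says nothing about downstream code that may assume a 10 kb maximum.

```python
import argparse


def check_int_range(value, min_val, max_val):
    """Assumed stand-in for Flye's helper: parse an int and enforce bounds."""
    ival = int(value)
    if ival < min_val or ival > max_val:
        raise argparse.ArgumentTypeError(
            "value should be in the range [{0}, {1}]".format(min_val, max_val))
    return ival


# Hypothetical new ceiling for this sketch; the upstream limit is 10000.
MAX_MIN_OVERLAP = 100000


def build_parser():
    # Only the -m/--min-overlap option is reproduced here.
    parser = argparse.ArgumentParser(prog="flye-sketch")
    parser.add_argument("-m", "--min-overlap", dest="min_overlap", metavar="int",
                        type=lambda v: check_int_range(v, 1000, MAX_MIN_OVERLAP),
                        default=None, help="minimum overlap between reads [auto]")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-m", "50000"])
    print(args.min_overlap)
```

With this change, `-m 50000` parses cleanly, while a value above the new ceiling is still rejected by the parser.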

If there is a reason longer overlaps are not allowed, please let me know.

Many thanks.

(P.S. I will try messing around with the arg parser in the meantime.)

@JohnUrban
Author

Since opening this feature request, I made the adjustment that I suggested. In some cases, allowing 25 kb, 50 kb, and/or 75 kb overlaps led to higher contiguity for ONT-UL assemblies, and 15-20 kb overlaps did the same for HiFi.
(I cannot tell you whether the extra contiguity was accurate, though.)

@mikolmogorov
Owner

In principle, it should be possible to increase the limit, but this will require extensive testing. Is there evidence that you are getting better assemblies with an increased minimum overlap?

@JohnUrban
Author

When the coverage is high enough and the reads are long enough, I did see contiguity increase.

As for other metrics, if you don't mind waiting, I will report back anything I learn about them in the coming month or two.

As you know better than anyone, Flye sets an overlap length (capped at 10 kb) based on read N50, seemingly without considering the amount of coverage. So it sets the same overlap for 30X coverage as for 300X coverage.

I have 120X ultra-long nanopore and 600X HiFi, so I wanted to test the longer overlap cutoffs since I technically have far more coverage than needed for a great Flye assembly.

@mikolmogorov
Owner

Thanks (and sorry for the late response)! Sure, I would be happy to look at the stats whenever you have them.
The key question is this: if you increase the overlap from 10k to X, how many repeats in the genome have a length between 10k and X? The downside of increasing the overlap length is potentially introducing gaps during disjointig generation, or adding potential misassemblies due to alignment artifacts.
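As a rough way to put a number on that question, the sketch below counts repeats whose length falls between the old and new overlap cutoffs. It assumes you already have a list of repeat lengths in bp from some separate repeat annotation; the function name `count_repeats_in_window` and the example lengths are hypothetical and are not part of Flye.

```python
def count_repeats_in_window(repeat_lengths, old_overlap=10000, new_overlap=50000):
    """Count repeats longer than old_overlap and no longer than new_overlap.

    repeat_lengths: iterable of repeat lengths in bp (assumed to come from
    an external repeat annotation).
    """
    return sum(1 for length in repeat_lengths
               if old_overlap < length <= new_overlap)


if __name__ == "__main__":
    # Made-up lengths, for illustration only.
    example_lengths = [5000, 12000, 30000, 80000]
    print(count_repeats_in_window(example_lengths, 10000, 50000))
```

Whether the boundaries should be inclusive is a judgment call; here a repeat exactly as long as the new cutoff is counted.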

@mikolmogorov
Owner

Assuming this is resolved, feel free to follow up if you have more questions!
