Poly n Plannotate #396

Koeng101 · 2023-11-07T05:46:20Z

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262757/

I'd like to get the plannotate auto annotation suite working with poly.

Here's a link to the code: https://github.com/mmcguffi/pLannotate/tree/master

Basically, this would let us auto-annotate plasmids. A very useful task!

abondrn · 2023-11-07T19:16:17Z

Would be happy to take this on! Here are two ways we could approach it:

1. Closely integrate with the plannotate batch CLI, which takes FASTA files and produces output files (GenBank or CSV). This has the obvious benefit of being quicker to implement, test, and review. Because it calls out to Python via the CLI, future updates from plannotate would not have to be ported to Go in order to be used. This is what I am leaning towards.
2. Faithfully port the core logic, which uses several CLI tools (blastn, diamond, infernal) to query several databases that are distributed with plannotate (found here); the results are then aggregated with pandas. This option would result in fast annotation, and adding local alignment search natively to poly would unlock future functionality such as CRISPR gRNA design.

In short, both options require calling out to external CLI tooling. Option 2 avoids the Python dependency, but requires additional work as a result.
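Option 1 could look roughly like the Go sketch below, which shells out to the plannotate batch CLI with `os/exec`. This is a minimal sketch, not poly's API: the `BatchOptions` type, `batchArgs`, and `Annotate` are hypothetical names, and the `-i`, `-o`, and `--csv` flags are recalled from the pLannotate README and should be checked against `plannotate batch --help`.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// BatchOptions holds the settings for one pLannotate batch run.
// The type and its field names are hypothetical, made up for this sketch.
type BatchOptions struct {
	InputFasta string // path to the FASTA file to annotate
	OutputDir  string // directory where pLannotate writes its results
	CSV        bool   // also request CSV output
}

// batchArgs builds the argument list for `plannotate batch`.
// ASSUMPTION: the -i, -o and --csv flags are taken from memory of the
// pLannotate README; verify them with `plannotate batch --help`.
func batchArgs(opts BatchOptions) []string {
	args := []string{"batch", "-i", opts.InputFasta, "-o", opts.OutputDir}
	if opts.CSV {
		args = append(args, "--csv")
	}
	return args
}

// Annotate runs pLannotate as a subprocess and returns its combined
// stdout and stderr. The annotation files themselves land in OutputDir.
func Annotate(ctx context.Context, opts BatchOptions) ([]byte, error) {
	path, err := exec.LookPath("plannotate")
	if err != nil {
		return nil, fmt.Errorf("plannotate not found on PATH: %w", err)
	}
	out, err := exec.CommandContext(ctx, path, batchArgs(opts)...).CombinedOutput()
	if err != nil {
		return out, fmt.Errorf("plannotate batch failed: %w", err)
	}
	return out, nil
}
```

A caller would pass a context with a timeout to `Annotate` and then parse the GenBank or CSV file that appears in `OutputDir`. Keeping the argument construction in its own function makes the wrapper testable without pLannotate installed.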

abondrn · 2023-11-07T19:25:38Z

One callout: plannotate is distributed under the GNU GPL v3 license. This shouldn't impact option 1, as we do not plan to distribute poly with pLannotate, but it will impact users that may want to use poly + plannotate. Option 2 may be impacted, as poly may become a derivative work, which we don't want if we want to keep using the MIT license.

Koeng101 · 2023-11-07T19:40:56Z

> One callout: plannotate is distributed under the GNU GPL v3 license. This shouldn't impact option 1, as we do not plan to distribute poly with pLannotate, but it will impact users that may want to use poly + plannotate. Option 2 may be impacted, as poly may become a derivative work, which we don't want if we want to keep using the MIT license.

One bit here: DNA cannot be copyrighted. The most important thing that they've made, in my opinion, is the sweet, sweet database of part features. We should be able to use the raw sequences without infringing on any copyright. Translation to a whole new language means it probably isn't a derivative work at the code level.

I suppose I should be more specific about the desire here: I would like the abilities of plannotate, regardless of implementation. So option 2, though I don't think we have to care much about faithfully reproducing the core logic! We just need 98% matching to the full sequence, i.e., Table 1.

If the goal is to write no new code, you can probably just narrow down the possible matches using mash, then do a Needleman-Wunsch alignment using align. I've found it's really, really slow, but it could work. There is a reason blast is a thing.
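The filter-then-align idea can be sketched in plain Go as below. This is an illustration under stated assumptions, not the mash or align packages: a simple k-mer containment check stands in for mash (it is not MinHash), and an edit-distance semi-global alignment (free gaps before and after the match in the plasmid) stands in for Needleman-Wunsch, since a feature covers only part of a plasmid. All function names are hypothetical.

```go
package main

// kmerSet returns the set of all k-mers in seq.
func kmerSet(seq string, k int) map[string]struct{} {
	set := make(map[string]struct{})
	for i := 0; i+k <= len(seq); i++ {
		set[seq[i:i+k]] = struct{}{}
	}
	return set
}

// containment returns the fraction of the feature's k-mers that also
// occur in the plasmid. It is a cheap stand-in for a mash-style filter.
func containment(feature, plasmid map[string]struct{}) float64 {
	if len(feature) == 0 {
		return 0
	}
	shared := 0
	for kmer := range feature {
		if _, ok := plasmid[kmer]; ok {
			shared++
		}
	}
	return float64(shared) / float64(len(feature))
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// fitDistance returns the smallest edit distance between feature and any
// substring of plasmid. Row 0 is all zeros, so the match may start at any
// plasmid position; taking the minimum of the last row lets it end anywhere.
// Cost is O(len(feature) * len(plasmid)), which is the slow part.
func fitDistance(feature, plasmid string) int {
	prev := make([]int, len(plasmid)+1)
	curr := make([]int, len(plasmid)+1)
	for i := 1; i <= len(feature); i++ {
		curr[0] = i // feature prefix aligned against nothing
		for j := 1; j <= len(plasmid); j++ {
			cost := 1
			if feature[i-1] == plasmid[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j-1]+cost, prev[j]+1, curr[j-1]+1)
		}
		prev, curr = curr, prev
	}
	best := prev[0]
	for _, d := range prev {
		if d < best {
			best = d
		}
	}
	return best
}

// hasFeature applies the k-mer filter first and only aligns survivors.
// Identity is approximated as 1 - edits/len(feature).
func hasFeature(feature, plasmid string, k int, minContainment, minIdentity float64) bool {
	if len(feature) == 0 {
		return false
	}
	if containment(kmerSet(feature, k), kmerSet(plasmid, k)) < minContainment {
		return false // cheap rejection, no alignment needed
	}
	identity := 1 - float64(fitDistance(feature, plasmid))/float64(len(feature))
	return identity >= minIdentity
}
```

For the 98% criterion, a call such as `hasFeature(feature, plasmid, 11, 0.5, 0.98)` would be a starting point; the k-mer size and containment cutoff here are guesses that would need tuning. The sketch ignores the reverse strand and features that span the origin of a circular plasmid.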

The other option would be getting blast or minimap2 or the like integrated into Poly. I've been looking at doing this with biowasm, but there are some annoying points around getting that to work (been posting my work on discord). It could also be done with cgo, but again, that is also annoying.

I would love this to be done and am very willing to help!

Koeng101 · 2023-11-07T19:41:45Z

I've never really used DIAMOND, but I'd also be fine with just looking for perfect amino acid matches right now. The nucleotide matching is the important part for a version 1, IMO.
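Looking for perfect amino acid matches needs no aligner at all: translate the plasmid in all six reading frames and do an exact substring search. The sketch below assumes the standard genetic code and uses hypothetical names (`translate`, `findProtein`, `Hit`); it does not use poly's own translation code.

```go
package main

import "strings"

// standardTable lists the amino acids of the standard genetic code (NCBI
// translation table 1), with codons ordered by base T, C, A, G.
const standardTable = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// translate converts DNA to protein starting at offset frame.
// Codons containing anything other than A, C, G, T become 'X'.
func translate(dna string, frame int) string {
	var protein strings.Builder
	for i := frame; i+3 <= len(dna); i += 3 {
		idx, valid := 0, true
		for j := i; j < i+3; j++ {
			pos := strings.IndexByte("TCAG", dna[j])
			if pos < 0 {
				valid = false
				break
			}
			idx = idx*4 + pos
		}
		if valid {
			protein.WriteByte(standardTable[idx])
		} else {
			protein.WriteByte('X')
		}
	}
	return protein.String()
}

// reverseComplement returns the reverse complement; unknown bases become N.
func reverseComplement(dna string) string {
	comp := map[byte]byte{'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
	out := make([]byte, len(dna))
	for i := 0; i < len(dna); i++ {
		c, ok := comp[dna[i]]
		if !ok {
			c = 'N'
		}
		out[len(dna)-1-i] = c
	}
	return string(out)
}

// Hit records one exact protein match.
type Hit struct {
	Strand int // +1 for the given strand, -1 for the reverse complement
	Frame  int // reading frame offset: 0, 1, or 2
	Start  int // nucleotide offset of the match on that strand
}

// findProtein reports every exact occurrence of protein in the six-frame
// translation of dna.
func findProtein(dna, protein string) []Hit {
	var hits []Hit
	if protein == "" {
		return hits
	}
	dna = strings.ToUpper(dna)
	strands := []struct {
		sign int
		seq  string
	}{{1, dna}, {-1, reverseComplement(dna)}}
	for _, s := range strands {
		for frame := 0; frame < 3; frame++ {
			translated := translate(s.seq, frame)
			for from := 0; ; {
				at := strings.Index(translated[from:], protein)
				if at < 0 {
					break
				}
				hits = append(hits, Hit{s.sign, frame, frame + 3*(from+at)})
				from += at + 1 // continue after this match start
			}
		}
	}
	return hits
}
```

For example, `findProtein("CCATGGCTAAATAAGG", "MAK")` finds the ORF that starts at nucleotide 2 on the forward strand. As written, matches that wrap around the origin of a circular plasmid are missed; appending a copy of the sequence start before searching is one possible workaround.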

TimothyStiles · 2023-11-29T19:54:49Z

I'm not sure about the approach, but this may be a good external thing to start?

Koeng101 added the enhancement (New feature or request), help wanted (Extra attention is needed), and intermediate (Will take some time to fix) labels Nov 7, 2023

TimothyStiles closed this as completed Nov 29, 2023
