feature: Allow to alignment between protein or nucleotide sequences #16

notalfredo · 2022-12-31T03:31:03Z

This can be done with the Needleman–Wunsch algorithm. As the title mentions, it's an algorithm that allows you to align protein or nucleotide sequences. The algorithm will live in its own file, following the convention of the project.
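For readers unfamiliar with the algorithm, here is a minimal sketch of Needleman–Wunsch global alignment. It is not code from this repository: the function name `needleman_wunsch` and the scoring constants (`MATCH`, `MISMATCH`, `GAP`) are assumptions chosen for illustration, and the issue does not specify a scoring scheme.

```rust
// Hypothetical scoring parameters; the issue does not specify any.
const MATCH: i32 = 1;
const MISMATCH: i32 = -1;
const GAP: i32 = -1;

/// Returns the optimal global alignment score and the two aligned
/// sequences, with '-' marking gaps.
fn needleman_wunsch(a: &str, b: &str) -> (i32, String, String) {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let (n, m) = (a.len(), b.len());

    // score[i][j] = best score for aligning a[..i] with b[..j].
    let mut score = vec![vec![0i32; m + 1]; n + 1];
    for i in 1..=n {
        score[i][0] = i as i32 * GAP;
    }
    for j in 1..=m {
        score[0][j] = j as i32 * GAP;
    }
    for i in 1..=n {
        for j in 1..=m {
            let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
            let diag = score[i - 1][j - 1] + sub;
            let up = score[i - 1][j] + GAP;
            let left = score[i][j - 1] + GAP;
            score[i][j] = diag.max(up).max(left);
        }
    }

    // Trace back from the bottom-right corner to recover one optimal alignment.
    let (mut i, mut j) = (n, m);
    let (mut out_a, mut out_b) = (Vec::new(), Vec::new());
    while i > 0 || j > 0 {
        if i > 0 && j > 0 {
            let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
            if score[i][j] == score[i - 1][j - 1] + sub {
                out_a.push(a[i - 1]);
                out_b.push(b[j - 1]);
                i -= 1;
                j -= 1;
                continue;
            }
        }
        if i > 0 && score[i][j] == score[i - 1][j] + GAP {
            // Gap in b: consume one character of a.
            out_a.push(a[i - 1]);
            out_b.push('-');
            i -= 1;
        } else {
            // Gap in a: consume one character of b.
            out_a.push('-');
            out_b.push(b[j - 1]);
            j -= 1;
        }
    }
    out_a.reverse();
    out_b.reverse();
    (score[n][m], out_a.into_iter().collect(), out_b.into_iter().collect())
}
```

With these assumed scores, `needleman_wunsch("ACGT", "AGT")` yields a score of 2 and the alignment `ACGT` / `A-GT`. The full matrix uses O(n·m) memory, which is the simplest form to read but not the most economical.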

neoncitylights · 2022-12-31T04:01:19Z

Thanks for submitting! It's an interesting idea, and it's definitely a use case for using the Levenshtein distance algorithm. Is this algorithm purely for biology?

From the perspective of a library user (not a developer), the Hamming and Levenshtein distance algorithms have a wide range of applications. These include biology, but not solely biology. Ideally, a library should only ship what will be used. Those two algorithms are the main focus (at least as of right now), whereas Needleman-Wunsch is biology-focused.

I do like the idea though, and I think it would make more sense to turn this repository into a monorepo of related crates. We can do this by using a "Cargo workspace" (https://doc.rust-lang.org/book/ch14-03-cargo-workspaces.html). If you look at the GitHub repository for the serde crate, there are multiple crates in there, such as the main serde crate, serde_derive, and serde_derive_internals (a crate purely internal for developers).

I think we can do something similar, except with all crates living in a /crates directory. So, we could have something like:

differ: Library for just the pure distance/similarity algorithms
needleman_wunsch: Library for the Needleman-Wunsch algorithm, which can have differ as a dependency (if it needs it)

And then in the future, having a workspace would also make room for a crate like semantic_differ (example/placeholder name). I think you remember us talking about this: it would do semantic-like diffing, computing the difference between two words in a linguistic manner. E.g., "were" and "was" are technically 1/4 similar, but they're just two differences. Another is "person" and "people", which would get a fairly low score even though they're semantically similar; the word just became plural.
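To make the 1/4 figure concrete, here is a small sketch of one common way to turn Levenshtein distance into a similarity score: one minus the distance divided by the longer word's length. The function names `levenshtein` and `similarity` are illustrative assumptions and are not taken from the differ crate's actual API.

```rust
/// Levenshtein distance using two rolling rows of the DP table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        // curr[0] = i: deleting i characters turns a[..i] into the empty string.
        let mut curr = vec![i; b.len() + 1];
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            curr[j] = (prev[j - 1] + cost) // substitution (or match)
                .min(prev[j] + 1)          // deletion
                .min(curr[j - 1] + 1);     // insertion
        }
        prev = curr;
    }
    prev[b.len()]
}

/// Normalized similarity in [0, 1]: 1 - distance / length of the longer word.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0; // two empty strings are identical
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}
```

Under this definition, "were" and "was" are three edits apart over a longest length of four, giving a similarity of 0.25, while "person" and "people" are four edits apart over six characters, giving roughly 0.33. Neither number reflects how closely related the words are in meaning, which is the gap a semantic crate would address.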

neoncitylights · 2022-12-31T04:10:24Z

If this is something you're interested in, then we should first create an issue to set up the repository as a monorepo, and then we can create a crate for the Needleman-Wunsch algorithm.

notalfredo · 2023-01-01T20:32:12Z

As of right now, there are two algorithms I would like to implement in bio_diff:

Needleman–Wunsch algorithm
Smith–Waterman algorithm

Both have to do with aligning protein or nucleotide sequences. Each algorithm will have its own file, similar to how differ is structured. I am thinking of reusing the same enums and structs, except that I plan on implementing the memory optimizations from issue #25 from the start.
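As a point of comparison with global alignment, here is a minimal sketch of Smith–Waterman local alignment scoring. The function name `smith_waterman_score` and the scoring constants are assumptions for illustration only. The sketch keeps just two rows of the matrix, which is one common way to reduce memory when only the score is needed; this thread does not say whether that matches the optimizations described in issue #25.

```rust
/// Best local alignment score between `a` and `b` (score only, no traceback).
fn smith_waterman_score(a: &str, b: &str) -> i32 {
    // Hypothetical scoring parameters; the thread does not specify any.
    const MATCH: i32 = 2;
    const MISMATCH: i32 = -1;
    const GAP: i32 = -1;

    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // Only the previous row is needed to compute the current one.
    let mut prev = vec![0i32; b.len() + 1];
    let mut best = 0i32;
    for i in 1..=a.len() {
        let mut curr = vec![0i32; b.len() + 1];
        for j in 1..=b.len() {
            let sub = if a[i - 1] == b[j - 1] { MATCH } else { MISMATCH };
            // Unlike Needleman-Wunsch, a cell never drops below zero,
            // so an alignment can start fresh anywhere in the matrix.
            let cell = (prev[j - 1] + sub)
                .max(prev[j] + GAP)
                .max(curr[j - 1] + GAP)
                .max(0);
            curr[j] = cell;
            best = best.max(cell);
        }
        prev = curr;
    }
    best
}
```

With these assumed scores, two sequences that share only the region `ACGT` score 8 regardless of what surrounds it. Recovering the aligned region itself would require a traceback, which needs more than two rows or a different strategy.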

neoncitylights · 2023-01-02T01:16:23Z

> I am thinking of reusing the same enums and structs, except that I plan on implementing the memory optimizations from issue #25 from the start.

By the way, I mentioned earlier you can have a crate as a dependency for another crate :) So, you can have bio_diff depend on the differ crate. By doing this, you won't have to re-implement anything.

notalfredo · 2023-01-02T01:18:43Z

> > I am thinking of reusing the same enums and structs, except that I plan on implementing the memory optimizations from issue #25 from the start.
>
> By the way, I mentioned earlier you can have a crate as a dependency for another crate :) So, you can have bio_diff depend on the differ crate. By doing this, you won't have to re-implement anything.

If a crate depends on another crate, does this have any performance downsides? Also, if bio_diff depends on differ, does the user have access to just bio_diff, or also to differ?

neoncitylights · 2023-01-02T01:39:07Z

> If a crate depends on another crate, does this have any performance downsides?

No performance downsides here. Think of it this way: if bio_diff didn't depend on differ and a user pulled in both libraries, the code duplicated across them would be the performance downside. It'd also be a burden on the developer to maintain duplicate code.

> Also, if bio_diff depends on differ, does the user have access to just bio_diff, or also to differ?

They'd just have access to bio_diff, but the user can specify differ as an explicit dependency. Cargo has a feature called dependency resolution for the situation where a project has common dependencies: compatible versions of a shared dependency are unified into a single copy, which keeps the binary size as small as possible, so this is not a worry. :) There's an official page on this which is a longer read, if you want to learn more about the internal details: https://doc.rust-lang.org/cargo/reference/resolver.html
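To illustrate the visibility point, here is a small single-file sketch in which two modules stand in for the two crates. In a real workspace these would be separate crates linked through Cargo.toml. The functions `hamming` and `point_mutations` are hypothetical names, not the actual API of differ or bio_diff.

```rust
// Stand-in for the `differ` crate.
mod differ {
    /// Hamming distance; `None` when the inputs differ in length.
    pub fn hamming(a: &str, b: &str) -> Option<usize> {
        if a.chars().count() != b.chars().count() {
            return None;
        }
        Some(a.chars().zip(b.chars()).filter(|(x, y)| x != y).count())
    }
}

// Stand-in for the `bio_diff` crate, which depends on `differ`.
mod bio_diff {
    // A plain `use` keeps `differ` as an internal detail:
    // users of `bio_diff` do not see it through this import.
    use super::differ;

    // An optional `pub use` re-exports an item, so users can reach it
    // as `bio_diff::hamming` without depending on `differ` themselves.
    pub use super::differ::hamming;

    /// Hypothetical helper built on top of the dependency.
    pub fn point_mutations(a: &str, b: &str) -> Option<usize> {
        differ::hamming(a, b)
    }
}
```

Without the `pub use` line, a user who wants `hamming` directly would add differ as their own dependency, as described above. With it, bio_diff chooses to expose that part of differ as part of its own public surface.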


neoncitylights · 2023-07-01T03:09:36Z

Declining for now; see #42 and #43. This can be written in a separate repository.

notalfredo added the lvl-1-easy, p1-low, and t-feature-request labels Dec 31, 2022

notalfredo changed the title ~~feature: Allow to alignment between protein or nucleotide sequences #12~~ feature: Allow to alignment between protein or nucleotide sequences Dec 31, 2022

neoncitylights assigned notalfredo Dec 31, 2022

notalfredo added a commit that referenced this issue Jan 5, 2023

feat: bio_diff

9c673d6

fixes #16

notalfredo mentioned this issue Jan 5, 2023

feat: bio_diff #34

Closed

This was referenced Jan 5, 2023

Roadmap for differ.rs #35

Open

refactor: better architecture for differ library #36

Closed

neoncitylights closed this as not planned Jul 1, 2023
