plans for sourmash 4.0 #835

Closed
ctb opened this issue Jan 8, 2020 · 6 comments
Labels
4.0 issues to address for a 4.0 release
Milestone

Comments

@ctb
Contributor

ctb commented Jan 8, 2020

Excerpted from #762, now that 3.0 is out --

thoughts for 4.0 include,

  • follow @standage's hints in "Implement improved & consistent argument parsing" (#785) and deprecate some subcommands
  • consider making --scaled the default instead of --num-hashes (this is controversial tho :)
  • would like to do a better job of simulations and theory before we release sourmash 4.0, or at least get it on the radar. We need to start understanding (and explicating) where the basic scaled approach is good and not so good.
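
To make the --scaled versus --num-hashes distinction above concrete, here is a minimal sketch of the two sampling strategies. It is an illustration under stated assumptions, not sourmash's implementation: `hash_kmer`, `num_sketch`, and `scaled_sketch` are hypothetical names, the hash function is a stand-in rather than the one sourmash uses, and the `2**64 // scaled` threshold is the usual textbook form of the scaled cutoff.

```python
import hashlib

HASH_SPACE = 2**64  # assumption: hashes are 64-bit unsigned integers


def hash_kmer(kmer: str) -> int:
    """Stand-in 64-bit hash (first 8 bytes of SHA-256); not sourmash's hash function."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def kmers(seq: str, k: int):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def num_sketch(seq: str, k: int, num: int) -> set[int]:
    """Fixed-size sketch: keep the `num` smallest distinct hashes."""
    distinct = {hash_kmer(km) for km in kmers(seq, k)}
    return set(sorted(distinct)[:num])


def scaled_sketch(seq: str, k: int, scaled: int) -> set[int]:
    """Variable-size sketch: keep every hash below HASH_SPACE // scaled."""
    max_hash = HASH_SPACE // scaled
    hashes = (hash_kmer(km) for km in kmers(seq, k))
    return {h for h in hashes if h < max_hash}
```

The practical difference is that a num sketch has a fixed size regardless of input length, while a scaled sketch grows with the input; in exchange, the scaled sketch of a subsequence is always a subset of the scaled sketch of the whole sequence, which is what makes containment-style comparisons possible.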
@standage
Contributor

standage commented Jan 8, 2020

Drop Python 2.7 support?

@luizirber
Member

There are some leftovers in https://github.com/dib-lab/sourmash/projects/2 for sourmash 3.0. Should we create another project to track 4.0, or just use issues/labels, since the projects are not being used anyway? 😬

@satishv

satishv commented Jan 10, 2020

We would love it if you could improve gather's performance.

Here are some stats; it looks like v3 may be slower than v2?

Should we file a task? We are happy to help. Perhaps in v4? Is a smaller release planned on top of the v3 codebase anytime soon?

Thanks

```
sourmash compute -k 31 --scaled 5000 -o testv2.sig test.01M_R1_L001.fastq.gz
0m17.907s

sourmash gather -k 31 --scaled 5000 -o gatherv2 testv2.sig db/ecolidb.sbt.json
0m1.896s

sourmash v3.0.1
sourmash compute -k 31 --scaled 5000 -o testv3.sig test.01M_R1_L001.fastq.gz
0m42.243s

sourmash gather -k 31 --scaled 5000 -o gatherv3 testv3.sig db/ecolidb.sbt.json
0m4.429s
```
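
For anyone wanting to repeat timings like these across versions, a small harness along the following lines may help. This is only a sketch: `time_command` and `best_of` are hypothetical helper names, and taking the best of a few runs is one possible way to reduce noise from disk caching, not the method used for the numbers above.

```python
import subprocess
import time


def time_command(cmd: list[str]) -> float:
    """Run cmd once and return wall-clock seconds; raises if the command fails."""
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def best_of(cmd: list[str], repeats: int = 3) -> float:
    """Return the fastest of `repeats` runs of cmd."""
    return min(time_command(cmd) for _ in range(repeats))
```

With sourmash installed, the compute step above could be timed as `best_of(["sourmash", "compute", "-k", "31", "--scaled", "5000", "-o", "testv3.sig", "test.01M_R1_L001.fastq.gz"])`, run once in each version's environment.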

@ctb
Contributor Author

ctb commented Jan 11, 2020 via email

@ctb
Contributor Author

ctb commented Jan 13, 2020

Regarding versions, optimizations and speed, per @satishv's comment -- a collection of random thoughts.

3.x should be compatible with 2.x in terms of databases and core functionality, although we may be adding new command line flags. So you can always use 2.x for now!

In terms of optimization, my personal perspective (not necessarily shared by others :) is that functionality & correctness, maintainability, user experience, and memory usage come before speed. There are no hard and fast rules here, of course, but we have finite attentional resources and have to prioritize somehow.

That having been said, we are always happy to take contributions. The move to Rust in 3.0 is opening up a lot of potential optimizations, since Rust is (among other things) thread-safe and robust, and we would be happy to receive PRs for specific optimizations. We are also enthusiastic about benchmarking that highlights problem areas, because more information is always better - so thanks, Satish!

@luizirber luizirber added the 4.0 issues to address for a 4.0 release label Jan 14, 2020
@luizirber luizirber added this to the 4.0 milestone Jan 14, 2020
@ctb
Contributor Author

ctb commented Jan 10, 2021

closing, since I think this has mostly been addressed!

also see #1016

@ctb ctb closed this as completed Jan 10, 2021

4 participants