plans for sourmash 4.0 #835

ctb · 2020-01-08T14:14:51Z

Excerpted from #762, now that 3.0 is out --

thoughts for 4.0 include,

- follow @standage's hints in "Implement improved & consistent argument parsing" (#785) and deprecate some subcommands
- consider making --scaled the default instead of --num-hashes (this is controversial though :)
- would like to do a better job of simulations and theory before we release sourmash 4.0, or at least get it on the radar. We need to start understanding (and explicating) where the basic scaled approach is good and not so good.
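For readers unfamiliar with the two modes, here is a minimal sketch of the difference between a fixed-size sketch (what --num-hashes selects) and a scaled sketch (what --scaled selects). It is an illustration under stated assumptions, not sourmash's implementation: the helper names (`kmers`, `hash_kmer`, `num_sketch`, `scaled_sketch`) are hypothetical, and SHA-256 truncated to 64 bits stands in for whatever hash function sourmash actually uses.

```python
import hashlib
import random

MAX_HASH = 2**64  # size of the 64-bit hash space assumed in this sketch


def kmers(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Stand-in 64-bit hash (assumption: not the hash function sourmash uses)."""
    digest = hashlib.sha256(kmer.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def num_sketch(seq, k, num):
    """Keep the `num` smallest hashes: fixed size, whatever the input length."""
    return sorted(hash_kmer(km) for km in kmers(seq, k))[:num]


def scaled_sketch(seq, k, scaled):
    """Keep every hash below MAX_HASH / scaled: size grows with the input."""
    threshold = MAX_HASH // scaled
    return sorted(h for h in map(hash_kmer, kmers(seq, k)) if h < threshold)


# Toy data: a random "genome" and a fragment taken from its start.
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(5000))
fragment = genome[:1000]

for name, seq in [("genome", genome), ("fragment", fragment)]:
    print(name, len(num_sketch(seq, 21, 50)), len(scaled_sketch(seq, 21, 100)))
```

In this toy setup the fixed-size sketch always holds 50 hashes, while the scaled sketch of the fragment is a subset of the scaled sketch of the full sequence. That subset property is the usual argument for scaled sketches when comparing inputs of very different sizes; where it breaks down is exactly what the simulations mentioned above would need to explore.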

standage · 2020-01-08T14:41:35Z

Drop Python 2.7 support?

luizirber · 2020-01-08T15:40:56Z

There are some leftovers in https://github.com/dib-lab/sourmash/projects/2 for sourmash 3.0. Should we create another project to track 4.0, or just use issues/labels, since the projects are not being used anyway? 😬

satishv · 2020-01-10T05:25:01Z

We would love it if you could improve gather's performance.

Here are some stats... looks like v3 may be slower than v2?

Should we file a task? We are happy to help. Perhaps in v4? Is a smaller release planned on top of the v3 codebase anytime soon?

thanks

```
sourmash compute -k 31 --scaled 5000 -o testv2.sig test.01M_R1_L001.fastq.gz
0m17.907s

sourmash gather -k 31 --scaled 5000 -o gatherv2 testv2.sig db/ecolidb.sbt.json
0m1.896s

sourmash v3.0.1
sourmash compute -k 31 --scaled 5000 -o testv3.sig test.01M_R1_L001.fastq.gz
0m42.243s

sourmash gather -k 31 --scaled 5000 -o gatherv3 testv3.sig db/ecolidb.sbt.json
0m4.429s
```
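
To put a number on "may be slower", the ratios can be worked out from the timings above. This is only a sketch: it assumes the first, unlabeled pair of runs is the v2 one (as the `testv2.sig` file name suggests), and each figure is a single run, so the result is indicative at best.

```python
# Wall-clock times in seconds, copied from the runs above (single runs, so noisy).
timings = {
    "compute": {"v2": 17.907, "v3": 42.243},
    "gather": {"v2": 1.896, "v3": 4.429},
}

for command, t in timings.items():
    slowdown = t["v3"] / t["v2"]
    print(f"{command}: v3 took {slowdown:.2f}x as long as v2")
```

Both commands come out at roughly 2.3 to 2.4 times the v2 wall-clock time on this input.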

ctb · 2020-01-11T13:19:37Z

On Wed, Jan 08, 2020 at 07:40:57AM -0800, Luiz Irber wrote:

> There are some leftovers in https://github.com/dib-lab/sourmash/projects/2 for sourmash 3.0, should we create another project and track 4.0 (or just use issues/labels, since the projects are not being used anyway? 😬 )

oh, I'm fine with using projects and can do that for 4.0. but sometimes discussions are also good :)

ctb · 2020-01-13T15:16:03Z

Regarding versions, optimizations and speed, per @satishv's comment -- a collection of random thoughts.

3.x should be compatible with 2.x in terms of databases and core functionality, although we may be adding new command line flags. So you can always use 2.x for now!

In terms of optimization, my personal perspective (not necessarily shared by others :) is that functionality & correctness, maintainability, user experience, and memory usage come before speed. There are no hard and fast rules here, of course, but we have finite attentional resources and have to prioritize somehow.

That having been said, we are always happy to take contributions. The move to Rust in 3.0 is opening up a lot of potential optimizations, since Rust is (among other things) threadsafe and robust, and we would be happy to receive PRs for specific optimizations. We are also enthusiastic about benchmarking that highlights problem areas, because more information is always better - so thanks, Satish!

ctb · 2021-01-10T15:17:02Z

closing, since I think this has mostly been addressed!

also see #1016

luizirber mentioned this issue Jan 11, 2020 in "Gather performance improvements discussion" #838 (closed)

luizirber added the "4.0" label (issues to address for a 4.0 release) Jan 14, 2020

luizirber added this to the 4.0 milestone Jan 14, 2020

ctb closed this as completed Jan 10, 2021
