[MRG] prevent collisions in signature saving even when there are duplicate md5sums. #1497

Merged
ctb merged 1 commit into add/save_signatures_to_loc from add/save_signatures_to_loc_dup
May 5, 2021

@ctb ctb commented May 4, 2021

PR into #1493.

This PR generates new filenames even when md5sums are identical, thus saving each added signature without ever overwriting old ones.
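
A minimal sketch of the idea, not the code in this PR: when a file named after the md5sum already exists, keep trying suffixed names until one is free. The helper names `unique_sig_path` and `save_payload`, the `_<n>` suffix scheme, and the `.sig.gz` extension are assumptions made for illustration only.

```python
import os
import tempfile


def unique_sig_path(dirname, md5, ext=".sig.gz"):
    """Return a path under dirname, based on md5, that does not exist yet.

    Hypothetical helper: the naming scheme is an assumption, not sourmash's API.
    """
    candidate = os.path.join(dirname, f"{md5}{ext}")
    counter = 0
    # Keep appending a numeric suffix until the name is unused.
    while os.path.exists(candidate):
        candidate = os.path.join(dirname, f"{md5}_{counter}{ext}")
        counter += 1
    return candidate


def save_payload(dirname, md5, payload):
    """Write payload (bytes) under a collision-free name; return the path used."""
    path = unique_sig_path(dirname, md5)
    # Mode "x" fails instead of overwriting if the file appears in the meantime.
    with open(path, "xb") as fp:
        fp.write(payload)
    return path


if __name__ == "__main__":
    outdir = tempfile.mkdtemp()
    for body in (b"first", b"second"):
        print(save_payload(outdir, "abc123", body))
```

Saving two payloads under the same md5sum yields two distinct files, so the earlier one is never overwritten.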

@ctb ctb changed the base branch from latest to add/save_signatures_to_loc May 4, 2021 23:13
@ctb ctb commented May 4, 2021

@bluegenes this should be ready for review (and merge) once tests have passed.

@codecov codecov bot commented May 4, 2021

Codecov Report

Merging #1497 (13a79bb) into add/save_signatures_to_loc (833645b) will decrease coverage by 0.01%.
The diff coverage is 96.36%.

@@                      Coverage Diff                       @@
##           add/save_signatures_to_loc    #1497      +/-   ##
==============================================================
- Coverage                       94.94%   94.93%   -0.02%     
==============================================================
  Files                              97       97              
  Lines                           16197    16266      +69     
  Branches                         1510     1515       +5     
==============================================================
+ Hits                            15379    15442      +63     
- Misses                            591      595       +4     
- Partials                          227      229       +2     
| Flag | Coverage Δ |
|---|---|
| python | 94.93% <96.36%> (-0.02%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/sourmash/sourmash_args.py | 94.37% <90.00%> (-0.23%) ⬇️ |
| tests/test_sourmash_args.py | 100.00% <100.00%> (ø) |
| src/sourmash/lca/lca_db.py | 91.22% <0.00%> (-1.09%) ⬇️ |
| tests/test_lca.py | 99.87% <0.00%> (+<0.01%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 833645b...13a79bb.

@bluegenes bluegenes left a comment

lgtm!

hm, your testing made me realize that this has an interesting (and perhaps undesirable) consequence -- true duplicated sigs will not be caught/will be treated separately. This doesn't seem like too much of an issue? If we have a true duplicated sig in a database, it would return identical results and gather would randomly choose one to return, right?

Presumably (eventually), if we're selecting by name, we can choose to select just one of the duplicated sigs, if all metadata match?
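
A rough sketch of what "select just one if all metadata match" could look like, under stated assumptions: `SigRecord` and its fields are hypothetical stand-ins for signature metadata, not the sourmash signature class, and "all metadata match" is taken to mean that every field compares equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigRecord:
    # Hypothetical metadata fields chosen for illustration.
    name: str
    md5: str
    ksize: int
    moltype: str


def drop_true_duplicates(records):
    """Keep the first record for each distinct combination of metadata."""
    seen = set()
    unique = []
    for rec in records:
        # Frozen dataclasses are hashable and compare field by field.
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    a = SigRecord("genomeA", "abc123", 31, "DNA")
    a_copy = SigRecord("genomeA", "abc123", 31, "DNA")
    b = SigRecord("genomeB", "abc123", 31, "DNA")
    for rec in drop_true_duplicates([a, a_copy, b]):
        print(rec.name, rec.md5)
```

Records that share an md5sum but differ in any other field (here, the name) are kept as separate entries; only exact metadata matches collapse to one.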

@ctb ctb commented May 5, 2021

> lgtm!
>
> hm, your testing made me realize that this has an interesting (and perhaps undesirable) consequence -- true duplicated sigs will not be caught/will be treated separately. This doesn't seem like too much of an issue? If we have a true duplicated sig in a database, it would return identical results and gather would randomly choose one to return, right?
>
> Presumably (eventually), if we're selecting by name, we can choose to select just one of the duplicated sigs, if all metadata match?

you put into words one of my uneasy concerns - thanks! punting to issue for further discussion. #1501

@ctb ctb merged commit b4cdbe8 into add/save_signatures_to_loc May 5, 2021
@ctb ctb deleted the add/save_signatures_to_loc_dup branch May 5, 2021 17:06
ctb added a commit that referenced this pull request May 5, 2021
…ariety of formats (#1493)

* copy over SaveSignaturesToLocation code from other branch

* minor edits

* add repr; add tests; support stdout

* refactor signature saving to use new sourmash_args collection saving

* specify utf-8 encoding for output

* add flexible output to compute/sketch

* add test to trigger rust panic

* test search --save-matches

* 'fix' issue and test

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update src/sourmash/cli/sig/rename.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* do not overwrite signature even if duplicate md5sum (#1497)

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
luizirber added a commit that referenced this pull request May 10, 2021
…etch` functionality. (#1370)

* more refactor - filename stuff

* add 'location' to SBT objects

* finish removing filename

* fix prefetch after merging in #1373

* implement a CounterGatherIndex

* remove sort

* update counter logic to remove proper intersection

* make 'find' a generator

* remove comment

* begin refactoring 'categorize'

* have the 'find' function for SBTs return signatures

* fix majority of tests

* comment & then fix test

* torture the tests into working

* split find and _find_nodes to take different kinds of functions

* redo 'find' on index

* refactor lca_db to use new find

* refactor SBT to use new find

* comment/cleanup

* refactor out common code

* fix up gather

* use 'passes' properly

* attempted cleanup

* minor fixes

* get a start on correct downsampling

* adjust tree downsampling for regular minhashes, too

* remove now-unused search functions in sbtmh

* refactor categorize to use new find

* cleanup and removal

* remove redundant code in lca_db

* remove redundant code in SBT

* add notes

* remove more unused code

* refactor most of the test_sbt tests

* fix one minor issue

* fix jaccard calculation in sbt

* check for compatibility of search fn and query signature

* switch tests over to jaccard similarity, not containment

* fix test

* remove test for unimplemented LCA_Database.find method

* document threshold change; update test

* refuse to run abund signatures

* flatten sigs internally for gather

* reinflate abundances for saving

* fix problem where sbt indices could be created with abund signatures

* more

* split flat and abund search

* make ignore_abundance work again for categorize

* turn off best-only, since it triggers on self-hits.

* add test: 'sourmash index' flattens sigs

* add note about something to test

* fix typo; still broken tho

* location is now a property

* move search code into search.py

* remove redundant scaled checking code

* best-only now works properly for two tests

* 'fix' tests by removing v1 and v2 SBT compatibility

* simplify (?) downsampling code

* require keyword args in MinHash.downsample(...)

* fix bug with downsample

* require keyword args in MinHash.downsample(...)

* fix test to use proper downsampling, reverse order to match scaled

* add test for revealed bug

* remove unnecessary comment

* flatten subject MinHash, too

* add testme comment

* clean up sbt find

* clean up lca find

* add IndexSearchResult namedtuple for search and gather results

* add more tests for Index classes

* add tests for subj & query num downsampling

* tests for Index.search_abund

* refactor a bit

* refactor make_jaccard_search_query; start tests

* even more tests

* test collect, best_only

* more search tests

* remove unnec space

* add minor comment

* deal with status == None on SystemExit

* upgrade and simplify categorize

* restore test

* merge

* fix abundance search in SBT for categorize

* code cleanup and refactoring; check for proper error messages

* add explicit test for incompatible num

* refactor MinHash.downsample

* deal with status == None on SystemExit

* fix test

* fix comment misspelling

* properly pass kwargs; fix search_sbt_index

* add simple tests for SBT load and search API

* allow arbitrary kwargs for LCA_Database.find

* add testing of passthru-kwargs

* re-enable test

* add notes to update docstrings

* docstring updates

* fix test

* fix location reporting in prefetch

* fix prefetch location by fixing MultiIndex

* temporary prefetch_gather intervention

* 'gather' only returns best match

* turn prefetch on by default, for now

* better tests for gather --save-unassigned

* remove unused print

* remove unnecessary check-me comment

* clear out docstring

* SBT search doesn't work on v1 and v2 SBTs b/c no min_n_below

* start adding tests

* test some basic prefetch stuff

* update index for prefetch

* add fairly thorough tests

* fix my dumb mistake with gather

* simplify, refactor, fix

* fix remaining tests

* propagate ValueErrors better

* fix tests

* flatten prefetch queries

* fix for genome-grist alpha test

* fix threshold bugarooni

* fix gather/prefetch interactions

* fix sourmash prefetch return value

* minor fixes

* pay proper attention to threshold

* cleanup and refactoring

* remove unnecessary 'scaled'

* minor cleanup

* added LazyLinearIndex and prefetch --linear

* fix abundance problem

* save matches to a directory

* test for saving matches to a directory

* add a flexible progressive signature output class

* add tests for .sig.gz and .zip outputs

* update save_signatures code; add tests; use in gather and search too

* update comment

* cleanup and refactor of SaveSignaturesToLocation code

* docstrings & cleanup

* add 'run' and 'runtmp' test fixtures

* remove unnecessary track_abundance fixture call

* restore original;

* linear and prefetch fixtures + runtmp

* fix use of runtmp

* copy over SaveSignaturesToLocation code from other branch

* docs for sourmash prefetch

* more doc

* minor edits

* Re-implement the actual gather protocol with a cleaner interface. (#1489)

* initial refactor of CounterGather stuff

* refactor into peek and consume

* move next method over to query specific class

* replace gather implementation with new CounterGather

* many more tests for CounterGather

* remove scaled arg from peek

* open-box test for counter internal data structures

* add num query & subj tests

* add repr; add tests; support stdout

* refactor signature saving to use new sourmash_args collection saving

* specify utf-8 encoding for output

* add flexible output to compute/sketch

* add test to trigger rust panic

* test search --save-matches

* add --save-prefetch to sourmash gather

* remove --no-prefetch option :)

* added --save-prefetch functionality

* add back a mostly-functioning --no-prefetch argument :)

* add --no-prefetch back in

* check for JSON in first byte of LCA DB file

* start adding linear tests

* use fixtures to test prefetch and linear more thoroughly

* comments, etc

* upgrade docs for --linear and --prefetch

* 'fix' issue and test

* fix a last test ;)

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update src/sourmash/cli/sig/rename.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update tests/test_sourmash_args.py

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* write tests for LazyLinearIndex

* add some basic prefetch tests

* properly test linear!

* add more tests for LazyLinearIndex

* test zipfile bool

* remove unnecessary try/except; comment

* fix signatures() call

* fix --prefetch snafu; doc

* do not overwrite signature even if duplicate md5sum (#1497)

* try adding loc to return values from Index.find

* made use of new IndexSearchResult.find throughout

* adjust note

* provide signatures_with_location on all Index objects

* cleanup and fix

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* Update doc/command-line.md

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

* fix bug around --save-prefetch with multiple databases

* comment/doc minor updates

Co-authored-by: Luiz Irber <luizirber@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
luizirber added a commit that referenced this pull request May 12, 2021
…thon set API. (#1512)

* move away from Python sets to MinHash objects

* return intersect_mh from _find_best

* put _subtract_and_downsample inline

* clean up and remove old code

* remove max_hash

* more cleanup

Co-authored-by: Luiz Irber <luizirber@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
luizirber added a commit that referenced this pull request May 15, 2021
* initial trial implementation of ImmutableMinHash

* fix tests

* provide our own pickle for ImmutableMinHash

* ok, a few more places to change.

* rename to FrozenMinHash per luiz

* finish renaming, add some tests

* thanks, I hate the old behavior

* copy.copy is no longer needed

* docs and an explicit 'frozen' method

* switch to using 'to_frozen' and 'to_mutable'

Co-authored-by: Luiz Irber <luizirber@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>