[MRG] Add FrozenMinHash #1508

Merged on May 15, 2021 (274 commits)
af6fd84
have the 'find' function for SBTs return signatures
ctb Mar 12, 2021
8a92936
fix majority of tests
ctb Mar 12, 2021
5138e83
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Mar 12, 2021
f3ed42f
Merge branch 'add/prefetch_cli' into add/prefetch_index
ctb Mar 12, 2021
c4adabf
Merge branch 'latest' of github.com:dib-lab/sourmash into fix/sbt_find
ctb Mar 12, 2021
cdb4159
comment & then fix test
ctb Mar 12, 2021
a414624
torture the tests into working
ctb Mar 12, 2021
6f7d368
split find and _find_nodes to take different kinds of functions
ctb Mar 13, 2021
7b2f624
Merge branch 'fix/sbt_find' into refactor/categorize
ctb Mar 13, 2021
b5ab6d7
redo 'find' on index
ctb Mar 13, 2021
ed7d52b
refactor lca_db to use new find
ctb Mar 13, 2021
aec730e
refactor SBT to use new find
ctb Mar 13, 2021
590b3d6
comment/cleanup
ctb Mar 13, 2021
eb7d661
refactor out common code
ctb Mar 13, 2021
0639c3e
fix up gather
ctb Mar 13, 2021
a65c79b
use 'passes' properly
ctb Mar 13, 2021
02794ee
attempted cleanup
ctb Mar 13, 2021
f94e909
minor fixes
ctb Mar 13, 2021
c3a65ac
get a start on correct downsampling
ctb Mar 13, 2021
9054cb8
adjust tree downsampling for regular minhashes, too
ctb Mar 13, 2021
db740ec
remove now-unused search functions in sbtmh
ctb Mar 13, 2021
03a5e60
refactor categorize to use new find
ctb Mar 13, 2021
b3718dd
cleanup and removal
ctb Mar 13, 2021
e8e4702
remove redundant code in lca_db
ctb Mar 13, 2021
b40963c
remove redundant code in SBT
ctb Mar 13, 2021
055bd60
add notes
ctb Mar 13, 2021
2329009
remove more unused code
ctb Mar 13, 2021
e6d90f6
refactor most of the test_sbt tests
ctb Mar 13, 2021
2baa8c3
fix one minor issue
ctb Mar 13, 2021
0ec99ea
fix jaccard calculation in sbt
ctb Mar 13, 2021
c583a37
check for compatibility of search fn and query signature
ctb Mar 13, 2021
d565e67
switch tests over to jaccard similarity, not containment
ctb Mar 13, 2021
8eb43f7
fix test
ctb Mar 13, 2021
5c75e39
remove test for unimplemented LCA_Database.find method
ctb Mar 13, 2021
83ee16b
document threshold change; update test
ctb Mar 14, 2021
7bfa0e1
refuse to run abund signatures
ctb Mar 14, 2021
2c28568
flatten sigs internally for gather
ctb Mar 14, 2021
9adae36
reinflate abundances for saving
ctb Mar 14, 2021
c979b17
fix problem where sbt indices coudl be created with abund signatures
ctb Mar 14, 2021
0bf34cd
more
ctb Mar 15, 2021
3844b02
split flat and abund search
ctb Mar 16, 2021
f6fe0de
make ignore_abundance work again for categorize
ctb Mar 16, 2021
863e4de
turn off best-only, since it triggers on self-hits.
ctb Mar 16, 2021
731df73
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Mar 16, 2021
21e8867
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Mar 20, 2021
80c14c2
add test: 'sourmash index' flattens sigs
ctb Mar 20, 2021
138bd16
add note about something to test
ctb Mar 20, 2021
9dcf25b
Merge branch 'latest' into add/prefetch_cli
ctb Apr 3, 2021
d438f9c
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 3, 2021
e406a99
fix typo; still broken tho
ctb Apr 3, 2021
dc91322
Merge branch 'add/prefetch_cli' of github.com:dib-lab/sourmash into a…
ctb Apr 3, 2021
182ad62
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 4, 2021
74c925d
location is now a property
ctb Apr 4, 2021
87811a4
move search code into search.py
ctb Apr 4, 2021
45b1f5e
remove redundant scaled checking code
ctb Apr 4, 2021
7b76751
best-only now works properly for two tests
ctb Apr 4, 2021
2248b06
'fix' tests by removing v1 and v2 SBT compatibility
ctb Apr 4, 2021
0aa4bd2
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 9, 2021
66dc4a7
simplify (?) downsampling code
ctb Apr 9, 2021
b7a3ba2
require keyword args in MinHash.downsample(...)
ctb Apr 9, 2021
7d3885e
fix bug with downsample
ctb Apr 9, 2021
c686662
require keyword args in MinHash.downsample(...)
ctb Apr 9, 2021
39d13cc
fix test to use proper downsampling, reverse order to match scaled
ctb Apr 9, 2021
86e1f41
add test for revealed bug
ctb Apr 9, 2021
78aa70c
remove unnecessary comment
ctb Apr 9, 2021
d4b291a
Merge branch 'fix/downsample_kwargs' into refactor/index_find
ctb Apr 9, 2021
cb712c0
flatten subject MinHash, too
ctb Apr 9, 2021
ba7352e
add testme comment
ctb Apr 9, 2021
31d08e0
clean up sbt find
ctb Apr 9, 2021
9feda90
clean up lca find
ctb Apr 9, 2021
9b9d518
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 10, 2021
36cc35e
add IndexSearchResult namedtuple for search and gather results
ctb Apr 10, 2021
a6cd259
add more tests for Index classes
ctb Apr 10, 2021
54126ae
add tests for subj & query num downsampling
ctb Apr 10, 2021
16c464e
tests for Index.search_abund
ctb Apr 10, 2021
2e0bc9d
refactor a bit
ctb Apr 10, 2021
87ffe00
refactor make_jaccard_search_query; start tests
ctb Apr 10, 2021
1a4cfd4
even more tests
ctb Apr 10, 2021
184e541
test collect, best_only
ctb Apr 10, 2021
ebd5aac
more search tests
ctb Apr 10, 2021
430cb2e
remove unnec space
ctb Apr 10, 2021
b218540
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 11, 2021
cc2ec29
add minor comment
ctb Apr 11, 2021
c2b4eda
deal with status == None on SystemExit
ctb Apr 11, 2021
1bda989
upgrade and simplify categorize
ctb Apr 11, 2021
a7f5306
restore test
ctb Apr 11, 2021
2db2586
merge
ctb Apr 11, 2021
8c84397
fix abundance search in SBT for categorize
ctb Apr 13, 2021
1c6a539
code cleanup and refactoring; check for proper error messages
ctb Apr 13, 2021
8af9187
add explicit test for incompatible num
ctb Apr 14, 2021
379743d
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 14, 2021
5b4b5ed
refactor MinHash.downsample
ctb Apr 14, 2021
1e70d07
deal with status == None on SystemExit
ctb Apr 11, 2021
495f0bf
fix test
ctb Apr 14, 2021
1660df5
fix comment mispelling
ctb Apr 14, 2021
77f6e0a
properly pass kwargs; fix search_sbt_index
ctb Apr 14, 2021
72639bd
add simple tests for SBT load and search API
ctb Apr 14, 2021
e916214
Merge branch 'refactor/minhash_downsample' into refactor/index_find
ctb Apr 14, 2021
a735445
Merge branch 'fix/sys_exit_none' into refactor/index_find
ctb Apr 14, 2021
922db44
Merge branch 'fix/search_sbt_index' into refactor/index_find
ctb Apr 14, 2021
5b8d83c
allow arbitrary kwargs for LCA_DAtabase.find
ctb Apr 14, 2021
8adc01c
add testing of passthru-kwargs
ctb Apr 15, 2021
f70af9c
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 15, 2021
b07c61d
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 15, 2021
d9c07ce
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 16, 2021
5b308bc
re-enable test
ctb Apr 16, 2021
02c04d6
add notes to update docstrings
ctb Apr 16, 2021
e4e542a
Merge branch 'refactor/index_find' into merge_find_and_prefetch
ctb Apr 16, 2021
c052319
Merge branch 'add/prefetch_index' into merge_find_and_prefetch
ctb Apr 16, 2021
db52ee7
docstring updates
ctb Apr 16, 2021
c50dcdb
fix test
ctb Apr 16, 2021
e4cfe97
Merge branch 'latest' into refactor/index_find
luizirber Apr 16, 2021
11b7486
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 16, 2021
b072090
Merge branch 'latest' of github.com:dib-lab/sourmash into merge_find_…
ctb Apr 17, 2021
9c6d368
Merge branch 'refactor/index_find' into merge_find_and_prefetch
ctb Apr 17, 2021
c067af1
fix location reporting in prefetch
ctb Apr 17, 2021
a4ed221
fix prefetch location by fixing MultiIndex
ctb Apr 17, 2021
e48588d
temporary prefetch_gather intervention
ctb Apr 17, 2021
96ca217
'gather' only returns best match
ctb Apr 17, 2021
c0b2735
turn prefetch on by default, for now
ctb Apr 17, 2021
637723b
Merge branch 'latest' into refactor/index_find
ctb Apr 17, 2021
7759314
better tests for gather --save-unassigned
ctb Apr 18, 2021
8376ce5
Merge branch 'refactor/index_find' of github.com:dib-lab/sourmash int…
ctb Apr 18, 2021
e877490
Merge branch 'refactor/index_find' into merge_find_and_prefetch
ctb Apr 18, 2021
423fff4
remove unused print
ctb Apr 18, 2021
593a907
remove unnecessary check-me comment
ctb Apr 19, 2021
4132162
clear out docstring
ctb Apr 19, 2021
23166df
SBT search doesn't work on v1 and v2 SBTs b/c no min_n_below
ctb Apr 19, 2021
c494032
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Apr 19, 2021
3cf42f0
start adding tests
ctb Apr 19, 2021
3a5901e
Merge branch 'latest' of github.com:dib-lab/sourmash into refactor/in…
ctb Apr 20, 2021
4c0362e
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Apr 21, 2021
18219ae
test some basic prefetch stuff
ctb Apr 21, 2021
50a94d4
Merge branch 'add/prefetch_cli' into merge_find_and_prefetch
ctb Apr 21, 2021
3ed4af0
update index for prefetch
ctb Apr 21, 2021
ba8beb6
add fairly thorough tests
ctb Apr 21, 2021
79e0166
Merge branch 'add/prefetch_cli' into merge_find_and_prefetch
ctb Apr 21, 2021
57467cd
fix my dumb mistake with gather
ctb Apr 21, 2021
06f5d03
Merge branch 'refactor/index_find' into merge_find_and_prefetch
ctb Apr 21, 2021
16f1ee2
Merge branch 'latest' of github.com:dib-lab/sourmash into merge_find_…
ctb Apr 22, 2021
98957b8
simplify, refactor, fix
ctb Apr 22, 2021
67e7954
fix remaining tests
ctb Apr 22, 2021
c9109a6
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Apr 23, 2021
3151ff5
propogate ValueErrors better
ctb Apr 23, 2021
634e84e
fix tests
ctb Apr 23, 2021
7852fa1
flatten prefetch queries
ctb Apr 24, 2021
808ae37
fix for genome-grist alpha test
ctb Apr 24, 2021
eb178bb
fix threshold bugarooni
ctb Apr 24, 2021
ee7a6c2
fix gather/prefetch interactions
ctb Apr 24, 2021
174ebbe
fix sourmash prefetch return value
ctb Apr 24, 2021
bea17b3
minor fixes
ctb Apr 24, 2021
ad03e1e
pay proper attention to threshold
ctb Apr 24, 2021
cf86954
cleanup and refactoring
ctb Apr 25, 2021
293fc43
remove unnecessary 'scaled'
ctb Apr 25, 2021
fb87777
minor cleanup
ctb Apr 25, 2021
7631157
added LazyLinearLindex and prefetch --linear
ctb Apr 25, 2021
87be7fa
fix abundance problem
ctb Apr 26, 2021
f90a21f
save matches to a directory
ctb Apr 26, 2021
18d72c4
test for saving matches to a directory
ctb Apr 26, 2021
b1d54df
add a flexible progressive signature output class
ctb Apr 27, 2021
f1556d0
add tests for .sig.gz and .zip outputs
ctb Apr 27, 2021
65b7cbe
update save_signatures code; add tests; use in gather and search too
ctb Apr 27, 2021
9680355
update comment
ctb Apr 27, 2021
f1b742c
cleanup and refactor of SaveSignaturesToLocation code
ctb Apr 28, 2021
a9e5221
docstrings & cleanup
ctb Apr 28, 2021
67e000e
add 'run' and 'runtmp' test fixtures
ctb Apr 28, 2021
ee4b7a0
remove unnecessary track_abundance fixture call
ctb Apr 28, 2021
255014e
restore original;
ctb Apr 28, 2021
c6a607c
Merge branch 'add/run_fixtures' into add/prefetch_cli
ctb Apr 28, 2021
e0ee951
linear and prefetch fixtures + runtmp
ctb Apr 28, 2021
15fb06f
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Apr 28, 2021
4f11bff
fix use of runtmp
ctb Apr 28, 2021
903239b
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb Apr 28, 2021
591f3b1
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb May 1, 2021
83742b9
copy over SaveSignaturesToLocation code from other branch
ctb May 1, 2021
36defa7
docs for sourmash prefetch
ctb May 1, 2021
10c700a
more doc
ctb May 1, 2021
941afdb
minor edits
ctb May 2, 2021
475a515
Re-implement the actual gather protocol with a cleaner interface. (#1…
ctb May 2, 2021
b196ecc
add repr; add tests; support stdout
ctb May 2, 2021
af0f49c
refactor signature saving to use new sourmash_args collection saving
ctb May 2, 2021
c613b43
specify utf-8 encoding for output
ctb May 2, 2021
e19861c
add flexible output to compute/sketch
ctb May 2, 2021
0878218
add test to trigger rust panic
ctb May 2, 2021
345513f
test search --save-matches
ctb May 2, 2021
4ce0f7b
Merge branch 'add/save_signatures_to_loc' into add/prefetch_cli
ctb May 3, 2021
7c117e5
add --save-prefetch to sourmash gather
ctb May 3, 2021
b731df3
Merge branch 'add/prefetch_cli' of github.com:dib-lab/sourmash into a…
ctb May 3, 2021
78e8ef3
remove --no-prefetch option :)
ctb May 3, 2021
d9ad9af
added --save-prefetch functionality
ctb May 3, 2021
b1f79fa
add back a mostly-functioning --no-prefetch argument :)
ctb May 3, 2021
8eeb5c1
add --no-prefetch back in
ctb May 3, 2021
f6fdee3
check for JSON in first byte of LCA DB file
ctb May 3, 2021
566a127
Merge branch 'update/lca_db_load' into add/prefetch_cli
ctb May 3, 2021
2acc218
start adding linear tests
ctb May 3, 2021
d7494a6
use fixtures to test prefetch and linear more thoroughly
ctb May 3, 2021
e64fc47
comments, etc
ctb May 3, 2021
45b36ae
upgrade docs for --linear and --prefetch
ctb May 3, 2021
b3ba89f
'fix' issue and test
ctb May 3, 2021
a17e76b
Merge branch 'add/save_signatures_to_loc' into add/prefetch_cli
ctb May 3, 2021
32fd87d
fix a last test ;)
ctb May 3, 2021
10522c1
Update doc/command-line.md
ctb May 4, 2021
a15ebb9
Update src/sourmash/cli/sig/rename.py
ctb May 4, 2021
f20c354
Update tests/test_sourmash_args.py
ctb May 4, 2021
bb3a0cd
Update tests/test_sourmash_args.py
ctb May 4, 2021
02c6fca
Update tests/test_sourmash_args.py
ctb May 4, 2021
b1f8a8e
Update tests/test_sourmash_args.py
ctb May 4, 2021
1f58564
Update tests/test_sourmash_args.py
ctb May 4, 2021
833645b
Update doc/command-line.md
ctb May 4, 2021
2019e81
Merge branch 'add/save_signatures_to_loc' of github.com:dib-lab/sourm…
ctb May 4, 2021
a4b573a
write tests for LazyLinearIndex
ctb May 4, 2021
1e0f94d
add some basic prefetch tests
ctb May 5, 2021
1b0a424
properly test linear!
ctb May 5, 2021
1135fd8
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb May 5, 2021
92ee772
add more tests for LazyLinearIndex
ctb May 5, 2021
6b2668f
test zipfile bool
ctb May 5, 2021
c100bf0
remove unnecessary try/except; comment
ctb May 5, 2021
53ec3cf
fix signatures() call
ctb May 5, 2021
8c3b67a
fix --prefetch snafu; doc
ctb May 5, 2021
b4cdbe8
do not overwrite signature even if duplicate md5sum (#1497)
ctb May 5, 2021
c158c69
Merge branch 'latest' into add/save_signatures_to_loc
ctb May 5, 2021
bc7802c
Merge branch 'add/save_signatures_to_loc' into add/prefetch_cli
ctb May 5, 2021
f9fcfb6
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb May 5, 2021
b1e82ba
try adding loc to return values from Index.find
ctb May 5, 2021
3b5be03
made use of new IndexSearchResult.find throughout
ctb May 6, 2021
1eef0f1
adjust note
ctb May 6, 2021
4d080f1
provide signatures_with_location on all Index objects
ctb May 6, 2021
028487f
cleanup and fix
ctb May 6, 2021
9a3c1fe
Update doc/command-line.md
ctb May 6, 2021
66e3b6c
Update doc/command-line.md
ctb May 6, 2021
2a33d41
fix bug around --save-prefetch with multiple databases
ctb May 7, 2021
394da46
comment/doc minor updates
ctb May 7, 2021
958d465
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb May 7, 2021
7564f67
initial trial implementation of ImmutableMinHash
ctb May 8, 2021
36a5d4a
Merge branch 'add/prefetch_cli' into add/immutable_minhash_countergather
ctb May 8, 2021
acbd9bd
fix tests
ctb May 8, 2021
92a2511
Merge branch 'latest' of github.com:dib-lab/sourmash into add/prefetc…
ctb May 8, 2021
1c41915
Merge branch 'add/prefetch_cli' into add/immutable_minhash_countergather
ctb May 8, 2021
1b2afdc
provide our own pickle for ImmutableMinHash
ctb May 8, 2021
1ee3d64
ok, a few more plcaes to change.
ctb May 8, 2021
f395d72
rename to FrozenMinHash per luiz
ctb May 8, 2021
0317342
finish renaming, add some tests
ctb May 8, 2021
7a91cda
thanks, I hate the old behavior
ctb May 9, 2021
a99b2af
copy.copy is no longer needed
ctb May 9, 2021
8457daf
docs and an explicit 'frozen' method
ctb May 9, 2021
f36beb0
Merge branch 'latest' into add/immutable_minhash_countergather
ctb May 10, 2021
37e7513
Merge branch 'latest' of github.com:dib-lab/sourmash into add/immutab…
ctb May 12, 2021
df7ef36
Merge branch 'latest' of github.com:dib-lab/sourmash into add/immutab…
ctb May 13, 2021
d2dfcef
switch to using 'to_frozen' and 'to_mutable'
ctb May 13, 2021
69c61e8
Merge branch 'latest' into add/immutable_minhash_countergather
ctb May 15, 2021
5 changes: 3 additions & 2 deletions src/sourmash/commands.py
@@ -1106,7 +1106,8 @@ def prefetch(args):

     # iterate over signatures in db one at a time, for each db;
     # find those with sufficient overlap
-    noident_mh = copy.copy(query_mh)
+    noident_mh = query_mh.to_mutable()
+
     did_a_search = False # track whether we did _any_ search at all!
     for dbfilename in args.databases:
         notify(f"loading signatures from '{dbfilename}'")
@@ -1164,7 +1165,7 @@ def prefetch(args):
         notify(f"saved {matches_out.count} matches to CSV file '{args.output}'")
         csvout_fp.close()

-    matched_query_mh = copy.copy(query_mh)
+    matched_query_mh = query_mh.to_mutable()
     matched_query_mh.remove_many(noident_mh.hashes)
     notify(f"of {len(query_mh)} distinct query hashes, {len(matched_query_mh)} were found in matches above threshold.")
     notify(f"a total of {len(noident_mh)} query hashes remain unmatched.")
106 changes: 105 additions & 1 deletion src/sourmash/minhash.py
@@ -588,7 +588,7 @@ def __add__(self, other):
         if self.num != other.num:
             raise TypeError(f"incompatible num values: self={self.num} other={other.num}")

-        new_obj = self.__copy__()
+        new_obj = self.to_mutable()
         new_obj += other
         return new_obj

@@ -645,3 +645,107 @@ def moltype(self): # TODO: test in minhash tests
             return 'hp'
         else:
             return 'DNA'
+
+    def to_mutable(self):
+        "Return a copy of this MinHash that can be changed."
+        return self.__copy__()
+
+    def to_frozen(self):
+        "Return a frozen copy of this MinHash that cannot be changed."
+        new_mh = self.__copy__()
+        new_mh.__class__ = FrozenMinHash
+        return new_mh
+
+
+class FrozenMinHash(MinHash):
+    def add_sequence(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def add_kmer(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def add_many(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def remove_many(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def add_hash(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def add_hash_with_abundance(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def clear(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def remove_many(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def set_abundances(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def add_protein(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def downsample(self, *, num=None, scaled=None):
+        if scaled and self.scaled == scaled:
+            return self
+        if num and self.num == num:
+            return self
+
+        return MinHash.downsample(self, num=num, scaled=scaled).to_frozen()
+
+    def flatten(self):
+        if not self.track_abundance:
+            return self
+        return MinHash.flatten(self).to_frozen()
+
+    def __iadd__(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def merge(self, *args, **kwargs):
+        raise TypeError('FrozenMinHash does not support modification')
+
+    def to_mutable(self):
+        "Return a copy of this MinHash that can be changed."
+        mut = MinHash.__new__(MinHash)
+        state_tup = self.__getstate__()
+
+        # is protein/hp/dayhoff?
+        if state_tup[2] or state_tup[3] or state_tup[4]:
+            state_tup = list(state_tup)
+            # adjust ksize.
+            state_tup[1] = state_tup[1] * 3
+        mut.__setstate__(state_tup)
+        return mut
+
+    def to_frozen(self):
+        "Return a frozen copy of this MinHash that cannot be changed."
+        return self
+
+    def __setstate__(self, tup):
+        "support pickling via __getstate__/__setstate__"
+        (n, ksize, is_protein, dayhoff, hp, mins, _, track_abundance,
+         max_hash, seed) = tup
+
+        self.__del__()
+
+        hash_function = (
+            lib.HASH_FUNCTIONS_MURMUR64_DAYHOFF if dayhoff else
+            lib.HASH_FUNCTIONS_MURMUR64_HP if hp else
+            lib.HASH_FUNCTIONS_MURMUR64_PROTEIN if is_protein else
+            lib.HASH_FUNCTIONS_MURMUR64_DNA
+        )
+
+        scaled = _get_scaled_for_max_hash(max_hash)
+        self._objptr = lib.kmerminhash_new(
+            scaled, ksize, hash_function, seed, track_abundance, n
+        )
+        if track_abundance:
+            MinHash.set_abundances(self, mins)
+        else:
+            MinHash.add_many(self, mins)
+
+    def __copy__(self):
+        return self
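
The `state_tup[1] * 3` adjustment in `FrozenMinHash.to_mutable` is easier to follow with named fields. The sketch below is a toy, not sourmash code: `MinHashState` and `state_for_mutable_copy` are hypothetical names, the field order is taken from the tuple unpacking in `__setstate__` above, and the sample values are placeholders. The diff does not say why protein-like sketches need the factor of 3; one plausible reading, stated here as an assumption, is that the saved state carries ksize in amino-acid units while the mutable object is rebuilt from a nucleotide-length ksize.

```python
from collections import namedtuple

# Field order follows the unpacking in __setstate__ in the diff; 'unused'
# stands in for the slot the diff discards as '_'.
MinHashState = namedtuple(
    "MinHashState",
    "n ksize is_protein dayhoff hp mins unused track_abundance max_hash seed",
)


def state_for_mutable_copy(state):
    """Return the state tuple with ksize scaled for protein-like sketches."""
    state = MinHashState(*state)
    if state.is_protein or state.dayhoff or state.hp:
        # Mirrors `state_tup[1] = state_tup[1] * 3` in FrozenMinHash.to_mutable.
        state = state._replace(ksize=state.ksize * 3)
    return state


# Placeholder states: one DNA sketch, one protein sketch.
dna = state_for_mutable_copy((0, 21, False, False, False, [10], None, False, 2**64 - 1, 42))
protein = state_for_mutable_copy((0, 7, True, False, False, [10], None, False, 2**64 - 1, 42))
print(dna.ksize, protein.ksize)
```

Only the `ksize` field changes; every other field passes through untouched.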
1 change: 1 addition & 0 deletions src/sourmash/search.py
@@ -354,6 +354,7 @@ def gather_databases(query, counters, threshold_bp, ignore_abundance):

         # construct a new query, subtracting hashes found in previous one.
         new_query_mh = query.minhash.downsample(scaled=cmp_scaled)
+        new_query_mh = new_query_mh.to_mutable()
         new_query_mh.remove_many(set(found_mh.hashes))
         new_query = SourmashSignature(new_query_mh)
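
The comment in this hunk describes the step that needs a mutable object: after each match, the hashes it accounts for are subtracted from the query. A minimal pure-Python sketch of that subtract-and-repeat idea follows. It uses plain sets rather than MinHash objects, `gather_rounds` and the sample database are hypothetical, and it is not the logic of `gather_databases` itself.

```python
def gather_rounds(query_hashes, database):
    """Greedy sketch: pick the best-overlapping entry, subtract it, repeat."""
    remaining = set(query_hashes)  # mutable working copy; the input is left untouched
    results = []
    while remaining and database:
        name, hashes = max(database.items(), key=lambda item: len(remaining & item[1]))
        overlap = remaining & hashes
        if not overlap:
            break  # nothing left in the database matches
        results.append((name, len(overlap)))
        remaining -= overlap  # the "subtract hashes found in previous one" step
    return results, remaining


query = frozenset(range(10))
database = {
    "a": frozenset(range(0, 6)),  # hashes 0..5
    "b": frozenset(range(4, 9)),  # hashes 4..8
}
results, remaining = gather_rounds(query, database)
print(results)    # [('a', 6), ('b', 3)]
print(remaining)  # {9}
```

The original `query` stays intact because all subtraction happens on the working copy, which is the role `to_mutable()` plays in the hunk above.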
4 changes: 2 additions & 2 deletions src/sourmash/signature.py
@@ -9,7 +9,7 @@

 from .logging import error
 from . import MinHash
-from .minhash import to_bytes
+from .minhash import to_bytes, FrozenMinHash
 from ._lowlevel import ffi, lib
 from .utils import RustObject, rustcall, decode_str
@@ -42,7 +42,7 @@ def __init__(self, minhash, name="", filename=""):

     @property
     def minhash(self):
-        return MinHash._from_objptr(
+        return FrozenMinHash._from_objptr(
             self._methodcall(lib.signature_first_mh)
         )
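
With this change, `SourmashSignature.minhash` hands back a `FrozenMinHash`, so callers that want to modify the sketch call `to_mutable()` first, as the test updates in this PR do. The toy classes below illustrate that calling pattern without importing sourmash. `ToySketch`, `FrozenToySketch` and `ToySignature` are hypothetical stand-ins; only the method names `to_mutable`, `to_frozen`, `add_hash` and `remove_many` are borrowed from the diff.

```python
import copy


class ToySketch:
    """Hypothetical stand-in for MinHash: a mutable set of integer hashes."""

    def __init__(self, hashes=()):
        self._hashes = set(hashes)

    @property
    def hashes(self):
        return set(self._hashes)  # hand out a copy, not the internal set

    def add_hash(self, h):
        self._hashes.add(h)

    def remove_many(self, hashes):
        self._hashes.difference_update(hashes)

    def __copy__(self):
        return ToySketch(self._hashes)

    def to_mutable(self):
        return self.__copy__()

    def to_frozen(self):
        new = self.__copy__()
        new.__class__ = FrozenToySketch  # retag the copy, as the diff does
        return new


class FrozenToySketch(ToySketch):
    def add_hash(self, *args, **kwargs):
        raise TypeError('FrozenToySketch does not support modification')

    def remove_many(self, *args, **kwargs):
        raise TypeError('FrozenToySketch does not support modification')

    def __copy__(self):
        return self  # sharing is safe: a frozen object cannot change

    def to_frozen(self):
        return self

    def to_mutable(self):
        return ToySketch(self._hashes)


class ToySignature:
    """Hypothetical stand-in for SourmashSignature."""

    def __init__(self, sketch):
        self._sketch = sketch.to_frozen()

    @property
    def minhash(self):
        return self._sketch


sig = ToySignature(ToySketch([1, 2, 3]))
frozen = sig.minhash
try:
    frozen.remove_many([1])
    raised = False
except TypeError:
    raised = True

remaining = frozen.to_mutable()  # independent, modifiable copy
remaining.remove_many([1, 2])
```

Here the failed `remove_many` on the frozen object leaves the signature's sketch unchanged, while the copy from `to_mutable()` can be edited freely.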
3 changes: 2 additions & 1 deletion tests/test_index.py
@@ -1326,6 +1326,7 @@ def is_found(ss, xx):

 def _consume_all(query_mh, counter, threshold_bp=0):
     results = []
+    query_mh = query_mh.to_mutable()

     last_intersect_size = None
     while 1:
@@ -1891,7 +1892,7 @@ def test_counter_gather_3_test_consume():

     ## round 1

-    cur_query = copy.copy(query_ss.minhash)
+    cur_query = query_ss.minhash.to_mutable()
     (sr, intersect_mh) = counter.peek(cur_query)
     assert sr.signature == match_ss_1
     assert len(intersect_mh) == 10
37 changes: 37 additions & 0 deletions tests/test__minhash.py → tests/test_minhash.py
@@ -42,6 +42,7 @@
 import sourmash
 from sourmash.minhash import (
     MinHash,
+    FrozenMinHash,
     hash_murmur,
     _get_scaled_for_max_hash,
     _get_max_hash_for_scaled,
@@ -1908,3 +1909,39 @@ def test_max_containment_equal():
     assert mh2.contained_by(mh1) == 1
     assert mh1.max_containment(mh2) == 1
     assert mh2.max_containment(mh1) == 1
+
+
+def test_frozen_and_mutable_1(track_abundance):
+    # mutable minhashes -> mutable minhashes creates new copy
+    mh1 = MinHash(0, 21, scaled=1, track_abundance=track_abundance)
+    mh2 = mh1.to_mutable()
+
+    mh1.add_hash(10)
+    assert 10 not in mh2.hashes
+
+
+def test_frozen_and_mutable_2(track_abundance):
+    # check that mutable -> frozen are separate
+    mh1 = MinHash(0, 21, scaled=1, track_abundance=track_abundance)
+    mh1.add_hash(10)
+
+    mh2 = mh1.to_frozen()
+    assert 10 in mh2.hashes
+    mh1.add_hash(11)
+    assert 11 not in mh2.hashes
+
+
+def test_frozen_and_mutable_3(track_abundance):
+    # check that mutable -> frozen -> mutable are all separate from each other
+    mh1 = MinHash(0, 21, scaled=1, track_abundance=track_abundance)
+    mh1.add_hash(10)
+
+    mh2 = mh1.to_frozen()
+    assert 10 in mh2.hashes
+    mh1.add_hash(11)
+    assert 11 not in mh2.hashes
+
+    mh3 = mh2.to_mutable()
+    mh3.add_hash(12)
+    assert 12 not in mh2.hashes
+    assert 12 not in mh1.hashes
2 changes: 1 addition & 1 deletion tests/test_prefetch.py
@@ -295,7 +295,7 @@ def test_prefetch_nomatch_hashes(runtmp, linear_gather):
     ss47 = sourmash.load_one_signature(sig47, ksize=31)
     ss63 = sourmash.load_one_signature(sig63, ksize=31)

-    remain = ss47.minhash
+    remain = ss47.minhash.to_mutable()
     remain.remove_many(ss63.minhash.hashes)

     ss = sourmash.load_one_signature(nomatch_out)
2 changes: 1 addition & 1 deletion tests/test_sourmash.py
@@ -3111,7 +3111,7 @@ def test_gather_f_match_orig(runtmp, linear_gather, prefetch_gather):
     print(runtmp.last_result.err)

     combined_sig = sourmash.load_one_signature(testdata_combined, ksize=21)
-    remaining_mh = copy.copy(combined_sig.minhash)
+    remaining_mh = combined_sig.minhash.to_mutable()

     def approx_equal(a, b, n=5):
         return round(a, n) == round(b, n)