create a MinHash function to copy abundances from one MinHash to another #1463

Closed
ctb opened this issue Apr 16, 2021 · 3 comments · Fixed by #1620
Labels
good next issue An issue that should be ready to resolve. python Pull requests that update Python code

Comments

@ctb
Contributor

ctb commented Apr 16, 2021

Over in #1392, I implemented a code block that retrieves abundances from a sketch using hashes; it lives in the gather function in src/sourmash/commands.py:

            if is_abundance:
                # reinflate abundances
                hashes = set(next_query.minhash.hashes)
                orig_abunds = orig_query_mh.hashes
                abunds = { h: orig_abunds[h] for h in hashes }

                abund_query_mh = orig_query_mh.copy_and_clear()
                # orig_query might have been downsampled...
                abund_query_mh.downsample(scaled=next_query.minhash.scaled)
                abund_query_mh.set_abundances(abunds)
                next_query.minhash = abund_query_mh

Here, next_query.minhash is a flattened MinHash object derived from orig_query_mh, and we are constructing a new MinHash object abund_query_mh that has all of the hashes from next_query.minhash with abundances for those hashes taken from orig_query_mh.
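To make the core step concrete, here is a minimal sketch using plain Python containers in place of MinHash objects. The function name `reinflate` and the sample values are made up for illustration; only the dictionary comprehension mirrors the snippet above.

```python
def reinflate(flat_hashes, orig_abunds):
    """Look up the abundance of each flat hash in the original sketch.

    flat_hashes: iterable of hash values (stand-in for next_query.minhash.hashes)
    orig_abunds: dict of hash -> abundance (stand-in for orig_query_mh.hashes)
    """
    # Same comprehension as in gather(); raises KeyError if a hash
    # is not present in the original sketch.
    return {h: orig_abunds[h] for h in flat_hashes}


# Illustrative values only.
orig_abunds = {10: 3, 20: 1, 30: 7, 40: 2}
flat_hashes = {10, 30}
abunds = reinflate(flat_hashes, orig_abunds)
```

Because the flattened sketch is derived from the original one, every hash is expected to be present, which is why a plain `orig_abunds[h]` lookup is enough here.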

This is potentially a generically useful function that is kind of the converse of flatten(), so let's make it a method on the MinHash class!

TODO:

  • create a new method on MinHash objects (see src/sourmash/minhash.py) named something like inflate;
  • this method should take one argument in addition to self, a MinHash object with track_abundance=True;
  • this method should create a new MinHash object that borrows the abundances from the second argument using only the hashes from self;
  • replace the code block in commands.gather() with a call to this new method (and see if it can also apply to multigather);
  • write some tests in tests/test_minhash.py.
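The TODO items above could be prototyped roughly as follows. This is a sketch on a toy class, not the sourmash implementation: `ToyMinHash` is a hypothetical stand-in that only borrows the method names used in the gather snippet (`hashes`, `copy_and_clear`, `set_abundances`, `flatten`), it skips the downsampling step, and the choice to drop hashes missing from the second argument is an assumption.

```python
class ToyMinHash:
    """Hypothetical stand-in for a scaled MinHash; not the sourmash class."""

    def __init__(self, scaled, track_abundance=False):
        self.scaled = scaled
        self.track_abundance = track_abundance
        self._hashes = {}  # hash value -> abundance (always 1 when flat)

    @property
    def hashes(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._hashes)

    def set_abundances(self, abunds):
        for h, count in abunds.items():
            self._hashes[h] = count if self.track_abundance else 1

    def copy_and_clear(self):
        # Same parameters, no hashes.
        return ToyMinHash(self.scaled, self.track_abundance)

    def flatten(self):
        flat = ToyMinHash(self.scaled, track_abundance=False)
        flat.set_abundances(self._hashes)
        return flat

    def inflate(self, from_mh):
        """Return a new sketch with self's hashes and from_mh's abundances."""
        if self.track_abundance or not from_mh.track_abundance:
            raise ValueError(
                "inflate needs a flat self and an abundance-tracking argument"
            )
        orig_abunds = from_mh.hashes
        # Assumption: hashes absent from from_mh are dropped, not an error.
        abunds = {h: orig_abunds[h] for h in self._hashes if h in orig_abunds}
        new_mh = from_mh.copy_and_clear()
        new_mh.set_abundances(abunds)
        return new_mh


# Round trip: flatten() discards abundances, inflate() borrows them back.
orig = ToyMinHash(scaled=1000, track_abundance=True)
orig.set_abundances({10: 3, 20: 1, 30: 7})
flat = orig.flatten()
inflated = flat.inflate(orig)

# A flat sketch holding only a subset of the original hashes.
subset = ToyMinHash(scaled=1000)
subset.set_abundances({10: 1, 30: 1})
subset_inflated = subset.inflate(orig)
```

A test in tests/test_minhash.py could follow the same shape: build an abundance sketch, flatten it, inflate the flat sketch against the original, and check that the hashes and abundances match.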
@ctb ctb added python Pull requests that update Python code good next issue An issue that should be ready to resolve. labels Apr 16, 2021
luizirber added a commit that referenced this issue Apr 16, 2021
luizirber added a commit that referenced this issue Apr 16, 2021
luizirber added a commit that referenced this issue Apr 19, 2021
luizirber added a commit that referenced this issue Apr 27, 2021
luizirber added a commit that referenced this issue Apr 28, 2021
luizirber added a commit that referenced this issue Apr 28, 2021
luizirber added a commit that referenced this issue May 10, 2021
add getset, wip parallel feature flag

wip colors

simpler impl first

parallel hash to color construction

wip

Revert "wip"

This reverts commit d65da76.

must insert small_color into large_color before setting it

trying out a small set impl

try compressing colors inside reduce

size test

use new released vec-collections

update cbindgen

make parallel/sequential more maintainable

some notes on partial serde

start revindex in py

start ffi

first test passing

second test passing

modify colors.update to accept an iter instead of slices

color count tracker

update sourmash.h

blanket implementation for counter_gather

start working on memstorage

niv update

avoid a mut ref in save by using lots of mutexes

fix codecov path fixes

expose InnerStorage

basic test passing in-memory sigs working

revert counter_gather to gather in search.py

lint cleanup

cbindgen fixes

moved MemStorage to #1463

implement signatures()

fix initialization
@keyabarve
Contributor

keyabarve commented Jun 19, 2021

@ctb Is this inflate function supposed to be written in src/sourmash/minhash.py? Should this function be inside any particular class?

@ctb
Contributor Author

ctb commented Jun 19, 2021

yes, in the MinHash class. It probably needs to be defined on FrozenMinHash where it should raise an exception, too.

@ctb
Contributor Author

ctb commented Jun 19, 2021

(order doesn't matter)

luizirber added a commit that referenced this issue Jun 21, 2021
@ctb ctb closed this as completed in #1620 Aug 8, 2021
luizirber added a commit that referenced this issue Dec 19, 2021
luizirber added a commit that referenced this issue Feb 12, 2022
luizirber added a commit that referenced this issue Feb 13, 2022