bug: gather --save-prefetch-csv does not save all matches #2318

Open
ctb opened this issue Oct 6, 2022 · 0 comments

ctb commented Oct 6, 2022

When @bluegenes was digging into some classification results, she discovered that gather --save-prefetch-csv was not saving all of the prefetch results (as evaluated by comparing against the output of sourmash prefetch -o ...).

(IMO they should always be the same.)

This is a bug / behavior change introduced in #2116, merged into latest in #2111, and released in sourmash v4.4.2.

In brief, the relevant code in commands.py::gather is:

        for db in databases:
            counter = None
            try:
                counter = db.counter_gather(prefetch_query, args.threshold_bp)
            except ValueError:
                # catch "no signatures to search" ValueError if empty db.       
                continue

            save_prefetch.add_many(counter.signatures())

            if prefetch_csvout_fp:
                for found_sig in counter.signatures():
                    ... # write info to CSV

and here counter.signatures() is not returning all of the signatures that prefetch would report.

I'm not 100% sure how to resolve this. One option would be to adjust CounterGather to save prefetch results internally; possible, but maybe messy? Another would be to store all the signatures in a list, too.
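As a rough illustration of the second option (keeping every match in a separate list), here is a minimal, self-contained sketch. ToyCounter and collect_prefetch_matches are hypothetical stand-ins invented for this example, not sourmash classes or functions, and the cutoff inside ToyCounter only mimics the symptom; it is not meant as a diagnosis of why .signatures() drops matches.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCounter:
    """Hypothetical stand-in for a CounterGather-like object (not sourmash code)."""
    keep_at_least: int
    _kept: list = field(default_factory=list)

    def add(self, name, overlap):
        # Mimic the symptom: matches under the cutoff are silently not retained.
        if overlap >= self.keep_at_least:
            self._kept.append((name, overlap))

    def signatures(self):
        return iter(self._kept)


def collect_prefetch_matches(candidates, counter):
    """Feed every candidate to the counter, and also record each one in a list."""
    all_matches = []
    for name, overlap in candidates:
        all_matches.append((name, overlap))  # independent record for CSV output
        counter.add(name, overlap)
    return all_matches


# Made-up candidates: (match name, overlap in bp).
candidates = [("genomeA", 5000), ("genomeB", 1200), ("genomeC", 300)]
counter = ToyCounter(keep_at_least=1000)
all_matches = collect_prefetch_matches(candidates, counter)

from_counter = list(counter.signatures())
missing = [m for m in all_matches if m not in from_counter]
print("saved via list:", len(all_matches))
print("missing from counter.signatures():", missing)
```

The point of the sketch is only that the CSV output would be driven by the separate list rather than by whatever the counter chooses to retain; the cost is holding a second reference to every match.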

Ideally any solution would result in the same code being used in commands.py::gather and commands.py::prefetch so that this doesn't happen again ;). And of course we'll want some tests.
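A regression test could compare the two CSV outputs directly. The sketch below shows one possible shape for that check using only the standard library. The helper names are hypothetical, the CSV contents are made up, and the use of a match_md5 column as the comparison key is an assumption about the output format that should be checked against the real files.

```python
import csv
import io


def match_keys(csv_text, key="match_md5"):
    """Return the set of values found in column `key` of the given CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key] for row in reader}


def missing_matches(gather_csv, prefetch_csv, key="match_md5"):
    """Matches present in the prefetch output but absent from gather's CSV."""
    return match_keys(prefetch_csv, key) - match_keys(gather_csv, key)


# Made-up data for illustration only.
prefetch_csv = "match_md5,intersect_bp\naaa,5000\nbbb,1200\nccc,300\n"
gather_csv = "match_md5,intersect_bp\naaa,5000\nbbb,1200\n"

print(sorted(missing_matches(gather_csv, prefetch_csv)))
```

In a real test the two strings would be replaced by the contents of the files written by gather --save-prefetch-csv and sourmash prefetch -o, and the assertion would be that the set of missing matches is empty.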
