Collect Result functionality into a common class #416

luizirber · 2018-02-23T20:51:56Z

We have a bunch of Result namedtuples lying around, and I think we understand them enough to pull duplicated functionality into a common class.

The text was updated successfully, but these errors were encountered:

ctb · 2018-02-23T23:39:34Z

would be nice to maybe put together loading code and good pandas display code for them too!

ctb · 2021-05-08T12:48:53Z

meh. There are four different ones now. Three in src/sourmash/search.py (IndexSearchResult, GatherSearchResult, and one in src/sourmash/index.py, IndexSearchResult. They all serve different purposes and have quite different output.

But:

I'd be in favor of changing the names to be clearer and maybe combining to provide common csv outputting functionality that also supports progressive output during search for better UX (#1350), much like #1493 did for signatures. I think this is a "good next issue" idea, in fact.

ctb · 2021-09-23T12:19:44Z

One way to start this would be to refactor the gather() function in commands.py to save line by line to the CSV inside the for result, weighted_missed in gather_iter: loop.

create a new class sourmash_args.ProgressiveFileOutputCSV modeled on the SaveSignaturesToLocation API, but simpler;
before the for loop, instantiate the class if CSV output was requested;
inside the for loop, write each result line to the CSV
after the for loop, close the CSV file
for bonus, the code inside the if found and args.save_matches: condition could be introduced into that same for loop.

ctb · 2022-04-27T15:20:43Z

hmm @bluegenes any comments on what your work in #1934, #1952, #1955, #1966, #1967 means for this issue?

bluegenes · 2022-04-27T16:38:32Z

Hah, did not know this was already an issue 😂 ! Well:

[MRG] standardize and simplify search, prefetch, gather results by using dataclasses #1955 converted the search.py namedtuples into classes
- BaseResult for shared functionality
- SearchResult allows both num and scaled sketches
- PrefetchResult and GatherResult require scaled sketches.
[MRG] output estimated ANI from sourmash compare, search, prefetch, and gather #1967 adds some more csv outputting functionality. We already enabled writing each result as you go. This currently happens when outputting a csv from sourmash prefetch or a prefetch csv from gather, so I wrote these to keep that enabled.
- we initialize a csv dictwriter from within the first *Result, which allows us to use the column information from within the result, rather than needing to grab the column names separately
- then, we pass in the dictwriter to allow each result to write itself as a row into the csv file

Some thoughts going forward:

it is now straightforward to update the columns that are written to each csv file by modifying the write_cols within each *Result dataclass (and presents an opportunity for updates and standardization)
IndexSearchResult remains a tuple namedtuple('Result', 'score, signature, location')
- changing this to a *Result class would probably stop us from needing to go back and compare the two signatures again in order to have sufficient information to estimate ANI (which is what is happening in search right now)
it would (still) be straightforward to modify to write gather results to csv as we go. If that happens, I think it would be worth refactoring gather a bit more to better use GatherDatabases and GatherResult together + make sure we're avoiding recalculations.

luizirber mentioned this issue Feb 23, 2018

[MRG] add an lca gather signature decomposition approach #390

Merged

11 tasks

ctb mentioned this issue Apr 27, 2022

CSV output for sourmash search needs upgrading #1390

Open

ctb mentioned this issue Jan 28, 2024

update GatherResult to no longer store MinHashes #2950

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collect Result functionality into a common class #416

Collect Result functionality into a common class #416

luizirber commented Feb 23, 2018

ctb commented Feb 23, 2018 via email

ctb commented May 8, 2021 •

edited

Loading

ctb commented Sep 23, 2021

ctb commented Apr 27, 2022

bluegenes commented Apr 27, 2022

Collect Result functionality into a common class #416

Collect Result functionality into a common class #416

Comments

luizirber commented Feb 23, 2018

ctb commented Feb 23, 2018 via email

ctb commented May 8, 2021 • edited Loading

ctb commented Sep 23, 2021

ctb commented Apr 27, 2022

bluegenes commented Apr 27, 2022

ctb commented May 8, 2021 •

edited

Loading