Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect Result functionality into a common class #416

Open
luizirber opened this issue Feb 23, 2018 · 5 comments
Open

Collect Result functionality into a common class #416

luizirber opened this issue Feb 23, 2018 · 5 comments

Comments

@luizirber
Copy link
Member

Punted from #390 (comment)

We have a bunch of Result namedtuples lying around, and I think we understand them enough to pull duplicated functionality into a common class.

@ctb
Copy link
Contributor

ctb commented Feb 23, 2018 via email

@ctb
Copy link
Contributor

ctb commented May 8, 2021

meh. There are four different ones now. Three in src/sourmash/search.py (IndexSearchResult, GatherSearchResult, and one in src/sourmash/index.py, IndexSearchResult. They all serve different purposes and have quite different output.

But:

I'd be in favor of changing the names to be clearer and maybe combining to provide common csv outputting functionality that also supports progressive output during search for better UX (#1350), much like #1493 did for signatures. I think this is a "good next issue" idea, in fact.

@ctb
Copy link
Contributor

ctb commented Sep 23, 2021

One way to start this would be to refactor the gather() function in commands.py to save line by line to the CSV inside the for result, weighted_missed in gather_iter: loop.

  • create a new class sourmash_args.ProgressiveFileOutputCSV modeled on the SaveSignaturesToLocation API, but simpler;
  • before the for loop, instantiate the class if CSV output was requested;
  • inside the for loop, write each result line to the CSV
  • after the for loop, close the CSV file
  • for bonus, the code inside the if found and args.save_matches: condition could be introduced into that same for loop.

@ctb
Copy link
Contributor

ctb commented Apr 27, 2022

hmm @bluegenes any comments on what your work in #1934, #1952, #1955, #1966, #1967 means for this issue?

@bluegenes
Copy link
Contributor

Hah, did not know this was already an issue 😂 ! Well:

  • [MRG] standardize and simplify search, prefetch, gather results by using dataclasses #1955 converted the search.py namedtuples into classes

    • BaseResult for shared functionality
    • SearchResult allows both num and scaled sketches
    • PrefetchResult and GatherResult require scaled sketches.
  • [MRG] output estimated ANI from sourmash compare, search, prefetch, and gather #1967 adds some more csv outputting functionality. We already enabled writing each result as you go. This currently happens when outputting a csv from sourmash prefetch or a prefetch csv from gather, so I wrote these to keep that enabled.

    • we initialize a csv dictwriter from within the first *Result, which allows us to use the column information from within the result, rather than needing to grab the column names separately
    • then, we pass in the dictwriter to allow each result to write itself as a row into the csv file

Some thoughts going forward:

  • it is now straightforward to update the columns that are written to each csv file by modifying the write_cols within each *Result dataclass (and presents an opportunity for updates and standardization)
  • IndexSearchResult remains a tuple namedtuple('Result', 'score, signature, location')
    • changing this to a *Result class would probably stop us from needing to go back and compare the two signatures again in order to have sufficient information to estimate ANI (which is what is happening in search right now)
  • it would (still) be straightforward to modify to write gather results to csv as we go. If that happens, I think it would be worth refactoring gather a bit more to better use GatherDatabases and GatherResult together + make sure we're avoiding recalculations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants