MRG: fix exponential time explosion in sig check #2762

Merged
merged 4 commits into from
Sep 15, 2023

@ctb ctb commented Sep 15, 2023

Fixes #2646.

Embarrassing "let's make things run really slow" typo of the month... fixed!

In brief, the following code in manifest.py was causing run time to blow up, growing quadratically with the number of calls:

    def _add_rows(self, rows):
        self.rows.extend(rows)

        # maintain a fast check for md5sums for __contains__ check.
        md5set = self._md5_set
        for row in self.rows:
            md5set.add(row['md5'])

when called many times with a single row each time, because each call triggered an iteration over ALL rows accumulated so far.
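To make the cost concrete, here is a small counting sketch. It is not sourmash code: `count_row_visits` and its `rescan_all` flag are hypothetical names used only to compare the two loop styles. With n single-row calls, rescanning every stored row does 1 + 2 + ... + n = n(n+1)/2 set insertions, while scanning only the new rows does n.

```python
def count_row_visits(n_calls, rescan_all):
    """Count md5-set insertions over n_calls single-row additions.

    Hypothetical helper for illustration; not part of sourmash.
    """
    rows = []
    md5set = set()
    visits = 0
    for i in range(n_calls):
        new_rows = [{'md5': f'md5-{i}'}]
        rows.extend(new_rows)
        # rescan_all=True mimics looping over every stored row on each call;
        # rescan_all=False loops over just the rows passed in.
        to_scan = rows if rescan_all else new_rows
        for row in to_scan:
            md5set.add(row['md5'])
            visits += 1
    return visits


print(count_row_visits(1000, rescan_all=True))   # 500500
print(count_row_visits(1000, rescan_all=False))  # 1000
```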

This PR changes the loop to `for row in rows:`, so that only the newly added rows are visited.

The added complexity calls for new tests that deal explicitly with the case where rows is a generator (which it turns out it often is; dozens of tests broke when this was not accounted for).
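The generator case matters because a generator can be consumed only once: after `self.rows.extend(rows)` has run, a second loop over the same `rows` object yields nothing. One way to handle this, shown here as a minimal sketch and not as the code actually merged in this PR, is to update the list and the md5 set in a single pass. The class name `ManifestSketch` is hypothetical; only `rows`, `_md5_set`, and `_add_rows` come from the snippet above.

```python
class ManifestSketch:
    """Hypothetical stand-in for a manifest; for illustration only."""

    def __init__(self):
        self.rows = []
        self._md5_set = set()

    def _add_rows(self, rows):
        # Single pass: works for lists and for one-shot generators alike,
        # and touches only the rows being added.
        md5set = self._md5_set
        for row in rows:
            self.rows.append(row)
            md5set.add(row['md5'])

    def __contains__(self, md5):
        return md5 in self._md5_set


m = ManifestSketch()
m._add_rows({'md5': name} for name in ('a', 'b'))  # generator input
m._add_rows([{'md5': 'c'}])                        # list input
print(len(m.rows), 'a' in m, 'c' in m, 'z' in m)
```

A test for the generator case can then assert that both `rows` and the md5 set were populated from a generator argument.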


codecov bot commented Sep 15, 2023

Codecov Report

Merging #2762 (e3c8d58) into latest (6df0360) will increase coverage by 6.92%.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           latest    #2762      +/-   ##
==========================================
+ Coverage   85.92%   92.85%   +6.92%     
==========================================
  Files         130      104      -26     
  Lines       14808    12423    -2385     
  Branches     2619     2621       +2     
==========================================
- Hits        12724    11535    -1189     
+ Misses       1784      587    -1197     
- Partials      300      301       +1     
| Flag | Coverage Δ |
|---|---|
| hypothesis-py | 25.81% <33.33%> (-0.01%) ⬇️ |
| python | 92.85% <66.66%> (-0.02%) ⬇️ |
| rust | ? |


| Files Changed | Coverage Δ |
|---|---|
| src/sourmash/sig/__main__.py | 93.82% <0.00%> (-0.23%) ⬇️ |
| src/sourmash/manifest.py | 95.07% <100.00%> (+0.04%) ⬆️ |

... and 26 files with indirect coverage changes



ctb commented Sep 15, 2023

This should be ready for review @bluegenes


@bluegenes bluegenes left a comment


🎉


ctb commented Sep 15, 2023

Low code coverage caused by debug code. We're all good here :).

@ctb ctb merged commit 94d82b1 into latest Sep 15, 2023
@ctb ctb deleted the fix_sig_check branch September 15, 2023 17:45
Development

Successfully merging this pull request may close these issues.

sig check very slow for large databases?
2 participants