Prevent mongo connector to count more than 1M rows #159

Merged on May 7, 2020 (3 commits)

davinov (Member) commented May 4, 2020

Change Summary

Calling get_slice on a very large collection (25M rows) was very slow (about 1 min 30 s).
This is caused by the count facet, which forces mongo to scan every document regardless of the limit.

I suggest capping the count at a value that is fairly high but does not hurt performance too much.

I chose 1M because counting took 3 s on my local mongo, and it seems a reasonable threshold above which the exact count is not very relevant.
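
To illustrate the idea, here is a minimal sketch of an aggregation pipeline whose count facet is capped. It is not the code merged in this PR: the function name `build_slice_pipeline`, the constant `MAX_COUNTED_ROWS`, and the facet names `count` and `df` are assumptions made for the example. Only the `$match`, `$facet`, `$skip`, `$limit` and `$count` stages are standard MongoDB.

```python
# Hypothetical constant: the 1M threshold comes from the description above,
# the name is invented for this sketch.
MAX_COUNTED_ROWS = 1_000_000


def build_slice_pipeline(match, skip=0, limit=50, max_counted_rows=MAX_COUNTED_ROWS):
    """Build a pipeline returning one page of documents plus a capped count."""
    return [
        {"$match": match},
        {
            "$facet": {
                # Placing $limit before $count means the count facet sees at
                # most `max_counted_rows` documents, so the reported count
                # saturates at that value instead of covering the whole
                # collection.
                "count": [{"$limit": max_counted_rows}, {"$count": "value"}],
                # The requested page itself.
                "df": [{"$skip": skip}, {"$limit": limit}],
            }
        },
    ]


pipeline = build_slice_pipeline({"domain": "sales"}, skip=100, limit=50)
```

With pymongo, such a pipeline would typically be passed to `collection.aggregate(pipeline)`. A returned count equal to the cap should then be read as "at least 1M rows" rather than as an exact total.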

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable

davinov added the enhancement and Need Review labels May 4, 2020
davinov self-assigned this May 4, 2020

codecov-io commented May 4, 2020

Codecov Report

Merging #159 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #159   +/-   ##
=======================================
  Coverage   97.52%   97.52%           
=======================================
  Files          42       42           
  Lines        2097     2098    +1     
=======================================
+ Hits         2045     2046    +1     
  Misses         52       52           

| Impacted Files | Coverage Δ |
|---|---|
| toucan_connectors/mongo/mongo_connector.py | 100.00% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b0d4383...2f6af28.

testinnplayin (Contributor) left a comment

That's great!

davinov removed the TO MERGE label May 7, 2020
davinov merged commit 106788a into master May 7, 2020
davinov deleted the max-counted-rows branch May 7, 2020 09:52