faster generate_subcatalogs #295

Merged: 2 commits, Mar 29, 2021

matthewhanson
Member

Related Issue(s): #

Description:

The Catalog.generate_subcatalogs function was slow on large catalogs (more than several thousand items). Profiling showed that Catalog.remove_item was a significant factor: it performs some redundant work and is not really necessary here. Instead of removing each item from its original location as it is moved, the item links are now removed from the STAC object in one step, once all items have been moved.

With this change, generating subcatalogs for a ~4000 item catalog takes about 3 seconds instead of 30 seconds. A 25K item catalog takes about 3 minutes.
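
To illustrate why batching the link removal helps, here is a minimal sketch on a toy model. It is not the pystac implementation: `Link`, `ToyCatalog`, `make_catalog`, `remove_one_by_one`, and `remove_batched` are hypothetical names invented for this example, and the toy `remove_item` only assumes that per-item removal has to scan the catalog's full link list.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Link:
    rel: str     # link relation, e.g. "item" or "child"
    target: str  # id of the linked object


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog: just a flat list of links."""
    links: List[Link] = field(default_factory=list)

    def remove_item(self, item_id: str) -> None:
        # Per-item removal: every call scans the whole link list.
        self.links = [
            link for link in self.links
            if not (link.rel == "item" and link.target == item_id)
        ]


def make_catalog(n_items: int) -> ToyCatalog:
    # One child link plus n_items item links.
    links = [Link("child", "sub")]
    links += [Link("item", f"item-{i}") for i in range(n_items)]
    return ToyCatalog(links)


def remove_one_by_one(catalog: ToyCatalog, item_ids: Iterable[str]) -> None:
    # One full scan per moved item: quadratic in the number of items.
    for item_id in item_ids:
        catalog.remove_item(item_id)


def remove_batched(catalog: ToyCatalog, item_ids: Iterable[str]) -> None:
    # A single pass after all items have been moved: linear in the number of links.
    moved = set(item_ids)
    catalog.links = [
        link for link in catalog.links
        if not (link.rel == "item" and link.target in moved)
    ]
```

Both helpers leave the catalog with the same links; the batched version simply avoids rescanning the list for every item, which is the kind of saving the timings above suggest.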

PR Checklist:

  • Code is formatted (run scripts/format)
  • Tests pass (run scripts/test)
  • This PR maintains or improves overall codebase code coverage.
  • Changes are added to the CHANGELOG. See the docs for information about adding to the changelog.

codecov-io commented Mar 29, 2021

Codecov Report

Merging #295 (5c16701) into develop (3645114) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop     #295      +/-   ##
===========================================
- Coverage    93.65%   93.63%   -0.02%     
===========================================
  Files           33       33              
  Lines         4004     4008       +4     
===========================================
+ Hits          3750     3753       +3     
- Misses         254      255       +1     

Impacted Files       Coverage Δ
pystac/catalog.py    95.22% <100.00%> (-0.25%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

matthewhanson merged commit 4a2122d into develop on Mar 29, 2021
matthewhanson deleted the mah/faster_generate_subcats branch on Mar 29, 2021, 16:41