Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(changelog): cache commit retain checks (258 times faster generation) #772

Merged
merged 16 commits into from
Aug 6, 2024

Conversation

aminya
Copy link
Contributor

@aminya aminya commented Jul 27, 2024

Description

This caches the commit retain checks whenever include/exclude paths are specified.

This speeds up changelog generation by 258 times in a big repository I have:

Now: 0.080 s
Before:  20.633 s

Fixes #767

It also improves handling including/excluding patterns, considering the first commit, and various other improvements.

How Has This Been Tested?

I tested this on the repository mentioned in #767

Screenshots / Logs (if applicable)

Here are the profiling flamegraphs created via #768. The bottleneck is removed after this fix is applied as seen in the above image (4366 samples to only 16 samples). After the fix, other parts of the code finally show up in the profiler.

After - Fully Cached:

git-cliff - after cached - flamegraph

After - Cached + new commit:

git-cliff - after new commit - flamegraph

Before:

git-cliff - Before - flamegraph

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (no code change)
  • Refactor (refactoring production code)
  • Other

Checklist:

  • My code follows the code style of this project.
  • I have updated the documentation accordingly.
  • I have formatted the code with rustfmt.
  • I checked the lints with clippy.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@aminya aminya requested a review from orhun as a code owner July 27, 2024 03:49
@codecov-commenter
Copy link

codecov-commenter commented Jul 27, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 83.33333% with 13 lines in your changes missing coverage. Please review.

Project coverage is 39.69%. Comparing base (4b33e7e) to head (361a0a0).
Report is 1 commits behind head on main.

Files Patch % Lines
git-cliff-core/src/repo.rs 85.53% 11 Missing ⚠️
git-cliff/src/lib.rs 0.00% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #772      +/-   ##
==========================================
+ Coverage   37.03%   39.69%   +2.66%     
==========================================
  Files          20       20              
  Lines        1542     1600      +58     
==========================================
+ Hits          571      635      +64     
+ Misses        971      965       -6     
Flag Coverage Δ
unit-tests 39.69% <83.34%> (+2.66%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aminya aminya force-pushed the cache branch 2 times, most recently from d1e09db to 202aeb6 Compare July 27, 2024 04:02
@alerque
Copy link
Contributor

alerque commented Jul 27, 2024

Only a measly 87× faster? Pshaw nothing to see here moving along now…

@aminya aminya changed the title perf: cache commit retain checks + 87 times faster generation perf: cache commit retain checks + 258 times faster generation Jul 27, 2024
@aminya
Copy link
Contributor Author

aminya commented Jul 27, 2024

Only a measly 87× faster? Pshaw nothing to see here moving along now…

Challenge accepted. Now 258 times faster. 😄

@aminya aminya force-pushed the cache branch 3 times, most recently from ddf3e78 to ed47005 Compare July 28, 2024 07:55
@aminya
Copy link
Contributor Author

aminya commented Jul 29, 2024

I updated the cache implementation to be more useful by caching the changed files of each commit.

This slows down the retain_commit check by ~2 times but allows reusing the cache in case the include/exclude patterns change. This is beneficial for running in monorepos where git-cliff runs multiple times to generate changelog for different subsets of the repo. This is a good trade-off in my opinion.

Copy link
Owner

@orhun orhun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥

git-cliff-core/src/repo.rs Outdated Show resolved Hide resolved
git-cliff-core/src/repo.rs Outdated Show resolved Hide resolved
git-cliff-core/Cargo.toml Show resolved Hide resolved
@aminya
Copy link
Contributor Author

aminya commented Jul 30, 2024

I fixed some bugs in the last three commits to improve the include/exclude behaviour:

  • The include/exclude patterns are considered together.
  • The glob patterns are normalized so that ./ doesn't make a difference in the result
  • Added special case for the first commit where it has no parent

git-cliff-core/src/repo.rs Show resolved Hide resolved
git-cliff-core/src/repo.rs Show resolved Hide resolved
git-cliff-core/src/repo.rs Outdated Show resolved Hide resolved
Copy link
Owner

@orhun orhun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you resolve conflicts caused by #709 ? 😇

aminya added 13 commits August 5, 2024 23:30
This caches the commit retain checks whenever include/exclude paths are specified.

This speeds up changelog generation by `87` times in a big repository I have:
```
Now: 0.237 s
Before:  20.633 s
```
Sled fails to open the cache if multiple git-cliff processes run on the same repository. This fixes the issue by using cacache which stores the cache locally under .git/git-cliff
This slows down the retain_commit check by ~2 times, but allows reusing the cache in case the include/exclude patterns change.
This is beneficial for running in monorepos where git-cliff runs multiple times to generate changelog for different subsets of the repo
aminya and others added 2 commits August 5, 2024 23:48
The slices are normalized, so an owned pattern is needed to avoid type mismatch between new normalized/unchanged patterns
Copy link
Owner

@orhun orhun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@orhun orhun changed the title perf: cache commit retain checks + 258 times faster generation perf(changelog): cache commit retain checks (258 times faster generation) Aug 6, 2024
@orhun orhun merged commit 8430c5c into orhun:main Aug 6, 2024
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Slow for large repositories with binary files in Git LFS and long history
4 participants