feat: add option to traverse commits from oldest to newest #1610

Merged
merged 4 commits into from
Sep 26, 2024

Conversation

nrdxp
Contributor

@nrdxp nrdxp commented Sep 26, 2024

This change introduces an enum to control commit traversal order for Simple. Users can now choose between newest-first or oldest-first traversal. The default behavior remains newest-first, but it can be toggled by passing a CommitTimeOrder to a Sorting::ByCommitTime* variant.

This feature is particularly useful for searching early repository history, which should be orders of magnitude faster with this variant. In my benchmarking I was able to identify the original commit in a large mono-repo source tree using the new OldestFirst variant in 12ms down from 2.6s with NewestFirst.

The implementation logic remains largely agnostic to this change, with only minor adjustments in key areas as necessary.

The reversed order is achieved by inverting the PriorityQueue key with std::cmp::Reverse when an oldest-first traversal is requested. This is what allows the bulk of the original logic to remain unchanged.
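
The key inversion described above can be sketched as follows. This is a minimal illustration, not the code from this PR: it uses `std::collections::BinaryHeap` as a stand-in for the crate's own PriorityQueue, and `drain_in_order`, the `Time` alias, and the string commit ids are hypothetical names chosen for the example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Commit time in seconds since the Unix epoch (stand-in type for this sketch).
type Time = i64;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CommitTimeOrder {
    NewestFirst,
    OldestFirst,
}

/// Hypothetical helper: pop `(time, id)` pairs from a max-heap in the requested order.
fn drain_in_order(
    commits: &[(Time, &'static str)],
    order: CommitTimeOrder,
) -> Vec<&'static str> {
    let mut out = Vec::new();
    match order {
        CommitTimeOrder::NewestFirst => {
            // A max-heap keyed by time yields the newest commit first.
            let mut heap: BinaryHeap<(Time, &'static str)> = commits.iter().copied().collect();
            while let Some((_, id)) = heap.pop() {
                out.push(id);
            }
        }
        CommitTimeOrder::OldestFirst => {
            // Wrapping the key in `Reverse` flips the comparison, so the same
            // max-heap now yields the oldest commit first.
            let mut heap: BinaryHeap<(Reverse<Time>, &'static str)> =
                commits.iter().map(|&(t, id)| (Reverse(t), id)).collect();
            while let Some((_, id)) = heap.pop() {
                out.push(id);
            }
        }
    }
    out
}

fn main() {
    let commits = [(30, "c"), (10, "a"), (20, "b")];
    println!("{:?}", drain_in_order(&commits, CommitTimeOrder::NewestFirst));
    println!("{:?}", drain_in_order(&commits, CommitTimeOrder::OldestFirst));
}
```

Only the key changes between the two arms; the queue and the pop loop are identical, which is why the rest of the traversal logic can stay as it is.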

nrdxp added a commit to nrdxp/gitoxide that referenced this pull request Sep 26, 2024
nrdxp added a commit to ekala-project/eka that referenced this pull request Sep 26, 2024
We have submitted an upstream patch that makes iteration over git
histories more versatile:
GitoxideLabs/gitoxide#1610

In particular, this change introduces the ability to traverse using an
oldest-commit-first strategy, reducing the time it takes to determine
the repository's root commit to a few milliseconds, as opposed to
several seconds in large histories.

This change is critical for us to maintain, which is why we are trying
to push the expanded API upstream: it makes calculating and addressing
our atoms trivial and inexpensive.
@Byron Byron linked an issue Sep 26, 2024 that may be closed by this pull request
This change introduces an enum to control commit traversal order.
Users can now choose between newest-first or oldest-first traversal.
The default behavior remains newest-first, but it can be toggled
by passing a CommitTimeOrder to a Sorting::ByCommitTime* variant.

This feature is particularly useful for searching early repository
history. The implementation remains largely agnostic to this change,
with only minor logic adjustments in key areas as necessary.

The reversed order is achieved by inverting the PriorityQueue key
when an oldest-first traversal is requested.
@Byron Byron self-assigned this Sep 26, 2024
Member

@Byron Byron left a comment

Thanks a lot for setting this up - it's great to see that this sorting mode alone can make such a difference!

I will keep working on this PR until it's ready for merging, by adding some tests and probably fixing one logic issue I am seeing. But tests should say what's right or wrong here.

Tasks

  • refactor
  • see if topo-traversal should also implement this. That way the CommitTimeOrder could be shared and move up one level.
    • Yes, it can be done, but it's probably not useful. Git also doesn't have it, and it's better to wait for actual demand.

@@ -61,11 +75,14 @@ pub enum Error {
ObjectDecode(#[from] gix_object::decode::Error),
}

use Result as Either;
type QueueKey<T> = Either<T, Reverse<T>>;
Member

It's a first for me to see this usage, and I kind of like it :).
It really does save a lot of boilerplate as well!

Contributor Author

That was the idea, yes. I also wanted to make it clear that this is not an error condition.
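
For readers unfamiliar with the trick discussed in this thread, here is a minimal sketch of using `Result` as a two-variant key type so that one queue type serves both orders. The `QueueKey` alias mirrors the diff; `to_queue_key`, `key_time`, and `pop_order` are hypothetical helpers, the `Newest` alias is assumed as the counterpart of `Oldest`, and `BinaryHeap` stands in for the crate's PriorityQueue. The aliases are spelled with full `std::result` paths here to keep the example self-contained.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// `Result` is used purely as a two-variant sum type; the aliases make it
// clear that neither variant is an error condition.
use std::result::Result as Either;
use std::result::Result::{Err as Oldest, Ok as Newest};

type QueueKey<T> = Either<T, Reverse<T>>;

#[derive(Clone, Copy, Debug)]
enum CommitTimeOrder {
    NewestFirst,
    OldestFirst,
}

/// Hypothetical helper: wrap a commit time in the key variant for `order`.
fn to_queue_key(time: i64, order: CommitTimeOrder) -> QueueKey<i64> {
    match order {
        CommitTimeOrder::NewestFirst => Newest(time),
        CommitTimeOrder::OldestFirst => Oldest(Reverse(time)),
    }
}

/// Hypothetical helper: recover the plain time from either variant.
fn key_time(key: QueueKey<i64>) -> i64 {
    match key {
        Newest(t) => t,
        Oldest(Reverse(t)) => t,
    }
}

/// Push all times with keys for `order`, then pop them from a max-heap.
/// All keys in one queue share the same variant, so only the inner
/// ordering (plain or reversed) decides the pop order.
fn pop_order(times: &[i64], order: CommitTimeOrder) -> Vec<i64> {
    let mut heap: BinaryHeap<QueueKey<i64>> =
        times.iter().map(|&t| to_queue_key(t, order)).collect();
    let mut out = Vec::new();
    while let Some(key) = heap.pop() {
        out.push(key_time(key));
    }
    out
}

fn main() {
    println!("{:?}", pop_order(&[20, 10, 30], CommitTimeOrder::NewestFirst));
    println!("{:?}", pop_order(&[20, 10, 30], CommitTimeOrder::OldestFirst));
}
```

This works because `Result<T, E>` implements `Ord` whenever both `T` and `E` do, so no dedicated enum with hand-written comparison impls is needed.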

@@ -77,10 +94,13 @@ mod init {
use gix_date::SecondsSinceUnixEpoch;
use gix_hash::{oid, ObjectId};
use gix_object::{CommitRefIter, FindExt};
use std::cmp::Reverse;
Member

Creative!

@@ -77,10 +94,13 @@ mod init {
use gix_date::SecondsSinceUnixEpoch;
use gix_hash::{oid, ObjectId};
use gix_object::{CommitRefIter, FindExt};
use std::cmp::Reverse;
use Err as Oldest;
Member

I never dared!

state.queue.insert(ordered_time, commit_id);
}
(Some(cutoff_time), CommitTimeOrder::OldestFirst) if time <= cutoff_time => {
state.queue.insert(ordered_time, commit_id);
Member

Did you invert the logic because it made sense or because it seemed right?

The reason I am curious is:

  • there are no tests for this
  • the way I think this could work with similar semantics is if the traversal would actually start from the back, and move forward in time, something that's not possible with a commitgraph.

My intuition here is to keep the cutoff logic the same, as the traversal direction is always from front to back, or newest to oldest.

Contributor Author

I was considering that as well. I actually wasn't 100% confident either way, so I'm happy to keep it simple unless something breaks. I have a real-world use case for this, so I will let you know. I can also devise some tests for this in a follow-up PR.
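
To make the point of this thread concrete, here is a small sketch of a walk that always starts at the tip and moves toward ancestors, where the sort order only decides which queued commit is visited next. It assumes the cutoff check stays the same in both orders (a parent is queued only if its time is at or after the cutoff), as proposed in the review; this is an illustration, not the merged implementation. `Commit`, `Order`, `sample_graph`, and `walk` are hypothetical names, and `BinaryHeap` stands in for the crate's PriorityQueue.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap, HashSet};

#[derive(Clone, Copy)]
enum Order {
    NewestFirst,
    OldestFirst,
}

struct Commit {
    time: i64,
    parents: Vec<&'static str>,
}

/// A diamond-shaped history: d -> {b, c} -> a, where `a` is the root.
fn sample_graph() -> HashMap<&'static str, Commit> {
    HashMap::from([
        ("a", Commit { time: 10, parents: vec![] }),
        ("b", Commit { time: 20, parents: vec!["a"] }),
        ("c", Commit { time: 30, parents: vec!["a"] }),
        ("d", Commit { time: 40, parents: vec!["b", "c"] }),
    ])
}

/// Walk from `tip` toward ancestors. `order` only affects which queued
/// commit is popped next; the cutoff check is identical for both orders.
fn walk(
    graph: &HashMap<&'static str, Commit>,
    tip: &'static str,
    order: Order,
    cutoff: Option<i64>,
) -> Vec<&'static str> {
    // `Result` serves as a two-variant key; `Reverse` flips the comparison.
    let key = |time: i64| match order {
        Order::NewestFirst => Ok(time),
        Order::OldestFirst => Err(Reverse(time)),
    };
    let mut queue: BinaryHeap<(Result<i64, Reverse<i64>>, &'static str)> = BinaryHeap::new();
    let mut seen = HashSet::new();
    let mut out = Vec::new();

    queue.push((key(graph[tip].time), tip));
    seen.insert(tip);
    while let Some((_, id)) = queue.pop() {
        out.push(id);
        for &parent in &graph[id].parents {
            if !seen.insert(parent) {
                continue; // already queued or visited
            }
            let time = graph[parent].time;
            // Same cutoff rule in both orders: skip commits older than the cutoff.
            if cutoff.map_or(true, |c| time >= c) {
                queue.push((key(time), parent));
            }
        }
    }
    out
}

fn main() {
    let graph = sample_graph();
    println!("{:?}", walk(&graph, "d", Order::NewestFirst, None));
    println!("{:?}", walk(&graph, "d", Order::OldestFirst, None));
    println!("{:?}", walk(&graph, "d", Order::OldestFirst, Some(20)));
}
```

In this toy graph the root `a` is reached as the third commit with oldest-first ordering and as the fourth with newest-first, which is the effect the feature relies on in large histories.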

@Byron Byron force-pushed the traverse/oldest-first branch 2 times, most recently from 3e45be0 to 3f0bcef Compare September 26, 2024 08:22
Member

@Byron Byron left a comment

I think that's the way to go, after removing the inverted cutoff logic essentially and adding tests.

Despite merging this, I am particularly interested to hear if this is the wrong way to go about it, and potential fixes are very welcome in follow-up PRs.

Thanks again for kicking off this feature!

Byron and others added 3 commits September 26, 2024 10:32
…it-graph during traversals.

It's implemented by sorting commits oldest first when choosing the next one to traverse,
which can greatly reduce the time it takes to reach the first commit of a graph.

Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
@Byron Byron merged commit 20f9b3f into GitoxideLabs:main Sep 26, 2024
16 checks passed
nrdxp added a commit to nrdxp/gitoxide that referenced this pull request Oct 3, 2024
Development

Successfully merging this pull request may close these issues.

Support equivalent of GIT_SORT_REVERSE | GIT_SORT_TOPOLOGICAL in rev walk
2 participants