feat: add option to traverse commits from oldest to newest #1610
eb9cde8 to f99175e
minimum viable diff taken from: GitoxideLabs#1610
We have submitted a patch upstream to make iteration over git histories more versatile: GitoxideLabs/gitoxide#1610. In particular, this change introduces the ability to traverse using an oldest-commit-first strategy, reducing the time it takes to determine the repository's root commit to a few milliseconds, as opposed to several seconds in large histories. This change is critical for us to maintain, which is why we are pushing the expanded API upstream: it makes calculating and addressing our atoms trivial and inexpensive.
This change introduces an enum to control commit traversal order. Users can now choose between newest-first or oldest-first traversal. The default behavior remains newest-first, but it can be toggled by passing a CommitTimeOrder to a Sorting::ByCommitTime* variant. This feature is particularly useful for searching early repository history. The implementation remains largely agnostic to this change, with only minor logic adjustments in key areas as necessary. The reversed order is achieved by inverting the PriorityQueue key when an oldest-first traversal is requested.
Thanks a lot for setting this up - it's great to see that this sorting mode alone can make such a difference!
I will keep working on this PR until it's ready for merging, by adding some tests and probably fixing one logic issue I am seeing. But tests should say what's right or wrong here.
Tasks
- refactor
- see if topo-traversal should also implement this. That way the CommitTimeOrder could be shared and move up one level.
  - Yes, it can be done, but it's probably not useful. Git also doesn't have it, and it's better to wait for actual demand.
@@ -61,11 +75,14 @@ pub enum Error {
ObjectDecode(#[from] gix_object::decode::Error),
}

use Result as Either;
type QueueKey<T> = Either<T, Reverse<T>>;
It's a first for me to see this usage, and I kind of like it :).
It really does save a lot of boilerplate as well!
That was the idea, yes. I also wanted to make it clear that this is not an error condition.
@@ -77,10 +94,13 @@ mod init {
use gix_date::SecondsSinceUnixEpoch;
use gix_hash::{oid, ObjectId};
use gix_object::{CommitRefIter, FindExt};
use std::cmp::Reverse;
Creative!
@@ -77,10 +94,13 @@ mod init {
use gix_date::SecondsSinceUnixEpoch;
use gix_hash::{oid, ObjectId};
use gix_object::{CommitRefIter, FindExt};
use std::cmp::Reverse;
use Err as Oldest;
I never dared!
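The aliasing trick discussed in the comments above can be tried in isolation. The following is a minimal sketch, not the PR's actual code: `Newest` as the counterpart to `Oldest`, and the two helper functions, are assumptions made for illustration.

```rust
use std::cmp::Reverse;

// `Result` reused as a plain two-variant sum type, as in the diff above.
// Neither variant signals an error here.
use Result as Either;
type QueueKey<T> = Either<T, Reverse<T>>;

// Readable names for the two variants. `Oldest` appears in the diff;
// `Newest` is an assumed counterpart.
use Err as Oldest;
use Ok as Newest;

/// Hypothetical helper: key that sorts later times as "greater".
fn newest_first(time: i64) -> QueueKey<i64> {
    Newest(time)
}

/// Hypothetical helper: key that sorts earlier times as "greater".
fn oldest_first(time: i64) -> QueueKey<i64> {
    Oldest(Reverse(time))
}
```

Because a single traversal only ever builds keys of one variant, comparisons always happen within that variant, so the relative order of `Ok` and `Err` never comes into play.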
gix-traverse/src/commit/simple.rs
Outdated
state.queue.insert(ordered_time, commit_id);
}
(Some(cutoff_time), CommitTimeOrder::OldestFirst) if time <= cutoff_time => {
state.queue.insert(ordered_time, commit_id);
Did you invert the logic because it made sense or because it seemed right?
The reason I am curious is:
- there are no tests for this
- the way I think this could work with similar semantics is if the traversal would actually start from the back, and move forward in time, something that's not possible with a commitgraph.
My intuition here is to keep the cutoff logic the same, as the traversal direction is always from front to back, or newest to oldest.
I was considering that as well. I actually wasn't 100% confident either way, so I'm happy to keep it simple unless something breaks. I have a real-world use case for this, so I will let you know. I can also devise some tests for this in a follow-up PR.
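To make the reviewer's suggestion concrete, here is a rough sketch of a cutoff check that does not depend on the sort order. The function name and the `>=` comparison are assumptions for illustration and are not taken from the PR.

```rust
/// Hypothetical helper: decide whether a commit may be queued.
/// Assumption: the cutoff keeps commits whose time is at or after `cutoff`,
/// regardless of whether the queue pops newest or oldest first, because the
/// walk always moves from a tip towards its ancestors.
fn passes_cutoff(time: i64, cutoff: Option<i64>) -> bool {
    match cutoff {
        Some(cutoff_time) => time >= cutoff_time,
        None => true,
    }
}
```

Under this reading, the ordering only affects which queued commit is visited next, not which commits are eligible to be queued.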
3e45be0 to 3f0bcef
I think that's the way to go, after removing the inverted cutoff logic essentially and adding tests.
Despite merging this, I am particularly interested to hear if this is the wrong way to go about it, and potential fixes are very welcome in follow-up PRs.
Thanks again for kicking off this feature!
…it-graph during traversals. It's implemented by sorting commits oldest first when choosing the next one to traverse, which can greatly reduce the time it takes to reach the first commit of a graph. Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
3f0bcef to 6862c27
This change introduces an enum to control commit traversal order for Simple. Users can now choose between newest-first or oldest-first traversal. The default behavior remains newest-first, but it can be toggled by passing a CommitTimeOrder to a Sorting::ByCommitTime* variant.
This feature is particularly useful for searching early repository history, which should be orders of magnitude faster with this variant. In my benchmarking, I was able to identify the original commit in a large mono-repo source tree in 12 ms using the new OldestFirst variant, down from 2.6 s with NewestFirst.
The implementation logic remains largely agnostic to this change, with only minor adjustments in key areas as necessary.
The reversed order is achieved by inverting the PriorityQueue key with std::cmp::Reverse when an oldest-first traversal is requested. This is what allows the bulk of the original logic to remain unchanged.
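As an illustration of the key inversion described above, the sketch below walks a toy commit graph with `std::collections::BinaryHeap` standing in for the PriorityQueue used by the real implementation. `Order`, `Graph`, `walk`, and `sample_graph` are hypothetical names, and the graph data is invented for the example.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap, HashSet};

/// Hypothetical stand-in for the PR's `CommitTimeOrder`.
#[derive(Clone, Copy)]
enum Order {
    NewestFirst,
    OldestFirst,
}

/// Toy commit graph: id -> (commit time, parent ids).
type Graph = HashMap<u32, (i64, Vec<u32>)>;

/// Queue key: `Ok(time)` pops the largest time first,
/// `Err(Reverse(time))` pops the smallest time first.
type Key = Result<i64, Reverse<i64>>;

fn key(time: i64, order: Order) -> Key {
    match order {
        Order::NewestFirst => Ok(time),
        Order::OldestFirst => Err(Reverse(time)),
    }
}

/// Walk from `tip` towards the ancestors; only the key differs per order.
fn walk(graph: &Graph, tip: u32, order: Order) -> Vec<u32> {
    let mut heap = BinaryHeap::new();
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    heap.push((key(graph[&tip].0, order), tip));
    seen.insert(tip);
    while let Some((_, id)) = heap.pop() {
        out.push(id);
        for &parent in &graph[&id].1 {
            // Queue each parent once, keyed by its commit time.
            if seen.insert(parent) {
                heap.push((key(graph[&parent].0, order), parent));
            }
        }
    }
    out
}

/// Invented data: commit 4 merges 2 and 3, which both descend from root 1.
fn sample_graph() -> Graph {
    HashMap::from([
        (1, (10, vec![])),
        (2, (20, vec![1])),
        (3, (30, vec![1])),
        (4, (40, vec![2, 3])),
    ])
}
```

On this toy graph, `walk(&sample_graph(), 4, Order::NewestFirst)` visits the root last, while `Order::OldestFirst` reaches it one step earlier by following the older parent first. The loop body is identical in both cases, which mirrors the point that most of the original logic can stay unchanged.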