Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make 'ort' the default merge strategy #404

Merged
merged 12 commits into from
Aug 4, 2021

Conversation

derrickstolee
Copy link
Collaborator

Replaces #400, which I was using for testing.

The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across microsoft/git, it will be a critical step to allowing git merge to work quickly with sparse index. In my testing of a prototype, I was able to get git merge commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.

@derrickstolee derrickstolee self-assigned this Aug 2, 2021
When --rebase-merges was first introduced, it only worked with the
`recursive` strategy.  Some time later, it gained support for merges
using the `octopus` strategy.  The limitation of only supporting these
two strategies was documented in 25cff9f ("rebase -i --rebase-merges:
add a section to the man page", 2018-04-25) and lifted in e145d99
("rebase -r: support merge strategies other than `recursive`",
2019-07-31).  However, when the limitation was lifted, the documentation
was not updated.  Update it now.

Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
…zations

In commit 0c4fd73 ("Move computation of dir_rename_count from
merge-ort to diffcore-rename", 2021-02-27), much of the logic for
computing directory renames moved into diffcore-rename.
directory-rename-detection.txt had claims that all of that logic was
found in merge-recursive.  Update the documentation.

Signed-off-by: Elijah Newren <newren@gmail.com>
A few places in the documentation referred to the "`recursive` strategy"
using the phrase "`git merge-recursive`", suggesting that it was forking
subprocesses to call a toplevel builtin.  Perhaps that was relevant to
when rebase was a shell script, but it seems like a rather indirect way
to refer to the `recursive` strategy.  Simplify the references.

Signed-off-by: Elijah Newren <newren@gmail.com>
It is probably helpful to cover the default merge strategy first, so
move the text for the resolve strategy to later in the document.

Further, the wording for "resolve" claimed that it was "considered
generally safe and fast", which might imply in some readers minds that
the same is not true of other strategies.  Rather than adding this text
to all the strategies, just remove it from this one.

Signed-off-by: Elijah Newren <newren@gmail.com>
Stating that the recursive strategy "currently cannot make use of
detected copies" implies that this is a technical shortcoming of the
current algorithm.  I disagree with that.  I don't see how copies could
possibly be used in a sane fashion in a merge algorithm -- would we
propagate changes in one file on one side of history to each copy of
that file when merging?  That makes no sense to me.  I cannot think of
anything else that would make sense either.  Change the wording to
simply state that we ignore any copies.

Signed-off-by: Elijah Newren <newren@gmail.com>
…orithm

We already have diff-algorithm that explains why there are special diff
algorithms, so we do not need to re-explain patience.  patience exists
as its own toplevel option for historical reasons, but there's no reason
to give it special preference or document it again and suggest it's more
important than other diff algorithms, so just refer to it as a
deprecated shorthand for `diff-algorithm=patience`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Commit 58634db ("rebase: Allow merge strategies to be used when
rebasing", 2006-06-21) added the --merge option to git-rebase so that
renames could be detected (at least when using the `recursive` merge
backend).  However, git-am -3 gained that same ability in commit
579c9bb ("Use merge-recursive in git-am -3.", 2006-12-28).  As such,
the comment about being able to detect renames is not particularly
noteworthy.  Remove it.  While tweaking this description, add a quick
comment about when --merge became the default.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
There were two locations in the code that referred to 'merge-recursive'
but which were also applicable to 'merge-ort'.  Update them to more
general wording.

Signed-off-by: Elijah Newren <newren@gmail.com>
There are a few reasons to switch the default:
  * Correctness
  * Extensibility
  * Performance

I'll provide some summaries about each.

=== Correctness ===

The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design.  The success with this goal
is perhaps most easily demonstrated by running the following:

  $ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
  $ git grep test_expect_merge_algorithm.failure.success t/
  $ git grep test_expect_merge_algorithm.success.failure t/

In order, these greps show:

  * Seven sets of submodule tests (10 total tests) that fail with
    recursive but succeed with ort
  * 22 other tests that fail with recursive, but succeed with ort
  * 0 tests that pass with recursive, but fail with ort

=== Extensibility ===

Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:

  * Merging, cherry-picking, rebasing, reverting in bare repositories...
    or just on branches that aren't checked out.

  * `git diff AUTO_MERGE` -- ability to see what changes the user has
    made to resolve conflicts so far (see commit 5291828 ("merge-ort:
    write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)

  * A --remerge-diff option for log/show, used to show diffs for merges
    that display the difference between what an automatic merge would
    have created and what was recorded in the merge.  (This option will
    often result in an empty diff because many merges are clean, but for
    the non-clean ones it will show how conflicts were fixed including
    the removal of conflict markers, and also show additional changes
    made outside of conflict regions to e.g. fix semantic conflicts.)

  * A --remerge-diff-only option for log/show, similar to --remerge-diff
    but also showing how cherry-picks or reverts differed from what an
    automatic cherry-pick or revert would provide.

The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.

=== Performance ===

I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
few-renames:      18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
few-renames:        1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565

And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3

Signed-off-by: Elijah Newren <newren@gmail.com>
Make it clear that `ort` is the default merge strategy now rather than
`recursive`, including moving `ort` to the front of the list of merge
strategies.

Signed-off-by: Elijah Newren <newren@gmail.com>
@derrickstolee
Copy link
Collaborator Author

This version is updated with Elijah's latest versions on the mailing list, including a rearrangement of the doc updates.

@derrickstolee derrickstolee merged commit bb8d8c3 into microsoft:vfs-2.32.0 Aug 4, 2021
derrickstolee added a commit that referenced this pull request Aug 4, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 5, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
dscho pushed a commit that referenced this pull request Aug 5, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 9, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 9, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 9, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 12, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
derrickstolee added a commit that referenced this pull request Aug 17, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
dscho pushed a commit that referenced this pull request Oct 30, 2021
The 'ort' strategy is a new algorithm to replace the 'recursive' merge strategy. I've been reviewing some of the performance patches upstream, many of which are already in Git 2.32.0 (more coming in 2.33.0) and even with the ones already included, it is a clear performance win for our large repos.

I tested on the Office monorepo and consistently saw merge times in the 5-6 second range. With the 'recursive' strategy, these would range from 7-20 seconds. My tests reproduced merges found within the commit history, and the ones that succeeded without conflicts matched the committed changes. There were even a few where the 'recursive' strategy did not resolve to the committed change, but the 'ort' version did (probably because of better rename detection).

Not only is this a beneficial performance change for our users across `microsoft/git`, it will be a critical step to allowing `git merge` to work quickly with sparse index. In my testing of a prototype, I was able to get `git merge` commands with sparse index and the 'ort' strategy down to 0.5-1.5 seconds in most cases. (Cases with a merge conflict outside of the sparse-checkout definition jumped back up to the 6-7 second range, which is expected, and should be rare.)

cc: @newren for awareness. Thanks for the patches! These were applied from those sent to the list via git#1055.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants