Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse Index: diff and blame builtins #1050

Conversation

ldennington
Copy link

@ldennington ldennington commented Oct 1, 2021

This series is based on vd/sparse-reset. It integrates the sparse index with git diff and git blame and includes:

  1. tests added to t1092 and p2000 to establish the baseline functionality of the commands
  2. repository settings to enable the sparse index

The p2000 tests demonstrate a ~44% execution time reduction for 'git diff' and a ~86% execution time reduction for 'git diff --staged' using a sparse index. For 'git blame', the reduction time was ~60% for a file two levels deep and ~30% for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.30: git diff (full-v3)                  0.33    0.34 +3.0%
2000.31: git diff (full-v4)                  0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)                0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)                0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)         0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)         0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)       0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)       0.23    0.03 -87.0%
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

Changes since V1

  • Fix failing diff partially-staged test in t1092-sparse-checkout-compatibility.sh, which was breaking in seen.

Changes since V2

  • Update diff commit description to include patches that make the checkout and status commands work with the sparse index for readers to reference.
  • Add new test case to verify diff behaves as expected when run against files outside the sparse checkout cone.
  • Indent error message in blame commit
  • Check error message in blame with pathspec outside sparse definition test matches expectations.
  • Loop blame tests (instead of running the same command multiple time against different files).

Changes since V3

  • Update diff p2000 tests to use --cached instead of --staged. Execute new run and update results in commit description and cover letter.
  • Update comment on blame with pathspec outside sparse definition test in t1092-sparse-checkout-compatibility.sh to clarify that it tests the current state and could be improved in the future.
  • Ensure sparse index is only activated when diff is running against files in a Git repo.
  • BUG if prepare_repo_settings() is called outside a repository.
  • Ensure sparse index is not activated for calls to blame, checkout, or pack-object with -h.
  • Ensure commit-graph is only loaded if a git directory exists.

Changes since V4

  • Remove startup_info->have_repository check from checkout, pack-objects, and blame. Update git.c to no longer bypass setup when -h is passed instead.
  • Move commit-graph, test-read-cache, and repo-settings changes into their own patches with details in commit description of why the changes are being made.
  • Update t1092-sparse-checkout-compatibility.sh tests to use --cached instead of --staged.
  • Use 10-character hash abbreviations for commits referenced in diff commit message.
  • Clarify that being unable to blame files outside the working directory is not supported in either sparse or non-sparse checkouts both in comment on blame with pathspec outside sparse definition test in t1092-sparse-checkout-compatibility.sh and blame commit message.

Changes since V5

  • Fix commit message typo.
  • Re-add blank line to separate variable declarations from statements in run_builtin.
  • Refactor prefix NULL assignment in run_builtin.

Thanks,
Lessley

cc: stolee@gmail.com
cc: gitster@pobox.com
cc: newren@gmail.com
cc: Taylor Blau me@ttaylorr.com

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 1, 2021

Welcome to GitGitGadget

Hi @ldennington, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@gitgitgadget gitgitgadget bot added the new user label Oct 1, 2021
@derrickstolee
Copy link

/allow

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 4, 2021

User ldennington is now allowed to use GitGitGadget.

Copy link

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just a couple nits to clean up before submitting. When @vdye's #1048 gets a tracking branch, we can rebase onto that and target it in this PR. We might still want to delay for that one to stabilize, but we can do the prep work early.

builtin/diff.c Outdated Show resolved Hide resolved
t/t1092-sparse-checkout-compatibility.sh Show resolved Hide resolved
@ldennington ldennington force-pushed the diff-blame-sparse-index branch 3 times, most recently from 8aca481 to 506e41b Compare October 6, 2021 14:15
@derrickstolee derrickstolee changed the base branch from master to vd/sparse-reset October 13, 2021 19:44
@ldennington ldennington changed the base branch from vd/sparse-reset to master October 13, 2021 20:05
@ldennington
Copy link
Author

/preview

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 13, 2021

Preview email sent as pull.1050.git.1634155628.gitgitgadget@gmail.com

@ldennington ldennington changed the base branch from master to vd/sparse-reset October 13, 2021 20:24
@ldennington ldennington force-pushed the diff-blame-sparse-index branch 2 times, most recently from 3f0a15b to 6b0bccd Compare October 13, 2021 22:29
@ldennington
Copy link
Author

/preview

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

Preview email sent as pull.1050.git.1634229968.gitgitgadget@gmail.com

@ldennington
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

HttpError: Server Error

@derrickstolee
Copy link

While the PR handler did something strange with the error message, your patches were emailed:

https://lore.kernel.org/git/pull.1050.git.1634232352.gitgitgadget@gmail.com/

@dscho
Copy link
Member

dscho commented Oct 14, 2021

HttpError: Server Error

Here is where that error comes from: https://dev.azure.com/gitgitgadget/git/_build/results?buildId=112911&view=logs&j=fd490c07-0b22-5182-fac9-6d67fe1e939b&t=28e14b79-1d95-5195-ee03-3a68ac48a418&l=26. The comment should have read thusly:

Submitted as pull.1050.git.1634232352.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v1

To fetch this version to local tag pr-1050/ldennington/diff-blame-sparse-index-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1050/ldennington/diff-blame-sparse-index-v1

@ldennington
Copy link
Author

HttpError: Server Error

Here is where that error comes from: https://dev.azure.com/gitgitgadget/git/_build/results?buildId=112911&view=logs&j=fd490c07-0b22-5182-fac9-6d67fe1e939b&t=28e14b79-1d95-5195-ee03-3a68ac48a418&l=26. The comment should have read thusly:

Submitted as pull.1050.git.1634232352.gitgitgadget@gmail.com
To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v1

To fetch this version to local tag pr-1050/ldennington/diff-blame-sparse-index-v1:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1050/ldennington/diff-blame-sparse-index-v1

Thanks! I will admit I saw it and panicked but then realized the series had made it through and un-panicked 🙂.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

This branch is now known as ld/sparse-diff-blame.

@gitgitgadget
Copy link

gitgitgadget bot commented Oct 14, 2021

This patch series was integrated into seen via git@8ce48c2.

@gitgitgadget gitgitgadget bot added the seen label Oct 14, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Oct 15, 2021

There was a status update in the "New Topics" section about the branch ld/sparse-diff-blame on the Git mailing list:

Teach diff and blame to work well with sparse index.

Seems to break tests (e.g. 1092) when merged to 'seen'.

@dscho
Copy link
Member

dscho commented Oct 15, 2021

Seems to break tests (e.g. 1092) when merged to 'seen'.

Hrm. @ldennington have you seen this?

ldennington and others added 6 commits December 6, 2021 08:56
Return early if git directory does not exist. This will protect against
test failures in the upcoming change to BUG in prepare_repo_settings if no
git directory exists.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Move repo setup to occur after git directory is set up. This will protect
against test failures in the upcoming change to BUG in
prepare_repo_settings if no git directory exists.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Check whether git directory exists before adding any repo settings. If it
does not exist, BUG with the message that one cannot add settings for an
uninitialized repository. If it does exist, proceed with adding repo
settings.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Replace uses of the synonym --staged in t1092 tests with --cached (which
is the real and preferred option). This will allow consistency in the new
tests to be added with the upcoming change to enable the sparse index for
diff.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Enable the sparse index within the 'git diff' command. Its implementation
already safely integrates with the sparse index because it shares code
with the 'git status' and 'git checkout' commands that were already
integrated.  For more details see:

d76723e (status: use sparse-index throughout, 2021-07-14)
1ba5f45 (checkout: stop expanding sparse indexes, 2021-06-29)

The most interesting thing to do is to add tests that verify that 'git
diff' behaves correctly when the sparse index is enabled. These cases are:

1. The index is not expanded for 'diff' and 'diff --staged'
2. 'diff' and 'diff --staged' behave the same in full checkout, sparse
checkout, and sparse index repositories in the following partially-staged
scenarios (i.e. the index, HEAD, and working directory differ at a given
path):
    1. Path is within sparse-checkout cone
    2. Path is outside sparse-checkout cone
    3. A merge conflict exists for paths outside sparse-checkout cone

The `p2000` tests demonstrate a ~44% execution time reduction for 'git
diff' and a ~86% execution time reduction for 'git diff --staged' using a
sparse index:

Test                                      before  after
-------------------------------------------------------------
2000.30: git diff (full-v3)               0.33    0.34 +3.0%
2000.31: git diff (full-v4)               0.33    0.35 +6.1%
2000.32: git diff (sparse-v3)             0.53    0.31 -41.5%
2000.33: git diff (sparse-v4)             0.54    0.29 -46.3%
2000.34: git diff --cached (full-v3)      0.07    0.07 +0.0%
2000.35: git diff --cached (full-v4)      0.07    0.08 +14.3%
2000.36: git diff --cached (sparse-v3)    0.28    0.04 -85.7%
2000.37: git diff --cached (sparse-v4)    0.23    0.03 -87.0%

Co-authored-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Enable the sparse index for the 'git blame' command. The index was already
not expanded with this command, so the most interesting thing to do is to
add tests that verify that 'git blame' behaves correctly when the sparse
index is enabled and that its performance improves. More specifically, these
cases are:

1. The index is not expanded for 'blame' when given paths in the sparse
checkout cone at multiple levels.

2. Performance measurably improves for 'blame' with sparse index when given
paths in the sparse checkout cone at multiple levels.

The `p2000` tests demonstrate a ~60% execution time reduction when running
'blame' for a file two levels deep and and a ~30% execution time reduction
for a file three levels deep.

Test                                         before  after
----------------------------------------------------------------
2000.62: git blame f2/f4/a (full-v3)         0.31    0.32 +3.2%
2000.63: git blame f2/f4/a (full-v4)         0.29    0.31 +6.9%
2000.64: git blame f2/f4/a (sparse-v3)       0.55    0.23 -58.2%
2000.65: git blame f2/f4/a (sparse-v4)       0.57    0.23 -59.6%
2000.66: git blame f2/f4/f3/a (full-v3)      0.77    0.85 +10.4%
2000.67: git blame f2/f4/f3/a (full-v4)      0.78    0.81 +3.8%
2000.68: git blame f2/f4/f3/a (sparse-v3)    1.07    0.72 -32.7%
2000.99: git blame f2/f4/f3/a (sparse-v4)    1.05    0.73 -30.5%

We do not include paths outside the sparse checkout cone because blame
does not support blaming files that are not present in the working
directory. This is true in both sparse and full checkouts.

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
@ldennington
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 6, 2021

Submitted as pull.1050.v6.git.1638806161.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git pr-1050/ldennington/diff-blame-sparse-index-v6

To fetch this version to local tag pr-1050/ldennington/diff-blame-sparse-index-v6:

git fetch --no-tags https://github.com/gitgitgadget/git tag pr-1050/ldennington/diff-blame-sparse-index-v6

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 6, 2021

This patch series was integrated into seen via git@f07b0fe.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 7, 2021

This patch series was integrated into seen via git@8e8ef40.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 7, 2021

This patch series was integrated into seen via git@fd11ee3.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 8, 2021

This patch series was integrated into seen via git@f25d949.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 8, 2021

There was a status update in the "Cooking" section about the branch ld/sparse-diff-blame on the Git mailing list:

Teach diff and blame to work well with sparse index.

Will merge to 'next'?
source: <pull.1050.v6.git.1638806161.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 8, 2021

This patch series was integrated into seen via git@83f9bb2.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 9, 2021

This patch series was integrated into seen via git@2b3bd97.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 9, 2021

This patch series was integrated into seen via git@413027d.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 11, 2021

There was a status update in the "Cooking" section about the branch ld/sparse-diff-blame on the Git mailing list:

Teach diff and blame to work well with sparse index.

Will merge to 'master'.
source: <pull.1050.v6.git.1638806161.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 14, 2021

This patch series was integrated into seen via git@ce2086f.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 14, 2021

This patch series was integrated into next via git@57b8993.

@gitgitgadget gitgitgadget bot added the next label Dec 14, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@2384e12.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 15, 2021

This patch series was integrated into seen via git@f874499.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 16, 2021

There was a status update in the "Cooking" section about the branch ld/sparse-diff-blame on the Git mailing list:

Teach diff and blame to work well with sparse index.

Will merge to 'master'.
source: <pull.1050.v6.git.1638806161.gitgitgadget@gmail.com>

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into seen via git@8d2c373.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into next via git@8d2c373.

@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

This patch series was integrated into master via git@8d2c373.

@gitgitgadget gitgitgadget bot added the master label Dec 21, 2021
@gitgitgadget gitgitgadget bot closed this Dec 21, 2021
@gitgitgadget
Copy link

gitgitgadget bot commented Dec 21, 2021

Closed via 8d2c373.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants