Improve next_cache_entry cache traversal performance #429

Merged
merged 2 commits into microsoft:vfs-2.33.0 from unpack-trees/fast-cache-traversal on Sep 21, 2021

Conversation

@vdye commented on Sep 21, 2021

Changes

  • Update next_cache_entry to accept a "cached" starting position for index search
    • Because next_cache_entry searches the src_index, which shouldn't be modified in-place over the course of unpack_trees, the first search position for the next next_cache_entry call can be last_position + 1

This might also help with git blame performance, if it's using unpack_trees (CC: @ldennington).
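
Concretely, the hinted lookup works roughly as in the sketch below. This is a reconstruction based on the description above and the hunks quoted later in this thread, not the literal patch; the loop body in particular is assumed.

/*
 * Reconstructed sketch of the hinted traversal (loop body assumed).
 * The caller initializes `int hint = -1;` once and passes `&hint` to
 * every call; the search resumes from whichever of cache_bottom and
 * the hint is larger, and the hint records last_position + 1.
 */
static struct cache_entry *next_cache_entry(struct unpack_trees_options *o,
                                            int *hint)
{
        const struct index_state *index = o->src_index;
        int pos = o->cache_bottom;

        if (*hint > pos)
                pos = *hint;

        while (pos < index->cache_nr) {
                struct cache_entry *ce = index->cache[pos];
                if (!(ce->ce_flags & CE_UNPACKED)) {
                        *hint = pos + 1;
                        return ce;
                }
                pos++;
        }

        *hint = pos;
        return NULL;
}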

Performance

This change ultimately only had an effect on sparse index checkouts, as demonstrated by the performance test results:

Test                                              ms/vfs-2.33.0     HEAD                  
------------------------------------------------------------------------------------------
2000.2: git reset -- does-not-exist (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
2000.3: git reset -- does-not-exist (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2% 
2000.4: git reset -- does-not-exist (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
2000.5: git reset -- does-not-exist (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

The reason appears to be that, in full indexes, mark_ce_used will advance cache_bottom such that, in next_cache_entry, already-checked entries are skipped. In sparse indexes, however, the presence of sparse directory entries blocks cache_bottom from advancing, so already-checked cache entries are not skipped in next_cache_entry. Since the cache tree still needs cache_bottom to not advance (per 17a1bb5), the transient hint position lets next_cache_entry shortcut already-checked entries even for a sparse index.
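
For context on the cache_bottom behavior described above, here is a simplified sketch (renamed, not the exact Git source) of how mark_ce_used advances cache_bottom past already-unpacked entries while leaving it alone for a sparse directory entry, per 17a1bb5.

/*
 * Simplified sketch of the behavior described above (not the exact
 * Git source): mark_ce_used() advances cache_bottom past a contiguous
 * run of already-unpacked entries, but returns early for a sparse
 * directory entry so cache_bottom stays put for the cache-tree logic.
 */
static void mark_ce_used_sketch(struct cache_entry *ce,
                                struct unpack_trees_options *o)
{
        ce->ce_flags |= CE_UNPACKED;

        /* sparse directory entries must not move cache_bottom (17a1bb5) */
        if (S_ISSPARSEDIR(ce->ce_mode))
                return;

        if (o->cache_bottom < o->src_index->cache_nr &&
            o->src_index->cache[o->cache_bottom] == ce) {
                int bottom = o->cache_bottom;

                /* skip the contiguous run of already-unpacked entries */
                while (bottom < o->src_index->cache_nr &&
                       o->src_index->cache[bottom]->ce_flags & CE_UNPACKED)
                        bottom++;
                o->cache_bottom = bottom;
        }
}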

@vdye requested a review from derrickstolee on September 21, 2021 18:24
@vdye self-assigned this on Sep 21, 2021
@derrickstolee (Collaborator) commented:

Changes

  • Update next_cache_entry to accept a "cached" starting position for index search

    • Because next_cache_entry searches the src_index, which shouldn't be modified in-place over the course of unpack_trees, the first search position for the next next_cache_entry call can be last_position + 1

I think this is a very promising direction, especially in the short term.

Test                                              ms/vfs-2.33.0     HEAD                  
------------------------------------------------------------------------------------------
2000.2: git reset -- does-not-exist (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
2000.3: git reset -- does-not-exist (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2% 
2000.4: git reset -- does-not-exist (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
2000.5: git reset -- does-not-exist (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

This is great!

The reason appears to be that, in full indexes, mark_ce_used will advance cache_bottom such that, in next_cache_entry, already-checked entries are skipped. In sparse indexes, however, the presence of sparse directory entries blocks cache_bottom from advancing, so already-checked cache entries are not skipped in next_cache_entry. Since the cache tree still needs cache_bottom to not advance (per 17a1bb5), the transient hint position lets next_cache_entry shortcut already-checked entries even for a sparse index.

This is a good point to indicate that cache_bottom is not advancing the same way with sparse directory entries. We might want to dig into that more, since there may be something there that causes other problems in the future. It's probable that the things I did around cache_bottom could be done better.

Let's make a note to explore cache_bottom later (I'm thinking the October timeframe, still with time to respond before the 2.34 release), but this is an excellent short-term step.

@derrickstolee (Collaborator) left a comment

I was worried that the changes here might be short-term and need to be removed in the future, but it seems like adding this "hint memory" to next_cache_entry() is a good thing to do anyway. It avoids quadratic growth in the algorithm!

Just a couple nits.

t/perf/p2000-sparse-operations.sh (review thread resolved)
unpack-trees.c (outdated; review thread resolved)
Comment on lines +1398 to +1402
+        int hint = -1;
         while (1) {
                 int cmp;
                 struct cache_entry *ce;
 
                 if (o->diff_index_cached)
-                        ce = next_cache_entry(o);
+                        ce = next_cache_entry(o, &hint);
Collaborator:

Now that I look closely at your implementation, this old code had quadratic complexity! That helps me understand why having ~70,000 index entries still took 5 seconds to iterate.

I suppose I was expecting hint to be initialized with some information other than -1, but this seems to be a good idea regardless of any issues with the sparse index.
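
For comparison, the pre-change lookup had roughly the shape sketched below (a reconstruction, not the exact old source). With cache_bottom pinned by sparse directory entries, every call rescans the same already-unpacked prefix, so processing n entries costs on the order of n^2/2 flag checks; with n around 70,000 that is roughly 2.4 billion checks, on the same order as the multi-second iteration mentioned above.

/*
 * Reconstructed sketch of the pre-change lookup: when cache_bottom
 * cannot advance, each call walks the same already-unpacked prefix
 * again, so unpacking n entries costs O(n^2) flag checks overall.
 */
static struct cache_entry *next_cache_entry_old(struct unpack_trees_options *o)
{
        const struct index_state *index = o->src_index;
        int pos = o->cache_bottom;      /* always restarts here */

        while (pos < index->cache_nr) {
                struct cache_entry *ce = index->cache[pos];
                if (!(ce->ce_flags & CE_UNPACKED))
                        return ce;
                pos++;
        }
        return NULL;
}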

Intended to test handling of cache entries outside `traverse_trees`
(accomplished here by having the pathspec match no cache entries).

Signed-off-by: Victoria Dye <vdye@github.com>
To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in
full indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent `cache_bottom` from advancing in the sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The need to preserve `cache_bottom` for the sparse index is documented in
17a1bb5, so a separate `hint` position is used to provide the benefit of
"shortcutting" already-searched entries. The hint position is set inside
`next_cache_entry` to `last_searched_position + 1`, allowing full _and_
sparse index iterations to skip already-searched entries. The performance
is significantly improved for the sparse index case, based on the `p2000`
results for a `git reset` with a non-matching pathspec (which heavily
exercises `next_cache_entry`):

Test          ms/vfs-2.33.0     HEAD
------------------------------------------------------
(full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
(full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
(sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
(sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Signed-off-by: Victoria Dye <vdye@github.com>
@vdye merged commit f2bf729 into microsoft:vfs-2.33.0 on Sep 21, 2021
@vdye deleted the unpack-trees/fast-cache-traversal branch on September 21, 2021 20:52
 {
         const struct index_state *index = o->src_index;
         int pos = o->cache_bottom;
 
+        if (*hint > pos)
+                pos = *hint;
+
         while (pos < index->cache_nr) {


Sorry for the retroactive review comments - if it would be better to discuss on a call lmk!

I see index->cache_nr frequently but am having some trouble understanding it/finding docs on what it means (I think you mentioned "cache number", but I'm still not sure why that's significant).

Collaborator:

cache_nr is the number of entries in the cache array.

In the Git codebase, you'll see a lot of X_nr and X_alloc representing that there are X_nr elements in an array currently allocated to have X_alloc entries.
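
As a generic illustration of that convention (plain C with made-up names, not Git code), an array-backed list carries both counters, and an append helper grows the allocation when nr catches up to alloc. In the Git tree this growth step is typically handled by the ALLOC_GROW() macro.

#include <stdlib.h>

/* Generic illustration of the X_nr / X_alloc convention (not Git code). */
struct str_list {
        const char **items;
        unsigned int nr;    /* number of elements currently in use */
        unsigned int alloc; /* number of elements the allocation can hold */
};

static void str_list_append(struct str_list *list, const char *item)
{
        if (list->nr >= list->alloc) {
                /* grow geometrically; error handling omitted for brevity */
                list->alloc = list->alloc ? 2 * list->alloc : 16;
                list->items = realloc(list->items,
                                      list->alloc * sizeof(*list->items));
        }
        list->items[list->nr++] = item;
}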


Thanks!

@@ -665,17 +665,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
         }
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
         const struct index_state *index = o->src_index;
         int pos = o->cache_bottom;


I know cache_bottom is critical to this change, but I'm a little unsure what it means?

Collaborator:

As we scan the index, we are also drilling down the trees, so the cache-tree extension (if available) is giving us some idea of the range of index entries corresponding to a given tree. This can be computed dynamically, too. The cache_bottom value is storing the expected lower-end of this range. As we recurse into subtrees, the cache_bottom is stored in a stack variable somewhere and then the o->cache_bottom value is rewritten as the recursion ends.

In general, the cache_bottom thing is necessary to make these index scans work properly, but it is complicated because the index is a flat list of files. It's just one more thing added on top to try to make sense of the difference between that flat list and the trees in the object database.
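
A rough sketch of that save/restore pattern is below (hypothetical helper and parameter names; the real unpack-trees code is more involved): the current cache_bottom is stashed in a local, pointed at the subtree's range for the recursion, and written back when the recursion for that subtree ends.

/* Hypothetical recursion step, for illustration only. */
static int do_traverse_subtree(struct unpack_trees_options *o,
                               struct tree_desc *subtree);

/*
 * Hypothetical sketch of the save/restore pattern described above
 * (invented names; the real unpack-trees code is more involved).
 */
static int traverse_subtree_sketch(struct unpack_trees_options *o,
                                   struct tree_desc *subtree,
                                   int subtree_start_pos)
{
        int saved_bottom = o->cache_bottom; /* the "stack variable somewhere" */
        int ret;

        /* point cache_bottom at the first index entry covered by this subtree */
        o->cache_bottom = subtree_start_pos;
        ret = do_traverse_subtree(o, subtree);

        /* ...and rewrite it as the recursion for this subtree ends */
        o->cache_bottom = saved_bottom;
        return ret;
}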

dscho pushed a commit that referenced this pull request Oct 30, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Oct 30, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Oct 31, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 10, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 15, 2021
Improve `next_cache_entry` cache traversal performance
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
Improve `next_cache_entry` cache traversal performance