Improve next_cache_entry cache traversal performance #429

Merged
merged 2 commits into microsoft:vfs-2.33.0 from unpack-trees/fast-cache-traversal on Sep 21, 2021

Conversation

@vdye commented on Sep 21, 2021

Changes

  • Update next_cache_entry to accept a "cached" starting position for index search
    • Because next_cache_entry searches the src_index, which shouldn't be modified in-place over the course of unpack_trees, the first search position for the next next_cache_entry call can be last_position + 1

This might also help with git blame performance, if it's using unpack_trees (CC: @ldennington).
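
Concretely, the hinted lookup works roughly as in the sketch below. This is a reconstruction based on the description above and the hunks quoted later in this thread, not the literal patch; the loop body in particular is assumed.

/*
 * Reconstructed sketch of the hinted traversal (loop body assumed).
 * The caller initializes `int hint = -1;` once and passes `&hint` to
 * every call; the search resumes from whichever of cache_bottom and
 * the hint is larger, and the hint records last_position + 1.
 */
static struct cache_entry *next_cache_entry(struct unpack_trees_options *o,
                                            int *hint)
{
        const struct index_state *index = o->src_index;
        int pos = o->cache_bottom;

        if (*hint > pos)
                pos = *hint;

        while (pos < index->cache_nr) {
                struct cache_entry *ce = index->cache[pos];
                if (!(ce->ce_flags & CE_UNPACKED)) {
                        *hint = pos + 1;
                        return ce;
                }
                pos++;
        }

        *hint = pos;
        return NULL;
}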

Performance

This change ultimately only had an effect on sparse index checkouts, as demonstrated by the performance test results:

Test                                              ms/vfs-2.33.0     HEAD                  
------------------------------------------------------------------------------------------
2000.2: git reset -- does-not-exist (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
2000.3: git reset -- does-not-exist (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2% 
2000.4: git reset -- does-not-exist (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
2000.5: git reset -- does-not-exist (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

The reason appears to be that, in full indexes, mark_ce_used will advance cache_bottom such that, in next_cache_entry, already-checked entries are skipped. In sparse indexes, however, the presence of sparse directory entries blocks cache_bottom from advancing, so already-checked cache entries are not skipped in next_cache_entry. Since the cache tree still needs cache_bottom to not advance (per 17a1bb5), the transient hint position lets next_cache_entry shortcut already-checked entries even for a sparse index.
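
For context on the cache_bottom behavior described above, here is a simplified sketch (renamed, not the exact Git source) of how mark_ce_used advances cache_bottom past already-unpacked entries while leaving it alone for a sparse directory entry, per 17a1bb5.

/*
 * Simplified sketch of the behavior described above (not the exact
 * Git source): mark_ce_used() advances cache_bottom past a contiguous
 * run of already-unpacked entries, but returns early for a sparse
 * directory entry so cache_bottom stays put for the cache-tree logic.
 */
static void mark_ce_used_sketch(struct cache_entry *ce,
                                struct unpack_trees_options *o)
{
        ce->ce_flags |= CE_UNPACKED;

        /* sparse directory entries must not move cache_bottom (17a1bb5) */
        if (S_ISSPARSEDIR(ce->ce_mode))
                return;

        if (o->cache_bottom < o->src_index->cache_nr &&
            o->src_index->cache[o->cache_bottom] == ce) {
                int bottom = o->cache_bottom;

                /* skip the contiguous run of already-unpacked entries */
                while (bottom < o->src_index->cache_nr &&
                       o->src_index->cache[bottom]->ce_flags & CE_UNPACKED)
                        bottom++;
                o->cache_bottom = bottom;
        }
}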

@vdye requested a review from derrickstolee on September 21, 2021 18:24
@vdye self-assigned this on Sep 21, 2021
@derrickstolee (Collaborator) commented:

Changes

  • Update next_cache_entry to accept a "cached" starting position for index search

    • Because next_cache_entry searches the src_index, which shouldn't be modified in-place over the course of unpack_trees, the first search position for the next next_cache_entry call can be last_position + 1

I think this is a very promising direction, especially in the short term.

Test                                              ms/vfs-2.33.0     HEAD                  
------------------------------------------------------------------------------------------
2000.2: git reset -- does-not-exist (full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
2000.3: git reset -- does-not-exist (full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2% 
2000.4: git reset -- does-not-exist (sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
2000.5: git reset -- does-not-exist (sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

This is great!

The reason appears to be that, in full indexes, mark_ce_used will advance cache_bottom such that, in next_cache_entry, already-checked entries are skipped. In sparse indexes, however, the presence of sparse directory entries blocks cache_bottom from advancing, so already-checked cache entries are not skipped in next_cache_entry. Since the cache tree still needs cache_bottom to not advance (per 17a1bb5), the transient hint position lets next_cache_entry shortcut already-checked entries even for a sparse index.

This is a good point to indicate that cache_bottom is not advancing the same way with sparse directory entries. We might want to dig into that more, since there may be something there that causes other problems in the future. It's probable that the things I did around cache_bottom could be done better.

Let's make a note to explore cache_bottom later (I'm thinking the October timeframe, still with time to respond before the 2.34 release), but this is an excellent short-term step.

@derrickstolee (Collaborator) left a comment

I was worried that the changes here might be short-term and need to be removed in the future, but it seems like adding this "hint memory" to next_cache_entry() is a good thing to do anyway. It avoids quadratic growth in the algorithm!

Just a couple nits.

t/perf/p2000-sparse-operations.sh (review thread resolved)
unpack-trees.c (outdated; review thread resolved)
Comment on lines +1398 to +1402
+        int hint = -1;
         while (1) {
                 int cmp;
                 struct cache_entry *ce;
 
                 if (o->diff_index_cached)
-                        ce = next_cache_entry(o);
+                        ce = next_cache_entry(o, &hint);
Collaborator:

Now that I look closely at your implementation, this old code had quadratic complexity! That helps me understand why having ~70,000 index entries still took 5 seconds to iterate.

I suppose I was expecting hint to be initialized with some information other than -1, but this seems to be a good idea regardless of any issues with the sparse index.
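
For comparison, the pre-change lookup had roughly the shape sketched below (a reconstruction, not the exact old source). With cache_bottom pinned by sparse directory entries, every call rescans the same already-unpacked prefix, so processing n entries costs on the order of n^2/2 flag checks; with n around 70,000 that is roughly 2.4 billion checks, on the same order as the multi-second iteration mentioned above.

/*
 * Reconstructed sketch of the pre-change lookup: when cache_bottom
 * cannot advance, each call walks the same already-unpacked prefix
 * again, so unpacking n entries costs O(n^2) flag checks overall.
 */
static struct cache_entry *next_cache_entry_old(struct unpack_trees_options *o)
{
        const struct index_state *index = o->src_index;
        int pos = o->cache_bottom;      /* always restarts here */

        while (pos < index->cache_nr) {
                struct cache_entry *ce = index->cache[pos];
                if (!(ce->ce_flags & CE_UNPACKED))
                        return ce;
                pos++;
        }
        return NULL;
}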

Intended to test handling of cache entries outside `traverse_trees`
(accomplished here by having the pathspec match no cache entries).

Signed-off-by: Victoria Dye <vdye@github.com>
To find the first non-unpacked cache entry, `next_cache_entry` iterates
through the index, starting at `cache_bottom`. The performance of this in
full indexes is helped by `cache_bottom` advancing with each invocation of
`mark_ce_used` (called by `unpack_index_entry`). However, the presence of
sparse directories can prevent `cache_bottom` from advancing in the sparse
index case, effectively forcing `next_cache_entry` to search from the
beginning of the index each time it is called.

The need to preserve `cache_bottom` for the sparse index is documented in
17a1bb5, so a separate `hint` position is used to provide the benefit of
"shortcutting" already-searched entries. The hint position is set inside
`next_cache_entry` to `last_searched_position + 1`, allowing full _and_
sparse index iterations to skip already-searched entries. The performance
is significantly improved for the sparse index case, based on the `p2000`
results for a `git reset` with a non-matching pathspec (which heavily
exercises `next_cache_entry`):

Test          ms/vfs-2.33.0     HEAD
------------------------------------------------------
(full-v3)     0.79(0.38+0.30)   0.91(0.43+0.34) +15.2%
(full-v4)     0.80(0.38+0.29)   0.85(0.40+0.35) +6.2%
(sparse-v3)   0.76(0.43+0.69)   0.44(0.08+0.67) -42.1%
(sparse-v4)   0.71(0.40+0.65)   0.41(0.09+0.65) -42.3%

Signed-off-by: Victoria Dye <vdye@github.com>
@vdye merged commit f2bf729 into microsoft:vfs-2.33.0 on Sep 21, 2021
@vdye deleted the unpack-trees/fast-cache-traversal branch on September 21, 2021 20:52
 {
         const struct index_state *index = o->src_index;
         int pos = o->cache_bottom;
 
+        if (*hint > pos)
+                pos = *hint;
+
         while (pos < index->cache_nr) {


Sorry for the retroactive review comments - if it would be better to discuss on a call lmk!

I see index->cache_nr frequently but am having some trouble understanding it/finding docs on what it means (I think you mentioned "cache number", but I'm still not sure why that's significant).

Collaborator:

cache_nr is the number of entries in the cache array.

In the Git codebase, you'll see a lot of X_nr and X_alloc representing that there are X_nr elements in an array currently allocated to have X_alloc entries.
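
As a generic illustration of that convention (plain C with made-up names, not Git code), an array-backed list carries both counters, and an append helper grows the allocation when nr catches up to alloc. In the Git tree this growth step is typically handled by the ALLOC_GROW() macro.

#include <stdlib.h>

/* Generic illustration of the X_nr / X_alloc convention (not Git code). */
struct str_list {
        const char **items;
        unsigned int nr;    /* number of elements currently in use */
        unsigned int alloc; /* number of elements the allocation can hold */
};

static void str_list_append(struct str_list *list, const char *item)
{
        if (list->nr >= list->alloc) {
                /* grow geometrically; error handling omitted for brevity */
                list->alloc = list->alloc ? 2 * list->alloc : 16;
                list->items = realloc(list->items,
                                      list->alloc * sizeof(*list->items));
        }
        list->items[list->nr++] = item;
}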


Thanks!

@@ -665,17 +665,24 @@ static void mark_ce_used_same_name(struct cache_entry *ce,
         }
 }
 
-static struct cache_entry *next_cache_entry(struct unpack_trees_options *o)
+static struct cache_entry *next_cache_entry(struct unpack_trees_options *o, int *hint)
 {
         const struct index_state *index = o->src_index;
         int pos = o->cache_bottom;


I know cache_bottom is critical to this change, but I'm a little unsure what it means?

Collaborator:

As we scan the index, we are also drilling down the trees, so the cache-tree extension (if available) is giving us some idea of the range of index entries corresponding to a given tree. This can be computed dynamically, too. The cache_bottom value is storing the expected lower-end of this range. As we recurse into subtrees, the cache_bottom is stored in a stack variable somewhere and then the o->cache_bottom value is rewritten as the recursion ends.

In general, the cache_bottom thing is necessary to make these index scans work properly, but it is complicated because the index is a flat list of files. It's just one more thing added on top to try to make sense of the difference between that flat list and the trees in the object database.
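
A rough sketch of that save/restore pattern is below (hypothetical helper and parameter names; the real unpack-trees code is more involved): the current cache_bottom is stashed in a local, pointed at the subtree's range for the recursion, and written back when the recursion for that subtree ends.

/* Hypothetical recursion step, for illustration only. */
static int do_traverse_subtree(struct unpack_trees_options *o,
                               struct tree_desc *subtree);

/*
 * Hypothetical sketch of the save/restore pattern described above
 * (invented names; the real unpack-trees code is more involved).
 */
static int traverse_subtree_sketch(struct unpack_trees_options *o,
                                   struct tree_desc *subtree,
                                   int subtree_start_pos)
{
        int saved_bottom = o->cache_bottom; /* the "stack variable somewhere" */
        int ret;

        /* point cache_bottom at the first index entry covered by this subtree */
        o->cache_bottom = subtree_start_pos;
        ret = do_traverse_subtree(o, subtree);

        /* ...and rewrite it as the recursion for this subtree ends */
        o->cache_bottom = saved_bottom;
        return ret;
}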

dscho pushed a commit that referenced this pull request Oct 30, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Oct 30, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Oct 31, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 4, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 10, 2021
Improve `next_cache_entry` cache traversal performance
derrickstolee pushed a commit that referenced this pull request Nov 15, 2021
Improve `next_cache_entry` cache traversal performance
ldennington pushed a commit to ldennington/git that referenced this pull request Jan 19, 2022
Improve `next_cache_entry` cache traversal performance