cache-tree: integrate with sparse directory entries
The cache-tree extension was previously disabled with sparse indexes.
However, the cache-tree is an important performance feature for commands
like 'git status' and 'git add'. Integrate it with sparse directory
entries.

When writing a sparse index, completely clear and recalculate the
cache-tree. By starting from scratch, the only integration necessary is
to check whether we hit a sparse directory entry and, if so, create a
leaf of the cache-tree that has an entry_count of one and no subtrees.
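
To make the leaf rule concrete, here is a toy model in C. It is a sketch, not Git's update_one(): `struct entry`, its `sparse_dir` flag (a stand-in for the S_ISSPARSEDIR() check), and `count_tree()` are hypothetical names, OIDs are left out, and it assumes that a sparse directory entry's name is its path with a trailing '/'. It walks a sorted list of paths, recursing into each subdirectory, and when the first entry of a region is a sparse directory whose name exactly equals the current base, it records a leaf with entry_count 1 and no subtrees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified index entry (Git uses struct cache_entry). */
struct entry {
	const char *name;	/* full path; assumed: sparse dirs end in '/' */
	int sparse_dir;		/* stand-in for the S_ISSPARSEDIR() check */
};

/*
 * Count the index entries covered by the tree named 'base' (baselen
 * bytes, including its trailing '/'), starting at entries[0].
 * Prints each tree node, children before parents, and returns the
 * number of entries consumed.
 */
static int count_tree(const struct entry *entries, int nr,
		      const char *base, int baselen, int depth)
{
	int i = 0;

	/*
	 * Leaf rule: a sparse directory entry whose name equals 'base'
	 * is a leaf with entry_count 1 and no subtrees.
	 */
	if (nr > 0 && entries[0].sparse_dir &&
	    (int)strlen(entries[0].name) == baselen &&
	    !strncmp(entries[0].name, base, baselen)) {
		printf("%*stree \"%.*s\": entry_count=1 (sparse leaf, no subtrees)\n",
		       2 * depth, "", baselen, base);
		return 1;
	}

	while (i < nr) {
		const char *name = entries[i].name;
		const char *slash;
		int sub;

		if (strncmp(name, base, baselen))
			break;	/* left the directory named by 'base' */
		slash = strchr(name + baselen, '/');
		if (!slash) {
			i++;	/* a file directly inside 'base' */
			continue;
		}
		/* Recurse into the subdirectory "<base><sub>/". */
		sub = count_tree(entries + i, nr - i, name,
				 (int)(slash - name) + 1, depth + 1);
		assert(sub > 0);	/* progress: the subtree covers entries[i] */
		i += sub;
	}
	printf("%*stree \"%.*s\": entry_count=%d\n",
	       2 * depth, "", baselen, base, i);
	return i;
}
```

Calling count_tree() with an empty base on the sorted entries README.md, docs/ (sparse), src/a.c, src/lib/x.c, src/lib/y.c and vendor/ (sparse) returns 6: each sparse directory contributes exactly one entry to its parent, no matter how many files it stands for at HEAD.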

Once the cache-tree exists within a sparse index, we finally get
improved performance. I test the sparse index performance using a
private monorepo with over 2.1 million files at HEAD, but with a
sparse-checkout definition that has only 68,000 paths in the populated
cone. The sparse index has about 2,000 sparse directory entries. I
compare three scenarios:

 1. Use the full index. The index size is ~186 MB.
 2. Use the sparse index. The index size is ~5.5 MB.
 3. Use a commit where HEAD matches the populated set. The full index
    size is ~5.3 MB.

The third benchmark is included as a theoretical optimum for a
repository with the same object database.

First, a clean 'git status' improves from 3.1s to 240ms.

Benchmark #1: full index (git status)
  Time (mean ± σ):      3.167 s ±  0.036 s    [User: 2.006 s, System: 1.078 s]
  Range (min … max):    3.100 s …  3.208 s    10 runs

Benchmark #2: sparse index (git status)
  Time (mean ± σ):     239.5 ms ±   8.1 ms    [User: 189.4 ms, System: 226.8 ms]
  Range (min … max):   226.0 ms … 251.9 ms    13 runs

Benchmark #3: small tree (git status)
  Time (mean ± σ):     195.3 ms ±   4.5 ms    [User: 116.5 ms, System: 84.4 ms]
  Range (min … max):   188.8 ms … 202.8 ms    15 runs
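
The headline numbers follow directly from the mean times above. This small C sketch, using the hypothetical helper names `speedup()`, `gap_ms()` and `summarize()`, makes the arithmetic explicit: 3167 ms / 239.5 ms is roughly a 13x speedup over the full index, while the sparse index still trails the small-tree optimum by 239.5 - 195.3, or about 44 ms.

```c
#include <assert.h>
#include <stdio.h>

/* How many times faster 'after' is than 'before' (both in ms). */
static double speedup(double before_ms, double after_ms)
{
	assert(after_ms > 0.0);
	return before_ms / after_ms;
}

/* Remaining gap between a measured time and a reference optimum (ms). */
static double gap_ms(double measured_ms, double optimum_ms)
{
	return measured_ms - optimum_ms;
}

/* Print a one-line summary from the three hyperfine means. */
static void summarize(const char *cmd, double full_ms, double sparse_ms,
		      double small_ms)
{
	printf("%s: sparse index is %.1fx faster than the full index, "
	       "%.1f ms slower than the small tree\n",
	       cmd, speedup(full_ms, sparse_ms), gap_ms(sparse_ms, small_ms));
}
```

For example, summarize("git status", 3167.0, 239.5, 195.3) reports a 13.2x speedup and a 44.2 ms gap.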

The optimum is still about 45ms faster. This is due in part to the 2,000+
sparse directory entries, but there might be other optimizations to make
in the sparse-index case. In particular, I find that this performance
difference disappears when I disable FS Monitor, which is only partially
disabled in the sparse-index case but might still be adding overhead.

The performance numbers for 'git add .' are much closer to optimal:

Benchmark #1: full index (git add .)
  Time (mean ± σ):      3.076 s ±  0.022 s    [User: 2.065 s, System: 0.943 s]
  Range (min … max):    3.044 s …  3.116 s    10 runs

Benchmark #2: sparse index (git add .)
  Time (mean ± σ):     218.0 ms ±   6.6 ms    [User: 195.7 ms, System: 206.6 ms]
  Range (min … max):   209.8 ms … 228.2 ms    13 runs

Benchmark #3: small tree (git add .)
  Time (mean ± σ):     217.6 ms ±   5.4 ms    [User: 131.9 ms, System: 86.7 ms]
  Range (min … max):   212.1 ms … 228.4 ms    14 runs
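
One way to check that 'git add .' really is close to optimal is to compare the gap between two means with the run-to-run noise that hyperfine reports. The sketch below computes the squared Welch t statistic, t^2 = d^2 / (s1^2/n1 + s2^2/n2), from each benchmark's mean, sigma and run count; squaring avoids needing sqrt() from libm. The names `struct bench_stats`, `welch_t2()` and `is_real_gap()`, and the threshold |t| > 2 (t^2 > 4), are illustrative assumptions: a rough rule of thumb, not a full significance test with degrees of freedom.

```c
#include <assert.h>

/* Summary statistics as reported by one hyperfine benchmark. */
struct bench_stats {
	double mean_ms;
	double sigma_ms;
	int runs;
};

/* Squared Welch t statistic for the difference of two means. */
static double welch_t2(struct bench_stats a, struct bench_stats b)
{
	double d = a.mean_ms - b.mean_ms;
	double var = a.sigma_ms * a.sigma_ms / a.runs +
		     b.sigma_ms * b.sigma_ms / b.runs;

	assert(a.runs > 0 && b.runs > 0 && var > 0.0);
	return d * d / var;
}

/* Rough rule of thumb (assumption): |t| > 2, i.e. t^2 > 4, is a real gap. */
static int is_real_gap(struct bench_stats a, struct bench_stats b)
{
	return welch_t2(a, b) > 4.0;
}
```

With the 'git add .' numbers (218.0 ± 6.6 ms over 13 runs versus 217.6 ± 5.4 ms over 14 runs), t^2 is about 0.03, well inside the noise. The 'git status' pair (239.5 ± 8.1 ms over 13 runs versus 195.3 ± 4.5 ms over 15 runs) gives t^2 of about 305, so that gap is not noise.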

In this test, I also used "echo >>README.md" to append a line to the
README.md file, so the 'git add .' command is doing _something_ other
than a no-op. Without this edit (and with FS Monitor enabled), the
small-tree case is again about 30ms faster than the sparse index case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
derrickstolee committed Jan 25, 2021
1 parent 5f53b08 commit 05e7548
Showing 2 changed files with 27 additions and 1 deletion.
18 changes: 18 additions & 0 deletions cache-tree.c
@@ -256,6 +256,24 @@ static int update_one(struct cache_tree *it,

 	*skip_count = 0;

+	/*
+	 * If the first entry of this region is a sparse directory
+	 * entry corresponding exactly to 'base', then this cache_tree
+	 * struct is a "leaf" in the data structure, pointing to the
+	 * tree OID specified in the entry.
+	 */
+	if (entries > 0) {
+		const struct cache_entry *ce = cache[0];
+
+		if (S_ISSPARSEDIR(ce) &&
+		    ce->ce_namelen == baselen &&
+		    !strncmp(ce->name, base, baselen)) {
+			it->entry_count = 1;
+			oidcpy(&it->oid, &ce->oid);
+			return 1;
+		}
+	}
+
 	if (0 <= it->entry_count && has_object_file(&it->oid))
 		return it->entry_count;

10 changes: 9 additions & 1 deletion sparse-index.c
@@ -181,7 +181,11 @@ int convert_to_sparse(struct index_state *istate)
 	istate->cache_nr = convert_to_sparse_rec(istate,
 						 0, 0, istate->cache_nr,
 						 "", 0, istate->cache_tree);
-	istate->drop_cache_tree = 1;
+
+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	istate->sparse_index = 1;
 	trace2_region_leave("index", "convert_to_sparse", istate->repo);
 	return 0;
@@ -278,6 +282,10 @@ void ensure_full_index(struct index_state *istate)

 	free(full);

+	/* Clear and recompute the cache-tree */
+	cache_tree_free(&istate->cache_tree);
+	cache_tree_update(istate, 0);
+
 	trace2_region_leave("index", "ensure_full_index", istate->repo);
 }

