pack-objects: create new name-hash algorithm (#5157)
This is an updated version of gitgitgadget#1785, intended for early
consumption into Git for Windows.

The idea here is to add a new `--full-name-hash` option to `git
pack-objects` and `git repack`. This option computes the name-hash value
used for finding delta bases from the full path name, which makes
collisions less likely than with the default name-hash algorithm.
In many repositories with name-hash collisions and many versions of
those paths, this can significantly reduce the size of a full repack. It
can also help in certain cases of `git push`, but only if the pack is
already artificially inflated by name-hash collisions; cases that find
"sibling" deltas as better choices become worse with `--full-name-hash`.

Thus, this option is currently recommended for full repacks of large
repos, and on client machines without reachability bitmaps.

Some care is taken to ignore this option when using bitmaps, either
writing bitmaps or using a bitmap walk during reads. The bitmap file
format contains name-hash values, but no way to indicate which function
is used, so compatibility is a concern for bitmaps. Future work could
explore this idea.
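
The guard described above can be sketched as follows. This is an assumption-laden illustration, not the code in `builtin/pack-objects.c`: the struct, its field names, the function name, and the warning text are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical inputs; these names are illustrative, not Git's variables. */
struct hash_choice_inputs {
	bool full_name_hash_requested; /* user passed --full-name-hash */
	bool writing_bitmaps;          /* this run writes a bitmap index */
	bool reading_bitmaps;          /* this run walks objects via bitmaps */
};

/*
 * Decide whether the full-name hash may actually be used. Bitmap files
 * store name-hash values without recording which function produced
 * them, so the option is dropped whenever bitmaps are involved.
 */
static bool effective_full_name_hash(const struct hash_choice_inputs *in)
{
	assert(in != NULL);
	if (!in->full_name_hash_requested)
		return false;
	if (in->writing_bitmaps || in->reading_bitmaps) {
		/* Hypothetical message text. */
		fprintf(stderr,
			"warning: ignoring --full-name-hash because bitmaps are in use\n");
		return false;
	}
	return true;
}
```

Falling back silently to the default hash, rather than failing, keeps existing bitmap-based setups working when the option is set in a shared script.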

Once this PR is merged, the more involved `--path-walk` option may be
considered.
dscho committed Nov 22, 2024
2 parents b48d75b + 1e01fd3 commit 7edfb70
Showing 3 changed files with 27 additions and 2 deletions.
2 changes: 1 addition & 1 deletion builtin/pack-objects.c
@@ -4443,7 +4443,7 @@ int cmd_pack_objects(int argc,
N_("protocol"),
N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
OPT_BOOL(0, "full-name-hash", &use_full_name_hash,
-N_("optimize delta compression across identical path names over time")),
+N_("(EXPERIMENTAL!) optimize delta compression across identical path names over time")),
OPT_END(),
};

2 changes: 1 addition & 1 deletion builtin/repack.c
@@ -1209,7 +1209,7 @@ int cmd_repack(int argc,
OPT_BOOL('F', NULL, &po_args.no_reuse_object,
N_("pass --no-reuse-object to git-pack-objects")),
OPT_BOOL(0, "full-name-hash", &po_args.full_name_hash,
-N_("pass --full-name-hash to git-pack-objects")),
+N_("(EXPERIMENTAL!) pass --full-name-hash to git-pack-objects")),
OPT_NEGBIT('n', NULL, &run_update_server_info,
N_("do not run git-update-server-info"), 1),
OPT__QUIET(&po_args.quiet, N_("be quiet")),
25 changes: 25 additions & 0 deletions t/perf/p5313-pack-objects.sh
@@ -41,6 +41,14 @@ test_size 'thin pack size with --full-name-hash' '
test_file_size out
'

+test_perf 'thin pack with --full-name-hash' '
+git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
+'
+
+test_size 'thin pack size with --full-name-hash' '
+test_file_size out
+'
+
test_perf 'big pack' '
git pack-objects --stdout --revs --sparse <in-big >out
'
@@ -74,6 +82,14 @@ test_size 'shallow pack size with --full-name-hash' '
test_file_size out
'

+test_perf 'big pack with --full-name-hash' '
+git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
+'
+
+test_size 'big pack size with --full-name-hash' '
+test_file_size out
+'
+
test_perf 'repack' '
git repack -adf
'
@@ -92,4 +108,13 @@ test_size 'repack size with --full-name-hash' '
test_file_size "$pack"
'

+test_perf 'repack with --full-name-hash' '
+git repack -adf --full-name-hash
+'
+
+test_size 'repack size with --full-name-hash' '
+pack=$(ls .git/objects/pack/pack-*.pack) &&
+test_file_size "$pack"
+'
+
test_done
