Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pack-objects: add --path-walk option for better deltas #1813

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions Documentation/config/feature.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@ walking fewer objects.
+
* `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
reusing objects from multiple packs instead of just one.
+
* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
significantly smaller in the presence of certain filename collisions with Git's
default name-hash.

feature.manyFiles::
Enable config options that optimize for repos with many files in the
Expand Down
8 changes: 8 additions & 0 deletions Documentation/config/pack.txt
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,14 @@ pack.useSparse::
commits contain certain types of direct renames. Default is
`true`.

pack.usePathWalk::
When true, git will default to using the '--path-walk' option in
'git pack-objects' when the '--revs' option is present. This
algorithm groups objects by path to maximize the ability to
compute delta chains across historical versions of the same
object. This may disable other options, such as using bitmaps to
enumerate objects.

pack.preferBitmapTips::
When selecting which commits will receive bitmaps, prefer a
commit at the tip of any reference that is a suffix of any value
Expand Down
23 changes: 17 additions & 6 deletions Documentation/git-pack-objects.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,12 +10,13 @@ SYNOPSIS
--------
[verse]
'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=<n>] [--depth=<n>]
[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
[--cruft] [--cruft-expiration=<time>]
[--stdout [--filter=<filter-spec>] | <base-name>]
[--shallow] [--keep-true-parents] [--[no-]sparse]
[--path-walk] < <object-list>


DESCRIPTION
Expand Down Expand Up @@ -345,6 +346,16 @@ raise an error.
Restrict delta matches based on "islands". See DELTA ISLANDS
below.

--path-walk::
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`,
`--shallow`, or `--filter`.

DELTA ISLANDS
-------------
Expand Down
17 changes: 16 additions & 1 deletion Documentation/git-repack.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@ git-repack - Pack unpacked objects in a repository
SYNOPSIS
--------
[verse]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]
'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
[--write-midx] [--path-walk]

DESCRIPTION
-----------
Expand Down Expand Up @@ -249,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
Write a multi-pack index (see linkgit:git-multi-pack-index[1])
containing the non-redundant packs.

--path-walk::
This option passes the `--path-walk` option to the underlying
`git pack-options` process (see linkgit:git-pack-objects[1]).
By default, `git pack-objects` walks objects in an order that
presents trees and blobs in an order unrelated to the path they
appear relative to a commit's root tree. The `--path-walk` option
enables a different walking algorithm that organizes trees and
blobs by path. This has the potential to improve delta compression
especially in the presence of filenames that cause collisions in
Git's default name-hash algorithm. Due to changing how the objects
are walked, this option is not compatible with `--delta-islands`
or `--filter`.

CONFIGURATION
-------------

Expand Down
64 changes: 64 additions & 0 deletions Documentation/technical/api-path-walk.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
Path-Walk API
=============

The path-walk API is used to walk reachable objects, but to visit objects
in batches based on a common path they appear in, or by type.

For example, all reachable commits are visited in a group. All tags are
visited in a group. Then, all root trees are visited. At some point, all
blobs reachable via a path `my/dir/to/A` are visited. When there are
multiple paths possible to reach the same object, then only one of those
paths is used to visit the object.

Basics
------

To use the path-walk API, include `path-walk.h` and call
`walk_objects_by_path()` with a customized `path_walk_info` struct. The
struct is used to set all of the options for how the walk should proceed.
Let's dig into the different options and their use.

`path_fn` and `path_fn_data`::
The most important option is the `path_fn` option, which is a
function pointer to the callback that can execute logic on the
object IDs for objects grouped by type and path. This function
also receives a `data` value that corresponds to the
`path_fn_data` member, for providing custom data structures to
this callback function.

`revs`::
To configure the exact details of the reachable set of objects,
use the `revs` member and initialize it using the revision
machinery in `revision.h`. Initialize `revs` using calls such as
`setup_revisions()` or `parse_revision_opt()`. Do not call
`prepare_revision_walk()`, as that will be called within
`walk_objects_by_path()`.
+
It is also important that you do not specify the `--objects` flag for the
`revs` struct. The revision walk should only be used to walk commits, and
the objects will be walked in a separate way based on those starting
commits.

`commits`, `blobs`, `trees`, `tags`::
By default, these members are enabled and signal that the path-walk
API should call the `path_fn` on objects of these types. Specialized
applications could disable some options to make it simpler to walk
the objects or to have fewer calls to `path_fn`.
+
While it is possible to walk only commits in this way, consumers would be
better off using the revision walk API instead.

`prune_all_uninteresting`::
By default, all reachable paths are emitted by the path-walk API.
This option allows consumers to declare that they are not
interested in paths where all included objects are marked with the
`UNINTERESTING` flag. This requires using the `boundary` option in
the revision walk so that the walk emits commits marked with the
`UNINTERESTING` flag.

Examples
--------

See example usages in:
`t/helper/test-path-walk.c`,
`builtin/pack-objects.c`
2 changes: 2 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -818,6 +818,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o
TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
TEST_BUILTINS_OBJS += test-partial-clone.o
TEST_BUILTINS_OBJS += test-path-utils.o
TEST_BUILTINS_OBJS += test-path-walk.o
TEST_BUILTINS_OBJS += test-pcre2-config.o
TEST_BUILTINS_OBJS += test-pkt-line.o
TEST_BUILTINS_OBJS += test-proc-receive.o
Expand Down Expand Up @@ -1094,6 +1095,7 @@ LIB_OBJS += parse-options.o
LIB_OBJS += patch-delta.o
LIB_OBJS += patch-ids.o
LIB_OBJS += path.o
LIB_OBJS += path-walk.o
LIB_OBJS += pathspec.o
LIB_OBJS += pkt-line.o
LIB_OBJS += preload-index.o
Expand Down
Loading
Loading