Packed-refs v2 Part V: the v2 file format #27

Closed
wants to merge 15 commits into from


When updating the file format version for something as critical as ref
storage, the file format version must come with an extension change. The
extensions.refFormat config value is a multi-valued config value that
defaults to the pair "files" and "packed".

Add "packed-v2" as a possible value to extensions.refFormat. This
value specifies that the packed-refs file may exist in the version 2
format. (If the "packed" value does not exist, then the packed-refs file
must exist in version 2, not version 1.)

In order to select version 2 for writing, the user will have two
options. First, the user could remove "packed" and add "packed-v2" to
the extensions.refFormat list. This would imply that version 2 is the
only format available. However, this also means that version 1 files
would be ignored at read time, so this does not allow users to upgrade
repositories with existing packed-refs files.

Second, add a new refs.packedRefsVersion config option that specifies
which version to use during writes. Thus, when both "packed" and
"packed-v2" are in the extensions.refFormat list, the user can upgrade
from version 1 to version 2, or downgrade from 2 to 1.
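
The read and write rules described above can be summarized in a short
sketch. It is written in Python for brevity rather than in Git's C, and
the names readable_versions and write_version are hypothetical. Two
behaviors are assumptions that the text above does not specify: falling
back to the oldest enabled version when refs.packedRefsVersion is unset,
and rejecting a requested version that is not enabled.

```python
from typing import Iterable, Optional, Set

# Default pair for extensions.refFormat, as stated above.
DEFAULT_REF_FORMAT = ("files", "packed")


def readable_versions(ref_format: Iterable[str] = DEFAULT_REF_FORMAT) -> Set[int]:
    """Return the packed-refs versions that may be read for this list."""
    values = set(ref_format)
    versions = set()
    if "packed" in values:
        versions.add(1)
    if "packed-v2" in values:
        versions.add(2)
    return versions


def write_version(ref_format: Iterable[str] = DEFAULT_REF_FORMAT,
                  packed_refs_version: Optional[int] = None) -> Optional[int]:
    """Pick the packed-refs version to use when writing."""
    allowed = readable_versions(ref_format)
    if not allowed:
        return None  # no packed-refs format is enabled at all
    if packed_refs_version is not None:
        # ASSUMPTION: a version that is not enabled is an error.
        if packed_refs_version not in allowed:
            raise ValueError(
                f"packed-refs version {packed_refs_version} is not enabled")
        return packed_refs_version
    # ASSUMPTION: with no explicit choice, prefer the oldest enabled version.
    return min(allowed)
```

With only "packed-v2" listed, version 1 is not readable, which is why
that configuration alone cannot upgrade an existing version 1 file.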

Currently, the implementation does not use refs.packedRefsVersion, as
that is delayed until we have the code to write that file format
version. However, we can add the necessary enum values and flag
constants to communicate the presence of "packed-v2" in the
extensions.refFormat list.
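
As an illustration of how flag constants can communicate the presence of
"packed-v2", here is a minimal Python sketch. The RefFormat names, the
bit values, and the parse_ref_format helper are all hypothetical and are
not Git's actual constants. Rejecting unknown values is also an
assumption.

```python
import enum


class RefFormat(enum.Flag):
    # Illustrative names and values only; not Git's real constants.
    FILES = enum.auto()
    PACKED = enum.auto()
    PACKED_V2 = enum.auto()


_NAMES = {
    "files": RefFormat.FILES,
    "packed": RefFormat.PACKED,
    "packed-v2": RefFormat.PACKED_V2,
}


def parse_ref_format(values):
    """Fold a multi-valued extensions.refFormat list into one flag set."""
    if not values:
        # Unset config: the default pair "files" and "packed".
        return RefFormat.FILES | RefFormat.PACKED
    flags = RefFormat(0)
    for value in values:
        if value not in _NAMES:
            # ASSUMPTION: unknown values are rejected.
            raise ValueError(f"unknown extensions.refFormat value: {value!r}")
        flags |= _NAMES[value]
    return flags
```

A caller can then test `RefFormat.PACKED_V2 in flags` without parsing
the config list again.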

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
TODO: add writing tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Tests already cover that we will start reading these prefixes.

TODO: discuss time and space savings over typical approach.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
When set, the GIT_TEST_PACKED_REFS_VERSION environment variable will
create a default value for the packed-refs file version on writes. When
set to "2", it will automatically add the "packed-v2" value to
extensions.refFormat.

Not all tests pass with GIT_TEST_PACKED_REFS_VERSION=2 because they care
specifically about the content of the packed-refs file. These tests will
be updated in following changes.

To start, though, disable the GIT_TEST_PACKED_REFS_VERSION environment
variable in t3212-ref-formats.sh, since that script already tests both
versions, including upgrade scenarios.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
t1409-avoid-packing-refs.sh seeks to test that the packed-refs file is
not modified unnecessarily. One way it does this is by creating a
packed-refs file, then munging its contents and verifying that the
munged data remains after other commands.

For packed-refs v1, it suffices to add a line that is similar to a
comment. For packed-refs v2, we cannot even add to the file without
messing up the trailing table of contents of its chunked format.
However, we can manipulate the last bytes that are within the trailing
hash and use 'tail -c 4' to read them.
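
The idea of overwriting bytes inside the trailing hash, rather than
appending, can be sketched as follows. This is a Python illustration and
not the shell code from t1409. The helper names and the "MUNG" marker
are hypothetical.

```python
import os
import tempfile


def munge_tail(path, marker=b"MUNG"):
    """Overwrite the final len(marker) bytes in place; the size is unchanged."""
    with open(path, "r+b") as f:
        f.seek(-len(marker), os.SEEK_END)
        f.write(marker)


def read_tail(path, count=4):
    """Return the last `count` bytes of the file, similar to 'tail -c 4'."""
    with open(path, "rb") as f:
        f.seek(-count, os.SEEK_END)
        return f.read(count)


def demo():
    """Munge a stand-in file and report (tail bytes, size unchanged)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "packed-refs")
        with open(path, "wb") as f:
            # Stand-in content: fake body followed by a 20-byte trailer.
            f.write(b"body" * 8 + bytes(20))
        before = os.path.getsize(path)
        munge_tail(path)
        return read_tail(path), os.path.getsize(path) == before
```

Because nothing is appended, every offset recorded in the trailing table
of contents still points at the same data.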

This makes t1409 pass with GIT_TEST_PACKED_REFS_VERSION=2.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
One test in t5312 uses 'grep' to detect that a ref is written in the
packed-refs file instead of as a loose ref. This does not work when the
packed-refs file is in the v2 format, such as when
GIT_TEST_PACKED_REFS_VERSION=2.

Since the test already checks that the loose ref is missing, it suffices
to check that 'git rev-parse' succeeds.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The last test in t5502-quickfetch.sh exploits the packed-refs v1 file
format by appending 1000 lines to the packed-refs file. If the
packed-refs file is in the v2 format, this corrupts the file and makes
it unreadable.

Instead of making the test slower, let's ignore it when
GIT_TEST_PACKED_REFS_VERSION=2. The test is really about 'git fetch',
not the packed-refs format. Create a prerequisite in case we want to use
this technique again in the future.

An alternative would be to write those 1000 refs using a different
mechanism, but let's opt for the simpler case for now.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Three tests in t3210-pack-refs.sh corrupt a packed-refs file to test
that Git properly discovers and handles those failures. These tests
assume that the file is in the v1 format, so add the PACKED_REFS_V1
prereq to skip these tests when GIT_TEST_PACKED_REFS_VERSION=2.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The GIT_TEST_PACKED_REFS_VERSION=2 environment variable helps us test
the packed-refs file format in its v2 version. This variable makes the
Git process act as if the extensions.refFormat config key has
"packed-v2" in its list. This means that if the environment variable is
removed, the repository is in a bad state. This is sufficient for most
test cases.

However, tests that fetch over HTTP appear to lose this environment
variable when executed through the HTTP server. Since the repositories
are created via Git commands in the tests, the packed-refs files end up
in the v2 format, but the server processes do not understand this and
start serving empty payloads since they do not recognize any refs.

The preferred long-term solution would be to ensure that the GIT_TEST_*
environment variable persists into the HTTP server. However, these tests
are not exercising any particularly tricky parts of the packed-refs file
format. It may not be worth the effort to pass the environment variable
through to the server; instead, we can unset it (with a comment
explaining why) in these tests.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The linux-TEST-vars CI build helps us check that certain opt-in features
are still exercised in at least one environment. The new
GIT_TEST_PACKED_REFS_VERSION environment variable now passes the test
suite when set to "2", so add this to that list of variables.

This provides nearly the same coverage of the v2 format as we had in the
v1 format.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
TBD

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The 'skip_hash' option in 'struct hashfile' indicates that we want to
use the hashfile API as a buffered writer, and not use the hash function
to create a trailing hash. We still write a trailing null hash to
indicate that we do not have a checksum at the end. This feature is
enabled for index writes using the 'index.computeHash' config key.
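
The behavior described above can be modeled with a small sketch. It is
written in Python and is not the 'struct hashfile' API itself. The
HashFile class is hypothetical, buffering is omitted, and the use of
SHA-1 with a 20-byte trailer is an assumption about the repository's
hash function.

```python
import hashlib
import io


class HashFile:
    """Toy writer that appends either a real checksum or a null hash."""

    def __init__(self, out, skip_hash=False):
        self.out = out
        self.skip_hash = skip_hash
        self._hash = hashlib.sha1()  # ASSUMPTION: SHA-1 repository

    def write(self, data):
        # When skipping, the data is never fed to the hash function,
        # which is where the time savings come from.
        if not self.skip_hash:
            self._hash.update(data)
        self.out.write(data)

    def finalize(self):
        # A null trailer signals that no checksum was computed.
        if self.skip_hash:
            trailer = bytes(self._hash.digest_size)
        else:
            trailer = self._hash.digest()
        self.out.write(trailer)
        return trailer
```

Either way the file ends with a trailer of the same length, so a reader
can locate the end of the data without knowing which mode was used.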

Create a similar (currently hidden) option for the packed-refs v2 file
format: refs.hashPackedRefs. This defaults to false because performance
is compared to the packed-refs v1 file format, which does not have a
checksum anywhere.

This change results in improvements to p1401 when using a repository
with a 42 MB packed-refs file (600,000+ refs).

Test                        HEAD~1            HEAD
--------------------------------------------------------------------
1401.1: git pack-refs (v1)  0.38(0.31+0.52)   0.37(0.28+0.52) -2.6%
1401.5: git pack-refs (v2)  0.39(0.33+0.52)   0.30(0.28+0.46) -23.1%

Note that these tests update a ref and then repack the packed-refs file.
The following benchmarks are from a hyperfine experiment that only ran
the 'git pack-refs --all' command for the two formats, but also compared
the effect when refs.hashPackedRefs=true.

Benchmark 1: v1
  Time (mean ± σ):     163.5 ms ±  18.1 ms    [User: 117.8 ms, System: 38.1 ms]
  Range (min … max):   131.3 ms … 190.4 ms    50 runs

Benchmark 2: v2-no-hash
  Time (mean ± σ):      95.8 ms ±  15.1 ms    [User: 72.5 ms, System: 23.0 ms]
  Range (min … max):    82.9 ms … 131.2 ms    50 runs

Benchmark 3: v2-hashing
  Time (mean ± σ):     100.8 ms ±  16.4 ms    [User: 77.2 ms, System: 23.1 ms]
  Range (min … max):    83.0 ms … 131.1 ms    50 runs

Summary
  'v2-no-hash' ran
    1.05 ± 0.24 times faster than 'v2-hashing'
    1.71 ± 0.33 times faster than 'v1'

This case of repeatedly rewriting the same refs seems to demonstrate
a smaller improvement than the p1401 test. However, the overall
reduction from v1 matches the expected reduction in file size. In my
tests, the 42 MB packed-refs (v1) file was compacted to 28 MB in the v2
format.
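
The ratios quoted above can be rechecked from the reported figures. The
Python sketch below only redoes the arithmetic on the mean times and
file sizes given in this message; the helper names are hypothetical. By
these numbers, the v2-no-hash run takes about 41% less time than v1,
while the file is about 33% smaller.

```python
def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return baseline / candidate


def percent_change(before, after):
    """Signed change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Mean wall-clock times from the hyperfine output, in milliseconds.
V1_MS, V2_NO_HASH_MS, V2_HASHING_MS = 163.5, 95.8, 100.8

# File sizes reported for the same repository, in megabytes.
V1_MB, V2_MB = 42, 28

summary = {
    "v2-no-hash vs v1 (speedup)": round(speedup(V1_MS, V2_NO_HASH_MS), 2),
    "v2-no-hash vs v2-hashing (speedup)": round(
        speedup(V2_HASHING_MS, V2_NO_HASH_MS), 2),
    "time change v1 -> v2-no-hash (%)": round(
        percent_change(V1_MS, V2_NO_HASH_MS), 1),
    "size change v1 -> v2 (%)": round(percent_change(V1_MB, V2_MB), 1),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label}: {value}")
```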

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
derrickstolee pushed a commit that referenced this pull request Nov 8, 2022
Add virtual file system settings and hook proc.  On index load,
clear/set the skip worktree bits based on the virtual file system data.
Use virtual file system data to update skip-worktree bit in
unpack-trees. Use virtual file system data to exclude files and folders
not explicitly requested.

The hook was first contributed in private, but was extended via the
following pull requests:

	#15
	#27
	git#33
	git#70

Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>