Commit

Merge core VFS features
These were done in private, before microsoft/git.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
dscho committed Aug 21, 2023
2 parents a2e49ec + a05336f commit cb839fe
Showing 24 changed files with 842 additions and 13 deletions.
42 changes: 42 additions & 0 deletions Documentation/config/core.txt
@@ -728,6 +728,48 @@ core.multiPackIndex::
single index. See linkgit:git-multi-pack-index[1] for more
information. Defaults to true.

core.gvfs::
Enable the features needed for GVFS. This value can be set to true
to indicate all features should be turned on or the bit values listed
below can be used to turn on specific features.
+
--
GVFS_SKIP_SHA_ON_INDEX::
Bit value 1
Disables the calculation of the SHA checksum when writing the index.
GVFS_MISSING_OK::
Bit value 4
Normally git write-tree ensures that the objects referenced by the
directory exist in the object database. This option disables this check.
GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT::
Bit value 8
When marking entries for removal from the index and the working
directory, take the skip-worktree bit into account: an entry
that has the skip-worktree bit set is not removed from the
working directory. This allows virtualized working directories to
detect the change to HEAD and use the new commit tree to show
the files that are in the working directory.
GVFS_FETCH_SKIP_REACHABILITY_AND_UPLOADPACK::
Bit value 16
While performing a fetch with a virtual file system, we know
that there will be missing objects, and we don't want to download
them just because of the reachability of the commits. We also
don't want to download a pack file with commits, trees, and blobs,
since these will be downloaded on demand. This flag skips the
reachability checks on objects during a fetch, as well as
the upload pack, so that extraneous objects don't get downloaded.
GVFS_BLOCK_FILTERS_AND_EOL_CONVERSIONS::
Bit value 64
With a virtual file system, we only know the file size before any
CRLF or smudge/clean filter processing is done on the client.
To prevent file corruption due to truncation or expansion with
garbage at the end, these filters must not run when the file
is first accessed and brought down to the client. Git.exe cannot
currently distinguish the first access from subsequent accesses, so
this flag blocks the filters from running at all.
--
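
As a rough illustration of how the bit values listed above combine, the following C sketch treats `core.gvfs` as a bitmask. The macro values are taken from the list above; `parse_gvfs_value()` and `gvfs_bit_is_set()` are hypothetical helpers written for this example and are not the functions declared in `gvfs.h`. Treating "true" as "all bits set" is an assumption based on the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Bit values as documented for core.gvfs above. */
#define GVFS_SKIP_SHA_ON_INDEX                      (1u << 0) /* 1 */
#define GVFS_MISSING_OK                             (1u << 2) /* 4 */
#define GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT       (1u << 3) /* 8 */
#define GVFS_FETCH_SKIP_REACHABILITY_AND_UPLOADPACK (1u << 4) /* 16 */
#define GVFS_BLOCK_FILTERS_AND_EOL_CONVERSIONS      (1u << 6) /* 64 */

/* Hypothetical parser: "true" enables every bit, anything else is read as a decimal mask. */
static unsigned int parse_gvfs_value(const char *value)
{
	if (!strcmp(value, "true"))
		return ~0u;
	return (unsigned int)strtoul(value, NULL, 10);
}

/* Hypothetical check: returns 1 when every bit in `mask` is set in `config`. */
static int gvfs_bit_is_set(unsigned int config, unsigned int mask)
{
	assert(mask != 0);
	return (config & mask) == mask;
}
```

Under these assumptions, `git config core.gvfs 12` would enable only GVFS_MISSING_OK (4) and GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT (8).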

core.sparseCheckout::
Enable "sparse checkout" feature. See linkgit:git-sparse-checkout[1]
for more information.
102 changes: 102 additions & 0 deletions Documentation/technical/read-object-protocol.txt
@@ -0,0 +1,102 @@
Read Object Process
^^^^^^^^^^^^^^^^^^^

The read-object process enables Git to read all missing blobs with a
single process invocation for the entire life of a single Git command.
This is achieved by using a packet format (pkt-line, see technical/
protocol-common.txt) based protocol over standard input and standard
output as follows. All packets, except for the "*CONTENT" packets and
the "0000" flush packet, are considered text and therefore are
terminated by a LF.
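
The pkt-line framing mentioned above can be sketched in C as follows. Each packet starts with a four-digit hexadecimal length that counts the four header bytes themselves, and the flush packet is the literal "0000". The helper names `pkt_line_text()` and `pkt_flush()` are hypothetical and exist only for this illustration; they are not part of Git's own pkt-line API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PKT_MAX_PAYLOAD 65516 /* 65520-byte packet limit minus the 4-byte header */

/*
 * Write one text packet for `text` into `out`: a 4-digit hex length that
 * includes the header itself, then the text, then a terminating LF.
 * Returns the number of packet bytes written, or -1 if it does not fit.
 */
static int pkt_line_text(char *out, size_t out_size, const char *text)
{
	size_t payload;

	assert(out && text);
	payload = strlen(text) + 1; /* +1 for the LF */
	if (payload > PKT_MAX_PAYLOAD || out_size < payload + 4 + 1)
		return -1;
	snprintf(out, out_size, "%04zx%s\n", payload + 4, text);
	return (int)(payload + 4);
}

/* The flush packet is the literal "0000" and carries no payload. */
static int pkt_flush(char *out, size_t out_size)
{
	assert(out);
	if (out_size < 5)
		return -1;
	memcpy(out, "0000", 5); /* includes the NUL terminator */
	return 4;
}
```

With this sketch, the text "version=1" is framed as `000eversion=1` followed by a LF: 9 characters of text, 1 byte for the LF, and 4 bytes of header give a length of 14, or `000e` in hex.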

Git starts the process when it encounters the first missing object that
needs to be retrieved. After the process is started, Git sends a welcome
message ("git-read-object-client"), a list of supported protocol version
numbers, and a flush packet. Git expects to read a welcome response
message ("git-read-object-server"), exactly one protocol version number
from the previously sent list, and a flush packet. All further
communication will be based on the selected version.

The remaining protocol description below documents "version=1". Please
note that "version=42" in the example below does not exist and is only
there to illustrate how the protocol would look with more than one
version.

After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet. Git expects to read a list of desired
capabilities, which must be a subset of the supported capabilities list,
and a flush packet as response:
------------------------
packet: git> git-read-object-client
packet: git> version=1
packet: git> version=42
packet: git> 0000
packet: git< git-read-object-server
packet: git< version=1
packet: git< 0000
packet: git> capability=get
packet: git> capability=have
packet: git> capability=put
packet: git> capability=not-yet-invented
packet: git> 0000
packet: git< capability=get
packet: git< 0000
------------------------
The only supported capability in version 1 is "get".
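
A minimal sketch of the capability selection on the process side, assuming a process that implements only "get". The function `choose_capabilities()` and the `supported` table are hypothetical names used for illustration; the point is only that the reply must be a subset of what Git offered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Capabilities this hypothetical process implements; version 1 defines only "get". */
static const char *const supported[] = { "get" };

/*
 * Copy into `chosen` every offered capability that the process also
 * supports, so the reply is always a subset of what Git offered.
 * Returns the number of capabilities chosen.
 */
static size_t choose_capabilities(const char **offered, size_t n_offered,
				  const char **chosen, size_t max_chosen)
{
	size_t i, j, n = 0;

	assert(offered && chosen);
	for (i = 0; i < n_offered; i++)
		for (j = 0; j < sizeof(supported) / sizeof(supported[0]); j++)
			if (!strcmp(offered[i], supported[j]) && n < max_chosen)
				chosen[n++] = offered[i];
	return n;
}
```

Given the four capabilities offered in the exchange above, this sketch would select only "get", matching the reply shown there.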

Afterwards Git sends a list of "key=value" pairs terminated with a flush
packet. The list will contain at least the command (based on the
supported capabilities) and the sha1 of the object to retrieve. Note
that the process must not send any response before it has received the
final flush packet.

When the process receives the "get" command, it should make the requested
object available in the git object store and then return success. Git will
then check the object store again and this time find it and proceed.
------------------------
packet: git> command=get
packet: git> sha1=0a214a649e1b3d5011e14a3dc227753f2bd2be05
packet: git> 0000
------------------------
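
One way a process might split each received packet payload into its key and value is sketched below. `split_key_value()` is a hypothetical helper for illustration; it assumes the pkt-line length header has already been stripped and that the payload is a NUL-terminated, writable string.

```c
#include <assert.h>
#include <string.h>

/*
 * Split one text packet payload such as "command=get\n" in place into
 * key and value. The trailing LF, if present, is removed.
 * Returns 0 on success, -1 if the line contains no '='.
 */
static int split_key_value(char *line, char **key, char **value)
{
	size_t len;
	char *eq;

	assert(line && key && value);
	len = strlen(line);
	if (len && line[len - 1] == '\n')
		line[len - 1] = '\0';
	eq = strchr(line, '=');
	if (!eq)
		return -1;
	*eq = '\0';
	*key = line;
	*value = eq + 1;
	return 0;
}
```

The process would collect these pairs until it reads the flush packet, and only then act on the command.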

The process is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the process does not experience
problems then the list must contain a "success" status.
------------------------
packet: git< status=success
packet: git< 0000
------------------------

If the process cannot or does not want to retrieve the object, it
is expected to respond with an "error" status.
------------------------
packet: git< status=error
packet: git< 0000
------------------------

If the process cannot or does not want to retrieve this object or
any future object for the lifetime of the Git process, it is
expected to respond with an "abort" status at any point in the
protocol.
------------------------
packet: git< status=abort
packet: git< 0000
------------------------

Git neither stops nor restarts the process in case the "error"/"abort"
status is set.
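
The three status values described above could be modeled on the reading side roughly as follows. The enum and `parse_status()` are hypothetical names for this sketch, and mapping any other value to a separate "unknown" case is an assumption, since the text above does not say how an unrecognized status is handled.

```c
#include <assert.h>
#include <string.h>

enum read_object_status {
	STATUS_SUCCESS, /* the object is now available in the object store */
	STATUS_ERROR,   /* this request failed; later requests may still succeed */
	STATUS_ABORT,   /* the process declines this and all future requests */
	STATUS_UNKNOWN  /* assumption: any other value is left to the caller */
};

/* Map the value of a "status=<value>" pair to the enum above. */
static enum read_object_status parse_status(const char *value)
{
	assert(value);
	if (!strcmp(value, "success"))
		return STATUS_SUCCESS;
	if (!strcmp(value, "error"))
		return STATUS_ERROR;
	if (!strcmp(value, "abort"))
		return STATUS_ABORT;
	return STATUS_UNKNOWN;
}
```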

If the process dies during the communication or does not adhere to the
protocol then Git will stop the process and restart it with the next
object that needs to be processed.

After the read-object process has processed an object it is expected to
wait for the next "key=value" list containing a command. Git will close
the command pipe on exit. The process is expected to detect EOF and exit
gracefully on its own. Git will wait until the process has stopped.

A long-running read-object process demo implementation can be found in
`contrib/long-running-read-object/example.pl` located in the Git core
repository. If you develop your own long-running process then the
`GIT_TRACE_PACKET` environment variable can be very helpful for
debugging (see linkgit:git[1]).
9 changes: 7 additions & 2 deletions GIT-VERSION-GEN
@@ -1,7 +1,7 @@
#!/bin/sh

GVF=GIT-VERSION-FILE
DEF_VER=v2.42.0
DEF_VER=v2.42.0.vfs.0.0

LF='
'
@@ -12,10 +12,15 @@ if test -f version
then
VN=$(cat version) || VN="$DEF_VER"
elif test -d ${GIT_DIR:-.git} -o -f .git &&
VN=$(git describe --match "v[0-9]*" HEAD 2>/dev/null) &&
VN=$(git describe --match "v[0-9]*vfs*" HEAD 2>/dev/null) &&
case "$VN" in
*$LF*) (exit 1) ;;
v[0-9]*)
if test "${VN%%.vfs.*}" != "${DEF_VER%%.vfs.*}"
then
echo "Found version $VN, which is not based on $DEF_VER" >&2
exit 1
fi
git update-index -q --refresh
test -z "$(git diff-index --name-only HEAD --)" ||
VN="$VN-dirty" ;;
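
The new check compares `${VN%%.vfs.*}` with `${DEF_VER%%.vfs.*}`, that is, everything before the first `.vfs.` in each string. The same comparison is restated below in C purely as an illustration; `base_len()` and `same_base_version()` are hypothetical helpers and are not part of this commit.

```c
#include <assert.h>
#include <string.h>

/* Length of `version` up to the first ".vfs.", or the whole string if absent. */
static size_t base_len(const char *version)
{
	const char *p;

	assert(version);
	p = strstr(version, ".vfs.");
	return p ? (size_t)(p - version) : strlen(version);
}

/* Returns 1 when both versions share the same base before ".vfs.". */
static int same_base_version(const char *vn, const char *def_ver)
{
	size_t a = base_len(vn), b = base_len(def_ver);

	return a == b && !strncmp(vn, def_ver, a);
}
```

For example, a described version of `v2.41.0.vfs.0.5` has the base `v2.41.0`, which differs from the `v2.42.0` base of `DEF_VER`, so the script would print the error and exit.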
4 changes: 3 additions & 1 deletion cache-tree.c
@@ -1,6 +1,7 @@
#include "git-compat-util.h"
#include "environment.h"
#include "hex.h"
#include "gvfs.h"
#include "lockfile.h"
#include "tree.h"
#include "tree-walk.h"
@@ -258,7 +259,8 @@ static int update_one(struct cache_tree *it,
int flags)
{
struct strbuf buffer;
int missing_ok = flags & WRITE_TREE_MISSING_OK;
int missing_ok = gvfs_config_is_set(GVFS_MISSING_OK) ?
WRITE_TREE_MISSING_OK : (flags & WRITE_TREE_MISSING_OK);
int dryrun = flags & WRITE_TREE_DRY_RUN;
int repair = flags & WRITE_TREE_REPAIR;
int to_invalidate = 0;
9 changes: 7 additions & 2 deletions commit.c
@@ -1,4 +1,5 @@
#include "git-compat-util.h"
#include "gvfs.h"
#include "tag.h"
#include "commit.h"
#include "commit-graph.h"
@@ -560,13 +561,17 @@ int repo_parse_commit_internal(struct repository *r,
.sizep = &size,
.contentp = &buffer,
};
int ret;
/*
* Git does not support partial clones that exclude commits, so set
* OBJECT_INFO_SKIP_FETCH_OBJECT to fail fast when an object is missing.
*/
int flags = OBJECT_INFO_LOOKUP_REPLACE | OBJECT_INFO_SKIP_FETCH_OBJECT |
OBJECT_INFO_DIE_IF_CORRUPT;
int ret;
OBJECT_INFO_DIE_IF_CORRUPT;

/* But the GVFS Protocol _does_ support missing commits! */
if (gvfs_config_is_set(GVFS_MISSING_OK))
flags ^= OBJECT_INFO_SKIP_FETCH_OBJECT;

if (!item)
return -1;
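
A note on the `flags ^= OBJECT_INFO_SKIP_FETCH_OBJECT` line in this hunk: XOR toggles a bit, so it clears the flag here only because the initializer just above always sets it. The sketch below contrasts toggling with an unconditional clear. The `DEMO_*` values are hypothetical stand-ins, not Git's real flag values, and both helpers exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical flag values for illustration; the real ones live in Git's headers. */
#define DEMO_LOOKUP_REPLACE    (1u << 0)
#define DEMO_SKIP_FETCH_OBJECT (1u << 1)
#define DEMO_DIE_IF_CORRUPT    (1u << 2)

/* XOR toggles the bit: it clears it only if it was set beforehand. */
static unsigned int toggle_flag(unsigned int flags, unsigned int bit)
{
	assert(bit != 0);
	return flags ^ bit;
}

/* AND-NOT clears the bit regardless of its previous state. */
static unsigned int clear_flag(unsigned int flags, unsigned int bit)
{
	assert(bit != 0);
	return flags & ~bit;
}
```

Both forms give the same result when the bit is known to be set, as it is in this function; they differ only if the bit might already be clear.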
11 changes: 11 additions & 0 deletions config.c
@@ -9,6 +9,7 @@
#include "abspath.h"
#include "advice.h"
#include "date.h"
#include "gvfs.h"
#include "branch.h"
#include "config.h"
#include "convert.h"
@@ -1776,6 +1777,11 @@ int git_default_core_config(const char *var, const char *value,
return 0;
}

if (!strcmp(var, "core.gvfs")) {
gvfs_load_config_value(value);
return 0;
}

if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
@@ -1801,6 +1807,11 @@ int git_default_core_config(const char *var, const char *value,
return 0;
}

if (!strcmp(var, "core.virtualizeobjects")) {
core_virtualize_objects = git_config_bool(var, value);
return 0;
}

/* Add other config variables here and to Documentation/config.txt. */
return platform_core_config(var, value, ctx, cb);
}
22 changes: 22 additions & 0 deletions connected.c
@@ -1,6 +1,8 @@
#include "git-compat-util.h"
#include "environment.h"
#include "gettext.h"
#include "hex.h"
#include "gvfs.h"
#include "object-store-ll.h"
#include "run-command.h"
#include "sigchain.h"
@@ -32,6 +34,26 @@ int check_connected(oid_iterate_fn fn, void *cb_data,
struct transport *transport;
size_t base_len;

/*
* When running with a virtual file system, there will be objects
* that are missing locally, and we don't want to download a bunch of
* commits, trees, and blobs just to make sure everything is
* reachable locally, so this option skips the reachability
* checks below that use rev-list. This stops the check
* before uploadpack runs to determine if there is anything to
* fetch. Returning zero for the first check also prevents the
* uploadpack from happening. It also skips the check after
* the fetch is finished that makes sure all the objects were
* downloaded in the pack file. This allows the fetch to
* run and get all the latest tip commit ids for all the branches
* in the fetch but not pull down commits, trees, or blobs via
* upload pack.
*/
if (gvfs_config_is_set(GVFS_FETCH_SKIP_REACHABILITY_AND_UPLOADPACK))
return 0;
if (core_virtualize_objects)
return 0;

if (!opt)
opt = &defaults;
transport = opt->transport;