Bundle URIs Part 4: Advertise URIs from Git server #21

Closed
wants to merge 11 commits into from
6 changes: 6 additions & 0 deletions Documentation/config/transfer.txt
@@ -115,3 +115,9 @@ transfer.unpackLimit::
transfer.advertiseSID::
Boolean. When true, client and server processes will advertise their
unique session IDs to their remote counterpart. Defaults to false.

transfer.bundleURI::
When `true`, local `git clone` commands will request bundle
information from the remote server (if advertised) and download
bundles before continuing the clone through the Git protocol.
Defaults to `false`.
201 changes: 201 additions & 0 deletions Documentation/gitprotocol-v2.txt
@@ -578,6 +578,207 @@ and associated requested information, each separated by a single space.

obj-info = obj-id SP obj-size

bundle-uri
~~~~~~~~~~

If the 'bundle-uri' capability is advertised, the server supports the
`bundle-uri` command.

The capability is currently advertised with no value (i.e. not
"bundle-uri=somevalue"). A value may be added in the future to
support command-wide extensions. Clients MUST ignore any unknown
capability values and proceed with the 'bundle-uri' dialog they
support.

The 'bundle-uri' command is intended to be issued before `fetch` to
get URIs to bundle files (see linkgit:git-bundle[1]) to "seed" and
inform the subsequent `fetch` command.

The client MAY issue `bundle-uri` before or after any other valid
command. To be useful to clients, it is expected to be issued
after `ls-refs` and before `fetch`, but it MAY be issued at any time
in the dialog.

DISCUSSION of bundle-uri
^^^^^^^^^^^^^^^^^^^^^^^^

The intent of the feature is to optimize server resource consumption
in the common case by turning the fetch of a very large PACK during
linkgit:git-clone[1] into a smaller incremental fetch.

It also allows servers to achieve better caching in combination with
an `uploadpack.packObjectsHook` (see linkgit:git-config[1]).

It does this by making new clones or fetches a more predictable and
common negotiation against the tips of recently produced *.bundle
file(s). Servers might even pre-generate the results of such
negotiations for the `uploadpack.packObjectsHook` as new pushes come in.

One way servers could take advantage of these bundles is to anticipate
that fresh clones will download a known bundle and then catch up to the
current state of the repository using the ref tips found in that bundle
(or bundles).

PROTOCOL for bundle-uri
^^^^^^^^^^^^^^^^^^^^^^^

A `bundle-uri` request takes no arguments, and as noted above does not
currently advertise a capability value. Both may be added in the
future.

When the client issues a `command=bundle-uri` request, the response is a
list of key-value pairs provided as packet lines with value
`<key>=<value>`. Each `<key>` should be interpreted as a config key from
the `bundle.*` namespace to construct a list of bundles. These keys are
grouped by a `bundle.<id>.` subsection, where each key corresponding to a
given `<id>` contributes attributes to the bundle defined by that `<id>`.
See linkgit:git-config[1] for the specific details of these keys and how
the Git client will interpret their values.

Clients MUST parse each line according to the above format. Lines that
do not conform to the format SHOULD be discarded. The user MAY be warned
in such a case.
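A minimal sketch of the parsing rule above, assuming a hypothetical helper named `parse_bundle_line` (it is not a function from Git's code base). It splits a packet-line payload at the first `=` and reports malformed lines so the caller can discard them. Treating an empty key or an empty value as malformed is an assumption of this sketch, since the text above only requires the `<key>=<value>` shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not Git's implementation.
 * Splits "<key>=<value>" at the first '='.
 * Returns 0 on success, -1 if the line should be discarded
 * (no '=', empty key, empty value, or key too long for the buffer).
 */
int parse_bundle_line(const char *line, char *key, size_t key_size,
		      const char **value)
{
	const char *eq;
	size_t key_len;

	assert(line && key && value);

	eq = strchr(line, '=');
	if (!eq || eq == line || eq[1] == '\0')
		return -1;

	key_len = (size_t)(eq - line);
	if (key_len >= key_size)
		return -1;

	memcpy(key, line, key_len);
	key[key_len] = '\0';
	*value = eq + 1;	/* points into the caller's line buffer */
	return 0;
}
```

Splitting at the first `=` keeps any later `=` characters (for example in a URI query string) as part of the value.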

bundle-uri CLIENT AND SERVER EXPECTATIONS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

URI CONTENTS::
The content at the advertised URIs MUST be one of two types.
+
The advertised URI may contain a bundle file that `git bundle verify`
would accept. I.e. it MUST contain one or more reference tips for
use by the client, MUST indicate prerequisites (if any) with standard
"-" prefixes, and MUST indicate its "object-format", if
applicable.
+
The advertised URI may alternatively contain a plaintext file that `git
config --list` would accept (with the `--file` option). The key-value
pairs in this list are in the `bundle.*` namespace (see
linkgit:git-config[1]).
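A minimal sketch of how a client could tell the two content types apart, assuming it has already downloaded the first bytes of the file. The function and enum names are hypothetical. The check relies on the `# v2 git bundle` and `# v3 git bundle` signature lines that begin a bundle file; anything else is only a candidate bundle list and would still have to parse as config.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification result for this sketch. */
enum uri_content_type {
	URI_CONTENT_BUNDLE,		/* starts with a bundle signature */
	URI_CONTENT_MAYBE_BUNDLE_LIST	/* try parsing as config instead */
};

/*
 * Classify downloaded content by its leading bytes.
 * Only the signature line is inspected; full validation
 * (what `git bundle verify` does) is out of scope here.
 */
enum uri_content_type classify_uri_content(const char *buf, size_t len)
{
	static const char v2_sig[] = "# v2 git bundle\n";
	static const char v3_sig[] = "# v3 git bundle\n";
	const size_t sig_len = sizeof(v2_sig) - 1;

	assert(buf);

	if (len >= sig_len &&
	    (!memcmp(buf, v2_sig, sig_len) || !memcmp(buf, v3_sig, sig_len)))
		return URI_CONTENT_BUNDLE;

	return URI_CONTENT_MAYBE_BUNDLE_LIST;
}
```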

bundle-uri CLIENT ERROR RECOVERY::
A client MUST above all gracefully degrade on errors, whether the
error is caused by bad or missing data in the bundle URI(s), by a
client too limited to e.g. understand and fully parse out bundle
headers and their prerequisite relationships, or by something else.
+
Server operators should feel confident in turning on "bundle-uri"
without worrying that clones or fetches will run into hard failures
if e.g. their CDN goes down. Even if the server's bundle(s) are
incomplete or bad in some way, the client should still end up with a
functioning repository, just as if it had chosen not to use this
protocol extension.
+
All subsequent discussion on client and server interaction MUST keep
this in mind.
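The recovery rule can be sketched as follows, under the assumption that both the bundle download and the regular fetch are modelled as callbacks returning 0 on success. All names here are hypothetical, and the two `*_step` functions are stand-ins that exist only to make the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*step_fn)(void *ctx);

/* Stand-in for a bundle download that fails, e.g. because a CDN is down. */
int failing_bundle_step(void *ctx)
{
	(void)ctx;
	return -1;
}

/* Stand-in for a step that succeeds. */
int working_fetch_step(void *ctx)
{
	(void)ctx;
	return 0;
}

/*
 * Try the optional bundle step first, then always run the regular fetch.
 * A bundle failure is recorded in *used_bundles but never changes the
 * return value: only the regular fetch decides success or failure.
 */
int clone_with_optional_bundles(step_fn fetch_bundles,
				step_fn fetch_from_server,
				void *ctx, int *used_bundles)
{
	assert(fetch_from_server && used_bundles);

	*used_bundles = fetch_bundles && fetch_bundles(ctx) == 0;
	return fetch_from_server(ctx);
}
```

Keeping the bundle result out of the return value is the point of the design: the caller cannot accidentally turn a bundle problem into a hard failure.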

bundle-uri SERVER TO CLIENT::
The ordering of the returned bundle URIs is not significant. Clients
MUST parse their headers to discover their contained OIDs and
prerequisites. A client MUST consider the content of the bundle(s)
themselves and their header as the ultimate source of truth.
+
A server MAY even return bundle(s) that don't have any direct
relationship to the repository being cloned (either through accident,
or intentional "clever" configuration), and expect a client to sort
out what data they'd like from the bundle(s), if any.
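A minimal sketch of reading one bundle header line, to illustrate how tips and prerequisites are discovered. It assumes SHA-1 object IDs written as 40 lowercase hex digits and a line whose trailing newline has already been stripped. The names are hypothetical and this is not Git's header parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OID_HEXSZ 40	/* assumption: SHA-1 object format */

enum header_line_type {
	HEADER_LINE_INVALID,
	HEADER_LINE_PREREQUISITE,	/* "-<oid>[ <comment>]" */
	HEADER_LINE_TIP			/* "<oid> <refname>" */
};

/*
 * Classify a header line and copy its OID into oid_out,
 * which must hold at least OID_HEXSZ + 1 bytes.
 */
enum header_line_type parse_bundle_header_line(const char *line, char *oid_out)
{
	enum header_line_type type = HEADER_LINE_TIP;
	size_t i;

	assert(line && oid_out);

	if (*line == '-') {
		type = HEADER_LINE_PREREQUISITE;
		line++;
	}

	/* A short line hits its NUL terminator here and is rejected. */
	for (i = 0; i < OID_HEXSZ; i++) {
		char c = line[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return HEADER_LINE_INVALID;
	}

	if (type == HEADER_LINE_TIP &&
	    (line[OID_HEXSZ] != ' ' || line[OID_HEXSZ + 1] == '\0'))
		return HEADER_LINE_INVALID;	/* a tip needs a ref name */
	if (type == HEADER_LINE_PREREQUISITE &&
	    line[OID_HEXSZ] != '\0' && line[OID_HEXSZ] != ' ')
		return HEADER_LINE_INVALID;

	memcpy(oid_out, line, OID_HEXSZ);
	oid_out[OID_HEXSZ] = '\0';
	return type;
}
```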

bundle-uri CLIENT TO SERVER::
The client SHOULD provide reference tips found in the bundle header(s)
as 'have' lines in any subsequent `fetch` request. A client MAY also
ignore the bundle(s) entirely if using them is deemed worse for some
reason, e.g. if the bundles can't be downloaded or it doesn't like the
tips it finds.

WHEN ADVERTISED BUNDLE(S) REQUIRE NO FURTHER NEGOTIATION::
If, after issuing `bundle-uri` and `ls-refs` and getting the header(s)
of the bundle(s), the client finds that the ref tips it wants can be
retrieved entirely from the advertised bundle(s), the client MAY disconnect
from the Git server. The results of such a 'clone' or 'fetch' should be
indistinguishable from the state attained without using bundle-uri.

EARLY CLIENT DISCONNECTIONS AND ERROR RECOVERY::
A client MAY perform an early disconnect while still downloading the
bundle(s) (having streamed and parsed their headers). In such a case
the client MUST gracefully recover from any errors related to
finishing the download and validation of the bundle(s).
+
I.e. a client might need to re-connect and issue a 'fetch' command,
and possibly fall back to not making use of 'bundle-uri' at all.
+
This "MAY" behavior is specified as such (and not a "SHOULD") on the
assumption that a server advertising bundle uris is more likely than
not to be serving up a relatively large repository, and to be pointing
to URIs that have a good chance of being in working order. A client
MAY e.g. look at the payload size of the bundles as a heuristic to see
if an early disconnect is worth it, should falling back on a full
"fetch" dialog be necessary.

WHEN ADVERTISED BUNDLE(S) REQUIRE FURTHER NEGOTIATION::
A client SHOULD commence a negotiation of a PACK from the server via
the "fetch" command using the OID tips found in advertised bundles,
even if it's still in the process of downloading those bundle(s).
+
This allows for aggressive early disconnects from any interactive
server dialog. The client blindly trusts that the advertised OID tips
are relevant and issues them as 'have' lines. It then requests any
tips it would like (usually from the "ls-refs" advertisement) via
'want' lines. The server will then compute a (hopefully small) PACK
with the expected difference between the tips from the bundle(s) and
the data requested.
+
The only connection the client then needs to keep active is to the
concurrently downloading static bundle(s). When those and the
incremental PACK are retrieved, they should be inflated and
validated. Any errors at this point should be gracefully recovered
from, see above.
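The 'want' and 'have' lines described above can be sketched as plain text, one line per OID. This hypothetical helper only builds the argument payloads; pkt-line framing and the rest of the `fetch` request are left out, and the function name is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write "want <oid>\n" for each wanted tip (typically from ls-refs),
 * followed by "have <oid>\n" for each tip found in the bundle headers.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
int build_fetch_args(char *buf, size_t size,
		     const char *const *wants, size_t nr_wants,
		     const char *const *haves, size_t nr_haves)
{
	size_t used = 0, i;

	assert(buf && size > 0);
	buf[0] = '\0';

	for (i = 0; i < nr_wants + nr_haves; i++) {
		const char *verb = i < nr_wants ? "want" : "have";
		const char *oid = i < nr_wants ? wants[i] : haves[i - nr_wants];
		int n = snprintf(buf + used, size - used, "%s %s\n", verb, oid);

		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```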

bundle-uri PROTOCOL FEATURES
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The client constructs a bundle list from the `<key>=<value>` pairs
provided by the server. These pairs are part of the `bundle.*` namespace
as documented in linkgit:git-config[1]. In this section, we discuss some
of these keys and describe the actions the client will do in response to
this information.

In particular, the `bundle.version` key specifies an integer value. The
only accepted value at the moment is `1`. If the client sees an
unexpected value here, it MUST ignore the bundle list.

As long as `bundle.version` is understood, all other unknown keys MAY be
ignored by the client. The server will guarantee compatibility with older
clients, though newer clients may be better able to use the extra keys to
minimize downloads.
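A minimal sketch of these two rules, assuming a hypothetical `check_bundle_key` helper. The set of recognized keys (`bundle.version`, `bundle.mode`, `bundle.<id>.uri`) is illustrative, and comparing the version as the string `1` is a simplification of parsing an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum key_result {
	KEY_APPLIED,	/* recognized key, value accepted */
	KEY_IGNORED,	/* unknown key, skipped without error */
	KEY_REJECT_LIST	/* unsupported bundle.version: ignore the whole list */
};

/* Hypothetical helper, not Git's implementation. */
enum key_result check_bundle_key(const char *key, const char *value)
{
	static const char prefix[] = "bundle.";
	static const char suffix[] = ".uri";
	const size_t prefix_len = sizeof(prefix) - 1;
	const size_t suffix_len = sizeof(suffix) - 1;
	size_t key_len;

	assert(key && value);

	if (!strcmp(key, "bundle.version"))
		return !strcmp(value, "1") ? KEY_APPLIED : KEY_REJECT_LIST;
	if (!strcmp(key, "bundle.mode"))
		return KEY_APPLIED;

	/* bundle.<id>.uri with a non-empty <id> */
	key_len = strlen(key);
	if (key_len > prefix_len + suffix_len &&
	    !strncmp(key, prefix, prefix_len) &&
	    !strcmp(key + key_len - suffix_len, suffix))
		return KEY_APPLIED;

	return KEY_IGNORED;
}
```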

Any backwards-incompatible addition of per-URI key-value pairs will be
guarded by a new `bundle.version` value, by values in the 'bundle-uri'
capability advertisement itself, and/or by new future `bundle-uri`
request arguments.

Some example key-value pairs that are not currently implemented but could
be implemented in the future include:

* Add a "hash=<val>" or "size=<bytes>" to advertise the expected hash or
size of the bundle file.

* Advertise that one or more bundle files are the same (to e.g. have
clients round-robin or otherwise choose one of N possible files).

* An "oid=<OID>" shortcut and a "prerequisite=<OID>" shortcut, for
expressing the common case of a bundle with one tip and no
prerequisites, or one tip and one prerequisite.
+
This would allow for optimizing the common case of servers who'd like
to provide one "big bundle" containing only their "main" branch,
and/or incremental updates thereof.
+
A client receiving such a response MAY assume that it can skip
retrieving the header from a bundle at the indicated URI, and thus
save itself and the server(s) the request(s) needed to inspect the
headers of that bundle or bundles.

GIT
---
Part of the linkgit:git[1] suite
21 changes: 21 additions & 0 deletions builtin/clone.c
@@ -1266,6 +1266,27 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (refs)
mapped_refs = wanted_peer_refs(refs, &remote->fetch);

if (!bundle_uri) {
/*
* Populate transport->got_remote_bundle_uri and
* transport->bundle_uri. We might get nothing.
*/
transport_get_remote_bundle_uri(transport);

if (transport->bundles &&
hashmap_get_size(&transport->bundles->bundles)) {
/* At this point, we need the_repository to match the cloned repo. */
if (repo_init(the_repository, git_dir, work_tree))
warning(_("failed to initialize the repo, skipping bundle URI"));
else if (fetch_bundle_list(the_repository,
transport->bundles))
warning(_("failed to fetch advertised bundles"));
} else {
clear_bundle_list(transport->bundles);
FREE_AND_NULL(transport->bundles);
}
}

if (mapped_refs) {
int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));

87 changes: 86 additions & 1 deletion bundle-uri.c
@@ -7,6 +7,7 @@
#include "hashmap.h"
#include "pkt-line.h"
#include "config.h"
#include "remote.h"

static int compare_bundles(const void *hashmap_cmp_fn_data,
const struct hashmap_entry *he1,
@@ -49,6 +50,7 @@ void clear_bundle_list(struct bundle_list *list)

for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
free(list->baseURI);
}

int for_all_bundles_in_list(struct bundle_list *list,
@@ -163,7 +165,7 @@ static int bundle_list_update(const char *key, const char *value,
if (!strcmp(subkey, "uri")) {
if (bundle->uri)
return -1;
bundle->uri = xstrdup(value);
bundle->uri = relative_url(list->baseURI, value, NULL);
return 0;
}

@@ -190,6 +192,18 @@ int bundle_uri_parse_config_format(const char *uri,
.error_action = CONFIG_ERROR_ERROR,
};

if (!list->baseURI) {
struct strbuf baseURI = STRBUF_INIT;
strbuf_addstr(&baseURI, uri);

/*
* If the URI does not end with a trailing slash, then
* remove the filename portion of the path. This is
* important for relative URIs.
*/
strbuf_strip_file_from_path(&baseURI);
list->baseURI = strbuf_detach(&baseURI, NULL);
}
result = git_config_from_file_with_options(config_to_bundle_list,
filename, list,
&opts);
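The base URI computation in this hunk can be illustrated with a standalone sketch. The helper below is a hypothetical stand-in for `strbuf_strip_file_from_path` that works on a plain C string and treats only `/` as a separator, which is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/*
 * Truncate uri in place just after its last '/'.
 * A URI that already ends in '/' is left unchanged;
 * a string with no '/' at all becomes empty.
 */
void strip_file_from_path(char *uri)
{
	char *last_sep;

	assert(uri);

	last_sep = strrchr(uri, '/');
	if (last_sep)
		last_sep[1] = '\0';
	else
		uri[0] = '\0';
}
```

With the list at `https://example.com/bundles/list.txt`, the base becomes `https://example.com/bundles/`, so a relative `bundle.<id>.uri` value resolves next to the list file.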
@@ -563,6 +577,77 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
return result;
}

int fetch_bundle_list(struct repository *r, struct bundle_list *list)
{
int result;
struct bundle_list global_list;

init_bundle_list(&global_list);

/* If a bundle is added to this global list, then it is required. */
global_list.mode = BUNDLE_MODE_ALL;

if ((result = download_bundle_list(r, list, &global_list, 0)))
goto cleanup;

result = unbundle_all_bundles(r, &global_list);

cleanup:
for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
clear_bundle_list(&global_list);
return result;
}

/**
* API for serve.c.
*/

int bundle_uri_advertise(struct repository *r, struct strbuf *value UNUSED)
{
static int advertise_bundle_uri = -1;

if (advertise_bundle_uri != -1)
goto cached;

advertise_bundle_uri = 0;
repo_config_get_maybe_bool(r, "uploadpack.advertisebundleuris", &advertise_bundle_uri);

cached:
return advertise_bundle_uri;
}

static int config_to_packet_line(const char *key, const char *value, void *data)
{
/* 'data' is the struct packet_writer passed in by bundle_uri_command(). */
struct packet_writer *writer = data;

if (!strncmp(key, "bundle.", 7))
packet_write_fmt(writer->dest_fd, "%s=%s", key, value);

return 0;
}

int bundle_uri_command(struct repository *r,
struct packet_reader *request)
{
struct packet_writer writer;
packet_writer_init(&writer, 1);

while (packet_reader_read(request) == PACKET_READ_NORMAL)
die(_("bundle-uri: unexpected argument: '%s'"), request->line);
if (request->status != PACKET_READ_FLUSH)
die(_("bundle-uri: expected flush after arguments"));

/*
* Send all "bundle.*" config entries to the client as key=value
* packet lines.
*/
git_config(config_to_packet_line, &writer);

packet_writer_flush(&writer);

return 0;
}
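For illustration, the wire form of one advertised entry can be sketched without Git's pkt-line API. The hypothetical helper below applies the same `bundle.` prefix filter and then frames `<key>=<value>` as a pkt-line: four hex digits giving the total length (the four length bytes included), followed by the payload. The 65520-byte cap is the maximum pkt-line length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PKT_LINE_MAX 65520	/* maximum total pkt-line length */

/*
 * Returns the number of bytes written to out, 0 if the key is outside
 * the "bundle." namespace (nothing is advertised), or -1 if the line
 * does not fit in out or exceeds the pkt-line limit.
 */
int format_bundle_pkt_line(char *out, size_t out_size,
			   const char *key, const char *value)
{
	size_t total;

	assert(out && key && value);

	if (strncmp(key, "bundle.", 7))
		return 0;

	total = 4 + strlen(key) + 1 + strlen(value);
	if (total > PKT_LINE_MAX || total >= out_size)
		return -1;

	return snprintf(out, out_size, "%04x%s=%s",
			(unsigned int)total, key, value);
}
```

For the pair `bundle.version` and `1`, the payload is 16 bytes, so the line is `0014bundle.version=1`.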

/**
* General API for {transport,connect}.c etc.
*/