Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle URIs II: git clone --bundle-uri #1300

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions Documentation/git-clone.txt
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,13 @@ or `--mirror` is given)
for `host.xz:foo/.git`). Cloning into an existing directory
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * Before fetching from the remote, download and install bundle
> +	 * data from the --bundle-uri option.
> +	 */
> +	if (bundle_uri) {
> +		/* At this point, we need the_repository to match the cloned repo. */
> +		repo_init(the_repository, git_dir, work_tree);
> +		if (fetch_bundle_uri(the_repository, bundle_uri))
> +			warning(_("failed to fetch objects from bundle URI '%s'"),
> +				bundle_uri);
> +	}

I do not offhand know why I suddenly started seeing the issue for
this relatively old commit I have had in my tree for at least 10
days, but I am getting this

builtin/clone.c: In function 'cmd_clone':
builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
 1248 |                 repo_init(the_repository, git_dir, work_tree);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


with the commit merged in 'seen'.

It seems that the updated ab/submodule-helper-prep is doing it.  

Why can't that (or any of Ævar's) topic focus on what it needs to
do, without churning the codebase and inflict damages to other
topics like this?  Quite frustrating.

I'll redo today's integration without the offending topic to give
other topics test coverage.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/22/2022 5:24 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +	/*
>> +	 * Before fetching from the remote, download and install bundle
>> +	 * data from the --bundle-uri option.
>> +	 */
>> +	if (bundle_uri) {
>> +		/* At this point, we need the_repository to match the cloned repo. */
>> +		repo_init(the_repository, git_dir, work_tree);
>> +		if (fetch_bundle_uri(the_repository, bundle_uri))
>> +			warning(_("failed to fetch objects from bundle URI '%s'"),
>> +				bundle_uri);
>> +	}
> 
> I do not offhand know why I suddenly started seeing the issue for
> this relatively old commit I have had in my tree for at least 10
> days, but I am getting this
> 
> builtin/clone.c: In function 'cmd_clone':
> builtin/clone.c:1248:17: error: ignoring return value of 'repo_init' declared with attribute 'warn_unused_result' [-Werror=unused-result]
>  1248 |                 repo_init(the_repository, git_dir, work_tree);
>       |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> 
> with the commit merged in 'seen'.

Thank you for pointing this out. Here is a patch that fixes
the issue from this patch:

--- >8 ---

From 6df8bc6d7ffdc1b115d85ef9550bab5dcf1811f8 Mon Sep 17 00:00:00 2001
From: Derrick Stolee <derrickstolee@github.com>
Date: Tue, 23 Aug 2022 09:53:47 -0400
Subject: [PATCH] clone: warn on failure to repo_init()

The --bundle-uri option was added in 55568919616 (clone: add
--bundle-uri option, 2022-08-09), but this also introduced a call to
repo_init() whose return value was ignored. Fix that ignored value by
warning that the bundle URI process could not continue if it failed.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 builtin/clone.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4463789680b..e21d42dfee5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	 */
 	if (bundle_uri) {
 		/* At this point, we need the_repository to match the cloned repo. */
-		repo_init(the_repository, git_dir, work_tree);
-		if (fetch_bundle_uri(the_repository, bundle_uri))
+		if (repo_init(the_repository, git_dir, work_tree))
+			warning(_("failed to initialize the repo, skipping bundle URI"));
+		else if (fetch_bundle_uri(the_repository, bundle_uri))
 			warning(_("failed to fetch objects from bundle URI '%s'"),
 				bundle_uri);
 	}
-- 
2.37.1.vfs.0.0.rebase

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
> Date: Tue, 23 Aug 2022 09:53:47 -0400
> Subject: [PATCH] clone: warn on failure to repo_init()
>
> The --bundle-uri option was added in 55568919616 (clone: add
> --bundle-uri option, 2022-08-09), but this also introduced a call to
> repo_init() whose return value was ignored. Fix that ignored value by
> warning that the bundle URI process could not continue if it failed.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---

OK.  It looks like a sensible thing to do.

>  builtin/clone.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/clone.c b/builtin/clone.c
> index 4463789680b..e21d42dfee5 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1245,8 +1245,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  	 */
>  	if (bundle_uri) {
>  		/* At this point, we need the_repository to match the cloned repo. */
> -		repo_init(the_repository, git_dir, work_tree);
> -		if (fetch_bundle_uri(the_repository, bundle_uri))
> +		if (repo_init(the_repository, git_dir, work_tree))
> +			warning(_("failed to initialize the repo, skipping bundle URI"));
> +		else if (fetch_bundle_uri(the_repository, bundle_uri))
>  			warning(_("failed to fetch objects from bundle URI '%s'"),
>  				bundle_uri);
>  	}

is only allowed if the directory is empty.

--bundle-uri=<uri>::
Before fetching from the remote, fetch a bundle from the given
`<uri>` and unbundle the data into the local repository. The refs
in the bundle will be stored under the hidden `refs/bundle/*`
namespace. This option is incompatible with `--depth`,
`--shallow-since`, and `--shallow-exclude`.

:git-clone: 1
include::urls.txt[]

Expand Down
9 changes: 9 additions & 0 deletions Documentation/gitremote-helpers.txt
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
Can guarantee that when a clone is requested, the received
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> A future change will want a way to download a file over HTTP(S) using
> the simplest of download mechanisms. We do not want to assume that the
> server on the other side understands anything about the Git protocol but
> could be a simple static web server.
> 
> Create the new 'get' capability for the remote helpers which advertises
> that the 'get' command is avalable. A caller can send a line containing
> 'get <url> <path>' to download the file at <url> into the file at
> <path>.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Documentation/gitremote-helpers.txt |  9 +++++++
>  remote-curl.c                       | 32 +++++++++++++++++++++++++
>  t/t5557-http-get.sh                 | 37 +++++++++++++++++++++++++++++
>  3 files changed, 78 insertions(+)
>  create mode 100755 t/t5557-http-get.sh
> 
> diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
> index 6f1e269ae43..ed8da428c98 100644
> --- a/Documentation/gitremote-helpers.txt
> +++ b/Documentation/gitremote-helpers.txt
> @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
>  	Can guarantee that when a clone is requested, the received
>  	pack is self contained and is connected.
>  
> +'get'::
> +	Can use the 'get' command to download a file from a given URI.
> +
>  If a helper advertises 'connect', Git will use it if possible and
>  fall back to another capability if the helper requests so when
>  connecting (see the 'connect' command under COMMANDS).
> @@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
>  +
>  Supported if the helper has the "stateless-connect" capability.
>  
> +'get' <uri> <path>::
> +	Downloads the file from the given `<uri>` to the given `<path>`. If
> +	`<path>.temp` exists, then Git assumes that the `.temp` file is a
> +	partial download from a previous attempt and will resume the
> +	download from that position.
> +
>  If a fatal error occurs, the program writes the error message to
>  stderr and exits. The caller should expect that a suitable error
>  message has been printed if the child closes the connection without
> diff --git a/remote-curl.c b/remote-curl.c
> index b8758757ece..73fbdbddd84 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -1286,6 +1286,33 @@ static void parse_fetch(struct strbuf *buf)
>  	strbuf_reset(buf);
>  }
>  
> +static void parse_get(struct strbuf *buf)
> +{
> +	struct strbuf url = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	const char *p, *space;
> +
> +	if (!skip_prefix(buf->buf, "get ", &p))
> +		die(_("http transport does not support %s"), buf->buf);

Nit: since we're already calling skip_prefix(...) in cmd_main() below,
can we just pass the suffix to parse_get() and avoid having to skip the
prefix twice?


> +
> +	space = strchr(p, ' ');
> +
> +	if (!space)
> +		die(_("protocol error: expected '<url> <path>', missing space"));
> +
> +	strbuf_add(&url, p, space - p);
> +	strbuf_addstr(&path, space + 1);
> +
> +	if (http_get_file(url.buf, path.buf, NULL))
> +		die(_("failed to download file at URL '%s'"), url.buf);
> +
> +	strbuf_release(&url);
> +	strbuf_release(&path);
> +	printf("\n");
> +	fflush(stdout);
> +	strbuf_reset(buf);
> +}
> +
>  static int push_dav(int nr_spec, const char **specs)
>  {
>  	struct child_process child = CHILD_PROCESS_INIT;
> @@ -1564,9 +1591,14 @@ int cmd_main(int argc, const char **argv)
>  				printf("unsupported\n");
>  			fflush(stdout);
>  
> +		} else if (skip_prefix(buf.buf, "get ", &arg)) {
> +			parse_get(&buf);
> +			fflush(stdout);
> +
>  		} else if (!strcmp(buf.buf, "capabilities")) {
>  			printf("stateless-connect\n");
>  			printf("fetch\n");
> +			printf("get\n");
>  			printf("option\n");
>  			printf("push\n");
>  			printf("check-connectivity\n");
> diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
> new file mode 100755
> index 00000000000..50b7dbcf957
> --- /dev/null
> +++ b/t/t5557-http-get.sh
> @@ -0,0 +1,37 @@
> +#!/bin/sh
> +
> +test_description='test downloading a file by URL'
> +
> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp
> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> +	test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
> +'
> +
> +test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>
> [...]

Add a:

	TEST_PASSES_SANITIZE_LEAK=true

Before:

> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> +	url="$HTTPD_URL/none.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file1
> +	EOF
> +
> +	test_must_fail git remote-http $url <input 2>err &&
> +	test_path_is_missing file1 &&
> +	grep "failed to download file at URL" err &&
> +	rm file1.temp

This should presumably be a :

	test_when_finished "rm file.temp"

Before the "test_must_fail", otherwise we'll leave this around if
e.g. the "grep" in the &&-chain fails, but we surely want to clean this
up in that case..

> +'
> +
> +test_expect_success 'get by URL: 200' '
> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> +	url="$HTTPD_URL/exists.txt" &&
> +	cat >input <<-EOF &&
> +	capabilities
> +	get $url file2
> +
> +	EOF
> +
> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&

Is this ad-hoc debugging that snuck into the test? Nothing here needs
this GIT_TRACE2_PERF...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
> 
> Add a:
> 
> 	TEST_PASSES_SANITIZE_LEAK=true

I trust this is important for your series that flags tests
that _could_ pass with this. Thanks.

>> +. ./test-lib.sh
>> +
>> +. "$TEST_DIRECTORY"/lib-httpd.sh
>> +start_httpd
>> +
>> +test_expect_success 'get by URL: 404' '
>> +	url="$HTTPD_URL/none.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file1
>> +	EOF
>> +
>> +	test_must_fail git remote-http $url <input 2>err &&
>> +	test_path_is_missing file1 &&
>> +	grep "failed to download file at URL" err &&
>> +	rm file1.temp
> 
> This should presumably be a :
> 
> 	test_when_finished "rm file.temp"

Yes. Thanks.

>> +'
>> +
>> +test_expect_success 'get by URL: 200' '
>> +	echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
>> +
>> +	url="$HTTPD_URL/exists.txt" &&
>> +	cat >input <<-EOF &&
>> +	capabilities
>> +	get $url file2
>> +
>> +	EOF
>> +
>> +	GIT_TRACE2_PERF=1 git remote-http $url <input &&
> 
> Is this ad-hoc debugging that snuck into the test? Nothing here needs
> this GIT_TRACE2_PERF...

Good eye. Removed.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this):

On Mon, Aug 01 2022, Derrick Stolee wrote:

> On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
>> 
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>> 
>> Add a:
>> 
>> 	TEST_PASSES_SANITIZE_LEAK=true
>
> I trust this is important for your series that flags tests
> that _could_ pass with this. Thanks.

Not really, it does add a "check" mode, and this is one of the few tests
crop up when merged with "seen" if run in that mode, but currentlny it's
only run manually, and we have other outstanding deviations.

It's just something for "master": We have the "linux-leaks" job for a
while now, so it makes sense for new tests to declare if they're
leak-free or not, so we guard against the introduction of leaks with new
code, if possible.

I (or someone else) can always follow-up with something like[1], but as
it's easy it makes sense to just add it in the first place, thanks.

1. https://lore.kernel.org/git/patch-v3-13.15-28255ac3239-20220727T230800Z-avarab@gmail.com/

pack is self contained and is connected.

'get'::
Can use the 'get' command to download a file from a given URI.

If a helper advertises 'connect', Git will use it if possible and
fall back to another capability if the helper requests so when
connecting (see the 'connect' command under COMMANDS).
Expand Down Expand Up @@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
+
Supported if the helper has the "stateless-connect" capability.

'get' <uri> <path>::
Downloads the file from the given `<uri>` to the given `<path>`. If
`<path>.temp` exists, then Git assumes that the `.temp` file is a
partial download from a previous attempt and will resume the
download from that position.

If a fatal error occurs, the program writes the error message to
stderr and exits. The caller should expect that a suitable error
message has been printed if the child closes the connection without
Expand Down
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -906,6 +906,7 @@ LIB_OBJS += blob.o
LIB_OBJS += bloom.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> Before implementing a way to fetch bundles into a repository, create the
> basic logic. Assume that the URI is actually a file path. Future logic
> will make this more careful to other protocols.
> 
> For now, we also only succeed if the content at the URI is a bundle
> file, not a bundle list. Bundle lists will be implemented in a future
> change.
> 
> Note that the discovery of a temporary filename is slightly racy because
> the odb_mkstemp() relies on the temporary file not existing. With the
> current implementation being limited to file copies, we could replace
> the copy_file() with copy_fd(). The tricky part comes in future changes
> that send the filename to 'git remote-https' and its 'get' capability.

> At that point, we need the file descriptor closed _and_ the file
> unlinked.

Ahh, it looks like this was the point I missed in my previous review.
IIUC, we need the file unlinked because http_get_file() will eventually
call finalize_object_file() to move a tempfile to the final object name,
and that will fail if we have an empty file already in place.


> If we were to keep the file descriptor open for the sake of
> normal file copies, then we would pollute the rest of the code for
> little benefit. This is especially the case because we expect that most
> bundle URI use will be based on HTTPS instead of file copies.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  Makefile     |   1 +
>  bundle-uri.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  bundle-uri.h |  14 +++++++
>  3 files changed, 119 insertions(+)
>  create mode 100644 bundle-uri.c
>  create mode 100644 bundle-uri.h
> 
> diff --git a/Makefile b/Makefile
> index 1624471badc..7d5f48069ea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -906,6 +906,7 @@ LIB_OBJS += blob.o
>  LIB_OBJS += bloom.o
>  LIB_OBJS += branch.o
>  LIB_OBJS += bulk-checkin.o
> +LIB_OBJS += bundle-uri.o
>  LIB_OBJS += bundle.o
>  LIB_OBJS += cache-tree.o
>  LIB_OBJS += cbtree.o
> diff --git a/bundle-uri.c b/bundle-uri.c
> new file mode 100644
> index 00000000000..b35babc36aa
> --- /dev/null
> +++ b/bundle-uri.c
> @@ -0,0 +1,104 @@
> +#include "cache.h"
> +#include "bundle-uri.h"
> +#include "bundle.h"
> +#include "object-store.h"
> +#include "refs.h"
> +#include "run-command.h"
> +
> +static int find_temp_filename(struct strbuf *name)
> +{
> +	int fd;
> +	/*
> +	 * Find a temporary filename that is available. This is briefly
> +	 * racy, but unlikely to collide.
> +	 */
> +	fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
> +	if (fd < 0) {
> +		warning(_("failed to create temporary file"));
> +		return -1;
> +	}
> +
> +	close(fd);
> +	unlink(name->buf);
> +	return 0;
> +}
> +
> +static int copy_uri_to_file(const char *file, const char *uri)
> +{
> +	/* File-based URIs only for now. */
> +	return copy_file(file, uri, 0);
> +}
> +
> +static int unbundle_from_file(struct repository *r, const char *file)
> +{
> +	int result = 0;
> +	int bundle_fd;
> +	struct bundle_header header = BUNDLE_HEADER_INIT;
> +	struct string_list_item *refname;
> +	struct strbuf bundle_ref = STRBUF_INIT;
> +	size_t bundle_prefix_len;
> +
> +	if ((bundle_fd = read_bundle_header(file, &header)) < 0)
> +		return 1;
> +
> +	if ((result = unbundle(r, &header, bundle_fd, NULL)))
> +		return 1;
> +
> +	/*
> +	 * Convert all refs/heads/ from the bundle into refs/bundles/
> +	 * in the local repository.
> +	 */
> +	strbuf_addstr(&bundle_ref, "refs/bundles/");
> +	bundle_prefix_len = bundle_ref.len;
> +
> +	for_each_string_list_item(refname, &header.references) {
> +		struct object_id *oid = refname->util;
> +		struct object_id old_oid;
> +		const char *branch_name;
> +		int has_old;
> +
> +		if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
> +			continue;
> +
> +		strbuf_setlen(&bundle_ref, bundle_prefix_len);
> +		strbuf_addstr(&bundle_ref, branch_name);
> +
> +		has_old = !read_ref(bundle_ref.buf, &old_oid);
> +		update_ref("fetched bundle", bundle_ref.buf, oid,
> +			   has_old ? &old_oid : NULL,
> +			   REF_SKIP_OID_VERIFICATION,
> +			   UPDATE_REFS_MSG_ON_ERR);
> +	}
> +
> +	bundle_header_release(&header);

We still also need to release bundle_ref here, right?


> +	return result;
> +}
> +
> +int fetch_bundle_uri(struct repository *r, const char *uri)
> +{
> +	int result = 0;
> +	struct strbuf filename = STRBUF_INIT;
> +
> +	if ((result = find_temp_filename(&filename)))
> +		goto cleanup;
> +
> +	if ((result = copy_uri_to_file(filename.buf, uri))) {
> +		warning(_("failed to download bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = !is_bundle(filename.buf, 0))) {
> +		warning(_("file at URI '%s' is not a bundle"), uri);
> +		goto cleanup;
> +	}
> +
> +	if ((result = unbundle_from_file(r, filename.buf))) {
> +		warning(_("failed to unbundle bundle from URI '%s'"), uri);
> +		goto cleanup;
> +	}
> +
> +cleanup:
> +	unlink(filename.buf);
> +	strbuf_release(&filename);
> +	return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> new file mode 100644
> index 00000000000..8a152f1ef14
> --- /dev/null
> +++ b/bundle-uri.h
> @@ -0,0 +1,14 @@
> +#ifndef BUNDLE_URI_H
> +#define BUNDLE_URI_H
> +
> +struct repository;
> +
> +/**
> + * Fetch data from the given 'uri' and unbundle the bundle data found
> + * based on that information.
> + *
> + * Returns non-zero if no bundle information is found at the given 'uri'.
> + */
> +int fetch_bundle_uri(struct repository *r, const char *uri);
> +
> +#endif
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> Before implementing a way to fetch bundles into a repository, create the
>> basic logic. Assume that the URI is actually a file path. Future logic
>> will make this more careful to other protocols.
>>
>> For now, we also only succeed if the content at the URI is a bundle
>> file, not a bundle list. Bundle lists will be implemented in a future
>> change.
>>
>> Note that the discovery of a temporary filename is slightly racy because
>> the odb_mkstemp() relies on the temporary file not existing. With the
>> current implementation being limited to file copies, we could replace
>> the copy_file() with copy_fd(). The tricky part comes in future changes
>> that send the filename to 'git remote-https' and its 'get' capability.
> 
>> At that point, we need the file descriptor closed _and_ the file
>> unlinked.
> 
> Ahh, it looks like this was the point I missed in my previous review.
> IIUC, we need the file unlinked because http_get_file() will eventually
> call finalize_object_file() to move a tempfile to the final object name,
> and that will fail if we have an empty file already in place.

Yes, and I also was not sure what would happen if the empty file existed.
I tested it and thought allowing overwriting an existing file would be a
bigger problem than this choice of a filename.

We also discussed options about how it would be nice to have a predictable
filename so we could resume downloads _across Git process failures_
instead of just a network failure within a single Git process. This is
something to explore when creating that functionality.

Thanks,
-Stolee

LIB_OBJS += branch.o
LIB_OBJS += bulk-checkin.o
LIB_OBJS += bundle-uri.o
LIB_OBJS += bundle.o
LIB_OBJS += cache-tree.o
LIB_OBJS += cbtree.o
Expand Down
18 changes: 18 additions & 0 deletions builtin/clone.c
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@
#include "list-objects-filter-options.h"
#include "hook.h"
#include "bundle.h"
#include "bundle-uri.h"

/*
* Overall FIXMEs:
Expand Down Expand Up @@ -77,6 +78,7 @@ static int option_filter_submodules = -1; /* unspecified */
static int config_filter_submodules = -1; /* unspecified */
static struct string_list server_options = STRING_LIST_INIT_NODUP;
static int option_remote_submodules;
static const char *bundle_uri;

static int recurse_submodules_cb(const struct option *opt,
const char *arg, int unset)
Expand Down Expand Up @@ -160,6 +162,8 @@ static struct option builtin_clone_options[] = {
N_("any cloned submodules will use their remote-tracking branch")),
OPT_BOOL(0, "sparse", &option_sparse_checkout,
N_("initialize sparse-checkout file to include only files at root")),
OPT_STRING(0, "bundle-uri", &bundle_uri,
N_("uri"), N_("a URI for downloading bundles before fetching from origin remote")),
OPT_END()
};

Expand Down Expand Up @@ -933,6 +937,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
option_no_checkout = 1;
}

if (bundle_uri && deepen)
die(_("--bundle-uri is incompatible with --depth, --shallow-since, and --shallow-exclude"));

repo_name = argv[0];

path = get_repo_path(repo_name, &is_bundle);
Expand Down Expand Up @@ -1232,6 +1239,17 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
if (transport->smart_options && !deepen && !filter_options.choice)
transport->smart_options->check_self_contained_and_connected = 1;

/*
* Before fetching from the remote, download and install bundle
* data from the --bundle-uri option.
*/
if (bundle_uri) {
/* At this point, we need the_repository to match the cloned repo. */
repo_init(the_repository, git_dir, work_tree);
if (fetch_bundle_uri(the_repository, bundle_uri))
warning(_("failed to fetch objects from bundle URI '%s'"),
bundle_uri);
}

strvec_push(&transport_ls_refs_options.ref_prefixes, "HEAD");
refspec_ref_prefixes(&remote->fetch,
Expand Down
168 changes: 168 additions & 0 deletions bundle-uri.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
#include "cache.h"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Josh Steadmon wrote (reply to this):

On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> 
> The previous change created the 'git clone --bundle-uri=<uri>' option.
> Currently, <uri> must be a filename.
> 
> Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
> and use git-remote-https as the way to download the data at that URI.
> Otherwise, check to see if file:// is present and modify the prefix
> accordingly.
> 
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
>  bundle-uri.c                | 70 +++++++++++++++++++++++++++++++++++--
>  t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
>  2 files changed, 112 insertions(+), 3 deletions(-)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index b35babc36aa..7086382b186 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
>  	return 0;
>  }
>  
> -static int copy_uri_to_file(const char *file, const char *uri)
> +static int download_https_uri_to_file(const char *file, const char *uri)
>  {
> -	/* File-based URIs only for now. */
> -	return copy_file(file, uri, 0);
> +	int result = 0;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	FILE *child_in = NULL, *child_out = NULL;
> +	struct strbuf line = STRBUF_INIT;
> +	int found_get = 0;
> +
> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> +	cp.in = -1;
> +	cp.out = -1;
> +
> +	if (start_command(&cp))
> +		return 1;
> +
> +	child_in = fdopen(cp.in, "w");
> +	if (!child_in) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	child_out = fdopen(cp.out, "r");
> +	if (!child_out) {
> +		result = 1;
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "capabilities\n");
> +	fflush(child_in);
> +
> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}
> +	strbuf_release(&line);
> +
> +	if (!found_get) {
> +		result = error(_("insufficient capabilities"));
> +		goto cleanup;
> +	}
> +
> +	fprintf(child_in, "get %s %s\n\n", uri, file);
> +
> +cleanup:
> +	if (child_in)
> +		fclose(child_in);
> +	if (finish_command(&cp))
> +		return 1;
> +	if (child_out)
> +		fclose(child_out);
> +	return result;
> +}
> +
> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> +	const char *out;
> +
> +	if (skip_prefix(uri, "https:", &out) ||
> +	    skip_prefix(uri, "http:", &out))
> +		return download_https_uri_to_file(filename, uri);

Since we're not using "out" here, should we replace the skip_prefix()s
with starts_with(), or perhaps even istarts_with()?


> +	if (!skip_prefix(uri, "file://", &out))
> +		out = uri;
> +
> +	/* Copy as a file */
> +	return copy_file(filename, out, 0);
>  }
>  
>  static int unbundle_from_file(struct repository *r, const char *file)
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index f709bcb729c..ad666a2d28a 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'clone with file:// bundle' '
> +	git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
> +		clone-from clone-file &&
> +	git -C clone-file rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual
> +'
> +
> +#########################################################################
> +# HTTP tests begin here
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'fail to fetch from non-existent HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
> +	grep "failed to download bundle from URI" err
> +'
> +
> +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
> +	test_when_finished rm -rf test &&
> +	echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
> +	git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
> +	grep "is not a bundle" err
> +'
> +
> +test_expect_success 'clone HTTP bundle' '
> +	cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
> +
> +	git clone --no-local --mirror clone-from \
> +		"$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
> +
> +	git clone --bundle-uri="$HTTPD_URL/B.bundle" \
> +		"$HTTPD_URL/smart/fetch.git" clone-http &&
> +	git -C clone-http rev-parse refs/bundles/topic >actual &&
> +	git -C clone-from rev-parse topic >expect &&
> +	test_cmp expect actual &&
> +
> +	test_config -C clone-http log.excludedecoration refs/bundle/
> +'
> +
> +# Do not add tests here unless they use the HTTP server, as they will
> +# not run unless the HTTP dependencies exist.
> +
>  test_done
> -- 
> gitgitgadget
> 

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:

>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> +	const char *out;
>> +
>> +	if (skip_prefix(uri, "https:", &out) ||
>> +	    skip_prefix(uri, "http:", &out))
>> +		return download_https_uri_to_file(filename, uri);
> 
> Since we're not using "out" here, should we replace the skip_prefix()s
> with starts_with(), or perhaps even istarts_with()?

Sounds good to me. Can't abandon "out" because it is used
for the "file://" prefix, but istarts_with() is better here.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> +	const char *out;
> +
> +	if (istarts_with(uri, "https:") ||
> +	    istarts_with(uri, "http:"))

Let's be a bit more strict to avoid mistakes and make the code
immediately obvious, e.g.

	if (istarts_with(uri, "https://") ||
	    istarts_with(uri, "http://"))

> +		return download_https_uri_to_file(filename, uri);
> +
> +	if (!skip_prefix(uri, "file://", &out))
> +		out = uri;

If we are using istarts_with because URI scheme name is case
insensitive, shouldn't we do the same for "file://" URL, not
just for "http(s)://" URL?  IOW

	if (!skip_iprefix(uri, "file://", &out))

> +static int download_https_uri_to_file(const char *file, const char *uri)
>  {
> +	int result = 0;
> +	struct child_process cp = CHILD_PROCESS_INIT;
> +	FILE *child_in = NULL, *child_out = NULL;
> +	struct strbuf line = STRBUF_INIT;
> +	int found_get = 0;
> +
> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);

Does "git-remote-https" talk to a "http://" URL just fine when uri
parameter starts with "http://"?  Would it be the same if the uri
parameter begins with say "Http://"?

Other than that, looks sensible.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/2/2022 5:32 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> +	const char *out;
>> +
>> +	if (istarts_with(uri, "https:") ||
>> +	    istarts_with(uri, "http:"))
> 
> Let's be a bit more strict to avoid mistakes and make the code
> immediately obvious, e.g.
> 
> 	if (istarts_with(uri, "https://") ||
> 	    istarts_with(uri, "http://"))
> 
>> +		return download_https_uri_to_file(filename, uri);
>> +
>> +	if (!skip_prefix(uri, "file://", &out))
>> +		out = uri;
> 
> If we are using istarts_with because URI scheme name is case
> insensitive, shouldn't we do the same for "file://" URL, not
> just for "http(s)://" URL?  IOW
> 
> 	if (!skip_iprefix(uri, "file://", &out))

Good ideas. Of course, we don't have a skip_iprefix(), but
I can use "istarts_with()" and then manually add the length.
If we see more need for that in the future, we can consider
adding it.

(It's interesting that these uses in bundle-uri.c are the
only uses of istarts_with() that I see in the codebase.)

>> +static int download_https_uri_to_file(const char *file, const char *uri)
>>  {
>> +	int result = 0;
>> +	struct child_process cp = CHILD_PROCESS_INIT;
>> +	FILE *child_in = NULL, *child_out = NULL;
>> +	struct strbuf line = STRBUF_INIT;
>> +	int found_get = 0;
>> +
>> +	strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> 
> Does "git-remote-https" talk to a "http://" URL just fine when uri
> parameter starts with "http://"?  Would it be the same if the uri
> parameter begins with say "Http://"?

I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
variable in lib-httpd.sh to "HtTP" and we get this fun error:

+ git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
Cloning into 'client'...
git: 'remote-HtTP' is not a git command. See 'git --help'.

So I guess I can keep case-sensitive comparisons here.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

>>> +	if (istarts_with(uri, "https:") ||
>>> +	    istarts_with(uri, "http:"))
>> 
>> Let's be a bit more strict to avoid mistakes and make the code
>> immediately obvious, e.g.
>> 
>> 	if (istarts_with(uri, "https://") ||
>> 	    istarts_with(uri, "http://"))

This part (i.e. check for "<scheme>://", not "<scheme>:") still
stands, as the latter could be an scp style Git URL that goes to the
host whose name is <scheme>.  As mentioned later, s/istarts/starts/
is probably a good thing to do here.

>> Does "git-remote-https" talk to a "http://" URL just fine when uri
>> parameter starts with "http://"?  Would it be the same if the uri
>> parameter begins with say "Http://"?
>
> I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
> variable in lib-httpd.sh to "HtTP" and we get this fun error:
>
> + git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
> Cloning into 'client'...
> git: 'remote-HtTP' is not a git command. See 'git --help'.
>
> So I guess I can keep case-sensitive comparisons here.

Guarding them to lowercase-only may sound like a cop-out to purists,
but I think it is reasonable thing to do.  The only folks that would
be offended by are protocol lawyers, and as your check shows, we are
treating <scheme> case sensitively already.

An obvious alternative is to downcase the "<scheme>://" part but I
do not think it is worth it; we have to do that everywhere and we
need to be confident that we covered all the code paths---the latter
is expensive.

There do exist skip_iprefix() that are used in many places, like
convert, http, mailinfo, etc. by the way.  That is a moot point as
we are not doing case insensitive comparison anymore.

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Teng Long wrote (reply to this):

Derrick Stolee <derrickstolee@github.com> writes:

> +	while (!strbuf_getline(&line, child_out)) {
> +		if (!line.len)
> +			break;
> +		if (!strcmp(line.buf, "get"))
> +			found_get = 1;
> +	}

Clear implementation for me to read, thanks.

I'm not sure but maybe a nit, should it be a "break;" after
"found_get = 1;" for omitting the left uncencerned capabilities
to quit earlier?

Thanks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 8/29/2022 12:58 AM, Teng Long wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> +	while (!strbuf_getline(&line, child_out)) {
>> +		if (!line.len)
>> +			break;
>> +		if (!strcmp(line.buf, "get"))
>> +			found_get = 1;
>> +	}
> 
> Clear implementation for me to read, thanks.
> 
> I'm not sure but maybe a nit, should it be a "break;" after
> "found_get = 1;" for omitting the left uncencerned capabilities
> to quit earlier?

Teng, you are right that it does not hurt to have a 'break'
here. Since this topic has already merged, that change will
need a forward-fix that I can add to a future series.

Thanks,
-Stolee

#include "bundle-uri.h"
#include "bundle.h"
#include "object-store.h"
#include "refs.h"
#include "run-command.h"

static int find_temp_filename(struct strbuf *name)
{
int fd;
/*
* Find a temporary filename that is available. This is briefly
* racy, but unlikely to collide.
*/
fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
if (fd < 0) {
warning(_("failed to create temporary file"));
return -1;
}

close(fd);
unlink(name->buf);
return 0;
}

static int download_https_uri_to_file(const char *file, const char *uri)
{
int result = 0;
struct child_process cp = CHILD_PROCESS_INIT;
FILE *child_in = NULL, *child_out = NULL;
struct strbuf line = STRBUF_INIT;
int found_get = 0;

strvec_pushl(&cp.args, "git-remote-https", uri, NULL);
cp.in = -1;
cp.out = -1;

if (start_command(&cp))
return 1;

child_in = fdopen(cp.in, "w");
if (!child_in) {
result = 1;
goto cleanup;
}

child_out = fdopen(cp.out, "r");
if (!child_out) {
result = 1;
goto cleanup;
}

fprintf(child_in, "capabilities\n");
fflush(child_in);

while (!strbuf_getline(&line, child_out)) {
if (!line.len)
break;
if (!strcmp(line.buf, "get"))
found_get = 1;
}
strbuf_release(&line);

if (!found_get) {
result = error(_("insufficient capabilities"));
goto cleanup;
}

fprintf(child_in, "get %s %s\n\n", uri, file);

cleanup:
if (child_in)
fclose(child_in);
if (finish_command(&cp))
return 1;
if (child_out)
fclose(child_out);
return result;
}

static int copy_uri_to_file(const char *filename, const char *uri)
{
const char *out;

if (starts_with(uri, "https:") ||
starts_with(uri, "http:"))
return download_https_uri_to_file(filename, uri);

if (skip_prefix(uri, "file://", &out))
uri = out;

/* Copy as a file */
return copy_file(filename, uri, 0);
}

static int unbundle_from_file(struct repository *r, const char *file)
{
int result = 0;
int bundle_fd;
struct bundle_header header = BUNDLE_HEADER_INIT;
struct string_list_item *refname;
struct strbuf bundle_ref = STRBUF_INIT;
size_t bundle_prefix_len;

if ((bundle_fd = read_bundle_header(file, &header)) < 0)
return 1;

if ((result = unbundle(r, &header, bundle_fd, NULL)))
return 1;

/*
* Convert all refs/heads/ from the bundle into refs/bundles/
* in the local repository.
*/
strbuf_addstr(&bundle_ref, "refs/bundles/");
bundle_prefix_len = bundle_ref.len;

for_each_string_list_item(refname, &header.references) {
struct object_id *oid = refname->util;
struct object_id old_oid;
const char *branch_name;
int has_old;

if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
continue;

strbuf_setlen(&bundle_ref, bundle_prefix_len);
strbuf_addstr(&bundle_ref, branch_name);

has_old = !read_ref(bundle_ref.buf, &old_oid);
update_ref("fetched bundle", bundle_ref.buf, oid,
has_old ? &old_oid : NULL,
REF_SKIP_OID_VERIFICATION,
UPDATE_REFS_MSG_ON_ERR);
}

bundle_header_release(&header);
return result;
}

int fetch_bundle_uri(struct repository *r, const char *uri)
{
int result = 0;
struct strbuf filename = STRBUF_INIT;

if ((result = find_temp_filename(&filename)))
goto cleanup;

if ((result = copy_uri_to_file(filename.buf, uri))) {
warning(_("failed to download bundle from URI '%s'"), uri);
goto cleanup;
}

if ((result = !is_bundle(filename.buf, 0))) {
warning(_("file at URI '%s' is not a bundle"), uri);
goto cleanup;
}

if ((result = unbundle_from_file(r, filename.buf))) {
warning(_("failed to unbundle bundle from URI '%s'"), uri);
goto cleanup;
}

cleanup:
unlink(filename.buf);
strbuf_release(&filename);
return result;
}
14 changes: 14 additions & 0 deletions bundle-uri.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
#ifndef BUNDLE_URI_H
#define BUNDLE_URI_H

struct repository;

/**
* Fetch data from the given 'uri' and unbundle the bundle data found
* based on that information.
*
* Returns non-zero if no bundle information is found at the given 'uri'.
*/
int fetch_bundle_uri(struct repository *r, const char *uri);

#endif
28 changes: 28 additions & 0 deletions remote-curl.c
Original file line number Diff line number Diff line change
Expand Up @@ -1286,6 +1286,29 @@ static void parse_fetch(struct strbuf *buf)
strbuf_reset(buf);
}

static void parse_get(const char *arg)
{
struct strbuf url = STRBUF_INIT;
struct strbuf path = STRBUF_INIT;
const char *space;

space = strchr(arg, ' ');

if (!space)
die(_("protocol error: expected '<url> <path>', missing space"));

strbuf_add(&url, arg, space - arg);
strbuf_addstr(&path, space + 1);

if (http_get_file(url.buf, path.buf, NULL))
die(_("failed to download file at URL '%s'"), url.buf);

strbuf_release(&url);
strbuf_release(&path);
printf("\n");
fflush(stdout);
}

static int push_dav(int nr_spec, const char **specs)
{
struct child_process child = CHILD_PROCESS_INIT;
Expand Down Expand Up @@ -1564,9 +1587,14 @@ int cmd_main(int argc, const char **argv)
printf("unsupported\n");
fflush(stdout);

} else if (skip_prefix(buf.buf, "get ", &arg)) {
parse_get(arg);
fflush(stdout);

} else if (!strcmp(buf.buf, "capabilities")) {
printf("stateless-connect\n");
printf("fetch\n");
printf("get\n");
printf("option\n");
printf("push\n");
printf("check-connectivity\n");
Expand Down
39 changes: 39 additions & 0 deletions t/t5557-http-get.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
#!/bin/sh

test_description='test downloading a file by URL'

TEST_PASSES_SANITIZE_LEAK=true

. ./test-lib.sh

. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd

test_expect_success 'get by URL: 404' '
test_when_finished "rm -f file.temp" &&
url="$HTTPD_URL/none.txt" &&
cat >input <<-EOF &&
capabilities
get $url file1
EOF

test_must_fail git remote-http $url <input 2>err &&
test_path_is_missing file1 &&
grep "failed to download file at URL" err
'

test_expect_success 'get by URL: 200' '
echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&

url="$HTTPD_URL/exists.txt" &&
cat >input <<-EOF &&
capabilities
get $url file2

EOF

git remote-http $url <input &&
test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
'

test_done
Loading