-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bundle URIs II: git clone --bundle-uri #1300
Changes from all commits
4df4a1d
6a6f1a0
00debaf
66a1ce4
ed76d84
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -168,6 +168,9 @@ Supported commands: 'list', 'import'. | |
Can guarantee that when a clone is requested, the received | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Josh Steadmon wrote (reply to this): On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> A future change will want a way to download a file over HTTP(S) using
> the simplest of download mechanisms. We do not want to assume that the
> server on the other side understands anything about the Git protocol but
> could be a simple static web server.
>
> Create the new 'get' capability for the remote helpers which advertises
> that the 'get' command is avalable. A caller can send a line containing
> 'get <url> <path>' to download the file at <url> into the file at
> <path>.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> Documentation/gitremote-helpers.txt | 9 +++++++
> remote-curl.c | 32 +++++++++++++++++++++++++
> t/t5557-http-get.sh | 37 +++++++++++++++++++++++++++++
> 3 files changed, 78 insertions(+)
> create mode 100755 t/t5557-http-get.sh
>
> diff --git a/Documentation/gitremote-helpers.txt b/Documentation/gitremote-helpers.txt
> index 6f1e269ae43..ed8da428c98 100644
> --- a/Documentation/gitremote-helpers.txt
> +++ b/Documentation/gitremote-helpers.txt
> @@ -168,6 +168,9 @@ Supported commands: 'list', 'import'.
> Can guarantee that when a clone is requested, the received
> pack is self contained and is connected.
>
> +'get'::
> + Can use the 'get' command to download a file from a given URI.
> +
> If a helper advertises 'connect', Git will use it if possible and
> fall back to another capability if the helper requests so when
> connecting (see the 'connect' command under COMMANDS).
> @@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability.
> +
> Supported if the helper has the "stateless-connect" capability.
>
> +'get' <uri> <path>::
> + Downloads the file from the given `<uri>` to the given `<path>`. If
> + `<path>.temp` exists, then Git assumes that the `.temp` file is a
> + partial download from a previous attempt and will resume the
> + download from that position.
> +
> If a fatal error occurs, the program writes the error message to
> stderr and exits. The caller should expect that a suitable error
> message has been printed if the child closes the connection without
> diff --git a/remote-curl.c b/remote-curl.c
> index b8758757ece..73fbdbddd84 100644
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -1286,6 +1286,33 @@ static void parse_fetch(struct strbuf *buf)
> strbuf_reset(buf);
> }
>
> +static void parse_get(struct strbuf *buf)
> +{
> + struct strbuf url = STRBUF_INIT;
> + struct strbuf path = STRBUF_INIT;
> + const char *p, *space;
> +
> + if (!skip_prefix(buf->buf, "get ", &p))
> + die(_("http transport does not support %s"), buf->buf);
Nit: since we're already calling skip_prefix(...) in cmd_main() below,
can we just pass the suffix to parse_get() and avoid having to skip the
prefix twice?
> +
> + space = strchr(p, ' ');
> +
> + if (!space)
> + die(_("protocol error: expected '<url> <path>', missing space"));
> +
> + strbuf_add(&url, p, space - p);
> + strbuf_addstr(&path, space + 1);
> +
> + if (http_get_file(url.buf, path.buf, NULL))
> + die(_("failed to download file at URL '%s'"), url.buf);
> +
> + strbuf_release(&url);
> + strbuf_release(&path);
> + printf("\n");
> + fflush(stdout);
> + strbuf_reset(buf);
> +}
> +
> static int push_dav(int nr_spec, const char **specs)
> {
> struct child_process child = CHILD_PROCESS_INIT;
> @@ -1564,9 +1591,14 @@ int cmd_main(int argc, const char **argv)
> printf("unsupported\n");
> fflush(stdout);
>
> + } else if (skip_prefix(buf.buf, "get ", &arg)) {
> + parse_get(&buf);
> + fflush(stdout);
> +
> } else if (!strcmp(buf.buf, "capabilities")) {
> printf("stateless-connect\n");
> printf("fetch\n");
> + printf("get\n");
> printf("option\n");
> printf("push\n");
> printf("check-connectivity\n");
> diff --git a/t/t5557-http-get.sh b/t/t5557-http-get.sh
> new file mode 100755
> index 00000000000..50b7dbcf957
> --- /dev/null
> +++ b/t/t5557-http-get.sh
> @@ -0,0 +1,37 @@
> +#!/bin/sh
> +
> +test_description='test downloading a file by URL'
> +
> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> + url="$HTTPD_URL/none.txt" &&
> + cat >input <<-EOF &&
> + capabilities
> + get $url file1
> + EOF
> +
> + test_must_fail git remote-http $url <input 2>err &&
> + test_path_is_missing file1 &&
> + grep "failed to download file at URL" err &&
> + rm file1.temp
> +'
> +
> +test_expect_success 'get by URL: 200' '
> + echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> + url="$HTTPD_URL/exists.txt" &&
> + cat >input <<-EOF &&
> + capabilities
> + get $url file2
> +
> + EOF
> +
> + GIT_TRACE2_PERF=1 git remote-http $url <input &&
> + test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2
> +'
> +
> +test_done
> --
> gitgitgadget
> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this): On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> [...]
Add a:
TEST_PASSES_SANITIZE_LEAK=true
Before:
> +. ./test-lib.sh
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'get by URL: 404' '
> + url="$HTTPD_URL/none.txt" &&
> + cat >input <<-EOF &&
> + capabilities
> + get $url file1
> + EOF
> +
> + test_must_fail git remote-http $url <input 2>err &&
> + test_path_is_missing file1 &&
> + grep "failed to download file at URL" err &&
> + rm file1.temp
This should presumably be a :
test_when_finished "rm file.temp"
Before the "test_must_fail", otherwise we'll leave this around if
e.g. the "grep" in the &&-chain fails, but we surely want to clean this
up in that case..
> +'
> +
> +test_expect_success 'get by URL: 200' '
> + echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
> +
> + url="$HTTPD_URL/exists.txt" &&
> + cat >input <<-EOF &&
> + capabilities
> + get $url file2
> +
> + EOF
> +
> + GIT_TRACE2_PERF=1 git remote-http $url <input &&
Is this ad-hoc debugging that snuck into the test? Nothing here needs
this GIT_TRACE2_PERF... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
>
> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
>
>> From: Derrick Stolee <derrickstolee@github.com>
>> [...]
>
> Add a:
>
> TEST_PASSES_SANITIZE_LEAK=true
I trust this is important for your series that flags tests
that _could_ pass with this. Thanks.
>> +. ./test-lib.sh
>> +
>> +. "$TEST_DIRECTORY"/lib-httpd.sh
>> +start_httpd
>> +
>> +test_expect_success 'get by URL: 404' '
>> + url="$HTTPD_URL/none.txt" &&
>> + cat >input <<-EOF &&
>> + capabilities
>> + get $url file1
>> + EOF
>> +
>> + test_must_fail git remote-http $url <input 2>err &&
>> + test_path_is_missing file1 &&
>> + grep "failed to download file at URL" err &&
>> + rm file1.temp
>
> This should presumably be a :
>
> test_when_finished "rm file.temp"
Yes. Thanks.
>> +'
>> +
>> +test_expect_success 'get by URL: 200' '
>> + echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" &&
>> +
>> + url="$HTTPD_URL/exists.txt" &&
>> + cat >input <<-EOF &&
>> + capabilities
>> + get $url file2
>> +
>> + EOF
>> +
>> + GIT_TRACE2_PERF=1 git remote-http $url <input &&
>
> Is this ad-hoc debugging that snuck into the test? Nothing here needs
> this GIT_TRACE2_PERF...
Good eye. Removed.
Thanks,
-Stolee There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Ævar Arnfjörð Bjarmason wrote (reply to this): On Mon, Aug 01 2022, Derrick Stolee wrote:
> On 7/27/2022 7:00 PM, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Mon, Jul 25 2022, Derrick Stolee via GitGitGadget wrote:
>>
>>> From: Derrick Stolee <derrickstolee@github.com>
>>> [...]
>>
>> Add a:
>>
>> TEST_PASSES_SANITIZE_LEAK=true
>
> I trust this is important for your series that flags tests
> that _could_ pass with this. Thanks.
Not really, it does add a "check" mode, and this is one of the few tests
crop up when merged with "seen" if run in that mode, but currentlny it's
only run manually, and we have other outstanding deviations.
It's just something for "master": We have the "linux-leaks" job for a
while now, so it makes sense for new tests to declare if they're
leak-free or not, so we guard against the introduction of leaks with new
code, if possible.
I (or someone else) can always follow-up with something like[1], but as
it's easy it makes sense to just add it in the first place, thanks.
1. https://lore.kernel.org/git/patch-v3-13.15-28255ac3239-20220727T230800Z-avarab@gmail.com/ |
||
pack is self contained and is connected. | ||
|
||
'get':: | ||
Can use the 'get' command to download a file from a given URI. | ||
|
||
If a helper advertises 'connect', Git will use it if possible and | ||
fall back to another capability if the helper requests so when | ||
connecting (see the 'connect' command under COMMANDS). | ||
|
@@ -418,6 +421,12 @@ Supported if the helper has the "connect" capability. | |
+ | ||
Supported if the helper has the "stateless-connect" capability. | ||
|
||
'get' <uri> <path>:: | ||
Downloads the file from the given `<uri>` to the given `<path>`. If | ||
`<path>.temp` exists, then Git assumes that the `.temp` file is a | ||
partial download from a previous attempt and will resume the | ||
download from that position. | ||
|
||
If a fatal error occurs, the program writes the error message to | ||
stderr and exits. The caller should expect that a suitable error | ||
message has been printed if the child closes the connection without | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -906,6 +906,7 @@ LIB_OBJS += blob.o | |
LIB_OBJS += bloom.o | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Josh Steadmon wrote (reply to this): On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> Before implementing a way to fetch bundles into a repository, create the
> basic logic. Assume that the URI is actually a file path. Future logic
> will make this more careful to other protocols.
>
> For now, we also only succeed if the content at the URI is a bundle
> file, not a bundle list. Bundle lists will be implemented in a future
> change.
>
> Note that the discovery of a temporary filename is slightly racy because
> the odb_mkstemp() relies on the temporary file not existing. With the
> current implementation being limited to file copies, we could replace
> the copy_file() with copy_fd(). The tricky part comes in future changes
> that send the filename to 'git remote-https' and its 'get' capability.
> At that point, we need the file descriptor closed _and_ the file
> unlinked.
Ahh, it looks like this was the point I missed in my previous review.
IIUC, we need the file unlinked because http_get_file() will eventually
call finalize_object_file() to move a tempfile to the final object name,
and that will fail if we have an empty file already in place.
> If we were to keep the file descriptor open for the sake of
> normal file copies, then we would pollute the rest of the code for
> little benefit. This is especially the case because we expect that most
> bundle URI use will be based on HTTPS instead of file copies.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> Makefile | 1 +
> bundle-uri.c | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++
> bundle-uri.h | 14 +++++++
> 3 files changed, 119 insertions(+)
> create mode 100644 bundle-uri.c
> create mode 100644 bundle-uri.h
>
> diff --git a/Makefile b/Makefile
> index 1624471badc..7d5f48069ea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -906,6 +906,7 @@ LIB_OBJS += blob.o
> LIB_OBJS += bloom.o
> LIB_OBJS += branch.o
> LIB_OBJS += bulk-checkin.o
> +LIB_OBJS += bundle-uri.o
> LIB_OBJS += bundle.o
> LIB_OBJS += cache-tree.o
> LIB_OBJS += cbtree.o
> diff --git a/bundle-uri.c b/bundle-uri.c
> new file mode 100644
> index 00000000000..b35babc36aa
> --- /dev/null
> +++ b/bundle-uri.c
> @@ -0,0 +1,104 @@
> +#include "cache.h"
> +#include "bundle-uri.h"
> +#include "bundle.h"
> +#include "object-store.h"
> +#include "refs.h"
> +#include "run-command.h"
> +
> +static int find_temp_filename(struct strbuf *name)
> +{
> + int fd;
> + /*
> + * Find a temporary filename that is available. This is briefly
> + * racy, but unlikely to collide.
> + */
> + fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX");
> + if (fd < 0) {
> + warning(_("failed to create temporary file"));
> + return -1;
> + }
> +
> + close(fd);
> + unlink(name->buf);
> + return 0;
> +}
> +
> +static int copy_uri_to_file(const char *file, const char *uri)
> +{
> + /* File-based URIs only for now. */
> + return copy_file(file, uri, 0);
> +}
> +
> +static int unbundle_from_file(struct repository *r, const char *file)
> +{
> + int result = 0;
> + int bundle_fd;
> + struct bundle_header header = BUNDLE_HEADER_INIT;
> + struct string_list_item *refname;
> + struct strbuf bundle_ref = STRBUF_INIT;
> + size_t bundle_prefix_len;
> +
> + if ((bundle_fd = read_bundle_header(file, &header)) < 0)
> + return 1;
> +
> + if ((result = unbundle(r, &header, bundle_fd, NULL)))
> + return 1;
> +
> + /*
> + * Convert all refs/heads/ from the bundle into refs/bundles/
> + * in the local repository.
> + */
> + strbuf_addstr(&bundle_ref, "refs/bundles/");
> + bundle_prefix_len = bundle_ref.len;
> +
> + for_each_string_list_item(refname, &header.references) {
> + struct object_id *oid = refname->util;
> + struct object_id old_oid;
> + const char *branch_name;
> + int has_old;
> +
> + if (!skip_prefix(refname->string, "refs/heads/", &branch_name))
> + continue;
> +
> + strbuf_setlen(&bundle_ref, bundle_prefix_len);
> + strbuf_addstr(&bundle_ref, branch_name);
> +
> + has_old = !read_ref(bundle_ref.buf, &old_oid);
> + update_ref("fetched bundle", bundle_ref.buf, oid,
> + has_old ? &old_oid : NULL,
> + REF_SKIP_OID_VERIFICATION,
> + UPDATE_REFS_MSG_ON_ERR);
> + }
> +
> + bundle_header_release(&header);
We still also need to release bundle_ref here, right?
> + return result;
> +}
> +
> +int fetch_bundle_uri(struct repository *r, const char *uri)
> +{
> + int result = 0;
> + struct strbuf filename = STRBUF_INIT;
> +
> + if ((result = find_temp_filename(&filename)))
> + goto cleanup;
> +
> + if ((result = copy_uri_to_file(filename.buf, uri))) {
> + warning(_("failed to download bundle from URI '%s'"), uri);
> + goto cleanup;
> + }
> +
> + if ((result = !is_bundle(filename.buf, 0))) {
> + warning(_("file at URI '%s' is not a bundle"), uri);
> + goto cleanup;
> + }
> +
> + if ((result = unbundle_from_file(r, filename.buf))) {
> + warning(_("failed to unbundle bundle from URI '%s'"), uri);
> + goto cleanup;
> + }
> +
> +cleanup:
> + unlink(filename.buf);
> + strbuf_release(&filename);
> + return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> new file mode 100644
> index 00000000000..8a152f1ef14
> --- /dev/null
> +++ b/bundle-uri.h
> @@ -0,0 +1,14 @@
> +#ifndef BUNDLE_URI_H
> +#define BUNDLE_URI_H
> +
> +struct repository;
> +
> +/**
> + * Fetch data from the given 'uri' and unbundle the bundle data found
> + * based on that information.
> + *
> + * Returns non-zero if no bundle information is found at the given 'uri'.
> + */
> +int fetch_bundle_uri(struct repository *r, const char *uri);
> +
> +#endif
> --
> gitgitgadget
> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> Before implementing a way to fetch bundles into a repository, create the
>> basic logic. Assume that the URI is actually a file path. Future logic
>> will make this more careful to other protocols.
>>
>> For now, we also only succeed if the content at the URI is a bundle
>> file, not a bundle list. Bundle lists will be implemented in a future
>> change.
>>
>> Note that the discovery of a temporary filename is slightly racy because
>> the odb_mkstemp() relies on the temporary file not existing. With the
>> current implementation being limited to file copies, we could replace
>> the copy_file() with copy_fd(). The tricky part comes in future changes
>> that send the filename to 'git remote-https' and its 'get' capability.
>
>> At that point, we need the file descriptor closed _and_ the file
>> unlinked.
>
> Ahh, it looks like this was the point I missed in my previous review.
> IIUC, we need the file unlinked because http_get_file() will eventually
> call finalize_object_file() to move a tempfile to the final object name,
> and that will fail if we have an empty file already in place.
Yes, and I also was not sure what would happen if the empty file existed.
I tested it and thought allowing overwriting an existing file would be a
bigger problem than this choice of a filename.
We also discussed options about how it would be nice to have a predictable
filename so we could resume downloads _across Git process failures_
instead of just a network failure within a single Git process. This is
something to explore when creating that functionality.
Thanks,
-Stolee |
||
LIB_OBJS += branch.o | ||
LIB_OBJS += bulk-checkin.o | ||
LIB_OBJS += bundle-uri.o | ||
LIB_OBJS += bundle.o | ||
LIB_OBJS += cache-tree.o | ||
LIB_OBJS += cbtree.o | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
#include "cache.h" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Josh Steadmon wrote (reply to this): On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The previous change created the 'git clone --bundle-uri=<uri>' option.
> Currently, <uri> must be a filename.
>
> Update copy_uri_to_file() to first inspect the URI for an HTTP(S) prefix
> and use git-remote-https as the way to download the data at that URI.
> Otherwise, check to see if file:// is present and modify the prefix
> accordingly.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> bundle-uri.c | 70 +++++++++++++++++++++++++++++++++++--
> t/t5558-clone-bundle-uri.sh | 45 ++++++++++++++++++++++++
> 2 files changed, 112 insertions(+), 3 deletions(-)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index b35babc36aa..7086382b186 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -23,10 +23,74 @@ static int find_temp_filename(struct strbuf *name)
> return 0;
> }
>
> -static int copy_uri_to_file(const char *file, const char *uri)
> +static int download_https_uri_to_file(const char *file, const char *uri)
> {
> - /* File-based URIs only for now. */
> - return copy_file(file, uri, 0);
> + int result = 0;
> + struct child_process cp = CHILD_PROCESS_INIT;
> + FILE *child_in = NULL, *child_out = NULL;
> + struct strbuf line = STRBUF_INIT;
> + int found_get = 0;
> +
> + strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
> + cp.in = -1;
> + cp.out = -1;
> +
> + if (start_command(&cp))
> + return 1;
> +
> + child_in = fdopen(cp.in, "w");
> + if (!child_in) {
> + result = 1;
> + goto cleanup;
> + }
> +
> + child_out = fdopen(cp.out, "r");
> + if (!child_out) {
> + result = 1;
> + goto cleanup;
> + }
> +
> + fprintf(child_in, "capabilities\n");
> + fflush(child_in);
> +
> + while (!strbuf_getline(&line, child_out)) {
> + if (!line.len)
> + break;
> + if (!strcmp(line.buf, "get"))
> + found_get = 1;
> + }
> + strbuf_release(&line);
> +
> + if (!found_get) {
> + result = error(_("insufficient capabilities"));
> + goto cleanup;
> + }
> +
> + fprintf(child_in, "get %s %s\n\n", uri, file);
> +
> +cleanup:
> + if (child_in)
> + fclose(child_in);
> + if (finish_command(&cp))
> + return 1;
> + if (child_out)
> + fclose(child_out);
> + return result;
> +}
> +
> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> + const char *out;
> +
> + if (skip_prefix(uri, "https:", &out) ||
> + skip_prefix(uri, "http:", &out))
> + return download_https_uri_to_file(filename, uri);
Since we're not using "out" here, should we replace the skip_prefix()s
with starts_with(), or perhaps even istarts_with()?
> + if (!skip_prefix(uri, "file://", &out))
> + out = uri;
> +
> + /* Copy as a file */
> + return copy_file(filename, out, 0);
> }
>
> static int unbundle_from_file(struct repository *r, const char *file)
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index f709bcb729c..ad666a2d28a 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -33,4 +33,49 @@ test_expect_success 'clone with path bundle' '
> test_cmp expect actual
> '
>
> +test_expect_success 'clone with file:// bundle' '
> + git clone --bundle-uri="file://$(pwd)/clone-from/B.bundle" \
> + clone-from clone-file &&
> + git -C clone-file rev-parse refs/bundles/topic >actual &&
> + git -C clone-from rev-parse topic >expect &&
> + test_cmp expect actual
> +'
> +
> +#########################################################################
> +# HTTP tests begin here
> +
> +. "$TEST_DIRECTORY"/lib-httpd.sh
> +start_httpd
> +
> +test_expect_success 'fail to fetch from non-existent HTTP URL' '
> + test_when_finished rm -rf test &&
> + git clone --bundle-uri="$HTTPD_URL/does-not-exist" . test 2>err &&
> + grep "failed to download bundle from URI" err
> +'
> +
> +test_expect_success 'fail to fetch from non-bundle HTTP URL' '
> + test_when_finished rm -rf test &&
> + echo bogus >"$HTTPD_DOCUMENT_ROOT_PATH/bogus" &&
> + git clone --bundle-uri="$HTTPD_URL/bogus" . test 2>err &&
> + grep "is not a bundle" err
> +'
> +
> +test_expect_success 'clone HTTP bundle' '
> + cp clone-from/B.bundle "$HTTPD_DOCUMENT_ROOT_PATH/B.bundle" &&
> +
> + git clone --no-local --mirror clone-from \
> + "$HTTPD_DOCUMENT_ROOT_PATH/fetch.git" &&
> +
> + git clone --bundle-uri="$HTTPD_URL/B.bundle" \
> + "$HTTPD_URL/smart/fetch.git" clone-http &&
> + git -C clone-http rev-parse refs/bundles/topic >actual &&
> + git -C clone-from rev-parse topic >expect &&
> + test_cmp expect actual &&
> +
> + test_config -C clone-http log.excludedecoration refs/bundle/
> +'
> +
> +# Do not add tests here unless they use the HTTP server, as they will
> +# not run unless the HTTP dependencies exist.
> +
> test_done
> --
> gitgitgadget
> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 7/27/2022 6:09 PM, Josh Steadmon wrote:
> On 2022.07.25 20:34, Derrick Stolee via GitGitGadget wrote:
>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> + const char *out;
>> +
>> + if (skip_prefix(uri, "https:", &out) ||
>> + skip_prefix(uri, "http:", &out))
>> + return download_https_uri_to_file(filename, uri);
>
> Since we're not using "out" here, should we replace the skip_prefix()s
> with starts_with(), or perhaps even istarts_with()?
Sounds good to me. Can't abandon "out" because it is used
for the "file://" prefix, but istarts_with() is better here.
Thanks,
-Stolee There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Junio C Hamano wrote (reply to this): "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +static int copy_uri_to_file(const char *filename, const char *uri)
> +{
> + const char *out;
> +
> + if (istarts_with(uri, "https:") ||
> + istarts_with(uri, "http:"))
Let's be a bit more strict to avoid mistakes and make the code
immediately obvious, e.g.
if (istarts_with(uri, "https://") ||
istarts_with(uri, "http://"))
> + return download_https_uri_to_file(filename, uri);
> +
> + if (!skip_prefix(uri, "file://", &out))
> + out = uri;
If we are using istarts_with because URI scheme name is case
insensitive, shouldn't we do the same for "file://" URL, not
just for "http(s)://" URL? IOW
if (!skip_iprefix(uri, "file://", &out))
> +static int download_https_uri_to_file(const char *file, const char *uri)
> {
> + int result = 0;
> + struct child_process cp = CHILD_PROCESS_INIT;
> + FILE *child_in = NULL, *child_out = NULL;
> + struct strbuf line = STRBUF_INIT;
> + int found_get = 0;
> +
> + strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
Does "git-remote-https" talk to a "http://" URL just fine when uri
parameter starts with "http://"? Would it be the same if the uri
parameter begins with say "Http://"?
Other than that, looks sensible.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 8/2/2022 5:32 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> +static int copy_uri_to_file(const char *filename, const char *uri)
>> +{
>> + const char *out;
>> +
>> + if (istarts_with(uri, "https:") ||
>> + istarts_with(uri, "http:"))
>
> Let's be a bit more strict to avoid mistakes and make the code
> immediately obvious, e.g.
>
> if (istarts_with(uri, "https://") ||
> istarts_with(uri, "http://"))
>
>> + return download_https_uri_to_file(filename, uri);
>> +
>> + if (!skip_prefix(uri, "file://", &out))
>> + out = uri;
>
> If we are using istarts_with because URI scheme name is case
> insensitive, shouldn't we do the same for "file://" URL, not
> just for "http(s)://" URL? IOW
>
> if (!skip_iprefix(uri, "file://", &out))
Good ideas. Of course, we don't have a skip_iprefix(), but
I can use "istarts_with()" and then manually add the length.
If we see more need for that in the future, we can consider
adding it.
(It's interesting that these uses in bundle-uri.c are the
only uses of istarts_with() that I see in the codebase.)
>> +static int download_https_uri_to_file(const char *file, const char *uri)
>> {
>> + int result = 0;
>> + struct child_process cp = CHILD_PROCESS_INIT;
>> + FILE *child_in = NULL, *child_out = NULL;
>> + struct strbuf line = STRBUF_INIT;
>> + int found_get = 0;
>> +
>> + strvec_pushl(&cp.args, "git-remote-https", "origin", uri, NULL);
>
> Does "git-remote-https" talk to a "http://" URL just fine when uri
> parameter starts with "http://"? Would it be the same if the uri
> parameter begins with say "Http://"?
I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
variable in lib-httpd.sh to "HtTP" and we get this fun error:
+ git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
Cloning into 'client'...
git: 'remote-HtTP' is not a git command. See 'git --help'.
So I guess I can keep case-sensitive comparisons here.
Thanks,
-Stolee There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Junio C Hamano wrote (reply to this): Derrick Stolee <derrickstolee@github.com> writes:
>>> + if (istarts_with(uri, "https:") ||
>>> + istarts_with(uri, "http:"))
>>
>> Let's be a bit more strict to avoid mistakes and make the code
>> immediately obvious, e.g.
>>
>> if (istarts_with(uri, "https://") ||
>> istarts_with(uri, "http://"))
This part (i.e. check for "<scheme>://", not "<scheme>:") still
stands, as the latter could be an scp style Git URL that goes to the
host whose name is <scheme>. As mentioned later, s/istarts/starts/
is probably a good thing to do here.
>> Does "git-remote-https" talk to a "http://" URL just fine when uri
>> parameter starts with "http://"? Would it be the same if the uri
>> parameter begins with say "Http://"?
>
> I did a quick check of our HTTPS tests modifying the HTTPD_PROTO
> variable in lib-httpd.sh to "HtTP" and we get this fun error:
>
> + git clone --filter=blob:limit=0 HtTP://127.0.0.1:5601/smart/server client
> Cloning into 'client'...
> git: 'remote-HtTP' is not a git command. See 'git --help'.
>
> So I guess I can keep case-sensitive comparisons here.
Guarding them to lowercase-only may sound like a cop-out to purists,
but I think it is reasonable thing to do. The only folks that would
be offended by are protocol lawyers, and as your check shows, we are
treating <scheme> case sensitively already.
An obvious alternative is to downcase the "<scheme>://" part but I
do not think it is worth it; we have to do that everywhere and we
need to be confident that we covered all the code paths---the latter
is expensive.
There do exist skip_iprefix() that are used in many places, like
convert, http, mailinfo, etc. by the way. That is a moot point as
we are not doing case insensitive comparison anymore.
Thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Teng Long wrote (reply to this): Derrick Stolee <derrickstolee@github.com> writes:
> + while (!strbuf_getline(&line, child_out)) {
> + if (!line.len)
> + break;
> + if (!strcmp(line.buf, "get"))
> + found_get = 1;
> + }
Clear implementation for me to read, thanks.
I'm not sure but maybe a nit, should it be a "break;" after
"found_get = 1;" for omitting the left uncencerned capabilities
to quit earlier?
Thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On the Git mailing list, Derrick Stolee wrote (reply to this): On 8/29/2022 12:58 AM, Teng Long wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
>
>> + while (!strbuf_getline(&line, child_out)) {
>> + if (!line.len)
>> + break;
>> + if (!strcmp(line.buf, "get"))
>> + found_get = 1;
>> + }
>
> Clear implementation for me to read, thanks.
>
> I'm not sure but maybe a nit, should it be a "break;" after
> "found_get = 1;" for omitting the left uncencerned capabilities
> to quit earlier?
Teng, you are right that it does not hurt to have a 'break'
here. Since this topic has already merged, that change will
need a forward-fix that I can add to a future series.
Thanks,
-Stolee |
||
#include "bundle-uri.h" | ||
#include "bundle.h" | ||
#include "object-store.h" | ||
#include "refs.h" | ||
#include "run-command.h" | ||
|
||
static int find_temp_filename(struct strbuf *name) | ||
{ | ||
int fd; | ||
/* | ||
* Find a temporary filename that is available. This is briefly | ||
* racy, but unlikely to collide. | ||
*/ | ||
fd = odb_mkstemp(name, "bundles/tmp_uri_XXXXXX"); | ||
if (fd < 0) { | ||
warning(_("failed to create temporary file")); | ||
return -1; | ||
} | ||
|
||
close(fd); | ||
unlink(name->buf); | ||
return 0; | ||
} | ||
|
||
static int download_https_uri_to_file(const char *file, const char *uri) | ||
{ | ||
int result = 0; | ||
struct child_process cp = CHILD_PROCESS_INIT; | ||
FILE *child_in = NULL, *child_out = NULL; | ||
struct strbuf line = STRBUF_INIT; | ||
int found_get = 0; | ||
|
||
strvec_pushl(&cp.args, "git-remote-https", uri, NULL); | ||
cp.in = -1; | ||
cp.out = -1; | ||
|
||
if (start_command(&cp)) | ||
return 1; | ||
|
||
child_in = fdopen(cp.in, "w"); | ||
if (!child_in) { | ||
result = 1; | ||
goto cleanup; | ||
} | ||
|
||
child_out = fdopen(cp.out, "r"); | ||
if (!child_out) { | ||
result = 1; | ||
goto cleanup; | ||
} | ||
|
||
fprintf(child_in, "capabilities\n"); | ||
fflush(child_in); | ||
|
||
while (!strbuf_getline(&line, child_out)) { | ||
if (!line.len) | ||
break; | ||
if (!strcmp(line.buf, "get")) | ||
found_get = 1; | ||
} | ||
strbuf_release(&line); | ||
|
||
if (!found_get) { | ||
result = error(_("insufficient capabilities")); | ||
goto cleanup; | ||
} | ||
|
||
fprintf(child_in, "get %s %s\n\n", uri, file); | ||
|
||
cleanup: | ||
if (child_in) | ||
fclose(child_in); | ||
if (finish_command(&cp)) | ||
return 1; | ||
if (child_out) | ||
fclose(child_out); | ||
return result; | ||
} | ||
|
||
static int copy_uri_to_file(const char *filename, const char *uri) | ||
{ | ||
const char *out; | ||
|
||
if (starts_with(uri, "https:") || | ||
starts_with(uri, "http:")) | ||
return download_https_uri_to_file(filename, uri); | ||
|
||
if (skip_prefix(uri, "file://", &out)) | ||
uri = out; | ||
|
||
/* Copy as a file */ | ||
return copy_file(filename, uri, 0); | ||
} | ||
|
||
static int unbundle_from_file(struct repository *r, const char *file) | ||
{ | ||
int result = 0; | ||
int bundle_fd; | ||
struct bundle_header header = BUNDLE_HEADER_INIT; | ||
struct string_list_item *refname; | ||
struct strbuf bundle_ref = STRBUF_INIT; | ||
size_t bundle_prefix_len; | ||
|
||
if ((bundle_fd = read_bundle_header(file, &header)) < 0) | ||
return 1; | ||
|
||
if ((result = unbundle(r, &header, bundle_fd, NULL))) | ||
return 1; | ||
|
||
/* | ||
* Convert all refs/heads/ from the bundle into refs/bundles/ | ||
* in the local repository. | ||
*/ | ||
strbuf_addstr(&bundle_ref, "refs/bundles/"); | ||
bundle_prefix_len = bundle_ref.len; | ||
|
||
for_each_string_list_item(refname, &header.references) { | ||
struct object_id *oid = refname->util; | ||
struct object_id old_oid; | ||
const char *branch_name; | ||
int has_old; | ||
|
||
if (!skip_prefix(refname->string, "refs/heads/", &branch_name)) | ||
continue; | ||
|
||
strbuf_setlen(&bundle_ref, bundle_prefix_len); | ||
strbuf_addstr(&bundle_ref, branch_name); | ||
|
||
has_old = !read_ref(bundle_ref.buf, &old_oid); | ||
update_ref("fetched bundle", bundle_ref.buf, oid, | ||
has_old ? &old_oid : NULL, | ||
REF_SKIP_OID_VERIFICATION, | ||
UPDATE_REFS_MSG_ON_ERR); | ||
} | ||
|
||
bundle_header_release(&header); | ||
return result; | ||
} | ||
|
||
int fetch_bundle_uri(struct repository *r, const char *uri) | ||
{ | ||
int result = 0; | ||
struct strbuf filename = STRBUF_INIT; | ||
|
||
if ((result = find_temp_filename(&filename))) | ||
goto cleanup; | ||
|
||
if ((result = copy_uri_to_file(filename.buf, uri))) { | ||
warning(_("failed to download bundle from URI '%s'"), uri); | ||
goto cleanup; | ||
} | ||
|
||
if ((result = !is_bundle(filename.buf, 0))) { | ||
warning(_("file at URI '%s' is not a bundle"), uri); | ||
goto cleanup; | ||
} | ||
|
||
if ((result = unbundle_from_file(r, filename.buf))) { | ||
warning(_("failed to unbundle bundle from URI '%s'"), uri); | ||
goto cleanup; | ||
} | ||
|
||
cleanup: | ||
unlink(filename.buf); | ||
strbuf_release(&filename); | ||
return result; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
#ifndef BUNDLE_URI_H | ||
#define BUNDLE_URI_H | ||
|
||
struct repository; | ||
|
||
/** | ||
* Fetch data from the given 'uri' and unbundle the bundle data found | ||
* based on that information. | ||
* | ||
* Returns non-zero if no bundle information is found at the given 'uri'. | ||
*/ | ||
int fetch_bundle_uri(struct repository *r, const char *uri); | ||
|
||
#endif |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#!/bin/sh | ||
|
||
test_description='test downloading a file by URL' | ||
|
||
TEST_PASSES_SANITIZE_LEAK=true | ||
|
||
. ./test-lib.sh | ||
|
||
. "$TEST_DIRECTORY"/lib-httpd.sh | ||
start_httpd | ||
|
||
test_expect_success 'get by URL: 404' ' | ||
test_when_finished "rm -f file.temp" && | ||
url="$HTTPD_URL/none.txt" && | ||
cat >input <<-EOF && | ||
capabilities | ||
get $url file1 | ||
EOF | ||
|
||
test_must_fail git remote-http $url <input 2>err && | ||
test_path_is_missing file1 && | ||
grep "failed to download file at URL" err | ||
' | ||
|
||
test_expect_success 'get by URL: 200' ' | ||
echo data >"$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" && | ||
|
||
url="$HTTPD_URL/exists.txt" && | ||
cat >input <<-EOF && | ||
capabilities | ||
get $url file2 | ||
|
||
EOF | ||
|
||
git remote-http $url <input && | ||
test_cmp "$HTTPD_DOCUMENT_ROOT_PATH/exists.txt" file2 | ||
' | ||
|
||
test_done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):