-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bundle URIs III: Parse and download from bundle lists #1333
Bundle URIs III: Parse and download from bundle lists #1333
Conversation
/submit |
Submitted as pull.1333.git.1661181174.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -4,6 +4,67 @@ | |||
#include "object-store.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +/**
> + * The remote_bundle_info struct contains information for a single bundle
> + * URI. This may be initialized simply by a given URI or might have
> + * additional metadata associated with it if the bundle was advertised by
> + * a bundle list.
> + */
> +struct remote_bundle_info {
> + struct hashmap_entry ent;
> +
> + /**
> + * The 'id' is a name given to the bundle for reference
> + * by other bundle infos.
> + */
> + char *id;
> +
> + /**
> + * The 'uri' is the location of the remote bundle so
> + * it can be downloaded on-demand. This will be NULL
> + * if there was no table of contents.
> + */
> + char *uri;
> +
> + /**
> + * If the bundle has been downloaded, then 'file' is a
> + * filename storing its contents. Otherwise, 'file' is
> + * an empty string.
> + */
> + struct strbuf file;
> +};
Presumably the sequence of events are that first a bundle list is
obtained, with their .file member set to empty, then http worker(s)
download and deposit the contents to files at which time the .file
member is set to the resulting file. The file downloader presumably
uses the usual "create a temporary file, download to it, and then
commit it by closing and then renaming" dance, and the downloading
http worker may want to have two strbufs somewhere it can access to
come up with the name of the temporary and the name of the final
file. But once the result becomes a committed file, its name will
not change, or will it?
At this step without the code that actually uses the data, use of
strbuf, instead of "char *" like id and uri members do, smells like
a premature optimization, and it is unclear if the optimization is
even effective.
Other than that, looks good to me.
@@ -387,6 +387,8 @@ include::config/branch.txt[] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index e376d547ce0..4280af6992e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>
> include::config/browser.txt[]
>
> +include::config/bundle.txt[]
> +
The file that records a list of bundles may borrow the format of git
config files, but will we store their contents in configuration
files in the receiving (or originating) repository? With the
presence of fields like "bundle.version", I somehow doubt it.
Should "git config --help" list them?
> diff --git a/Documentation/config/bundle.txt b/Documentation/config/bundle.txt
> new file mode 100644
> index 00000000000..3515bfe38d1
> --- /dev/null
> +++ b/Documentation/config/bundle.txt
If the answer is "no", then this file looks out of place.
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ceeef0b6641..ade7eccce39 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -66,6 +66,80 @@ int for_all_bundles_in_list(struct bundle_list *list,
> return 0;
> }
>
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
Let's stick to the "error is negative" if we do not have a strong
reason not to.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> + struct bundle_list *list)
> +{
> + const char *pkey, *dot;
> + struct strbuf id = STRBUF_INIT;
> + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> + struct remote_bundle_info *bundle;
> +
> + if (!skip_prefix(key, "bundle.", &pkey))
> + return 1;
> + dot = strchr(pkey, '.');
> + if (!dot) {
> + if (!strcmp(pkey, "version")) {
> + int version = atoi(value);
Can atoi() safely fail? Are we happy of pkey that says "1A" and we
parse it as "1"?
> + if (version != 1)
> + return 1;
> +
> + list->version = version;
> + return 0;
> + }
Is it OK for a bundle list described in the config-file format to
have "bundle.version" twice, giving different values? It feels
counter-intuitive to apply the "last one wins" rule that is usual
for configuration files.
> + if (!strcmp(pkey, "mode")) {
> + if (!strcmp(value, "all"))
> + list->mode = BUNDLE_MODE_ALL;
> + else if (!strcmp(value, "any"))
> + list->mode = BUNDLE_MODE_ANY;
> + else
> + return 1;
> + return 0;
> + }
Likewise for bundle.mode
> + /* Ignore other unknown global keys. */
> + return 0;
> + }
> +
> + strbuf_add(&id, pkey, dot - pkey);
> + dot++;
> +
> + /*
> + * Check for an existing bundle with this <id>, or create one
> + * if necessary.
> + */
> + lookup.id = id.buf;
> + hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> + CALLOC_ARRAY(bundle, 1);
> + bundle->id = strbuf_detach(&id, NULL);
> + strbuf_init(&bundle->file, 0);
> + hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> + hashmap_add(&list->bundles, &bundle->ent);
> + }
> + strbuf_release(&id);
> +
> + if (!strcmp(dot, "uri")) {
> + free(bundle->uri);
> + bundle->uri = xstrdup(value);
> + return 0;
> + }
This explicitly implements "the last one wins". Would it really
make sense for a server to serve a bundle list that says redundant
and wasteful pieces of information, i.e.
[bundle "1"]
url = one
url = two
It is not like doing so would allow us to reuse an otherwise mostly
good file by appending new information and that would be a performance
or storage win. So I am not quite sure why we want "the last one wins"
rule here. It instead looks like something we want to sanity check
and complain about.
> + /*
> + * At this point, we ignore any information that we don't
> + * understand, assuming it to be hints for a heuristic the client
> + * does not currently understand.
> + */
This is sensible.
> + return 0;
> +}
> +
> static int find_temp_filename(struct strbuf *name)
> {
> int fd;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index e376d547ce0..4280af6992e 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -387,6 +387,8 @@ include::config/branch.txt[]
>>
>> include::config/browser.txt[]
>>
>> +include::config/bundle.txt[]
>> +
>
> The file that records a list of bundles may borrow the format of git
> config files, but will we store their contents in configuration
> files in the receiving (or originating) repository? With the
> presence of fields like "bundle.version", I somehow doubt it.
>
> Should "git config --help" list them?
I suppose that at this point, they should be left out, since
writing them to your Git config does nothing.
In the future, having these config values present will advertise
the bundle list during the 'bundle-uri' protocol v2 command. That
could use some clarification in the documentation, too, perhaps
with a "bundle.*" item discussing how all of the other items are
related to that advertisement.
>> +/**
>> + * Given a key-value pair, update the state of the given bundle list.
>> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
>> + * is not understood or the value is malformed.
>
> Let's stick to the "error is negative" if we do not have a strong
> reason not to.
Right. Can do.
>> + */
>> +MAYBE_UNUSED
>> +static int bundle_list_update(const char *key, const char *value,
>> + struct bundle_list *list)
>> +{
>> + const char *pkey, *dot;
>> + struct strbuf id = STRBUF_INIT;
>> + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
>> + struct remote_bundle_info *bundle;
>> +
>> + if (!skip_prefix(key, "bundle.", &pkey))
>> + return 1;
>> + dot = strchr(pkey, '.');
>> + if (!dot) {
>> + if (!strcmp(pkey, "version")) {
>> + int version = atoi(value);
>
> Can atoi() safely fail? Are we happy of pkey that says "1A" and we
> parse it as "1"?
>
>> + if (version != 1)
>> + return 1;
>> +
>> + list->version = version;
>> + return 0;
>> + }
>
> Is it OK for a bundle list described in the config-file format to
> have "bundle.version" twice, giving different values? It feels
> counter-intuitive to apply the "last one wins" rule that is usual
> for configuration files.
...
> This explicitly implements "the last one wins". Would it really
> make sense for a server to serve a bundle list that says redundant
> and wasteful pieces of information, i.e.
>
> [bundle "1"]
> url = one
> url = two
>
> It is not like doing so would allow us to reuse an otherwise mostly
> good file by appending new information and that would be a performance
> or storage win. So I am not quite sure why we want "the last one wins"
> rule here. It instead looks like something we want to sanity check
> and complain about.
I could switch this to "expect at most one value" and add warnings for
duplicate keys. Should duplicate keys then mean "the bundle list is
malformed, abort downloading bundles"? That seems reasonable to me.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jonathan Tan wrote (reply to this):
Derrick Stolee <derrickstolee@github.com> writes:
> On 8/22/2022 2:20 PM, Junio C Hamano wrote:
> > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> >
> >> diff --git a/Documentation/config.txt b/Documentation/config.txt
> >> index e376d547ce0..4280af6992e 100644
> >> --- a/Documentation/config.txt
> >> +++ b/Documentation/config.txt
> >> @@ -387,6 +387,8 @@ include::config/branch.txt[]
> >>
> >> include::config/browser.txt[]
> >>
> >> +include::config/bundle.txt[]
> >> +
> >
> > The file that records a list of bundles may borrow the format of git
> > config files, but will we store their contents in configuration
> > files in the receiving (or originating) repository? With the
> > presence of fields like "bundle.version", I somehow doubt it.
> >
> > Should "git config --help" list them?
>
> I suppose that at this point, they should be left out, since
> writing them to your Git config does nothing.
>
> In the future, having these config values present will advertise
> the bundle list during the 'bundle-uri' protocol v2 command. That
> could use some clarification in the documentation, too, perhaps
> with a "bundle.*" item discussing how all of the other items are
> related to that advertisement.
I think the main point of confusion is that these config variables
currently do nothing when in a repo config, but they will be
subsequently used once we implement advertising them, and it is
convenient that these configs delegate to other files that have the same
format (and that we can specify, at the CLI, a file of the same format).
Maybe documentation like this would clear up the confusion:
bundle.*::
The `bundle.*` keys may appear in a repo's config, in a file
linked by bundle.<id>.uri, or in a file passed to "clone
--bundle-uri".
+
NEEDSWORK: Currently, only the latter 2 situations work. `bundle.*` keys
appearing in a repo's config will take effect once support for
advertising bundles in fetch protocol v2 is implemented.
+
See link:technical/bundle-uri.html[the bundle URI design document] for
more details.
@@ -4,6 +4,140 @@ | |||
#include "object-store.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Ævar Arnfjörð Bjarmason via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
> +{
> + int result;
> + const char *equals;
> + struct strbuf key = STRBUF_INIT;
> +
> + if (!strlen(line))
> + return error(_("bundle-uri: got an empty line"));
> +
> + equals = strchr(line, '=');
> +
> + if (!equals)
> + return error(_("bundle-uri: line is not of the form 'key=value'"));
> + if (line == equals || !*(equals + 1))
> + return error(_("bundle-uri: line has empty key or value"));
The suggestions implied by my asking fall strictly into the "it does
not have to exist here at this step and we can later extend it", but
for something whose equivalent can be stored in our configuration
file, it is curious why we _insist_ to refuse an empty string as the
value.
I do not miss the "key alone without even '=' means 'true'"
convention, personally, so insisting to have '=' is OK, but the
inability to have an empty string as a value looks a bit disturbing.
This depends on how the helper gets called, but most likely the
caller has a single line of pkt-line that it GAVE us to process, so
it sounds a bit wasteful to insist that "line" to be const to us and
force us to use a separate strbuf, instead of just stuffing NUL at
where we found '=' and pass the two halves to bundle_list_update().
Not a huge deal, it is just something I found funny in the "back in
the days we coded together, Linus would never have written like
this" way.
Other than that small detail, the code looks OK to me.
> + strbuf_add(&key, line, equals - line);
> + result = bundle_list_update(key.buf, equals + 1, list);
> + strbuf_release(&key);
> +
> + return result;
> +}
> diff --git a/bundle-uri.h b/bundle-uri.h
> index 6692aa4b170..f725c9796f7 100644
> --- a/bundle-uri.h
> +++ b/bundle-uri.h
> @@ -76,4 +76,16 @@ int for_all_bundles_in_list(struct bundle_list *list,
> */
> int fetch_bundle_uri(struct repository *r, const char *uri);
>
> -#endif
> +/**
> + * General API for {transport,connect}.c etc.
> + */
> +
> +/**
> + * Parse a "key=value" packet line from the bundle-uri verb.
> + *
> + * Returns 0 on success and non-zero on error.
> + */
> +int bundle_uri_parse_line(struct bundle_list *list,
> + const char *line);
> +
> +#endif /* BUNDLE_URI_H */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/22/2022 3:17 PM, Junio C Hamano wrote:
> "Ævar Arnfjörð Bjarmason via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
>> +/**
>> + * General API for {transport,connect}.c etc.
>> + */
>> +int bundle_uri_parse_line(struct bundle_list *list, const char *line)
>> +{
>> + int result;
>> + const char *equals;
>> + struct strbuf key = STRBUF_INIT;
>> +
>> + if (!strlen(line))
>> + return error(_("bundle-uri: got an empty line"));
>> +
>> + equals = strchr(line, '=');
>> +
>> + if (!equals)
>> + return error(_("bundle-uri: line is not of the form 'key=value'"));
>> + if (line == equals || !*(equals + 1))
>> + return error(_("bundle-uri: line has empty key or value"));
>
> The suggestions implied by my asking fall strictly into the "it does
> not have to exist here at this step and we can later extend it", but
> for something whose equivalent can be stored in our configuration
> file, it is curious why we _insist_ to refuse an empty string as the
> value.
>
> I do not miss the "key alone without even '=' means 'true'"
> convention, personally, so insisting to have '=' is OK, but the
> inability to have an empty string as a value looks a bit disturbing.
I'd be happy to switch this to allow an empty value.
> This depends on how the helper gets called, but most likely the
> caller has a single line of pkt-line that it GAVE us to process, so
> it sounds a bit wasteful to insist that "line" to be const to us and
> force us to use a separate strbuf, instead of just stuffing NUL at
> where we found '=' and pass the two halves to bundle_list_update().
I can look into using a non-const buffer.
Thanks,
-Stolee
@@ -4,6 +4,202 @@ | |||
#include "object-store.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> To allow for the incremental fetch case, teach Git to understand a
> bundle list that could be advertised at an independent bundle URI. Such
> a bundle list is likely to be inspected by human readers, even if only
> by the bundle provider creating the list. For this reason, we can take
> our expected "key=value" pairs and instead format them using Git config
> format.
"can" does not explain why it is a good idea. "As a sequence of
key=value pairs is a lot more dense and harder to read than the
configuration file format, let's declare that it is the format we
use in a file that holds a bundle-list" would be.
I do not personally buy it, though. As I hinted in an earlier step,
some trait we associate with our configuration fioe format, like the
"last one wins" semantics, are undesirable ones, so even if we reuse
the appearance of the text, the semantics would have to become
different (including "syntax errors lead to die()" mentioned
elsewhere in the proposed log message).
> Update 'test-tool bundle-uri' to take this config file format as input.
> It uses a filename instead of stdin because there is no existing way to
> parse a FILE pointer in the config machinery. Using
> git_config_from_mem() is overly complicated and more likely to introduce
> bugs than this simpler version. I would rather have a slightly confusing
> test helper than complicated product code.
All the troubles described above seem to come from the initial
mistake to try reusing the configuration file parser or reusing the
configuration file format, at least to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/22/2022 3:25 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> To allow for the incremental fetch case, teach Git to understand a
>> bundle list that could be advertised at an independent bundle URI. Such
>> a bundle list is likely to be inspected by human readers, even if only
>> by the bundle provider creating the list. For this reason, we can take
>> our expected "key=value" pairs and instead format them using Git config
>> format.
>
> "can" does not explain why it is a good idea. "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
This "more dense and harder to read" was definitely my intention for
wanting a different format.
> I do not personally buy it, though. As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).
The points you made earlier about "last one wins" semantics are the
biggest road-blocks to using the config file format, from what I've read
so far. We could change those semantics to be different from my current
implementation which respects the "last one wins" rule, and then that
makes the config format match not as closely. That burden of avoiding
multiple key values is not on the end-user but the bundle provider to
match the new expectations. (There might be something we should be careful
about when advertising the bundle list from our Git config in the
'bundle-uri' command in the next series.)
The "syntax errors lead to die()" is mitigated by using
CONFIG_ERROR_ERROR, which is what I meant by "Be careful to call..." I
should have been more clear that we are _not_ going to die() based on the
remote data. We might write an error message and then abort the bundle
download.
With all of these points in mind, I'd still prefer to use the config file
format as described in the design document. If you still don't agree, then
I'll change the format to be key=value pairs split with newlines, and
update the design document accordingly.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jonathan Tan wrote (reply to this):
Junio C Hamano <gitster@pobox.com> writes:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > To allow for the incremental fetch case, teach Git to understand a
> > bundle list that could be advertised at an independent bundle URI. Such
> > a bundle list is likely to be inspected by human readers, even if only
> > by the bundle provider creating the list. For this reason, we can take
> > our expected "key=value" pairs and instead format them using Git config
> > format.
>
> "can" does not explain why it is a good idea. "As a sequence of
> key=value pairs is a lot more dense and harder to read than the
> configuration file format, let's declare that it is the format we
> use in a file that holds a bundle-list" would be.
>
> I do not personally buy it, though. As I hinted in an earlier step,
> some trait we associate with our configuration fioe format, like the
> "last one wins" semantics, are undesirable ones, so even if we reuse
> the appearance of the text, the semantics would have to become
> different (including "syntax errors lead to die()" mentioned
> elsewhere in the proposed log message).
One reason for using the configuration file format (which perhaps could
have been better explained in the commit message) is that we plan to
have a way for a repo to advertise a list of bundles during fetch.
I think that config is a natural place to put that, even with its "last
one wins" semantics.
It could be argued that we can just put a single URI in config and only
allow advertising of a single URI (and then use a different format for
the bundle lists with semantics that are stricter than "last one wins"),
but that seems unnecessarily restrictive (and would make the client make
one more network request). And if we're advertising multiple bundles, it
seems reasonable to make all bundle lists have the same format (whether
they are in config or in a separate file).
This branch is now known as |
This patch series was integrated into seen via git@5367958. |
This patch series was integrated into seen via git@45a6cf2. |
This patch series was integrated into seen via git@475d589. |
This patch series was integrated into seen via git@d6010ce. |
This patch series was integrated into seen via git@0709a31. |
There was a status update in the "New Topics" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. Needs review. source: <pull.1333.git.1661181174.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@713624f. |
This patch series was integrated into seen via git@61be489. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. Needs review. source: <pull.1333.git.1661181174.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@d5803d0. |
This patch series was integrated into seen via git@b4f8e61. |
This patch series was integrated into seen via git@e028421. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. Needs review. source: <pull.1333.git.1661181174.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@35b6c51. |
@@ -387,6 +387,8 @@ include::config/branch.txt[] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Glen Choo wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> +/**
> + * Given a key-value pair, update the state of the given bundle list.
> + * Returns 0 if the key-value pair is understood. Returns 1 if the key
> + * is not understood or the value is malformed.
> + */
> +MAYBE_UNUSED
> +static int bundle_list_update(const char *key, const char *value,
> + struct bundle_list *list)
> +{
> + const char *pkey, *dot;
> + struct strbuf id = STRBUF_INIT;
> + struct remote_bundle_info lookup = REMOTE_BUNDLE_INFO_INIT;
> + struct remote_bundle_info *bundle;
> +
> + if (!skip_prefix(key, "bundle.", &pkey))
> + return 1;
> +
> + dot = strchr(pkey, '.');
> + if (!dot) {
> + if (!strcmp(pkey, "version")) {
> + int version = atoi(value);
> + if (version != 1)
> + return 1;
> +
> + list->version = version;
> + return 0;
> + }
> +
> + if (!strcmp(pkey, "mode")) {
> + if (!strcmp(value, "all"))
> + list->mode = BUNDLE_MODE_ALL;
> + else if (!strcmp(value, "any"))
> + list->mode = BUNDLE_MODE_ANY;
> + else
> + return 1;
> + return 0;
> + }
Drive-by comment from Review Club: we could simplify
"section.[subsection.]key" parsing using parse_config_key(). There are
other places in the code that do custom parsing like this, but maybe
they should use parse_config_key() too.
> +
> + /* Ignore other unknown global keys. */
> + return 0;
> + }
> +
> + strbuf_add(&id, pkey, dot - pkey);
> + dot++;
> +
> + /*
> + * Check for an existing bundle with this <id>, or create one
> + * if necessary.
> + */
> + lookup.id = id.buf;
> + hashmap_entry_init(&lookup.ent, strhash(lookup.id));
> + if (!(bundle = hashmap_get_entry(&list->bundles, &lookup, ent, NULL))) {
> + CALLOC_ARRAY(bundle, 1);
> + bundle->id = strbuf_detach(&id, NULL);
> + strbuf_init(&bundle->file, 0);
> + hashmap_entry_init(&bundle->ent, strhash(bundle->id));
> + hashmap_add(&list->bundles, &bundle->ent);
> + }
> + strbuf_release(&id);
> +
> + if (!strcmp(dot, "uri")) {
> + free(bundle->uri);
> + bundle->uri = xstrdup(value);
> + return 0;
> + }
> +
> + /*
> + * At this point, we ignore any information that we don't
> + * understand, assuming it to be hints for a heuristic the client
> + * does not currently understand.
> + */
> + return 0;
> +}
> +
> static int find_temp_filename(struct strbuf *name)
> {
> int fd;
> --
> gitgitgadget
User |
User |
This patch series was integrated into seen via git@21b73fd. |
This patch series was integrated into seen via git@9a5b0fe. |
@@ -387,6 +387,8 @@ include::config/branch.txt[] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Teng Long wrote (reply to this):
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
> +> + init_bundle_list(&list);
> + while (strbuf_getline(&sb, stdin) != EOF) {
> + if (bundle_uri_parse_line(&list, sb.buf) < 0)
> + err = error("bad line: '%s'", sb.buf);
> + }
The command to write such a test is useful for people who
want to experiment about the feature, Thanks. On top of that,
I have a little question about the condition:
if (bundle_uri_parse_line(&list, sb.buf) < 0)
"bundle_uri_parse_line" will call "bundle_list_update" inside, and
could get the result of it as "bundle_uri_parse_line"'s return, then
actually "bundle_list_update" could return "1", so I'm not sure but maybe
the line could modified to:
if (bundle_uri_parse_line(&list, sb.buf))
at here.
Thanks.
This patch series was integrated into seen via git@0a5c856. |
This patch series was integrated into seen via git@077d1cb. |
This patch series was integrated into seen via git@c248d22. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@1f6debc. |
This patch series was integrated into seen via git@fdc2ce0. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com> |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@5343e9f. |
This patch series was integrated into seen via git@bec6733. |
This patch series was integrated into seen via git@c22ffd8. |
This patch series was integrated into seen via git@273e05b. |
This patch series was integrated into seen via git@00743d0. |
This patch series was integrated into seen via git@3a33d3e. |
This patch series was integrated into seen via git@415696b. |
On the Git mailing list, Derrick Stolee wrote (reply to this): On 10/12/2022 8:52 AM, Derrick Stolee via GitGitGadget wrote:
> This is the third series building the bundle URI feature. It is built on top
> of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is
> a URI to a bundle file. This series adds the capability of downloading and
> parsing a bundle list and then downloading the URIs in that list.
>
> The core functionality of bundle lists is implemented by creating data
> structures from a list of key-value pairs. These pairs can come from a
> plain-text file in Git config format, but in the future, we will support the
> list being supplied by packet lines over Git's protocol v2 in the
> 'bundle-uri' command (reserved for the next series).
This version has been available for a while now without comment. Could
we consider it for merging to 'next' soon?
I want to wait for this series to merge into 'master' before sending
part IV on top, which advertises bundle URIs over protocol v2.
Thanks,
-Stolee |
On the Git mailing list, Junio C Hamano wrote (reply to this): Derrick Stolee <derrickstolee@github.com> writes:
> This version has been available for a while now without comment. Could
> we consider it for merging to 'next' soon?
Could somebody who has reviewed it fully give an Ack (or two)? I
know an earlier rounds had some comments, but after v3 things have
quieted down.
I know the change from v4 to v5 has good improvements, but do not
claim to have read the other parts in detail.
Thanks. |
This patch series was integrated into seen via git@dcec1e4. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. Will merge to 'next'? source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@985c158. |
This patch series was integrated into seen via git@d8100ee. |
This patch series was integrated into next via git@9d9092b. |
There was a status update in the "Cooking" section about the branch Define the logical elements of a "bundle list", data structure to store them in-core, format to transfer them, and code to parse them. Will merge to 'master'. source: <pull.1333.v5.git.1665579160.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@d32dd8a. |
This patch series was integrated into master via git@d32dd8a. |
This patch series was integrated into next via git@d32dd8a. |
Closed via d32dd8a. |
This is the third series building the bundle URI feature. It is built on top of ds/bundle-uri-clone, which introduced 'git clone --bundle-uri=' where is a URI to a bundle file. This series adds the capability of downloading and parsing a bundle list and then downloading the URIs in that list.
The core functionality of bundle lists is implemented by creating data structures from a list of key-value pairs. These pairs can come from a plain-text file in Git config format, but in the future, we will support the list being supplied by packet lines over Git's protocol v2 in the 'bundle-uri' command (reserved for the next series).
The patches are organized in this way (updated for v4):
Patch 1 is a cleanup from the previous part. This allows us to simplify our bundle list data structure slightly.
Patches 2-3 create the bundle list data structures and the logic for populating the list from key-value pairs.
Patches 4-5 teach Git to parse "key=value" lines to construct a bundle list. Add unit tests that ensure this logic constructs lists correctly. These patches are adapted from Ævar's RFC [1] and were previously seen in my combined RFC [2].
Patch 6 teaches Git to parse Git config files into bundle lists.
Patches 7-9 implement the ability to download a bundle list and recursively download the contained bundles (and possibly the bundle lists within). This is limited by a constant depth to avoid issues with cycles or otherwise incorrectly configured bundle lists. This also fixes a previous bug when running verify_bundle() multiple times in the same process, as it did not clear the
PREREQ_MARK
flag upon leaving (see patch 8).Patches 10-12 suppress unhelpful warnings from user visibility.
[1] https://lore.kernel.org/git/RFC-cover-v2-00.36-00000000000-20220418T165545Z-avarab@gmail.com/
[2] https://lore.kernel.org/git/pull.1234.git.1653072042.gitgitgadget@gmail.com/
At the end of this series, users can bootstrap clones using 'git clone --bundle-uri= ' where points to a bundle list instead of a single bundle file.
As outlined in the design document [1], the next steps after this are:
I have prepared some of this work as pull requests on my personal fork so curious readers can look ahead to where we are going:
[3] https://lore.kernel.org/git/pull.1248.v3.git.1658757188.gitgitgadget@gmail.com
[4] derrickstolee#21
[5] derrickstolee#22
Updates in v5
verify_bundle()
not working multile times in the same process is fixed without removing the revision walk. Instead, more flags needed to be removed as the method cleaned up after itself.Updates in v4
Properly updated the patch outline.
Jonathan Tan asked for more tests, and this revealed some interesting behaviors which I have now either fixed or made explicit:
refs/bundles/*
refs at the end of the tests. This revealed the need for removing the reachability walk fromverify_bundle()
since the writtenrefs/bundles/*
refs were not being picked up by the loose ref cache. Since removing the reachability walk seemed like the faster (for users) option, I went that direction.verify_bundle()
.Updates in v3
Updates in v2
Thank you to all of the voices who chimed in on the previous version. I'm sorry it took so long for me to get a new version.
I've done a rather thorough overhaul to minimize how often later patches rewrite portions of earlier patches.
We no longer use a strbuf in struct remote_bundle_info. Instead, use a 'char *' and only in the patch where it is first used.
The config documentation is more clearly indicating that the bundle.* section has no effect in the repository config (at the moment, which will change in the next series).
The bundle.version value is now parsed using git_parse_int().
The config key is now parsed using parse_config_key().
Commit messages clarify more about the context of the change in the bigger picture of the bundle URI effort.
Some printf()s are correctly changed to fprintf()s.
The test helper CLI is unified across the two modes. They both take a filename now.
The count of downloaded bundles is now only updated after a successful download, allowing the "any" mode to keep trying after a failure.
Thanks,
cc: gitster@pobox.com
cc: me@ttaylorr.com
cc: newren@gmail.com
cc: avarab@gmail.com
cc: mjcheetham@outlook.com
cc: steadmon@google.com
cc: Glen Choo chooglen@google.com
cc: Jonathan Tan jonathantanmy@google.com
cc: Teng Long dyroneteng@gmail.com