Rebase to v2.47.0 #692

Merged
merged 240 commits into from
Oct 9, 2024

@dscho (Member) commented Oct 8, 2024

Range-diff relative to vfs-2.46.2, part 1/2
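Each entry in this range-diff uses the marker conventions of `git range-diff`. `=` means the patch is identical in both ranges. `!` means it changed, and the nested diff-of-diffs shows how. `<` means the commit exists only in the previous range (the one based on vfs-2.46.2). `>` means it exists only in the new range. The following is a minimal C sketch that classifies an entry header by its marker. It assumes the one-line header layout used in this list, including the optional `(upstream: <oid>)` annotation that some entries carry. The enum and `classify_range_diff_line()` are hypothetical names for illustration and are not part of Git.

```c
#include <string.h>

/* How a range-diff entry relates the previous and the new commit range. */
enum range_diff_status {
	RD_UNKNOWN,   /* not recognized as a range-diff entry header */
	RD_IDENTICAL, /* '=': identical patch in both ranges */
	RD_MODIFIED,  /* '!': patch changed between the ranges */
	RD_DROPPED,   /* '<': commit only in the previous range */
	RD_ADDED      /* '>': commit only in the new range */
};

/*
 * Classify a header such as
 *   "2: 014e13a = 1: a3d7a50 t: remove advice from some tests"
 * Assumed (hypothetical) layout:
 *   "<pos>: <oid> [(upstream: <oid>)] <marker> <pos>: <oid> <subject>"
 * where a missing side is written as "-: -------------".
 */
static enum range_diff_status classify_range_diff_line(const char *line)
{
	const char *p = strchr(line, ':');   /* end of the previous position */

	if (!p)
		return RD_UNKNOWN;
	p++;
	p += strspn(p, " ");    /* blanks before the previous oid */
	p += strcspn(p, " ");   /* the oid itself, or "-------------" */
	p += strspn(p, " ");
	if (*p == '(') {        /* skip an optional "(upstream: <oid>)" note */
		p = strchr(p, ')');
		if (!p)
			return RD_UNKNOWN;
		p++;
		p += strspn(p, " ");
	}
	/* The marker is a single character followed by a blank. */
	if (p[0] == '\0' || p[1] != ' ')
		return RD_UNKNOWN;

	switch (p[0]) {
	case '=': return RD_IDENTICAL;
	case '!': return RD_MODIFIED;
	case '<': return RD_DROPPED;
	case '>': return RD_ADDED;
	default:  return RD_UNKNOWN;
	}
}
```

For example, passing the header of new entry 7, `-: ------------- > 7: 7715abe survey: calculate more stats on refs`, returns `RD_ADDED`. The sketch inspects only the marker and does not validate the object IDs or positions.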
  • 2: 014e13a = 1: a3d7a50 t: remove advice from some tests

  • 1: 1b91965 = 2: 394eed4 sparse-index.c: fix use of index hashes in expand_index

  • 11: 8b9f007 = 3: 7bcd46a t5300: confirm failure of git index-pack when non-idx suffix requested

  • 3: aa31f50 = 4: f2db492 t1092: add test for untracked files and directories

  • 4: 0a63e54 (upstream: 3a8cd93) < -: ------------- survey: stub in new experimental git-survey command

  • 5: 8b66a8b (upstream: 5bac1ad) < -: ------------- survey: add command line opts to select references

  • 6: 3e43593 < -: ------------- survey: collect the set of requested refs

  • 7: f984f7c < -: ------------- survey: calculate stats on refs and print results

  • 8: c9e2ad6 < -: ------------- survey: stub in treewalk of reachable commits and objects

  • 9: 72491d7 < -: ------------- survey: add traverse callback for commits

  • 13: 2371769 ! 5: 69b5eb8 index-pack: disable rev-index if index file has non .idx suffix

    @@ Commit message
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## builtin/index-pack.c ##
    -@@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char *prefix)
    +@@ builtin/index-pack.c: int cmd_index_pack(int argc,
      	unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
      	int report_end_of_input = 0;
      	int hash_algo = 0;
    @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char
      
      	/*
      	 * index-pack never needs to fetch missing objects except when
    -@@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char *prefix)
    +@@ builtin/index-pack.c: int cmd_index_pack(int argc,
      				if (index_name || (i+1) >= argc)
      					usage(index_pack_usage);
      				index_name = argv[++i];
    @@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char
      			} else if (starts_with(arg, "--index-version=")) {
      				char *c;
      				opts.version = strtoul(arg + 16, &c, 10);
    -@@ builtin/index-pack.c: int cmd_index_pack(int argc, const char **argv, const char *prefix)
    +@@ builtin/index-pack.c: int cmd_index_pack(int argc,
      		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
      
      	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);

  • 14: a6007ea = 6: fd1e3c2 trace2: prefetch value of GIT_TRACE2_DST_DEBUG at startup

  • -: ------------- > 7: 7715abe survey: calculate more stats on refs

  • -: ------------- > 8: 00f402c amend! survey: add --top= option and config

  • -: ------------- > 9: 01146de survey: show some commits/trees/blobs histograms

  • 10: 79f0a6b ! 10: 56703e7 survey: add vector of largest objects for various scaling dimensions

    @@ Commit message
     
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
    - ## Documentation/config.txt ##
    -@@ Documentation/config.txt: include::config/status.txt[]
    - 
    - include::config/submodule.txt[]
    - 
    -+include::config/survey.txt[]
    -+
    - include::config/tag.txt[]
    - 
    - include::config/tar.txt[]
    -
    - ## Documentation/config/survey.txt (new) ##
    -@@
    -+survey.progress::
    -+	Boolean to show/hide progress information.  Defaults to
    -+	true when interactive (stderr is bound to a TTY).
    -+
    -+survey.showBlobSizes::
    -+	A non-negative integer value.  Requests details on the <n>
    -+	largest file blobs by size in bytes.  Provides a default
    -+	value for `--blob-sizes=<n>` in linkgit:git-survey[1].
    -+
    -+survey.showCommitParents::
    -+	A non-negative integer value.  Requests details on the <n>
    -+	commits with the most number of parents.  Provides a default
    -+	value for `--commit-parents=<n>` in linkgit:git-survey[1].
    -+
    -+survey.showCommitSizes::
    -+	A non-negative integer value.  Requests details on the <n>
    -+	largest commits by size in bytes.  Generally, these are the
    -+	commits with the largest commit messages.  Provides a default
    -+	value for `--commit-sizes=<n>` in linkgit:git-survey[1].
    -+
    -+survey.showTreeEntries::
    -+	A non-negative integer value.  Requests details on the <n>
    -+	trees (directories) with the most number of entries (files
    -+	and subdirectories).  Provides a default value for
    -+	`--tree-entries=<n>` in linkgit:git-survey[1].
    -+
    -+survey.showTreeSizes::
    -+	A non-negative integer value.  Requests details on the <n>
    -+	largest trees (directories) by size in bytes.  This will
    -+	set will usually be equal to the `survey.showTreeEntries`
    -+	set, but may be skewed by very long file or subdirectory
    -+	entry names.  Provides a default value for
    -+	`--tree-sizes=<n>` in linkgit:git-survey[1].
    -+
    -+survey.verbose::
    -+	Boolean to show/hide verbose output.  Default to false.
    + ## Documentation/config/survey.txt ##
    +@@ Documentation/config/survey.txt: survey.*::
    + 	top::
    + 		This integer value implies `--top=<N>`, specifying the
    + 		number of entries in the detail tables.
    ++	showBlobSizes::
    ++		A non-negative integer value.  Requests details on the
    ++		<n> largest file blobs by size in bytes.  Provides a
    ++		default value for `--blob-sizes=<n>` in
    ++		linkgit:git-survey[1].
    ++	showCommitParents::
    ++		A non-negative integer value.  Requests details on the
    ++		<n> commits with the most number of parents.  Provides a
    ++		default value for `--commit-parents=<n>` in
    ++		linkgit:git-survey[1].
    ++	showCommitSizes::
    ++		A non-negative integer value.  Requests details on the
    ++		<n> largest commits by size in bytes.  Generally, these
    ++		are the commits with the largest commit messages.
    ++		Provides a default value for `--commit-sizes=<n>` in
    ++		linkgit:git-survey[1].
    ++	showTreeEntries::
    ++		A non-negative integer value.  Requests details on the
    ++		<n> trees (directories) with the most number of entries
    ++		(files and subdirectories).  Provides a default value
    ++		for `--tree-entries=<n>` in linkgit:git-survey[1].
    ++	showTreeSizes::
    ++		A non-negative integer value.  Requests details on the
    ++		<n> largest trees (directories) by size in bytes.  This
    ++		will set will usually be equal to the
    ++		`survey.showTreeEntries` set, but may be skewed by very
    ++		long file or subdirectory entry names.  Provides a
    ++		default value for `--tree-sizes=<n>` in
    ++		linkgit:git-survey[1].
    + --
     
      ## Documentation/git-survey.txt ##
     @@ Documentation/git-survey.txt: only refs for the given options are added.
    @@ Documentation/git-survey.txt: only refs for the given options are added.
      OUTPUT
      ------
      
    - By default, `git survey` will print information about the repository in a
    - human-readable format that includes overviews and tables.
    +@@ Documentation/git-survey.txt: Reachable Object Summary
    + The reachable object summary shows the total number of each kind of Git
    + object, including tags, commits, trees, and blobs.
      
     +CONFIGURATION
     +-------------
    @@ Documentation/git-survey.txt: only refs for the given options are added.
      Part of the linkgit:git[1] suite
     
      ## builtin/survey.c ##
    -@@ builtin/survey.c: static struct survey_refs_wanted refs_if_unspecified = {
    +@@ builtin/survey.c: static struct survey_refs_wanted default_ref_options = {
      struct survey_opts {
      	int verbose;
      	int show_progress;
    @@ builtin/survey.c: static struct survey_refs_wanted refs_if_unspecified = {
     +
     +	int show_largest_blobs_by_size_bytes;
     +
    + 	int top_nr;
      	struct survey_refs_wanted refs;
      };
    - 
    -+#define DEFAULT_SHOW_LARGEST_VALUE (10)
    -+
    - static struct survey_opts survey_opts = {
    - 	.verbose = 0,
    - 	.show_progress = -1, /* defaults to isatty(2) */
    - 
    -+	/*
    -+	 * Show the largest `n` objects for some scaling dimension.
    -+	 * We allow each to be requested independently.
    -+	 */
    -+	.show_largest_commits_by_nr_parents = DEFAULT_SHOW_LARGEST_VALUE,
    -+	.show_largest_commits_by_size_bytes = DEFAULT_SHOW_LARGEST_VALUE,
    -+
    -+	.show_largest_trees_by_nr_entries = DEFAULT_SHOW_LARGEST_VALUE,
    -+	.show_largest_trees_by_size_bytes = DEFAULT_SHOW_LARGEST_VALUE,
    -+
    -+	.show_largest_blobs_by_size_bytes = DEFAULT_SHOW_LARGEST_VALUE,
    -+
    - 	.refs.want_all_refs = 0,
    - 
    - 	.refs.want_branches = -1, /* default these to undefined */
    -@@ builtin/survey.c: static struct option survey_options[] = {
    - 	OPT_BOOL_F(0, "detached", &survey_opts.refs.want_detached, N_("include detached HEAD"),     PARSE_OPT_NONEG),
    - 	OPT_BOOL_F(0, "other",    &survey_opts.refs.want_other,    N_("include notes and stashes"), PARSE_OPT_NONEG),
    - 
    -+	OPT_INTEGER_F(0, "commit-parents", &survey_opts.show_largest_commits_by_nr_parents, N_("show N largest commits by parent count"),  PARSE_OPT_NONEG),
    -+	OPT_INTEGER_F(0, "commit-sizes",   &survey_opts.show_largest_commits_by_size_bytes, N_("show N largest commits by size in bytes"), PARSE_OPT_NONEG),
    -+
    -+	OPT_INTEGER_F(0, "tree-entries",   &survey_opts.show_largest_trees_by_nr_entries,   N_("show N largest trees by entry count"),     PARSE_OPT_NONEG),
    -+	OPT_INTEGER_F(0, "tree-sizes",     &survey_opts.show_largest_trees_by_size_bytes,   N_("show N largest trees by size in bytes"),   PARSE_OPT_NONEG),
    -+
    -+	OPT_INTEGER_F(0, "blob-sizes",     &survey_opts.show_largest_blobs_by_size_bytes,   N_("show N largest blobs by size in bytes"),   PARSE_OPT_NONEG),
    -+
    - 	OPT_END(),
    - };
    - 
    -@@ builtin/survey.c: static int survey_load_config_cb(const char *var, const char *value,
    - 		return 0;
    - 	}
    - 
    -+	if (!strcmp(var, "survey.showcommitparents")) {
    -+		survey_opts.show_largest_commits_by_nr_parents = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	}
    -+	if (!strcmp(var, "survey.showcommitsizes")) {
    -+		survey_opts.show_largest_commits_by_size_bytes = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	}
    -+
    -+	if (!strcmp(var, "survey.showtreeentries")) {
    -+		survey_opts.show_largest_trees_by_nr_entries = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	}
    -+	if (!strcmp(var, "survey.showtreesizes")) {
    -+		survey_opts.show_largest_trees_by_size_bytes = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	}
    -+
    -+	if (!strcmp(var, "survey.showblobsizes")) {
    -+		survey_opts.show_largest_blobs_by_size_bytes = git_config_ulong(var, value, ctx->kvi);
    -+		return 0;
    -+	}
    -+
    - 	return git_default_config(var, value, ctx, pvoid);
    - }
    - 
     @@ builtin/survey.c: static void incr_obj_hist_bin(struct obj_hist_bin *pbin,
      	pbin->cnt_seen++;
      }
    @@ builtin/survey.c: static void incr_obj_hist_bin(struct obj_hist_bin *pbin,
     +
     +static void free_large_item_vec(struct large_item_vec *vec)
     +{
    ++	if (!vec)
    ++		return;
    ++
     +	free(vec->dimension_label);
     +	free(vec->item_label);
     +	free(vec);
    @@ builtin/survey.c: struct survey_stats_trees {
     +	 * Keep a vector of the trees with the most number of entries.
     +	 * This gives us a feel for the width of a tree when there are
     +	 * gigantic directories.
    -+	 */
    + 	 */
    +-	uint64_t max_entries; /* max(nr_entries) -- the width of the largest tree */
     +	struct large_item_vec *vec_largest_by_nr_entries;
     +
     +	/*
     +	 * Keep a vector of the trees with the largest size in bytes.
     +	 * The contents of this may or may not match items in the other
     +	 * vector, since entryname length can alter the results.
    - 	 */
    --	uint64_t max_entries; /* max(nr_entries) -- the width of the largest tree */
    ++	 */
     +	struct large_item_vec *vec_largest_by_size_bytes;
      
      	/*
    @@ builtin/survey.c: struct survey_stats_trees {
     +	struct large_item_vec *vec_largest_by_size_bytes;
      };
      
    - struct survey_stats {
    -@@ builtin/survey.c: static int fill_in_base_object(struct survey_stats_base_object *base,
    - static void traverse_commit_cb(struct commit *commit, void *data)
    - {
    - 	struct survey_stats_commits *psc = &survey_stats.commits;
    -+	unsigned long object_length;
    - 	unsigned k;
    - 
    - 	if ((++survey_progress_total % 1000) == 0)
    - 		display_progress(survey_progress, survey_progress_total);
    - 
    --	fill_in_base_object(&psc->base, &commit->object, OBJ_COMMIT, NULL, NULL);
    -+	fill_in_base_object(&psc->base, &commit->object, OBJ_COMMIT, &object_length, NULL);
    + struct survey_report_object_summary {
    +@@ builtin/survey.c: struct survey_context {
      
    - 	k = commit_list_count(commit->parents);
    -+
    -+	maybe_insert_large_item(psc->vec_largest_by_nr_parents, k, &commit->object.oid);
    -+	maybe_insert_large_item(psc->vec_largest_by_size_bytes, object_length, &commit->object.oid);
    -+
    - 	if (k >= PBIN_VEC_LEN)
    - 		k = PBIN_VEC_LEN - 1;
    --
    - 	psc->parent_cnt_pbin[k]++;
    - }
    - 
    -@@ builtin/survey.c: static void traverse_object_cb_tree(struct object *obj)
    - 
    - 	pst->sum_entries += nr_entries;
    - 
    --	if (nr_entries > pst->max_entries)
    --		pst->max_entries = nr_entries;
    -+	maybe_insert_large_item(pst->vec_largest_by_nr_entries, nr_entries, &obj->oid);
    -+	maybe_insert_large_item(pst->vec_largest_by_size_bytes, object_length, &obj->oid);
    - 
    - 	qb = qbin(nr_entries);
    - 	incr_obj_hist_bin(&pst->entry_qbin[qb], object_length, disk_sizep);
    -@@ builtin/survey.c: static void traverse_object_cb_tree(struct object *obj)
    - static void traverse_object_cb_blob(struct object *obj)
    + static void clear_survey_context(struct survey_context *ctx)
      {
    - 	struct survey_stats_blobs *psb = &survey_stats.blobs;
    -+	unsigned long object_length;
    - 
    --	fill_in_base_object(&psb->base, obj, OBJ_BLOB, NULL, NULL);
    -+	fill_in_base_object(&psb->base, obj, OBJ_BLOB, &object_length, NULL);
    -+
    -+	maybe_insert_large_item(psb->vec_largest_by_size_bytes, object_length, &obj->oid);
    ++	free_large_item_vec(ctx->report.reachable_objects.commits.vec_largest_by_nr_parents);
    ++	free_large_item_vec(ctx->report.reachable_objects.commits.vec_largest_by_size_bytes);
    ++	free_large_item_vec(ctx->report.reachable_objects.trees.vec_largest_by_nr_entries);
    ++	free_large_item_vec(ctx->report.reachable_objects.trees.vec_largest_by_size_bytes);
    ++	free_large_item_vec(ctx->report.reachable_objects.blobs.vec_largest_by_size_bytes);
    ++
    + 	ref_array_clear(&ctx->ref_array);
    + 	strvec_clear(&ctx->refs);
      }
    - 
    - static void traverse_object_cb(struct object *obj, const char *name, void *data)
    -@@ builtin/survey.c: static void write_base_object_json(struct json_writer *jw,
    - 	write_hbin_json(jw, "dist_by_size", base->size_hbin);
    +@@ builtin/survey.c: static void survey_report_commit_parents(struct survey_context *ctx)
    + 	clear_table(&table);
      }
      
    -+static void write_large_item_vec_json(struct json_writer *jw,
    -+				      struct large_item_vec *vec)
    ++static void survey_report_largest_vec(struct large_item_vec *vec)
     +{
    ++	struct survey_table table = SURVEY_TABLE_INIT;
    ++	struct strbuf size = STRBUF_INIT;
    ++
     +	if (!vec || !vec->nr_items)
     +		return;
     +
    -+	jw_object_inline_begin_array(jw, vec->dimension_label);
    -+	{
    -+		int k;
    -+
    -+		for (k = 0; k < vec->nr_items; k++) {
    -+			struct large_item *pk = &vec->items[k];
    -+			if (is_null_oid(&pk->oid))
    -+				break;
    -+
    -+			jw_array_inline_begin_object(jw);
    -+			{
    -+				jw_object_intmax(jw, vec->item_label, pk->size);
    -+				jw_object_string(jw, "oid", oid_to_hex(&pk->oid));
    -+			}
    -+			jw_end(jw);
    ++	table.table_name = vec->dimension_label;
    ++	strvec_pushl(&table.header, "Size", "OID", NULL);
    ++
    ++	for (int k = 0; k < vec->nr_items; k++) {
    ++		struct large_item *pk = &vec->items[k];
    ++		if (!is_null_oid(&pk->oid)) {
    ++			strbuf_reset(&size);
    ++			strbuf_addf(&size, "%"PRIuMAX, (uintmax_t)pk->size);
    ++
    ++			insert_table_rowv(&table, size.buf, oid_to_hex(&pk->oid), NULL);
     +		}
     +	}
    -+	jw_end(jw);
    ++	strbuf_release(&size);
    ++
    ++	print_table_plaintext(&table);
    ++	clear_table(&table);
     +}
     +
    - static void json_commits_section(struct json_writer *jw_top, int pretty, int want_trace2)
    + static void survey_report_plaintext_refs(struct survey_context *ctx)
      {
    - 	struct survey_stats_commits *psc = &survey_stats.commits;
    -@@ builtin/survey.c: static void json_commits_section(struct json_writer *jw_top, int pretty, int wan
    - 	{
    - 		write_base_object_json(&jw_commits, &psc->base);
    + 	struct survey_report_ref_summary *refs = &ctx->report.refs;
    +@@ builtin/survey.c: static void survey_report_plaintext(struct survey_context *ctx)
    + 		&ctx->report.top_paths_by_inflate[REPORT_TYPE_TREE]);
    + 	survey_report_plaintext_sorted_size(
    + 		&ctx->report.top_paths_by_inflate[REPORT_TYPE_BLOB]);
    ++
    ++	survey_report_largest_vec(ctx->report.reachable_objects.commits.vec_largest_by_nr_parents);
    ++	survey_report_largest_vec(ctx->report.reachable_objects.commits.vec_largest_by_size_bytes);
    ++	survey_report_largest_vec(ctx->report.reachable_objects.trees.vec_largest_by_nr_entries);
    ++	survey_report_largest_vec(ctx->report.reachable_objects.trees.vec_largest_by_size_bytes);
    ++	survey_report_largest_vec(ctx->report.reachable_objects.blobs.vec_largest_by_size_bytes);
    + }
    + 
    + /*
    +@@ builtin/survey.c: static int survey_load_config_cb(const char *var, const char *value,
    + 		ctx->opts.show_progress = git_config_bool(var, value);
    + 		return 0;
    + 	}
    ++	if (!strcmp(var, "survey.showcommitparents")) {
    ++		ctx->opts.show_largest_commits_by_nr_parents = git_config_ulong(var, value, cctx->kvi);
    ++		return 0;
    ++	}
    ++	if (!strcmp(var, "survey.showcommitsizes")) {
    ++		ctx->opts.show_largest_commits_by_size_bytes = git_config_ulong(var, value, cctx->kvi);
    ++		return 0;
    ++	}
    ++
    ++	if (!strcmp(var, "survey.showtreeentries")) {
    ++		ctx->opts.show_largest_trees_by_nr_entries = git_config_ulong(var, value, cctx->kvi);
    ++		return 0;
    ++	}
    ++	if (!strcmp(var, "survey.showtreesizes")) {
    ++		ctx->opts.show_largest_trees_by_size_bytes = git_config_ulong(var, value, cctx->kvi);
    ++		return 0;
    ++	}
    ++	if (!strcmp(var, "survey.showblobsizes")) {
    ++		ctx->opts.show_largest_blobs_by_size_bytes = git_config_ulong(var, value, cctx->kvi);
    ++		return 0;
    ++	}
    + 	if (!strcmp(var, "survey.top")) {
    + 		ctx->opts.top_nr = git_config_bool(var, value);
    + 		return 0;
    +@@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
    + 
    + 			ctx->report.reachable_objects.commits.parent_cnt_pbin[k]++;
    + 			base = &ctx->report.reachable_objects.commits.base;
    ++
    ++			maybe_insert_large_item(ctx->report.reachable_objects.commits.vec_largest_by_nr_parents, k, &commit->object.oid);
    ++			maybe_insert_large_item(ctx->report.reachable_objects.commits.vec_largest_by_size_bytes, object_length, &commit->object.oid);
    + 			break;
    + 		}
    + 		case OBJ_TREE: {
    +@@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
      
    -+		write_large_item_vec_json(&jw_commits, psc->vec_largest_by_nr_parents);
    -+		write_large_item_vec_json(&jw_commits, psc->vec_largest_by_size_bytes);
    -+
    - 		jw_object_inline_begin_object(&jw_commits, "count_by_nr_parents");
    - 		{
    - 			struct strbuf parent_key = STRBUF_INIT;
    -@@ builtin/survey.c: static void json_trees_section(struct json_writer *jw_top, int pretty, int want_
    - 	{
    - 		write_base_object_json(&jw_trees, &pst->base);
    + 				pst->sum_entries += nr_entries;
      
    --		jw_object_intmax(&jw_trees, "max_entries", pst->max_entries);
    - 		jw_object_intmax(&jw_trees, "sum_entries", pst->sum_entries);
    +-				if (nr_entries > pst->max_entries)
    +-					pst->max_entries = nr_entries;
    ++				maybe_insert_large_item(pst->vec_largest_by_nr_entries, nr_entries, &tree->object.oid);
    ++				maybe_insert_large_item(pst->vec_largest_by_size_bytes, object_length, &tree->object.oid);
      
    -+		write_large_item_vec_json(&jw_trees, pst->vec_largest_by_nr_entries);
    -+		write_large_item_vec_json(&jw_trees, pst->vec_largest_by_size_bytes);
    + 				qb = qbin(nr_entries);
    + 				incr_obj_hist_bin(&pst->entry_qbin[qb], object_length, disk_sizep);
    +@@ builtin/survey.c: static void increment_totals(struct survey_context *ctx,
    + 		}
    + 		case OBJ_BLOB:
    + 			base = &ctx->report.reachable_objects.blobs.base;
     +
    - 		write_qbin_json(&jw_trees, "dist_by_nr_entries", pst->entry_qbin);
    - 	}
    - 	jw_end(&jw_trees);
    -@@ builtin/survey.c: static void json_blobs_section(struct json_writer *jw_top, int pretty, int want_
    - 	jw_object_begin(&jw_blobs, pretty);
    - 	{
    - 		write_base_object_json(&jw_blobs, &psb->base);
    ++			maybe_insert_large_item(ctx->report.reachable_objects.blobs.vec_largest_by_size_bytes, object_length, &oids->oid[i]);
    + 			break;
    + 		default:
    + 			continue;
    +@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix, struct repositor
    + 		OPT_BOOL_F(0, "detached", &ctx.opts.refs.want_detached, N_("include detached HEAD"),     PARSE_OPT_NONEG),
    + 		OPT_BOOL_F(0, "other",    &ctx.opts.refs.want_other,    N_("include notes and stashes"), PARSE_OPT_NONEG),
    + 
    ++		OPT_INTEGER_F(0, "commit-parents", &ctx.opts.show_largest_commits_by_nr_parents, N_("show N largest commits by parent count"),  PARSE_OPT_NONEG),
    ++		OPT_INTEGER_F(0, "commit-sizes",   &ctx.opts.show_largest_commits_by_size_bytes, N_("show N largest commits by size in bytes"), PARSE_OPT_NONEG),
     +
    -+		write_large_item_vec_json(&jw_blobs, psb->vec_largest_by_size_bytes);
    - 	}
    - 	jw_end(&jw_blobs);
    ++		OPT_INTEGER_F(0, "tree-entries",   &ctx.opts.show_largest_trees_by_nr_entries,   N_("show N largest trees by entry count"),     PARSE_OPT_NONEG),
    ++		OPT_INTEGER_F(0, "tree-sizes",     &ctx.opts.show_largest_trees_by_size_bytes,   N_("show N largest trees by size in bytes"),   PARSE_OPT_NONEG),
    ++
    ++		OPT_INTEGER_F(0, "blob-sizes",     &ctx.opts.show_largest_blobs_by_size_bytes,   N_("show N largest blobs by size in bytes"),   PARSE_OPT_NONEG),
    ++
    + 		OPT_END(),
    + 	};
    + 
    +@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix, struct repositor
      
    -@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix)
    - 		survey_opts.show_progress = isatty(2);
    - 	fixup_refs_wanted();
    + 	fixup_refs_wanted(&ctx);
      
    -+	if (survey_opts.show_largest_commits_by_nr_parents)
    -+		survey_stats.commits.vec_largest_by_nr_parents =
    ++	if (ctx.opts.show_largest_commits_by_nr_parents)
    ++		ctx.report.reachable_objects.commits.vec_largest_by_nr_parents =
     +			alloc_large_item_vec(
     +				"largest_commits_by_nr_parents",
     +				"nr_parents",
    -+				survey_opts.show_largest_commits_by_nr_parents);
    -+	if (survey_opts.show_largest_commits_by_size_bytes)
    -+		survey_stats.commits.vec_largest_by_size_bytes =
    ++				ctx.opts.show_largest_commits_by_nr_parents);
    ++	if (ctx.opts.show_largest_commits_by_size_bytes)
    ++		ctx.report.reachable_objects.commits.vec_largest_by_size_bytes =
     +			alloc_large_item_vec(
     +				"largest_commits_by_size_bytes",
     +				"size",
    -+				survey_opts.show_largest_commits_by_size_bytes);
    ++				ctx.opts.show_largest_commits_by_size_bytes);
     +
    -+	if (survey_opts.show_largest_trees_by_nr_entries)
    -+		survey_stats.trees.vec_largest_by_nr_entries =
    ++	if (ctx.opts.show_largest_trees_by_nr_entries)
    ++		ctx.report.reachable_objects.trees.vec_largest_by_nr_entries =
     +			alloc_large_item_vec(
     +				"largest_trees_by_nr_entries",
     +				"nr_entries",
    -+				survey_opts.show_largest_trees_by_nr_entries);
    -+	if (survey_opts.show_largest_trees_by_size_bytes)
    -+		survey_stats.trees.vec_largest_by_size_bytes =
    ++				ctx.opts.show_largest_trees_by_nr_entries);
    ++	if (ctx.opts.show_largest_trees_by_size_bytes)
    ++		ctx.report.reachable_objects.trees.vec_largest_by_size_bytes =
     +			alloc_large_item_vec(
     +				"largest_trees_by_size_bytes",
     +				"size",
    -+				survey_opts.show_largest_trees_by_size_bytes);
    ++				ctx.opts.show_largest_trees_by_size_bytes);
     +
    -+	if (survey_opts.show_largest_blobs_by_size_bytes)
    -+		survey_stats.blobs.vec_largest_by_size_bytes =
    ++	if (ctx.opts.show_largest_blobs_by_size_bytes)
    ++		ctx.report.reachable_objects.blobs.vec_largest_by_size_bytes =
     +			alloc_large_item_vec(
     +				"largest_blobs_by_size_bytes",
     +				"size",
    -+				survey_opts.show_largest_blobs_by_size_bytes);
    ++				ctx.opts.show_largest_blobs_by_size_bytes);
     +
    - 	survey_phase_refs(the_repository);
    + 	survey_phase_refs(&ctx);
      
    - 	survey_emit_trace2();
    - 	survey_print_json();
    - 
    - 	strvec_clear(&survey_vec_refs_wanted);
    -+	free_large_item_vec(survey_stats.commits.vec_largest_by_nr_parents);
    -+	free_large_item_vec(survey_stats.commits.vec_largest_by_size_bytes);
    -+	free_large_item_vec(survey_stats.trees.vec_largest_by_nr_entries);
    -+	free_large_item_vec(survey_stats.trees.vec_largest_by_size_bytes);
    -+	free_large_item_vec(survey_stats.blobs.vec_largest_by_size_bytes);
    - 
    - 	return 0;
    - }
    + 	survey_phase_objects(&ctx);

  • 12: ab094c4 < -: ------------- survey: add pathname of blob or tree to large_item_vec

  • 15: 9ea4cce < -: ------------- survey: add commit-oid to large_item detail

  • 16: 5330029 < -: ------------- survey: add commit name-rev lookup to each large_item

  • 17: 0d9f6ae < -: ------------- survey: add --json option and setup for pretty output

  • 18: 50d2203 < -: ------------- survey: add pretty printing of stats

  • 19: acf0691 < -: ------------- t8100: create test for git-survey

  • 20: 440477b < -: ------------- survey: add --no-name-rev option

  • -: ------------- > 11: 6e58ef5 survey: add pathname of blob or tree to large_item_vec

  • -: ------------- > 12: d012ace survey: add commit-oid to large_item detail

  • -: ------------- > 13: 65e4955 survey: add commit name-rev lookup to each large_item

  • -: ------------- > 14: 9d157fd survey: add --no-name-rev option

  • 21: 31ecb35 ! 15: 36931e9 survey: started TODO list at bottom of source file

    @@ Commit message
         survey: started TODO list at bottom of source file
     
      ## builtin/survey.c ##
    -@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix)
    - 
    +@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix, struct repositor
    + 	clear_survey_context(&ctx);
      	return 0;
      }
     +

  • 22: 630e4a6 ! 16: cbc6815 survey: expanded TODO list at the bottom of the source file

    @@ Commit message
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## builtin/survey.c ##
    -@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix)
    +@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix, struct repositor
      }
      
      /*

  • 23: 458e9bc ! 17: 75357a3 survey: expanded TODO with more notes

    @@ Commit message
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## builtin/survey.c ##
    -@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix)
    +@@ builtin/survey.c: int cmd_survey(int argc, const char **argv, const char *prefix, struct repositor
       *    size of the set of "refs/tags/" that we visited while building
       *    the `ref_info` and `ref_array` and not need to ask the remote.
       *

  • 24: cb271c6 < -: ------------- survey: clearly note the experimental nature in the output

  • 25: e45516e = 18: 590aed4 reset --stdin: trim carriage return from the paths

  • 26: 3da93cd ! 19: 5e80a3a Identify microsoft/git via a distinct version suffix

    @@ Commit message
      ## GIT-VERSION-GEN ##
     @@
      GVF=GIT-VERSION-FILE
    - DEF_VER=v2.46.2
    + DEF_VER=v2.47.0
      
     +# Identify microsoft/git via a distinct version suffix
     +DEF_VER=$DEF_VER.vfs.0.0

  • 27: 17050de = 20: a0ff411 gvfs: ensure that the version is based on a GVFS tag

  • 28: 4a42a9e = 21: 6ef6540 gvfs: add a GVFS-specific header file

  • 149: 2f54fc6 ! 22: 030b553 git_config_set_multivar_in_file_gently(): add a lock timeout

    @@ Documentation/config/core.txt: core.WSLCompat::
     +	locked.
     
      ## config.c ##
    -@@ config.c: int git_config_set_multivar_in_file_gently(const char *config_filename,
    - 					   const char *comment,
    - 					   unsigned flags)
    +@@ config.c: int repo_config_set_multivar_in_file_gently(struct repository *r,
    + 					    const char *comment,
    + 					    unsigned flags)
      {
     +	static unsigned long timeout_ms = ULONG_MAX;
      	int fd = -1, in_fd = -1;
      	int ret;
      	struct lock_file lock = LOCK_INIT;
    -@@ config.c: int git_config_set_multivar_in_file_gently(const char *config_filename,
    +@@ config.c: int repo_config_set_multivar_in_file_gently(struct repository *r,
      	if (!config_filename)
    - 		config_filename = filename_buf = git_pathdup("config");
    + 		config_filename = filename_buf = repo_git_path(r, "config");
      
     +	if ((long)timeout_ms < 0 &&
     +	    git_config_get_ulong("core.configWriteLockTimeoutMS", &timeout_ms))

  • 152: e6eb3f7 ! 23: b73befa scalar: set the config write-lock timeout to 150ms

    @@ Commit message
     
      ## scalar.c ##
     @@ scalar.c: static int set_recommended_config(int reconfigure)
    - 		{ "core.autoCRLF", "false" },
      		{ "core.safeCRLF", "false" },
      		{ "fetch.showForcedUpdates", "false" },
    + 		{ "push.usePathWalk", "true" },
     +		{ "core.configWriteLockTimeoutMS", "150" },
      		{ NULL, NULL },
      	};

  • 153: 4d7e22a = 24: d081111 scalar: add docs from microsoft/scalar

  • 154: 7c7c1ad = 25: 7b212db scalar (Windows): use forward slashes as directory separators

  • 155: 781d148 = 26: 8521b1a scalar: add retry logic to run_git()

  • 156: 5fea241 = 27: d97b7b2 scalar: support the config command for backwards compatibility

  • 187: ded2fed = 28: a9de9d0 sequencer: avoid progress when stderr is redirected

  • 29: 6240eb8 ! 29: e3d9c14 gvfs: add the core.gvfs config setting

    @@ environment.c: int grafts_keep_true_parents;
      unsigned long pack_size_limit_cfg;
     
      ## environment.h ##
    -@@ environment.h: int get_shared_repository(void);
    - void reset_shared_repository(void);
    +@@ environment.h: extern unsigned long pack_size_limit_cfg;
    + extern int max_allowed_tree_depth;
      
      extern int core_preload_index;
     +extern int core_gvfs;
    @@ gvfs.c (new)
     +static int core_gvfs_is_bool;
     +
     +static int early_core_gvfs_config(const char *var, const char *value,
    -+				  const struct config_context *ctx, void *cb)
    ++				  const struct config_context *ctx, void *cb UNUSED)
     +{
     +	if (!strcmp(var, "core.gvfs"))
     +		core_gvfs = git_config_bool_or_int("core.gvfs", value, ctx->kvi,
    @@ gvfs.c (new)
     +		struct key_value_info default_kvi = KVI_INIT;
     +		core_gvfs = git_config_bool_or_int("core.gvfs", value, &default_kvi, &core_gvfs_is_bool);
     +	} else if (startup_info->have_repository == 0)
    -+		read_early_config(early_core_gvfs_config, NULL);
    ++		read_early_config(the_repository, early_core_gvfs_config, NULL);
     +	else
     +		repo_config_get_bool_or_int(the_repository, "core.gvfs",
     +					    &core_gvfs_is_bool, &core_gvfs);
  • 30: 3b84be3 ! 30: dd7004d gvfs: add the feature to skip writing the index' SHA-1

    @@ gvfs.h
     
      ## repo-settings.c ##
     @@
    + #include "repo-settings.h"
    + #include "repository.h"
      #include "midx.h"
    - #include "fsmonitor-ipc.h"
    - #include "fsmonitor-settings.h"
     +#include "gvfs.h"
      
      static void repo_cfg_bool(struct repository *r, const char *key, int *dest,
  • 31: 7d85cde ! 31: f9b8c1e gvfs: add the feature that blobs may be missing

    @@ Documentation/config/core.txt: core.gvfs::
     
      ## cache-tree.c ##
     @@
    + 
      #include "git-compat-util.h"
    - #include "environment.h"
      #include "hex.h"
     +#include "gvfs.h"
      #include "lockfile.h"
  • 32: f17a9a5 = 32: c7ee926 gvfs: prevent files to be deleted outside the sparse checkout

  • 33: e2892a4 = 33: d25865c gvfs: optionally skip reachability checks/upload pack during fetch

  • 34: 0643eb6 = 34: 162137d gvfs: ensure all filters and EOL conversions are blocked

  • 35: 8e96976 ! 35: 2516df2 gvfs: allow "virtualizing" objects

    @@ environment.c: int core_gvfs;
      int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
      unsigned long pack_size_limit_cfg;
     +int core_virtualize_objects;
    - enum log_refs_config log_all_ref_updates = LOG_REFS_UNSET;
      int max_allowed_tree_depth =
      #ifdef _MSC_VER
    + 	/*
     
      ## environment.h ##
    -@@ environment.h: struct strvec;
    - extern const char *comment_line_str;
    +@@ environment.h: extern const char *comment_line_str;
    + extern char *comment_line_str_to_free;
      extern int auto_comment_line_char;
      
     +extern int core_virtualize_objects;
    -+
    - /*
    -  * Wrapper of getenv() that returns a strdup value. This value is kept
    -  * in argv to be freed later.
    + # endif /* USE_THE_REPOSITORY_VARIABLE */
    + #endif /* ENVIRONMENT_H */
     
      ## object-file.c ##
     @@
    @@ object-file.c: void disable_obj_read_lock(void)
      	pthread_mutex_destroy(&obj_read_mutex);
      }
      
    -+static int run_read_object_hook(const struct object_id *oid)
    ++static int run_read_object_hook(struct repository *r, const struct object_id *oid)
     +{
     +	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
     +	int ret;
    @@ object-file.c: void disable_obj_read_lock(void)
     +
     +	start = getnanotime();
     +	strvec_push(&opt.args, oid_to_hex(oid));
    -+	ret = run_hooks_opt("read-object", &opt);
    ++	ret = run_hooks_opt(r, "read-object", &opt);
     +	trace_performance_since(start, "run_read_object_hook");
     +
     +	return ret;
    @@ object-file.c: static int do_oid_object_info_extended(struct repository *r,
      				break;
     +			if (core_virtualize_objects && !tried_hook) {
     +				tried_hook = 1;
    -+				if (!run_read_object_hook(oid))
    ++				if (!run_read_object_hook(r, oid))
     +					goto retry;
     +			}
      		}
  • 36: 8aee64b ! 36: 06584ba Hydrate missing loose objects in check_and_freshen()

    @@ object-file.c: int has_alt_odb(struct repository *r)
     +	struct read_object_process *entry;
     +	struct child_process *process;
     +	struct strbuf status = STRBUF_INIT;
    -+	const char *cmd = find_hook("read-object");
    ++	const char *cmd = find_hook(the_repository, "read-object");
     +	uint64_t start;
     +
     +	start = getnanotime();
    @@ object-file.c: void disable_obj_read_lock(void)
      	pthread_mutex_destroy(&obj_read_mutex);
      }
      
    --static int run_read_object_hook(const struct object_id *oid)
    +-static int run_read_object_hook(struct repository *r, const struct object_id *oid)
     -{
     -	struct run_hooks_opt opt = RUN_HOOKS_OPT_INIT;
     -	int ret;
    @@ object-file.c: void disable_obj_read_lock(void)
     -
     -	start = getnanotime();
     -	strvec_push(&opt.args, oid_to_hex(oid));
    --	ret = run_hooks_opt("read-object", &opt);
    +-	ret = run_hooks_opt(r, "read-object", &opt);
     -	trace_performance_since(start, "run_read_object_hook");
     -
     -	return ret;
    @@ object-file.c: static int do_oid_object_info_extended(struct repository *r,
      				break;
      			if (core_virtualize_objects && !tried_hook) {
      				tried_hook = 1;
    --				if (!run_read_object_hook(oid))
    +-				if (!run_read_object_hook(r, oid))
     +				if (!read_object_process(oid))
      					goto retry;
      			}
  • 37: 95454ff ! 37: f3f8f88 sha1_file: when writing objects, skip the read_object_hook

    @@ object-file.c: int has_loose_object_nonlocal(const struct object_id *oid)
      
      static void mmap_limit_check(size_t length)
     @@ object-file.c: static int write_loose_object(const struct object_id *oid, char *hdr,
    - 	return finalize_object_file(tmp_file.buf, filename.buf);
    + 					  FOF_SKIP_COLLISION_CHECK);
      }
      
     -static int freshen_loose_object(const struct object_id *oid)
  • 38: 6d01c9d ! 38: ec080b0 gvfs: add global command pre and post hook procs

    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +static int run_post_hook = 0;
     +static int exit_code = -1;
     +
    -+static int run_pre_command_hook(const char **argv)
    ++static int run_pre_command_hook(struct repository *r, const char **argv)
     +{
     +	char *lock;
     +	int ret = 0;
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +	/* call the hook proc */
     +	strvec_pushv(&sargv, argv);
     +	strvec_pushv(&opt.args, sargv.v);
    -+	ret = run_hooks_opt("pre-command", &opt);
    ++	ret = run_hooks_opt(r, "pre-command", &opt);
     +
     +	if (!ret)
     +		run_post_hook = 1;
     +	return ret;
     +}
     +
    -+static int run_post_command_hook(void)
    ++static int run_post_command_hook(struct repository *r)
     +{
     +	char *lock;
     +	int ret = 0;
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +
     +	strvec_pushv(&opt.args, sargv.v);
     +	strvec_pushf(&opt.args, "--exit_code=%u", exit_code);
    -+	ret = run_hooks_opt("post-command", &opt);
    ++	ret = run_hooks_opt(r, "post-command", &opt);
     +
     +	run_post_hook = 0;
     +	strvec_clear(&sargv);
    @@ git.c: static int handle_alias(int *argcp, const char ***argv)
     +
     +static void post_command_hook_atexit(void)
     +{
    -+	run_post_command_hook();
    ++	run_post_command_hook(the_repository);
     +}
     +
    - static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
    + static int run_builtin(struct cmd_struct *p, int argc, const char **argv, struct repository *repo)
      {
      	int status, help;
    -@@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
    +@@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv, struct
      	if (!help && p->option & NEED_WORK_TREE)
      		setup_work_tree();
      
    -+	if (run_pre_command_hook(argv))
    ++	if (run_pre_command_hook(the_repository, argv))
     +		die("pre-command hook aborted command");
     +
      	trace_argv_printf(argv, "trace: built-in: git");
      	trace2_cmd_name(p->cmd);
      
    - 	validate_cache_entries(the_repository->index);
    --	status = p->fn(argc, argv, prefix);
    -+	exit_code = status = p->fn(argc, argv, prefix);
    - 	validate_cache_entries(the_repository->index);
    + 	validate_cache_entries(repo->index);
    +-	status = p->fn(argc, argv, prefix, (p->option & RUN_SETUP)? repo : NULL);
    ++	exit_code = status = p->fn(argc, argv, prefix, (p->option & RUN_SETUP)? repo : NULL);
    + 	validate_cache_entries(repo->index);
      
      	if (status)
      		return status;
      
    -+	run_post_command_hook();
    ++	run_post_command_hook(the_repository);
     +
      	/* Somebody closed stdout? */
      	if (fstat(fileno(stdout), &st))
    @@ git.c: static void execv_dashed_external(const char **argv)
      	 */
      	trace_argv_printf(cmd.args.v, "trace: exec:");
      
    -+	if (run_pre_command_hook(cmd.args.v))
    ++	if (run_pre_command_hook(the_repository, cmd.args.v))
     +		die("pre-command hook aborted command");
     +
      	/*
    @@ git.c: static void execv_dashed_external(const char **argv)
      	else if (errno != ENOENT)
      		exit(128);
     +
    -+	run_post_command_hook();
    ++	run_post_command_hook(the_repository);
      }
      
      static int run_argv(int *argcp, const char ***argv)
    @@ git.c: int cmd_main(int argc, const char **argv)
      	if (!argc) {
      		/* The user didn't specify a command; give them help */
      		commit_pager_choice();
    -+		if (run_pre_command_hook(argv))
    ++		if (run_pre_command_hook(the_repository, argv))
     +			die("pre-command hook aborted command");
      		printf(_("usage: %s\n\n"), git_usage_string);
      		list_common_cmds_help();
      		printf("\n%s\n", _(git_more_info_string));
     -		exit(1);
     +		exit_code = 1;
    -+		run_post_command_hook();
    ++		run_post_command_hook(the_repository);
     +		exit(exit_code);
      	}
      
    @@ git.c: int cmd_main(int argc, const char **argv)
     
      ## hook.c ##
     @@
    ++#define USE_THE_REPOSITORY_VARIABLE
    ++
      #include "git-compat-util.h"
      #include "abspath.h"
     +#include "environment.h"
    @@ hook.c
      #include "setup.h"
      
     +static int early_hooks_path_config(const char *var, const char *value,
    -+				   const struct config_context *ctx, void *cb)
    ++				   const struct config_context *ctx UNUSED, void *cb)
     +{
     +	if (!strcmp(var, "core.hookspath"))
     +		return git_config_pathname((char **)cb, var, value);
    @@ hook.c
     +			return NULL;
     +		}
     +
    -+		read_early_config(early_hooks_path_config, &early_hooks_dir);
    ++		read_early_config(the_repository, early_hooks_path_config, &early_hooks_dir);
     +		if (!early_hooks_dir)
     +			strbuf_addf(&hooks_dir, "%s/hooks/", commondir.buf);
     +		else {
    @@ hook.c
     +	return result->buf;
     +}
     +
    - const char *find_hook(const char *name)
    + const char *find_hook(struct repository *r, const char *name)
      {
      	static struct strbuf path = STRBUF_INIT;
    -@@ hook.c: const char *find_hook(const char *name)
    +@@ hook.c: const char *find_hook(struct repository *r, const char *name)
      	int found_hook;
      
      	strbuf_reset(&path);
    --	strbuf_git_path(&path, "hooks/%s", name);
    +-	strbuf_repo_git_path(&path, r, "hooks/%s", name);
     +	if (have_git_dir())
    -+		strbuf_git_path(&path, "hooks/%s", name);
    ++		strbuf_repo_git_path(&path, r, "hooks/%s", name);
     +	else if (!hook_path_early(name, &path))
     +		return NULL;
     +
  • 39: 44d9df5 = 39: 348aec0 t0400: verify that the hook is called correctly from a subdirectory

  • 40: 0c9bf08 ! 40: af565b2 Pass PID of git process to hooks.

    @@ Commit message
         Signed-off-by: Alejandro Pauly <alpauly@microsoft.com>
     
      ## git.c ##
    -@@ git.c: static int run_pre_command_hook(const char **argv)
    +@@ git.c: static int run_pre_command_hook(struct repository *r, const char **argv)
      
      	/* call the hook proc */
      	strvec_pushv(&sargv, argv);
     +	strvec_pushf(&sargv, "--git-pid=%"PRIuMAX, (uintmax_t)getpid());
      	strvec_pushv(&opt.args, sargv.v);
    - 	ret = run_hooks_opt("pre-command", &opt);
    + 	ret = run_hooks_opt(r, "pre-command", &opt);
      
     
      ## t/t0400-pre-command-hook.sh ##
  • 41: 517213c ! 41: 0e0264e pre-command: always respect core.hooksPath

    @@ Commit message
         Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
     
      ## hook.c ##
    -@@ hook.c: const char *find_hook(const char *name)
    +@@ hook.c: const char *find_hook(struct repository *r, const char *name)
      	int found_hook;
      
      	strbuf_reset(&path);
    @@ hook.c: const char *find_hook(const char *name)
     +			forced_config = 1;
     +		}
     +
    - 		strbuf_git_path(&path, "hooks/%s", name);
    + 		strbuf_repo_git_path(&path, r, "hooks/%s", name);
     -	else if (!hook_path_early(name, &path))
     +	} else if (!hook_path_early(name, &path))
      		return NULL;
  • 42: 3248536 = 42: 2b67c6a sparse-checkout: update files with a modify/delete conflict

  • 43: a7cbc6b = 43: 3191f6e sparse-checkout: avoid writing entries with the skip-worktree bit

  • 44: e227ef3 = 44: 481bff4 Do not remove files outside the sparse-checkout

  • 45: f074ddd = 45: 31d7cc7 send-pack: do not check for sha1 file when GVFS_MISSING_OK set

  • 46: 32c6f9d = 46: d24c5ab cache-tree: remove use of strbuf_addf in update_one

  • 47: a5121b2 ! 47: f681e1a gvfs: block unsupported commands when running in a GVFS repo

    @@ Commit message
     
      ## builtin/gc.c ##
     @@
    + #include "date.h"
      #include "environment.h"
      #include "hex.h"
    - #include "repository.h"
     +#include "gvfs.h"
      #include "config.h"
      #include "tempfile.h"
      #include "lockfile.h"
    -@@ builtin/gc.c: int cmd_gc(int argc, const char **argv, const char *prefix)
    +@@ builtin/gc.c: struct repository *repo UNUSED)
      	if (quiet)
      		strvec_push(&repack, "-q");
      
    -+	if ((!opts.auto_flag || (opts.auto_flag && gc_auto_threshold > 0)) && gvfs_config_is_set(GVFS_BLOCK_COMMANDS))
    ++	if ((!opts.auto_flag || (opts.auto_flag && cfg.gc_auto_threshold > 0)) && gvfs_config_is_set(GVFS_BLOCK_COMMANDS))
     +		die(_("'git gc' is not supported on a GVFS repo"));
     +
      	if (opts.auto_flag) {
    - 		/*
    - 		 * Auto-gc should be least intrusive as possible.
    + 		if (cfg.detach_auto && opts.detach < 0)
    + 			opts.detach = 1;
     
      ## builtin/update-index.c ##
     @@
       */
    - 
    + #define USE_THE_REPOSITORY_VARIABLE
      #include "builtin.h"
     +#include "gvfs.h"
      #include "bulk-checkin.h"
      #include "config.h"
      #include "environment.h"
    -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
    +@@ builtin/update-index.c: int cmd_update_index(int argc,
      	argc = parse_options_end(&ctx);
      
      	getline_fn = nul_term_line ? strbuf_getline_nul : strbuf_getline_lf;
    @@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const
      		if (preferred_index_format < 0) {
      			printf(_("%d\n"), the_repository->index->version);
      		} else if (preferred_index_format < INDEX_FORMAT_LB ||
    -@@ builtin/update-index.c: int cmd_update_index(int argc, const char **argv, const char *prefix)
    +@@ builtin/update-index.c: int cmd_update_index(int argc,
      	end_odb_transaction();
      
      	if (split_index > 0) {
     +		if (gvfs_config_is_set(GVFS_BLOCK_COMMANDS))
     +			die(_("split index is not supported on a GVFS repo"));
     +
    - 		if (git_config_get_split_index() == 0)
    + 		if (repo_config_get_split_index(the_repository) == 0)
      			warning(_("core.splitIndex is set to false; "
      				  "remove or change it, if you really want to "
     
    @@ git.c
      
      struct cmd_struct {
      	const char *cmd;
    -@@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
    +@@ git.c: static int run_builtin(struct cmd_struct *p, int argc, const char **argv, struct
      	if (!help && p->option & NEED_WORK_TREE)
      		setup_work_tree();
      
     +	if (!help && p->option & BLOCK_ON_GVFS_REPO && gvfs_config_is_set(GVFS_BLOCK_COMMANDS))
     +		die("'git %s' is not supported on a GVFS repo", p->cmd);
     +
    - 	if (run_pre_command_hook(argv))
    + 	if (run_pre_command_hook(the_repository, argv))
      		die("pre-command hook aborted command");
      
     @@ git.c: static struct cmd_struct commands[] = {
  • 48: 83b6856 ! 48: 46eca79 worktree: allow in Scalar repositories

    @@ builtin/worktree.c
      #include "checkout.h"
      #include "config.h"
      #include "copy.h"
    -@@ builtin/worktree.c: int cmd_worktree(int ac, const char **av, const char *prefix)
    +@@ builtin/worktree.c: int cmd_worktree(int ac,
      
      	git_config(git_worktree_config, NULL);
      

To Be Continued...

pks-t and others added 30 commits October 8, 2024 15:38
It was reported on the mailing list that running `git maintenance start`
immediately segfaults starting with b6c3f8e (builtin/maintenance: fix
leak in `get_schedule_cmd()`, 2024-09-26). And indeed, this segfault is
trivial to reproduce, to the point where one is left scratching one's head
as to why we didn't catch this regression in our test suite.

The root cause of this error is `get_schedule_cmd()`, which does not
populate the `out` parameter in all cases anymore starting with the
mentioned commit. Callers do assume it to always be populated though and
will e.g. call `strvec_split()` on the returned value, which will of
course segfault when the variable is uninitialized.

So why didn't we catch this trivial regression? The reason is that our
tests always set up the "GIT_TEST_MAINT_SCHEDULER" environment variable
via "t/test-lib.sh", which allows us to override the scheduler command
with a custom one so that we don't accidentally modify the developer's
system. But the faulty code where we don't set the `out` parameter will
only get hit in case that environment variable is _not_ set, which is
never the case when executing our tests.

Fix the regression by again unconditionally allocating the value in the
`out` parameter, if provided. Add a test that unsets the environment
variable to catch future regressions in this area.

Reported-by: Shubham Kanodia <shubham.kanodia10@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
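The fix restores a simple out-parameter contract: whenever a caller passes `out`, every code path must fill it. A minimal sketch of that contract follows. It assumes a hypothetical `schedule_cmd_sketch()` in place of `get_schedule_cmd()`, and an `override` argument that stands in for the `GIT_TEST_MAINT_SCHEDULER` lookup. It is not Git's actual code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of `s` (strdup() is POSIX, not ISO C). */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Hypothetical stand-in for get_schedule_cmd(). `override` models a value
 * injected by a test harness (NULL when nothing is injected). Returns 1 when
 * the override was used, 0 otherwise.
 */
static int schedule_cmd_sketch(const char *cmd, const char *override, char **out)
{
	const char *chosen;

	assert(cmd != NULL);
	chosen = override ? override : cmd;

	/*
	 * The point of the fix: populate *out on every path, including the
	 * no-override path that the test suite never exercised.
	 * The caller owns the result and must free() it.
	 */
	if (out)
		*out = dup_string(chosen);
	return override != NULL;
}
```

Because the test suite always took the override path, only a test that unsets the variable exercises the other branch. That is why the commit adds such a test.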
These seem to be tests specific to microsoft/git, as they break without
these changes, but these changes are not needed upstream.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
In ac8acb4 (sparse-index: complete partial expansion, 2022-05-23),
'expand_index()' was updated to expand the index to a given pathspec.
However, the 'path_matches_pattern_list()' method used to facilitate this
has the side effect of initializing or updating the index hash variables
('name_hash', 'dir_hash', and 'name_hash_initialized'). This operation is
performed on 'istate', though, not 'full'; as a result, the initialized
hashes are later overwritten when copied from 'full'. To ensure the correct
hashes are in 'istate' after the index expansion, change the arg used in
'path_matches_pattern_list()' from 'istate' to 'full'.

Note that this does not fully solve the problem. If 'istate' does not have
an initialized 'name_hash' when its contents are copied to 'full',
initialized hashes will be copied back into 'istate' but
'name_hash_initialized' will be 0. Therefore, we also need to copy
'full->name_hash_initialized' back to 'istate' after the index expansion is
complete.

Signed-off-by: Victoria Dye <vdye@github.com>
Add test case to demonstrate that `git index-pack -o <idx-path> pack-path`
fails if <idx-path> does not end in ".idx" when `--rev-index` is
enabled.

In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25)
we learned to create `.rev` reverse indexes in addition to `.idx` index
files.  The `.rev` file pathname is constructed by replacing the suffix
on the `.idx` file.  The code assumes a hard-coded "idx" suffix.

In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12)
reverse indexes were enabled by default.

If the `-o <idx-path>` argument is used, the index file may have a
different suffix.  This causes an error when it tries to create the
reverse index pathname.

The test here demonstrates the failure.  (The test forces `--rev-index`
to avoid interaction with `GIT_TEST_NO_WRITE_REV_INDEX` during CI runs.)

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Add a test verifying that sparse-checkout (with and without sparse index
enabled) treats untracked files & directories correctly when changing sparse
patterns. Specifically, it ensures that 'git sparse-checkout set'

* deletes empty directories outside the sparse cone
* does _not_ delete untracked files outside the sparse cone

Signed-off-by: Victoria Dye <vdye@github.com>
Teach index-pack to silently omit the reverse index if the
index file does not have the standard ".idx" suffix.

In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25)
we learned to create `.rev` reverse indexes in addition to `.idx` index
files.  The `.rev` file pathname is constructed by replacing the suffix
on the `.idx` file.  The code assumes a hard-coded "idx" suffix.

In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12)
reverse indexes were enabled by default.

If the `-o <idx-path>` argument is used, the index file may have a
different suffix.  This causes an error when it tries to create the
reverse index pathname.

Since we do not know why the user requested a non-standard suffix for
the index, we cannot guess what the proper corresponding suffix should
be for the reverse index.  So we disable it.

The t5300 test has been updated to verify that we no longer error
out and that the .rev file is not created.

TODO We could warn the user that we skipped it (perhaps only if they
TODO explicitly requested `--rev-index` on the command line).
TODO
TODO Ideally, we should add an `--rev-index-path=<path>` argument
TODO or change `--rev-index` to take a pathname.
TODO
TODO I'll leave these questions for a future series.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
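The suffix rule can be illustrated with a small helper that derives the `.rev` name only when the index path ends in `.idx`, and otherwise signals the caller to skip the reverse index. This is a sketch. `rev_index_path_sketch()` is a hypothetical name, and Git's own implementation may derive the name differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not Git's API): derive the ".rev" path next to
 * `idx_path`. Returns a malloc()ed string, or NULL when `idx_path` does not
 * end in ".idx" (the caller then skips the reverse index) or on allocation
 * failure.
 */
static char *rev_index_path_sketch(const char *idx_path)
{
	static const char idx_suffix[] = ".idx";
	static const char rev_suffix[] = ".rev";
	size_t len, stem;
	char *rev;

	assert(idx_path != NULL);
	len = strlen(idx_path);
	if (len < strlen(idx_suffix) ||
	    strcmp(idx_path + len - strlen(idx_suffix), idx_suffix))
		return NULL; /* non-standard suffix: we cannot guess a .rev name */

	stem = len - strlen(idx_suffix);
	rev = malloc(stem + sizeof(rev_suffix));
	if (!rev)
		return NULL;
	memcpy(rev, idx_path, stem);
	memcpy(rev + stem, rev_suffix, sizeof(rev_suffix)); /* includes the NUL */
	return rev;
}
```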
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before
we try to open any Trace2 destination pathnames.

Normally, Trace2 always silently fails if a destination target
cannot be opened so that it doesn't affect the execution of a
Git command.  The command should run normally, but just not
generate any trace data.  This can make it difficult to debug
a telemetry setup, since the user doesn't know why telemetry
isn't being generated.  If the environment variable
GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print
a warning message with the `errno` to make debugging easier.

However, on Windows, looking up the env variable resets `errno`
so the warning message always ends with `...tracing: No error`
which is not very helpful.

Prefetch the env variable at startup.  This avoids the need
to update each call-site to capture `errno` in the usual
`saved-errno` variable.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
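A minimal sketch of the prefetch idea follows. The names `prefetch_dst_debug()` and `format_dst_warning()`, the simplified boolean parsing, and the warning text are all hypothetical. Git's real Trace2 code is organized differently. The key point is that `errno` is read before anything else can reset it, so no call site needs its own `saved_errno`.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Cached GIT_TRACE2_DST_DEBUG value: -1 = not fetched yet, else 0 or 1. */
static int dst_debug_cached = -1;

/* Simplified boolean parsing (hypothetical); Git accepts more spellings. */
static int parse_debug_value(const char *v)
{
	return v && (!strcmp(v, "1") || !strcmp(v, "true") || !strcmp(v, "yes"));
}

/* Run once at startup, before any Trace2 destination is opened. */
static void prefetch_dst_debug(void)
{
	dst_debug_cached = parse_debug_value(getenv("GIT_TRACE2_DST_DEBUG"));
}

/*
 * Call immediately after a failed open(). errno is read first; since the
 * env var was already fetched, no getenv() call can reset it in between.
 * Returns 1 if a warning was written to `buf`.
 */
static int format_dst_warning(char *buf, size_t len, const char *path)
{
	int err = errno;

	assert(dst_debug_cached >= 0); /* prefetch_dst_debug() must run first */
	if (!dst_debug_cached)
		return 0;
	snprintf(buf, len, "warning: trace2: could not open '%s' for tracing: %s",
		 path, strerror(err));
	return 1;
}
```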
Calculate the number of symrefs, loose vs packed, and the
maximal/accumulated length of local vs remote branches.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
survey: add --top=<N> option and config

The 'git survey' builtin provides several detail tables, such as "top
files by on-disk size". The size of these tables defaults to 10,
currently.

Allow the user to specify this number via a new --top=<N> option or the
new survey.top config key.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this commit, we gather statistics about the sizes of commits,
trees, and blobs in the repository, and then present them in the form
of "hexbins", i.e. log(16) histograms that show how many objects fall
into the 0..15 bytes range, the 16..255 range, the 256..4095 range, etc.

For commits, we also show the total count grouped by the number of
parents, and for trees we additionally show the total count grouped by
number of entries in the form of "qbins", i.e. log(4) histograms.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
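Reading the ranges above, a "hexbin" index is floor(log16(size)) and a "qbin" index is floor(log4(count)). Both can be computed by repeated shifting. The function names below are hypothetical, not those used in `builtin/survey.c`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bucket index for a log-base-2^shift histogram: 0 for values below
 * 2^shift, 1 for the next range, and so on. shift = 4 gives "hexbins"
 * (0..15, 16..255, 256..4095, ...); shift = 2 gives "qbins".
 */
static unsigned log_bin(uint64_t value, unsigned shift)
{
	unsigned bin = 0;

	assert(shift > 0 && shift < 64);
	while (value >> shift) { /* value >= 2^shift */
		value >>= shift;
		bin++;
	}
	return bin;
}

static unsigned hexbin(uint64_t size) { return log_bin(size, 4); }
static unsigned qbin(uint64_t nr_entries) { return log_bin(nr_entries, 2); }
```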
Create `struct large_item` and `struct large_item_vec` to capture the
n largest commits, trees, and blobs under various scaling dimensions,
such as size in bytes, number of commit parents, or number of entries
in a tree.

Each of these has a command-line option to set it independently.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
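One way to keep only the n largest items per dimension is a small array kept in descending order that evicts its smallest entry when a larger one arrives. The sketch below uses hypothetical `_sketch` types with a fixed capacity. The real `struct large_item_vec` also records names and OIDs and is likely organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_ITEM_CAP 64 /* hypothetical hard cap for this sketch */

/* Hypothetical, simplified versions of the structs named in the commit. */
struct large_item_sketch {
	uint64_t size; /* the scaling dimension: bytes, parents, entries, ... */
	unsigned id;   /* stand-in for the object ID */
};

struct large_item_vec_sketch {
	size_t nr, max; /* max corresponds to the per-dimension option */
	struct large_item_sketch items[LARGE_ITEM_CAP]; /* largest first */
};

static void large_item_vec_init(struct large_item_vec_sketch *vec, size_t max)
{
	assert(max <= LARGE_ITEM_CAP);
	vec->nr = 0;
	vec->max = max;
}

/* Keep only the `max` largest items seen so far, in descending order. */
static void maybe_insert_large_item(struct large_item_vec_sketch *vec,
				    unsigned id, uint64_t size)
{
	size_t pos;

	if (!vec->max)
		return;
	if (vec->nr == vec->max) {
		if (size <= vec->items[vec->nr - 1].size)
			return; /* not larger than the current minimum */
		vec->nr--;      /* evict the smallest to make room */
	}
	/* Shift smaller items down, then place the new one in its slot. */
	for (pos = vec->nr; pos > 0 && vec->items[pos - 1].size < size; pos--)
		vec->items[pos] = vec->items[pos - 1];
	vec->items[pos].size = size;
	vec->items[pos].id = id;
	vec->nr++;
}
```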
Include the pathname of each blob or tree in the large_item_vec
to help identify the file or directory associated with the OID
and size information.

This pathname is computed during the path walk, so it reflects the
first observed pathname seen for that OID during the traversal over
all of the refs.  Since the file or directory could have moved
(without being modified), there may be multiple "correct" pathnames
for a particular OID.  Since we do not control the ref traversal
order, we should consider it to be a "suggested pathname" for the OID.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Computing `git name-rev` on each commit, tree, and blob in each
of the various large_item_vec can be very expensive if there are
too many refs, especially if the user doesn't need the result.
Let's make it optional.

The `--no-name-rev` option can save 50 calls to `git name-rev`,
since we have 5 large_item_vecs and each defaults to 10 items.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
This topic branch brings in a new, experimental built-in command to
assess the dimensions of a local repository.

It is experimental and subject to change! It might grow new options,
change its output, or even be moved into `git diagnose --analyze` or
something like that.

The hope is that this command, which was inspired by `git sizer`
(https://github.com/github/git-sizer), will be helpful not only in
diagnosing issues with large repositories, but also in modeling what
shapes and sizes of repositories can be handled by Git (and as a
corollary: where Git needs to improve to be able to accommodate the
natural growth of repositories).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This backports the `ds/advice-sparse-index-expansion` patches into
`microsoft/git` which _just_ missed the v2.46.0 window.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Cherry-pick rev-index fixes from v2.41.0.vfs.0.5 into v2.42.0.*
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we
try to open any Trace2 destination pathnames.

Normally, Trace2 always silently fails if a destination target cannot be
opened so that it doesn't affect the execution of a Git command. The
command should run normally, but just not generate any trace data. This
can make it difficult to debug a telemetry setup, since the user doesn't
know why telemetry isn't being generated. If the environment variable
GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning
message with the `errno` to make debugging easier.

However, on Windows, looking up the env variable resets `errno` so the
warning message always ends with `...tracing: No error` which is not
very helpful.

Prefetch the env variable at startup. This avoids the need to update
each call-site to capture `errno` in the usual `saved-errno` variable.
…sitories (#667)

This command is inspired by [`git
sizer`](https://github.com/github/git-sizer), having the advantage of
being much closer to the internals of Git.

The intention is to provide a built-in command that can be used to
analyze large repositories for performance and scaling problems, for
growth over time, and to correlate with other measurements (in
particular with Trace2 data collected e.g. via
https://github.com/git-ecosystem/trace2receiver/).
When using the `reset --stdin` feature on Windows, an added path may
have a \r at the end that was not being removed, so it did not match
the path in the index and was not reset.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
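The idea can be illustrated by stripping a single trailing CR before the path is looked up. `strip_trailing_cr()` is a hypothetical helper. Git's own `strbuf_getline()` already treats a CR before the LF as part of the line ending, so the actual fix may simply rely on that. The sketch below only shows the effect.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: drop one trailing '\r' left over from a CRLF line
 * ending, so "dir/file\r" matches the index entry "dir/file".
 */
static void strip_trailing_cr(char *line)
{
	size_t len;

	assert(line != NULL);
	len = strlen(line);
	if (len && line[len - 1] == '\r')
		line[len - 1] = '\0';
}
```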
It has been a long-standing practice in Git for Windows to append
`.windows.<n>`, and in microsoft/git to append `.vfs.0.0`. Let's keep
doing that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since we really want to be based on a `.vfs.*` tag, let's make sure that
there was a new-enough one, i.e. one that agrees with the first three
version numbers of the recorded default version.

This prevents e.g. v2.22.0.vfs.0.<some-huge-number>.<commit> from being
used when the current release train was not yet tagged.

It is important to get the first three numbers of the version right
because e.g. Scalar makes decisions depending on those (such as assuming
that the `git maintenance` built-in is not available, even though it
actually _is_ available).

Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
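The "first three numbers must agree" rule can be illustrated in C as follows. This only demonstrates the comparison. The function is hypothetical, and the real check presumably lives in the build's version-generation logic rather than in a C helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical check: do two versions such as "v2.47.0.vfs.0.0" and
 * "2.47.0.vfs.0.1" agree on their first three numbers? A leading 'v' is
 * ignored. Returns 1 if they agree, 0 if not or if either fails to parse.
 */
static int same_release_train(const char *tag_version, const char *default_version)
{
	unsigned a[3], b[3];

	assert(tag_version && default_version);
	if (*tag_version == 'v')
		tag_version++;
	if (*default_version == 'v')
		default_version++;
	if (sscanf(tag_version, "%u.%u.%u", &a[0], &a[1], &a[2]) != 3 ||
	    sscanf(default_version, "%u.%u.%u", &b[0], &b[1], &b[2]) != 3)
		return 0;
	return a[0] == b[0] && a[1] == b[1] && a[2] == b[2];
}
```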
This header file will accumulate GVFS-specific definitions.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
In particular when multiple processes want to write to the config
simultaneously, it would come in handy to not fail immediately when
another process locked the config, but to gently try again.

This will help with Scalar's functional test suite which wants to
register multiple repositories for maintenance semi-simultaneously.

As not all code paths calling this function read the config (e.g. `git
config`), we have to read the config setting via
`git_config_get_ulong()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
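Git's lockfile layer already offers a timeout variant of `hold_lock_file_for_update()`. The sketch below only illustrates the general retry-within-a-budget idea with a simple doubling backoff. `lock_with_timeout()` and the `fake_*` test doubles are hypothetical, and the injected `sleep_ms` callback keeps the sketch portable and testable.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical retry loop (not Git's lockfile API): call try_lock() until it
 * succeeds or `timeout_ms` has been spent waiting, doubling the pause each
 * time. try_lock() returns 0 on success, nonzero while the lock is held.
 * timeout_ms == 0 reproduces the old "fail immediately" behavior.
 */
static int lock_with_timeout(int (*try_lock)(void *), void *ctx,
			     unsigned long timeout_ms,
			     void (*sleep_ms)(unsigned long))
{
	unsigned long waited = 0, pause = 1;

	assert(try_lock && sleep_ms);
	for (;;) {
		if (!try_lock(ctx))
			return 0;
		if (waited >= timeout_ms)
			return -1;
		if (pause > timeout_ms - waited)
			pause = timeout_ms - waited; /* never overshoot the budget */
		sleep_ms(pause);
		waited += pause;
		pause *= 2;
	}
}

/* Demonstration doubles: a lock that frees up after `free_after` attempts. */
static int attempts, free_after;
static unsigned long slept;
static int fake_try_lock(void *ctx) { (void)ctx; return ++attempts <= free_after; }
static void fake_sleep(unsigned long ms) { slept += ms; }
```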
By default, Git fails immediately when locking a config file for writing
fails due to an existing lock. With this change, Scalar-registered
repositories will fall back to trying a couple of times within a 150ms
timeout.

Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
This is needed because on macOS, `uintmax_t` is `unsigned long`, whereas
`uint64_t` is `unsigned long long`... Tsk, tsk.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
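The portable idiom for printing a `uint64_t` is to cast it to `uintmax_t` and use `PRIuMAX`. The `"--git-pid=%"PRIuMAX` hunk earlier in this range-diff already uses this idiom. A minimal sketch, where `format_u64()` is a hypothetical name:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a uint64_t portably. Passing a uint64_t where
 * "%"PRIuMAX expects uintmax_t is only correct when both are the same type,
 * which is not the case on macOS (unsigned long long vs. unsigned long).
 */
static int format_u64(char *buf, size_t len, uint64_t value)
{
	assert(buf != NULL);
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)value);
}
```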
D'oh. Must end the varargs with a `NULL`...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
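Functions such as `strvec_pushl()` walk their variadic arguments until they see a NULL pointer, so a missing terminator makes `va_arg()` read past the real arguments, which is undefined behavior. Git declares such functions with its `LAST_ARG_MUST_BE_NULL` macro so compilers can warn. A minimal illustration with a hypothetical `count_strings()`:

```c
#include <stdarg.h>
#include <stddef.h>

/*
 * Hypothetical NULL-terminated varargs function: counts its string
 * arguments up to (not including) the terminating NULL.
 */
static size_t count_strings(const char *first, ...)
{
	va_list ap;
	const char *s;
	size_t n = 0;

	va_start(ap, first);
	for (s = first; s; s = va_arg(ap, const char *))
		n++;
	va_end(ap);
	return n;
}
```

Casting the terminator as `(char *)NULL` is the strictly portable spelling, since a bare `NULL` may be defined as an integer `0`.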
Collaborator

@derrickstolee derrickstolee left a comment

What a nasty rebase, but I agree that the three reasons are easy to identify.

I created my own tentative/vfs-2.47.0 in my fork with two fixups that will fix some bugs in the current version:

builtin/survey.c Outdated
Comment on lines 1561 to 1562


Collaborator

hyper-nit: double newline

@derrickstolee
Collaborator

Also be sure to cherry-pick 6138534 from git-for-windows#5198 or else all scalar clone commands will fail when git maintenance start segfaults.

@derrickstolee
Collaborator

derrickstolee commented Oct 9, 2024

Also be sure to cherry-pick 6138534 from git-for-windows#5198 or else all scalar clone commands will fail when git maintenance start segfaults.

Of course, that commit doesn't pass the test on Windows. One option would be to cut the test from the patch and let upstream work out that test. Update: I figured out a way to make the test not run on macOS and Windows. See the PR for details.

derrickstolee and others added 15 commits October 8, 2024 23:00
Signed-off-by: Derrick Stolee <stolee@gmail.com>
We must ensure that the path to the index file is available when
validating the index.

This fixes t7522.{2,3,4,5,7,8,9,10,11,12,13}.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This commit adds a double empty line. Since the extra line is removed
later on, in another commit, let's add it back here so that we can fix
up the commit properly.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Only add a single empty line, not two of 'em.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This backports git-for-windows#5198 to
`microsoft/git` so that v2.47.0 will have it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
It is sometimes difficult to support users who are hitting issues with
sparse index expansion because it is unclear why the index needs to expand
from logs alone. It is too invasive to set up a debugging scenario on the
user's machine, so let's improve the logging.

Create a new ensure_full_index_with_reason() method that takes a formatting
string and parameters. If the index is not fully expanded, then apply the
formatting logic to create the logged string and log it before calling
ensure_full_index(). This should assist with discovering why an index is
expanded from trace2 logs alone.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
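The shape of `ensure_full_index_with_reason()` described in this commit message can be sketched as a printf-style wrapper. This is an illustration under assumptions: `struct toy_index_state` and both `toy_` functions are hypothetical stand-ins, and the sketch logs to `stderr`, whereas the real method logs through trace2 (compare the `expansion-reason` data line in the `GIT_TRACE2_PERF` sample in this PR's description).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for Git's index_state; only the fields we need. */
struct toy_index_state {
	int sparse_index;       /* nonzero while the index is still sparse */
	char last_reason[256];  /* last logged reason, kept for inspection */
};

/* Hypothetical stand-in for ensure_full_index(): just flips the flag. */
static void toy_ensure_full_index(struct toy_index_state *istate)
{
	istate->sparse_index = 0;
}

/*
 * printf-style wrapper: the reason is formatted only when an expansion
 * will actually happen, so callers pay nothing on an already-full index.
 */
static void toy_ensure_full_index_with_reason(struct toy_index_state *istate,
					      const char *fmt, ...)
{
	va_list ap;

	if (!istate->sparse_index)
		return;                 /* already full: nothing to log */

	va_start(ap, fmt);
	vsnprintf(istate->last_reason, sizeof(istate->last_reason), fmt, ap);
	va_end(ap);

	/* stand-in for a trace2 data event */
	fprintf(stderr, "expansion-reason:%s\n", istate->last_reason);
	toy_ensure_full_index(istate);
}
```

Checking `sparse_index` before formatting keeps the common path, where the index is already expanded, as cheap as a plain `ensure_full_index()` call.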
These locations that previously called ensure_full_index() now call
the ..._with_reason() variant, using fixed strings that should be
enough to identify the reason for the expansion.

This will help users use tracing to determine why the index is expanding
in their scenarios.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
These cases that call ensure_full_index() are likely to be due to a data
shape issue on a user's machine, so take the extra time to format a
message that can be placed in their trace2 output and hopefully identify
the problem that is leading to this slow behavior.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
For safety, areas of code that iterate over the cache entries in the
index were guarded with ensure_full_index() and labeled with a comment.
Replace these with a macro that calls ensure_full_index_with_reason()
using the line number of the caller to help identify the situation that
is causing the index expansion.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
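Using the caller's line number works because `__LINE__` expands at the macro's call site, not where the macro is defined. Here is a minimal sketch with hypothetical names. Recording `__FILE__` alongside the line is an assumption added here; the commit message mentions only the line number.

```c
#include <stdio.h>

/* Hypothetical record of the most recent expansion request. */
static const char *toy_last_file;
static int toy_last_line;

/* Hypothetical stand-in for ensure_full_index_with_reason(). */
static void toy_expand_with_location(const char *file, int line)
{
	toy_last_file = file;
	toy_last_line = line;
	fprintf(stderr, "expansion-reason:%s:%d\n", file, line);
}

/* __FILE__ and __LINE__ are substituted where the macro is used. */
#define TOY_ENSURE_FULL_INDEX_HERE() \
	toy_expand_with_location(__FILE__, __LINE__)
```

Each guarded loop then reports its own location without anyone having to write and maintain a reason string by hand.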
The recent changes update the callers of ensure_full_index() to call
variants that log extra information. This should help developers who
are assisting users hitting the sparse index expansion message.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The clear_skip_worktree_from_present_files_sparse() method attempts to
clear the skip-worktree bit from cache entries in the index depending
on whether they exist in the workdir. When it comes across a sparse
directory that actually exists in the workdir, this method fails and
signals that the index needs expansion.

The index expansion already logs a reason, but this reason is separate
from the path that caused this failure.

Add logging to demonstrate this situation for full clarity.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
I intend to send this upstream after the 2.47.0 release cycle, but it
should reach our microsoft/git users now for maximum impact.

Customers have been struggling with explaining why the sparse index
expansion advice message is showing up. The advice to run 'git clean'
has not always helped folks, and sometimes it is very unclear why we are
running into trouble.

These changes introduce a way to log a reason for the expansion into the
trace2 logs so it can be found by requesting that a user enable tracing.

While testing this, I reproduced the most common case, which is an
existing directory matching a sparse directory in the index. This case
showed that two log messages were required; see the last commit for the
new one. Together, these two places show this kind of message in the
`GIT_TRACE2_PERF` output (trimmed for clarity):

```
region_enter | index        | label:clear_skip_worktree_from_present_files_sparse
data         | sparse-index | ..skip-worktree sparsedir:<my-sparse-path>/
data         | index        | ..sparse_path_count:362
data         | index        | ..sparse_lstat_count:732
region_leave | index        | label:clear_skip_worktree_from_present_files_sparse
data         | sparse-index | expansion-reason:failed to clear skip-worktree while sparse
```

I added some tests to demonstrate that these logs are recorded, but it
also seems difficult to hit some of these cases.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
An internal customer reported a segfault when running `git
sparse-checkout set` with the `index.sparse` config enabled. I was
unable to reproduce it locally, but with their help we debugged into the
failing process and discovered the following stack trace:

```
#0  0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125
#1  0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247
#2  0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122
#3  0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638
#4  0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255
#5  0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570)    at sparse-index.c:307
#6  0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46
#7  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#8  0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0,    fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80
#9  0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422
#10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456
#11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0,    search_mode=EXPAND_SPARSE) at read-cache.c:556
#12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566
#13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756
#14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860
#15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063
#16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548
#17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808
#18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877
#19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017
#20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 
```

The innermost frame being the `rehash()` method from `hashmap.c`, as
called within the `name-hash` API, made me look at where these hashmaps
were being used in the sparse index logic. They were being copied
across indexes, which seems dangerous. Indeed, clearing these hashmaps
and setting them as not initialized fixes the segfault.
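The hazard is that copying an index struct by value copies the name hash's pointer to its bucket array, so two indexes end up sharing one table, and a `rehash()` on either one can leave the other pointing at freed memory. Below is a minimal sketch of the fix described here. The `toy_` types are hypothetical stand-ins for Git's `struct index_state` and its name hash; this is not the actual patch.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a name-hash hashmap. */
struct toy_hashmap {
	void **table;        /* heap-allocated bucket array in real code */
	size_t tablesize;
};

/* Hypothetical stand-in for the relevant parts of an index. */
struct toy_index {
	struct toy_hashmap name_hash;
	int name_hash_initialized;
	int cache_nr;        /* stands in for the rest of the index data */
};

/*
 * Copy an index by value, then drop the name-hash state in the copy.
 * Struct assignment duplicates the `table` pointer, so without the reset
 * both indexes would share, and could rehash or free, one bucket array.
 */
static void toy_copy_index(struct toy_index *dst, const struct toy_index *src)
{
	*dst = *src;
	dst->name_hash.table = NULL;
	dst->name_hash.tablesize = 0;
	dst->name_hash_initialized = 0;  /* a real index would rebuild lazily */
}
```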

The second commit is a response to a test failure that happens in
`t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to
fail because the underlying `git checkout-index` process fails due to
colliding files. Passing the `-f` flag appears to work, but it's unclear
why this name-hash change causes that change in behavior.
Collaborator

@derrickstolee derrickstolee left a comment


I've tested this on my Linux machine. I look forward to testing the draft release bits on Windows (and a little on macOS).

@dscho dscho merged commit 26ff51d into vfs-2.47.0 Oct 9, 2024
119 checks passed
@dscho dscho deleted the tentative/vfs-2.47.0 branch October 9, 2024 13:37