diff --git a/.gitignore b/.gitignore index ea4054e74a9179..b12472ee23fc0a 100644 --- a/.gitignore +++ b/.gitignore @@ -34,6 +34,7 @@ /git-clone /git-column /git-commit +/git-commit-graph /git-commit-tree /git-config /git-count-objects diff --git a/Documentation/config.txt b/Documentation/config.txt index 883d702a939e96..67c2f6ca032e55 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -994,6 +994,9 @@ core.gvfs:: core.midx:: Enable "multi-pack-index" feature. Set to true to read and write MIDX files. +core.commitGraph:: + Enable git commit graph feature. Allows reading from the commit-graph file. + core.sparseCheckout:: Enable "sparse checkout" feature. See section "Sparse checkout" in linkgit:git-read-tree[1] for more information. diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt new file mode 100644 index 00000000000000..c21fcbff9011a2 --- /dev/null +++ b/Documentation/git-commit-graph.txt @@ -0,0 +1,95 @@ +git-commit-graph(1) +=================== + +NAME +---- +git-commit-graph - Write and verify Git commit graph files + + +SYNOPSIS +-------- +[verse] +'git commit-graph read' [--object-dir ] +'git commit-graph write' [--object-dir ] [--append] + [--stdin-commits|--stdin-packs] + + +DESCRIPTION +----------- + +Manage the serialized commit graph file. + + +OPTIONS +------- +--object-dir:: + Use given directory for the location of packfiles and commit graph + file. This parameter exists to specify the location of an alternate + that only has the objects directory, not a full .git directory. The + commit graph file is expected to be at /info/commit-graph and + the packfiles are expected to be in /pack. + + +COMMANDS +-------- +'write':: + +Write a commit graph file based on the commits found in packfiles. ++ +With the `--stdin-packs` option, generate the new commit graph by +walking objects only in the specified pack-indexes. (Cannot be combined +with --stdin-commits.) ++ +With the `--stdin-commits` option, generate the new commit graph by +walking commits starting at the commits specified in stdin as a list +of OIDs in hex, one OID per line. (Cannot be combined with +--stdin-packs.) ++ +With the `--append` option, include all commits that are present in the +existing commit-graph file. + +'read':: + +Read a graph file given by the commit-graph file and output basic +details about the graph file. Used for debugging purposes. + + +EXAMPLES +-------- + +* Write a commit graph file for the packed commits in your local .git folder. ++ +------------------------------------------------ +$ git commit-graph write +------------------------------------------------ + +* Write a graph file, extending the current graph file using commits +* in . ++ +------------------------------------------------ +$ echo | git commit-graph write --stdin-packs +------------------------------------------------ + +* Write a graph file containing all reachable commits. ++ +------------------------------------------------ +$ git show-ref -s | git commit-graph write --stdin-commits +------------------------------------------------ + +* Write a graph file containing all commits in the current +* commit-graph file along with those reachable from HEAD. ++ +------------------------------------------------ +$ git rev-parse HEAD | git commit-graph write --stdin-commits --append +------------------------------------------------ + +* Read basic information from the commit-graph file. ++ +------------------------------------------------ +$ git commit-graph read +------------------------------------------------ + + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt new file mode 100644 index 00000000000000..ad6af8105cf68e --- /dev/null +++ b/Documentation/technical/commit-graph-format.txt @@ -0,0 +1,97 @@ +Git commit graph format +======================= + +The Git commit graph stores a list of commit OIDs and some associated +metadata, including: + +- The generation number of the commit. Commits with no parents have + generation number 1; commits with parents have generation number + one more than the maximum generation number of its parents. We + reserve zero as special, and can be used to mark a generation + number invalid or as "not computed". + +- The root tree OID. + +- The commit date. + +- The parents of the commit, stored using positional references within + the graph file. + +These positional references are stored as unsigned 32-bit integers +corresponding to the array position withing the list of commit OIDs. We +use the most-significant bit for special purposes, so we can store at most +(1 << 31) - 1 (around 2 billion) commits. + +== Commit graph files have the following format: + +In order to allow extensions that add extra data to the graph, we organize +the body into "chunks" and provide a binary lookup table at the beginning +of the body. The header includes certain values, such as number of chunks +and hash type. + +All 4-byte numbers are in network order. + +HEADER: + + 4-byte signature: + The signature is: {'C', 'G', 'P', 'H'} + + 1-byte version number: + Currently, the only valid version is 1. + + 1-byte Hash Version (1 = SHA-1) + We infer the hash length (H) from this value. + + 1-byte number (C) of "chunks" + + 1-byte (reserved for later use) + Current clients should ignore this value. + +CHUNK LOOKUP: + + (C + 1) * 12 bytes listing the table of contents for the chunks: + First 4 bytes describe the chunk id. Value 0 is a terminating label. + Other 8 bytes provide the byte-offset in current file for chunk to + start. (Chunks are ordered contiguously in the file, so you can infer + the length using the next chunk position if necessary.) Each chunk + ID appears at most once. + + The remaining data in the body is described one chunk at a time, and + these chunks may be given in any order. Chunks are required unless + otherwise specified. + +CHUNK DATA: + + OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes) + The ith entry, F[i], stores the number of OIDs with first + byte at most i. Thus F[255] stores the total + number of commits (N). + + OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes) + The OIDs for all commits in the graph, sorted in ascending order. + + Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes) + * The first H bytes are for the OID of the root tree. + * The next 8 bytes are for the positions of the first two parents + of the ith commit. Stores value 0xffffffff if no parent in that + position. If there are more than two parents, the second value + has its most-significant bit on and the other bits store an array + position into the Large Edge List chunk. + * The next 8 bytes store the generation number of the commit and + the commit time in seconds since EPOCH. The generation number + uses the higher 30 bits of the first 4 bytes, while the commit + time uses the 32 bits of the second 4 bytes, along with the lowest + 2 bits of the lowest byte, storing the 33rd and 34th bit of the + commit time. + + Large Edge List (ID: {'E', 'D', 'G', 'E'}) [Optional] + This list of 4-byte values store the second through nth parents for + all octopus merges. The second parent value in the commit data stores + an array position within this list along with the most-significant bit + on. Starting at that array position, iterate through this list of commit + positions for the parents until reaching a value with the most-significant + bit on. The other bits correspond to the position of the last parent. + +TRAILER: + + H-byte HASH-checksum of all of the above. diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt new file mode 100644 index 00000000000000..0550c6d0dc3173 --- /dev/null +++ b/Documentation/technical/commit-graph.txt @@ -0,0 +1,163 @@ +Git Commit Graph Design Notes +============================= + +Git walks the commit graph for many reasons, including: + +1. Listing and filtering commit history. +2. Computing merge bases. + +These operations can become slow as the commit count grows. The merge +base calculation shows up in many user-facing commands, such as 'merge-base' +or 'status' and can take minutes to compute depending on history shape. + +There are two main costs here: + +1. Decompressing and parsing commits. +2. Walking the entire graph to satisfy topological order constraints. + +The commit graph file is a supplemental data structure that accelerates +commit graph walks. If a user downgrades or disables the 'core.commitGraph' +config setting, then the existing ODB is sufficient. The file is stored +as "commit-graph" either in the .git/objects/info directory or in the info +directory of an alternate. + +The commit graph file stores the commit graph structure along with some +extra metadata to speed up graph walks. By listing commit OIDs in lexi- +cographic order, we can identify an integer position for each commit and +refer to the parents of a commit using those integer positions. We use +binary search to find initial commits and then use the integer positions +for fast lookups during the walk. + +A consumer may load the following info for a commit from the graph: + +1. The commit OID. +2. The list of parents, along with their integer position. +3. The commit date. +4. The root tree OID. +5. The generation number (see definition below). + +Values 1-4 satisfy the requirements of parse_commit_gently(). + +Define the "generation number" of a commit recursively as follows: + + * A commit with no parents (a root commit) has generation number one. + + * A commit with at least one parent has generation number one more than + the largest generation number among its parents. + +Equivalently, the generation number of a commit A is one more than the +length of a longest path from A to a root commit. The recursive definition +is easier to use for computation and observing the following property: + + If A and B are commits with generation numbers N and M, respectively, + and N <= M, then A cannot reach B. That is, we know without searching + that B is not an ancestor of A because it is further from a root commit + than A. + + Conversely, when checking if A is an ancestor of B, then we only need + to walk commits until all commits on the walk boundary have generation + number at most N. If we walk commits using a priority queue seeded by + generation numbers, then we always expand the boundary commit with highest + generation number and can easily detect the stopping condition. + +This property can be used to significantly reduce the time it takes to +walk commits and determine topological relationships. Without generation +numbers, the general heuristic is the following: + + If A and B are commits with commit time X and Y, respectively, and + X < Y, then A _probably_ cannot reach B. + +This heuristic is currently used whenever the computation is allowed to +violate topological relationships due to clock skew (such as "git log" +with default order), but is not used when the topological order is +required (such as merge base calculations, "git log --graph"). + +In practice, we expect some commits to be created recently and not stored +in the commit graph. We can treat these commits as having "infinite" +generation number and walk until reaching commits with known generation +number. + +Design Details +-------------- + +- The commit graph file is stored in a file named 'commit-graph' in the + .git/objects/info directory. This could be stored in the info directory + of an alternate. + +- The core.commitGraph config setting must be on to consume graph files. + +- The file format includes parameters for the object ID hash function, + so a future change of hash algorithm does not require a change in format. + +Future Work +----------- + +- The commit graph feature currently does not honor commit grafts. This can + be remedied by duplicating or refactoring the current graft logic. + +- The 'commit-graph' subcommand does not have a "verify" mode that is + necessary for integration with fsck. + +- The file format includes room for precomputed generation numbers. These + are not currently computed, so all generation numbers will be marked as + 0 (or "uncomputed"). A later patch will include this calculation. + +- After computing and storing generation numbers, we must make graph + walks aware of generation numbers to gain the performance benefits they + enable. This will mostly be accomplished by swapping a commit-date-ordered + priority queue with one ordered by generation number. The following + operations are important candidates: + + - paint_down_to_common() + - 'log --topo-order' + +- Currently, parse_commit_gently() requires filling in the root tree + object for a commit. This passes through lookup_tree() and consequently + lookup_object(). Also, it calls lookup_commit() when loading the parents. + These method calls check the ODB for object existence, even if the + consumer does not need the content. For example, we do not need the + tree contents when computing merge bases. Now that commit parsing is + removed from the computation time, these lookup operations are the + slowest operations keeping graph walks from being fast. Consider + loading these objects without verifying their existence in the ODB and + only loading them fully when consumers need them. Consider a method + such as "ensure_tree_loaded(commit)" that fully loads a tree before + using commit->tree. + +- The current design uses the 'commit-graph' subcommand to generate the graph. + When this feature stabilizes enough to recommend to most users, we should + add automatic graph writes to common operations that create many commits. + For example, one could compute a graph on 'clone', 'fetch', or 'repack' + commands. + +- A server could provide a commit graph file as part of the network protocol + to avoid extra calculations by clients. This feature is only of benefit if + the user is willing to trust the file, because verifying the file is correct + is as hard as computing it from scratch. + +Related Links +------------- +[0] https://bugs.chromium.org/p/git/issues/detail?id=8 + Chromium work item for: Serialized Commit Graph + +[1] https://public-inbox.org/git/20110713070517.GC18566@sigill.intra.peff.net/ + An abandoned patch that introduced generation numbers. + +[2] https://public-inbox.org/git/20170908033403.q7e6dj7benasrjes@sigill.intra.peff.net/ + Discussion about generation numbers on commits and how they interact + with fsck. + +[3] https://public-inbox.org/git/20170908034739.4op3w4f2ma5s65ku@sigill.intra.peff.net/ + More discussion about generation numbers and not storing them inside + commit objects. A valuable quote: + + "I think we should be moving more in the direction of keeping + repo-local caches for optimizations. Reachability bitmaps have been + a big performance win. I think we should be doing the same with our + properties of commits. Not just generation numbers, but making it + cheap to access the graph structure without zlib-inflating whole + commit objects (i.e., packv4 or something like the "metapacks" I + proposed a few years ago)." + +[4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u + A patch to remove the ahead-behind calculation from 'status'. diff --git a/Makefile b/Makefile index f4e93eb30d7966..7f76d2286365ac 100644 --- a/Makefile +++ b/Makefile @@ -775,6 +775,7 @@ LIB_OBJS += color.o LIB_OBJS += column.o LIB_OBJS += combine-diff.o LIB_OBJS += commit.o +LIB_OBJS += commit-graph.o LIB_OBJS += compat/obstack.o LIB_OBJS += compat/terminal.o LIB_OBJS += config.o @@ -955,6 +956,7 @@ BUILTIN_OBJS += builtin/clone.o BUILTIN_OBJS += builtin/column.o BUILTIN_OBJS += builtin/commit-tree.o BUILTIN_OBJS += builtin/commit.o +BUILTIN_OBJS += builtin/commit-graph.o BUILTIN_OBJS += builtin/config.o BUILTIN_OBJS += builtin/count-objects.o BUILTIN_OBJS += builtin/credential.o diff --git a/alloc.c b/alloc.c index 12afadfacdd609..cf4f8b61e126c0 100644 --- a/alloc.c +++ b/alloc.c @@ -93,6 +93,7 @@ void *alloc_commit_node(void) struct commit *c = alloc_node(&commit_state, sizeof(struct commit)); c->object.type = OBJ_COMMIT; c->index = alloc_commit_index(); + c->graph_pos = COMMIT_NOT_FROM_GRAPH; return c; } diff --git a/builtin.h b/builtin.h index 880383e341472d..a1f587124e304e 100644 --- a/builtin.h +++ b/builtin.h @@ -149,6 +149,7 @@ extern int cmd_clone(int argc, const char **argv, const char *prefix); extern int cmd_clean(int argc, const char **argv, const char *prefix); extern int cmd_column(int argc, const char **argv, const char *prefix); extern int cmd_commit(int argc, const char **argv, const char *prefix); +extern int cmd_commit_graph(int argc, const char **argv, const char *prefix); extern int cmd_commit_tree(int argc, const char **argv, const char *prefix); extern int cmd_config(int argc, const char **argv, const char *prefix); extern int cmd_count_objects(int argc, const char **argv, const char *prefix); diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c new file mode 100644 index 00000000000000..41c4f76caf8cf0 --- /dev/null +++ b/builtin/commit-graph.c @@ -0,0 +1,171 @@ +#include "builtin.h" +#include "config.h" +#include "dir.h" +#include "lockfile.h" +#include "parse-options.h" +#include "commit-graph.h" + +static char const * const builtin_commit_graph_usage[] = { + N_("git commit-graph [--object-dir ]"), + N_("git commit-graph read [--object-dir ]"), + N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + NULL +}; + +static const char * const builtin_commit_graph_read_usage[] = { + N_("git commit-graph read [--object-dir ]"), + NULL +}; + +static const char * const builtin_commit_graph_write_usage[] = { + N_("git commit-graph write [--object-dir ] [--append] [--stdin-packs|--stdin-commits]"), + NULL +}; + +static struct opts_commit_graph { + const char *obj_dir; + int stdin_packs; + int stdin_commits; + int append; +} opts; + +static int graph_read(int argc, const char **argv) +{ + struct commit_graph *graph = 0; + char *graph_name; + + static struct option builtin_commit_graph_read_options[] = { + OPT_STRING(0, "object-dir", &opts.obj_dir, + N_("dir"), + N_("The object directory to store the graph")), + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, + builtin_commit_graph_read_options, + builtin_commit_graph_read_usage, 0); + + if (!opts.obj_dir) + opts.obj_dir = get_object_directory(); + + graph_name = get_commit_graph_filename(opts.obj_dir); + graph = load_commit_graph_one(graph_name); + + if (!graph) + die("graph file %s does not exist", graph_name); + FREE_AND_NULL(graph_name); + + printf("header: %08x %d %d %d %d\n", + ntohl(*(uint32_t*)graph->data), + *(unsigned char*)(graph->data + 4), + *(unsigned char*)(graph->data + 5), + *(unsigned char*)(graph->data + 6), + *(unsigned char*)(graph->data + 7)); + printf("num_commits: %u\n", graph->num_commits); + printf("chunks:"); + + if (graph->chunk_oid_fanout) + printf(" oid_fanout"); + if (graph->chunk_oid_lookup) + printf(" oid_lookup"); + if (graph->chunk_commit_data) + printf(" commit_metadata"); + if (graph->chunk_large_edges) + printf(" large_edges"); + printf("\n"); + + return 0; +} + +static int graph_write(int argc, const char **argv) +{ + const char **pack_indexes = NULL; + int packs_nr = 0; + const char **commit_hex = NULL; + int commits_nr = 0; + const char **lines = NULL; + int lines_nr = 0; + int lines_alloc = 0; + + static struct option builtin_commit_graph_write_options[] = { + OPT_STRING(0, "object-dir", &opts.obj_dir, + N_("dir"), + N_("The object directory to store the graph")), + OPT_BOOL(0, "stdin-packs", &opts.stdin_packs, + N_("scan pack-indexes listed by stdin for commits")), + OPT_BOOL(0, "stdin-commits", &opts.stdin_commits, + N_("start walk at commits listed by stdin")), + OPT_BOOL(0, "append", &opts.append, + N_("include all commits already in the commit-graph file")), + OPT_END(), + }; + + argc = parse_options(argc, argv, NULL, + builtin_commit_graph_write_options, + builtin_commit_graph_write_usage, 0); + + if (opts.stdin_packs && opts.stdin_commits) + die(_("cannot use both --stdin-commits and --stdin-packs")); + if (!opts.obj_dir) + opts.obj_dir = get_object_directory(); + + if (opts.stdin_packs || opts.stdin_commits) { + struct strbuf buf = STRBUF_INIT; + lines_nr = 0; + lines_alloc = 128; + ALLOC_ARRAY(lines, lines_alloc); + + while (strbuf_getline(&buf, stdin) != EOF) { + ALLOC_GROW(lines, lines_nr + 1, lines_alloc); + lines[lines_nr++] = strbuf_detach(&buf, NULL); + } + + if (opts.stdin_packs) { + pack_indexes = lines; + packs_nr = lines_nr; + } + if (opts.stdin_commits) { + commit_hex = lines; + commits_nr = lines_nr; + } + } + + write_commit_graph(opts.obj_dir, + pack_indexes, + packs_nr, + commit_hex, + commits_nr, + opts.append); + + return 0; +} + +int cmd_commit_graph(int argc, const char **argv, const char *prefix) +{ + static struct option builtin_commit_graph_options[] = { + OPT_STRING(0, "object-dir", &opts.obj_dir, + N_("dir"), + N_("The object directory to store the graph")), + OPT_END(), + }; + + if (argc == 2 && !strcmp(argv[1], "-h")) + usage_with_options(builtin_commit_graph_usage, + builtin_commit_graph_options); + + git_config(git_default_config, NULL); + argc = parse_options(argc, argv, prefix, + builtin_commit_graph_options, + builtin_commit_graph_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + + if (argc > 0) { + if (!strcmp(argv[0], "read")) + return graph_read(argc, argv); + if (!strcmp(argv[0], "write")) + return graph_write(argc, argv); + } + + usage_with_options(builtin_commit_graph_usage, + builtin_commit_graph_options); +} diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 4c51aec81f374d..901650d3d603c6 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1270,7 +1270,7 @@ static void conclude_pack(int fix_thin_pack, const char *curr_pack, unsigned cha nr_objects - nr_objects_initial); stop_progress_msg(&progress, msg.buf); strbuf_release(&msg); - sha1close(f, tail_sha1, 0); + finalize_hashfile(f, tail_sha1, 0); hashcpy(read_sha1, pack_sha1); fixup_pack_header_footer(output_fd, pack_sha1, curr_pack, nr_objects, diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 6b9cfc289d87b5..e39f9f17b657b8 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -834,11 +834,11 @@ static void write_pack_file(void) * If so, rewrite it like in fast-import */ if (pack_to_stdout) { - sha1close(f, oid.hash, CSUM_CLOSE); + finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE); } else if (nr_written == nr_remaining) { - sha1close(f, oid.hash, CSUM_FSYNC); + finalize_hashfile(f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE); } else { - int fd = sha1close(f, oid.hash, 0); + int fd = finalize_hashfile(f, oid.hash, 0); fixup_pack_header_footer(fd, oid.hash, pack_tmp_name, nr_written, oid.hash, offset); close(fd); diff --git a/bulk-checkin.c b/bulk-checkin.c index 3310fd210a1515..7b9962b6b7f7e5 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -35,9 +35,9 @@ static void finish_bulk_checkin(struct bulk_checkin_state *state) unlink(state->pack_tmp_name); goto clear_exit; } else if (state->nr_written == 1) { - sha1close(state->f, oid.hash, CSUM_FSYNC); + finalize_hashfile(state->f, oid.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE); } else { - int fd = sha1close(state->f, oid.hash, 0); + int fd = finalize_hashfile(state->f, oid.hash, 0); fixup_pack_header_footer(fd, oid.hash, state->pack_tmp_name, state->nr_written, oid.hash, state->offset); diff --git a/cache.h b/cache.h index 04de11c7b3ad47..3cae7607e63120 100644 --- a/cache.h +++ b/cache.h @@ -848,6 +848,7 @@ extern char *git_replace_ref_base; extern int fsync_object_files; extern int core_preload_index; +extern int core_commit_graph; extern int core_apply_sparse_checkout; extern int core_gvfs; extern int core_midx; diff --git a/command-list.txt b/command-list.txt index a7b9412182de46..4e41de46c62014 100644 --- a/command-list.txt +++ b/command-list.txt @@ -34,6 +34,7 @@ git-clean mainporcelain git-clone mainporcelain init git-column purehelpers git-commit mainporcelain history +git-commit-graph plumbingmanipulators git-commit-tree plumbingmanipulators git-config ancillarymanipulators git-count-objects ancillaryinterrogators diff --git a/commit-graph.c b/commit-graph.c new file mode 100644 index 00000000000000..d5241bf66af0a2 --- /dev/null +++ b/commit-graph.c @@ -0,0 +1,735 @@ +#include "cache.h" +#include "config.h" +#include "git-compat-util.h" +#include "lockfile.h" +#include "pack.h" +#include "packfile.h" +#include "commit.h" +#include "object.h" +#include "revision.h" +#include "sha1-lookup.h" +#include "commit-graph.h" + +#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */ +#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */ +#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */ +#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */ +#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */ + +#define GRAPH_DATA_WIDTH 36 + +#define GRAPH_VERSION_1 0x1 +#define GRAPH_VERSION GRAPH_VERSION_1 + +#define GRAPH_OID_VERSION_SHA1 1 +#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ +#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1 +#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1 + +#define GRAPH_OCTOPUS_EDGES_NEEDED 0x80000000 +#define GRAPH_PARENT_MISSING 0x7fffffff +#define GRAPH_EDGE_LAST_MASK 0x7fffffff +#define GRAPH_PARENT_NONE 0x70000000 + +#define GRAPH_LAST_EDGE 0x80000000 + +#define GRAPH_FANOUT_SIZE (4 * 256) +#define GRAPH_CHUNKLOOKUP_WIDTH 12 +#define GRAPH_MIN_SIZE (5 * GRAPH_CHUNKLOOKUP_WIDTH + GRAPH_FANOUT_SIZE + \ + GRAPH_OID_LEN + 8) + +char *get_commit_graph_filename(const char *obj_dir) +{ + return xstrfmt("%s/info/commit-graph", obj_dir); +} + +static struct commit_graph *alloc_commit_graph(void) +{ + struct commit_graph *g = xcalloc(1, sizeof(*g)); + g->graph_fd = -1; + + return g; +} + +struct commit_graph *load_commit_graph_one(const char *graph_file) +{ + void *graph_map; + const unsigned char *data, *chunk_lookup; + size_t graph_size; + struct stat st; + uint32_t i; + struct commit_graph *graph; + int fd = git_open(graph_file); + uint64_t last_chunk_offset; + uint32_t last_chunk_id; + uint32_t graph_signature; + unsigned char graph_version, hash_version; + + if (fd < 0) + return NULL; + if (fstat(fd, &st)) { + close(fd); + return NULL; + } + graph_size = xsize_t(st.st_size); + + if (graph_size < GRAPH_MIN_SIZE) { + close(fd); + die("graph file %s is too small", graph_file); + } + graph_map = xmmap(NULL, graph_size, PROT_READ, MAP_PRIVATE, fd, 0); + data = (const unsigned char *)graph_map; + + graph_signature = get_be32(data); + if (graph_signature != GRAPH_SIGNATURE) { + error("graph signature %X does not match signature %X", + graph_signature, GRAPH_SIGNATURE); + goto cleanup_fail; + } + + graph_version = *(unsigned char*)(data + 4); + if (graph_version != GRAPH_VERSION) { + error("graph version %X does not match version %X", + graph_version, GRAPH_VERSION); + goto cleanup_fail; + } + + hash_version = *(unsigned char*)(data + 5); + if (hash_version != GRAPH_OID_VERSION) { + error("hash version %X does not match version %X", + hash_version, GRAPH_OID_VERSION); + goto cleanup_fail; + } + + graph = alloc_commit_graph(); + + graph->hash_len = GRAPH_OID_LEN; + graph->num_chunks = *(unsigned char*)(data + 6); + graph->graph_fd = fd; + graph->data = graph_map; + graph->data_len = graph_size; + + last_chunk_id = 0; + last_chunk_offset = 8; + chunk_lookup = data + 8; + for (i = 0; i < graph->num_chunks; i++) { + uint32_t chunk_id = get_be32(chunk_lookup + 0); + uint64_t chunk_offset = get_be64(chunk_lookup + 4); + int chunk_repeated = 0; + + chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; + + if (chunk_offset > graph_size - GIT_MAX_RAWSZ) { + error("improper chunk offset %" PRIx64, chunk_offset); + goto cleanup_fail; + } + + switch (chunk_id) { + case GRAPH_CHUNKID_OIDFANOUT: + if (graph->chunk_oid_fanout) + chunk_repeated = 1; + else + graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset); + break; + + case GRAPH_CHUNKID_OIDLOOKUP: + if (graph->chunk_oid_lookup) + chunk_repeated = 1; + else + graph->chunk_oid_lookup = data + chunk_offset; + break; + + case GRAPH_CHUNKID_DATA: + if (graph->chunk_commit_data) + chunk_repeated = 1; + else + graph->chunk_commit_data = data + chunk_offset; + break; + + case GRAPH_CHUNKID_LARGEEDGES: + if (graph->chunk_large_edges) + chunk_repeated = 1; + else + graph->chunk_large_edges = data + chunk_offset; + break; + } + + if (chunk_repeated) { + error("chunk id %08x appears multiple times", chunk_id); + goto cleanup_fail; + } + + if (last_chunk_id == GRAPH_CHUNKID_OIDLOOKUP) + { + graph->num_commits = (chunk_offset - last_chunk_offset) + / graph->hash_len; + } + + last_chunk_id = chunk_id; + last_chunk_offset = chunk_offset; + } + + return graph; + +cleanup_fail: + munmap(graph_map, graph_size); + close(fd); + exit(1); +} + +/* global storage */ +struct commit_graph *commit_graph = NULL; + +static void prepare_commit_graph_one(const char *obj_dir) +{ + char *graph_name; + + if (commit_graph) + return; + + graph_name = get_commit_graph_filename(obj_dir); + commit_graph = load_commit_graph_one(graph_name); + + FREE_AND_NULL(graph_name); +} + +static int prepare_commit_graph_run_once = 0; +static void prepare_commit_graph(void) +{ + struct alternate_object_database *alt; + char *obj_dir; + + if (prepare_commit_graph_run_once) + return; + prepare_commit_graph_run_once = 1; + + obj_dir = get_object_directory(); + prepare_commit_graph_one(obj_dir); + prepare_alt_odb(); + for (alt = alt_odb_list; !commit_graph && alt; alt = alt->next) + prepare_commit_graph_one(alt->path); +} + +static void close_commit_graph(void) +{ + if (!commit_graph) + return; + + if (commit_graph->graph_fd >= 0) { + munmap((void *)commit_graph->data, commit_graph->data_len); + commit_graph->data = NULL; + close(commit_graph->graph_fd); + } + + FREE_AND_NULL(commit_graph); +} + +static int bsearch_graph(struct commit_graph *g, struct object_id *oid, uint32_t *pos) +{ + return bsearch_hash(oid->hash, g->chunk_oid_fanout, + g->chunk_oid_lookup, g->hash_len, pos); +} + +static struct commit_list **insert_parent_or_die(struct commit_graph *g, + uint64_t pos, + struct commit_list **pptr) +{ + struct commit *c; + struct object_id oid; + hashcpy(oid.hash, g->chunk_oid_lookup + g->hash_len * pos); + c = lookup_commit(&oid); + if (!c) + BUG("could not find commit %s", oid_to_hex(&oid)); + c->graph_pos = pos; + return &commit_list_insert(c, pptr)->next; +} + +static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos) +{ + struct object_id oid; + uint32_t edge_value; + uint32_t *parent_data_ptr; + uint64_t date_low, date_high; + struct commit_list **pptr; + const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * pos; + + item->object.parsed = 1; + item->graph_pos = pos; + + hashcpy(oid.hash, commit_data); + item->tree = lookup_tree(&oid); + + date_high = get_be32(commit_data + g->hash_len + 8) & 0x3; + date_low =get_be32(commit_data + g->hash_len + 12); + item->date = (timestamp_t)((date_high << 32) | date_low); + + pptr = &item->parents; + + edge_value = get_be32(commit_data + g->hash_len); + if (edge_value == GRAPH_PARENT_NONE) + return 1; + pptr = insert_parent_or_die(g, edge_value, pptr); + + edge_value = get_be32(commit_data + g->hash_len + 4); + if (edge_value == GRAPH_PARENT_NONE) + return 1; + if (!(edge_value & GRAPH_OCTOPUS_EDGES_NEEDED)) { + pptr = insert_parent_or_die(g, edge_value, pptr); + return 1; + } + + parent_data_ptr = (uint32_t*)(g->chunk_large_edges + + 4 * (uint64_t)(edge_value & GRAPH_EDGE_LAST_MASK)); + do { + edge_value = get_be32(parent_data_ptr); + pptr = insert_parent_or_die(g, + edge_value & GRAPH_EDGE_LAST_MASK, + pptr); + parent_data_ptr++; + } while (!(edge_value & GRAPH_LAST_EDGE)); + + return 1; +} + +int parse_commit_in_graph(struct commit *item) +{ + if (!core_commit_graph) + return 0; + if (item->object.parsed) + return 1; + + prepare_commit_graph(); + if (commit_graph) { + uint32_t pos; + int found; + if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) { + pos = item->graph_pos; + found = 1; + } else { + found = bsearch_graph(commit_graph, &(item->object.oid), &pos); + } + + if (found) + return fill_commit_in_graph(item, commit_graph, pos); + } + + return 0; +} + +static void write_graph_chunk_fanout(struct sha1file *f, + struct commit **commits, + int nr_commits) +{ + int i, count = 0; + struct commit **list = commits; + + /* + * Write the first-level table (the list is sorted, + * but we use a 256-entry lookup to be able to avoid + * having to do eight extra binary search iterations). + */ + for (i = 0; i < 256; i++) { + while (count < nr_commits) { + if ((*list)->object.oid.hash[0] != i) + break; + count++; + list++; + } + + sha1write_be32(f, count); + } +} + +static void write_graph_chunk_oids(struct sha1file *f, int hash_len, + struct commit **commits, int nr_commits) +{ + struct commit **list = commits; + int count; + for (count = 0; count < nr_commits; count++, list++) + sha1write(f, (*list)->object.oid.hash, (int)hash_len); +} + +static const unsigned char *commit_to_sha1(size_t index, void *table) +{ + struct commit **commits = table; + return commits[index]->object.oid.hash; +} + +static void write_graph_chunk_data(struct sha1file *f, int hash_len, + struct commit **commits, int nr_commits) +{ + struct commit **list = commits; + struct commit **last = commits + nr_commits; + uint32_t num_extra_edges = 0; + + while (list < last) { + struct commit_list *parent; + int edge_value; + uint32_t packedDate[2]; + + parse_commit(*list); + sha1write(f, (*list)->tree->object.oid.hash, hash_len); + + parent = (*list)->parents; + + if (!parent) + edge_value = GRAPH_PARENT_NONE; + else { + edge_value = sha1_pos(parent->item->object.oid.hash, + commits, + nr_commits, + commit_to_sha1); + + if (edge_value < 0) + edge_value = GRAPH_PARENT_MISSING; + } + + sha1write_be32(f, edge_value); + + if (parent) + parent = parent->next; + + if (!parent) + edge_value = GRAPH_PARENT_NONE; + else if (parent->next) + edge_value = GRAPH_OCTOPUS_EDGES_NEEDED | num_extra_edges; + else { + edge_value = sha1_pos(parent->item->object.oid.hash, + commits, + nr_commits, + commit_to_sha1); + if (edge_value < 0) + edge_value = GRAPH_PARENT_MISSING; + } + + sha1write_be32(f, edge_value); + + if (edge_value & GRAPH_OCTOPUS_EDGES_NEEDED) { + do { + num_extra_edges++; + parent = parent->next; + } while (parent); + } + + if (sizeof((*list)->date) > 4) + packedDate[0] = htonl(((*list)->date >> 32) & 0x3); + else + packedDate[0] = 0; + + packedDate[1] = htonl((*list)->date); + sha1write(f, packedDate, 8); + + list++; + } +} + +static void write_graph_chunk_large_edges(struct sha1file *f, + struct commit **commits, + int nr_commits) +{ + struct commit **list = commits; + struct commit **last = commits + nr_commits; + struct commit_list *parent; + + while (list < last) { + int num_parents = 0; + for (parent = (*list)->parents; num_parents < 3 && parent; + parent = parent->next) + num_parents++; + + if (num_parents <= 2) { + list++; + continue; + } + + /* Since num_parents > 2, this initializer is safe. */ + for (parent = (*list)->parents->next; parent; parent = parent->next) { + int edge_value = sha1_pos(parent->item->object.oid.hash, + commits, + nr_commits, + commit_to_sha1); + + if (edge_value < 0) + edge_value = GRAPH_PARENT_MISSING; + else if (!parent->next) + edge_value |= GRAPH_LAST_EDGE; + + sha1write_be32(f, edge_value); + } + + list++; + } +} + +static int commit_compare(const void *_a, const void *_b) +{ + const struct object_id *a = (const struct object_id *)_a; + const struct object_id *b = (const struct object_id *)_b; + return oidcmp(a, b); +} + +struct packed_commit_list { + struct commit **list; + int nr; + int alloc; +}; + +struct packed_oid_list { + struct object_id *list; + int nr; + int alloc; +}; + +static int add_packed_commits(const struct object_id *oid, + struct packed_git *pack, + uint32_t pos, + void *data) +{ + struct packed_oid_list *list = (struct packed_oid_list*)data; + enum object_type type; + off_t offset = nth_packed_object_offset(pack, pos); + struct object_info oi = OBJECT_INFO_INIT; + + oi.typep = &type; + if (packed_object_info(pack, offset, &oi) < 0) + die("unable to get type of object %s", oid_to_hex(oid)); + + if (type != OBJ_COMMIT) + return 0; + + ALLOC_GROW(list->list, list->nr + 1, list->alloc); + oidcpy(&(list->list[list->nr]), oid); + list->nr++; + + return 0; +} + +static void add_missing_parents(struct packed_oid_list *oids, struct commit *commit) +{ + struct commit_list *parent; + for (parent = commit->parents; parent; parent = parent->next) { + if (!(parent->item->object.flags & UNINTERESTING)) { + ALLOC_GROW(oids->list, oids->nr + 1, oids->alloc); + oidcpy(&oids->list[oids->nr], &(parent->item->object.oid)); + oids->nr++; + parent->item->object.flags |= UNINTERESTING; + } + } +} + +static void close_reachable(struct packed_oid_list *oids) +{ + int i; + struct commit *commit; + + for (i = 0; i < oids->nr; i++) { + commit = lookup_commit(&oids->list[i]); + if (commit) + commit->object.flags |= UNINTERESTING; + } + + /* + * As this loop runs, oids->nr may grow, but not more + * than the number of missing commits in the reachable + * closure. + */ + for (i = 0; i < oids->nr; i++) { + commit = lookup_commit(&oids->list[i]); + if (commit && !parse_commit(commit)) + add_missing_parents(oids, commit); + } + + for (i = 0; i < oids->nr; i++) { + commit = lookup_commit(&oids->list[i]); + if (commit) + commit->object.flags &= ~UNINTERESTING; + } +} + +void write_commit_graph(const char *obj_dir, + const char **pack_indexes, + int nr_packs, + const char **commit_hex, + int nr_commits, + int append) +{ + struct packed_oid_list oids; + struct packed_commit_list commits; + struct sha1file *f; + uint32_t i, count_distinct = 0; + char *graph_name; + int fd; + struct lock_file lk = LOCK_INIT; + uint32_t chunk_ids[5]; + uint64_t chunk_offsets[5]; + int num_chunks; + int num_extra_edges; + struct commit_list *parent; + + oids.nr = 0; + oids.alloc = approximate_object_count() / 4; + + if (append) { + prepare_commit_graph_one(obj_dir); + if (commit_graph) + oids.alloc += commit_graph->num_commits; + } + + if (oids.alloc < 1024) + oids.alloc = 1024; + ALLOC_ARRAY(oids.list, oids.alloc); + + if (append && commit_graph) { + for (i = 0; i < commit_graph->num_commits; i++) { + const unsigned char *hash = commit_graph->chunk_oid_lookup + + commit_graph->hash_len * i; + hashcpy(oids.list[oids.nr++].hash, hash); + } + } + + if (pack_indexes) { + struct strbuf packname = STRBUF_INIT; + int dirlen; + strbuf_addf(&packname, "%s/pack/", obj_dir); + dirlen = packname.len; + for (i = 0; i < nr_packs; i++) { + struct packed_git *p; + strbuf_setlen(&packname, dirlen); + strbuf_addstr(&packname, pack_indexes[i]); + p = add_packed_git(packname.buf, packname.len, 1); + if (!p) + die("error adding pack %s", packname.buf); + if (open_pack_index(p)) + die("error opening index for %s", packname.buf); + for_each_object_in_pack(p, add_packed_commits, &oids); + close_pack(p); + } + strbuf_release(&packname); + } + + if (commit_hex) { + for (i = 0; i < nr_commits; i++) { + const char *end; + struct object_id oid; + struct commit *result; + + if (commit_hex[i] && parse_oid_hex(commit_hex[i], &oid, &end)) + continue; + + result = lookup_commit_reference_gently(&oid, 1); + + if (result) { + ALLOC_GROW(oids.list, oids.nr + 1, oids.alloc); + oidcpy(&oids.list[oids.nr], &(result->object.oid)); + oids.nr++; + } + } + } + + if (!pack_indexes && !commit_hex) + for_each_packed_object(add_packed_commits, &oids, 0); + + close_reachable(&oids); + + QSORT(oids.list, oids.nr, commit_compare); + + count_distinct = 1; + for (i = 1; i < oids.nr; i++) { + if (oidcmp(&oids.list[i-1], &oids.list[i])) + count_distinct++; + } + + if (count_distinct >= GRAPH_PARENT_MISSING) + die(_("the commit graph format cannot write %d commits"), count_distinct); + + commits.nr = 0; + commits.alloc = count_distinct; + ALLOC_ARRAY(commits.list, commits.alloc); + + num_extra_edges = 0; + for (i = 0; i < oids.nr; i++) { + int num_parents = 0; + if (i > 0 && !oidcmp(&oids.list[i-1], &oids.list[i])) + continue; + + commits.list[commits.nr] = lookup_commit(&oids.list[i]); + parse_commit(commits.list[commits.nr]); + + for (parent = commits.list[commits.nr]->parents; + parent; parent = parent->next) + num_parents++; + + if (num_parents > 2) + num_extra_edges += num_parents - 1; + + commits.nr++; + } + num_chunks = num_extra_edges ? 4 : 3; + + if (commits.nr >= GRAPH_PARENT_MISSING) + die(_("too many commits to write graph")); + + graph_name = get_commit_graph_filename(obj_dir); + fd = hold_lock_file_for_update(&lk, graph_name, 0); + + if (fd < 0) { + struct strbuf folder = STRBUF_INIT; + strbuf_addstr(&folder, graph_name); + strbuf_setlen(&folder, strrchr(folder.buf, '/') - folder.buf); + + if (mkdir(folder.buf, 0777) < 0) + die_errno(_("cannot mkdir %s"), folder.buf); + strbuf_release(&folder); + + fd = hold_lock_file_for_update(&lk, graph_name, LOCK_DIE_ON_ERROR); + + if (fd < 0) + die_errno("unable to create '%s'", graph_name); + } + + f = sha1fd(lk.tempfile->fd, lk.tempfile->filename.buf); + + sha1write_be32(f, GRAPH_SIGNATURE); + + sha1write_u8(f, GRAPH_VERSION); + sha1write_u8(f, GRAPH_OID_VERSION); + sha1write_u8(f, num_chunks); + sha1write_u8(f, 0); /* unused padding byte */ + + chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; + chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; + chunk_ids[2] = GRAPH_CHUNKID_DATA; + if (num_extra_edges) + chunk_ids[3] = GRAPH_CHUNKID_LARGEEDGES; + else + chunk_ids[3] = 0; + chunk_ids[4] = 0; + + chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; + chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; + chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr; + chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr; + chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges; + + for (i = 0; i <= num_chunks; i++) { + uint32_t chunk_write[3]; + + chunk_write[0] = htonl(chunk_ids[i]); + chunk_write[1] = htonl(chunk_offsets[i] >> 32); + chunk_write[2] = htonl(chunk_offsets[i] & 0xffffffff); + sha1write(f, chunk_write, 12); + } + + write_graph_chunk_fanout(f, commits.list, commits.nr); + write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr); + write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr); + write_graph_chunk_large_edges(f, commits.list, commits.nr); + + close_commit_graph(); + finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC); + commit_lock_file(&lk); + + free(oids.list); + oids.alloc = 0; + oids.nr = 0; +} diff --git a/commit-graph.h b/commit-graph.h new file mode 100644 index 00000000000000..6bb54767f8c097 --- /dev/null +++ b/commit-graph.h @@ -0,0 +1,47 @@ +#ifndef COMMIT_GRAPH_H +#define COMMIT_GRAPH_H + +#include "git-compat-util.h" + +char *get_commit_graph_filename(const char *obj_dir); + +/* + * Given a commit struct, try to fill the commit struct info, including: + * 1. tree object + * 2. date + * 3. parents. + * + * Returns 1 if and only if the commit was found in the packed graph. + * + * See parse_commit_buffer() for the fallback after this call. + */ +int parse_commit_in_graph(struct commit *item); + +struct commit_graph { + int graph_fd; + + const unsigned char *data; + size_t data_len; + + unsigned char hash_len; + unsigned char num_chunks; + uint32_t num_commits; + struct object_id oid; + + const uint32_t *chunk_oid_fanout; + const unsigned char *chunk_oid_lookup; + const unsigned char *chunk_commit_data; + const unsigned char *chunk_large_edges; +}; + +struct commit_graph *load_commit_graph_one(const char *graph_file); + +void write_commit_graph(const char *obj_dir, + const char **pack_indexes, + int nr_packs, + const char **commit_hex, + int nr_commits, + int append); + +#endif + diff --git a/commit.c b/commit.c index ff51c9f34a975f..78cccbd8ec84eb 100644 --- a/commit.c +++ b/commit.c @@ -1,6 +1,7 @@ #include "cache.h" #include "tag.h" #include "commit.h" +#include "commit-graph.h" #include "pkt-line.h" #include "utf8.h" #include "diff.h" @@ -385,6 +386,8 @@ int parse_commit_gently(struct commit *item, int quiet_on_missing) return -1; if (item->object.parsed) return 0; + if (parse_commit_in_graph(item)) + return 0; buffer = read_sha1_file(item->object.oid.hash, &type, &size); if (!buffer) return quiet_on_missing ? -1 : diff --git a/commit.h b/commit.h index 425f4027752fb4..563776793fd859 100644 --- a/commit.h +++ b/commit.h @@ -9,6 +9,8 @@ #include "string-list.h" #include "pretty.h" +#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF + struct commit_list { struct commit *item; struct commit_list *next; @@ -21,6 +23,7 @@ struct commit { timestamp_t date; struct commit_list *parents; struct tree *tree; + uint32_t graph_pos; }; extern int save_commit_buffer; diff --git a/config.c b/config.c index 9016ab1d64c906..8c7223afe8d1b4 100644 --- a/config.c +++ b/config.c @@ -1234,6 +1234,11 @@ static int git_default_core_config(const char *var, const char *value, void *cb) return 0; } + if (!strcmp(var, "core.commitgraph")) { + core_commit_graph = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "core.sparsecheckout")) { core_apply_sparse_checkout = git_config_bool(var, value); return 0; diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index 030b1f99b94c70..9c31cfdfe9eee1 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -841,6 +841,7 @@ __git_list_porcelain_commands () check-ref-format) : plumbing;; checkout-index) : plumbing;; column) : internal helper;; + commit-graph) : plumbing;; commit-tree) : plumbing;; count-objects) : infrequent;; credential) : credentials;; @@ -2419,6 +2420,7 @@ _git_config () core.bigFileThreshold core.checkStat core.commentChar + core.commitGraph core.compression core.createObject core.deltaBaseCacheLimit diff --git a/csum-file.c b/csum-file.c index 2adae04073816a..4940733b2eab9c 100644 --- a/csum-file.c +++ b/csum-file.c @@ -53,7 +53,7 @@ void sha1flush(struct sha1file *f) } } -int sha1close(struct sha1file *f, unsigned char *result, unsigned int flags) +int finalize_hashfile(struct sha1file *f, unsigned char *result, unsigned int flags) { int fd; @@ -61,11 +61,11 @@ int sha1close(struct sha1file *f, unsigned char *result, unsigned int flags) git_SHA1_Final(f->buffer, &f->ctx); if (result) hashcpy(result, f->buffer); - if (flags & (CSUM_CLOSE | CSUM_FSYNC)) { - /* write checksum and close fd */ + if (flags & CSUM_HASH_IN_STREAM) flush(f, f->buffer, 20); - if (flags & CSUM_FSYNC) - fsync_or_die(f->fd, f->name); + if (flags & CSUM_FSYNC) + fsync_or_die(f->fd, f->name); + if (flags & CSUM_CLOSE) { if (close(f->fd)) die_errno("%s: sha1 file error on close", f->name); fd = 0; diff --git a/csum-file.h b/csum-file.h index 7530927d774562..050a13cbfd5214 100644 --- a/csum-file.h +++ b/csum-file.h @@ -26,14 +26,15 @@ struct sha1file_checkpoint { extern void sha1file_checkpoint(struct sha1file *, struct sha1file_checkpoint *); extern int sha1file_truncate(struct sha1file *, struct sha1file_checkpoint *); -/* sha1close flags */ -#define CSUM_CLOSE 1 -#define CSUM_FSYNC 2 +/* finalize_hashfile flags */ +#define CSUM_CLOSE 1 +#define CSUM_FSYNC 2 +#define CSUM_HASH_IN_STREAM 4 extern struct sha1file *sha1fd(int fd, const char *name); extern struct sha1file *sha1fd_check(const char *name); extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct progress *tp); -extern int sha1close(struct sha1file *, unsigned char *, unsigned int); +extern int finalize_hashfile(struct sha1file *, unsigned char *, unsigned int); extern void sha1write(struct sha1file *, const void *, unsigned int); extern void sha1flush(struct sha1file *f); extern void crc32_begin(struct sha1file *); diff --git a/environment.c b/environment.c index 271f78b8f3e2c5..089568ad0b7a86 100644 --- a/environment.c +++ b/environment.c @@ -60,6 +60,7 @@ enum push_default_type push_default = PUSH_DEFAULT_UNSPECIFIED; enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE; char *notes_ref_name; int grafts_replace_parents = 1; +int core_commit_graph; int core_apply_sparse_checkout; int core_gvfs; int core_midx; diff --git a/fast-import.c b/fast-import.c index e7da955e161843..ded84ddee79e1a 100644 --- a/fast-import.c +++ b/fast-import.c @@ -979,7 +979,7 @@ static void end_packfile(void) struct tag *t; close_pack_windows(pack_data); - sha1close(pack_file, cur_pack_oid.hash, 0); + finalize_hashfile(pack_file, cur_pack_oid.hash, 0); fixup_pack_header_footer(pack_data->pack_fd, pack_data->sha1, pack_data->pack_name, object_count, cur_pack_oid.hash, pack_size); diff --git a/git.c b/git.c index 5a95ba39548cca..aa37e1fea09949 100644 --- a/git.c +++ b/git.c @@ -460,6 +460,7 @@ static struct cmd_struct commands[] = { { "clone", cmd_clone }, { "column", cmd_column, RUN_SETUP_GENTLY }, { "commit", cmd_commit, RUN_SETUP | NEED_WORK_TREE }, + { "commit-graph", cmd_commit_graph, RUN_SETUP }, { "commit-tree", cmd_commit_tree, RUN_SETUP }, { "config", cmd_config, RUN_SETUP_GENTLY }, { "count-objects", cmd_count_objects, RUN_SETUP }, diff --git a/midx.c b/midx.c index ed19656d74a352..10c8c033478f2a 100644 --- a/midx.c +++ b/midx.c @@ -828,7 +828,7 @@ const char *write_midx_file(const char *pack_dir, } } - sha1close(f, final_hash, CSUM_CLOSE | CSUM_FSYNC); + finalize_hashfile(f, final_hash, CSUM_HASH_IN_STREAM | CSUM_CLOSE | CSUM_FSYNC); if (rename_needed) { diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index a8df5ce2ab67bf..9bb778101d3ad7 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -535,7 +535,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index, if (options & BITMAP_OPT_HASH_CACHE) write_hash_cache(f, index, index_nr); - sha1close(f, NULL, CSUM_FSYNC); + finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_FSYNC | CSUM_CLOSE); if (adjust_shared_perm(tmp_file.buf)) die_errno("unable to make temporary bitmap file readable"); diff --git a/pack-write.c b/pack-write.c index fea62841920c96..2e724384f790a1 100644 --- a/pack-write.c +++ b/pack-write.c @@ -170,8 +170,9 @@ const char *write_idx_file(const char *index_name, struct pack_idx_entry **objec } sha1write(f, sha1, 20); - sha1close(f, NULL, ((opts->flags & WRITE_IDX_VERIFY) - ? CSUM_CLOSE : CSUM_FSYNC)); + finalize_hashfile(f, NULL, CSUM_HASH_IN_STREAM | CSUM_CLOSE | + ((opts->flags & WRITE_IDX_VERIFY) + ? 0 : CSUM_FSYNC)); return index_name; } diff --git a/packfile.c b/packfile.c index c9af55f93b9559..e5281a2c97878c 100644 --- a/packfile.c +++ b/packfile.c @@ -1817,11 +1817,8 @@ off_t find_pack_entry_one(const unsigned char *sha1, { const uint32_t *level1_ofs = p->index_data; const unsigned char *index = p->index_data; - unsigned hi, lo, stride; - static int debug_lookup = -1; - - if (debug_lookup < 0) - debug_lookup = !!getenv("GIT_DEBUG_LOOKUP"); + unsigned stride; + uint32_t result; if (!index) { if (open_pack_index(p)) @@ -1834,8 +1831,6 @@ off_t find_pack_entry_one(const unsigned char *sha1, index += 8; } index += 4 * 256; - hi = ntohl(level1_ofs[*sha1]); - lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); if (p->index_version > 1) { stride = 20; } else { @@ -1843,24 +1838,8 @@ off_t find_pack_entry_one(const unsigned char *sha1, index += 4; } - if (debug_lookup) - printf("%02x%02x%02x... lo %u hi %u nr %"PRIu32"\n", - sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects); - - while (lo < hi) { - unsigned mi = lo + (hi - lo) / 2; - int cmp = hashcmp(index + mi * stride, sha1); - - if (debug_lookup) - printf("lo %u hi %u rg %u mi %u\n", - lo, hi, hi - lo, mi); - if (!cmp) - return nth_packed_object_offset(p, mi); - if (cmp > 0) - hi = mi; - else - lo = mi+1; - } + if (bsearch_hash(sha1, level1_ofs, index, stride, &result)) + return nth_packed_object_offset(p, result); return 0; } @@ -1968,7 +1947,7 @@ int has_pack_index(const unsigned char *sha1) return 1; } -static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data) +int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data) { uint32_t i; int r = 0; diff --git a/packfile.h b/packfile.h index baa99404f01ac4..66d5570a648f66 100644 --- a/packfile.h +++ b/packfile.h @@ -70,7 +70,7 @@ extern void close_pack_index(struct packed_git *); extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, unsigned long *); extern void close_pack_windows(struct packed_git *); -extern void close_pack(struct packed_git *p); +extern void close_pack(struct packed_git *); extern void close_all_packs(void); extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); @@ -143,6 +143,7 @@ typedef int each_packed_object_fn(const struct object_id *oid, struct packed_git *pack, uint32_t pos, void *data); +extern int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn, void *data); extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags); #endif diff --git a/sha1-lookup.c b/sha1-lookup.c index 4cf3ebd9212f6d..8d0b1db3e27c99 100644 --- a/sha1-lookup.c +++ b/sha1-lookup.c @@ -99,3 +99,31 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr, } while (lo < hi); return -lo-1; } + +int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo, + const unsigned char *table, size_t stride, uint32_t *result) +{ + uint32_t hi, lo; + + hi = ntohl(fanout_nbo[*sha1]); + lo = ((*sha1 == 0x0) ? 0 : ntohl(fanout_nbo[*sha1 - 1])); + + while (lo < hi) { + unsigned mi = lo + (hi - lo) / 2; + int cmp = hashcmp(table + mi * stride, sha1); + + if (!cmp) { + if (result) + *result = mi; + return 1; + } + if (cmp > 0) + hi = mi; + else + lo = mi + 1; + } + + if (result) + *result = lo; + return 0; +} diff --git a/sha1-lookup.h b/sha1-lookup.h index cf5314f402ce78..7678b23b36c291 100644 --- a/sha1-lookup.h +++ b/sha1-lookup.h @@ -7,4 +7,26 @@ extern int sha1_pos(const unsigned char *sha1, void *table, size_t nr, sha1_access_fn fn); + +/* + * Searches for sha1 in table, using the given fanout table to determine the + * interval to search, then using binary search. Returns 1 if found, 0 if not. + * + * Takes the following parameters: + * + * - sha1: the hash to search for + * - fanout_nbo: a 256-element array of NETWORK-order 32-bit integers; the + * integer at position i represents the number of elements in table whose + * first byte is less than or equal to i + * - table: a sorted list of hashes with optional extra information in between + * - stride: distance between two consecutive elements in table (should be + * GIT_MAX_RAWSZ or greater) + * - result: if not NULL, this function stores the element index of the + * position found (if the search is successful) or the index of the least + * element that is greater than sha1 (if the search is not successful) + * + * This function does not verify the validity of the fanout table. + */ +int bsearch_hash(const unsigned char *sha1, const uint32_t *fanout_nbo, + const unsigned char *table, size_t stride, uint32_t *result); #endif diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh new file mode 100755 index 00000000000000..7a96fd528b5564 --- /dev/null +++ b/t/t5318-commit-graph.sh @@ -0,0 +1,223 @@ +#!/bin/sh + +test_description='commit graph' +. ./test-lib.sh + +test_expect_success 'setup full repo' ' + mkdir full && + cd "$TRASH_DIRECTORY/full" && + git init && + git config core.commitGraph true && + objdir=".git/objects" +' + +test_expect_success 'write graph with no packs' ' + cd "$TRASH_DIRECTORY/full" && + git commit-graph write --object-dir . && + test_path_is_file info/commit-graph +' + +test_expect_success 'create commits and repack' ' + cd "$TRASH_DIRECTORY/full" && + for i in 1 2 3 + do + test_commit commit$i + done && + git repack +' + +graph_git_two_modes() { + git -c core.graph=true $1 >output + git -c core.graph=false $1 >expect + test_cmp output expect +} + +graph_git_behavior() { + MSG=$1 + DIR=$2 + BRANCH=$3 + COMPARE=$4 + test_expect_success "check normal git operations: $MSG" ' + cd "$TRASH_DIRECTORY/$DIR" && + graph_git_two_modes "log --oneline $BRANCH" && + graph_git_two_modes "log --topo-order $BRANCH" && + graph_git_two_modes "log --graph $COMPARE..$BRANCH" && + graph_git_two_modes "branch -vv" && + graph_git_two_modes "merge-base -a $BRANCH $COMPARE" + ' +} + +graph_git_behavior 'no graph' full commit3 commit1 + +graph_read_expect() { + OPTIONAL="" + NUM_CHUNKS=3 + if test ! -z $2 + then + OPTIONAL=" $2" + NUM_CHUNKS=$((3 + $(echo "$2" | wc -w))) + fi + cat >expect <<- EOF + header: 43475048 1 1 $NUM_CHUNKS 0 + num_commits: $1 + chunks: oid_fanout oid_lookup commit_metadata$OPTIONAL + EOF + git commit-graph read >output && + test_cmp expect output +} + +test_expect_success 'write graph' ' + cd "$TRASH_DIRECTORY/full" && + graph1=$(git commit-graph write) && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "3" +' + +graph_git_behavior 'graph exists' full commit3 commit1 + +test_expect_success 'Add more commits' ' + cd "$TRASH_DIRECTORY/full" && + git reset --hard commit1 && + for i in 4 5 + do + test_commit $i && + git branch commit$i + done && + git reset --hard commit2 && + for i in 6 7 + do + test_commit $i && + git branch commit$i + done && + git reset --hard commit2 && + git merge commit4 && + git branch merge/1 && + git reset --hard commit4 && + git merge commit6 && + git branch merge/2 && + git reset --hard commit3 && + git merge commit5 commit7 && + git branch merge/3 && + git repack +' + +# Current graph structure: +# +# __M3___ +# / | \ +# 3 M1 5 M2 7 +# |/ \|/ \| +# 2 4 6 +# |___/____/ +# 1 + +test_expect_success 'write graph with merges' ' + cd "$TRASH_DIRECTORY/full" && + git commit-graph write && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "10" "large_edges" +' + +graph_git_behavior 'merge 1 vs 2' full merge/1 merge/2 +graph_git_behavior 'merge 1 vs 3' full merge/1 merge/3 +graph_git_behavior 'merge 2 vs 3' full merge/2 merge/3 + +test_expect_success 'Add one more commit' ' + cd "$TRASH_DIRECTORY/full" && + test_commit 8 && + git branch commit8 && + ls $objdir/pack | grep idx >existing-idx && + git repack && + ls $objdir/pack| grep idx | grep -v --file=existing-idx >new-idx +' + +# Current graph structure: +# +# 8 +# | +# __M3___ +# / | \ +# 3 M1 5 M2 7 +# |/ \|/ \| +# 2 4 6 +# |___/____/ +# 1 + +graph_git_behavior 'mixed mode, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'mixed mode, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'write graph with new commit' ' + cd "$TRASH_DIRECTORY/full" && + git commit-graph write && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "11" "large_edges" +' + +graph_git_behavior 'full graph, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'full graph, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'write graph with nothing new' ' + cd "$TRASH_DIRECTORY/full" && + git commit-graph write && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "11" "large_edges" +' + +graph_git_behavior 'cleared graph, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'cleared graph, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'build graph from latest pack with closure' ' + cd "$TRASH_DIRECTORY/full" && + cat new-idx | git commit-graph write --stdin-packs && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "9" "large_edges" +' + +graph_git_behavior 'graph from pack, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'graph from pack, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'build graph from commits with closure' ' + cd "$TRASH_DIRECTORY/full" && + git tag -a -m "merge" tag/merge merge/2 && + git rev-parse tag/merge >commits-in && + git rev-parse merge/1 >>commits-in && + cat commits-in | git commit-graph write --stdin-commits && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "6" +' + +graph_git_behavior 'graph from commits, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'graph from commits, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'build graph from commits appendly' ' + cd "$TRASH_DIRECTORY/full" && + git rev-parse merge/3 | git commit-graph write --stdin-commits --append && + test_path_is_file $objdir/info/commit-graph && + graph_read_expect "10" "large_edges" +' + +graph_git_behavior 'append graph, commit 8 vs merge 1' full commit8 merge/1 +graph_git_behavior 'append graph, commit 8 vs merge 2' full commit8 merge/2 + +test_expect_success 'setup bare repo' ' + cd "$TRASH_DIRECTORY" && + git clone --bare --no-local full bare && + cd bare && + git config core.commitGraph true && + baredir="./objects" +' + +graph_git_behavior 'bare repo, commit 8 vs merge 1' bare commit8 merge/1 +graph_git_behavior 'bare repo, commit 8 vs merge 2' bare commit8 merge/2 + +test_expect_success 'write graph in bare repo' ' + cd "$TRASH_DIRECTORY/bare" && + git commit-graph write && + test_path_is_file $baredir/info/commit-graph && + graph_read_expect "11" "large_edges" +' + +graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commit8 merge/1 +graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commit8 merge/2 + +test_done