Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Consolidate reachability logic #8

Closed
wants to merge 18 commits into from

Commits on Jul 20, 2018

  1. commit-reach: move walk methods from commit.c

    There are several commit walks in the codebase. Group them together into
    a new commit-reach.c file and corresponding header. After we group these
    walks into one place, we can reduce duplicate logic by calling
    equivalent methods.
    
    The method declarations in commit.h are not touched by this commit and
    will be moved in a following commit. Many consumers need to point to
    commit-reach.h and that would bloat this commit.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    56b7ca5 View commit details
    Browse the repository at this point in the history
  2. commit.h: remove method declarations

    These methods are now declared in commit-reach.h. Remove them from
    commit.h and add new include statements in all files that require these
    declarations.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    ca88ec3 View commit details
    Browse the repository at this point in the history
  3. commit-reach: move ref_newer from remote.c

    There are several commit walks in the codebase. Group them together into
    a new commit-reach.c file and corresponding header. After we group these
    walks into one place, we can reduce duplicate logic by calling
    equivalent methods.
    
    The ref_newer() method is used by 'git push -f' to check if a force-push
    is necessary. By making the method public, we make it possible to test
    the method directly without setting up an envieronment where a 'git
    push' call makes sense.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    12349f5 View commit details
    Browse the repository at this point in the history
  4. commit-reach: move commit_contains from ref-filter

    There are several commit walks in the codebase. Group them together into
    a new commit-reach.c file and corresponding header. After we group these
    walks into one place, we can reduce duplicate logic by calling
    equivalent methods.
    
    All methods are direct moves, except we also make the commit_contains()
    method public so its consumers in ref-filter.c can still call it. We can
    also test this method in a test-tool in a later commit.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    f548989 View commit details
    Browse the repository at this point in the history
  5. upload-pack: make reachable() more generic

    In anticipation of moving the reachable() method to commit-reach.c,
    modify the prototype to be more generic to flags known outside of
    upload-pack.c. Also rename 'want' to 'from' to make the statement
    more clear outside of the context of haves/wants negotiation.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    e822fd8 View commit details
    Browse the repository at this point in the history
  6. upload-pack: refactor ok_to_give_up()

    In anticipation of consolidating all commit reachability algorithms,
    refactor ok_to_give_up() in order to allow splitting its logic into
    an external method.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    08dcd4e View commit details
    Browse the repository at this point in the history
  7. upload-pack: generalize commit date cutoff

    The ok_to_give_up() method uses the commit date as a cutoff to avoid
    walking the entire reachble set of commits. Before moving the
    reachable() method to commit-reach.c, pull out the dependence on the
    global constant 'oldest_have' with a 'min_commit_date' parameter.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    a71ec7f View commit details
    Browse the repository at this point in the history
  8. commit-reach: move can_all_from_reach_with_flags

    There are several commit walks in the codebase. Group them together into
    a new commit-reach.c file and corresponding header. After we group these
    walks into one place, we can reduce duplicate logic by calling
    equivalent methods.
    
    The can_all_from_reach_with_flags method is used in a stateful way by
    upload-pack.c. The parameters are very flexible, so we will be able to
    use its commit walking logic for many other callers.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    aa96c80 View commit details
    Browse the repository at this point in the history
  9. test-reach: create new test tool for ref_newer

    As we prepare to change the behavior of the algorithms in
    commit-reach.c, create a new test-tool subcommand 'reach' to test these
    methods on interesting commit-graph shapes.
    
    To use the new test-tool, use 'test-tool reach <method>' and provide
    input to stdin that describes the inputs to the method. Currently, we
    only implement the ref_newer method, which requires two commits. Use
    lines "A:<committish>" and "B:<committish>" for the two inputs. We will
    expand this input later to accommodate methods that take lists of
    commits.
    
    The test t6600-test-reach.sh creates a repo whose commits form a
    two-dimensional grid. This grid makes it easy for us to determine
    reachability because commit-A-B can reach commit-X-Y if and only if A is
    at least X and B is at least Y. This helps create interesting test cases
    for each result of the methods in commit-reach.c.
    
    We test all methods in three different states of the commit-graph file:
    Non-existent (no generation numbers), fully computed, and mixed (some
    commits have generation numbers and others do not).
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    2da4fe8 View commit details
    Browse the repository at this point in the history
  10. test-reach: test in_merge_bases

    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    10bda49 View commit details
    Browse the repository at this point in the history
  11. test-reach: test is_descendant_of

    The is_descendant_of method takes a single commit as its first parameter
    and a list of commits as its second parameter. Extend the input of the
    'test-tool reach' command to take multiple lines of the form
    "X:<committish>" to construct a list of commits. Pass these to
    is_descendant_of and create tests that check each result.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    9ab9511 View commit details
    Browse the repository at this point in the history
  12. test-reach: test get_merge_bases_many

    The get_merge_bases_many method returns a list of merge bases for a
    single commit (A) against a list of commits (X). Some care is needed in
    constructing the expected behavior because the result is not the
    expected merge-base for an octopus merge with those parents but instead
    the set of maximal commits that are reachable from A and at least one of
    the commits in X.
    
    Add get_merge_bases_many to 'test-tool reach' and create a test that
    demonstrates that this output returns multiple results. Specifically, we
    select a list of three commits such that we output two commits that are
    reachable from one of the first two, respectively, and none are
    reachable from the third.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    d3dfbfc View commit details
    Browse the repository at this point in the history
  13. test-reach: test reduce_heads

    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    91a6297 View commit details
    Browse the repository at this point in the history
  14. test-reach: test can_all_from_reach_with_flags

    The can_all_from_reach_with_flags method is used by ok_to_give_up in
    upload-pack.c to see if we have done enough negotiation during a fetch.
    This method is intentionally created to preserve state between calls to
    assist with stateful negotiation, such as over SSH.
    
    To make this method testable, add a new can_all_from_reach method that
    does the initial setup and final tear-down. We will later use this
    method in production code. Call the method from 'test-tool reach' for
    now.
    
    Since this is a many-to-many reachability query, add a new type of input
    to the 'test-tool reach' input format. Lines "Y:<committish>" create a
    list of commits to be the reachability targets from the commits in the
    'X' list. In the context of fetch negotiation, the 'X' commits are the
    'want' commits and the 'Y' commits are the 'have' commits.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    0557fed View commit details
    Browse the repository at this point in the history
  15. test-reach: test commit_contains

    The commit_contains method has two modes which depend on the given
    ref_filter struct. We have the "normal" algorithm (which is also the
    typically-slow operation) and the "tag" algorithm. This difference is
    essentially what changes performance for 'git branch --contains' versus
    'git tag --contains'. There are thoughts that the data shapes used by
    these two applications justify the different implementations.
    
    Create tests using 'test-tool reach commit_contains [--tag]' to cover
    both methods.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    02d0386 View commit details
    Browse the repository at this point in the history
  16. commit-reach: replace ref_newer logic

    The ref_newer method is used by 'git push' to check if a force-push is
    required. This method does not use any kind of cutoff when walking, so
    in the case of a force-push will walk all reachable commits.
    
    The is_descendant_of method already uses paint_down_to_common along with
    cutoffs. By translating the ref_newer arguments into the commit and
    commit_list required by is_descendant_of, we can have one fewer commit
    walk and also improve our performance!
    
    For a copy of the Linux repository, 'test-tool reach ref_newer' presents
    the following improvements with the specified input. In the case that
    ref_newer returns 1, there is no improvement. The improvement is in the
    second case where ref_newer returns 0.
    
    Input:
    A:v4.9
    B:v3.19
    
    Before: 0.09 s
     After: 0.09 s
    
    To test the negative case, add a new commit with parent v3.19,
    regenerate the commit-graph, and then run with B pointing at that
    commit.
    
    Before: 0.43 s
     After: 0.09 s
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    3b44044 View commit details
    Browse the repository at this point in the history
  17. commit-reach: make can_all_from_reach... linear

    The can_all_from_reach_with_flags() algorithm is currently quadratic in
    the worst case, because it calls the reachable() method for every 'from'
    without tracking which commits have already been walked or which can
    already reach a commit in 'to'.
    
    Rewrite the algorithm to walk each commit a constant number of times.
    
    We also add some optimizations that should work for the main consumer of
    this method: fetch negotitation (haves/wants).
    
    The first step includes using a depth-first-search (DFS) from each
    'from' commit, sorted by ascending generation number. We do not walk
    beyond the minimum generation number or the minimum commit date. This
    DFS is likely to be faster than the existing reachable() method because
    we expect previous ref values to be along the first-parent history.
    
    If we find a target commit, then we mark everything in the DFS stack as
    a RESULT. This expands the set of targets for the other 'from' commits.
    We also mark the visited commits using 'assign_flag' to prevent re-
    walking the same commits.
    
    We still need to clear our flags at the end, which is why we will have a
    total of three visits to each commit.
    
    Performance was measured on the Linux repository using
    'test-tool reach can_all_from_reach'. The input included rows seeded by
    tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
    v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
    v3 releases and want all major v4 releases." The "large" case included
    X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
    tags to the set, which does not greatly increase the number of objects
    that are considered, but does increase the number of 'from' commits,
    demonstrating the quadratic nature of the previous code.
    
    Small Case:
    
    Before: 1.52 s
     After: 0.26 s
    
    Large Case:
    
    Before: 3.50 s
     After: 0.27 s
    
    Note how the time increases between the two cases in the two versions.
    The new code increases relative to the number of commits that need to be
    walked, but not directly relative to the number of 'from' commits.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    ae6fc62 View commit details
    Browse the repository at this point in the history
  18. commit-reach: use can_all_from_reach

    The is_descendant_of method previously used in_merge_bases() to check if
    the commit can reach any of the commits in the provided list. This had
    two performance problems:
    
    1. The performance is quadratic in worst-case.
    
    2. A single in_merge_bases() call requires walking beyond the target
       commit in order to find the full set of boundary commits that may be
       merge-bases.
    
    The can_all_from_reach method avoids this quadratic behavior and can
    limit the search beyond the target commits using generation numbers. It
    requires a small prototype adjustment to stop using commit-date as a
    cutoff, as that optimization is no longer appropriate here.
    
    Since in_merge_bases() uses paint_down_to_common(), is_descendant_of()
    naturally found cutoffs to avoid walking the entire commit graph. Since
    we want to always return the correct result, we cannot use the
    min_commit_date cutoff in can_all_from_reach. We then rely on generation
    numbers to provide the cutoff.
    
    Since not all repos will have a commit-graph file, nor will we always
    have generation numbers computed for a commit-graph file, create a new
    method, generation_numbers_enabled(), that checks for a commit-graph
    file and sees if the first commit in the file has a non-zero generation
    number. In the case that we do not have generation numbers, use the old
    logic for is_descendant_of().
    
    Performance was meausured on a copy of the Linux repository using the
    'test-tool reach is_descendant_of' command using this input:
    
    A:v4.9
    X:v4.10
    X:v4.11
    X:v4.12
    X:v4.13
    X:v4.14
    X:v4.15
    X:v4.16
    X:v4.17
    X.v3.0
    
    Note that this input is tailored to demonstrate the quadratic nature of
    the previous method, as it will compute merge-bases for v4.9 versus all
    of the later versions before checking against v4.1.
    
    Before: 0.26 s
     After: 0.21 s
    
    Since we previously used the is_descendant_of method in the ref_newer
    method, we also measured performance there using
    'test-tool reach ref_newer' with this input:
    
    A:v4.9
    B:v3.19
    
    Before: 0.10 s
     After: 0.08 s
    
    By adding a new commit with parent v3.19, we test the non-reachable case
    of ref_newer:
    
    Before: 0.09 s
     After: 0.08 s
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Jul 20, 2018
    Configuration menu
    Copy the full SHA
    f40f6f4 View commit details
    Browse the repository at this point in the history