Sparse index: delete ignored files outside sparse cone #1009

Commits on Aug 24, 2021

  1. t7519: rewrite sparse index test

    The sparse index is tested with the FS Monitor hook and extension since
    f8fe49e (fsmonitor: integrate with sparse index, 2021-07-14). This test
    was very fragile because it shared one index across sparse and
    non-sparse behavior. Expanding and contracting that index could cause
    it to lose its FS Monitor bitmap and token, so the test was sensitive
    to changes in 'git sparse-checkout set'.
    
    Rewrite the test to use two clones of the original repo: full and
    sparse. This allows us to also keep the test files (actual, expect,
    trace2.txt) out of the repos we are testing with 'git status'.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Aug 24, 2021
    c407b2c
  2. sparse-index: silently return when not using cone-mode patterns

    While the sparse-index is only enabled when core.sparseCheckoutCone is
    also enabled, it is possible for the user to modify the sparse-checkout
    file manually in a way that does not match cone-mode patterns. In this
    case, we should refuse to convert an index into a sparse index, since
    the sparse_checkout_patterns will not be initialized with recursive and
    parent path hashsets.
    
    Also silently return if there are no cache entries, which is a simple
    case: there are no paths to make sparse!
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Aug 24, 2021
    8660877

Commits on Sep 7, 2021

  1. unpack-trees: fix nested sparse-dir search

    The iterated search in find_cache_entry() was recently modified to
    include a loop that searches backwards for a sparse directory entry that
    matches the given traverse_info and name_entry. However, the string
    comparison failed to actually concatenate those two strings, so this
    failed to find a sparse directory when it was not a top-level directory.
    
    This caused some errors in rare cases where a 'git checkout' spanned a
    diff that modified files within the sparse directory entry, but we could
    not correctly find the entry.
    
    Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
    Helped-by: René Scharfe <l.s.r@web.de>
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    edb00d3
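
The nested sparse-dir fix above turns on comparing an index entry against the parent path and the entry name joined together. The following is a minimal sketch of that comparison, not Git's actual find_cache_entry() code: both function names are hypothetical, the paths are plain C strings rather than traverse_info and name_entry, and the second function only illustrates one way a name-only comparison could miss nested directories.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Git): does the index entry name
 * `ce_name` equal "<parent><name>/"? Sparse directory entries end in '/'.
 * `parent` is the traversal prefix: "" at the top level, otherwise a
 * path that already ends in '/'.
 */
bool sparse_dir_matches(const char *ce_name,
                        const char *parent, const char *name)
{
    size_t plen = strlen(parent);
    size_t nlen = strlen(name);

    /* The entry must be exactly parent + name + "/". */
    if (strlen(ce_name) != plen + nlen + 1)
        return false;
    return !strncmp(ce_name, parent, plen) &&
           !strncmp(ce_name + plen, name, nlen) &&
           ce_name[plen + nlen] == '/';
}

/*
 * Illustrative buggy variant: compares against `name` alone and ignores
 * the parent path, so it can only match top-level directories.
 */
bool sparse_dir_matches_name_only(const char *ce_name, const char *name)
{
    size_t nlen = strlen(name);

    return strlen(ce_name) == nlen + 1 &&
           !strncmp(ce_name, name, nlen) &&
           ce_name[nlen] == '/';
}
```

With a parent of "deep/" and a name of "folder1", the first function matches the entry "deep/folder1/", while the name-only variant does not.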
  2. sparse-index: silently return when cache tree fails

    If cache_tree_update() returns a non-zero value, then it could not
    create the cache tree. This is likely due to a path having a merge
    conflict. Since we are already returning early, let's return silently to
    avoid making it seem like we failed to write the index at all.
    
    If we remove our dependence on the cache tree within
    convert_to_sparse(), then we could still recover from this scenario and
    have a sparse index.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    c8620de
  3. sparse-index: use WRITE_TREE_MISSING_OK

    When updating the cache tree in convert_to_sparse(), the
    WRITE_TREE_MISSING_OK flag indicates that trees might be computed that
    do not already exist within the object database. This happens in cases
    such as 'git add' creating new trees that it wants to store in
    anticipation of a following 'git commit'. If this flag is not specified,
    then it might trigger a promisor fetch or a failure due to the object
    not existing locally.
    
    Use WRITE_TREE_MISSING_OK during convert_to_sparse() to avoid these
    possible reasons for the cache_tree_update() to fail.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    3717161
  4. sparse-checkout: create helper methods

    As we integrate the sparse index into more builtins, we occasionally
    need to check the sparse-checkout patterns to see if a path is within
    the sparse-checkout cone. Create some helper methods that help
    initialize the patterns and check for pattern matching to make this
    easier.
    
    The existing callers of functions like get_sparse_checkout_patterns() use
    a custom 'struct pattern_list' that is not necessarily the one in the
    'struct index_state', so there are not many previous uses that could
    adopt these helpers. There are just two in builtin/add.c and
    sparse-index.c that can use path_in_sparse_checkout().
    
    We add a path_in_cone_mode_sparse_checkout() as well that will only
    return false if the path is outside of the sparse-checkout definition
    _and_ the sparse-checkout patterns are in cone mode.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    98b4cae
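
The difference between the two helpers described in the commit above can be sketched with a toy model. This is an assumption-laden illustration, not Git's implementation: `struct toy_patterns` and both `toy_` functions are hypothetical, and the cone is reduced to a list of directory prefixes (real cone mode also keeps files in the root and in parent directories).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the sparse-checkout state; not Git's struct pattern_list. */
struct toy_patterns {
    bool use_cone_patterns;  /* are the patterns in cone mode? */
    const char **cone_dirs;  /* directories inside the cone, each ending in '/' */
    size_t nr;               /* number of entries in cone_dirs */
};

/* True if `path` lies under one of the listed directories. */
bool toy_path_in_sparse_checkout(const char *path,
                                 const struct toy_patterns *pl)
{
    for (size_t i = 0; i < pl->nr; i++)
        if (!strncmp(path, pl->cone_dirs[i], strlen(pl->cone_dirs[i])))
            return true;
    return false;
}

/*
 * Returns false only when the patterns are in cone mode AND the path is
 * outside the sparse-checkout definition; otherwise returns true.
 */
bool toy_path_in_cone_mode_sparse_checkout(const char *path,
                                           const struct toy_patterns *pl)
{
    if (!pl->use_cone_patterns)
        return true;
    return toy_path_in_sparse_checkout(path, pl);
}
```

With a cone of "src/", the path "docs/readme.txt" is rejected by the cone-mode helper only when use_cone_patterns is set; with non-cone patterns the same call returns true.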
  5. attr: be careful about sparse directories

    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    6ec3cb2
  6. sparse-index: add SPARSE_INDEX_MEMORY_ONLY flag

    The convert_to_sparse() method checks for the GIT_TEST_SPARSE_INDEX
    environment variable or the "index.sparse" config setting before
    converting the index to a sparse one. This is for ease of use since all
    current consumers are preparing to compress the index before writing it
    to disk. If these settings are not enabled, then convert_to_sparse()
    silently returns without doing anything.
    
    We will add a consumer in the next change that wants to use the sparse
    index as an in-memory data structure, regardless of whether the on-disk
    format should be sparse.
    
    To that end, create the SPARSE_INDEX_MEMORY_ONLY flag that will skip
    these config checks when enabled. All current consumers are modified to
    pass '0' in the new 'flags' parameter.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    d57f48c
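
The gating that the memory-only flag changes can be sketched as follows. This is a simplified model under stated assumptions: the `TOY_` flag value, `struct toy_settings`, and `toy_should_convert()` are hypothetical names, and the environment variable and config lookups are replaced by two booleans.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real SPARSE_INDEX_MEMORY_ONLY value is not shown here. */
#define TOY_SPARSE_INDEX_MEMORY_ONLY (1 << 0)

/* Stand-ins for the GIT_TEST_SPARSE_INDEX variable and "index.sparse" setting. */
struct toy_settings {
    bool test_env_enabled;
    bool index_sparse_config;
};

/*
 * Returns true if the conversion should proceed, false if the caller
 * should silently return without doing anything.
 */
bool toy_should_convert(const struct toy_settings *s, int flags)
{
    /* In-memory consumers skip the settings checks entirely. */
    if (flags & TOY_SPARSE_INDEX_MEMORY_ONLY)
        return true;
    /* Callers that pass 0 still require one of the settings. */
    return s->test_env_enabled || s->index_sparse_config;
}
```

Existing callers that pass 0 keep their current behavior, while a caller that only needs the sparse index as an in-memory structure passes the flag and bypasses both settings.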
  7. sparse-checkout: clear tracked sparse dirs

    When changing the scope of a sparse-checkout using cone mode, we might
    have some tracked directories go out of scope. The current logic removes
    the tracked files from within those directories, but leaves the ignored
    files within those directories. This is a bit unexpected to users who
    have given input to Git saying they don't need those directories
    anymore.
    
    This is something that is new to the cone mode pattern type: the user
    has explicitly said "I want these directories and _not_ those
    directories." The typical sparse-checkout patterns more generally apply
    to "I want files with these patterns" so it is natural to leave
    ignored files as they are. This focus on directories in cone mode
    provides us an opportunity to change the behavior.
    
    Leaving these ignored files in the sparse directories makes it
    impossible to gain performance benefits in the sparse index. When we
    track into these directories, we need to know if the files are ignored
    or not, which might depend on the _tracked_ .gitignore file(s) within
    the sparse directory. This depends on the indexed version of the file,
    so the sparse directory must be expanded.
    
    We must take special care to look for untracked, non-ignored files in
    these directories before deleting them. We do not want to delete any
    meaningful work that the users were doing in those directories and
    perhaps forgot to add and commit before switching sparse-checkout
    definitions. Since those untracked files might be code files that
    generated ignored build output, also do not delete any ignored files
    from these directories in that case. The users can recover their state
    by resetting their sparse-checkout definition to include that directory
    and continue. Alternatively, they can see the warning that is presented
    and delete the directory themselves to regain the performance they
    expect.
    
    By deleting the sparse directories when changing scope (or running 'git
    sparse-checkout reapply') we regain these performance benefits as if the
    repository were in a clean state.
    
    Since these ignored files are frequently build output or helper files
    from IDEs, the users should not need the files now that the tracked
    files are removed. If the tracked files reappear, then they will have
    newer timestamps than the build artifacts, so the artifacts will need to
    be regenerated anyway.
    
    Use the sparse-index as a data structure in order to find the sparse
    directories that can be safely deleted. Re-expand the index to a full
    one if it was full before.
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    derrickstolee committed Sep 7, 2021
    91b53f2
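
The safety rule in the last commit (delete an out-of-cone directory only when nothing in it is untracked and non-ignored) can be sketched as a small decision function. This is an illustration only: the enum and function names are hypothetical, and the leftover files are assumed to have been classified already, which in Git involves the ignore rules and a directory walk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Classification of a file left behind in a directory that went out of scope. */
enum toy_file_state {
    TOY_IGNORED,   /* matched by an ignore rule, e.g. build output */
    TOY_UNTRACKED  /* untracked and not ignored: possibly unsaved work */
};

/*
 * Returns true if the directory may be deleted. A single untracked,
 * non-ignored file keeps the whole directory, ignored files included,
 * because the ignored files may be build output of that untracked work.
 */
bool toy_can_delete_sparse_dir(const enum toy_file_state *files, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (files[i] == TOY_UNTRACKED)
            return false;
    return true;
}
```

A directory holding only ignored files is removed, while one that mixes ignored files with an untracked source file is left in place for the user to handle after seeing the warning.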