maintenance: use random minute for scheduling #597

Commits on Aug 11, 2023

  1. maintenance: add get_random_minute()

    When we initially created background maintenance -- with its hourly,
    daily, and weekly schedules -- we considered the effects of all clients
    launching fetches to the server every hour on the hour. The worry of
    DDoSing server hosts was noted, but left as something we would consider
    for a future update.
    
    As background maintenance has gained more adoption over the past three
    years, our worries about DDoSing the big Git hosts have been unfounded.
    Those systems, especially those serving public repositories, are already
    resilient to thundering herds of much larger scale.
    
    However, sometimes organizations spin up specific custom server
    infrastructure either in addition to or on top of their Git host. Some
    of these technologies are built for a different range of scale, and can
    hit concurrency limits sooner. Organizations with such custom
    infrastructures are more likely to recommend tools like `scalar` which
    furthers their adoption of background maintenance.
    
    To help solve for this, create get_random_minute() as a method to help
    Git select a random minute when creating schedules in the future. The
    integrations with this method do not yet exist, but will follow in
    future changes.
    
    To avoid multiple sources of randomness in the Git codebase, create a
    new helper function, git_rand(), that returns a random uint32_t. This is
    similar to how rand() returns a random nonnegative value, except it is
    based on csprng_bytes() which is cryptographic and will return values
    larger than RAND_MAX.
    
    One thing that is important for testability is that we notice when we
    are under a test scenario and return a predictable result. The schedules
    themselves are not checked for this value, but at least one launchctl
    test checks that we do not unnecessarily reboot the schedule if it has
    not changed from a previous version.
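
    The two helpers described above can be sketched roughly as follows. This
    is an illustrative Python model, not the C code from this commit: the
    environment variable name and the fixed minute are hypothetical, and
    secrets.token_bytes() stands in for csprng_bytes().

```python
import os
import secrets

# Hypothetical variable name, used only in this sketch to request a fixed value.
TEST_ENV_VAR = "SKETCH_MAINT_TEST"
# Arbitrary constant so that generated schedules are reproducible under test.
FIXED_TEST_MINUTE = 13


def random_u32() -> int:
    """Return a uniformly random 32-bit unsigned integer from the OS CSPRNG."""
    return int.from_bytes(secrets.token_bytes(4), "little")


def get_random_minute() -> int:
    """Pick a minute in [0, 59], or a fixed one when running under tests."""
    if os.environ.get(TEST_ENV_VAR):
        return FIXED_TEST_MINUTE
    # The modulo introduces a negligible bias, since 2**32 is not a multiple of 60.
    return random_u32() % 60
```

    Returning a constant under test keeps any generated schedule text stable,
    so a test can compare two runs and detect an unnecessary rewrite.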
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    272a52b
  2. maintenance: use random minute in launchctl scheduler

    The get_random_minute() method was created to allow maintenance
    schedules to be fixed to a random minute of the hour. This randomness is
    only intended to spread out the load from a number of clients, but each
    client should have an hour between each maintenance cycle.
    
    Use get_random_minute() when constructing the schedules for launchctl.
    
    The format already includes a 'Minute' key which is modified from 0 to
    the random minute.
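
    As a rough illustration of where the 'Minute' key sits, the sketch below
    builds a launchd job description with Python's plistlib. The label, the
    command line, and the choice of hour are assumptions for the example and
    are not taken from this commit.

```python
import plistlib


def build_plist(label: str, program_args: list, hour: int, minute: int) -> bytes:
    """Serialize a launchd job that fires at hour:minute."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in [0, 59]")
    job = {
        "Label": label,
        "ProgramArguments": program_args,
        # Previously the Minute value would be 0; now it is the random minute.
        "StartCalendarInterval": [{"Hour": hour, "Minute": minute}],
    }
    return plistlib.dumps(job)
```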
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    0d7ca26
  3. maintenance: use random minute in Windows scheduler

    The get_random_minute() method was created to allow maintenance
    schedules to be fixed to a random minute of the hour. This randomness is
    only intended to spread out the load from a number of clients, but each
    client should have an hour between each maintenance cycle.
    
    Add this random minute to the Windows scheduler integration.
    
    We need only to modify the minute value for the 'StartBoundary' tag
    across the three schedules.
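
    A minimal sketch of that substitution, assuming the 'StartBoundary' value
    uses the usual YYYY-MM-DDTHH:MM:SS form. The date and the helper names
    are placeholders chosen for the example.

```python
def start_boundary(hour: int, minute: int) -> str:
    """Format a start time; only the minute field varies per client."""
    # The date is an arbitrary placeholder for this sketch.
    return f"2020-01-01T{hour:02d}:{minute:02d}:00"


def start_boundary_tag(hour: int, minute: int) -> str:
    """Wrap the formatted time in the XML tag used by the task definition."""
    return f"<StartBoundary>{start_boundary(hour, minute)}</StartBoundary>"
```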
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    71d7425
  4. maintenance: use random minute in cron scheduler

    The get_random_minute() method was created to allow maintenance
    schedules to be fixed to a random minute of the hour. This randomness is
    only intended to spread out the load from a number of clients, but each
    client should have an hour between each maintenance cycle.
    
    Add this random minute to the cron integration.
    
    The cron schedule specification starts with a minute indicator, which
    was previously inserted as the "0" string but now takes the given minute
    as an integer parameter.
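
    The change to the minute field can be pictured with the sketch below. The
    hour and weekday fields and the simplified command are assumptions made
    for illustration; the real crontab entries are more involved.

```python
def cron_line(minute: int, hour_field: str, weekday_field: str, schedule: str) -> str:
    """Build one crontab entry whose first field is the chosen minute."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in [0, 59]")
    # Field order: minute, hour, day of month, month, day of week, command.
    return (
        f"{minute} {hour_field} * * {weekday_field} "
        f"git maintenance run --schedule={schedule}"
    )
```

    For example, cron_line(42, "*", "*", "hourly") yields an entry that runs
    at minute 42 of every hour instead of on the hour.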
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    81104fa
  5. maintenance: swap method locations

    The systemd_timer_write_unit_templates() method writes a single template
    that is then used to start the hourly, daily, and weekly schedules with
    systemd.
    
    However, in order to schedule systemd maintenance on a given minute,
    these templates need to be replaced with specific schedules for each of
    these jobs.
    
    Before modifying the schedules, move the writing method above the
    systemd_timer_enable_unit() method, so we can write a specific schedule
    for each unit.
    
    The diff is smaller when it shows systemd_timer_enable_unit() and
    systemd_timer_delete_units() moving, rather than
    systemd_timer_write_unit_templates() and
    systemd_timer_delete_unit_templates().
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    16f5836
  6. maintenance: use random minute in systemd scheduler

    The get_random_minute() method was created to allow maintenance
    schedules to be fixed to a random minute of the hour. This randomness is
    only intended to spread out the load from a number of clients, but each
    client should have an hour between each maintenance cycle.
    
    Add this random minute to the systemd integration.
    
    This integration is more complicated than similar changes for other
    schedulers because of a neat trick that systemd allows: templating.
    
    The previous implementation generated two template files with names
    of the form 'git-maintenance@.(timer|service)'. The '@' directly
    before '.timer' or '.service' marks the file as a template, which is
    picked up when we later specify '...@<schedule>.timer' or
    '...@<schedule>.service'. The '<schedule>' string is then inserted
    into the template as both the 'OnCalendar' schedule setting and the
    '--schedule' parameter of the 'git maintenance run' command.
    
    In order to set these schedules to a given minute, we can no longer use
    the 'hourly', 'daily', or 'weekly' strings for '<schedule>' and instead
    need to abandon the template model for the .timer files. We can still
    use templates for the .service files. For this reason, we split these
    writes into two methods.
    
    Modify the template with a custom schedule in the 'OnCalendar' setting.
    This schedule has some interesting differences from cron-like patterns,
    but is relatively easy to figure out from context. The one that might be
    confusing is that '*-*-*' is a date-based pattern, but this must be
    omitted when using 'Mon' to signal that we care about the day of the
    week. Monday is used since that matches the day used for the 'weekly'
    schedule used previously.
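
    One way to picture the resulting 'OnCalendar' values is the sketch
    below. The exact strings, and the choice of midnight for the daily and
    weekly runs, are assumptions for illustration. Note how the weekly
    pattern drops the '*-*-*' date part in favor of 'Mon'.

```python
def on_calendar(schedule: str, minute: int) -> str:
    """Return a systemd calendar expression pinned to the given minute."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in [0, 59]")
    patterns = {
        "hourly": "*-*-* *:{m:02d}:00",  # every day, every hour
        "daily": "*-*-* 0:{m:02d}:00",   # every day, in the 0th hour
        "weekly": "Mon 0:{m:02d}:00",    # weekday replaces the date pattern
    }
    return patterns[schedule].format(m=minute)
```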
    
    Now that the timer files are not templates, we might want to abandon the
    '@' symbol in the file names. However, this would cause users with
    existing schedules to get two competing schedules due to different
    names. The work to remove the old schedule name is one thing that we can
    avoid by keeping the '@' symbol in our unit names. Since we are locked
    into this name, it makes sense that we keep the template model for the
    .service files.
    
    The rest of the change involves making sure we are writing these .timer
    and .service files before initializing the schedule with 'systemctl' and
    deleting the files when we are done. Some changes are also made to share
    the random minute along with a single computation of the execution path
    of the current Git executable.
    
    In addition, older Git versions may have written a
    'git-maintenance@.timer' template file. Be sure to remove this when
    successfully enabling maintenance (or disabling maintenance).
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    77529c4
  7. maintenance: fix systemd schedule overlaps

    The 'git maintenance run' command prevents concurrent runs in the same
    repository using a 'maintenance.lock' file. However, when using systemd
    the hourly maintenance runs at the same time as the daily and weekly runs.
    (Similarly, daily maintenance runs at the same time as weekly
    maintenance.) These competing commands result in some maintenance not
    actually being run.
    
    This overlap was something we could not fix until we made the recent
    change to not use the built-in 'hourly', 'daily', and 'weekly' schedules
    in systemd. We can adjust the schedules such that:
    
     1. Hourly runs avoid the 0th hour.
     2. Daily runs avoid Monday.
    
    This will keep maintenance runs from colliding when using systemd.
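
    The two rules can be checked with a small model of one week. It assumes
    that all three schedules share the same minute, that the daily and weekly
    runs happen in the 0th hour, and that the weekly run is on Monday. The
    function name is hypothetical.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def schedules_firing(weekday: str, hour: int) -> list:
    """List the schedules that start in the given (weekday, hour) slot."""
    fired = []
    if hour != 0:
        fired.append("hourly")  # rule 1: hourly runs avoid the 0th hour
    if hour == 0 and weekday != "Mon":
        fired.append("daily")   # rule 2: daily runs avoid Monday
    if hour == 0 and weekday == "Mon":
        fired.append("weekly")
    return fired


# Every slot in the week starts exactly one schedule, so no two runs
# compete for the 'maintenance.lock' file.
collisions = [
    (day, hour)
    for day in WEEKDAYS
    for hour in range(24)
    if len(schedules_firing(day, hour)) != 1
]
```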
    
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    f51b849
  8. maintenance: update schedule before config

    When running 'git maintenance start', the current pattern is to
    configure global config settings to enable maintenance on the current
    repository and set 'maintenance.auto' to false and _then_ to set up the
    schedule with the system scheduler.
    
    This has a problematic error condition: if the scheduler fails to
    initialize, the repository still will not use automatic maintenance due
    to the 'maintenance.auto' setting.
    
    Fix this gap by swapping the order of operations. If Git fails to
    initialize maintenance, then the config changes should never happen.
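
    The reordering amounts to the control flow sketched below. The two
    callables are hypothetical stand-ins for the scheduler setup and the
    config update, not functions from the Git codebase.

```python
from typing import Callable


def start_maintenance(
    update_schedule: Callable[[], bool],
    update_config: Callable[[], None],
) -> bool:
    """Set up the scheduler first; touch config only if that succeeded."""
    if not update_schedule():
        # Scheduler failed: leave config alone so automatic maintenance
        # keeps working for this repository.
        return False
    update_config()
    return True
```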
    
    Reported-by: Phillip Wood <phillip.wood123@gmail.com>
    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    derrickstolee committed Aug 11, 2023
    c5cd555