-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
maintenance: schedule maintenance on a random minute #1567
maintenance: schedule maintenance on a random minute #1567
Conversation
2fa1a2b
to
14e340b
Compare
/submit |
Submitted as pull.1567.git.1691434300.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -1708,6 +1708,23 @@ static int get_schedule_cmd(const char **cmd, int *is_available) | |||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Mon, Aug 07, 2023 at 06:51:35PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/gc.c b/builtin/gc.c
> index f3942188a61..66a972bc292 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -1708,6 +1708,23 @@ static int get_schedule_cmd(const char **cmd, int *is_available)
> return 1;
> }
>
> +MAYBE_UNUSED
> +static int get_random_minute(void)
> +{
> + static int random_initialized = 0;
> +
> + /* Use a static value when under tests. */
> + if (!getenv("GIT_TEST_MAINTENANCE_SCHEDULER"))
> + return 13;
> +
> + if (!random_initialized) {
> + srand((unsigned int)getpid());
> + random_initialized = 1;
> + }
I was wondering where else we call srand() within Git, and it looks like
the only other spot is in `lock_file_timeout()`.
I doubt it, but is there a chance that that code depends on only calling
srand() once? I think the answer is "no", since we only use rand()
within that function to generate a random-ish backoff period, so I think
the repeatability of it doesn't matter all that much.
So I think this is kind of outside the scope of your series, but I
wonder if we should have a git_rand() that automatically initializes the
PRNG with the value of getpid()? Then multiple callers can grab random
values at will without reinitializing the PRNG.
Thanks,
Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Taylor Blau <me@ttaylorr.com> writes:
> I was wondering where else we call srand() within Git, and it looks like
> the only other spot is in `lock_file_timeout()`.
lock_file_timeout() should be updated to match git_mkstemps_mode(),
which was taught to use the csprng_bytes() function with 47efda96
(wrapper: use a CSPRNG to generate random file names, 2022-01-17),
and this new caller may want to do so as well, perhaps? I dunno,
but the caller then does not have to worry about "initializing it
just once".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Junio C Hamano <gitster@pobox.com> writes:
> Taylor Blau <me@ttaylorr.com> writes:
>
>> I was wondering where else we call srand() within Git, and it looks like
>> the only other spot is in `lock_file_timeout()`.
>
> lock_file_timeout() should be updated to match git_mkstemps_mode(),
> which was taught to use the csprng_bytes() function with 47efda96
> (wrapper: use a CSPRNG to generate random file names, 2022-01-17),
> and this new caller may want to do so as well, perhaps? I dunno,
> but the caller then does not have to worry about "initializing it
> just once".
Of course, the obvious downside is that crypto-secure one may be,
unlike for its use in mkstemps(), way overkill for lockfiles and
cron dispersion purposes, as these codepaths are not on the target
surface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Mon, Aug 07, 2023 at 05:22:49PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> I was wondering where else we call srand() within Git, and it looks like
> >> the only other spot is in `lock_file_timeout()`.
> >
> > lock_file_timeout() should be updated to match git_mkstemps_mode(),
> > which was taught to use the csprng_bytes() function with 47efda96
> > (wrapper: use a CSPRNG to generate random file names, 2022-01-17),
> > and this new caller may want to do so as well, perhaps? I dunno,
> > but the caller then does not have to worry about "initializing it
> > just once".
>
> Of course, the obvious downside is that crypto-secure one may be,
> unlike for its use in mkstemps(), way overkill for lockfiles and
> cron dispersion purposes, as these codepaths are not on the target
> surface.
I think that's an acceptable price to pay here, since we can drop the
code to remember whether or not srand() has been called or not. Here's a
patch that we could take in that direction:
--- 8< ---
Subject: [PATCH] lockfile.c: use a CSPRNG to generate backoff milliseconds
Upon failing to acquire the lockfile, `lock_file_timeout()` will try
again with an exponential backoff. This backoff includes some noise as a
multiplier over the default backoff behavior ranging from [0.75, 1.25].
It generates this noise via rand(3). Using a non-cryptographic source of
randomness here is OK, since a more trivial attack vector (holding the
file open via an external process for longer than the value of
`timeout_ms`) is easier to exploit.
That all said, `lock_file_timeout()` initializes the PRNG with
`srand()`. This has a couple of downsides:
- lock_file_timeout() needs to remember whether or not the PRNG has
or hasn't been seeded.
- If a different function also calls `srand()`, the PRNG may start
generating repeated values (if that caller also initialized the PRNG
with `getpid()`).
Let's avoid both of these by using `csprng_bytes()`, in a similar spirit
as 47efda967c (wrapper: use a CSPRNG to generate random file names,
2022-01-17).
Using a CSPRNG is definitely overkill for noising a backoff window, but
it avoids the concerns about calling `srand()`, so let's use it here,
too.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
lockfile.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/lockfile.c b/lockfile.c
index 1d5ed01682..6587d407f4 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -107,22 +107,17 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
int n = 1;
int multiplier = 1;
long remaining_ms = 0;
- static int random_initialized = 0;
if (timeout_ms == 0)
return lock_file(lk, path, flags, mode);
- if (!random_initialized) {
- srand((unsigned int)getpid());
- random_initialized = 1;
- }
-
if (timeout_ms > 0)
remaining_ms = timeout_ms;
while (1) {
long backoff_ms, wait_ms;
int fd;
+ uint64_t rand;
fd = lock_file(lk, path, flags, mode);
@@ -135,7 +130,10 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
backoff_ms = multiplier * INITIAL_BACKOFF_MS;
/* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
- wait_ms = (750 + rand() % 500) * backoff_ms / 1000;
+ if (csprng_bytes(&rand, sizeof(uint64_t)) < 0)
+ return error_errno(_("unable to get random bytes for"
+ "lockfile backoff"));
+ wait_ms = (750 + rand % 500) * backoff_ms / 1000;
sleep_millisec(wait_ms);
remaining_ms -= wait_ms;
--
2.42.0.rc0.26.g802d811bac.dirty
--- >8 ---
Thanks,
Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Taylor Blau <me@ttaylorr.com> writes:
> I think that's an acceptable price to pay here, since we can drop the
> code to remember whether or not srand() has been called or not. Here's a
> patch that we could take in that direction:
> ...
> - If a different function also calls `srand()`, the PRNG may start
> generating repeated values (if that caller also initialized the PRNG
> with `getpid()`).
Hmph, I didn't think about that one. I do not think it can be a
viable attack vector to attack _this_ code, but the other function
might be. But if the other function is worth attacking and
attackable, it ought to be using crypto-secure one and not srand(),
so this argument may not give us a good justification X-<.
Provided if code simplification is a good enough rationale, the
patch looks sensible, but I find its use of u64 a bit questionable
(though not wrong). I would have expected that the type of "rand"
would be the same as backoff_ms and wait_ms, two variables involved
in the same expression.
Thanks.
> diff --git a/lockfile.c b/lockfile.c
> index 1d5ed01682..6587d407f4 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -107,22 +107,17 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
> int n = 1;
> int multiplier = 1;
> long remaining_ms = 0;
> - static int random_initialized = 0;
>
> if (timeout_ms == 0)
> return lock_file(lk, path, flags, mode);
>
> - if (!random_initialized) {
> - srand((unsigned int)getpid());
> - random_initialized = 1;
> - }
> -
> if (timeout_ms > 0)
> remaining_ms = timeout_ms;
>
> while (1) {
> long backoff_ms, wait_ms;
> int fd;
> + uint64_t rand;
>
> fd = lock_file(lk, path, flags, mode);
>
> @@ -135,7 +130,10 @@ static int lock_file_timeout(struct lock_file *lk, const char *path,
>
> backoff_ms = multiplier * INITIAL_BACKOFF_MS;
> /* back off for between 0.75*backoff_ms and 1.25*backoff_ms */
> - wait_ms = (750 + rand() % 500) * backoff_ms / 1000;
> + if (csprng_bytes(&rand, sizeof(uint64_t)) < 0)
> + return error_errno(_("unable to get random bytes for"
> + "lockfile backoff"));
> + wait_ms = (750 + rand % 500) * backoff_ms / 1000;
> sleep_millisec(wait_ms);
> remaining_ms -= wait_ms;
>
> --
> 2.42.0.rc0.26.g802d811bac.dirty
>
> --- >8 ---
>
> Thanks,
> Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Tue, Aug 08, 2023 at 09:49:30AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Provided if code simplification is a good enough rationale, the
> > patch looks sensible, but I find its use of u64 a bit questionable
> > (though not wrong). I would have expected that the type of "rand"
> > would be the same as backoff_ms and wait_ms, two variables involved
> > in the same expression.
>
> Ah, not so fast. The use of modulo means we need to be careful
> about about the fuzzing factor going negative, and use of unsigned
> type allows us to forget about it.
>
> (fuzz % 250), when fuzz is of a signed random integral type, ranges
> between -250 and +250 and because we want the center of our
> distribution at 1000, so I think the following is equivalent.
[...]
> - wait_ms = (750 + rand % 500) * backoff_ms / 1000;
> + wait_ms = (1000 + fuzz % 250) * backoff_ms / 1000;
I was going to say that having rand be an unsigned type was the right
behavior, but with this change it can (and should) be signed. Thanks for
the simplification :-).
Thanks,
Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Tue, Aug 08, 2023 at 01:28:50PM -0400, Derrick Stolee wrote:
> > So I think this is kind of outside the scope of your series, but I
> > wonder if we should have a git_rand() that automatically initializes the
> > PRNG with the value of getpid()? Then multiple callers can grab random
> > values at will without reinitializing the PRNG.
>
> I see you're moving ahead with removing the srand() from the lockfile code,
> so I'll focus on creating a `git_rand()` that centralizes the use of
> srand(), but won't touch the code in the lockfile so your patch applies
> independently.
That thread may have progressed a little since you last looked at it.
Instead of using srand() and rand() (which would make sense to wrap with
git_rand() as you propose), we can simplify our lives by using a CSPRNG,
which only gets initialized once, as is already the case with
csprng_bytes().
I think Junio is picking up a lightly modified version of my patch
there, see [1].
Thanks,
Taylor
[1]: https://lore.kernel.org/git/ZNKfKs1mLQhnybvF@nand.local/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/8/2023 4:04 PM, Taylor Blau wrote:
> On Tue, Aug 08, 2023 at 01:28:50PM -0400, Derrick Stolee wrote:
>> I see you're moving ahead with removing the srand() from the lockfile code,
> That thread may have progressed a little since you last looked at it.
I think this part of my summary is still correct.
>> so I'll focus on creating a `git_rand()` that centralizes the use of
>> srand(), but won't touch the code in the lockfile so your patch applies
>> independently.
> Instead of using srand() and rand() (which would make sense to wrap with
> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
> which only gets initialized once, as is already the case with
> csprng_bytes().
So the idea is to use csprng_bytes() everywhere instead of srand()/rand().
I can adjust my local patch to still create git_rand(), but base it on
csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
is a simpler interface to consume.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Derrick Stolee <derrickstolee@github.com> writes:
>> Instead of using srand() and rand() (which would make sense to wrap with
>> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
>> which only gets initialized once, as is already the case with
>> csprng_bytes().
>
> So the idea is to use csprng_bytes() everywhere instead of srand()/rand().
>
> I can adjust my local patch to still create git_rand(), but base it on
> csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
> is a simpler interface to consume.
I am still ambivalent about wasting entropy for something that
srand() would suffice, so git_rand() interface may be an welcome
addition anyway, that serves an extra layer of indirection to
insulate the callers from the implementation.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Wed, Aug 09, 2023 at 11:50:38AM -0700, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
>
> >> Instead of using srand() and rand() (which would make sense to wrap with
> >> git_rand() as you propose), we can simplify our lives by using a CSPRNG,
> >> which only gets initialized once, as is already the case with
> >> csprng_bytes().
> >
> > So the idea is to use csprng_bytes() everywhere instead of srand()/rand().
> >
> > I can adjust my local patch to still create git_rand(), but base it on
> > csprng_bytes() and not collide with your patch. Mimicking rand()'s behavior
> > is a simpler interface to consume.
>
> I am still ambivalent about wasting entropy for something that
> srand() would suffice, so git_rand() interface may be an welcome
> addition anyway, that serves an extra layer of indirection to
> insulate the callers from the implementation.
Sounds good to me, I'm not particularly attached to one implementation
over another. Thanks, both.
Thanks,
Taylor
User |
@@ -1708,6 +1708,22 @@ static int get_schedule_cmd(const char **cmd, int *is_available) | |||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Mon, Aug 07, 2023 at 06:51:36PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The get_random_minute() method was created to allow maintenance
> schedules to be fixed to a random minute of the hour. This randomness is
> only intended to spread out the load from a number of clients, but each
> client should have an hour between each maintenance cycle.
>
> Use get_random_minute() when constructing the schedules for launchctl.
>
> The format already includes a 'Minute' key which is modified from 0 to
> the random minute.
All looks good. I was wondering if this is something that we'd want to
allow users to opt out of, but I think that the built-in schedules are
sufficiently vague that we can afford to have a fudge factor on the
exact minute.
Thanks,
Taylor
@@ -2276,70 +2299,6 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Mon, Aug 07, 2023 at 06:51:39PM +0000, Derrick Stolee via GitGitGadget wrote:
> The diff is computed smaller by showing systemd_timer_enable_unit() move
> instead of systemd_timer_write_unit_templates().
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/gc.c | 128 +++++++++++++++++++++++++--------------------------
> 1 file changed, 64 insertions(+), 64 deletions(-)
This looks like a straightforward code movement, and diffing with
`--color-moved` verifies it as such. LGTM.
Thanks,
Taylor
@@ -2276,77 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Mon, Aug 07, 2023 at 06:51:40PM +0000, Derrick Stolee via GitGitGadget wrote:
> In order to set these schedules to a given minute, we can no longer use
> the 'hourly', 'daily', or 'weekly' strings for '<schedule>' and instead
> need to abandon the template model.
Makes sense.
> Modify the template with a custom schedule in the 'OnCalendar' setting.
> This schedule has some interesting differences from cron-like patterns,
> but is relatively easy to figure out from context. The one that might be
> confusing is that '*-*-*' is a date-based pattern, but this must be
> omitted when using 'Mon' to signal that we care about the day of the
> week. Monday is used since that matches the day used for the 'weekly'
> schedule used previously.
I think the launchd version (which uses "0" for the day of the week)
runs on Sunday, if I remember correctly. I don't think that these two
necessarily need to run on the same day of the week when configured to
run weekly.
But I figured I'd raise the question in case you did mean for them to
both run on either Sunday or Monday.
> The rest of the change involves making sure we are writing these .timer
> and .service files before initializing the schedule with 'systemctl' and
> deleting the files when we are done. Some changes are also made to share
> the random minute along with a single computation of the execution path
> of the current Git executable.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/gc.c | 82 ++++++++++++++++++++++++++++++++----------
> t/t7900-maintenance.sh | 4 ++-
> 2 files changed, 66 insertions(+), 20 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index b3ef95b10aa..5f5bb95641f 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -2299,13 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename)
> return xdg_config_home_for("systemd/user", filename);
> }
>
> -static int systemd_timer_write_unit_templates(const char *exec_path)
> +static int systemd_timer_write_unit_template(enum schedule_priority schedule,
> + const char *exec_path,
> + int minute)
> {
> char *filename;
> FILE *file;
> const char *unit;
> + char *schedule_pattern = NULL;
You should be able to drop the NULL initialization, since you assign
this value unconditionally in the switch statement below (or BUG() on an
unknown schedule type).
> + const char *frequency = get_frequency(schedule);
> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
> + char *local_service_name = xstrfmt("git-maintenance@%s.service", frequency);
> +
> + filename = xdg_config_home_systemd(local_timer_name);
>
> - filename = xdg_config_home_systemd("git-maintenance@.timer");
> if (safe_create_leading_directories(filename)) {
> error(_("failed to create directories for '%s'"), filename);
> goto error;
> @@ -2314,6 +2321,23 @@ static int systemd_timer_write_unit_templates(const char *exec_path)
> if (!file)
> goto error;
>
> + switch (schedule) {
> + case SCHEDULE_HOURLY:
> + schedule_pattern = xstrfmt("*-*-* *:%02d:00", minute);
> + break;
> +
> + case SCHEDULE_DAILY:
> + schedule_pattern = xstrfmt("*-*-* 0:%02d:00", minute);
> + break;
> +
> + case SCHEDULE_WEEKLY:
> + schedule_pattern = xstrfmt("Mon 0:%02d:00", minute);
> + break;
> +
> + default:
> + BUG("Unhandled schedule_priority");
> + }
> +
> unit = "# This file was created and is maintained by Git.\n"
> "# Any edits made in this file might be replaced in the future\n"
> "# by a Git command.\n"
> @@ -2322,12 +2346,12 @@ static int systemd_timer_write_unit_templates(const char *exec_path)
> "Description=Optimize Git repositories data\n"
> "\n"
> "[Timer]\n"
> - "OnCalendar=%i\n"
> + "OnCalendar=%s\n"
> "Persistent=true\n"
> "\n"
> "[Install]\n"
> "WantedBy=timers.target\n";
> - if (fputs(unit, file) == EOF) {
> + if (fprintf(file, unit, schedule_pattern) < 0) {
OK, this is the templating part that you were mentioning earlier. I was
wondering what we were doing fputs()-ing a string with "%i" in it
without a formatting value to fill it in with. But that "%i" pertains to
systemd's instance value, IIUC.
The rest all looks good, thanks.
Thanks,
Taylor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/7/2023 5:31 PM, Taylor Blau wrote:
> On Mon, Aug 07, 2023 at 06:51:40PM +0000, Derrick Stolee via GitGitGadget wrote:
>> Modify the template with a custom schedule in the 'OnCalendar' setting.
>> This schedule has some interesting differences from cron-like patterns,
>> but is relatively easy to figure out from context. The one that might be
>> confusing is that '*-*-*' is a date-based pattern, but this must be
>> omitted when using 'Mon' to signal that we care about the day of the
>> week. Monday is used since that matches the day used for the 'weekly'
>> schedule used previously.
>
> I think the launchd version (which uses "0" for the day of the week)
> runs on Sunday, if I remember correctly. I don't think that these two
> necessarily need to run on the same day of the week when configured to
> run weekly.
>
> But I figured I'd raise the question in case you did mean for them to
> both run on either Sunday or Monday.
I don't intend to change the day that is run as part of this change.
I think all other schedulers run on Sunday, but systemd running on Monday
is a particular detail of its "weekly" schedule.
>> -static int systemd_timer_write_unit_templates(const char *exec_path)
>> +static int systemd_timer_write_unit_template(enum schedule_priority schedule,
>> + const char *exec_path,
>> + int minute)
>> {
>> char *filename;
>> FILE *file;
>> const char *unit;
>> + char *schedule_pattern = NULL;
>
> You should be able to drop the NULL initialization, since you assign
> this value unconditionally in the switch statement below (or BUG() on an
> unknown schedule type).
Unfortunately, GCC complained about a possibly-unassigned value when I
left this unset during development, so this actually is necessary for
that compiler.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Tue, Aug 08, 2023 at 09:49:40AM -0400, Derrick Stolee wrote:
> > But I figured I'd raise the question in case you did mean for them to
> > both run on either Sunday or Monday.
>
> I don't intend to change the day that is run as part of this change.
>
> I think all other schedulers run on Sunday, but systemd running on Monday
> is a particular detail of its "weekly" schedule.
No problem -- I figured that you didn't intend on changing the days
around here, just wanted to make sure it was known that the systemd
scheduler picks a different day of the week for its weekly schedule than
the others do.
> >> -static int systemd_timer_write_unit_templates(const char *exec_path)
> >> +static int systemd_timer_write_unit_template(enum schedule_priority schedule,
> >> + const char *exec_path,
> >> + int minute)
> >> {
> >> char *filename;
> >> FILE *file;
> >> const char *unit;
> >> + char *schedule_pattern = NULL;
> >
> > You should be able to drop the NULL initialization, since you assign
> > this value unconditionally in the switch statement below (or BUG() on an
> > unknown schedule type).
>
> Unfortunately, GCC complained about a possibly-unassigned value when I
> left this unset during development, so this actually is necessary for
> that compiler.
Ah, I would have thought that GCC would be smart enough to figure out
that schedule_pattern is unconditionally initialized via the switch
statement, but I guess not. Makes sense.
Thanks,
Taylor
This branch is now known as |
This patch series was integrated into seen via git@876d065. |
@@ -2276,77 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/8/2023 5:53 AM, Phillip Wood wrote:> Hi Stolee
>
> On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>> + switch (schedule) {
>> + case SCHEDULE_HOURLY:
>> + schedule_pattern = xstrfmt("*-*-* *:%02d:00", minute);
>> + break;
>> +
>> + case SCHEDULE_DAILY:
>> + schedule_pattern = xstrfmt("*-*-* 0:%02d:00", minute);
>> + break;
>> +
>> + case SCHEDULE_WEEKLY:
>> + schedule_pattern = xstrfmt("Mon 0:%02d:00", minute);
>> + break;
>
> This is not a new issue with this patch but we run the hourly job even
> when we want to run the daily job or the weekly job and we run the daily
> job when we want to run the weekly job.
This is an excellent point! Thanks for bringing it up.
> So only one of these jobs will succeed. The cron entries are careful to
> only run one job at a time, I think it would be worth doing the same
> thing here. I think the using the following format strings would fix this.
>
> Hourly: "Tue..Sun *-*-* 1..23:00:%02d"
> Daily: "Tue..Sun *-*-* 00:00:%02d"
> Weekly: "Mon *-*-* 00:00:%02d"
I would modify this with dropping the "Tue..Sun" from the hourly schedule,
as we still want 23 runs on Mondays.
> It looks like the launchctl schedule has the same issue.
I will take a look at this and consider some additional patches to correct
these issues across both schedulers. Thank you for the attention to detail!
> One thing I've been wondering about which is related to maintenance but
> totally off-topic for this patch is that I think when auto maintenance
> is enabled we stop automatically running "gc" so how do the reflogs get
> expired?
The default maintenance schedule does not include a 'gc' run as it does
not intend to remove any data. Reflog expiration could be considered as a
separate maintenance task that might be useful in the default maintenance
schedule.
Thanks,
-Stolee
@@ -2276,77 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Stolee
On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> + switch (schedule) {
> + case SCHEDULE_HOURLY:
> + schedule_pattern = xstrfmt("*-*-* *:%02d:00", minute);
> + break;
> +
> + case SCHEDULE_DAILY:
> + schedule_pattern = xstrfmt("*-*-* 0:%02d:00", minute);
> + break;
> +
> + case SCHEDULE_WEEKLY:
> + schedule_pattern = xstrfmt("Mon 0:%02d:00", minute);
> + break;
This is not a new issue with this patch but we run the hourly job even when we want to run the daily job or the weekly job and we run the daily job when we want to run the weekly job. maintenance_run_tasks() contains
if (hold_lock_file_for_update(&lk, lock_path, LOCK_NO_DEREF) < 0) {
/*
* Another maintenance command is running.
*
* If --auto was provided, then it is likely due to a
* recursive process stack. Do not report an error in
* that case.
*/
if (!opts->auto_flag && !opts->quiet)
warning(_("lock file '%s' exists, skipping maintenance"),
lock_path);
free(lock_path);
return 0;
}
So only one of these jobs will succeed. The cron entries are careful to only run one job at a time, I think it would be worth doing the same thing here. I think the using the following format strings would fix this.
Hourly: "Tue..Sun *-*-* 1..23:00:%02d"
Daily: "Tue..Sun *-*-* 00:00:%02d"
Weekly: "Mon *-*-* 00:00:%02d"
It looks like the launchctl schedule has the same issue.
One thing I've been wondering about which is related to maintenance but totally off-topic for this patch is that I think when auto maintenance is enabled we stop automatically running "gc" so how do the reflogs get expired?
Best Wishes
Phillip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 08/08/2023 10:53, Phillip Wood wrote:
> Hourly: "Tue..Sun *-*-* 1..23:00:%02d"
> Daily: "Tue..Sun *-*-* 00:00:%02d"
> Weekly: "Mon *-*-* 00:00:%02d"
Thinking about it some more, this only fixes the problem if the computer is actually on at midnight. If the computer is switched off overnight then we still try to start three maintenance jobs at the same time when the user turns their computer on on Tuesday morning. We could stop marking the hourly job as persistent on the assumption that it will be run soon anyway but that does not solve the problem of the daily and weekly jobs running concurrently on a Tuesday morning.
Best Wishes
Phillip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/8/2023 9:56 AM, Derrick Stolee wrote:
> On 8/8/2023 5:53 AM, Phillip Wood wrote:> Hi Stolee
>> So only one of these jobs will succeed. The cron entries are careful to
>> only run one job at a time, I think it would be worth doing the same
>> thing here. I think the using the following format strings would fix this.
>>
>> Hourly: "Tue..Sun *-*-* 1..23:00:%02d"
>> Daily: "Tue..Sun *-*-* 00:00:%02d"
>> Weekly: "Mon *-*-* 00:00:%02d"
>
> I would modify this with dropping the "Tue..Sun" from the hourly schedule,
> as we still want 23 runs on Mondays.
>
>> It looks like the launchctl schedule has the same issue.
>
> I will take a look at this and consider some additional patches to correct
> these issues across both schedulers. Thank you for the attention to detail!
Taking a look, it seems that launchctl does not have this same problem.
The schedule is set via an <array> of <dict>s as follows:
case SCHEDULE_HOURLY:
repeat = "<dict>\n"
"<key>Hour</key><integer>%d</integer>\n"
"<key>Minute</key><integer>%d</integer>\n"
"</dict>\n";
for (i = 1; i <= 23; i++)
strbuf_addf(&plist, repeat, i, minute);
break;
case SCHEDULE_DAILY:
repeat = "<dict>\n"
"<key>Day</key><integer>%d</integer>\n"
"<key>Hour</key><integer>0</integer>\n"
"<key>Minute</key><integer>%d</integer>\n"
"</dict>\n";
for (i = 1; i <= 6; i++)
strbuf_addf(&plist, repeat, i, minute);
break;
This means that we create an hourly schedule for each hour 1..23
(no 0th hour means no collision with daily or weekly schedule) and
a daily schedule for each day 1..6 (no 0th day means no collision
with the weekly schedule).
Does this match your understanding?
The overlap _definitely_ exists in systemd, which I will fix.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 08/08/2023 18:24, Derrick Stolee wrote:
> On 8/8/2023 9:56 AM, Derrick Stolee wrote:
>> On 8/8/2023 5:53 AM, Phillip Wood wrote:> Hi Stolee
> >>> So only one of these jobs will succeed. The cron entries are careful to
>>> only run one job at a time, I think it would be worth doing the same
>>> thing here. I think the using the following format strings would fix this.
>>>
>>> Hourly: "Tue..Sun *-*-* 1..23:00:%02d"
>>> Daily: "Tue..Sun *-*-* 00:00:%02d"
>>> Weekly: "Mon *-*-* 00:00:%02d"
>>
>> I would modify this with dropping the "Tue..Sun" from the hourly schedule,
>> as we still want 23 runs on Mondays.
Oops, well spotted
>>> It looks like the launchctl schedule has the same issue.
>>
>> I will take a look at this and consider some additional patches to correct
>> these issues across both schedulers. Thank you for the attention to detail!
> > Taking a look, it seems that launchctl does not have this same problem.
> > The schedule is set via an <array> of <dict>s as follows:
> > case SCHEDULE_HOURLY:
> repeat = "<dict>\n"
> "<key>Hour</key><integer>%d</integer>\n"
> "<key>Minute</key><integer>%d</integer>\n"
> "</dict>\n";
> for (i = 1; i <= 23; i++)
> strbuf_addf(&plist, repeat, i, minute);
> break;
> > case SCHEDULE_DAILY:
> repeat = "<dict>\n"
> "<key>Day</key><integer>%d</integer>\n"
> "<key>Hour</key><integer>0</integer>\n"
> "<key>Minute</key><integer>%d</integer>\n"
> "</dict>\n";
> for (i = 1; i <= 6; i++)
> strbuf_addf(&plist, repeat, i, minute);
> break;
> > This means that we create an hourly schedule for each hour 1..23
> (no 0th hour means no collision with daily or weekly schedule) and
> a daily schedule for each day 1..6 (no 0th day means no collision
> with the weekly schedule).
> > Does this match your understanding?
Yes, having read it again - sorry I'd misunderstood it yesterday.
> The overlap _definitely_ exists in systemd, which I will fix.
That's great
Best Wishes
Phillip
User |
@@ -2276,77 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> > The get_random_minute() method was created to allow maintenance
> schedules to be fixed to a random minute of the hour. This randomness is
> only intended to spread out the load from a number of clients, but each
> client should have an hour between each maintenance cycle.
> > Add this random minute to the systemd integration.
I think it makes sense to keep the random minute implementation the same across all the schedulers, but we could use RandomizedDelaySec (possibly combined with FixedRandomDelay) to randomize when the job is actually run.
> This integration is more complicated than similar changes for other
> schedulers because of a neat trick that systemd allows: templating.
> > The previous implementation generated two template files with names
> of the form 'git-maintenance@.(timer|service)'. The '.timer' or
> '.service' indicates that this is a template that is picked up when we
> later specify '...@<schedule>.timer' or '...@<schedule>.service'. The
> '<schedule>' string is then used to insert into the template both the
> 'OnCalendar' schedule setting and the '--schedule' parameter of the
> 'git maintenance run' command.
> > In order to set these schedules to a given minute, we can no longer use
> the 'hourly', 'daily', or 'weekly' strings for '<schedule>' and instead
> need to abandon the template model.
I've left some comments about this below.
> @@ -2299,13 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename)
> return xdg_config_home_for("systemd/user", filename);
> }
> > -static int systemd_timer_write_unit_templates(const char *exec_path)
> +static int systemd_timer_write_unit_template(enum schedule_priority schedule,
As we're not writing template files any more I think we should rename this to systemd_timer_write_unit_file()
> + const char *exec_path,
> + int minute)
> {
> char *filename;
> FILE *file;
> const char *unit;
> + char *schedule_pattern = NULL;
> + const char *frequency = get_frequency(schedule);
> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
The "@" in the name signifies that it is a template unit which it isn't anymore so I think we want to change this to "git-maintenance-%s.timer"
> + char *local_service_name = xstrfmt("git-maintenance@%s.service", frequency);
Same change to the name here. I wonder if we could still use a template service file but that would complicate the implementation as we'd need to write three timer files but only one service file.
> [...]
> @@ -2375,13 +2399,16 @@ static int systemd_timer_write_unit_templates(const char *exec_path)
> return 0;
> > error:
> + free(schedule_pattern);
> + free(local_timer_name);
> free(filename);
> - systemd_timer_delete_unit_templates();
This looks like a change in behavior as previously we'd remove any files if there was an error rather than leaving behind a timer file without a corresponding unit file.
Looking at maintenance_start() we call maintenance_register() which disables "gc --auto" before we get to this point so if we fail to write the files we'll end up disabling any form of gc in the repository.
> [...]
> -static int systemd_timer_delete_unit_templates(void)
> +static int systemd_timer_delete_unit_template(enum schedule_priority priority)
Same suggestion as above to rename this to ..._unit_file()
> {
> + const char *frequency = get_frequency(priority);
> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
> + char *local_service_name = xstrfmt("git-maintenance@%s.service", frequency);
I'm not sure it is worth it but we could perhaps
#define SYSTEMD_UNIT_FORMAT "git-maintenance-%s.%s"
above and then these lines and the ones in systemd_timer_write_unit_file() would become
char *local_timer_name = xstrfmt(SYSTEMD_UNIT_FORMAT, frequency, "timer");
char *local_service = xstrfmt(SYSTEMD_UNIT_FORMAT, frequency, "service");
> [...]
> +static int systemd_timer_delete_unit_templates(void)
Naming again.
Best Wishes
Phillip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/8/2023 8:08 AM, Phillip Wood wrote:
> On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>
>>
>> The get_random_minute() method was created to allow maintenance
>> schedules to be fixed to a random minute of the hour. This randomness is
>> only intended to spread out the load from a number of clients, but each
>> client should have an hour between each maintenance cycle.
>>
>> Add this random minute to the systemd integration.
>
> I think it makes sense to keep the random minute implementation the same across all the schedulers, but we could use RandomizedDelaySec (possibly combined with FixedRandomDelay) to randomize when the job is actually run.
That's an interesting suggestion, but I also think it is valuable to
have consistent gaps between maintenance activities on the client, but
RandomizedDelaySec would present the possibility of 60-3540 seconds between
"hourly" maintenance runs (if I understand the option correctly).
>> @@ -2299,13 +2299,20 @@ static char *xdg_config_home_systemd(const char *filename)
>> return xdg_config_home_for("systemd/user", filename);
>> }
>> -static int systemd_timer_write_unit_templates(const char *exec_path)
>> +static int systemd_timer_write_unit_template(enum schedule_priority schedule,
>
> As we're not writing template files any more I think we should rename this to systemd_timer_write_unit_file()
Good point. I have adjusted all the names in my next version.
>> + const char *exec_path,
>> + int minute)
>> {
>> char *filename;
>> FILE *file;
>> const char *unit;
>> + char *schedule_pattern = NULL;
>> + const char *frequency = get_frequency(schedule);
>> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
>
> The "@" in the name signifies that it is a template unit which it isn't anymore so I think we want to change this to "git-maintenance-%s.timer"
I'll also take your SYSTEMD_UNIT_FORMAT macro suggestion to simplify things.
>> + char *local_service_name = xstrfmt("git-maintenance@%s.service", frequency);
>
> Same change to the name here. I wonder if we could still use a template service file but that would complicate the implementation as we'd need to write three timer files but only one service file.
Since we execute systemctl only passing the .timer filename, I think we'd
need to keep the '@' symbol in the name if we wanted to use .schedule
templates. Best to keep things simple and abandon templates completely.
>> [...]
>> @@ -2375,13 +2399,16 @@ static int systemd_timer_write_unit_templates(const char *exec_path)
>> return 0;
>> error:
>> + free(schedule_pattern);
>> + free(local_timer_name);
>> free(filename);
>> - systemd_timer_delete_unit_templates();
>
> This looks like a change in behavior as previously we'd remove any files if there was an error rather than leaving behind a timer file without a corresponding unit file.
The callers take care of deleting the unit files, but there was one place
where a short-circuit could have avoided this deletion. I'll clean that up.
> Looking at maintenance_start() we call maintenance_register() which disables "gc --auto" before we get to this point so if we fail to write the files we'll end up disabling any form of gc in the repository.
Adding this to the list of cleanups at the end. Thanks.
I appreciate the careful review!
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 8/8/2023 1:06 PM, Derrick Stolee wrote:
> On 8/8/2023 8:08 AM, Phillip Wood wrote:
>> On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
>>> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
>>
>> The "@" in the name signifies that it is a template unit which it isn't anymore so I think we want to change this to "git-maintenance-%s.timer"
>
> I'll also take your SYSTEMD_UNIT_FORMAT macro suggestion to simplify things.
As I was checking things, it turns out that we _should_ keep the '@' symbol
if only to make sure that our new schedule overwrites the old schedule.
The alternative is that we manually try to delete the old schedule, but that
feels like an inefficient way to do it, leaving some cruft around long-term.
For completeness, here is what I did to check:
$ systemctl --user list-timers
NEXT LEFT LAST PASSED UNIT ACTIVATES
Tue 2023-08-08 13:13:00 EDT 6s left n/a n/a git-maintenance-hourly.timer git-maintenance-hourly.service
Tue 2023-08-08 13:50:00 EDT 37min left Tue 2023-08-08 12:50:10 EDT 22min ago git-maintenance@hourly.timer git-maintenance@hourly.service
Wed 2023-08-09 00:13:00 EDT 11h left n/a n/a git-maintenance-daily.timer git-maintenance-daily.service
Wed 2023-08-09 00:50:00 EDT 11h left Tue 2023-08-08 09:35:31 EDT 3h 37min ago git-maintenance@daily.timer git-maintenance@daily.service
Mon 2023-08-14 00:13:00 EDT 5 days left n/a n/a git-maintenance-weekly.timer git-maintenance-weekly.service
Mon 2023-08-14 00:50:00 EDT 5 days left Mon 2023-08-07 10:28:10 EDT 1 day 2h ago git-maintenance@weekly.timer git-maintenance@weekly.service
Do you have an alternative idea how to handle that?
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 08/08/2023 18:14, Derrick Stolee wrote:
> On 8/8/2023 1:06 PM, Derrick Stolee wrote:
>> On 8/8/2023 8:08 AM, Phillip Wood wrote:
>>> On 07/08/2023 19:51, Derrick Stolee via GitGitGadget wrote:
> >>>> + char *local_timer_name = xstrfmt("git-maintenance@%s.timer", frequency);
>>>
>>> The "@" in the name signifies that it is a template unit which it isn't anymore so I think we want to change this to "git-maintenance-%s.timer"
>>
>> I'll also take your SYSTEMD_UNIT_FORMAT macro suggestion to simplify things.
> > As I was checking things, it turns out that we _should_ keep the '@' symbol
> if only to make sure that our new schedule overwrites the old schedule.
Oh, so if the user already has scheduled maintenance set up then running "git maintenance start" adds a new set of timers. I'd not though about that.
> The alternative is that we manually try to delete the old schedule, but that
> feels like an inefficient way to do it, leaving some cruft around long-term.
This patch still changes the names of the files we write. Currently we write
$XDG_CONFIG_HOME/systemd/user/git-maintenance@.service
$XDG_CONFIG_HOME/systemd/user/git-maintenance@.timer
and this patch changes that to
$XDG_CONFIG_HOME/systemd/user/git-maintenance@hourly.service
$XDG_CONFIG_HOME/systemd/user/git-maintenance@daily.service
$XDG_CONFIG_HOME/systemd/user/git-maintenance@weekly.service
$XDG_CONFIG_HOME/systemd/user/git-maintenance@hourly.timer
$XDG_CONFIG_HOME/systemd/user/git-maintenance@daily.timer
$XDG_CONFIG_HOME/systemd/user/git-maintenance@weekly.timer
If the user has already enabled maintenance then
$XDG_CONFIG_HOME/systemd/user/timers.target.wants/git-maintenance@hourly.timer
$XDG_CONFIG_HOME/systemd/user/timers.target.wants/git-maintenance@daily.timer
$XDG_CONFIG_HOME/systemd/user/timers.target.wants/git-maintenance@weekly.timer
will exist and are all symlinks pointing to
$XDG_CONFIG_HOME/systemd/user/git-maintenance@.timer
After this patch if the user runs "git maintenance start" again then systemctl will update the symlinks tot point to the matching unit files rather than the old template file. That means the user will pick up the new schedule but we leave behind the original files that are unused.
> For completeness, here is what I did to check:
> > $ systemctl --user list-timers
> NEXT LEFT LAST PASSED UNIT ACTIVATES
> Tue 2023-08-08 13:13:00 EDT 6s left n/a n/a git-maintenance-hourly.timer git-maintenance-hourly.service
> Tue 2023-08-08 13:50:00 EDT 37min left Tue 2023-08-08 12:50:10 EDT 22min ago git-maintenance@hourly.timer git-maintenance@hourly.service
> Wed 2023-08-09 00:13:00 EDT 11h left n/a n/a git-maintenance-daily.timer git-maintenance-daily.service
> Wed 2023-08-09 00:50:00 EDT 11h left Tue 2023-08-08 09:35:31 EDT 3h 37min ago git-maintenance@daily.timer git-maintenance@daily.service
> Mon 2023-08-14 00:13:00 EDT 5 days left n/a n/a git-maintenance-weekly.timer git-maintenance-weekly.service
> Mon 2023-08-14 00:50:00 EDT 5 days left Mon 2023-08-07 10:28:10 EDT 1 day 2h ago git-maintenance@weekly.timer git-maintenance@weekly.service
> > Do you have an alternative idea how to handle that?
I think we should stick with the names as you have them. It might be worth keeping the service file as a template so we only write the new timer files and have a reason to use the "@" naming scheme. We could update systemd_timer_setup_units() to delete git-maintenance@.timer if we successfully enable the new timer unit files.
Sorry for the confusion, I should have thought about the user running "git maintenance start" for a second time.
Best Wishes
Phillip
14e340b
to
958f644
Compare
This patch series was integrated into seen via git@6ebb0ba. |
There was a status update in the "New Topics" section about the branch Allow "git maintenance" schedule to be randomly distributed. Still under discussion. Expecting a reroll. source: <pull.1567.git.1691434300.gitgitgadget@gmail.com> |
958f644
to
66fb522
Compare
66fb522
to
c09bde0
Compare
When we initially created background maintenance -- with its hourly, daily, and weekly schedules -- we considered the effects of all clients launching fetches to the server every hour on the hour. The worry of DDoSing server hosts was noted, but left as something we would consider for a future update. As background maintenance has gained more adoption over the past three years, our worries about DDoSing the big Git hosts has been unfounded. Those systems, especially those serving public repositories, are already resilient to thundering herds of much smaller scale. However, sometimes organizations spin up specific custom server infrastructure either in addition to or on top of their Git host. Some of these technologies are built for a different range of scale, and can hit concurrency limits sooner. Organizations with such custom infrastructures are more likely to recommend tools like `scalar` which furthers their adoption of background maintenance. To help solve for this, create get_random_minute() as a method to help Git select a random minute when creating schedules in the future. The integrations with this method do not yet exist, but will follow in future changes. To avoid multiple sources of randomness in the Git codebase, create a new helper function, git_rand(), that returns a random uint32_t. This is similar to how rand() returns a random nonnegative value, except it is based on csprng_bytes() which is cryptographic and will return values larger than RAND_MAX. One thing that is important for testability is that we notice when we are under a test scenario and return a predictable result. The schedules themselves are not checked for this value, but at least one launchctl test checks that we do not unnecessarily reboot the schedule if it has not changed from a previous version. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance schedules to be fixed to a random minute of the hour. This randomness is only intended to spread out the load from a number of clients, but each client should have an hour between each maintenance cycle. Use get_random_minute() when constructing the schedules for launchctl. The format already includes a 'Minute' key which is modified from 0 to the random minute. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance schedules to be fixed to a random minute of the hour. This randomness is only intended to spread out the load from a number of clients, but each client should have an hour between each maintenance cycle. Add this random minute to the Windows scheduler integration. We need only to modify the minute value for the 'StartBoundary' tag across the three schedules. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance schedules to be fixed to a random minute of the hour. This randomness is only intended to spread out the load from a number of clients, but each client should have an hour between each maintenance cycle. Add this random minute to the cron integration. The cron schedule specification starts with a minute indicator, which was previously inserted as the "0" string but now takes the given minute as an integer parameter. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The systemd_timer_write_unit_templates() method writes a single template that is then used to start the hourly, daily, and weekly schedules with systemd. However, in order to schedule systemd maintenance on a given minute, these templates need to be replaced with specific schedules for each of these jobs. Before modifying the schedules, move the writing method above the systemd_timer_enable_unit() method, so we can write a specific schedule for each unit. The diff is computed smaller by showing systemd_timer_enable_unit() and systemd_timer_delete_units() move instead of systemd_timer_write_unit_templates() and systemd_timer_delete_unit_templates(). Signed-off-by: Derrick Stolee <derrickstolee@github.com>
The get_random_minute() method was created to allow maintenance schedules to be fixed to a random minute of the hour. This randomness is only intended to spread out the load from a number of clients, but each client should have an hour between each maintenance cycle. Add this random minute to the systemd integration. This integration is more complicated than similar changes for other schedulers because of a neat trick that systemd allows: templating. The previous implementation generated two template files with names of the form 'git-maintenance@.(timer|service)'. The '.timer' or '.service' indicates that this is a template that is picked up when we later specify '...@<schedule>.timer' or '...@<schedule>.service'. The '<schedule>' string is then used to insert into the template both the 'OnCalendar' schedule setting and the '--schedule' parameter of the 'git maintenance run' command. In order to set these schedules to a given minute, we can no longer use the 'hourly', 'daily', or 'weekly' strings for '<schedule>' and instead need to abandon the template model for the .timer files. We can still use templates for the .service files. For this reason, we split these writes into two methods. Modify the template with a custom schedule in the 'OnCalendar' setting. This schedule has some interesting differences from cron-like patterns, but is relatively easy to figure out from context. The one that might be confusing is that '*-*-*' is a date-based pattern, but this must be omitted when using 'Mon' to signal that we care about the day of the week. Monday is used since that matches the day used for the 'weekly' schedule used previously. Now that the timer files are not templates, we might want to abandon the '@' symbol in the file names. However, this would cause users with existing schedules to get two competing schedules due to different names. The work to remove the old schedule name is one thing that we can avoid by keeping the '@' symbol in our unit names. Since we are locked into this name, it makes sense that we keep the template model for the .service files. The rest of the change involves making sure we are writing these .timer and .service files before initializing the schedule with 'systemctl' and deleting the files when we are done. Some changes are also made to share the random minute along with a single computation of the execution path of the current Git executable. In addition, older Git versions may have written a 'git-maintenance@.timer' template file. Be sure to remove this when successfully enabling maintenance (or disabling maintenance). Signed-off-by: Derrick Stolee <derrickstolee@github.com>
0fdc1f0
to
f0c0f6e
Compare
This patch series was integrated into seen via git@1c58984. |
This patch series was integrated into seen via git@85fde1c. |
/submit |
Submitted as pull.1567.v2.git.1691699987.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
@@ -2355,6 +2348,23 @@ static int systemd_timer_write_unit_templates(const char *exec_path) | |||
if (!file) | |||
goto error; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The 'git maintenance run' command prevents concurrent runs in the same
> repository using a 'maintenance.lock' file. However, when using systemd
> the hourly maintenance runs the same time as the daily and weekly runs.
> (Similarly, daily maintenance runs at the same time as weekly
> maintenance.) These competing commands result in some maintenance not
> actually being run.
>
> This overlap was something we could not fix until we made the recent
> change to not use the builting 'hourly', 'daily', and 'weekly' schedules
> in systemd. We can adjust the schedules such that:
>
> 1. Hourly runs avoid the 0th hour.
> 2. Daily runs avoid Monday.
Simple and stupid is good. When I read the problem description, I
wondered if the hour in which daily jobs are run will also be
dispersed (so that the load of the server that helps daily tasks at
the clients will not be concentrated in a single hour out of 24
hours of a day), but hopefully the load is still spread within that
60 minutes evenly, which probably is good enough.
>
> This will keep maintenance runs from colliding when using systemd.
>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/gc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index e52129e485c..6f8df366fbe 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -2350,11 +2350,11 @@ static int systemd_timer_write_timer_file(enum schedule_priority schedule,
>
> switch (schedule) {
> case SCHEDULE_HOURLY:
> - schedule_pattern = xstrfmt("*-*-* *:%02d:00", minute);
> + schedule_pattern = xstrfmt("*-*-* 1..23:%02d:00", minute);
> break;
>
> case SCHEDULE_DAILY:
> - schedule_pattern = xstrfmt("*-*-* 0:%02d:00", minute);
> + schedule_pattern = xstrfmt("Tue..Sun *-*-* 0:%02d:00", minute);
> break;
>
> case SCHEDULE_WEEKLY:
@@ -1708,6 +1708,16 @@ static int get_schedule_cmd(const char **cmd, int *is_available) | |||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Taylor Blau wrote (reply to this):
On Thu, Aug 10, 2023 at 08:39:40PM +0000, Derrick Stolee via GitGitGadget wrote:
> ---
> builtin/gc.c | 10 ++++++++++
> wrapper.c | 10 ++++++++++
> wrapper.h | 6 ++++++
> 3 files changed, 26 insertions(+)
Very nice, this patch LGTM. Thanks for working on it!
Thanks,
Taylor
This patch series was integrated into seen via git@535c56d. |
There was a status update in the "Cooking" section about the branch Allow "git maintenance" schedule to be randomly distributed. Looking good. source: <pull.1567.v2.git.1691699987.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@43b2616. |
This patch series was integrated into seen via git@34adcfa. |
@@ -2276,77 +2292,54 @@ static char *xdg_config_home_systemd(const char *filename) | |||
return xdg_config_home_for("systemd/user", filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Stolee
This all looks good to me now, I've left a comment on the test but I think it is probably fine to go in as it is.
> -static int systemd_timer_write_unit_templates(const char *exec_path)
> +/*
> + * Write the schedule information into a git-maintenance@<schedule>.timer
> + * file using a custom minute. This timer file cannot use the templating
> + * system, so we generate a specific file for each.
> + */
Thanks for adding comments to the functions that write the timer and service files, they really helpful especially for readers who are not that familiar with systemd.
> +static int systemd_timer_delete_unit_files(void)
> +{
> + systemd_timer_delete_stale_timer_templates();
> +
> + /* Purposefully not short-circuited to make sure all are called. */
> + return systemd_timer_delete_timer_file(SCHEDULE_HOURLY) |
> + systemd_timer_delete_timer_file(SCHEDULE_DAILY) |
> + systemd_timer_delete_timer_file(SCHEDULE_WEEKLY) |
> + systemd_timer_delete_service_template();
Using "|" rather than "||" is a nice touch.
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 487e326b3fa..9ffe76729e6 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -744,7 +744,15 @@ test_expect_success 'start and stop Linux/systemd maintenance' '
> # start registers the repo
> git config --get --global --fixed-value maintenance.repo "$(pwd)" &&
> > - test_systemd_analyze_verify "systemd/user/git-maintenance@.service" &&
> + for schedule in hourly daily weekly
> + do
> + test_path_is_file "systemd/user/git-maintenance@$schedule.timer" || return 1
> + done &&
> + test_path_is_file "systemd/user/git-maintenance@.service" &&
> +
> + test_systemd_analyze_verify "systemd/user/git-maintenance@hourly.service" &&
> + test_systemd_analyze_verify "systemd/user/git-maintenance@daily.service" &&
> + test_systemd_analyze_verify "systemd/user/git-maintenance@weekly.service" &&
As we only write the template service file I'm not sure what we gain by these three checks but they don't seem to be doing any harm.
Best Wishes
Phillip
@@ -2355,6 +2348,23 @@ static int systemd_timer_write_unit_templates(const char *exec_path) | |||
if (!file) | |||
goto error; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Stolee
On 10/08/2023 21:39, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> > The 'git maintenance run' command prevents concurrent runs in the same
> repository using a 'maintenance.lock' file. However, when using systemd
> the hourly maintenance runs the same time as the daily and weekly runs.
> (Similarly, daily maintenance runs at the same time as weekly
> maintenance.) These competing commands result in some maintenance not
> actually being run.
> > This overlap was something we could not fix until we made the recent
> change to not use the builting 'hourly', 'daily', and 'weekly' schedules
> in systemd. We can adjust the schedules such that:
> > 1. Hourly runs avoid the 0th hour.
> 2. Daily runs avoid Monday.
> > This will keep maintenance runs from colliding when using systemd.
Thanks for fixing this, the patch looks good
Best Wishes
Phillip
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/gc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> > diff --git a/builtin/gc.c b/builtin/gc.c
> index e52129e485c..6f8df366fbe 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -2350,11 +2350,11 @@ static int systemd_timer_write_timer_file(enum schedule_priority schedule,
> > switch (schedule) {
> case SCHEDULE_HOURLY:
> - schedule_pattern = xstrfmt("*-*-* *:%02d:00", minute);
> + schedule_pattern = xstrfmt("*-*-* 1..23:%02d:00", minute);
> break;
> > case SCHEDULE_DAILY:
> - schedule_pattern = xstrfmt("*-*-* 0:%02d:00", minute);
> + schedule_pattern = xstrfmt("Tue..Sun *-*-* 0:%02d:00", minute);
> break;
> > case SCHEDULE_WEEKLY:
@@ -2606,9 +2728,12 @@ static int maintenance_start(int argc, const char **argv, const char *prefix) | |||
opts.scheduler = resolve_scheduler(opts.scheduler); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Stolee
On 10/08/2023 21:39, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
> > When running 'git maintenance start', the current pattern is to
> configure global config settings to enable maintenance on the current
> repository and set 'maintenance.auto' to false and _then_ to set up the
> schedule with the system scheduler.
> > This has a problematic error condition: if the scheduler fails to
> initialize, the repository still will not use automatic maintenance due
> to the 'maintenance.auto' setting.
> > Fix this gap by swapping the order of operations. If Git fails to
> initialize maintenance, then the config changes should never happen.
The commit message and patch look good to me
Thanks
Phillip
> Reported-by: Phillip Wood <phillip.wood123@gmail.com>
> Signed-off-by: Derrick Stolee <derrickstolee@github.com>
> ---
> builtin/gc.c | 5 ++++-
> t/t7900-maintenance.sh | 13 +++++++++++++
> 2 files changed, 17 insertions(+), 1 deletion(-)
> > diff --git a/builtin/gc.c b/builtin/gc.c
> index 6f8df366fbe..fe5f871c493 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -2728,9 +2728,12 @@ static int maintenance_start(int argc, const char **argv, const char *prefix)
> opts.scheduler = resolve_scheduler(opts.scheduler);
> validate_scheduler(opts.scheduler);
> > + if (update_background_schedule(&opts, 1))
> + die(_("failed to set up maintenance schedule"));
> +
> if (maintenance_register(ARRAY_SIZE(register_args)-1, register_args, NULL))
> warning(_("failed to add repo to global config"));
> - return update_background_schedule(&opts, 1);
> + return 0;
> }
> > static const char *const builtin_maintenance_stop_usage[] = {
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index 9ffe76729e6..e56f5980dc4 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -849,4 +849,17 @@ test_expect_success 'register and unregister bare repo' '
> )
> '
> > +test_expect_success 'failed schedule prevents config change' '
> + git init --bare failcase &&
> +
> + for scheduler in crontab launchctl schtasks systemctl
> + do
> + GIT_TEST_MAINT_SCHEDULER="$scheduler:false" &&
> + export GIT_TEST_MAINT_SCHEDULER &&
> + test_must_fail \
> + git -C failcase maintenance start &&
> + test_must_fail git -C failcase config maintenance.auto || return 1
> + done
> +'
> +
> test_done
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of #593 now that the upstream change required rebasing to resolve header conflicts.
This patch series was integrated into seen via git@7dadcd0. |
This patch series was integrated into seen via git@01f8276. |
There was a status update in the "Cooking" section about the branch Allow "git maintenance" schedule to be randomly distributed. Will merge to 'next'. source: <pull.1567.v2.git.1691699987.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@644226b. |
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of git-for-windows#593 now that the upstream change required rebasing to resolve header conflicts.
This patch series was integrated into seen via git@54f9e7c. |
This patch series was integrated into seen via git@b721a13. |
This patch series was integrated into next via git@8d17424. |
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of git-for-windows#593 now that the upstream change required rebasing to resolve header conflicts.
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of git#593 now that the upstream change required rebasing to resolve header conflicts.
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of #593 now that the upstream change required rebasing to resolve header conflicts.
* [x] This is an early version of work already under review upstream. See gitgitgadget#1567 for the version submitted upstream. This is an early version for the `microsoft/git` fork so we could potentially include it in a release to our Microsoft partners. Upgrading to a version with these changes may help with some of the auth problems plaguing GVFS Cache Servers. (It's not a long-term fix, but would avoid having fingers pointed in this direction in the future.) At least on Windows, we should re-run `git maintenance start` as part of `scalar reconfigure` during the installer. On other platforms, we will need to rely on users slowly rotating through their local repositories. > This PR is a recreation of #593 now that the upstream change required rebasing to resolve header conflicts.
When we initially created background maintenance -- with its hourly, daily, and weekly schedules -- we considered the effects of all clients launching fetches to the server every hour on the hour. The worry of DDoSing server hosts was noted, but left as something we would consider for a future update.
As background maintenance has gained more adoption over the past three years, our worries about DDoSing the big Git hosts has been unfounded. Those systems, especially those serving public repositories, are already resilient to thundering herds of much smaller scale.
However, sometimes organizations spin up specific custom server infrastructure either in addition to or on top of their Git host. Some of these technologies are built for a different range of scale, and can hit concurrency limits sooner. Organizations with such custom infrastructures are more likely to recommend tools like
scalar
which furthers their adoption of background maintenance.This series attempts to help by spreading out background maintenance to a random minute of the hour. This minute is selected during
git maintenance start
, and the same minute applies to each of the three schedules.This isn't a full solution to this problem, as the custom infrastructure needs to be resilient to bursts of activity, but at least this will help somewhat.
Each of the integrated schedulers needs a different way of integrating the random minute. The most problematic is
systemd
, since our integration had a clever use of templates to write one schedule that inserted thehourly
,daily
, andweekly
schedules as a string into the template. This needs some refactoring before the custom minute could be inserted.For the most part, each scheduler's integration is relatively simple. That is, until we get to the
systemd
integration. That integration made use of a clever templating technique that is no longer possible when making this adjustment.Patches 5-7 involve
systemd
, though patch 5 is just a move of code (without edits) to make the diff in patch 6 somewhat simpler (it's still complicated due to templating changes). Patch 7 fixes an issue where the systemd schedules overlap.Patch 8 fixes an issue where config changes persist even if the scheduler fails to initialize. Thanks for noticing, Philip!
Updates in version 2
Thanks,
-Stolee
cc: gitster@pobox.com
cc: sandals@crustytoothpaste.net
cc: lenaic@lhuard.fr
cc: Taylor Blau me@ttaylorr.com
cc: Phillip Wood phillip.wood123@gmail.com