-
-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically remove stale locks #1674
Conversation
Current coverage is 84.37% (diff: 76.00%)@@ master #1674 diff @@
==========================================
Files 20 20
Lines 6609 6664 +55
Methods 0 0
Messages 0 0
Branches 1125 1136 +11
==========================================
+ Hits 5582 5623 +41
- Misses 754 767 +13
- Partials 273 274 +1
|
9ea8a64
to
487b5d1
Compare
@@ -323,12 +323,15 @@ def borg_cmd(self, args, testing): | |||
opts.append('--critical') | |||
else: | |||
raise ValueError('log level missing, fix this code') | |||
env_vars = [] | |||
if bool(os.environ.get('BORG_UNIQUE_HOSTNAME')): | |||
env_vars.append('BORG_UNIQUE_HOSTNAME=yes') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a while ago I changed semantics of env vars. in early borg and attic, non-present env vars meant False and any present ones meant True (even if e.g. FOO=False or FOO=0 was used - they would be evaluated as truish).
after my change, env vars are handled like yes() input. So, it is FOO=0 and FOO=1 or FOO=yes and FOO=no now.
So, I think it should be:
unique_hostname = os.environ.get('BORG_UNIQUE_HOSTNAME')
if unique_hostname is not None:
env_vars.append('BORG_UNIQUE_HOSTNAME=%s' % unique_hostname)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there are some other places that do bool(os.environ.get('B_U_H')), I'll change those as well for consistent behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I added prompt=True|False to yes() to facilitate this. Output now looks like this:
Remote: Enabled removal of stale repository locks
Remote: Killed stale lock tiger.3300-0.
Remote: Removed stale exclusive roster lock for pid 3300.
Remote: Removed stale exclusive roster lock for pid 3300.
Enabled removal of stale cache locks
Killed stale lock tiger.3292-0.
Removed stale exclusive roster lock for pid 3292.
Removed stale exclusive roster lock for pid 3292.
The doubled roster locks are because this is done on loading, so if it's not immediately saved that gets doubled. I think we could just move those down to INFO log level, or remove them.
else: # pragma: no cover | ||
remote_path = args.remote_path or os.environ.get('BORG_REMOTE_PATH', 'borg') | ||
remote_path = replace_placeholders(remote_path) | ||
return [remote_path, 'serve'] + opts | ||
return env_vars + [remote_path, 'serve'] + opts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this is no-cover: did you test this manually, using a ssh: repo location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it works.
dead_lock = ExclusiveLock(lockpath, id=dead_id).acquire() | ||
with ExclusiveLock(lockpath, id=our_id, kill_stale_locks=True): | ||
with pytest.raises(NotMyLock): | ||
dead_lock.release() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
69 can only acquire the lock, if there is no other lock for lockpath.
but there is dead_lock. as it is dead, and kill_stale_locks, it gets killed.
but why does 71 raise NotMyLock then and not NotLocked?
and why is it allowed to enter the with-block if the lock is still considered present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NotMyLock because there is an active lock and it belongs to 69. NotLocked would be the result if we de-indentend 70/71 one level, in which case 69 would be unlocked, and since 68 is dead, there is no lock = NotLocked.
a0345af
to
6b88da2
Compare
# the lock made by the parent, so it needs to use the same PID for that. | ||
_pid = os.getpid() | ||
# XXX this sometimes requires live internet access for issuing a DNS query in the background. | ||
_hostname = socket.gethostname() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's a way around that API-wise, if the machine is configured that way.
Feature-wise this is done. Output could use some polishing perhaps:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some small nitpicks
@@ -71,6 +71,8 @@ def __init__(self, repository, key, manifest, path=None, sync=True, do_files=Fal | |||
self.key = key | |||
self.manifest = manifest | |||
self.path = path or os.path.join(get_cache_dir(), repository.id_str) | |||
self.unique_hostname = yes(true_msg='Enabled removal of stale cache locks', | |||
env_var_override='BORG_UNIQUE_HOSTNAME', prompt=False, env_msg=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BORG_UNIQUE_HOSTNAME is a bit ambiguous / unintuitive maybe.
It is meant as boolean, but from the name, it could also be meant as a var holding a hostname.
BORG_HOSTNAME_IS_UNIQUE maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also affects the python variable names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
|
||
return local_pid_alive(pid) | ||
|
||
def local_pid_alive(pid): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8: add one more blank line above this one.
try: | ||
# This doesn't work on Windows. | ||
# This does not kill anything, 0 means "see if we can send a signal to this process or not". | ||
# Possible errors: No such process (== stale lock) or permission denied (not a stale lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a dot at end of sentence.
if err.errno == errno.ESRCH: | ||
# ESRCH = no such process | ||
return False | ||
# Any other error (eg. permissions) mean that the process ID refers to a live process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... means ... (and add a dot at the end).
Note: I reviewed the log levels and they seem ok as they are. Users should get notified if there are stale locks because this points to a problem in their setup. Not sure though if we always want to read the "Enabled removal ..." msg from yes() all the time. |
They're quick and distinct feedback that it worked / is set up correctly (on both ends). I guess we could move it to INFO level, moving the emission of the message from yes() to the call site. I'll apply that and the other suggestions you made. |
squash, merge? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, squash some, merge.
If BORG_UNIQUE_HOSTNAME shell variable is set, stale locks in both cache and repository are deleted. Stale lock is defined as a lock that's originating from the same hostname as us, and correspond to a pid that no longer exists. This fixes borgbackup#562
(testability)
10a6771
to
676e69c
Compare
@verygreen :) |
Supersedes #1246
Fixes #562
Edit travis #3000 :)