-
Notifications
You must be signed in to change notification settings - Fork 296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Repository locking 🔒 #1292
base: main
Are you sure you want to change the base?
Repository locking 🔒 #1292
Conversation
d345e76
to
d3daefd
Compare
High level looks sane at a brief glance, but there's lots of details here. (One I have is how this intersects with worker threads we create internally) How about we avoid lots of |
Yeah, I tried to think about that a lot, but I couldn't totally come to grips with it. Basically, the worker threads are OK if you only ever acquire a shared lock. Which (I believe) is the case now. Otherwise, you have to arrange to take the lock before the worker is spawned and ensure that the worker doesn't go through a path that acquires locks. I spent a lot of time thinking about that how to address that. I had another version that tried to detect if the lock was being acquired in the same thread by stashing the main context in use when the repo was opened and skipping if the main context in use when the lock was acquired wasn't the same. But that only tells you if the context is the same, not if the thread is the same. (I might be missing something there. I am definitely not the The only other thing I could think of was having a dedicated locking thread, but I then didn't know how to address the lock stack being updated out of order by multiple threads.
You mean in
Not sure I follow here. What would the setting be? |
Yeah, I mean the binaries would use a cmdprivate API to do locking, which we could then enable by setting e.g.
or so in the repo config. (Which itself we could label as experimental I guess). That'd allow us to land most of these patches and gain experience with it a bit, even on builds that don't |
Oh, well I only added actual locking in However, the repo config option could be used to make I also see that the CI failed immediately because I haven't actually tested without |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. I haven’t reviewed the locking code itself in detail yet.
man/ostree.repo-config.xml
Outdated
<listitem> | ||
<para> | ||
Integer value controlling the default repository locking | ||
timeout in seconds. If the value is <literal>-1</literal>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the value is 0 or <-1? Could go with “If the value is negative then ostree will block…”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0 means don't wait - true non-blocking. I thought that was implied, but I could spell it out. < -1 is an error. I could just do all negative values, but I liked having the explicit value. It also allowed me to later use -2 as a sentinel.
man/ostree.xml
Outdated
Some <command>ostree</command> commands require the | ||
repository to be locked while operating. This option sets | ||
the repository lock timeout to TIMEOUT seconds. If TIMEOUT | ||
is <literal>-1</literal>, then the locking will block |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly here: what if it’s <-1 or 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/libostree/ostree-repo-private.h
Outdated
|
||
typedef OstreeRepo OstreeRepoAutoLock; | ||
OstreeRepoAutoLock * ostree_repo_auto_lock_push (OstreeRepo *self, | ||
gboolean exclusive, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to use an enum or flags instead of a boolean here, since it’s typically not obvious what the sense of a boolean is. Does TRUE
mean ‘make it exclusive’ or ‘make it shared’?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about that later after doing TRUE
/FALSE
so many times. It seemed sort of silly to have an enum that would only ever have 2 exclusive states. I guess:
typedef enum {
OSTREE_REPO_LOCK_SHARED,
OSTREE_REPO_LOCK_EXCLUSIVE
} OstreeRepoLockType;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think people (me included) should get out of the habit of thinking of two-element enums as ‘silly’, if they improve code readability. That said, there’s the possibility that this could be extended in future to include more flags or enum elements for more esoteric extensions to the locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll do it. That said, I would like this to remain a simple enum and not migrate into some flags setup later.
@@ -37,6 +37,9 @@ G_BEGIN_DECLS | |||
#define _OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS 8 | |||
#define _OSTREE_MAX_OUTSTANDING_DELTAPART_REQUESTS 2 | |||
|
|||
/* Keep this in sync with the man page for the core.lock-timeout option. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that it’s in seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
src/libostree/ostree-repo-private.h
Outdated
@@ -153,6 +157,7 @@ struct OstreeRepo { | |||
guint64 tmp_expiry_seconds; | |||
gchar *collection_id; | |||
gboolean add_remotes_config_dir; /* Add new remotes in remotes.d dir */ | |||
gint lock_timeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe name it lock_timeout_seconds
to make the units more obvious? Same for the #define
above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
src/libostree/ostree-repo-prune.c
Outdated
g_autoptr(OstreeRepoAutoLock) lock = NULL; | ||
lock = ostree_repo_auto_lock_push (self, TRUE, cancellable, error); | ||
if (!lock) | ||
return FALSE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thought: if you changed the definition of ostree_repo_auto_lock_push()
to:
gboolean ostree_repo_auto_lock_push (OstreeRepo *self, OstreeRepoLockFlags flags, OstreeRepoAutoLock **out_lock, GCancellable *cancellable, GError **error)
, then all the uses of it could simplify to:
g_autoptr(OstreeRepoAutoLock) lock = NULL;
if (!ostree_repo_auto_lock_push (self, OSTREE_REPO_LOCK_EXCLUSIVE, &lock, cancellable, error))
return FALSE;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way on this and did consider that interface. I do generally like ostree's boolean return values with supplemental information in out params. In this case it seemed overkill to me, though, as both the return value and the out param would be transmitting the same information. Would the out param be a requirement? it would be useless to use the function otherwise. Not that we're counting cycles in ostree, but it's also less efficient to put another parameter on the stack and fill it in.
You can, of course, do the call and check in one line if you want with the current interface:
if (!(lock = ostree_repo_auto_lock_push(...)))
return FALSE;
That's definitely not as aesthetically pleasing as the interface you've suggested, which is also why I prefer to write it on 2 lines.
src/libostree/ostree-repo.c
Outdated
cur_state_name = "null"; | ||
} | ||
|
||
g_debug("Free lock: state=%s, depth=%u", cur_state_name, stack_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Missing space before (
(and in a couple of other places).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
src/libostree/ostree-repo.c
Outdated
* configured in the core.lock-timeout option will replace this. | ||
* | ||
* Returns: %TRUE on success, %FALSE otherwise | ||
* Since: 2017.8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since: 2017.13
(probably wrong in a few other places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bah. I actually think I was pretty good about setting it to 2017.13, but I'll skim through for that. Thanks for catching that.
src/libostree/ostree-repo.h
Outdated
/** | ||
* OstreeRepoAutoLock: (skip) | ||
* | ||
* This is simply an alias tox #OstreeRepo used for automatic lock cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/tox/to/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I saw that myself later.
Yeah...we should probably try to move fsck into the library too. But doesn't need to block this patch. Let's try to get this cleaned up and then I'm happy to merge it and we can do followup commits? |
@cgwalters Had another thought about the threads.
I have a bunch of fixups that I'll push out shortly that should address the comments so far. |
☔ The latest upstream changes (presumably 2531d8f) made this pull request unmergeable. Please resolve the merge conflicts. |
266f3f9
to
65ba93b
Compare
OK, I added a bunch of fixups that I'll squash soon, but I wanted to give a chance to see the intermediate state if anyone was interested. I believe I addressed most of the comments so far. TODO:
|
☔ The latest upstream changes (presumably 9166605) made this pull request unmergeable. Please resolve the merge conflicts. |
Are you going to have a chance to rebase this soon? If not I might be able to do it early this week. |
Yeah, I'll rebase later today. Some of the commits should be reordered a bit. |
When compiling libostree, OSTREE_ENABLE_EXPERIMENTAL_API is managed via config.h. However, g-ir-scanner doesn't use that and won't include the experimental APIs unless the macro is defined through another means. Without this, none of the experimental APIs were being included in the gir data.
Both of these were inadvertently being included in the GIR. The OstreeCmdPrivateVTable declaration ended up causing a g-ir-compiler error because I wanted to add a function that had a return type that was marked as skipped (OstreeRepoAutoLock). The ostree_cmd__private__ documentation just had the wrong name.
Currently ostree has no method of guarding against concurrent pruning. When there are multiple repo writers, it's possible to have a pull or commit race against a prune and end up with missing objects. This adds an file based repo locking mechanism. The intention is to take a shared lock when writing objects and an exclusive lock when deleting them. In order to make use of the locking throughout the library in a fine grained fashion, the lock acts recursively with a stack of lock states. If the lock becomes exclusive, it will stay in that state until the stack is unwound past the initial exclusive push. The file locking is similar to GLnxLockFile in that it uses open file descriptor locks but falls back to flock when needed. The lock also attempts to be thread safe by storing the lock state in thread local storage with GPrivate. This means that each thread will have an independent lock for each repository it opens. There are some drawbacks to that, but it seemed impossible to manage the lock state coherently in the face of multithreaded access. The API is a push/pop interface in accordance with the recursive nature of the locking. The push interface uses an enum that's translated to LOCK_SH or LOCK_EX as needed. Both interfaces use an internal timeout field to decide whether to manage the lock in a blocking or non-blocking fashion. The intention is to allow ostree applications as well as administrators to control this timeout. For now, the default is a 30 second timeout. Note that the timeout is handled synchronously in thread since the lock is maintained in thread local storage. I.e., the thread that acquires the lock needs to be the same thread that runs the operation. There may be a way to offer an asynchronous version, but it's not clear exactly how that would work since it would likely involve a separate thread that invokes a callback when the locking operation completes. https://bugzilla.gnome.org/show_bug.cgi?id=759442
Define an auto cleanup handler for use with repo locking. This is based on the existing auto transaction cleanup. A wrapper for ostree_repo_lock_push() is added with it. The intended usage is like so: g_autoptr(OstreeRepoAutoLock) lock = NULL; lock = ostree_repo_auto_lock_push (repo, lock_type, cancellable, error); if (!lock) return FALSE; The functions and type are marked to be skipped by introspection since I can't see them being usable from bindings.
Take a shared repo lock during a transaction to ensure that another process doesn't delete objects.
Add exclusive repository locking to all the pruning entry points. This ensures that objects and deltas will not be removed while another process is writing to the repository.
Test that concurrent commits and prunes can succeed. Mostly this is a check that the new locking works correctly and the concurrent processes will properly wait until they've acquired the appropriate repository lock.
Set the PYTHONUNBUFFERED environment variable during tests so that python leaves stdout unbuffered. This is helpful when reading logs for failures since the interleaved stdout and stderr will generally come out in the right order. It's not perfect since tap-driver.sh does some special redirection to the log file, but it's an improvement.
When checking out objects, take a shared lock to ensure objects aren't deleted. If the repo isn't writable, the locking is a noop. This ensures that the repository owner can make robust checkouts, but it still allows other users to make a likely to be successful checkout if the repo mode supports it. During garbage collection, take an exclusive lock while deleting uncompressed objects.
Add a repository option, core.lock-timeout that controls the default lock timeout used. This allows administrators to have some influence over locking operations. For instance, critical repositories where operations should not fail could be configured to block when managing locks by setting lock-timeout to -1.
Exercise the ostree_repo_lock_push() and ostree_repo_lock_pop() APIs directly via pygi. This is a better unit test than relying on concurrent commit and prunes to check that the locking works correctly.
Ensure that the commit object doesn't get deleted while writing out the detached metadata. This is primarily to guard against dangling commitmeta files if racing with a prune since the commit object could still be updated while this is happening.
This ensures that the commits and deltas don't get deleted while generating the summary metadata.
Make sure that the summary file does not get replaced while generating summary.sig by taking an exclusive lock. It would be bad to publish a mismatching signature.
Make sure that neither commit nor any of the objects they reference are deleted while generating a delta.
Allow the ostree commands to use ostree_repo_set_lock_timeout() even when ostree is built without experimental API. When it's no longer experimental, this should be removed.
Builtins that require repository locking can add options with the OSTREE_BUILTIN_FLAG_LOCKING flag set to add a --lock-timeout option. This allows the user to control how long to wait to acquire the repository lock. The cmdprivate version of ostree_repo_set_lock_timeout() is used until it's no longer experimental.
All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.
Allow the ostree commands to use ostree_repo_auto_lock_push() and ostree_repo_auto_lock_cleanup() even when ostree is built without experimental API. When it's no longer experimental, this should be removed. Unfortunately, this requires including ostree-repo-private.h from ostree-cmdprivate.h. That includes otutil.h, which the systemd generator build wasn't prepared to handle.
Since ostree_repo_auto_lock_cleanup is not directly available, a separate type, wrappers and autoptr declaration are needed.
Although the libostree functions will take locks as needed, its best to ensure fsck has exclusive access to the repo for checking consistency.
Admin builtins that require repository locking set the OSTREE_BUILTIN_FLAG_LOCKING flag to add a --lock-timeout option. This needs to be used in conjunction with OSTREE_BUILTIN_FLAG_NO_REPO since the the timeout can only be applied once the repo is gathered from the sysroot. This allows the user to control how long to wait to acquire the repository lock. The cmdprivate version of ostree_repo_set_lock_timeout() is used until it's no longer experimental.
All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.
65ba93b
to
7ff4b93
Compare
Make sure the number of committers is even since we only create half as many trees.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more in-depth review of the locking code. I haven’t checked the lock call sites in the various ostree
builtins or APIs (and don’t plan to; they can be tweaked over time if problems arise).
@@ -261,6 +261,10 @@ OSTree-1.0.gir: libostree-1.la Makefile | |||
OSTree_1_0_gir_EXPORT_PACKAGES = ostree-1 | |||
OSTree_1_0_gir_INCLUDES = Gio-2.0 | |||
OSTree_1_0_gir_CFLAGS = $(libostree_1_la_CFLAGS) | |||
if ENABLE_EXPERIMENTAL_API | |||
# When compiling this is set via config.h, but g-ir-scanner doesn't use that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should just be able to pass config.h
to OSTree_1_0_gir_FILES
and it will be #include
d into the temporary C file which g-ir-scanner
generates as part of the scanning process.
I think order is preserved, so list it first in OSTree_1_0_gir_FILES
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I didn't know that. I'll try that locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was separately fixed in #1322.
@@ -1,3 +1,4 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spurious blank line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address after #1343.
@@ -48,7 +49,14 @@ ostree_cmd__private__ (void) | |||
impl_ostree_generate_grub2_config, | |||
_ostree_repo_static_delta_dump, | |||
_ostree_repo_static_delta_query_exists, | |||
_ostree_repo_static_delta_delete | |||
_ostree_repo_static_delta_delete, | |||
/* Remove this when ostree_repo_set_lock_timeout is no longer experimental */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a FIXME
to this comment (and the one below) to make them more greppable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address after #1343.
@@ -20,17 +20,34 @@ | |||
#pragma once | |||
|
|||
#include "ostree-types.h" | |||
/* This is only needed to get the OstreeRepoLockType and OstreeRepoAutoLock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a FIXME
to this comment too. Similarly for other ‘remove this when no longer experimental’ comments throughout the rest of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address after #1343.
@@ -991,6 +992,16 @@ checkout_tree_at (OstreeRepo *self, | |||
g_assert (options->force_copy); | |||
} | |||
|
|||
/* Take a shared repo lock to try to ensure objects aren't deleted. If the | |||
* repo is not writable, this will be a noop and we just hope for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the repo is not writeable can we really expect objects to be deleted? Are you thinking of the situation where a repo is being operated on by processes belonging to two users: the user deleting objects has write permissions on the repository, and the user reading it does not?
We could perhaps fix this by sticking an inotify on the lock file if we only have read-only access to the repository, and bailing out if an exclusive lock is taken on the repository while we’re in a locked section. That might be overengineering things though. It would, however, be a general solution which could be implemented in the generic locking code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case is simply an unprivileged user trying to checkout from /ostree/repo or /var/lib/flatpak/repo or something like that. They should be allowed to do a checkout as they were before.
Like you say, if the repo is not writable by the current user, then they're not going to be deleting any objects. But if, say, flatpak is pruning /var/lib/flatpak/repo while you're trying to do a checkout, bad things could happen.
But that's already the case, and I don't think we should optimize for unprivileged users. I'm really just trying to make sure that multiple processes running as the repo owner can't screw each other since that's by far the common case. Since the unprivileged user won't even be able to open the lock file (it's opened 600 particularly so that only repo owners can take exclusive locks), then I don't think they'll be able to get the lock state.
An inotify watch is interesting. I'm not sure inotify can tell you when the lock state changes, though, only if it's been opened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, having done a bit of reading, I don’t think it’s possible to monitor the status of a file lock using inotify. The only way of doing it is to poll with fcntl(F_GETLK)
, but that’s horrific and racy.
The other approach I can think of is to use a world-writeable directory (somewhere in /run
, probably), which can contain the locks for many repositories, each with a deterministic name based on the repository directory’s device ID and inode.
That said, allowing an unprivileged process to take a lock on a repository it can’t write to allows priority inversion attacks, since there’s no realistic way to stop it taking an exclusive lock.
Unless we put shared locks in /run
and exclusive locks in $repo/.lock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe the way this was written where only the repo owner can manage the lock is correct. I think any attempt to allow non-owners to take a lock would be very complex and likely mean that flock-type locking can't be used.
gint64 lock_timeout = g_ascii_strtoll (lock_timeout_str, &endptr, 0); | ||
if ((lock_timeout < -1) || | ||
(lock_timeout > G_MAXINT) || | ||
(lock_timeout == 0 && endptr == lock_timeout_str)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn’t check for the case where *endptr != '\0'
, which could be caused by lock_timeout_str == "123somestring"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I hadn't thought of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The g_ascii_strto*()
are an absolute nightmare for correct error handling.
GLib recently acquired g_ascii_string_to_{un,}signed()
, which are a lot better, but unfortunately too recent for libostree to use yet. :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix after #1343. This part didn't make it into the slimmed down setup.
* represents the number of seconds the application will sleep attempting to | ||
* acquire the repository lock. | ||
* | ||
* Returns: lock timeout for the repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: maybe add , in seconds
to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix after #1343.
* Set the lock timeout for the repository. A timeout of -1 indicates that the | ||
* application will block until the lock is acquired. Otherwise, it represents | ||
* the number of seconds the application will sleep attempting to acquire the | ||
* repository lock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should mention that anything < -1
is not allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix after #1343.
* OstreeRepoLockType: | ||
* The type of repository lock to acquire. | ||
* @OSTREE_REPO_LOCK_SHARED: A shared lock | ||
* @OSTREE_REPO_LOCK_EXCLUSIVE: An exclusive lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a Since:
line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #1343.
gint64 lock_timeout = g_ascii_strtoll (value, &endptr, 0); | ||
if ((lock_timeout < -1) || | ||
(lock_timeout > G_MAXINT) || | ||
(lock_timeout == 0 && endptr == value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the other g_ascii_strtoll()
call, this should check for *endptr != '\0'
as another error condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix after #1343.
☔ The latest upstream changes (presumably 519b30b) made this pull request unmergeable. Please resolve the merge conflicts. |
Would be happy to start trying to take this - perhaps it's simplest to split this up into a PR that adds the internal locking API with just a few uses - maybe just prune and commit? Then we can gradually trickle in changes to enable it in different places like fsck as we test? |
Sorry, got busy with a bunch of other stuff but I should be getting back to this today or tomorrow. I'll split it up into 2 PRs like you say. Some other stuff I've thought of in the meantime:
|
I opened #1343 with the slimmed down initial setup as requested. I still passes the test suite, although I haven't really used it in real applications. |
How do you want to proceed with rebasing this? Are you planning to work on it anytime soon? |
Sorry, I had to take off my ostree developer hat for a while. Next week I'll get back on this and flesh it out. Does that timeline work or were you looking for something sooner? |
No pressure on the timeline! Currently locking is disabled by default. I am happy to help out a bit if we can figure out how to subdivide work. Just trying to keep things moving! |
@dbnicholson: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@dbnicholson: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
In #813, some simple repository locking was being considered to prevent prunes from running concurrently with checkouts and transactions. While I think that would be a great thing, I don't think that level of locking is rich enough for ostree. Since the locking isn't recursive, it would prevent both libostree and ostree programs from obtaining an exclusive lock if it would also happen somewhere else in libostree. The simple example I can think of is
ostree_repo_prune_static_deltas()
. You'd want to take an exclusive lock there as an entry point to libostree, but it's also called fromostree_repo_prune()
, which you'd also want to take an exclusive lock for. That would simply deadlock.The alternative I came up with here is to maintain the lock in thread local storage with a stack of lock states. Then each thread locks independently and the stack stays coherent. 8c66c5a has that implementation with a large comment in
ostree-repo.c
describing how it works. The API is then a push/pop interface.The drawback of having the lock in TLS is that it seems to prevent taking the lock asynchronously since that would likely involve having another thread acquire the lock. Once that thread completed, the lock would be dropped. So, the implementation here blocks synchronously when pushing or popping the lock. The lock timeout can be managed globally with a repo config option or from an API.
All the new APIs are experimental, but I think it would be best to make them public so ostree programs can do their own locking around a series of ostree operations.
I added a bunch of locking in places I thought it would be needed. One spot I didn't try to tackle was the core and remote config. Thinking about all the use cases got me confused, so I punted for now...
This closes https://bugzilla.gnome.org/show_bug.cgi?id=759442.
Let me know what you think! I know there's a lot here, but this issue has affected us quite a bit at Endless.