Repository locking 🔒 #1292

dbnicholson · 2017-10-18T17:52:40Z

In #813, some simple repository locking was being considered to prevent prunes from running concurrently with checkouts and transactions. While I think that would be a great thing, I don't think that level of locking is rich enough for ostree. Since the locking isn't recursive, it would prevent both libostree and ostree programs from obtaining an exclusive lock if it would also happen somewhere else in libostree. The simple example I can think of is ostree_repo_prune_static_deltas(). You'd want to take an exclusive lock there as an entry point to libostree, but it's also called from ostree_repo_prune(), which you'd also want to take an exclusive lock for. That would simply deadlock.

The alternative I came up with here is to maintain the lock in thread local storage with a stack of lock states. Then each thread locks independently and the stack stays coherent. 8c66c5a has that implementation with a large comment in ostree-repo.c describing how it works. The API is then a push/pop interface.

The drawback of having the lock in TLS is that it seems to prevent taking the lock asynchronously since that would likely involve having another thread acquire the lock. Once that thread completed, the lock would be dropped. So, the implementation here blocks synchronously when pushing or popping the lock. The lock timeout can be managed globally with a repo config option or from an API.

All the new APIs are experimental, but I think it would be best to make them public so ostree programs can do their own locking around a series of ostree operations.

I added a bunch of locking in places I thought it would be needed. One spot I didn't try to tackle was the core and remote config. Thinking about all the use cases got me confused, so I punted for now...

This closes https://bugzilla.gnome.org/show_bug.cgi?id=759442.

Let me know what you think! I know there's a lot here, but this issue has affected us quite a bit at Endless.

cgwalters · 2017-10-18T18:37:35Z

High level looks sane at a brief glance, but there's lots of details here. (One I have is how this intersects with worker threads we create internally)

How about we avoid lots of #ifdefs here by having a private version that's used internally, then the "experimental" bit is just exposing it publicly? The private version could actually be a repo setting too so it doesn't depend on the experimental-API build?

dbnicholson · 2017-10-18T19:20:59Z

High level looks sane at a brief glance, but there's lots of details here. (One I have is how this intersects with worker threads we create internally)

Yeah, I tried to think about that a lot, but I couldn't totally come to grips with it. Basically, the worker threads are OK if you only ever acquire a shared lock. Which (I believe) is the case now. Otherwise, you have to arrange to take the lock before the worker is spawned and ensure that the worker doesn't go through a path that acquires locks.

I spent a lot of time thinking about that how to address that. I had another version that tried to detect if the lock was being acquired in the same thread by stashing the main context in use when the repo was opened and skipping if the main context in use when the lock was acquired wasn't the same. But that only tells you if the context is the same, not if the thread is the same. (I might be missing something there. I am definitely not the GMainContext guru.)

The only other thing I could think of was having a dedicated locking thread, but I then didn't know how to address the lock stack being updated out of order by multiple threads.

How about we avoid lots of #ifdefs here by having a private version that's used internally, then the "experimental" bit is just exposing it publicly?

You mean in src/ostree? I could add a cmdprivate pointer if that's better. Most of src/libostree is clean except for the headers.

The private version could actually be a repo setting too so it doesn't depend on the experimental-API build?

Not sure I follow here. What would the setting be?

cgwalters · 2017-10-18T19:25:52Z

Yeah, I mean the binaries would use a cmdprivate API to do locking, which we could then enable by setting e.g.

[core]
locking=true

or so in the repo config. (Which itself we could label as experimental I guess). That'd allow us to land most of these patches and gain experience with it a bit, even on builds that don't --enable-experimental-api.

dbnicholson · 2017-10-18T19:31:41Z

Oh, well I only added actual locking in fsck. The rest of the stuff in src/ostree is just turning the --lock-timeout option into a call to ostree_repo_set_lock_timeout(), which could be cmdprivate like you say.

However, the repo config option could be used to make ostree_repo_lock_push() and ostree_repo_lock_pop() into noops. That's a good idea in case this is broken...

I also see that the CI failed immediately because I haven't actually tested without --enable-experimental-api in a while. Whoops.

pwithnall

A few comments. I haven’t reviewed the locking code itself in detail yet.

pwithnall · 2017-10-18T23:23:56Z

man/ostree.repo-config.xml

+        <listitem>
+          <para>
+            Integer value controlling the default repository locking
+            timeout in seconds. If the value is <literal>-1</literal>,


What if the value is 0 or <-1? Could go with “If the value is negative then ostree will block…”

0 means don't wait - true non-blocking. I thought that was implied, but I could spell it out. < -1 is an error. I could just do all negative values, but I liked having the explicit value. It also allowed me to later use -2 as a sentinel.

pwithnall · 2017-10-18T23:25:57Z

man/ostree.xml

+                  Some <command>ostree</command> commands require the
+                  repository to be locked while operating. This option sets
+                  the repository lock timeout to TIMEOUT seconds. If TIMEOUT
+                  is <literal>-1</literal>, then the locking will block


Similarly here: what if it’s <-1 or 0?

pwithnall · 2017-10-18T23:27:45Z

src/libostree/ostree-repo-private.h

+
+typedef OstreeRepo OstreeRepoAutoLock;
+OstreeRepoAutoLock * ostree_repo_auto_lock_push (OstreeRepo    *self,
+                                                 gboolean       exclusive,


Might make sense to use an enum or flags instead of a boolean here, since it’s typically not obvious what the sense of a boolean is. Does TRUE mean ‘make it exclusive’ or ‘make it shared’?

Yeah, I thought about that later after doing TRUE/FALSE so many times. It seemed sort of silly to have an enum that would only ever have 2 exclusive states. I guess:

typedef enum { OSTREE_REPO_LOCK_SHARED, OSTREE_REPO_LOCK_EXCLUSIVE } OstreeRepoLockType;

I think people (me included) should get out of the habit of thinking of two-element enums as ‘silly’, if they improve code readability. That said, there’s the possibility that this could be extended in future to include more flags or enum elements for more esoteric extensions to the locking.

OK, I'll do it. That said, I would like this to remain a simple enum and not migrate into some flags setup later.

pwithnall · 2017-10-18T23:29:29Z

src/libostree/ostree-repo-private.h

@@ -37,6 +37,9 @@ G_BEGIN_DECLS
 #define _OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS 8
 #define _OSTREE_MAX_OUTSTANDING_DELTAPART_REQUESTS 2

+/* Keep this in sync with the man page for the core.lock-timeout option. */


Mention that it’s in seconds.

pwithnall · 2017-10-18T23:29:58Z

src/libostree/ostree-repo-private.h

@@ -153,6 +157,7 @@ struct OstreeRepo {
  guint64 tmp_expiry_seconds;
  gchar *collection_id;
  gboolean add_remotes_config_dir; /* Add new remotes in remotes.d dir */
+  gint lock_timeout;


Maybe name it lock_timeout_seconds to make the units more obvious? Same for the #define above.

pwithnall · 2017-10-18T23:32:22Z

src/libostree/ostree-repo-prune.c

+  g_autoptr(OstreeRepoAutoLock) lock = NULL;
+  lock = ostree_repo_auto_lock_push (self, TRUE, cancellable, error);
+  if (!lock)
+    return FALSE;


One thought: if you changed the definition of ostree_repo_auto_lock_push() to:
gboolean ostree_repo_auto_lock_push (OstreeRepo *self, OstreeRepoLockFlags flags, OstreeRepoAutoLock **out_lock, GCancellable *cancellable, GError **error), then all the uses of it could simplify to:

g_autoptr(OstreeRepoAutoLock) lock = NULL; if (!ostree_repo_auto_lock_push (self, OSTREE_REPO_LOCK_EXCLUSIVE, &lock, cancellable, error)) return FALSE;

I could go either way on this and did consider that interface. I do generally like ostree's boolean return values with supplemental information in out params. In this case it seemed overkill to me, though, as both the return value and the out param would be transmitting the same information. Would the out param be a requirement? it would be useless to use the function otherwise. Not that we're counting cycles in ostree, but it's also less efficient to put another parameter on the stack and fill it in.

You can, of course, do the call and check in one line if you want with the current interface:

if (!(lock = ostree_repo_auto_lock_push(...))) return FALSE;

That's definitely not as aesthetically pleasing as the interface you've suggested, which is also why I prefer to write it on 2 lines.

pwithnall · 2017-10-18T23:34:39Z

src/libostree/ostree-repo.c

+          cur_state_name = "null";
+        }
+
+      g_debug("Free lock: state=%s, depth=%u", cur_state_name, stack_len);


Nitpick: Missing space before ( (and in a couple of other places).

Good catch.

pwithnall · 2017-10-18T23:39:21Z

src/libostree/ostree-repo.c

+ * configured in the core.lock-timeout option will replace this.
+ *
+ * Returns: %TRUE on success, %FALSE otherwise
+ * Since: 2017.8


Since: 2017.13

(probably wrong in a few other places)

Bah. I actually think I was pretty good about setting it to 2017.13, but I'll skim through for that. Thanks for catching that.

pwithnall · 2017-10-18T23:39:39Z

src/libostree/ostree-repo.h

+/**
+ * OstreeRepoAutoLock: (skip)
+ *
+ * This is simply an alias tox #OstreeRepo used for automatic lock cleanup.


Thanks, I saw that myself later.

cgwalters · 2017-10-19T16:08:28Z

Yeah...we should probably try to move fsck into the library too. But doesn't need to block this patch.

Let's try to get this cleaned up and then I'm happy to merge it and we can do followup commits?

dbnicholson · 2017-10-19T17:07:49Z

@cgwalters Had another thought about the threads.

If the lock to be taken is shared, only do it when outside a transaction. Assuming that the transaction is started on a thread that will live throughout the life of the transaction, then you already have a shared lock.
If the lock to be taken is exclusive, add internal _unlocked versions that would be called in any place that would be run from a worker thread. So far I've only encountered ostree_repo_delete_object removing tombstones when writing out commits.

I have a bunch of fixups that I'll push out shortly that should address the comments so far.

rh-atomic-bot · 2017-10-19T19:26:59Z

☔ The latest upstream changes (presumably 2531d8f) made this pull request unmergeable. Please resolve the merge conflicts.

dbnicholson · 2017-10-19T19:40:11Z

OK, I added a bunch of fixups that I'll squash soon, but I wanted to give a chance to see the intermediate state if anyone was interested. I believe I addressed most of the comments so far.

TODO:

Add the core.locking option to globally enable or disable locking
Investigate locking from worker threads more
Look at locking for core and remote config

rh-atomic-bot · 2017-10-20T13:21:45Z

☔ The latest upstream changes (presumably 9166605) made this pull request unmergeable. Please resolve the merge conflicts.

cgwalters · 2017-10-23T13:12:20Z

Are you going to have a chance to rebase this soon? If not I might be able to do it early this week.

dbnicholson · 2017-10-23T14:03:44Z

Yeah, I'll rebase later today. Some of the commits should be reordered a bit.

When compiling libostree, OSTREE_ENABLE_EXPERIMENTAL_API is managed via config.h. However, g-ir-scanner doesn't use that and won't include the experimental APIs unless the macro is defined through another means. Without this, none of the experimental APIs were being included in the gir data.

Both of these were inadvertently being included in the GIR. The OstreeCmdPrivateVTable declaration ended up causing a g-ir-compiler error because I wanted to add a function that had a return type that was marked as skipped (OstreeRepoAutoLock). The ostree_cmd__private__ documentation just had the wrong name.

Currently ostree has no method of guarding against concurrent pruning. When there are multiple repo writers, it's possible to have a pull or commit race against a prune and end up with missing objects. This adds an file based repo locking mechanism. The intention is to take a shared lock when writing objects and an exclusive lock when deleting them. In order to make use of the locking throughout the library in a fine grained fashion, the lock acts recursively with a stack of lock states. If the lock becomes exclusive, it will stay in that state until the stack is unwound past the initial exclusive push. The file locking is similar to GLnxLockFile in that it uses open file descriptor locks but falls back to flock when needed. The lock also attempts to be thread safe by storing the lock state in thread local storage with GPrivate. This means that each thread will have an independent lock for each repository it opens. There are some drawbacks to that, but it seemed impossible to manage the lock state coherently in the face of multithreaded access. The API is a push/pop interface in accordance with the recursive nature of the locking. The push interface uses an enum that's translated to LOCK_SH or LOCK_EX as needed. Both interfaces use an internal timeout field to decide whether to manage the lock in a blocking or non-blocking fashion. The intention is to allow ostree applications as well as administrators to control this timeout. For now, the default is a 30 second timeout. Note that the timeout is handled synchronously in thread since the lock is maintained in thread local storage. I.e., the thread that acquires the lock needs to be the same thread that runs the operation. There may be a way to offer an asynchronous version, but it's not clear exactly how that would work since it would likely involve a separate thread that invokes a callback when the locking operation completes. https://bugzilla.gnome.org/show_bug.cgi?id=759442

Define an auto cleanup handler for use with repo locking. This is based on the existing auto transaction cleanup. A wrapper for ostree_repo_lock_push() is added with it. The intended usage is like so: g_autoptr(OstreeRepoAutoLock) lock = NULL; lock = ostree_repo_auto_lock_push (repo, lock_type, cancellable, error); if (!lock) return FALSE; The functions and type are marked to be skipped by introspection since I can't see them being usable from bindings.

Take a shared repo lock during a transaction to ensure that another process doesn't delete objects.

Add exclusive repository locking to all the pruning entry points. This ensures that objects and deltas will not be removed while another process is writing to the repository.

Test that concurrent commits and prunes can succeed. Mostly this is a check that the new locking works correctly and the concurrent processes will properly wait until they've acquired the appropriate repository lock.

Set the PYTHONUNBUFFERED environment variable during tests so that python leaves stdout unbuffered. This is helpful when reading logs for failures since the interleaved stdout and stderr will generally come out in the right order. It's not perfect since tap-driver.sh does some special redirection to the log file, but it's an improvement.

When checking out objects, take a shared lock to ensure objects aren't deleted. If the repo isn't writable, the locking is a noop. This ensures that the repository owner can make robust checkouts, but it still allows other users to make a likely to be successful checkout if the repo mode supports it. During garbage collection, take an exclusive lock while deleting uncompressed objects.

Add a repository option, core.lock-timeout that controls the default lock timeout used. This allows administrators to have some influence over locking operations. For instance, critical repositories where operations should not fail could be configured to block when managing locks by setting lock-timeout to -1.

Exercise the ostree_repo_lock_push() and ostree_repo_lock_pop() APIs directly via pygi. This is a better unit test than relying on concurrent commit and prunes to check that the locking works correctly.

Ensure that the commit object doesn't get deleted while writing out the detached metadata. This is primarily to guard against dangling commitmeta files if racing with a prune since the commit object could still be updated while this is happening.

This ensures that the commits and deltas don't get deleted while generating the summary metadata.

Make sure that the summary file does not get replaced while generating summary.sig by taking an exclusive lock. It would be bad to publish a mismatching signature.

Make sure that neither commit nor any of the objects they reference are deleted while generating a delta.

Allow the ostree commands to use ostree_repo_set_lock_timeout() even when ostree is built without experimental API. When it's no longer experimental, this should be removed.

Builtins that require repository locking can add options with the OSTREE_BUILTIN_FLAG_LOCKING flag set to add a --lock-timeout option. This allows the user to control how long to wait to acquire the repository lock. The cmdprivate version of ostree_repo_set_lock_timeout() is used until it's no longer experimental.

All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.

Allow the ostree commands to use ostree_repo_auto_lock_push() and ostree_repo_auto_lock_cleanup() even when ostree is built without experimental API. When it's no longer experimental, this should be removed. Unfortunately, this requires including ostree-repo-private.h from ostree-cmdprivate.h. That includes otutil.h, which the systemd generator build wasn't prepared to handle.

Since ostree_repo_auto_lock_cleanup is not directly available, a separate type, wrappers and autoptr declaration are needed.

Although the libostree functions will take locks as needed, its best to ensure fsck has exclusive access to the repo for checking consistency.

Admin builtins that require repository locking set the OSTREE_BUILTIN_FLAG_LOCKING flag to add a --lock-timeout option. This needs to be used in conjunction with OSTREE_BUILTIN_FLAG_NO_REPO since the the timeout can only be applied once the repo is gathered from the sysroot. This allows the user to control how long to wait to acquire the repository lock. The cmdprivate version of ostree_repo_set_lock_timeout() is used until it's no longer experimental.

All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.

Make sure the number of committers is even since we only create half as many trees.

pwithnall

A more in-depth review of the locking code. I haven’t checked the lock call sites in the various ostree builtins or APIs (and don’t plan to; they can be tweaked over time if problems arise).

pwithnall · 2017-10-26T12:33:14Z

Makefile-libostree.am

@@ -261,6 +261,10 @@ OSTree-1.0.gir: libostree-1.la Makefile
 OSTree_1_0_gir_EXPORT_PACKAGES = ostree-1
 OSTree_1_0_gir_INCLUDES = Gio-2.0
 OSTree_1_0_gir_CFLAGS = $(libostree_1_la_CFLAGS)
+if ENABLE_EXPERIMENTAL_API
+# When compiling this is set via config.h, but g-ir-scanner doesn't use that


I think you should just be able to pass config.h to OSTree_1_0_gir_FILES and it will be #included into the temporary C file which g-ir-scanner generates as part of the scanning process.

I think order is preserved, so list it first in OSTree_1_0_gir_FILES.

Interesting, I didn't know that. I'll try that locally.

This was separately fixed in #1322.

pwithnall · 2017-10-26T12:34:46Z

src/libostree/ostree-cmdprivate.c

@@ -1,3 +1,4 @@
+


Spurious blank line.

Will address after #1343.

pwithnall · 2017-10-26T12:36:24Z

src/libostree/ostree-cmdprivate.c

@@ -48,7 +49,14 @@ ostree_cmd__private__ (void)
    impl_ostree_generate_grub2_config,
    _ostree_repo_static_delta_dump,
    _ostree_repo_static_delta_query_exists,
-    _ostree_repo_static_delta_delete
+    _ostree_repo_static_delta_delete,
+    /* Remove this when ostree_repo_set_lock_timeout is no longer experimental */


Maybe add a FIXME to this comment (and the one below) to make them more greppable?

Will address after #1343.

pwithnall · 2017-10-26T12:37:32Z

src/libostree/ostree-cmdprivate.h

@@ -20,17 +20,34 @@
 #pragma once

 #include "ostree-types.h"
+/* This is only needed to get the OstreeRepoLockType and OstreeRepoAutoLock


Maybe add a FIXME to this comment too. Similarly for other ‘remove this when no longer experimental’ comments throughout the rest of the PR.

Will address after #1343.

pwithnall · 2017-10-26T12:43:00Z

src/libostree/ostree-repo-checkout.c

@@ -991,6 +992,16 @@ checkout_tree_at (OstreeRepo                        *self,
      g_assert (options->force_copy);
    }

+  /* Take a shared repo lock to try to ensure objects aren't deleted. If the
+   * repo is not writable, this will be a noop and we just hope for the


If the repo is not writeable can we really expect objects to be deleted? Are you thinking of the situation where a repo is being operated on by processes belonging to two users: the user deleting objects has write permissions on the repository, and the user reading it does not?

We could perhaps fix this by sticking an inotify on the lock file if we only have read-only access to the repository, and bailing out if an exclusive lock is taken on the repository while we’re in a locked section. That might be overengineering things though. It would, however, be a general solution which could be implemented in the generic locking code.

The case is simply an unprivileged user trying to checkout from /ostree/repo or /var/lib/flatpak/repo or something like that. They should be allowed to do a checkout as they were before.

Like you say, if the repo is not writable by the current user, then they're not going to be deleting any objects. But if, say, flatpak is pruning /var/lib/flatpak/repo while you're trying to do a checkout, bad things could happen.

But that's already the case, and I don't think we should optimize for unprivileged users. I'm really just trying to make sure that multiple processes running as the repo owner can't screw each other since that's by far the common case. Since the unprivileged user won't even be able to open the lock file (it's opened 600 particularly so that only repo owners can take exclusive locks), then I don't think they'll be able to get the lock state.

An inotify watch is interesting. I'm not sure inotify can tell you when the lock state changes, though, only if it's been opened.

Yeah, having done a bit of reading, I don’t think it’s possible to monitor the status of a file lock using inotify. The only way of doing it is to poll with fcntl(F_GETLK), but that’s horrific and racy.

The other approach I can think of is to use a world-writeable directory (somewhere in /run, probably), which can contain the locks for many repositories, each with a deterministic name based on the repository directory’s device ID and inode.

That said, allowing an unprivileged process to take a lock on a repository it can’t write to allows priority inversion attacks, since there’s no realistic way to stop it taking an exclusive lock.

Unless we put shared locks in /run and exclusive locks in $repo/.lock?

I still believe the way this was written where only the repo owner can manage the lock is correct. I think any attempt to allow non-owners to take a lock would be very complex and likely mean that flock-type locking can't be used.

pwithnall · 2017-10-26T13:45:13Z

src/libostree/ostree-repo.c

+        gint64 lock_timeout = g_ascii_strtoll (lock_timeout_str, &endptr, 0);
+        if ((lock_timeout < -1) ||
+            (lock_timeout > G_MAXINT) ||
+            (lock_timeout == 0 && endptr == lock_timeout_str))


This doesn’t check for the case where *endptr != '\0', which could be caused by lock_timeout_str == "123somestring".

Good catch. I hadn't thought of that.

The g_ascii_strto*() are an absolute nightmare for correct error handling.

GLib recently acquired g_ascii_string_to_{un,}signed(), which are a lot better, but unfortunately too recent for libostree to use yet. :-(

Will fix after #1343. This part didn't make it into the slimmed down setup.

pwithnall · 2017-10-26T13:46:48Z

src/libostree/ostree-repo.c

+ * represents the number of seconds the application will sleep attempting to
+ * acquire the repository lock.
+ *
+ * Returns: lock timeout for the repository


Nitpick: maybe add , in seconds to this?

Will fix after #1343.

pwithnall · 2017-10-26T13:47:25Z

src/libostree/ostree-repo.c

+ * Set the lock timeout for the repository. A timeout of -1 indicates that the
+ * application will block until the lock is acquired. Otherwise, it represents
+ * the number of seconds the application will sleep attempting to acquire the
+ * repository lock.


Should mention that anything < -1 is not allowed.

Will fix after #1343.

pwithnall · 2017-10-26T13:47:45Z

src/libostree/ostree-repo.h

+ * OstreeRepoLockType:
+ * The type of repository lock to acquire.
+ * @OSTREE_REPO_LOCK_SHARED: A shared lock
+ * @OSTREE_REPO_LOCK_EXCLUSIVE: An exclusive lock


Missing a Since: line.

Good catch.

Fixed in #1343.

pwithnall · 2017-10-26T13:49:09Z

src/ostree/ot-main.c

+  gint64 lock_timeout = g_ascii_strtoll (value, &endptr, 0);
+  if ((lock_timeout < -1) ||
+      (lock_timeout > G_MAXINT) ||
+      (lock_timeout == 0 && endptr == value))


As with the other g_ascii_strtoll() call, this should check for *endptr != '\0' as another error condition.

Will fix after #1343.

rh-atomic-bot · 2017-11-03T22:20:16Z

☔ The latest upstream changes (presumably 519b30b) made this pull request unmergeable. Please resolve the merge conflicts.

cgwalters · 2017-11-15T14:21:20Z

Would be happy to start trying to take this - perhaps it's simplest to split this up into a PR that adds the internal locking API with just a few uses - maybe just prune and commit? Then we can gradually trickle in changes to enable it in different places like fsck as we test?

dbnicholson · 2017-11-15T17:45:18Z

Sorry, got busy with a bunch of other stuff but I should be getting back to this today or tomorrow. I'll split it up into 2 PRs like you say. Some other stuff I've thought of in the meantime:

A repo wide locking enable boolean. I hope this wouldn't be necessary much, but it would allow programs to opt out of locking if we run into issues with this or if there's a sequence that's just too complex for the internal locking.
A thread specific locking enable boolean. This is my idea for handling the worker threads. What you'd do is take a lock, spawn the thread, and immediately disable locking for that thread. When the thread completed, the spawner would pop the lock. That would allow (for example) all the pull threads to run without locking while the thread spawner has the repo locked and avoid the issues with

dbnicholson · 2017-11-15T23:02:04Z

I opened #1343 with the slimmed down initial setup as requested. I still passes the test suite, although I haven't really used it in real applications.

cgwalters · 2018-01-08T18:05:01Z

How do you want to proceed with rebasing this? Are you planning to work on it anytime soon?

dbnicholson · 2018-01-08T20:16:02Z

Sorry, I had to take off my ostree developer hat for a while. Next week I'll get back on this and flesh it out. Does that timeline work or were you looking for something sooner?

cgwalters · 2018-01-08T20:33:43Z

No pressure on the timeline! Currently locking is disabled by default. I am happy to help out a bit if we can figure out how to subdivide work. Just trying to keep things moving!

openshift-ci-robot · 2021-04-12T21:19:14Z

@dbnicholson: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci · 2023-06-29T14:45:58Z

@dbnicholson: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/sanity	`8c9f518`	link	true	`/test sanity`
ci/prow/images	`8c9f518`	link	true	`/test images`
ci/prow/fcos-e2e	`8c9f518`	link	true	`/test fcos-e2e`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

dbnicholson mentioned this pull request Oct 18, 2017

Prune concurrency #813

Closed

dbnicholson force-pushed the repo-locking branch from d345e76 to d3daefd Compare October 18, 2017 18:03

pwithnall reviewed Oct 18, 2017

View reviewed changes

dbnicholson force-pushed the repo-locking branch from 266f3f9 to 65ba93b Compare October 19, 2017 19:37

dbnicholson and others added 15 commits October 23, 2017 20:18

lib/commit: Add repository locking during transactions

7be3051

Take a shared repo lock during a transaction to ensure that another process doesn't delete objects.

lib/prune: Take exclusive repository lock

d98bc52

Add exclusive repository locking to all the pruning entry points. This ensures that objects and deltas will not be removed while another process is writing to the repository.

tests: Test concurrent operations

19a4dbf

Test that concurrent commits and prunes can succeed. Mostly this is a check that the new locking works correctly and the concurrent processes will properly wait until they've acquired the appropriate repository lock.

tests: Add tests for repo locking APIs

0ab32a4

Exercise the ostree_repo_lock_push() and ostree_repo_lock_pop() APIs directly via pygi. This is a better unit test than relying on concurrent commit and prunes to check that the locking works correctly.

lib/repo: Take shared lock while generating summary

eba8a17

This ensures that the commits and deltas don't get deleted while generating the summary metadata.

lib/repo: Take exclusive lock while signing summary

23b7aa4

Make sure that the summary file does not get replaced while generating summary.sig by taking an exclusive lock. It would be bad to publish a mismatching signature.

lib/deltas: Take shared lock while generating

4b8d03e

Make sure that neither commit nor any of the objects they reference are deleted while generating a delta.

dbnicholson added 8 commits October 23, 2017 21:36

lib/cmdprivate: Add private access to ostree_repo_set_lock_timeout

615b949

Allow the ostree commands to use ostree_repo_set_lock_timeout() even when ostree is built without experimental API. When it's no longer experimental, this should be removed.

bin: Add --lock-timeout option to relevant commands

1ccd3db

All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.

bin/main: Provide repo lock autoptr implementation

e3a8476

Since ostree_repo_auto_lock_cleanup is not directly available, a separate type, wrappers and autoptr declaration are needed.

bin/fsck: Take exclusive lock for entire operation

7c14891

Although the libostree functions will take locks as needed, its best to ensure fsck has exclusive access to the repo for checking consistency.

bin/admin: Add --lock-timeout option to relevant commands

7ff4b93

All of these commands call one of the libostree functions that take locks, so add the --lock-timeout option to them.

dbnicholson force-pushed the repo-locking branch from 65ba93b to 7ff4b93 Compare October 23, 2017 22:04

squash! tests: Test concurrent operations

8c9f518

Make sure the number of committers is even since we only create half as many trees.

pwithnall requested changes Oct 26, 2017

View reviewed changes

dbnicholson mentioned this pull request Nov 15, 2017

Slimmer repo locking #1343

Closed

dbnicholson mentioned this pull request Nov 17, 2017

Basic repository locking endlessm/ostree#99

Merged

cgwalters changed the title ~~Repository locking~~ Repository locking 🔒 Dec 5, 2017

cgwalters mentioned this pull request Feb 3, 2018

ostree: new option for the overlay driver containers/storage#137

Merged

giuseppe mentioned this pull request Feb 16, 2018

syscontainer: remove of fallback version of skopeo projectatomic/atomic#1180

Closed

2 tasks

mwleeds mentioned this pull request Sep 28, 2018

[WIP] lib/repo: Validate config more strictly #1739

Open

dbnicholson mentioned this pull request Feb 26, 2021

Make repo locking API public #2286

Closed

openshift-ci-robot added the needs-rebase label Apr 12, 2021

dbnicholson mentioned this pull request Apr 15, 2021

Improve multi-threaded repo locking #2344

Closed

dbnicholson mentioned this pull request May 19, 2023

Allow clients to choose different repo lock timeout #2861

Open

Repository locking 🔒 #1292

Are you sure you want to change the base?

Repository locking 🔒 #1292

Conversation

dbnicholson commented Oct 18, 2017 • edited Loading

cgwalters commented Oct 18, 2017

dbnicholson commented Oct 18, 2017

cgwalters commented Oct 18, 2017

dbnicholson commented Oct 18, 2017

pwithnall left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cgwalters commented Oct 19, 2017

dbnicholson commented Oct 19, 2017

rh-atomic-bot commented Oct 19, 2017

dbnicholson commented Oct 19, 2017

rh-atomic-bot commented Oct 20, 2017

cgwalters commented Oct 23, 2017

dbnicholson commented Oct 23, 2017

pwithnall left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rh-atomic-bot commented Nov 3, 2017

cgwalters commented Nov 15, 2017

dbnicholson commented Nov 15, 2017

dbnicholson commented Nov 15, 2017

cgwalters commented Jan 8, 2018

dbnicholson commented Jan 8, 2018

cgwalters commented Jan 8, 2018

openshift-ci-robot commented Apr 12, 2021

openshift-ci bot commented Jun 29, 2023

dbnicholson commented Oct 18, 2017 •

edited

Loading