Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repository locking 🔒 #1292

Open
wants to merge 28 commits into
base: main
Choose a base branch
from
Open

Conversation

dbnicholson
Copy link
Member

@dbnicholson dbnicholson commented Oct 18, 2017

In #813, some simple repository locking was being considered to prevent prunes from running concurrently with checkouts and transactions. While I think that would be a great thing, I don't think that level of locking is rich enough for ostree. Since the locking isn't recursive, it would prevent both libostree and ostree programs from obtaining an exclusive lock if it would also happen somewhere else in libostree. The simple example I can think of is ostree_repo_prune_static_deltas(). You'd want to take an exclusive lock there as an entry point to libostree, but it's also called from ostree_repo_prune(), which you'd also want to take an exclusive lock for. That would simply deadlock.

The alternative I came up with here is to maintain the lock in thread local storage with a stack of lock states. Then each thread locks independently and the stack stays coherent. 8c66c5a has that implementation with a large comment in ostree-repo.c describing how it works. The API is then a push/pop interface.

The drawback of having the lock in TLS is that it seems to prevent taking the lock asynchronously since that would likely involve having another thread acquire the lock. Once that thread completed, the lock would be dropped. So, the implementation here blocks synchronously when pushing or popping the lock. The lock timeout can be managed globally with a repo config option or from an API.

All the new APIs are experimental, but I think it would be best to make them public so ostree programs can do their own locking around a series of ostree operations.

I added a bunch of locking in places I thought it would be needed. One spot I didn't try to tackle was the core and remote config. Thinking about all the use cases got me confused, so I punted for now...

This closes https://bugzilla.gnome.org/show_bug.cgi?id=759442.

Let me know what you think! I know there's a lot here, but this issue has affected us quite a bit at Endless.

@cgwalters
Copy link
Member

High level looks sane at a brief glance, but there's lots of details here. (One I have is how this intersects with worker threads we create internally)

How about we avoid lots of #ifdefs here by having a private version that's used internally, then the "experimental" bit is just exposing it publicly? The private version could actually be a repo setting too so it doesn't depend on the experimental-API build?

@dbnicholson
Copy link
Member Author

High level looks sane at a brief glance, but there's lots of details here. (One I have is how this intersects with worker threads we create internally)

Yeah, I tried to think about that a lot, but I couldn't totally come to grips with it. Basically, the worker threads are OK if you only ever acquire a shared lock. Which (I believe) is the case now. Otherwise, you have to arrange to take the lock before the worker is spawned and ensure that the worker doesn't go through a path that acquires locks.

I spent a lot of time thinking about that how to address that. I had another version that tried to detect if the lock was being acquired in the same thread by stashing the main context in use when the repo was opened and skipping if the main context in use when the lock was acquired wasn't the same. But that only tells you if the context is the same, not if the thread is the same. (I might be missing something there. I am definitely not the GMainContext guru.)

The only other thing I could think of was having a dedicated locking thread, but I then didn't know how to address the lock stack being updated out of order by multiple threads.

How about we avoid lots of #ifdefs here by having a private version that's used internally, then the "experimental" bit is just exposing it publicly?

You mean in src/ostree? I could add a cmdprivate pointer if that's better. Most of src/libostree is clean except for the headers.

The private version could actually be a repo setting too so it doesn't depend on the experimental-API build?

Not sure I follow here. What would the setting be?

@cgwalters
Copy link
Member

Yeah, I mean the binaries would use a cmdprivate API to do locking, which we could then enable by setting e.g.

[core]
locking=true

or so in the repo config. (Which itself we could label as experimental I guess). That'd allow us to land most of these patches and gain experience with it a bit, even on builds that don't --enable-experimental-api.

@dbnicholson
Copy link
Member Author

Oh, well I only added actual locking in fsck. The rest of the stuff in src/ostree is just turning the --lock-timeout option into a call to ostree_repo_set_lock_timeout(), which could be cmdprivate like you say.

However, the repo config option could be used to make ostree_repo_lock_push() and ostree_repo_lock_pop() into noops. That's a good idea in case this is broken...

I also see that the CI failed immediately because I haven't actually tested without --enable-experimental-api in a while. Whoops.

Copy link
Member

@pwithnall pwithnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. I haven’t reviewed the locking code itself in detail yet.

<listitem>
<para>
Integer value controlling the default repository locking
timeout in seconds. If the value is <literal>-1</literal>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the value is 0 or <-1? Could go with “If the value is negative then ostree will block…”

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 means don't wait - true non-blocking. I thought that was implied, but I could spell it out. < -1 is an error. I could just do all negative values, but I liked having the explicit value. It also allowed me to later use -2 as a sentinel.

man/ostree.xml Outdated
Some <command>ostree</command> commands require the
repository to be locked while operating. This option sets
the repository lock timeout to TIMEOUT seconds. If TIMEOUT
is <literal>-1</literal>, then the locking will block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here: what if it’s <-1 or 0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.


typedef OstreeRepo OstreeRepoAutoLock;
OstreeRepoAutoLock * ostree_repo_auto_lock_push (OstreeRepo *self,
gboolean exclusive,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might make sense to use an enum or flags instead of a boolean here, since it’s typically not obvious what the sense of a boolean is. Does TRUE mean ‘make it exclusive’ or ‘make it shared’?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about that later after doing TRUE/FALSE so many times. It seemed sort of silly to have an enum that would only ever have 2 exclusive states. I guess:

typedef enum {
  OSTREE_REPO_LOCK_SHARED,
  OSTREE_REPO_LOCK_EXCLUSIVE
} OstreeRepoLockType;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think people (me included) should get out of the habit of thinking of two-element enums as ‘silly’, if they improve code readability. That said, there’s the possibility that this could be extended in future to include more flags or enum elements for more esoteric extensions to the locking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll do it. That said, I would like this to remain a simple enum and not migrate into some flags setup later.

@@ -37,6 +37,9 @@ G_BEGIN_DECLS
#define _OSTREE_MAX_OUTSTANDING_FETCHER_REQUESTS 8
#define _OSTREE_MAX_OUTSTANDING_DELTAPART_REQUESTS 2

/* Keep this in sync with the man page for the core.lock-timeout option. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention that it’s in seconds.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@@ -153,6 +157,7 @@ struct OstreeRepo {
guint64 tmp_expiry_seconds;
gchar *collection_id;
gboolean add_remotes_config_dir; /* Add new remotes in remotes.d dir */
gint lock_timeout;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name it lock_timeout_seconds to make the units more obvious? Same for the #define above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

g_autoptr(OstreeRepoAutoLock) lock = NULL;
lock = ostree_repo_auto_lock_push (self, TRUE, cancellable, error);
if (!lock)
return FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought: if you changed the definition of ostree_repo_auto_lock_push() to:
gboolean ostree_repo_auto_lock_push (OstreeRepo *self, OstreeRepoLockFlags flags, OstreeRepoAutoLock **out_lock, GCancellable *cancellable, GError **error), then all the uses of it could simplify to:

g_autoptr(OstreeRepoAutoLock) lock = NULL;
if (!ostree_repo_auto_lock_push (self, OSTREE_REPO_LOCK_EXCLUSIVE, &lock, cancellable, error))
  return FALSE;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could go either way on this and did consider that interface. I do generally like ostree's boolean return values with supplemental information in out params. In this case it seemed overkill to me, though, as both the return value and the out param would be transmitting the same information. Would the out param be a requirement? it would be useless to use the function otherwise. Not that we're counting cycles in ostree, but it's also less efficient to put another parameter on the stack and fill it in.

You can, of course, do the call and check in one line if you want with the current interface:

if (!(lock = ostree_repo_auto_lock_push(...)))
  return FALSE;

That's definitely not as aesthetically pleasing as the interface you've suggested, which is also why I prefer to write it on 2 lines.

cur_state_name = "null";
}

g_debug("Free lock: state=%s, depth=%u", cur_state_name, stack_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Missing space before ( (and in a couple of other places).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

* configured in the core.lock-timeout option will replace this.
*
* Returns: %TRUE on success, %FALSE otherwise
* Since: 2017.8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since: 2017.13

(probably wrong in a few other places)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah. I actually think I was pretty good about setting it to 2017.13, but I'll skim through for that. Thanks for catching that.

/**
* OstreeRepoAutoLock: (skip)
*
* This is simply an alias tox #OstreeRepo used for automatic lock cleanup.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tox/to/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I saw that myself later.

@cgwalters
Copy link
Member

Yeah...we should probably try to move fsck into the library too. But doesn't need to block this patch.

Let's try to get this cleaned up and then I'm happy to merge it and we can do followup commits?

@dbnicholson
Copy link
Member Author

@cgwalters Had another thought about the threads.

  • If the lock to be taken is shared, only do it when outside a transaction. Assuming that the transaction is started on a thread that will live throughout the life of the transaction, then you already have a shared lock.

  • If the lock to be taken is exclusive, add internal _unlocked versions that would be called in any place that would be run from a worker thread. So far I've only encountered ostree_repo_delete_object removing tombstones when writing out commits.

I have a bunch of fixups that I'll push out shortly that should address the comments so far.

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 2531d8f) made this pull request unmergeable. Please resolve the merge conflicts.

@dbnicholson
Copy link
Member Author

OK, I added a bunch of fixups that I'll squash soon, but I wanted to give a chance to see the intermediate state if anyone was interested. I believe I addressed most of the comments so far.

TODO:

  • Add the core.locking option to globally enable or disable locking
  • Investigate locking from worker threads more
  • Look at locking for core and remote config

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 9166605) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member

Are you going to have a chance to rebase this soon? If not I might be able to do it early this week.

@dbnicholson
Copy link
Member Author

Yeah, I'll rebase later today. Some of the commits should be reordered a bit.

dbnicholson and others added 15 commits October 23, 2017 20:18
When compiling libostree, OSTREE_ENABLE_EXPERIMENTAL_API is managed via
config.h. However, g-ir-scanner doesn't use that and won't include the
experimental APIs unless the macro is defined through another means.
Without this, none of the experimental APIs were being included in the
gir data.
Both of these were inadvertently being included in the GIR. The
OstreeCmdPrivateVTable declaration ended up causing a g-ir-compiler
error because I wanted to add a function that had a return type that was
marked as skipped (OstreeRepoAutoLock). The ostree_cmd__private__
documentation just had the wrong name.
Currently ostree has no method of guarding against concurrent pruning.
When there are multiple repo writers, it's possible to have a pull or
commit race against a prune and end up with missing objects.

This adds an file based repo locking mechanism. The intention is to take
a shared lock when writing objects and an exclusive lock when deleting
them. In order to make use of the locking throughout the library in a
fine grained fashion, the lock acts recursively with a stack of lock
states. If the lock becomes exclusive, it will stay in that state until
the stack is unwound past the initial exclusive push. The file locking
is similar to GLnxLockFile in that it uses open file descriptor locks
but falls back to flock when needed.

The lock also attempts to be thread safe by storing the lock state in
thread local storage with GPrivate. This means that each thread will
have an independent lock for each repository it opens. There are some
drawbacks to that, but it seemed impossible to manage the lock state
coherently in the face of multithreaded access.

The API is a push/pop interface in accordance with the recursive nature
of the locking. The push interface uses an enum that's translated to
LOCK_SH or LOCK_EX as needed. Both interfaces use an internal timeout
field to decide whether to manage the lock in a blocking or non-blocking
fashion. The intention is to allow ostree applications as well as
administrators to control this timeout. For now, the default is a 30
second timeout.

Note that the timeout is handled synchronously in thread since the lock
is maintained in thread local storage. I.e., the thread that acquires
the lock needs to be the same thread that runs the operation. There may
be a way to offer an asynchronous version, but it's not clear exactly
how that would work since it would likely involve a separate thread that
invokes a callback when the locking operation completes.

https://bugzilla.gnome.org/show_bug.cgi?id=759442
Define an auto cleanup handler for use with repo locking. This is based
on the existing auto transaction cleanup. A wrapper for
ostree_repo_lock_push() is added with it. The intended usage is like so:

  g_autoptr(OstreeRepoAutoLock) lock = NULL;
  lock = ostree_repo_auto_lock_push (repo, lock_type, cancellable, error);
  if (!lock)
    return FALSE;

The functions and type are marked to be skipped by introspection since I
can't see them being usable from bindings.
Take a shared repo lock during a transaction to ensure that another
process doesn't delete objects.
Add exclusive repository locking to all the pruning entry points. This
ensures that objects and deltas will not be removed while another
process is writing to the repository.
Test that concurrent commits and prunes can succeed. Mostly this is a
check that the new locking works correctly and the concurrent processes
will properly wait until they've acquired the appropriate repository
lock.
Set the PYTHONUNBUFFERED environment variable during tests so that
python leaves stdout unbuffered. This is helpful when reading logs for
failures since the interleaved stdout and stderr will generally come out
in the right order. It's not perfect since tap-driver.sh does some
special redirection to the log file, but it's an improvement.
When checking out objects, take a shared lock to ensure objects aren't
deleted. If the repo isn't writable, the locking is a noop. This ensures
that the repository owner can make robust checkouts, but it still allows
other users to make a likely to be successful checkout if the repo mode
supports it. During garbage collection, take an exclusive lock while
deleting uncompressed objects.
Add a repository option, core.lock-timeout that controls the default
lock timeout used. This allows administrators to have some influence
over locking operations. For instance, critical repositories where
operations should not fail could be configured to block when managing
locks by setting lock-timeout to -1.
Exercise the ostree_repo_lock_push() and ostree_repo_lock_pop() APIs
directly via pygi. This is a better unit test than relying on concurrent
commit and prunes to check that the locking works correctly.
Ensure that the commit object doesn't get deleted while writing out the
detached metadata. This is primarily to guard against dangling
commitmeta files if racing with a prune since the commit object could
still be updated while this is happening.
This ensures that the commits and deltas don't get deleted while
generating the summary metadata.
Make sure that the summary file does not get replaced while generating
summary.sig by taking an exclusive lock. It would be bad to publish a
mismatching signature.
Make sure that neither commit nor any of the objects they reference are
deleted while generating a delta.
Allow the ostree commands to use ostree_repo_set_lock_timeout() even
when ostree is built without experimental API. When it's no longer
experimental, this should be removed.
Builtins that require repository locking can add options with the
OSTREE_BUILTIN_FLAG_LOCKING flag set to add a --lock-timeout option.
This allows the user to control how long to wait to acquire the
repository lock.

The cmdprivate version of ostree_repo_set_lock_timeout() is used until
it's no longer experimental.
All of these commands call one of the libostree functions that take
locks, so add the --lock-timeout option to them.
Allow the ostree commands to use ostree_repo_auto_lock_push() and
ostree_repo_auto_lock_cleanup() even when ostree is built without
experimental API. When it's no longer experimental, this should be
removed.

Unfortunately, this requires including ostree-repo-private.h from
ostree-cmdprivate.h. That includes otutil.h, which the systemd generator
build wasn't prepared to handle.
Since ostree_repo_auto_lock_cleanup is not directly available, a
separate type, wrappers and autoptr declaration are needed.
Although the libostree functions will take locks as needed, its best to
ensure fsck has exclusive access to the repo for checking consistency.
Admin builtins that require repository locking set the
OSTREE_BUILTIN_FLAG_LOCKING flag to add a --lock-timeout option. This
needs to be used in conjunction with OSTREE_BUILTIN_FLAG_NO_REPO since
the the timeout can only be applied once the repo is gathered from the
sysroot. This allows the user to control how long to wait to acquire the
repository lock.

The cmdprivate version of ostree_repo_set_lock_timeout() is used until
it's no longer experimental.
All of these commands call one of the libostree functions that take
locks, so add the --lock-timeout option to them.
Make sure the number of committers is even since we only create half as
many trees.
Copy link
Member

@pwithnall pwithnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more in-depth review of the locking code. I haven’t checked the lock call sites in the various ostree builtins or APIs (and don’t plan to; they can be tweaked over time if problems arise).

@@ -261,6 +261,10 @@ OSTree-1.0.gir: libostree-1.la Makefile
OSTree_1_0_gir_EXPORT_PACKAGES = ostree-1
OSTree_1_0_gir_INCLUDES = Gio-2.0
OSTree_1_0_gir_CFLAGS = $(libostree_1_la_CFLAGS)
if ENABLE_EXPERIMENTAL_API
# When compiling this is set via config.h, but g-ir-scanner doesn't use that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just be able to pass config.h to OSTree_1_0_gir_FILES and it will be #included into the temporary C file which g-ir-scanner generates as part of the scanning process.

I think order is preserved, so list it first in OSTree_1_0_gir_FILES.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I didn't know that. I'll try that locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was separately fixed in #1322.

@@ -1,3 +1,4 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spurious blank line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address after #1343.

@@ -48,7 +49,14 @@ ostree_cmd__private__ (void)
impl_ostree_generate_grub2_config,
_ostree_repo_static_delta_dump,
_ostree_repo_static_delta_query_exists,
_ostree_repo_static_delta_delete
_ostree_repo_static_delta_delete,
/* Remove this when ostree_repo_set_lock_timeout is no longer experimental */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a FIXME to this comment (and the one below) to make them more greppable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address after #1343.

@@ -20,17 +20,34 @@
#pragma once

#include "ostree-types.h"
/* This is only needed to get the OstreeRepoLockType and OstreeRepoAutoLock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a FIXME to this comment too. Similarly for other ‘remove this when no longer experimental’ comments throughout the rest of the PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address after #1343.

@@ -991,6 +992,16 @@ checkout_tree_at (OstreeRepo *self,
g_assert (options->force_copy);
}

/* Take a shared repo lock to try to ensure objects aren't deleted. If the
* repo is not writable, this will be a noop and we just hope for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the repo is not writeable can we really expect objects to be deleted? Are you thinking of the situation where a repo is being operated on by processes belonging to two users: the user deleting objects has write permissions on the repository, and the user reading it does not?

We could perhaps fix this by sticking an inotify on the lock file if we only have read-only access to the repository, and bailing out if an exclusive lock is taken on the repository while we’re in a locked section. That might be overengineering things though. It would, however, be a general solution which could be implemented in the generic locking code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case is simply an unprivileged user trying to checkout from /ostree/repo or /var/lib/flatpak/repo or something like that. They should be allowed to do a checkout as they were before.

Like you say, if the repo is not writable by the current user, then they're not going to be deleting any objects. But if, say, flatpak is pruning /var/lib/flatpak/repo while you're trying to do a checkout, bad things could happen.

But that's already the case, and I don't think we should optimize for unprivileged users. I'm really just trying to make sure that multiple processes running as the repo owner can't screw each other since that's by far the common case. Since the unprivileged user won't even be able to open the lock file (it's opened 600 particularly so that only repo owners can take exclusive locks), then I don't think they'll be able to get the lock state.

An inotify watch is interesting. I'm not sure inotify can tell you when the lock state changes, though, only if it's been opened.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, having done a bit of reading, I don’t think it’s possible to monitor the status of a file lock using inotify. The only way of doing it is to poll with fcntl(F_GETLK), but that’s horrific and racy.

The other approach I can think of is to use a world-writeable directory (somewhere in /run, probably), which can contain the locks for many repositories, each with a deterministic name based on the repository directory’s device ID and inode.

That said, allowing an unprivileged process to take a lock on a repository it can’t write to allows priority inversion attacks, since there’s no realistic way to stop it taking an exclusive lock.

Unless we put shared locks in /run and exclusive locks in $repo/.lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still believe the way this was written where only the repo owner can manage the lock is correct. I think any attempt to allow non-owners to take a lock would be very complex and likely mean that flock-type locking can't be used.

gint64 lock_timeout = g_ascii_strtoll (lock_timeout_str, &endptr, 0);
if ((lock_timeout < -1) ||
(lock_timeout > G_MAXINT) ||
(lock_timeout == 0 && endptr == lock_timeout_str))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn’t check for the case where *endptr != '\0', which could be caused by lock_timeout_str == "123somestring".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I hadn't thought of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The g_ascii_strto*() are an absolute nightmare for correct error handling.

GLib recently acquired g_ascii_string_to_{un,}signed(), which are a lot better, but unfortunately too recent for libostree to use yet. :-(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix after #1343. This part didn't make it into the slimmed down setup.

* represents the number of seconds the application will sleep attempting to
* acquire the repository lock.
*
* Returns: lock timeout for the repository
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: maybe add , in seconds to this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix after #1343.

* Set the lock timeout for the repository. A timeout of -1 indicates that the
* application will block until the lock is acquired. Otherwise, it represents
* the number of seconds the application will sleep attempting to acquire the
* repository lock.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mention that anything < -1 is not allowed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix after #1343.

* OstreeRepoLockType:
* The type of repository lock to acquire.
* @OSTREE_REPO_LOCK_SHARED: A shared lock
* @OSTREE_REPO_LOCK_EXCLUSIVE: An exclusive lock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a Since: line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #1343.

gint64 lock_timeout = g_ascii_strtoll (value, &endptr, 0);
if ((lock_timeout < -1) ||
(lock_timeout > G_MAXINT) ||
(lock_timeout == 0 && endptr == value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As with the other g_ascii_strtoll() call, this should check for *endptr != '\0' as another error condition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix after #1343.

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 519b30b) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters
Copy link
Member

Would be happy to start trying to take this - perhaps it's simplest to split this up into a PR that adds the internal locking API with just a few uses - maybe just prune and commit? Then we can gradually trickle in changes to enable it in different places like fsck as we test?

@dbnicholson
Copy link
Member Author

Sorry, got busy with a bunch of other stuff but I should be getting back to this today or tomorrow. I'll split it up into 2 PRs like you say. Some other stuff I've thought of in the meantime:

  • A repo wide locking enable boolean. I hope this wouldn't be necessary much, but it would allow programs to opt out of locking if we run into issues with this or if there's a sequence that's just too complex for the internal locking.

  • A thread specific locking enable boolean. This is my idea for handling the worker threads. What you'd do is take a lock, spawn the thread, and immediately disable locking for that thread. When the thread completed, the spawner would pop the lock. That would allow (for example) all the pull threads to run without locking while the thread spawner has the repo locked and avoid the issues with

@dbnicholson
Copy link
Member Author

I opened #1343 with the slimmed down initial setup as requested. I still passes the test suite, although I haven't really used it in real applications.

@cgwalters cgwalters changed the title Repository locking Repository locking 🔒 Dec 5, 2017
@cgwalters
Copy link
Member

How do you want to proceed with rebasing this? Are you planning to work on it anytime soon?

@dbnicholson
Copy link
Member Author

Sorry, I had to take off my ostree developer hat for a while. Next week I'll get back on this and flesh it out. Does that timeline work or were you looking for something sooner?

@cgwalters
Copy link
Member

No pressure on the timeline! Currently locking is disabled by default. I am happy to help out a bit if we can figure out how to subdivide work. Just trying to keep things moving!

@openshift-ci-robot
Copy link
Collaborator

@dbnicholson: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link

openshift-ci bot commented Jun 29, 2023

@dbnicholson: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/sanity 8c9f518 link true /test sanity
ci/prow/images 8c9f518 link true /test images
ci/prow/fcos-e2e 8c9f518 link true /test fcos-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants