
Prune concurrency #813

Closed
wants to merge 3 commits into from

Conversation

@cgwalters (Member Author) commented Apr 26, 2017

There have been a lot of code changes since these patches were written - one of the stumbling blocks was that they were originally doing unlock() in the `out:` sections, but most of those no longer exist. I ported them to use `g_auto()`, which current libglnx has.

In prune at least we now have two functions, so I had to add locking there.

This is going to need careful auditing and review. There are also a few compiler errors still to fix, which I left in intentionally to be sure we don't merge this until it's further along.
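
A minimal sketch of that pattern shift, assuming a plain flock(2)-backed lock (`LockGuard` and `do_locked_op` are hypothetical names, not the actual patch):

#include <errno.h>
#include <sys/file.h>

#include <gio/gio.h>

/* Hypothetical guard type wrapping a lock fd */
typedef struct { int fd; } LockGuard;

static void
lock_guard_clear (LockGuard *guard)
{
  if (guard->fd >= 0)
    flock (guard->fd, LOCK_UN);
}

G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC (LockGuard, lock_guard_clear)

static gboolean
do_locked_op (int lock_fd, GError **error)
{
  g_auto(LockGuard) guard = { .fd = -1 };

  if (flock (lock_fd, LOCK_SH) < 0)
    {
      g_set_error (error, G_IO_ERROR, g_io_error_from_errno (errno),
                   "flock: %s", g_strerror (errno));
      return FALSE;
    }
  guard.fd = lock_fd;

  /* ... do work; the lock is released on every return path, with no
   * goto-out cleanup section needed ... */
  return TRUE;
}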

@dbnicholson (Member)

I think it would be great (if possible) to add some tests using parallel to try to make a pull and a prune race. Or maybe add some tests in the C or JS suites where this can be done at the API level with tighter control of the operations.

cgwalters added a commit to cgwalters/ostree that referenced this pull request Apr 28, 2017
Minor, but I realized `checkout_tree_at()` is a better place to
do common setup before checkout.  Prep for
ostreedev#813
rh-atomic-bot pushed a commit that referenced this pull request May 1, 2017
Minor, but I realized `checkout_tree_at()` is a better place to
do common setup before checkout.  Prep for
#813

Closes: #823
Approved by: jlebon
@rh-atomic-bot

☔ The latest upstream changes (presumably 8d8f06f) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

Did some work on this; I rebased 🏄‍♂️ and worked on a test case which blows up with concurrent prunes; will debug more later.

@dbnicholson (Member)

I've been looking at locking for some work at Endless. There are a few other spots I think could use some locking (a rough flock(2) sketch of the shared vs. exclusive distinction follows the list):

  • Signing a commit - Ensure the commit object doesn't get deleted while updating the detached metadata. Possibly this could just be a lock within `ostree_repo_write_commit_detached_metadata` so that any metadata update is covered. A shared lock would be fine.
  • Generating summary - You don't want objects or deltas to be deleted while generating the summary. This can probably be a shared lock, although part of me really wants an exclusive lock to ensure repo consistency in the summary file.
  • Signing summary - This is maybe more about #487 (summary file races). You really want to make sure the signature actually matches the summary, and I think the only way to accomplish that would be an exclusive lock. If the signing were done atomically with the generation, maybe this wouldn't be needed.
  • Generating deltas - You don't want objects to be deleted while a delta is being calculated. This should take a shared lock.
  • Pruning deltas - Take an exclusive lock so that summary generation doesn't try to read a deleted delta, just as with regular pruning.
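
A rough flock(2) sketch of the shared vs. exclusive distinction used above (the lock-file path is hypothetical):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int
main (void)
{
  /* Hypothetical lock file at the repo root. */
  int fd = open ("repo/.lock", O_CREAT | O_RDWR | O_CLOEXEC, 0600);
  if (fd < 0)
    {
      perror ("open");
      return 1;
    }

  /* Readers (summary/delta generation, signing) can share the lock... */
  if (flock (fd, LOCK_SH) < 0)
    perror ("flock LOCK_SH");

  /* ...while prune needs LOCK_EX; flock converts the existing lock,
   * blocking until all other shared holders have released theirs. */
  if (flock (fd, LOCK_EX) < 0)
    perror ("flock LOCK_EX");

  close (fd);  /* closing the fd drops the lock */
  return 0;
}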

@rh-atomic-bot

☔ The latest upstream changes (presumably e6f17b9) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

Yeah...there is a long tail of stuff here. I think delta pruning falls under commit pruning, no? But today it is possible to regenerate a delta, which isn't really safe at all against concurrent readers...that's its own problem.

I'm not sure why one would be signing a commit that's about to be deleted? If it failed, it'd mostly be an operator error?

At a high level, given concurrent commit + prune operations, I'd say we should focus on, in order:

  • Avoiding data loss (hopefully of course this isn't real data loss since you could just retry the build from cached artifacts)
  • Avoiding race conditions seen by clients
  • Avoiding race conditions seen by repo operation tools

When a transaction is finished and we have moved all the staged loose
objects into the repo, we fsync all the object directories to ensure the
filenames are stable before we update the refs files to point to the
new commits.

Without this, an unclean shutdown after the transaction is finished
could result in a refs file that points to an incomplete commit.

https://bugzilla.gnome.org/show_bug.cgi?id=759442
In general, we're pretty robust when there are multiple ongoing operations,
because transactions are isolated to per-transaction staging dirs, and the
transaction commit is a last-writer-wins replace operation on immutable data.

However, any non-atomic read operation can fail if it is concurrent with a
prune, as objects may then disappear under the operation's feet halfway through.

This patch-set makes this robust by having a repo lock that is taken in a shared
fashion for all transactions, and for checkouts, but is taken in exclusive mode
for prune. It also adds a non-blocking prune mode that allows you to do
opportunistic prunes that don't block if there is an ongoing operation.

There are a bunch of operations that are still unsafe (they do not block
prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't
think that is necessarily a huge problem, as these are mainly used in
development or debugging, and a failure here will just be a printed error; it
will never cause a broken repo.

This is a simpler version of what I proposed at
https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html - it
takes a lock during the whole transaction rather than retrying the transaction
under a lock when committing it.

https://bugzilla.gnome.org/show_bug.cgi?id=759442
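
As an illustration of the non-blocking mode mentioned above, a minimal sketch assuming the lock is flock(2)-based (helper name hypothetical):

#include <errno.h>
#include <sys/file.h>

/* Sketch of the opportunistic mode: try to take the exclusive prune
 * lock, but don't block if another operation holds the lock.
 * Returns 1 on success, 0 if busy, -1 on error. */
static int
try_lock_exclusive (int lock_fd)
{
  if (flock (lock_fd, LOCK_EX | LOCK_NB) < 0)
    {
      if (errno == EWOULDBLOCK)
        return 0;   /* an ongoing transaction holds the lock; skip the prune */
      return -1;    /* real error */
    }
  return 1;
}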
@dbnicholson (Member)

For signing a commit that's getting deleted, it's not that you'd do it intentionally. It's that someone else starts pruning while you're trying to do it. I suppose you could just let it fail in that case, but it also seems like you could race with the prune and end up leaving dangling detached metadata around.

There's a separate entry point for `ostree_repo_prune_static_deltas` which isn't currently under lock. Which brings me to a separate thought I've been having about the design here.

Currently each caller except for the transaction lock separately opens the lock file. This has pros and cons. On the plus side, you're not at risk of lowering the lock state (either unlocking or dropping exclusive to shared) if some other part of the code has locked the repo. On the down side, there's no way to coordinate locking access, and it leaves things open to deadlocking. I.e., it would be ideal to take an exclusive lock in `ostree_repo_prune_static_deltas`, but it's already taken in `ostree_repo_prune`, which calls `ostree_repo_prune_static_deltas`. So in the current design you'd deadlock if you added exclusive locking in `ostree_repo_prune_static_deltas` and called `ostree_repo_prune`.

This also means, for example, that you can't lock the repo in your application before starting a chain of operations. This is the type of thing I'd like to do during a release: take an exclusive lock immediately after opening the repo and maintain it for the entire release process. It might be overkill, but I'd really like to prevent anything else from touching the repo during that process.

In order to support that, you'd probably want to keep one lock in the `OstreeRepo` structure, as is done for the transaction lock in this PR. However, to coordinate, you'd probably want to store the lock state; i.e., if asked to lock shared and the repo is already exclusively locked, then do nothing. Furthermore, to support multithreaded operations, you'd probably need a mutex or something to manage access to the lock and lock state. Maybe I'm overthinking it. I'm definitely not a locking guru.

Another ugly option would be to add a bunch of `_locked` API variants that don't attempt to take a lock because you already have one (sketched below).
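
A sketch of what that could look like (all names hypothetical):

#include <glib.h>

typedef struct _Repo Repo;  /* stand-in for OstreeRepo */

/* Hypothetical lock helpers assumed by this sketch */
gboolean repo_lock_exclusive (Repo *self, GError **error);
void     repo_unlock         (Repo *self);

/* Internal variant: assumes the caller already holds the lock, so
 * e.g. a prune entry point that locked the repo can call it without
 * deadlocking on a second exclusive request. */
static gboolean
prune_static_deltas_locked (Repo *self, GError **error)
{
  /* ... actual delta pruning, no locking here ... */
  return TRUE;
}

/* Public entry point: takes the lock itself for standalone callers */
gboolean
prune_static_deltas (Repo *self, GError **error)
{
  if (!repo_lock_exclusive (self, error))
    return FALSE;

  gboolean ret = prune_static_deltas_locked (self, error);
  repo_unlock (self);
  return ret;
}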

@cgwalters (Member Author)

Yeah, I think you're right. The recursive case could be handled by tracking whether or not we already own the lock in this process, but to handle the general case of wanting exclusion across multiple operations, I'd agree we need to hoist the locking to be caller-controlled.

GError **error);

gboolean
_ostree_repo_lock_exclusive (OstreeRepo *self,
(Member)

Do you think it would make sense to expose these two functions externally? In general I think they can be useful when more complex prune logic is implemented by an application. In this particular case, there is still an issue with Atomic system containers when `atomic images prune` is used at the same time an image referring to a dangling layer (which is already present locally) is being pulled: the prune might delete the layer, but the pull process has already checked that it is present in the repository, so it doesn't pull it again. If these two functions are made public, I could use them directly; otherwise we will probably need to implement the locking part separately.

@cgwalters (Member Author)

Working on converting this to explicit locking.

@cgwalters self-assigned this Jun 12, 2017
@rh-atomic-bot

☔ The latest upstream changes (presumably d0d5f54) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

I was thinking about this more, and while it wouldn't be a complete fix, we could greatly mitigate things if we followed what low-pause memory GC algorithms do: compute references without locking, then stop and look for any new objects that have appeared, rescan those, and repeat. In general, we can also assume objects in any transaction staging dir are referenced.
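
Roughly, the idea in pseudo-C over hypothetical helpers:

#include <glib.h>

/* Hypothetical helpers assumed by this sketch */
void        scan_refs_into       (GHashTable *live);
void        add_staging_objects  (GHashTable *live);
GHashTable *objects_added_since  (GHashTable *live);
void        merge_into           (GHashTable *live, GHashTable *newly);

/* Mark without holding the lock, then rescan whatever appeared in the
 * meantime, iterating to a fixpoint like a low-pause GC's remark phase.
 * Objects in transaction staging dirs are treated as live. */
static void
compute_live_set (GHashTable *live /* checksum -> NULL */)
{
  scan_refs_into (live);
  add_staging_objects (live);

  for (;;)
    {
      GHashTable *newly = objects_added_since (live);
      guint n = g_hash_table_size (newly);
      merge_into (live, newly);
      g_hash_table_unref (newly);
      if (n == 0)
        break;  /* fixpoint: no new objects appeared during the scan */
    }
}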

@dbnicholson (Member)

I've been thinking about this off and on for a while. I think that in order to be really useful, the lock should be acquirable anywhere, so that you can take it within ostree or outside of it. I'm definitely no locking guru, but here are the features I think are needed to accomplish that:

  • Read/write locking - As discussed above, only a few operations need exclusive access to the repo, and locking the repo exclusively would block a lot of operations that are safe to run concurrently.

  • Recursive locking - To coherently take the lock in many places while leaving the previous lock state intact. When the lock is already taken exclusively, a nested request does nothing, so as not to lower the lock state.

  • Thread safety - If you agree that recursive locking is desired, then I believe it would be difficult to implement if every thread operated on the same lock. I think it would be reasonable to have each thread operate on an independent lock.

I came up with something layered on top of flock(2) that seems to work. It uses GPrivate so that the lock is in thread-local storage. Here are the basics:

#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <glib.h>

/* Per-thread lock state: a stack of held states plus the flock(2) fd */
typedef struct {
  GQueue *stack;
  int fd;
} Lock;

static void
free_lock (gpointer data)
{
  Lock *lock = data;

  if (lock != NULL)
    {
      g_queue_free (lock->stack);
      if (lock->fd >= 0)
        close (lock->fd);
      g_free (lock);
    }
}

static GPrivate repo_lock = G_PRIVATE_INIT (free_lock);

/* Acquire the repo lock (LOCK_SH or LOCK_EX, optionally LOCK_NB),
 * pushing the new state; an existing exclusive hold is never lowered. */
static gboolean
push_lock (int flags)
{
  Lock *lock;
  int cur_state;

  g_assert (flags & (LOCK_EX | LOCK_SH));

  lock = g_private_get (&repo_lock);
  if (!lock)
    {
      lock = g_new (Lock, 1);
      lock->stack = g_queue_new ();
      lock->fd = openat (AT_FDCWD, "lock", O_CREAT | O_RDWR | O_CLOEXEC,
                         0600);
      if (lock->fd < 0)
        {
          g_printerr ("Open of lock failed: %s\n", g_strerror (errno));
          return FALSE;
        }
      g_private_set (&repo_lock, lock);
    }

  if (g_queue_get_length (lock->stack) == 0)
    cur_state = LOCK_UN;
  else
    cur_state = GPOINTER_TO_INT (g_queue_peek_head (lock->stack));
  if (cur_state == LOCK_EX)
    {
      g_queue_push_head (lock->stack, GINT_TO_POINTER (LOCK_EX));
    }
  else
    {
      int state = (flags & LOCK_EX) ? LOCK_EX : LOCK_SH;

      if (flock (lock->fd, flags) < 0)
        {
          g_printerr ("Lock failed: %s\n", g_strerror (errno));
          return FALSE;
        }

      g_queue_push_head (lock->stack, GINT_TO_POINTER (state));
    }

  return TRUE;
}

/* Release one level of locking, dropping back to the previous state on
 * the stack, or unlocking entirely when the stack empties. */
static gboolean
pop_lock (void)
{
  Lock *lock;

  lock = g_private_get (&repo_lock);
  g_return_val_if_fail (lock != NULL, FALSE);
  g_return_val_if_fail (lock->stack != NULL, FALSE);
  g_return_val_if_fail (g_queue_get_length (lock->stack) > 0, FALSE);
  g_return_val_if_fail (lock->fd != -1, FALSE);

  if (g_queue_get_length (lock->stack) > 1)
    {
      int cur_state = GPOINTER_TO_INT (g_queue_peek_head (lock->stack));
      int next_state = GPOINTER_TO_INT (g_queue_peek_nth (lock->stack, 1));

      /* Drop back to the previous lock state if it differs */
      if (next_state != cur_state)
        {
          /* We should never drop from shared to exclusive */
          g_return_val_if_fail (next_state == LOCK_SH, FALSE);
          if (flock (lock->fd, next_state) < 0)
            {
              g_printerr ("Drop to shared lock failed: %s\n",
                          g_strerror (errno));
              return FALSE;
            }
        }
    }
  else
    {
      /* Lock stack will be empty, unlock */
      if (flock (lock->fd, LOCK_UN) < 0)
        {
          g_printerr ("Unlock failed: %s\n", g_strerror (errno));
          return FALSE;
        }
    }

  g_queue_pop_head (lock->stack);

  return TRUE;
}

A GQueue is used to keep a stack of lock states, and if the lock is already exclusive, it just keeps that and pushes another LOCK_EX onto the stack.

Obviously that's not totally usable as is (you'd want to control the lock file path, use GError, put some nicer wrappers around it, etc.), but I was wondering what you thought about this concept.
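
For what it's worth, here's how a nested caller would use those helpers (the called operations are hypothetical):

if (push_lock (LOCK_EX))          /* flock takes LOCK_EX */
  {
    prune_objects ();             /* hypothetical exclusive operation */

    if (push_lock (LOCK_SH))      /* already exclusive: state is kept,
                                     another LOCK_EX goes on the stack */
      {
        generate_summary ();      /* hypothetical shared operation */
        pop_lock ();              /* still LOCK_EX afterwards */
      }

    pop_lock ();                  /* stack empties: flock LOCK_UN */
  }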

@dbnicholson (Member)

I came up with an alternative that I think handles most use cases in #1292.

@cgwalters (Member Author)

Closing this in favor of #813 (comment)
