
Prune concurrency #813

Closed
wants to merge 3 commits into from

Conversation

@cgwalters (Member Author) commented Apr 26, 2017

There have been a lot of code changes since these patches were written - one of the stumbling blocks was that they were originally doing unlock() in the `out:` sections, but most of those no longer exist. I ported them to use `g_auto()`, which current libglnx has.

In prune at least we now have two functions, so I had to add locking there.

This is going to need careful auditing and review. There are also a few compiler errors still to fix, which I left in intentionally to be sure we don't merge this until it's further along.
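
A minimal sketch of that pattern shift, assuming a plain flock(2)-backed lock (`LockGuard` and `do_locked_op` are hypothetical names, not the actual patch):

#include <errno.h>
#include <sys/file.h>

#include <gio/gio.h>

/* Hypothetical guard type wrapping a lock fd */
typedef struct { int fd; } LockGuard;

static void
lock_guard_clear (LockGuard *guard)
{
  if (guard->fd >= 0)
    flock (guard->fd, LOCK_UN);
}

G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC (LockGuard, lock_guard_clear)

static gboolean
do_locked_op (int lock_fd, GError **error)
{
  g_auto(LockGuard) guard = { .fd = -1 };

  if (flock (lock_fd, LOCK_SH) < 0)
    {
      g_set_error (error, G_IO_ERROR, g_io_error_from_errno (errno),
                   "flock: %s", g_strerror (errno));
      return FALSE;
    }
  guard.fd = lock_fd;

  /* ... do work; the lock is released on every return path, with no
   * goto-out cleanup section needed ... */
  return TRUE;
}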

@dbnicholson (Member)

I think it would be great (if possible) to add some tests using parallel to try to make a pull and a prune race. Or maybe add some tests in the C or JS suites where this can be done at the API level with tighter control of the operations.

cgwalters added a commit to cgwalters/ostree that referenced this pull request Apr 28, 2017
Minor, but I realized `checkout_tree_at()` is a better place to
do common setup before checkout.  Prep for
ostreedev#813
rh-atomic-bot pushed a commit that referenced this pull request May 1, 2017
Minor, but I realized `checkout_tree_at()` is a better place to
do common setup before checkout.  Prep for
#813

Closes: #823
Approved by: jlebon
@rh-atomic-bot

☔ The latest upstream changes (presumably 8d8f06f) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

Did some work on this; I rebased 🏄‍♂️ and worked on a test case which blows up with concurrent prunes; will debug more later.

@dbnicholson (Member)

I've been looking at locking for some work at Endless. There are a few other spots I think could use some locking (a rough flock(2) sketch of the shared vs. exclusive distinction follows the list):

  • Signing a commit - Ensure the commit object doesn't get deleted while updating the detached metadata. Possibly this could just be a lock within `ostree_repo_write_commit_detached_metadata` so that any metadata update is covered. A shared lock would be fine.
  • Generating summary - You don't want objects or deltas to be deleted while generating the summary. This can probably be a shared lock, although part of me really wants an exclusive lock to ensure repo consistency in the summary file.
  • Signing summary - This is maybe more about #487 (summary file races). You really want to make sure the signature actually matches the summary, and I think the only way to accomplish that would be an exclusive lock. If the signing were done atomically with the generation, maybe this wouldn't be needed.
  • Generating deltas - You don't want objects to be deleted while a delta is being calculated. This should take a shared lock.
  • Pruning deltas - Take an exclusive lock so that summary generation doesn't try to read a deleted delta, just as with regular pruning.
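
A rough flock(2) sketch of the shared vs. exclusive distinction used above (the lock-file path is hypothetical):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int
main (void)
{
  /* Hypothetical lock file at the repo root. */
  int fd = open ("repo/.lock", O_CREAT | O_RDWR | O_CLOEXEC, 0600);
  if (fd < 0)
    {
      perror ("open");
      return 1;
    }

  /* Readers (summary/delta generation, signing) can share the lock... */
  if (flock (fd, LOCK_SH) < 0)
    perror ("flock LOCK_SH");

  /* ...while prune needs LOCK_EX; flock converts the existing lock,
   * blocking until all other shared holders have released theirs. */
  if (flock (fd, LOCK_EX) < 0)
    perror ("flock LOCK_EX");

  close (fd);  /* closing the fd drops the lock */
  return 0;
}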

@rh-atomic-bot

☔ The latest upstream changes (presumably e6f17b9) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

Yeah...there is a long tail of stuff here. I think delta pruning falls under commit pruning, no? But today it is possible to regenerate a delta, which isn't really safe at all against concurrent readers...that's its own problem.

I'm not sure why one would be signing a commit that's about to be deleted? If it failed, it'd mostly be an operator error?

At a high level, given concurrent commit + prune operations, I'd say we should focus on, in order:

  • Avoiding data loss (hopefully of course this isn't real data loss since you could just retry the build from cached artifacts)
  • Avoiding race conditions seen by clients
  • Avoiding race conditions seen by repo operation tools

When a transaction is finished and we have moved all the staged loose
objects into the repo, we fsync all the object directories to ensure the
filenames are stable before we update the refs files to point to the
new commits.

Without this, an unclean shutdown after the transaction is finished
could result in a refs file that points to an incomplete commit.

https://bugzilla.gnome.org/show_bug.cgi?id=759442
In general, we're pretty robust when there are multiple ongoing operations,
because transactions are isolated to per-transaction staging dirs, and the
transaction commit is a last-writer-wins replace operation on immutable data.

However, any non-atomic read operation can fail if it is concurrent with a
prune, as objects may then disappear under the operation's feet halfway through.

This patch-set makes this robust by having a repo lock that is taken in a shared
fashion for all transactions, and for checkouts, but is taken in exclusive mode
for prune. It also adds a non-blocking prune mode that allows you to do
opportunistic prunes that don't block if there is an ongoing operation.

There are a bunch of operations that are still unsafe (they do not block
prune), such as: cat, ls, show, log, diff, generate-delta, rev-parse. I don't
think that is necessarily a huge problem, as these are mainly used in
development or debugging, and a failure here will just be a printed error; it
will never cause a broken repo.

This is a simpler version of what I proposed at
https://mail.gnome.org/archives/ostree-list/2015-December/msg00024.html - it
takes a lock during the whole transaction rather than retrying the transaction
under a lock when committing it.

https://bugzilla.gnome.org/show_bug.cgi?id=759442
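
As an illustration of the non-blocking mode mentioned above, a minimal sketch assuming the lock is flock(2)-based (helper name hypothetical):

#include <errno.h>
#include <sys/file.h>

/* Sketch of the opportunistic mode: try to take the exclusive prune
 * lock, but don't block if another operation holds the lock.
 * Returns 1 on success, 0 if busy, -1 on error. */
static int
try_lock_exclusive (int lock_fd)
{
  if (flock (lock_fd, LOCK_EX | LOCK_NB) < 0)
    {
      if (errno == EWOULDBLOCK)
        return 0;   /* an ongoing transaction holds the lock; skip the prune */
      return -1;    /* real error */
    }
  return 1;
}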
@dbnicholson (Member)

For signing a commit that's getting deleted, it's not that you'd do it intentionally. It's that someone else starts pruning while you're trying to do it. I suppose you could just let it fail in that case, but it also seems like you could race with the prune and end up leaving dangling detached metadata around.

There's a separate entry point for `ostree_repo_prune_static_deltas` which isn't currently under lock. Which brings me to a separate thought I've been having about the design here.

Currently each caller except for the transaction lock separately opens the lock file. This has pros and cons. On the plus side, you're not at risk of lowering the lock state (either unlocking or dropping exclusive to shared) if some other part of the code has locked the repo. On the down side, there's no way to coordinate locking access, and it leaves things open to deadlocking. I.e., it would be ideal to take an exclusive lock in `ostree_repo_prune_static_deltas`, but it's already taken in `ostree_repo_prune`, which calls `ostree_repo_prune_static_deltas`. So in the current design you'd deadlock if you added exclusive locking in `ostree_repo_prune_static_deltas` and called `ostree_repo_prune`.

This also means, for example, that you can't lock the repo in your application before starting a chain of operations. This is the type of thing I'd like to do during a release: take an exclusive lock immediately after opening the repo and maintain it for the entire release process. It might be overkill, but I'd really like to prevent anything else from touching the repo during that process.

In order to support that, you'd probably want to keep one lock in the `OstreeRepo` structure, as is done for the transaction lock in this PR. However, to coordinate, you'd probably want to store the lock state; i.e., if asked to lock shared and the repo is already exclusively locked, then do nothing. Furthermore, to support multithreaded operations, you'd probably need a mutex or something to manage access to the lock and lock state. Maybe I'm overthinking it. I'm definitely not a locking guru.

Another ugly option would be to add a bunch of `_locked` API variants that don't attempt to take a lock because you already have one (sketched below).
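
A sketch of what that could look like (all names hypothetical):

#include <glib.h>

typedef struct _Repo Repo;  /* stand-in for OstreeRepo */

/* Hypothetical lock helpers assumed by this sketch */
gboolean repo_lock_exclusive (Repo *self, GError **error);
void     repo_unlock         (Repo *self);

/* Internal variant: assumes the caller already holds the lock, so
 * e.g. a prune entry point that locked the repo can call it without
 * deadlocking on a second exclusive request. */
static gboolean
prune_static_deltas_locked (Repo *self, GError **error)
{
  /* ... actual delta pruning, no locking here ... */
  return TRUE;
}

/* Public entry point: takes the lock itself for standalone callers */
gboolean
prune_static_deltas (Repo *self, GError **error)
{
  if (!repo_lock_exclusive (self, error))
    return FALSE;

  gboolean ret = prune_static_deltas_locked (self, error);
  repo_unlock (self);
  return ret;
}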

@cgwalters (Member Author)

Yeah, I think you're right. The recursive case could be handled by tracking whether or not we already own the lock in this process, but to handle the general case of wanting exclusion across multiple operations, I'd agree we need to hoist the locking to be caller-controlled.

GError **error);

gboolean
_ostree_repo_lock_exclusive (OstreeRepo *self,
(Member)

Do you think it would make sense to expose these two functions externally? In general I think they can be useful when more complex prune logic is implemented by an application. In this particular case, there is still an issue with Atomic system containers when `atomic images prune` is used at the same time an image referring to a dangling layer (which is already present locally) is being pulled: the prune might delete the layer, but the pull process has already checked that it is present in the repository, so it doesn't pull it again. If these two functions are made public, I could use them directly; otherwise we will probably need to implement the locking part separately.

@cgwalters (Member Author)

Working on converting this to explicit locking.

@cgwalters self-assigned this Jun 12, 2017
@rh-atomic-bot

☔ The latest upstream changes (presumably d0d5f54) made this pull request unmergeable. Please resolve the merge conflicts.

@cgwalters (Member Author)

I was thinking about this more, and while it wouldn't be a complete fix, we could greatly mitigate things if we followed what low-pause memory GC algorithms do: compute references without locking, then stop and look for any new objects that have appeared, rescan those, and repeat. In general, we can also assume objects in any transaction staging dir are referenced.
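
Roughly, the idea in pseudo-C over hypothetical helpers:

#include <glib.h>

/* Hypothetical helpers assumed by this sketch */
void        scan_refs_into       (GHashTable *live);
void        add_staging_objects  (GHashTable *live);
GHashTable *objects_added_since  (GHashTable *live);
void        merge_into           (GHashTable *live, GHashTable *newly);

/* Mark without holding the lock, then rescan whatever appeared in the
 * meantime, iterating to a fixpoint like a low-pause GC's remark phase.
 * Objects in transaction staging dirs are treated as live. */
static void
compute_live_set (GHashTable *live /* checksum -> NULL */)
{
  scan_refs_into (live);
  add_staging_objects (live);

  for (;;)
    {
      GHashTable *newly = objects_added_since (live);
      guint n = g_hash_table_size (newly);
      merge_into (live, newly);
      g_hash_table_unref (newly);
      if (n == 0)
        break;  /* fixpoint: no new objects appeared during the scan */
    }
}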

@dbnicholson (Member)

I've been thinking about this off and on for a while. I think that in order to be really useful, the lock should be acquirable anywhere, so that you can take it within ostree or outside of it. I'm definitely no locking guru, but here are the features I think are needed to accomplish that:

  • Read/write locking - As discussed above, only a few operations need exclusive access to the repo, and locking the repo exclusively would block a lot of operations that are safe to run concurrently.

  • Recursive locking - To coherently take the lock in many places while leaving the previous lock state intact. When the lock is already taken exclusively, a nested request does nothing, so as not to lower the lock state.

  • Thread safety - If you agree that recursive locking is desired, then I believe it would be difficult to implement if every thread operated on the same lock. I think it would be reasonable to have each thread operate on an independent lock.

I came up with something layered on top of flock(2) that seems to work. It uses GPrivate so that the lock is in thread-local storage. Here are the basics:

#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <glib.h>

/* Per-thread lock state: a stack of held states plus the flock(2) fd */
typedef struct {
  GQueue *stack;
  int fd;
} Lock;

static void
free_lock (gpointer data)
{
  Lock *lock = data;

  if (lock != NULL)
    {
      g_queue_free (lock->stack);
      if (lock->fd >= 0)
        close (lock->fd);
      g_free (lock);
    }
}

static GPrivate repo_lock = G_PRIVATE_INIT (free_lock);

/* Acquire the repo lock (LOCK_SH or LOCK_EX, optionally LOCK_NB),
 * pushing the new state; an existing exclusive hold is never lowered. */
static gboolean
push_lock (int flags)
{
  Lock *lock;
  int cur_state;

  g_assert (flags & (LOCK_EX | LOCK_SH));

  lock = g_private_get (&repo_lock);
  if (!lock)
    {
      lock = g_new (Lock, 1);
      lock->stack = g_queue_new ();
      lock->fd = openat (AT_FDCWD, "lock", O_CREAT | O_RDWR | O_CLOEXEC,
                         0600);
      if (lock->fd < 0)
        {
          g_printerr ("Open of lock failed: %s\n", g_strerror (errno));
          return FALSE;
        }
      g_private_set (&repo_lock, lock);
    }

  if (g_queue_get_length (lock->stack) == 0)
    cur_state = LOCK_UN;
  else
    cur_state = GPOINTER_TO_INT (g_queue_peek_head (lock->stack));
  if (cur_state == LOCK_EX)
    {
      g_queue_push_head (lock->stack, GINT_TO_POINTER (LOCK_EX));
    }
  else
    {
      int state = (flags & LOCK_EX) ? LOCK_EX : LOCK_SH;

      if (flock (lock->fd, flags) < 0)
        {
          g_printerr ("Lock failed: %s\n", g_strerror (errno));
          return FALSE;
        }

      g_queue_push_head (lock->stack, GINT_TO_POINTER (state));
    }

  return TRUE;
}

/* Release one level of locking, dropping back to the previous state on
 * the stack, or unlocking entirely when the stack empties. */
static gboolean
pop_lock (void)
{
  Lock *lock;

  lock = g_private_get (&repo_lock);
  g_return_val_if_fail (lock != NULL, FALSE);
  g_return_val_if_fail (lock->stack != NULL, FALSE);
  g_return_val_if_fail (g_queue_get_length (lock->stack) > 0, FALSE);
  g_return_val_if_fail (lock->fd != -1, FALSE);

  if (g_queue_get_length (lock->stack) > 1)
    {
      int cur_state = GPOINTER_TO_INT (g_queue_peek_head (lock->stack));
      int next_state = GPOINTER_TO_INT (g_queue_peek_nth (lock->stack, 1));

      /* Drop back to the previous lock state if it differs */
      if (next_state != cur_state)
        {
          /* We should never drop from shared to exclusive */
          g_return_val_if_fail (next_state == LOCK_SH, FALSE);
          if (flock (lock->fd, next_state) < 0)
            {
              g_printerr ("Drop to shared lock failed: %s\n",
                          g_strerror (errno));
              return FALSE;
            }
        }
    }
  else
    {
      /* Lock stack will be empty, unlock */
      if (flock (lock->fd, LOCK_UN) < 0)
        {
          g_printerr ("Unlock failed: %s\n", g_strerror (errno));
          return FALSE;
        }
    }

  g_queue_pop_head (lock->stack);

  return TRUE;
}

A GQueue is used to keep a stack of lock states, and if the lock is already exclusive, it just keeps that and pushes another LOCK_EX onto the stack.

Obviously that's not totally usable as is (you'd want to control the lock file path, use GError, put some nicer wrappers around it, etc.), but I was wondering what you thought about this concept.
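
For what it's worth, here's how a nested caller would use those helpers (the called operations are hypothetical):

if (push_lock (LOCK_EX))          /* flock takes LOCK_EX */
  {
    prune_objects ();             /* hypothetical exclusive operation */

    if (push_lock (LOCK_SH))      /* already exclusive: state is kept,
                                     another LOCK_EX goes on the stack */
      {
        generate_summary ();      /* hypothetical shared operation */
        pop_lock ();              /* still LOCK_EX afterwards */
      }

    pop_lock ();                  /* stack empties: flock LOCK_UN */
  }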

@dbnicholson (Member)

I came up with an alternative that I think handles most use cases in #1292.

@cgwalters (Member Author)

Closing this in favor of #813 (comment)
