Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT #3967

Merged
merged 2 commits into from
Oct 24, 2023
Merged

rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT #3967

merged 2 commits into from
Oct 24, 2023

Commits on Oct 24, 2023

  1. tests/int: fix teardown in mounts_sshfs.bats

    Function teardown assumes that every test case will call
    setup_sshfs. Currently, this assumption is true, but once
    a test case that won't call setup_sshfs is added (say because
    it has some extra "requires" or "skip"), it will break bats,
    so any bats invocation involving such a test case will end up
    hanging after the last test case is run.
    
    The reason is, we have set -u in helpers.bash to help catching the use
    of undefined variables. In the above scenario, such a variable is DIR,
    which is referenced in teardown but is only defined after a call to
    setup_sshfs. As a result, bash that is running the teardown function
    will exit upon seeing the first $DIR, and thus teardown_bundle won't be
    run. This, in turn, results in a stale recvtty process, which inherits
    bats' fd 3. Until that fd is closed, bats waits for test logs.
    
    Long story short, the fix is to
     - check if DIR is set before referencing it;
     - unset it after unmount.
    
    PS it is still not clear why there is no diagnostics about the failed
    teardown.
    
    Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    kolyshkin authored and cyphar committed Oct 24, 2023
    Configuration menu
    Copy the full SHA
    153865d View commit details
    Browse the repository at this point in the history
  2. rootfs: remove --no-mount-fallback and finally fix MS_REMOUNT

    The original reasoning for this option was to avoid having mount options
    be overwritten by runc. However, adding command-line arguments has
    historically been a bad idea because it forces strict-runc-compatible
    OCI runtimes to copy out-of-spec features directly from runc and these
    flags are usually quite difficult to enable by users when using runc
    through several layers of engines and orchestrators.
    
    A far more preferable solution is to have a heuristic which detects
    whether copying the original mount's mount options would override an
    explicit mount option specified by the user. In this case, we should
    return an error. You only end up in this path in the userns case, if you
    have a bind-mount source with locked flags.
    
    During the course of writing this patch, I discovered that several
    aspects of our handling of flags for bind-mounts left much to be
    desired. We have completely botched the handling of explicitly cleared
    flags since commit 97f5ee4 ("Only remount if requested flags differ
    from current"), with our behaviour only becoming increasingly more weird
    with 50105de ("Fix failure with rw bind mount of a ro fuse") and
    da780e4 ("Fix bind mounts of filesystems with certain options
    set"). In short, we would only clear flags explicitly request by the
    user purely by chance, in ways that it really should've been reported to
    us by now. The most egregious is that mounts explicitly marked "rw" were
    actually mounted "ro" if the bind-mount source was "ro" and no other
    special flags were included. In addition, our handling of atime was
    completely broken -- mostly due to how subtle the semantics of atime are
    on Linux.
    
    Unfortunately, while the runtime-spec requires us to implement
    mount(8)'s behaviour, several aspects of the util-linux mount(8)'s
    behaviour are broken and thus copying them makes little sense. Since the
    runtime-spec behaviour for this case (should mount options for a "bind"
    mount use the "mount --bind -o ..." or "mount --bind -o remount,..."
    semantics? Is the fallback code we have for userns actually
    spec-compliant?) and the mount(8) behaviour (see [1]) are not
    well-defined, this commit simply fixes the most obvious aspects of the
    behaviour that are broken while keeping the current spirit of the
    implementation.
    
    NOTE: The handling of atime in the base case is left for a future PR to
    deal with. This means that the atime of the source mount will be
    silently left alone unless the fallback path needs to be taken, and any
    flags not explicitly set will be cleared in the base case. Whether we
    should always be operating as "mount --bind -o remount,..." (where we
    default to the original mount source flags) is a topic for a separate PR
    and (probably) associated runtime-spec PR.
    
    So, to resolve this:
    
    * We store which flags were explicitly requested to be cleared by the
      user, so that we can detect whether the userns fallback path would end
      up setting a flag the user explicitly wished to clear. If so, we
      return an error because we couldn't fulfil the configuration settings.
    
    * Revert 97f5ee4 ("Only remount if requested flags differ from
      current"), as missing flags do not mean we can skip MS_REMOUNT (in
      fact, missing flags are how you indicate a flag needs to be cleared
      with mount(2)). The original purpose of the patch was to fix the
      userns issue, but as mentioned above the correct mechanism is to do a
      fallback mount that copies the lockable flags from statfs(2).
    
    * Improve handling of atime in the fallback case by:
        - Correctly handling the returned flags in statfs(2).
        - Implement the MNT_LOCK_ATIME checks in our code to ensure we
          produce errors rather than silently producing incorrect atime
          mounts.
    
    * Improve the tests so we correctly detect all of these contingencies,
      including a general "bind-mount atime handling" test to ensure that
      the behaviour described here is accurate.
    
    This change also inlines the remount() function -- it was only ever used
    for the bind-mount remount case, and its behaviour is very bind-mount
    specific.
    
    [1]: util-linux/util-linux#2433
    
    Reverts: 97f5ee4 ("Only remount if requested flags differ from current")
    Fixes: 50105de ("Fix failure with rw bind mount of a ro fuse")
    Fixes: da780e4 ("Fix bind mounts of filesystems with certain options set")
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    cyphar committed Oct 24, 2023
    Configuration menu
    Copy the full SHA
    7c71a22 View commit details
    Browse the repository at this point in the history