Set object lock for volumes for cephfs encryption #4697

Merged: 1 commit merged on Jul 11, 2024
65 changes: 62 additions & 3 deletions internal/cephfs/nodeserver.go
@@ -23,6 +23,8 @@ import (
"os"
"path"
"strings"
"syscall"
"time"

cerrors "github.com/ceph/ceph-csi/internal/cephfs/errors"
"github.com/ceph/ceph-csi/internal/cephfs/mounter"
@@ -127,15 +129,72 @@ func maybeUnlockFileEncryption(
stagingTargetPath string,
volID fsutil.VolumeID,
) error {
if volOptions.IsEncrypted() {
log.DebugLog(ctx, "cephfs: unlocking fscrypt on volume %q path %s", volID, stagingTargetPath)
if !volOptions.IsEncrypted() {
return nil
}

// Define Mutex Lock variables
lockName := string(volID) + "-mutexLock"
lockDesc := "Lock for " + string(volID)
lockDuration := 150 * time.Second
// Generate a consistent lock cookie for the client using hostname and process ID
lockCookie := generateLockCookie()
var flags byte = 0

log.DebugLog(ctx, "Creating lock for the following volume ID %s", volID)

ioctx, err := volOptions.GetConnection().GetIoctx(volOptions.MetadataPool)
if err != nil {
log.ErrorLog(ctx, "Failed to create ioctx: %s", err)

return fscrypt.Unlock(ctx, volOptions.Encryption, stagingTargetPath, string(volID))
return err
}
defer ioctx.Destroy()

res, err := ioctx.LockExclusive(volOptions.VolID, lockName, lockCookie, lockDesc, lockDuration, &flags)
@NymanRobin (Contributor) commented on Jul 24, 2024:

Reading the code in go-ceph, it seems that the object we want to lock should already exist and that this call won't create a new object. Are we sure there exists an object in rados corresponding to VolID? My initial assumption is no, but I am not 100% confident without further investigation.

// LockExclusive takes an exclusive lock on an object.
func (ioctx *IOContext) LockExclusive(oid, name, cookie, desc string, duration time.Duration, flags *byte) (int, error) {

Edit: The object seems not to be needed
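
For reference, a minimal standalone check of this behavior might look like the following (a sketch only, not part of this PR; the pool name, object name, and cluster configuration are illustrative, and error handling is abbreviated):

package main

import (
	"fmt"
	"time"

	"github.com/ceph/go-ceph/rados"
)

func main() {
	// Connect using the default ceph.conf and keyring (illustrative setup).
	conn, _ := rados.NewConn()
	_ = conn.ReadDefaultConfigFile()
	_ = conn.Connect()
	defer conn.Shutdown()

	// "metadata-pool" and the object name below are placeholders.
	ioctx, _ := conn.OpenIOContext("metadata-pool")
	defer ioctx.Destroy()

	oid := "csi-vol-does-not-exist"
	if _, err := ioctx.Stat(oid); err != nil {
		fmt.Println("object does not exist:", err)
	}

	// Locking does not require the object to exist; the lock is created
	// regardless, which matches what was observed later in this thread.
	var flags byte
	res, err := ioctx.LockExclusive(oid, "test-lock", "cookie", "desc", 30*time.Second, &flags)
	fmt.Println("lock result:", res, err)

	_, _ = ioctx.Unlock(oid, "test-lock", "cookie")
}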

@NymanRobin (Contributor) commented on Jul 24, 2024:

Continuing my dig here, I added the following logging before the LockExclusive call:

_, err = ioctx.Stat(volOptions.VolID)
if err != nil {
  log.ErrorLog(ctx, "The error when object is not here: %s", err)
  log.ErrorLog(ctx, "Object %s does not exist", volOptions.VolID)
}

I get these logs:

E0724 09:03:43.715019       1 nodeserver.go:156] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de The error when object is not here: rados: ret=-2, No such file or directory
E0724 09:03:43.715051       1 nodeserver.go:157] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de Object csi-vol-489ccf33-13a5-40fc-8460-7dd866bc44de does not exist
E0724 09:03:43.715477       1 utils.go:240] ID: 142 Req-ID: 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de GRPC error: rpc error: code = Internal desc = Failed to lock volume ID 0001-0009-rook-ceph-0000000000000001-489ccf33-13a5-40fc-8460-7dd866bc44de: rados: ret=-1, Operation not permitted

This makes me highly suspicious... However, I wonder why Ceph does not return -2 when a lock is attempted on an object that does not exist? Maybe they have not thought of this scenario and it could be reported? I guess the next step in this journey to the core would be to create the object first and see if the lock can then be acquired 😄

Edit: The actual problem is in the comments below; the object does not seem to be needed.

Contributor commented:

Now I know at least where the -1 comes from, and it is correct that the permissions are not there!
After adding the following to the authentication, I can acquire the lock:

ceph auth caps client.csi-cephfs-node \
  mon 'allow *' \
  mgr 'allow *' \
  osd 'allow *' \
  mds 'allow *'

But I think what is actually needed is the following:
ceph auth caps client.csi-cephfs-node rados 'allow rw'

Contributor commented:

Okay, to conclude this: it seems the lock can be created without the object existing, so that is all good and my ramblings from above can be ignored!

Except that some additional permissions for client.csi-cephfs-node are needed to be able to create the lock.

However, it is currently extremely slow for me: bringing up five deployments with 3 replicas each, after 5 minutes I have only one replica of each running. The error reported is the following:

  Warning  FailedMount             3m32s (x2 over 5m47s)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[my-volume kube-api-access-8cmvd]: timed out waiting for the condition
  Warning  FailedMount             81s (x11 over 7m35s)   kubelet                  MountVolume.MountDevice failed for volume "pvc-6afa784b-dc4f-44fb-bed9-8c6fe985c7e8" : rpc error: code = Internal desc = Lock is already held by another client and cookie pair for 0001-0009-rook-ceph-0000000000000001-c7f5d8f9-f9ca-4af8-8457-7ffd1bfb4437 volume
  Warning  FailedMount             74s                    kubelet                  Unable to attach or mount volumes: unmounted volumes=[my-volume], unattached volumes=[kube-api-access-8cmvd my-volume]: timed out waiting for the condition

This is the status of the pods after 8m46s:

$ kubectl get pods -n default 
NAME                            READY   STATUS              RESTARTS   AGE
deploy-pod-1-7f7789fd5f-92x4f   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-99qmp   1/1     Running             0          8m46s
deploy-pod-1-7f7789fd5f-dnr79   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-j2mvp   0/1     ContainerCreating   0          8m46s
deploy-pod-2-58c654874c-jh2wp   1/1     Running             0          8m46s
deploy-pod-2-58c654874c-svjws   1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-rbtk2    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w452p    1/1     Running             0          8m46s
deploy-pod-3-6674fc79f-w8wqb    0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-cqxlb   1/1     Running             0          8m46s
deploy-pod-4-6c958fb787-kvgbl   0/1     ContainerCreating   0          8m46s
deploy-pod-4-6c958fb787-q794c   1/1     Running             0          8m46s
deploy-pod-5-86b9bb47-5vmrq     0/1     ContainerCreating   0          8m45s
deploy-pod-5-86b9bb47-dd6v7     1/1     Running             0          8m45s
deploy-pod-5-86b9bb47-g5jvg     0/1     ContainerCreating   0          8m45s

After a whopping 13 minutes all the pods were finally running!

Should we actually poll the lock when trying to acquire it? I think that might make it faster.
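
One possible shape for such polling (a sketch only, not part of this PR; the helper name, retry interval, and timeout are illustrative) would be to retry LockExclusive while it keeps returning -EBUSY instead of failing the whole staging call:

package lockutil

import (
	"context"
	"fmt"
	"syscall"
	"time"

	"github.com/ceph/go-ceph/rados"
)

// acquireLockWithRetry is a hypothetical helper: it polls LockExclusive
// instead of returning an error the first time the lock is busy.
func acquireLockWithRetry(
	ctx context.Context,
	ioctx *rados.IOContext,
	oid, lockName, lockCookie, lockDesc string,
	lockDuration, pollInterval, timeout time.Duration,
) error {
	var flags byte
	deadline := time.Now().Add(timeout)

	for {
		res, err := ioctx.LockExclusive(oid, lockName, lockCookie, lockDesc, lockDuration, &flags)
		if err != nil {
			return err
		}
		switch res {
		case 0:
			return nil // lock acquired
		case -int(syscall.EBUSY):
			// Held by another client/cookie pair: wait and retry below.
		default:
			return fmt.Errorf("failed to lock %q: res=%d", oid, res)
		}

		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for lock %q on %q", lockName, oid)
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(pollInterval):
			// try again
		}
	}
}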

Contributor (PR author) commented:

Thank you @NymanRobin for testing.
I also tested; it is slow.
It took 8 minutes for 6 replicas to be ready.
We should definitely improve it.

Contributor commented:

I can write up a ticket for the performance and for the permissions!

if res != 0 {
switch res {
case -int(syscall.EBUSY):
return fmt.Errorf("Lock is already held by another client and cookie pair for %v volume", volID)
case -int(syscall.EEXIST):
return fmt.Errorf("Lock is already held by the same client and cookie pair for %v volume", volID)
default:
return fmt.Errorf("Failed to lock volume ID %v: %w", volID, err)
}
}
log.DebugLog(ctx, "Lock successfully created for volume ID %s", volID)

log.DebugLog(ctx, "cephfs: unlocking fscrypt on volume %q path %s", volID, stagingTargetPath)
err = fscrypt.Unlock(ctx, volOptions.Encryption, stagingTargetPath, string(volID))
if err != nil {
Contributor commented:

Won't this leave a hanging lock in case fscrypt fails to unlock the filesystem?

I think the ceph unlock functionality could be moved to a separate function, and that function pushed onto the defer stack once the exclusive lock succeeds.
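
A minimal sketch of that suggested shape (hypothetical helper, not the merged code), assuming the same go-ceph LockExclusive/Unlock calls already used in this PR:

package lockutil

import (
	"fmt"
	"time"

	"github.com/ceph/go-ceph/rados"
)

// withVolumeLock is a hypothetical helper: it acquires the exclusive rados
// lock, pushes the unlock onto the defer stack, and only then runs the
// critical section (e.g. the fscrypt unlock). A failure inside the critical
// section then no longer leaves the lock held until its duration expires.
func withVolumeLock(
	ioctx *rados.IOContext,
	oid, lockName, lockCookie, lockDesc string,
	lockDuration time.Duration,
	critical func() error,
) error {
	var flags byte
	res, err := ioctx.LockExclusive(oid, lockName, lockCookie, lockDesc, lockDuration, &flags)
	if err != nil {
		return err
	}
	if res != 0 {
		return fmt.Errorf("could not acquire lock %q on %q: res=%d", lockName, oid, res)
	}

	// Release the lock as soon as the critical section returns, success or not.
	defer func() {
		if _, unlockErr := ioctx.Unlock(oid, lockName, lockCookie); unlockErr != nil {
			fmt.Printf("failed to release lock %q on %q: %v\n", lockName, oid, unlockErr)
		}
	}()

	return critical()
}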

Contributor (PR author) commented:

What about the lockDuration argument to LockExclusive? When the duration expires, it will automatically unlock.

Contributor commented:

Yes, this I understand, but in my opinion it causes unnecessary delays that could easily be avoided. My point is that an operation that would usually take seconds starts to take minutes because of a single failure.


Collaborator commented:

Yes, it makes sense to avoid the unwanted delay in this case. As suggested, ioctx.Unlock can be called via defer right after the successful lock.

Contributor commented:

I created an issue for this so it is not forgotten: #4715

return err
}

ret, err := ioctx.Unlock(string(volID), lockName, lockCookie)
switch ret {
case 0:
log.DebugLog(ctx, "Lock %s successfully released ", lockName)
case -int(syscall.ENOENT):
log.DebugLog(ctx, "Lock is not held by the specified %s, %s pair", lockCookie, lockName)
default:
log.ErrorLog(ctx, "Failed to release following lock, this will lead to orphan lock %s: %v",
lockName, err)
}

return nil
}

// generateLockCookie generates a consistent lock cookie for the client.
func generateLockCookie() string {
hostname, err := os.Hostname()
if err != nil {
hostname = "unknown-host"
}
pid := os.Getpid()

return fmt.Sprintf("%s-%d", hostname, pid)
}

// maybeInitializeFileEncryption initializes KMS and node specifics, if volContext enables encryption.
func maybeInitializeFileEncryption(
ctx context.Context,