
[4.10] CephFS seems to be broken with FCOS 35 upgrade. image registry writes fail. #1160

Closed
fortinj66 opened this issue Mar 16, 2022 · 31 comments


@fortinj66
Contributor

Describe the bug
Writing to the image registry fails after upgrading from OKD 4.9 to 4.10 when using CephFS-backed volumes.

Version
OKD 4.10, VMware IPI

How reproducible
100%

See #1153 for details

@fortinj66 fortinj66 changed the title CephFS seems to be broken with FCOS 35 upgrade. image registry writes fail. [4.10] CephFS seems to be broken with FCOS 35 upgrade. image registry writes fail. Mar 16, 2022
@schuemann

I have the same problem, and not only with the image registry! It happens both with a clean OKD 4.10 installation with Rook 1.8.7 / Ceph 16.2.7 and with an OKD 4.9 cluster updated to 4.10.

Creating more than one new file on a CephFS volume leads to permission denied errors. Waiting more than a minute between the writes seems to help (but is obviously not a solution).

sh-4.4$ echo 1 > /test/1.txt
sh-4.4$ echo 2 > /test/2.txt
sh: /test/2.txt: Permission denied
sh-4.4$ echo 3 > /test/3.txt
sh: /test/3.txt: Permission denied
sh-4.4$ ls -la /test/
ls: cannot access '/test/3.txt': Permission denied
ls: cannot access '/test/2.txt': Permission denied
total 1
drwxrwxrwx. 2 root root  3 Mar 17 22:46 .
dr-xr-xr-x. 1 root root 40 Mar 17 22:44 ..
-rw-r--r--. 1 rook rook  2 Mar 17 22:46 1.txt
-?????????? ? ?    ?     ?            ? 2.txt
-?????????? ? ?    ?     ?            ? 3.txt
sh-4.4$ sleep 120
sh-4.4$ echo > /test/4.txt
sh-4.4$ ls -la /test/
ls: cannot access '/test/3.txt': Permission denied
ls: cannot access '/test/2.txt': Permission denied
total 1
drwxrwxrwx. 2 root root  4 Mar 17 22:48 .
dr-xr-xr-x. 1 root root 40 Mar 17 22:44 ..
-rw-r--r--. 1 rook rook  2 Mar 17 22:46 1.txt
-?????????? ? ?    ?     ?            ? 2.txt
-?????????? ? ?    ?     ?            ? 3.txt
-rw-r--r--. 1 rook rook  1 Mar 17 22:48 4.txt

No problems with RBD block volumes. No hints in the logs or on the Ceph status dashboard.

@mschaefers

Same problem here. OKD 4.9 worked, OKD 4.10 doesn't:

Version Details:
Updated from 4.9.0-0.okd-2022-02-12-140851 to 4.10.0-0.okd-2022-03-07-131213 (bare metal installation)
CephFS CSI Driver: Helm Chart 3.5.1 https://ceph.github.io/csi-charts
Ceph MDS: ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)

@LorbusChris
Contributor

@fortinj66
Contributor Author

Re-categorized the Bugzilla as a Fedora/FCOS and Ceph issue rather than an image registry issue.

@SriRamanujam

I had some time tonight to look into this a bit further. I first tried to reproduce the issue from a bare CoreOS VM. Since my CephFS can be mounted from outside of my OKD cluster, I was able to mount my image registry's CephFS volume and do some tests.

I could not reproduce the issue from a bare FCOS VM. I tried creating single files one at a time, creating many thousands of files in a loop from the shell, etc. Nothing triggered the failure.

Okay, that puts this firmly back into the realm of OKD. The next step I took was to start up a standalone pod with the registry cephfs mounted into it and try to reproduce the issue there. I was careful to ensure the security context in my standalone pod matched the image registry pod, just to be extra sure. I could not reproduce the issue from a standalone pod.
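For illustration, a minimal sketch of the kind of standalone pod I mean (the namespace, PVC name, UID, and MCS level here are assumptions matching the registry pod output shown below in this comment):

# debug-pod.yaml (sketch only; adjust claim name, UID and SELinux level to your cluster)
apiVersion: v1
kind: Pod
metadata:
  name: cephfs-debug
  namespace: openshift-image-registry
spec:
  securityContext:
    runAsUser: 1000330000
    seLinuxOptions:
      level: "s0:c12,c18"
  containers:
  - name: shell
    image: registry.fedoraproject.org/fedora:35
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: registry
      mountPath: /data
  volumes:
  - name: registry
    persistentVolumeClaim:
      claimName: image-registry-storage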

As a last resort, I swapped my image registry pod back to using the CephFS mount, and the issue was immediately reproducible. While poking around at that, I found something rather interesting. Check this out:

A broken directory, as seen from inside the image registry pod:

sh-4.4$ pwd
/registry/docker/registry/v2/repositories/media/jackett/_uploads/1a31f11f-b7bd-413f-a9fd-829f4a996ec7
sh-4.4$ ls -lashZ
ls: cannot access 'data': Permission denied
total 512
  0 drwxr-xr-x.   2 1000330000 root system_u:object_r:container_file_t:s0:c12,c18   2 Apr 17 04:04 .
  0 drwxr-xr-x. 270 1000330000 root system_u:object_r:container_file_t:s0:c12,c18 268 Apr 17 04:04 ..
  ? -??????????   ? ?          ?                                                0   ?            ? data
512 -rw-r--r--.   1 1000330000 root system_u:object_r:container_file_t:s0:c12,c18  20 Apr 17 04:04 startedat
sh-4.4$

That same directory, as seen from my standalone debug pod:

sh-5.1$ pwd
/data/docker/registry/v2/repositories/media/jackett/_uploads/1a31f11f-b7bd-413f-a9fd-829f4a996ec7
sh-5.1$ ls -lashZ
total 512
  0 drwxr-xr-x.   2 1000330000 root system_u:object_r:container_file_t:s0:c12,c18   2 Apr 17 04:04 .
  0 drwxr-xr-x. 270 1000330000 root system_u:object_r:container_file_t:s0:c12,c18 268 Apr 17 04:04 ..
  0 -rw-r--r--.   1 1000330000 root system_u:object_r:container_file_t:s0:c12,c18   0 Apr 17 04:04 data
512 -rw-r--r--.   1 1000330000 root system_u:object_r:container_file_t:s0:c12,c18  20 Apr 17 04:04 startedat
sh-5.1$

Note that from the standalone pod, the SELinux context is absolutely fine! I can touch the file, edit it as usual, everything works. However, you know what's really interesting? If I use the standalone pod to delete and re-create the data file, then the registry pod sees the file with the correct context data!

I am unsure where to go from here. SELinux contexts are stored as xattrs, so perhaps the Ceph MDS is messing up? But in that case I would have seen this regardless of OKD version, because I haven't upgraded my Ceph version in a while. This very clearly started with OKD 4.10, though. So perhaps the image registry is doing something new, and that new something is interacting badly with Ceph?
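One way to dig further would be to read the raw xattr directly, from the host or from the debug pod (a sketch; getfattr comes from the attr package, and the path is the broken data file from above):

getfattr -n security.selinux data
# healthy output looks roughly like this (values illustrative):
# # file: data
# security.selinux="system_u:object_r:container_file_t:s0:c12,c18"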

@SriRamanujam

From the perspective of the host, here is what a failing data file looks like:

[root@worker2 e4d8b3ba-8232-4a0e-8cfd-ad2c51b43e61]# pwd
/var/lib/kubelet/pods/8a3acbb1-e7b2-4d01-ad6b-31f1301ed2c8/volumes/kubernetes.io~csi/pvc-a6b40fa4-01c6-489e-8991-eeecae63ff78/mount/docker/registry/v2/repositories/media/jackett/_uploads/e4d8b3ba-8232-4a0e-8cfd-ad2c51b43e61
[root@worker2 e4d8b3ba-8232-4a0e-8cfd-ad2c51b43e61]# ls -alhsZ
total 512
  0 drwxr-xr-x.   2 1000330000 root system_u:object_r:container_file_t:s0:c12,c18   2 Apr 17 05:43 .
  0 drwxr-xr-x. 282 1000330000 root system_u:object_r:container_file_t:s0:c12,c18 280 Apr 17 05:43 ..
  0 -rw-r--r--.   1 1000330000 root system_u:object_r:unlabeled_t:s0                0 Apr 17 05:43 data
512 -rw-r--r--.   1 1000330000 root system_u:object_r:container_file_t:s0:c12,c18  20 Apr 17 05:43 startedat

Why it has a proper label outside of the pod but question marks inside, I don't know. I'm also wondering if an xattr of all zeroes corresponds to system_u:object_r:unlabeled_t:s0.

Another note for debugging and/or theorycrafting: my debug pod and my registry pod are on two different hosts.

@fortinj66
Contributor Author

@SriRamanujam Take a look at this: coreos/fedora-coreos-tracker#1167

I can recreate without using the image registry...

What I find odd is that I am running the exact same container image and the exact same cephfs code. The only difference is OKD 4.10 and FCOS 35.

Unfortunately, I don't have an external Ceph cluster :(

@SriRamanujam

@SriRamanujam Take a look at this: coreos/fedora-coreos-tracker#1167

I saw that issue, and in fact I was all ready to write up reproduction steps for that issue until I couldn't reproduce it with a bare FCOS VM :(

I can recreate without using the image registry...

What were your reproduction steps? I was doing `for i in $(seq 1 1000); do echo "testing123" > test$i.txt; done` inside the CephFS mount and hoping to get a denial, but it never happened.

What I find odd is that I am running the exact same container image and the exact same cephfs code. The only difference is OKD 4.10 and FCOS 35.

Same :(

Unfortunately, I don't have an external Ceph cluster :(

If you are using Rook, you can set hostNetwork: true in your CephCluster resource and that will change the mons and MDSes to listen on the host's IP address rather than the ClusterIP. I have mine set up that way, and I also have a one-liner that can mount the CephFS root from outside the cluster. I'm not sure if that's disruptive to existing PVC mounts, though, as I've had mine set up like this from the start.
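The relevant CephCluster fragment looks roughly like this (a sketch; recent Rook releases use network.provider: host, while older ones used network.hostNetwork: true):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    provider: host   # mons and MDSes listen on host IPs instead of ClusterIPs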

@fortinj66
Contributor Author

I wonder if it could be a CRI-O/SELinux issue on FCOS 35. Can you try mounting the CephFS volume into a container on the standalone FCOS node?
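Something like this, assuming the CephFS is already mounted at /mnt/cephfs on the node (just a sketch; the mount point and image are placeholders):

sudo podman run --rm -it -v /mnt/cephfs:/test:Z registry.fedoraproject.org/fedora:35 bash
# then, inside the container:
for i in $(seq 1 100); do echo "testing123" > /test/test$i.txt; done
ls -laZ /test | head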

@depouill

Hi,
as mentioned in coreos/fedora-coreos-tracker#1167 (comment), same problem here with an external Ceph cluster after upgrading to OKD 4.10.

@SriRamanujam

Okay, so I have made a breakthrough of sorts. I am able to reproduce the issue from within my registry container. The important thing is that I have to make a new directory beforehand, then cd into there and try to touch a thousand files as I described in #1160 (comment).

When I do this, I can see that exactly one of the 1000 files gets a proper context and the rest are question marks. This exactly matches the behavior I see in the registry-managed folders, where startedat (presumably created first) gets a proper label and data (the important one) gets all question marks.

I am still unable to reproduce from outside of the container, neither from my laptop nor from a bare FCOS install (both running Fedora 35).

@SriRamanujam

This is from my workstation. I have mounted the registry CephFS volume on my workstation, using the same options the CSI driver does. When I create a bunch of files on this mount, the file contexts differ from what I expect. Unmounting and re-mounting the CephFS volume magically makes the file contexts line up with the expected value.

sramanujam@sriramanujam.lan@hapes /tmp/cephfs/docker/registry/v2/test/testing2 
❯ for i in $(seq 1 1000); do echo "testing123" > test$i.txt; done
sramanujam@sriramanujam.lan@hapes /tmp/cephfs/docker/registry/v2/test/testing2 
❯ ls -lashZ | head
total 500K
  0 drwxr-xr-x. 2 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18 1000 Apr 21 22:05 .
  0 drwxrwxrwx. 3 root                        root                          unconfined_u:object_r:container_file_t:s0:c12,c18    1 Apr 21 22:05 ..
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:05 test1000.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test100.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test101.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test102.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test103.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test104.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan system_u:object_r:unlabeled_t:s0                    11 Apr 21 22:11 test105.txt
sramanujam@sriramanujam.lan@hapes /tmp/cephfs/docker/registry/v2/test/testing2 
❯ cd /tmp
sramanujam@sriramanujam.lan@hapes /tmp 
❯ sudo umount cephfs/
sramanujam@sriramanujam.lan@hapes /tmp 
❯ sudo mount -t ceph "$(oc -n rook-ceph get configmap/rook-ceph-mon-endpoints -o jsonpath={.data.csi-cluster-config-json} | jq -r '.[0].monitors | @csv' | sed 's/"//g')":/volumes/csi/csi-vol-4decf389-d18c-11eb-94dd-0a580a81040a/fb85b670-c8c0-44b2-98db-1c093796145e -o name="$(oc -n rook-ceph get secret/rook-csi-cephfs-node -o jsonpath={.data.adminID} | base64 -d)",secret="$(oc -n rook-ceph get secret/rook-csi-cephfs-node -o jsonpath={.data.adminKey} | base64 -d)",mds_namespace=library-cephfs ./cephfs
sramanujam@sriramanujam.lan@hapes /tmp 
❯ cd /tmp/cephfs/docker/registry/v2/test/testing2
sramanujam@sriramanujam.lan@hapes /tmp/cephfs/docker/registry/v2/test/testing2 
❯ ls -lashZ | head
total 500K
  0 drwxr-xr-x. 2 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18 1000 Apr 21 22:05 .
  0 drwxrwxrwx. 3 root                        root                          unconfined_u:object_r:container_file_t:s0:c12,c18    1 Apr 21 22:05 ..
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:05 test1000.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test100.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test101.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test102.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test103.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test104.txt
512 -rw-r--r--. 1 sramanujam@sriramanujam.lan domain users@sriramanujam.lan unconfined_u:object_r:container_file_t:s0:c12,c18   11 Apr 21 22:11 test105.txt

I think this could go some way towards explaining why I am unable to reproduce the denials outside of the container. The security context system_u:object_r:unlabeled_t:s0 matches the context of a question mark file as seen from the container host's context (see #1160 (comment)). However, the container user is extremely restricted. Permissions on an administrator user outside of the container probably still allow access to files with the system_u:object_r:unlabeled_t:s0 context on them.

@SriRamanujam

Okay, I figured it out. This commit was merged into the kernel during the 5.16 release cycle, changing the Ceph driver's default to asynchronous dirops. This asynchronicity seems to be the root cause of the missing/incorrect labels. Setting the wsync mount option changes it back to the old behavior, which fixes the issue for me.
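One way to check which behaviour a node is actually using is to look at the live mount options (a sketch; worker2 is a placeholder node name, and nowsync is the flag I'd expect the kernel client to report when async dirops are enabled):

oc debug node/worker2 -- chroot /host grep ' ceph ' /proc/mounts
# if the option list contains "nowsync", async dirops are active;
# with wsync set (or a fixed kernel) it should not appear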

However, getting your StorageClass to actually use the wsync option is more than a bit annoying. Apparently the Ceph CSI plugin reads this from a custom parameter set on the StorageClass, which then gets stored on the actual PersistentVolumes under /spec/csi/volumeAttributes. Neither of these fields is editable after creation. Therefore, in order to fix this, we're all going to have to delete and re-create our CephFS StorageClasses and every PV that needs this mount option set (which, imo, is all of them).
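To check whether a given PV actually carries the option (a sketch; the PV name here is just the registry PV from an earlier comment):

oc get pv pvc-a6b40fa4-01c6-489e-8991-eeecae63ff78 \
  -o jsonpath='{.spec.csi.volumeAttributes.kernelMountOptions}{"\n"}'
# empty output means the PV was provisioned without the option and would
# need to be re-created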

I also think this might be worth reporting as a bug against the kernel itself, since at heart this probably affects propagation of extended attributes and SELinux contexts for everyone, not just OKD+Ceph users.

@takyon77

takyon77 commented Apr 22, 2022

@SriRamanujam When installing ceph-csi-cephfs, there's an option 'kernelMountOptions' which allows passing kernel mount options. Setting it to 'wsync' does indeed appear to remediate the issue!
Thanks!
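For anyone else using the chart, the value goes roughly here (a sketch; the exact nesting of the storageClass section may differ between chart versions):

# values.yaml fragment for the ceph-csi-cephfs Helm chart
storageClass:
  create: true
  name: csi-cephfs-sc
  kernelMountOptions: wsync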

@vrutkovs
Member

@SriRamanujam great investigation, would you mind reporting this to Fedora / upstream? Bonus points for making a KNOWN_ISSUE.md update PR.

vrutkovs added a commit to vrutkovs/okd that referenced this issue Apr 22, 2022
vrutkovs added a commit that referenced this issue Apr 22, 2022
@vrutkovs vrutkovs pinned this issue Apr 22, 2022
@mschaefers

@SriRamanujam I can confirm that the issue can be solved by passing kernelMountOptions: wsync to the StorageClass

@darren-oxford

Great work tracking down the problem, @SriRamanujam. I had delayed updating to 4.10 to avoid running into this issue, but when I delete the cephfs StorageClass and add it back with kernelMountOptions: wsync in the parameters stanza, the option is ignored. What am I doing wrong? I am trying to use the Rook/Ceph installed as part of the OpenShift Data Foundation operator (formerly OpenShift Container Storage), if that makes a difference.

@SriRamanujam

@vrutkovs I have updated the existing bug ticket @fortinj66 filed with this information, will that be sufficient?

@darren-oxford You have to delete and re-create your PV once you've added the option to your StorageClass.

@darren-oxford

darren-oxford commented Apr 22, 2022

@SriRamanujam Unfortunately recreating the StorageClass as follows...

apiVersion: storage.k8s.io/v1
metadata:
  name: ocs-storagecluster-cephfs
  annotations:
    description: Provides RWO and RWX Filesystem volumes
provisioner: openshift-storage.cephfs.csi.ceph.com
parameters:
  kernelMountOptions: wsync
  clusterID: openshift-storage
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: openshift-storage
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: openshift-storage
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: openshift-storage
  fsName: ocs-storagecluster-cephfilesystem
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

results in kernelMountOptions: wsync being ignored when the StorageClass is created, so it is missing from the re-created StorageClass.

@SriRamanujam

@darren-oxford You might be missing kind: StorageClass in your YAML. Apart from that, that's more or less identical to what I have.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-cephfs
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: library-cephfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: library-cephfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  kernelMountOptions: wsync

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  # mounter: kernel
reclaimPolicy: Delete
mountOptions:
  # uncomment the following line for debugging
  #- debug

@darren-oxford

Thanks @SriRamanujam, kind: StorageClass is there; I just missed it when pasting it here. I tried again and the OpenShift Data Foundation operator is overwriting it. Thanks for confirming that I am doing it right, though; it seems this workaround may not work with ODF.

I guess I should switch to standard Rook Ceph, as I have found ODF to be incredibly frustrating.

@mschaefers

@darren-oxford if your StorageClass is controlled/maintained by some operator, you need to find a way of configuring the StorageClass through your operator config. That would be your only option, aside from not using the ODF operator at all.

@fortinj66
Contributor Author

I can also confirm that the fix works on ceph-rook

@pflaeging

pflaeging commented Apr 25, 2022

Hi, I've got the same problem with my 4.10 test cluster (baremetal UPI on KVM with rook-ceph).

Here's my fix:

  • login as cluster admin ;-)
  • export your storage class oc get sc/rook-cephfs -o yaml > sc-temp.yaml
  • delete rook-cephfs storage class oc delete sc/rook-cephfs
  • edit sc-temp.yaml
    • strip it down to the required lines (like this)
    • add the new parameter: kernelMountOptions: wsync
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com # driver:namespace:operator
parameters:
  ### THIS is the new option
  kernelMountOptions: wsync
  ###
  clusterID: rook-ceph # namespace:cluster
  fsName: myfs
  pool: myfs-replicated
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph 
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph 
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  • deploy it with oc apply -f sc-temp.yaml
  • reboot all nodes (one after another) (just to be sure ;-)

After this, everything works like it did before 4.10.

@depouill

> Hi, I've got the same problem with my 4.10 test cluster (baremetal UPI on KVM with rook-ceph). Here's my fix: … [full steps quoted from the previous comment]

I'm not sure it works every time. Mount options are not modified for me.

@depouill

depouill commented Apr 25, 2022

I can also confirm that the fix works on ceph-rook

It works for me too, but it is a little bit hard to reconstruct each PV.

@vrutkovs
Member

openshift/okd-machine-os#364 should have a new kernel with a fix; a new nightly is on the way.

@vrutkovs
Member

Fix included in https://amd64.origin.releases.ci.openshift.org/releasestream/4-stable/release/4.10.0-0.okd-2022-05-28-062148

@Gjonni

Gjonni commented Jun 21, 2022

Maybe it's a stupid question, but what happens to the volumes on which we applied the previous fix (kernelMountOptions: wsync)?

Thank you

@darren-oxford

...what happens to the volumes on which we applied the previous fix?

AFAIK, nothing: the problem was caused by a decision to change the default behaviour to wsync off, and the fix was to make it default to on again. Specifying kernelMountOptions: wsync is now essentially superfluous and will not change the behaviour.

@Gjonni

Gjonni commented Jun 21, 2022

Perfect, thank you very much.
