Volsync no longer creating new Rook Volumes #1163
-
I'm not sure I understand this one - is the "influxdb" PVC this one (i.e. a PVC that uses your ReplicationDestination as a dataSourceRef): https://github.com/adampetrovic/home-ops/blob/main/kubernetes/templates/volsync/claim.yaml? And is this also the sourcePVC of your ReplicationSource?

If the ReplicationDestination cannot sync, the PVC will never be populated, so it will not become available. That means your ReplicationSource cannot use this PVC for syncs - you're always doing the ReplicationDestination pull-down first, before any syncs. Perhaps this is what you intend with your setup, however.

We have a fix in 0.8.1 that detects, for restic, whether there are issues with the destination repository. I'm wondering if you are hitting something related to this (let me know if this sounds likely): #1005

Essentially, this detects when the ReplicationDestination cannot connect to the datastore and fails the job. Previously it would just fall through and report success even though no connection to the datastore could be made (presumably this can also happen if the store doesn't yet exist). So on an initial deployment of your charts, when the datastore doesn't yet have data, you'd get something like this:
That's my assumption anyway - previously this may have worked because the ReplicationDestination would not have noticed the error: you'd just have an empty PVC, but since the restore was reported as successful, everything else would proceed.
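For reference, the pattern being described is a PVC that restores from a VolSync ReplicationDestination via dataSourceRef, roughly like this (a minimal sketch; the names and size are assumptions, not the linked claim.yaml verbatim):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: influxdb
spec:
  accessModes:
    - ReadWriteOnce
  # Populate the new volume from the VolSync ReplicationDestination;
  # the PVC stays Pending until that restore completes.
  dataSourceRef:
    apiGroup: volsync.backube
    kind: ReplicationDestination
    name: influxdb-dst   # assumed ReplicationDestination name
  resources:
    requests:
      storage: 10Gi      # assumed size
  storageClassName: ceph-block
```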
-
Okay, weirdly enough, I upgraded back to v0.8.1 to try to replicate the issue and now it's working fine (without any other changes to the deployment). I can't quite figure out why; perhaps there was a transient communication issue between VolSync and the rook-ceph CSI provisioner that was resolved by redeploying the VolSync stack. Sorry for the noise! Happy to close this and keep an eye on things, and I very much appreciate your input.
-
This seems to be a transient issue that recurs on v0.8.1. I have reverted to v0.8.0 so I can continue using VolSync. At least one other person in the homelab community has hit the same issue: https://discord.com/channels/673534664354430999/1215741302210167016
-
@adampetrovic did the explanation in #1163 (comment) sound like your scenario? I was guessing this was similar to #1172, but since you say it's transient I'm not sure now. If the issue is due to your bucket being deleted between some of these tests, then you should get an error restoring from a backup (i.e. at the ReplicationDestination) every time there is no bucket to restore from.
-
@tesshuflower yes, I've tested again and can reproduce it consistently on v0.8.1. My problem is exactly the same as in #1172: we all use the same approach of provisioning ReplicationSources at the same time as creating the underlying workload PVC (sketched below).
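For anyone following along, the approach creates the workload PVC and its ReplicationSource together, something like this (a sketch with assumed names; the restic repository Secret is hypothetical):

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: influxdb
spec:
  sourcePVC: influxdb          # the PVC that is itself restored via dataSourceRef
  trigger:
    schedule: "0 * * * *"      # hourly backups (assumed)
  restic:
    repository: influxdb-volsync-secret   # hypothetical Secret with the restic repo config
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-ceph-blockpool   # assumed class name
    pruneIntervalDays: 7
    retain:
      daily: 7
```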
-
@adampetrovic thanks for the confirmation - I'm going to close this one and keep #1172, as it hits on the exact source of the problem and I think it's a bit easier to follow for others who may hit it. Just one note: keep in mind your setup should still work when you actually need to restore a backup as part of the process; it should only be the initial setup with an empty datastore that causes this issue. Let me know if that isn't the case.
-
[EDIT] Reverting back to v0.8.0 solved the problem.
Apologies if this is more of a Rook Ceph issue, but I haven't been able to wrap my head around the source of the problem.
I recently rebuilt my cluster to start using VolSync + Rook Ceph. For the last few days everything has been working flawlessly: I have been able to create VolSync-backed Rook volumes and all has been well.
After attempting to add a new application to my cluster, following exactly the same approach as over the past few days, I can't get past a pending Rook Ceph PVC. VolSync spawns a pod to start the ReplicationDestination, which errors out saying the repository can't be found (expected, because it's a brand-new application VolSync has never seen before). I would have expected the ReplicationSource to be created first and the rook-block PVC to be provisioned, allowing the workload pod to attach.
If I create a ceph-block PVC directly (without VolSync backing), it works and attaches to the PV fine, so it seems the VolSync interaction is causing the issue.
I am following the exact approach that others in the homelab community have been using for a while (see the sketch after the links):
PVC Claim (all defaults used): https://github.com/adampetrovic/home-ops/blob/main/kubernetes/templates/volsync/claim.yaml
Volsync Replication (all defaults used): https://github.com/adampetrovic/home-ops/blob/main/kubernetes/templates/volsync/minio.yaml
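Roughly, those templates pair a dataSourceRef-style claim (as sketched earlier in the thread) with a restore ReplicationDestination along these lines (a simplified sketch with assumed names, not the linked minio.yaml verbatim):

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: influxdb-dst
spec:
  trigger:
    manual: restore-once       # run a single restore to seed the new PVC
  restic:
    repository: influxdb-volsync-secret   # hypothetical Secret with the restic repo config
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-ceph-blockpool   # assumed class name
    capacity: 10Gi             # assumed size, matching the claim
    accessModes:
      - ReadWriteOnce
    storageClassName: ceph-block
```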
What I see:
Rook Ceph says it's waiting for an external provisioner (presumably VolSync) to handle the provisioning:

VolSync logs:
I'm using the following VolumeSnapshotClass, which is definitely defined in the cluster:
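A typical rook-ceph RBD VolumeSnapshotClass looks something like this (a representative sketch, not necessarily the exact manifest used here; the class name is assumed, and the driver prefix depends on the Rook operator namespace):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-ceph-blockpool   # assumed name
driver: rook-ceph.rbd.csi.ceph.com   # prefix matches the Rook operator namespace
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
deletionPolicy: Delete
```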
I'm kinda at a loss for determining which component is responsible for not provisioning the PVC.
I am using the latest versions of VolSync and Rook.