
Volume snapshot does not work #408

Closed
matofeder opened this issue Apr 13, 2023 · 8 comments
Labels: bug (Something isn't working) · Container (Issues or pull requests relevant for Team 2: Container Infra and Tooling) · Sprint Copenhagen (2023, cwk 20+21)

Comments


matofeder commented Apr 13, 2023

Desired state

As a KaaS user, I want to create a volume snapshot as described e.g. here

Current state

The volume snapshot feature does not work in workload clusters managed by the latest (R4) version of k8s-cluster-api-provider (with the default settings, i.e. k8s version v1.25.x)

Steps to reproduce

  1. Create a workload cluster with the k8s-cluster-api-provider
    • use the default settings (i.e. k8s version v1.25.x)
  2. Create a PVC and its snapshot (well described e.g. in this example)
  3. The K8s objects volumesnapshotcontents.snapshot.storage.k8s.io and volumesnapshots.snapshot.storage.k8s.io are not created, and the volume snapshot is not created either
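For reference, step 2 can be sketched roughly as follows. This is a minimal sketch only: it assumes the cluster's default storage class is usable and that a VolumeSnapshotClass named csi-cinder-snapclass exists; the object names test-pvc and test-snapshot are purely illustrative.

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  volumeSnapshotClassName: csi-cinder-snapclass
  source:
    persistentVolumeClaimName: test-pvc
EOF

# On a healthy cluster this creates a VolumeSnapshot and a bound
# VolumeSnapshotContent; on the clusters described above, both stay absent:
kubectl get volumesnapshots.snapshot.storage.k8s.io
kubectl get volumesnapshotcontents.snapshot.storage.k8s.io
```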

Reason

Currently, the script apply_cindercsi.sh tries to enable this feature by deploying the related snapshot CRDs here.

Since Kubernetes version 1.17, the CSI external-snapshotter controller has been split into two controllers: a snapshot-controller (independent of any CSI driver) and a CSI external-snapshotter sidecar (deployed together with the CSI Cinder driver).

Therefore, the snapshot-controller should also be deployed as part of the k8s cluster management process to enable the volume snapshot feature in the k8s cluster.

Possible solution

Update the apply_cindercsi.sh script to also deploy the snapshot-controller, as described e.g. here
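A rough sketch of what such a deployment step could look like. The file paths follow the upstream kubernetes-csi/external-snapshotter repository layout; the pinned RELEASE version is an illustrative assumption and should be matched to the deployed CSI driver version.

```shell
RELEASE="v6.2.1"
BASE="https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${RELEASE}"

# Snapshot CRDs (apply_cindercsi.sh already deploys these):
kubectl apply -f "${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml"
kubectl apply -f "${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml"
kubectl apply -f "${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml"

# The missing piece: the common snapshot-controller and its RBAC rules.
kubectl apply -f "${BASE}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml"
kubectl apply -f "${BASE}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml"
```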

@matofeder added the bug (Something isn't working) and Container (Issues or pull requests relevant for Team 2: Container Infra and Tooling) labels on Apr 13, 2023

garloff commented Apr 14, 2023

Ideally, we should add a test case for this.
(The sonobuoy CNCF tests don't cover CSI by default unfortunately, otherwise we would have caught this one.)

@jschoone added the Sprint Amsterdam (2023, cwk 16+17) label on Apr 24, 2023
@chess-knight

> (The sonobuoy CNCF tests don't cover CSI by default unfortunately, otherwise we would have caught this one.)

@garloff what about make check-storage and make check-csi?


garloff commented May 5, 2023

Would be good

@chess-knight

But I mean that those targets are actually already there, see check-storage and check-csi.

@chess-knight

Ok, so I added deployment of the snapshot controller in #415.

> But I mean that those targets are actually already there, see check-storage and check-csi.

I played around with the sonobuoy tests a bit and found that the check-csi target also contains tests for the volume snapshots.
The total number of tests matching CSI is 583:

$ sonobuoy e2e --focus='CSI' | wc -l
583

52 of them mention the word snapshot:

$ sonobuoy e2e --focus='CSI' | grep -i snapshot | wc -l
52

I was able to filter out one test (source code here) which is, in my opinion, sufficient for testing volume snapshots:

$ sonobuoy e2e --focus='CSI' | grep "CSI Volume Snapshots \[Feature:VolumeSnapshotDataSource\]"
[It] [sig-storage] CSI mock volume CSI Volume Snapshots [Feature:VolumeSnapshotDataSource] volumesnapshotcontent and pvc in Bound state with deletion timestamp set should not get deleted while snapshot finalizer exists
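The focus expression is a plain grep pattern with the brackets escaped so they match literally. The filtering can be illustrated offline on a small, made-up sample of the test list (only the middle line is a real test name, quoted from the output above; the other two are invented for the sketch):

```shell
# Abridged, illustrative sample of a `sonobuoy e2e` test list.
sample='[sig-storage] CSI mock volume CSI attach test using mock driver should not require VolumeAttachment
[sig-storage] CSI mock volume CSI Volume Snapshots [Feature:VolumeSnapshotDataSource] volumesnapshotcontent and pvc in Bound state with deletion timestamp set should not get deleted while snapshot finalizer exists
[sig-storage] CSI mock volume CSI workload information using mock driver should be passed when podInfoOnMount=true'

# The brackets are escaped so grep treats them literally rather than as a
# character class; only the snapshot test survives the filter.
printf '%s\n' "$sample" | grep 'CSI Volume Snapshots \[Feature:VolumeSnapshotDataSource\]'
```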

If I run this test on a cluster without the snapshot controller deployed, I get:

$ sonobuoy.sh chesscluster --e2e-focus='CSI Volume Snapshots \[Feature:VolumeSnapshotDataSource\]'
...
=== Collecting results ===
...
Failed tests:
[It] [sig-storage] CSI mock volume CSI Volume Snapshots [Feature:VolumeSnapshotDataSource] volumesnapshotcontent and pvc in Bound state with deletion timestamp set should not get deleted while snapshot finalizer exists
...
Run Details:
API Server version: v1.25.9
Node health: 4/4 (100%)
Pods health: 30/31 (96%)
Details for failed pods:
sonobuoy/sonobuoy-e2e-job-bf4f64855e234258 Ready:False: PodFailed:
...
FAIL: Investigate 202305120803_sonobuoy_bf41e3fe-b990-4234-b1a6-648065e008a5.tar.gz for further inspection

If I run this test on a cluster with the snapshot controller deployed, I get:

$ sonobuoy.sh snapshotcluster --e2e-focus='CSI Volume Snapshots \[Feature:VolumeSnapshotDataSource\]'
...
=== Collecting results ===
...
Run Details:
API Server version: v1.25.9
Node health: 4/4 (100%)
Pods health: 33/33 (100%)
...
=== Sonobuoy conformance tests passed in 298s ===

Note: Notice also the difference in the number of pods between the clusters. This is because the snapshot controller is deployed with 2 replicas in the snapshotcluster.
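A quick way to confirm the difference between the two clusters, assuming the controller was installed from the upstream manifests into kube-system under the deployment name snapshot-controller (both are assumptions here):

```shell
# On the snapshotcluster this should report READY 2/2; on the cluster
# without the controller it returns "Error from server (NotFound)".
kubectl -n kube-system get deployment snapshot-controller
```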

We can introduce a make target, e.g.:

check-snapshot:
	$(MAKE) check SONOMODE="--e2e-focus='CSI Volume Snapshots \[Feature:VolumeSnapshotDataSource\]'"

But it is probably not necessary, because this test is already included in the check-csi target as I wrote, and I am not sure it is appropriate to add a make target for only one sonobuoy test.
Note: I didn't run the check-csi target, so I don't know if those tests are working.

@jschoone added the Sprint Bratislava (2023, cwk 18+19) label on May 14, 2023

garloff commented May 15, 2023

OK, I had forgotten that we already have some good coverage with a specific sonobuoy check-csi target.
So we should ensure they work (pass once we have the snapshots enabled) and then just start using them.
THANKS!

@chess-knight

> OK, I had forgotten that we already have some good coverage with a specific sonobuoy check-csi target. So we should ensure they work (pass once we have the snapshots enabled) and then just start using them. THANKS!

Ok, I ran the check-csi target (sonobuoy.sh snapshotcluster --e2e-focus='CSI' --e2e-skip='Disruptive') 2 times successfully (two clusters with the default configuration). Each run lasted 3+ hours, see e.g.:

=== Running sonobuoy conformance tests ... --e2e-focus=CSI --e2e-skip=Disruptive ===
...
=== Collecting results ===
...
Plugin: e2e
Status: passed
Total: 7069
Passed: 131
Failed: 0
Skipped: 6938

Plugin: systemd-logs
Status: passed
Total: 4
Passed: 4
Failed: 0
Skipped: 0

Run Details:
API Server version: v1.25.9
Node health: 4/4 (100%)
Pods health: 33/33 (100%)
...
=== Sonobuoy conformance tests passed in 11089s ===

I think that this issue can be closed as #415 is merged.

@jschoone added the Sprint Copenhagen (2023, cwk 20+21) label on May 17, 2023
@jschoone

> I think that this issue can be closed as #415 is merged.

Agree. Done and closed :)

matofeder added a commit that referenced this issue Sep 4, 2023
…ate e2e pipelines

Issue: issues/#408

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
matofeder added a commit that referenced this issue Sep 5, 2023
…ate e2e pipelines

Issue: issues/#408

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
matofeder added a commit that referenced this issue Sep 5, 2023
…ate e2e pipelines

Issue: issues/#408

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
jschoone pushed a commit that referenced this issue Sep 14, 2023
* Add CI E2E testing for k8s-cluster-api-provider project

This commit adds the following:
- `.zuul.yaml` configuration file that informs Zuul about the project's
  CI settings, pipelines and jobs
- pipelines:
  - `e2e-test` that executes `e2e-conformance` job
  - `unlabel-on-update-e2e-test` that ensures that any PR update invalidates a previous successful e2e test
  - `e2e-quick-test` that executes `e2e-quick` job
  - `unlabel-on-update-e2e-quick-test` that ensures that any PR update invalidates a previous successful e2e test
- jobs:
  - `e2e-conformance` that runs a sonobuoy conformance test against Kubernetes cluster spawned by k8s-cluster-api-provider scripts
  - `e2e-quick` that runs a sonobuoy quick test against Kubernetes cluster spawned by k8s-cluster-api-provider scripts

Issue: issues#378

Co-authored-by: Roman Hros <roman.hros@dnation.cloud>
Signed-off-by: Matej Feder <matej.feder@dnation.cloud>

* Update k8s-cluster-api-provider docs with instructions on how to operate e2e pipelines

Issue: issues/#408

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>

* Apply suggestions from code review

Co-authored-by: Filip Dobrovolný <dobrovolny.filip@gmail.com>
Signed-off-by: Matej Feder <matej.feder@dnation.cloud>

* Rename jobs

Set 1500 mtu

Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Co-authored-by: Matej Feder <matej.feder@dnation.cloud>

---------

Signed-off-by: Matej Feder <matej.feder@dnation.cloud>
Signed-off-by: Roman Hros <roman.hros@dnation.cloud>
Co-authored-by: Roman Hros <roman.hros@dnation.cloud>
Co-authored-by: Filip Dobrovolný <dobrovolny.filip@gmail.com>
@jschoone removed the Sprint Amsterdam (2023, cwk 16+17) and Sprint Bratislava (2023, cwk 18+19) labels on Feb 28, 2024