Crash when launching a backup session with volume snapshot #1072

Closed
sebastien-prudhomme opened this issue Apr 6, 2020 · 7 comments · Fixed by #1073

Comments

@sebastien-prudhomme
Contributor

Hi,

I got this error in the job that runs the backup session:

 kubectl logs stash-vs-deploy-my-awesome-deployment-session-6j8wc
I0405 21:09:43.929486       1 log.go:172] FLAG: --alsologtostderr="false"
I0405 21:09:43.930922       1 log.go:172] FLAG: --backupsession="my-awesome-deployment-session"
I0405 21:09:43.930933       1 log.go:172] FLAG: --bypass-validating-webhook-xray="false"
I0405 21:09:43.930941       1 log.go:172] FLAG: --enable-analytics="false"
I0405 21:09:43.930970       1 log.go:172] FLAG: --help="false"
I0405 21:09:43.930978       1 log.go:172] FLAG: --kubeconfig=""
I0405 21:09:43.930987       1 log.go:172] FLAG: --log-flush-frequency="5s"
I0405 21:09:43.930998       1 log.go:172] FLAG: --log_backtrace_at=":0"
I0405 21:09:43.931015       1 log.go:172] FLAG: --log_dir=""
I0405 21:09:43.931024       1 log.go:172] FLAG: --logtostderr="true"
I0405 21:09:43.931031       1 log.go:172] FLAG: --master=""
I0405 21:09:43.931038       1 log.go:172] FLAG: --metrics-enabled="true"
I0405 21:09:43.931055       1 log.go:172] FLAG: --pushgateway-url="http://stash.stash.svc:56789"
I0405 21:09:43.931063       1 log.go:172] FLAG: --service-name="stash-operator"
I0405 21:09:43.931071       1 log.go:172] FLAG: --stderrthreshold="0"
I0405 21:09:43.931078       1 log.go:172] FLAG: --target-kind="Deployment"
I0405 21:09:43.931085       1 log.go:172] FLAG: --target-name="my-awesome-deployment"
I0405 21:09:43.931093       1 log.go:172] FLAG: --use-kubeapiserver-fqdn-for-aks="true"
I0405 21:09:43.931100       1 log.go:172] FLAG: --v="3"
I0405 21:09:43.931108       1 log.go:172] FLAG: --vmodule=""
W0405 21:09:43.933440       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x16679f9]

goroutine 1 [running]:
stash.appscode.dev/stash/pkg/util.WaitUntilVolumeSnapshotReady.func1(0xc0008a6d40, 0x1391d4d, 0x22fc4c0)
	/src/pkg/util/kubernetes.go:450 +0xc9
k8s.io/apimachinery/pkg/util/wait.pollImmediateInternal(0xc000710ce0, 0xc0008a6da8, 0xc000710ce0, 0x7f27bb144008)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:338 +0x2b
k8s.io/apimachinery/pkg/util/wait.PollImmediate(0x2faf080, 0x68c61714000, 0xc0008a6da8, 0xc0008a6dc8, 0x40cb48)
	/src/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:334 +0x4d
stash.appscode.dev/stash/pkg/util.WaitUntilVolumeSnapshotReady(0x2a3eca0, 0xc0007b4c60, 0xc000710ca0, 0x19, 0x0, 0x0, 0xc000526c99, 0x7, 0xc00070a070, 0x62, ...)
	/src/pkg/util/kubernetes.go:448 +0x7e
stash.appscode.dev/stash/pkg/cmds.(*VSoption).createVolumeSnapshot(0xc000538000, 0xc00034c520, 0x1d, 0x0, 0x0, 0xc000331380, 0x7, 0xc000099680, 0x60, 0xc00082a990, ...)
	/src/pkg/cmds/create_volumesnapshot.go:183 +0x258
stash.appscode.dev/stash/pkg/cmds.NewCmdCreateVolumeSnapshot.func1(0xc00074a280, 0xc000575080, 0x0, 0xb, 0x0, 0x0)
	/src/pkg/cmds/create_volumesnapshot.go:105 +0x4ca
github.com/spf13/cobra.(*Command).execute(0xc00074a280, 0xc000574fd0, 0xb, 0xb, 0xc00074a280, 0xc000574fd0)
	/src/vendor/github.com/spf13/cobra/command.go:826 +0x460
github.com/spf13/cobra.(*Command).ExecuteC(0xc00021d900, 0x2, 0x0, 0x0)
	/src/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
	/src/vendor/github.com/spf13/cobra/command.go:864
main.main()
	/src/main.go:40 +0x8d

Stash version is v0.9.0-rc.6.

On the cloud provider (Scaleway), the volume snapshot has been created. It is also displayed as a volumesnapshot resource in Kubernetes:

kubectl get volumesnapshot
NAME                        READYTOUSE   SOURCEPVC           SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
scaleway-pvc-test-session   true         scaleway-pvc-test                           1Gi           scw-snapshot    snapcontent-eb3f890e-91f9-406a-8289-8124dcc9c638   10h            10h

kubectl get volumesnapshotcontent
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER             VOLUMESNAPSHOTCLASS   VOLUMESNAPSHOT              AGE
snapcontent-eb3f890e-91f9-406a-8289-8124dcc9c638   true         1073741824    Delete           csi.scaleway.com   scw-snapshot          scaleway-pvc-test-session   10h

@hossainemruz
Contributor

What is your Kubernetes version?

@sebastien-prudhomme
Contributor Author

@hossainemruz I'm using k3s version v1.17.3+k3s1, so based on k8s v1.17.3.

@hossainemruz
Contributor

That might be causing the problem. VolumeSnapshot is now a beta feature. We haven't updated yet and are still using VolumeSnapshot's alpha API.

@sebastien-prudhomme
Contributor Author

@hossainemruz the code that crashed seems to refer to the v1beta1 API, not the alpha one:

if obj, err := c.SnapshotV1beta1().VolumeSnapshots(meta.Namespace).Get(meta.Name, metav1.GetOptions{}); err == nil {

This was done in PR #987

@hossainemruz
Contributor

You are right. The problem is that obj.Status itself is a pointer, and we didn't check it for nil before dereferencing it.

@hossainemruz
Contributor

In v1alpha1, the Status field was not a pointer. That changed in the v1beta1 API, and we didn't notice it while updating to v1beta1.

v1alpha1:
https://github.com/kubernetes-csi/external-snapshotter/blob/4a3e2fea8f49945c6a4840c17b7e9644d3a199c4/pkg/apis/volumesnapshot/v1alpha1/types.go#L53

v1beta1:
https://github.com/kubernetes-csi/external-snapshotter/blob/ed6cd4a7ec1ab012c1b4aa87f66c2f9f2f0d15f6/pkg/apis/volumesnapshot/v1beta1/types.go#L60
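To illustrate the kind of nil check that was missing, here is a minimal sketch. It is not the actual change in #1073. The types below are simplified, hypothetical stand-ins for the v1beta1 ones, and the helper name isSnapshotReady is made up for this example. It also assumes ReadyToUse is a pointer, like Status:

```go
package main

import "fmt"

// VolumeSnapshotStatus and VolumeSnapshot are simplified, hypothetical
// stand-ins for the v1beta1 types; only the fields needed here are modelled.
type VolumeSnapshotStatus struct {
	ReadyToUse *bool // assumed to be a pointer in this sketch
}

type VolumeSnapshot struct {
	Status *VolumeSnapshotStatus // a pointer in v1beta1, so it may be nil
}

// isSnapshotReady (hypothetical helper) treats a missing object, a missing
// Status, or a missing ReadyToUse as "not ready yet" instead of dereferencing
// a nil pointer.
func isSnapshotReady(vs *VolumeSnapshot) bool {
	if vs == nil || vs.Status == nil || vs.Status.ReadyToUse == nil {
		return false
	}
	return *vs.Status.ReadyToUse
}

func main() {
	ready := true

	// Status not populated yet: reported as not ready, no panic.
	fmt.Println(isSnapshotReady(&VolumeSnapshot{}))

	// Status present but ReadyToUse unset: still not ready.
	fmt.Println(isSnapshotReady(&VolumeSnapshot{Status: &VolumeSnapshotStatus{}}))

	// Status present and ReadyToUse set to true.
	fmt.Println(isSnapshotReady(&VolumeSnapshot{
		Status: &VolumeSnapshotStatus{ReadyToUse: &ready},
	}))
}
```

Used as the condition inside a polling loop, a check like this would keep retrying while the status is still empty, rather than crashing on the first poll.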

Thank you for bringing it to our attention. This should be fixed by #1073.

I have already built a custom Docker image with this fix. You can try it with the following installation command:

helm install stash-operator appscode/stash \
     --version v0.9.0-rc.6                 \
     --namespace kube-system               \
     --set operator.registry=appscodeci    \
     --set operator.tag=fix-vs-nil-ptr-exception_linux_amd64

@sebastien-prudhomme
Contributor Author

I can confirm the problem is gone. In the logs:

I0406 12:27:17.647410       1 log.go:172] FLAG: --alsologtostderr="false"
I0406 12:27:17.647972       1 log.go:172] FLAG: --backupsession="my-awesome-deployment-session"
I0406 12:27:17.647985       1 log.go:172] FLAG: --bypass-validating-webhook-xray="false"
I0406 12:27:17.648154       1 log.go:172] FLAG: --enable-analytics="false"
I0406 12:27:17.648166       1 log.go:172] FLAG: --help="false"
I0406 12:27:17.648173       1 log.go:172] FLAG: --kubeconfig=""
I0406 12:27:17.648181       1 log.go:172] FLAG: --log-flush-frequency="5s"
I0406 12:27:17.648197       1 log.go:172] FLAG: --log_backtrace_at=":0"
I0406 12:27:17.648210       1 log.go:172] FLAG: --log_dir=""
I0406 12:27:17.648222       1 log.go:172] FLAG: --logtostderr="true"
I0406 12:27:17.648229       1 log.go:172] FLAG: --master=""
I0406 12:27:17.648237       1 log.go:172] FLAG: --metrics-enabled="true"
I0406 12:27:17.648245       1 log.go:172] FLAG: --pushgateway-url="http://stash.stash.svc:56789"
I0406 12:27:17.648255       1 log.go:172] FLAG: --service-name="stash-operator"
I0406 12:27:17.648263       1 log.go:172] FLAG: --stderrthreshold="0"
I0406 12:27:17.648270       1 log.go:172] FLAG: --target-kind="Deployment"
I0406 12:27:17.648283       1 log.go:172] FLAG: --target-name="my-awesome-deployment"
I0406 12:27:17.648291       1 log.go:172] FLAG: --use-kubeapiserver-fqdn-for-aks="true"
I0406 12:27:17.648298       1 log.go:172] FLAG: --v="3"
I0406 12:27:17.648311       1 log.go:172] FLAG: --vmodule=""
W0406 12:27:17.650796       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0406 12:27:51.164902       1 volumesnapshot_cleanup_policy.go:154] VolumeSnapshot kept: 1 removed: 0
I0406 12:27:51.181599       1 status.go:104] Updating status of BackupSession: default/my-awesome-deployment-session for host: scaleway-pvc-test
I0406 12:27:51.248012       1 main.go:43] Exiting Stash Main

The BackupSession is also in the Succeeded phase.
