Skip to content
This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

*: implement configurable backup timeout #1908

Merged

Conversation

fanminshi
Copy link
Contributor

ref: #1906

switch spec.StorageType {
case api.BackupStorageTypeS3:
bs, err := handleS3(b.kubecli, spec.S3, spec.EtcdEndpoints, spec.ClientTLSSecret, b.namespace)
bs, err := handleS3(b.kubecli, spec.S3, backupTimeout, spec.EtcdEndpoints, spec.ClientTLSSecret, b.namespace)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rethink about it.
Instead of defining the timeout to be "saving backup", we should define the timeout to be the entire process. Each type could spend different amount of time doing other work not on saving backup. As a user, I might just define 60s and want the backup to be made around that limit without caring about getting secret, etc.

Copy link
Member

@hongchaodeng hongchaodeng Feb 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In code:

ctx, cancel := context.WithTimeout(context.Background(), backupTimeout)
handleS3(ctx, ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hongchaodeng yeah, that's my original thinking. However, the issue with an timeout for entire process is that kube client doesn't take-in context so I can't include kubecli activities such as retrieving secret as part of the total timeout. Hence, the Timeout only applies to retrieving+saving backup.

@fanminshi fanminshi force-pushed the add_configurable_backup_timeout branch from 39b90a6 to 767a19d Compare February 6, 2018 00:00
@hongchaodeng
Copy link
Member

kube client doesn't take-in context

k8s client upstream issue ref: kubernetes/kubernetes#46503

It also looks like in the REST API level client-go provides context support: https://github.com/kubernetes/client-go/blob/4def1285ff0e4d1fee7cc9e2684ef9923dca591f/rest/request.go#L411
But this is not propogated onto upper level.

There is a default 30s Dialer.Timeout:
https://github.com/kubernetes/client-go/blob/f4d24c8a47016d9ffcc8a44c4011d153cb413367/transport/cache.go#L84-L87
But no default timeout for http request.

My current conclusion is that this is not limited to backup. This is a general issue we need to work around or solve in upstream. Otherwise if it keeps hanging, anything using client-go will have problems.

My current suggest would be to leave a TODO signifying the current limitation of timeout.

@fanminshi fanminshi force-pushed the add_configurable_backup_timeout branch 5 times, most recently from 1a1d508 to 7ce0bf9 Compare February 6, 2018 19:37
@fanminshi
Copy link
Contributor Author

cc/ @hongchaodeng @hasbro17

@fanminshi fanminshi force-pushed the add_configurable_backup_timeout branch from 7ce0bf9 to 89bb119 Compare February 6, 2018 20:15
Copy link
Member

@hongchaodeng hongchaodeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BackupWriter.Write() would also need to take in a context.
For S3 upload, there is a method called UploadWithContext().
For ABS, there is a Timeout int but not context. Leave a TODO for now?
@rjtsdl Can you also provide some thoughts?

@@ -84,7 +84,7 @@ type BackupSource struct {

// BackupPolicy defines backup policy.
type BackupPolicy struct {
// timeout is the maximal time of retriving plus saving an etcd backup.
// timeout is the maximal allowed time in second of the entire backup process.
Timeout int64 `json:"timeout,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a second thought, maybe we should rename this TimeoutInSecond or TimeoutSecond

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeoutInSecond seems good.

mapEps := make(map[string]*clientv3.Client)
var maxClient *clientv3.Client
maxRev := int64(0)
errors := make([]string, 0)
for _, endpoint := range endpoints {
// TODO: update clientv3 to 3.2.x and thenuse ctx as in clientv3.Config.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thenuse -> then use

@fanminshi fanminshi force-pushed the add_configurable_backup_timeout branch 2 times, most recently from a0799fb to 74da62a Compare February 6, 2018 20:40
@fanminshi fanminshi force-pushed the add_configurable_backup_timeout branch from 74da62a to aabcd45 Compare February 6, 2018 20:41
@fanminshi
Copy link
Contributor Author

all fixed. PTAL cc/ @hongchaodeng

@hongchaodeng
Copy link
Member

After offline discussion, we are going to add ctx to BackupWriter.Write() method too.

@hongchaodeng
Copy link
Member

Let's move forward this and change BackupWriter in following PR.

@hongchaodeng hongchaodeng merged commit ceac222 into coreos:master Feb 6, 2018
bradjones1 added a commit to bradjones1/etcd-operator that referenced this pull request Nov 22, 2019
The backup timeout was made configurable in coreos#1908 but is not documented. This adds a note about the default value and makes it clear this is an optional feature.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants