Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locksmithcl: reset endpoints when we set a new one #13

Merged
merged 1 commit into from
Aug 27, 2021

Conversation

tormath1
Copy link
Contributor

This behavior was working fine previously but no more with the etcd/v2
upgrade.

In the case we have an endpoint like https://127.0.0.1:2379, this one
will be added to the list of defaultEndpoints which lead to this
endpoints:

[]string{
	"http://127.0.0.1:2379",
	"http://127.0.0.1:4001",
	"https://127.0.0.1:2379",
}

It's not an issue when we have a retry mechanism - but in this case if
etcd use HTTPS and we try an HTTP endpoint, we will get an io.EOF
error.

Signed-off-by: Mathieu Tortuyaux mathieu@kinvolk.io


Otherwise, we could just do this:

                if err != nil {
-                       if errors.Is(err, syscall.ECONNREFUSED) {
+                       if errors.Is(err, syscall.ECONNREFUSED) || errors.Is(err, io.EOF) {
                                continue
                        }

but, as a user, I would expect that if I run locksmithctl -endpoint https://127.0.0.1:2379 I would run locksmithcl only against this endpoint, not a bundle of defautEndpoints + endpoint.

This issue has been caught by the CI with the coreos.locksmith.tls - I will rerun the kola tests before merging this one.

This behavior was working fine previously but no more with the etcd/v2
upgrade.

In the case we have an endpoint like `https://127.0.0.1:2379`, this one
will be _added_ to the list of `defaultEndpoints` which lead to this
endpoints:
```go
[]string{
	"http://127.0.0.1:2379",
	"http://127.0.0.1:4001",
	"https://127.0.0.1:2379",
}
```

It's not an issue when we have a retry mechanism - but in this case if
`etcd` use HTTPS and we try an `HTTP` endpoint, we will get an io.EOF
error.

Signed-off-by: Mathieu Tortuyaux <mathieu@kinvolk.io>
@tormath1 tormath1 self-assigned this Aug 27, 2021
@tormath1 tormath1 marked this pull request as ready for review August 27, 2021 07:54
@tormath1 tormath1 requested review from a team and invidian August 27, 2021 07:54
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 27, 2021

@invidian - codebase is a bit confusing.

In the snippet you shared, we set globalFlags.Endpoints = defaultEndpoints in case the user did not pass -endpoint .... At this very moment, we have:

globalFlags.Endpoints = []string{
    "http://127.0.0.1:2379",
    "http://127.0.0.1:4001",
}

Later, in the code, if we run in daemon mode, the flagsFromEnv function is called:
https://github.com/kinvolk/locksmith/blob/9287272dff87cb8c35eafeff9dcc91190d0b5ba6/locksmithctl/locksmithctl.go#L149-L152

This method will pass through all the LOCKSMITHD_* environment variable and call the Set method of each flag.
https://github.com/kinvolk/locksmith/blob/63cef2d0036187e5a54ca680d9652786a8ea29fa/locksmithctl/locksmithctl.go#L268-L272.

If we have LOCKSMITHD_ENDPOINT=https://localhost:2879 - which is the case in the coreos.locksmith.tls test:

[Unit]\nAfter=etcd-member.service\nRequires=etcd-member.service\n[Service]\nEnvironment=LOCKSMITHD_ETCD_CERTFILE=/etc/ssl/certs/locksmith-cert.pem\nEnvironment=LOCKSMITHD_ETCD_KEYFILE=/etc/ssl/certs/locksmith-key.pem\nEnvironment=LOCKSMITHD_ETCD_CAFILE=/etc/ssl/certs/ca-etcd-cert.pem\nEnvironment=LOCKSMITHD_ENDPOINT=https://localhost:2379\nEnvironment=LOCKSMITHD_REBOOT_WINDOW_START=00:00\nEnvironment=LOCKSMITHD_REBOOT_WINDOW_LENGTH=23h59m"

and since the endpoint.Set appends the value to the existing endpoints, we are ending with:

globalFlags.Endpoints = []string{
    "http://127.0.0.1:2379",
    "http://127.0.0.1:4001",
    "https://localhost:2879",
}

So the getClient method we patched in #11 will loop through this 3 endpoints while we explicitly gave one endpoint - and this will lead to the error mentioned in the PR description. 😕

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, considering that main() in locksmithctl/locksmithctl.go should most likely be splitted into 2 different functions.

@tormath1 tormath1 merged commit b54e4c6 into flatcar-master Aug 27, 2021
@tormath1 tormath1 deleted the tormath1/fix-endpoints branch August 27, 2021 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants