Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locksmithctl/locksmithcl: fix endpoints resilience #11

Merged
merged 1 commit into from
Aug 26, 2021

Conversation

tormath1
Copy link
Contributor

@tormath1 tormath1 commented Aug 4, 2021

with the recent etcd upgrade (1b5eb50),
the endpoints resilience were not ensured -> having failing endpoints
would make the locksmithcl commands fail.

in this commit, we patch the software (and not the vendor), to reproduce
a kind of resilience: we loop over the endpoints - if one of them is not
reachable we continue to the over.
The loop could have been done elsewhere in the Create command, but we
don't have access to the client.httpKeysAPI which holds the config
endpoints.

Signed-off-by: Mathieu Tortuyaux mathieu@kinvolk.io

See also: flatcar-archive/coreos-overlay#1161

@tormath1 tormath1 requested review from invidian and a team August 4, 2021 10:34
@tormath1 tormath1 self-assigned this Aug 4, 2021
Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tormath1 maybe we could try upgrading to latest etcd client version, maybe there was a regression which is now fixed? I know there has been a lot of internal changes in etcd v3.5.0, which makes it much easier to consume via go modules.

@invidian
Copy link
Member

invidian commented Aug 4, 2021

Now it came to my mind, this should be quite easy to have integration tests, perhaps we can then ensure the desired behavior.

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 4, 2021

maybe we could try upgrading to latest etcd client version, maybe there was a regression which is now fixed? I know there has been a lot of internal changes in etcd v3.5.0, which makes it much easier to consume via go modules.

@invidian I spent quite some times to figure out the changes on the concerned files - but yeah maybe the issue has been fixed deeper elsewhere. I'll give a try to the upgrade - would be nice also in order to prepare any v3 migration.

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 4, 2021

For the record, I upgraded etcd to /v2 - we still have the same behavior (see: https://github.com/kinvolk/locksmith/commits/tormath1/upgrade-etcd)

@invidian
Copy link
Member

For the record, I upgraded etcd to /v2 - we still have the same behavior (see: https://github.com/kinvolk/locksmith/commits/tormath1/upgrade-etcd)

And what about v3 client?

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 10, 2021

@invidian I started to have a look last Thursday - there are two things to considere:

@invidian
Copy link
Member

invidian commented Aug 10, 2021

Thanks for checking @tormath1. You're perhaps right about migration to v3, it doesn't seem trivial, if the data is stored in a different place. Meaning upgrading only some clients to v3 can cause multiple nodes to be updated simultaneously. Perhaps we could create an issue for this.

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 13, 2021

@invidian I gave a deeper try to etcd/v3 upgrade - it compiles; let's run some tests to see the damages haha.

EDIT:

core@localhost ~ $ ./locksmithctl status
Available: 1
Max: 1
core@localhost ~ $ ./locksmithctl set-max 1234
Old-Max: 1
Max: 1234
core@localhost ~ $ ./locksmithctl status
Available: 1
Max: 1

works fine haha

@tormath1
Copy link
Contributor Author

closed in favor of: #12

@tormath1 tormath1 closed this Aug 13, 2021
@invidian invidian deleted the tormath1/fix-side-effect-upgrade branch August 14, 2021 12:01
@tormath1 tormath1 restored the tormath1/fix-side-effect-upgrade branch August 25, 2021 13:46
@tormath1 tormath1 reopened this Aug 25, 2021
@tormath1
Copy link
Contributor Author

@kinvolk/flatcar-maintainers @invidian let's bring back this PR to life.

Since the on-going PR (#12) to upgrade to etcd/v3 will take some time (need to properly test it, rewrite a few things); let's patch the current locksmith with the following changes to ship locksmith in the OS.

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggested improvements to returned error messages and adding a comment to describe the situation, otherwise looks good 👍

}
ec, err := client.New(cfg)
if err != nil {
return nil, fmt.Errorf("unable to create etcd client: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("unable to create etcd client: %w", err)
return nil, fmt.Errorf("creating etcd client: %w", err)

if errors.Is(err, syscall.ECONNREFUSED) {
continue
}
return nil, fmt.Errorf("unable to create etcd lock client: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, fmt.Errorf("unable to create etcd lock client: %w", err)
return nil, fmt.Errorf("creating etcd lock client: %w", err)

}
return lc, err

return nil, errors.New("unable to create an etcd client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return nil, errors.New("unable to create an etcd client")
return nil, errors.New("no etcd endpoints available, tried: %s", strings.Join(",", globalFlags.Endpoints))

locksmithctl/locksmithctl.go Show resolved Hide resolved
Copy link
Member

@krnowak krnowak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but the commit message is confusing. What Create command? I don't see locksmithctl having a create command. You probably meant something about etcd internal create command or something?

@tormath1
Copy link
Contributor Author

@krnowak right, the Create actually refers to the Create method in the KeysAPI interface but it's confusing and I don't remember what I wanted to say - I'll definitely reword the commit.

@tormath1
Copy link
Contributor Author

@invidian @krnowak thanks for your review - comments have been adressed.

New body's commit message will be:

with the recent etcd upgrade (1b5eb50),
the endpoints resilience were not ensured -> having failing endpoints
would make the `locksmithcl` commands fail.

in this commit, we patch the software (and not the vendor), to reproduce
a kind of resilience: we loop over the endpoints - if one of them is not
reachable we continue to the over.
this patch can be reverted once the upgrade to etcd/v3 will be done.

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fantastic! Great work @tormath1 🎉

Copy link
Member

@invidian invidian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing, can we squash commits before merging?

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 26, 2021

@invidian obviously I'll squash the fixup! commits before merging if this is what you're talking about. 🥳

with the recent etcd upgrade (1b5eb50),
the endpoints resilience were not ensured -> having failing endpoints
would make the `locksmithcl` commands fail.

in this commit, we patch the software (and not the vendor), to reproduce
a kind of resilience: we loop over the endpoints - if one of them is not
reachable we continue to the over.
this patch can be reverted once the upgrade to etcd/v3 will be done.

Signed-off-by: Mathieu Tortuyaux <mathieu@kinvolk.io>
@tormath1 tormath1 force-pushed the tormath1/fix-side-effect-upgrade branch from 0cc129b to 860103c Compare August 26, 2021 07:15
@tormath1 tormath1 merged commit 63cef2d into flatcar-master Aug 26, 2021
@tormath1 tormath1 deleted the tormath1/fix-side-effect-upgrade branch August 26, 2021 07:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants