Skip to content
This repository has been archived by the owner on May 30, 2023. It is now read-only.

app-admin/locksmith: bump commit ID #1161

Merged
merged 1 commit into from
Aug 27, 2021
Merged

Conversation

@tormath1 tormath1 self-assigned this Aug 3, 2021
@tormath1
Copy link
Contributor Author

tormath1 commented Aug 3, 2021

getting random failure on the etcd client when the endpoint is not explicit - not sure why but it might be related to the etcd upgrade maybe:

core@localhost ~ $ for i in {1..10}; do locksmithctl status && sleep 1; done
Error initializing etcd client: dial tcp 127.0.0.1:4001: connect: connection refused
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Error initializing etcd client: dial tcp 127.0.0.1:4001: connect: connection refused
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Error initializing etcd client: dial tcp 127.0.0.1:4001: connect: connection refused
Error initializing etcd client: dial tcp 127.0.0.1:4001: connect: connection refused
core@localhost ~ $ for i in {1..10}; do locksmithctl -endpoint http://localhost:2379 status && sleep 1; done
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1
Available: 1
Max: 1

@jepio
Copy link
Contributor

jepio commented Aug 3, 2021

The default endpoints list is [localhost:2379,localhost:4001]. Looking at the code it seems to that "go.etcd.io/etcd/client" shuffles the passed in endpoints. But the comments say that if one is unreachable it will try another one, which is not happening correctly.

@tormath1
Copy link
Contributor Author

tormath1 commented Aug 3, 2021

yeah I arrived to the very same conclusion - it's very weird, I'll try tomorrow to revert / upgrade the etcd lib to see if it changes something.

Reverting flatcar/locksmith@1b5eb50 fixes the issue; this PR seems to be the cause of the trouble: etcd-io/etcd#5888 but it seems it has been resolved in etcd-io/etcd#8515.

The issue is the following: when we create the etcd client, we call the getClient which initialize a lock etcd client. In the init, it calls a Create (a Set) methods which provides the isOneShotCtxValue -> if there is an error, the action won't be retry. So basically we don't have anymore the endpoints resilience - we could update the tests to run on a specific endpoint but it keeps the main issue: user with multiple endpoints can experiencing this kind of issue.

@tormath1 tormath1 added the on-hold this PR or this issue is on-hold - it's blocked by something external label Aug 13, 2021
@tormath1
Copy link
Contributor Author

tormath1 commented Aug 13, 2021

This PR is currently on-hold because locksmithctl behaves not correctly due to the etcd upgrade.

We are currently exploring various fixes, like a custom patch (flatcar/locksmith#11) or upgrade to etcd/v3 (flatcar/locksmith#12).

EDIT: we merged the following patch flatcar/locksmith#11 in order to properly make the upgrade to etcd/v3; this PR is no more on-hold.

@tormath1 tormath1 removed the on-hold this PR or this issue is on-hold - it's blocked by something external label Aug 26, 2021
@tormath1 tormath1 force-pushed the tormath1/update-locksmith-commit branch from dadb202 to f362762 Compare August 26, 2021 07:54
closes flatcar/Flatcar#407

Signed-off-by: Mathieu Tortuyaux <mathieu@kinvolk.io>
@tormath1 tormath1 force-pushed the tormath1/update-locksmith-commit branch from 5a29d24 to a8fcf2d Compare August 27, 2021 14:28
@tormath1 tormath1 marked this pull request as ready for review August 27, 2021 14:28
@tormath1 tormath1 requested a review from a team August 27, 2021 14:29
@tormath1 tormath1 merged commit 07cdc2f into main Aug 27, 2021
@tormath1 tormath1 deleted the tormath1/update-locksmith-commit branch August 27, 2021 15:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
3 participants