Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Etcd client for kubeadm control plane #2237

Merged
merged 2 commits into from
Feb 4, 2020

Conversation

chuckha
Copy link
Contributor

@chuckha chuckha commented Jan 31, 2020

What this PR does / why we need it:

This adds a response formatter for the clientv3 etcd client as well as an adapter to add retries to the default clientv3 etcd client.

This client will be used in the health check client for the the decision to upgrade, scale up or scale down.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Related to #2243

cc @randomvariable

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 31, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chuckha

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 31, 2020
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 31, 2020
@ncdc ncdc added this to the v0.3.0 milestone Feb 3, 2020
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 3, 2020
@chuckha chuckha changed the title [WIP] [not ready for review] Etcd client Etcd client for kubeadm control plane Feb 3, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 3, 2020
Allows user to contact etcd with arbitrary connection types, useful
when proxying through the Kubernetes API server.

Protobuf types are wrapped such that downstream consumers do not
need to import etcd packages.

Signed-off-by: Naadir Jeewa <jeewan@vmware.com>
@detiber
Copy link
Member

detiber commented Feb 3, 2020

/assign @randomvariable
/assign @dlipovetsky

}

// Member retrieves the Member information for a given peer.
func (c *Client) Member(ctx context.Context, peerURL string) (*Member, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we plan to use this?

I can imagine returning the etcd member data based on the etcd member name would be useful, because if you know the Machine, you can find the NodeRef, whose Name will be equal to its etcd member name.

Copy link
Contributor

@dlipovetsky dlipovetsky Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be defined in the etcd interface? Nevermind--I think. The etcd interface isn't for this client 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we make this a filter function, e.g., MemberForName(members []*Member, name string) *Member?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on that a little more? What's the use case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding your first comment -- I am more than happy to delete code that isn't useful! If you don't need this function we should absolutely delete it. I was keeping the interface almost the same as the original PR and looking for exactly this feedback.

If you don't need it, we should remove it.

Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending comments

}

// Status returns the cluster status as known by the connected etcd server.
func (c *Client) Status(ctx context.Context) (*clientv3.StatusResponse, error) {
Copy link
Contributor

@dlipovetsky dlipovetsky Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to return the raw clientv3.StatusResponse? I think we would want a wrapper struct like Member.

We can mostly mirror clientv3.StatusResponse, but probably want to make MemberID, ClusterID top-level fields (as opposed to being fields under the Header field).

There's some overlap with fields in Member. For example, clientv3.StatusResponse.[]Errors appears to hold the same information as we put in Member.Alarms, but I'm not certain.

I actually can't think of a single field from Status that scale up/down needs. Maybe we can remove this altogether for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove it if it's unneeded. We can always add it back later if we need it.

// Member struct defines an etcd member; it is used to avoid spreading
// github.com/coreos/etcd dependencies.
type Member struct {
// ID is the ID of this cluster member
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every member of the same cluster has the same ClusterID, which is included in the standard response headers of an etcd API request. It would help to record this value here, for example for any consumer that wants to easily verify that multiple members belong to the same cluster.

The value is defined by the first etcd member, and never changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh, nice. That could be much simpler than checking the peer list.

Copy link
Contributor Author

@chuckha chuckha Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting we make this a stateful object/cache api responses?

Edit: This value does get populated, or at least, in theory it should, by pbMemberToMember.

return nil, errors.Wrap(err, "failed to get list of members for etcd cluster")
}

alarms, err := c.Alarms(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside is that this makes a separate API call. However, it is one call, not N calls (where N is the number of members).

The upside is that the user does not need to query Alarms separately and correlate entries returned by Alarms to entries in the list of Members by using the MemberID.

I don't have strong feelings either way.

Copy link
Contributor Author

@chuckha chuckha Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I am thinking of Client in this package as basically a struct that uses the etcd client to fetch data to populate the defined data types (Member. Client methods are the place I would expect to wrap multiple API calls to etcd to build out a data structure.

Are you worried about an additional API call for performance reasons?

return nil, err
}
for _, m := range members {
if m.PeerURLs[0] == peerURL {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the member name to identify the member in the list... But if we use peerURL, then we should iterate over all peerURLs here.

Copy link
Contributor Author

@chuckha chuckha Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this to use name if you verify you need this functionality at all.

I think we should try to keep the interface as simple as possible without trying to make it overly performant before we decide this is actually a performance problem. The signature MemberForNames(members []*Members, name string) *Member indicates that you'd be keeping a reference to a list of members outside this API, something like this:

members := client.Members()
// some more work happens here
client.MemberForName(members, name)

But this has some issues due to the members list potentially becoming stale.

Let's always fetch the data to reduce split brain as much as possible with the understanding that if this becomes a performance problem then we can either add a caching layer or redesign the API. We have a lot of flexibility here since it's in the internal/ package.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the member name to identify the member in the list

depending on the state of the joining members, they can end up being without a name.
URL or ID are unique at all times, so IMO any function used to "get" a member from a cached list should not use name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent was not to do any caching here, and any time you call into this client, it's explicitly going to go out and make some requests. How we choose to cache that is the consumer's concern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depending on the state of the joining members, they can end up being without a name.

You make a good point, but I want to clarify it a little: When a member is added, it has no name. When the member joins, it gets a name. So a member without a name is unstarted.

so IMO any function used to "get" a member from a cached list should not use name.

I didn't consider this function taking a "cached list" as input, because the user owns the list, not the etcd client.

The control plane controller will need to find a Member by its name, because the name is the one piece of information associated with a Machine (via its NodeRef). The member ID is not associated with a Machine in any way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this to use name if you verify you need this functionality at all.

Good question 🙂 In fact, I do not need Member in this client for scale up or down.

Signed-off-by: Chuck Ha <chuckh@vmware.com>
@chuckha
Copy link
Contributor Author

chuckha commented Feb 4, 2020

I ran this against a 3 node local cluster and the formatted responses look good (a super basic smoke test that tells us we will get values out of this if the connection is correct)

@randomvariable
Copy link
Member

Rather than overthink what we should do here, let's start consuming it for the healthchecks in the controller and make changes as needed.

@randomvariable
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 4, 2020
@chuckha
Copy link
Contributor Author

chuckha commented Feb 4, 2020

/test pull-cluster-api-test

A known flake, see #2257

@k8s-ci-robot k8s-ci-robot merged commit 5a616e2 into kubernetes-sigs:master Feb 4, 2020
@chuckha chuckha deleted the etcd-client branch February 4, 2020 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants