🐛 KCP rollout maintains even distribution of failure domains #3405

Merged
merged 1 commit into from
Jul 31, 2020

benmoss

@benmoss benmoss commented Jul 27, 2020

What this PR does / why we need it:
KCP no longer counts outdated machines when it decides which failure domain to place a new machine into. Counting them was leading to situations where a rollout left machines unevenly distributed across failure domains.
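
To illustrate the idea, here is a minimal sketch of picking a failure domain while ignoring outdated machines. It is not the code from this PR: the `machine` type, its `upToDate` field, and `pickFailureDomain` are hypothetical names invented for the example, and the alphabetical tie-break is an assumption made only to keep the result deterministic.

```go
package main

import "sort"

// machine is a hypothetical, simplified stand-in for a control plane Machine.
type machine struct {
	name          string
	failureDomain string
	upToDate      bool // false: outdated, will be replaced by the rollout
}

// pickFailureDomain returns the failure domain that holds the fewest
// up-to-date machines. Outdated machines are skipped because the rollout
// is going to delete them, so they should not influence placement.
// Ties are broken alphabetically (an assumption made for this sketch).
func pickFailureDomain(domains []string, machines []machine) string {
	if len(domains) == 0 {
		return ""
	}

	// Start every known domain at zero so empty domains are candidates.
	counts := make(map[string]int, len(domains))
	for _, d := range domains {
		counts[d] = 0
	}
	for _, m := range machines {
		if !m.upToDate {
			continue // ignore machines pending replacement
		}
		if _, known := counts[m.failureDomain]; known {
			counts[m.failureDomain]++
		}
	}

	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), domains...)
	sort.Strings(sorted)

	best := sorted[0]
	for _, d := range sorted[1:] {
		if counts[d] < counts[best] {
			best = d
		}
	}
	return best
}
```

Consider a rollout in progress with one new machine in domain `a` and outdated machines in `b` and `c`. Counting every machine gives a three-way tie, so a second new machine could land in `a` again. Counting only up-to-date machines gives `a=1, b=0, c=0`, so the sketch places the next machine in `b`.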

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3396

@k8s-ci-robot added the cncf-cla: yes and size/XL labels Jul 27, 2020
@vincepri
Member

/assign
/milestone v0.3.8

@k8s-ci-robot k8s-ci-robot added this to the v0.3.8 milestone Jul 27, 2020
@detiber
Member

detiber commented Jul 28, 2020

/assign
reviewing this now

Member

@detiber detiber left a comment

I'm wondering if it would make sense to push the logic for retrieving the KubeadmConfigs and related infrastructure resources into FilterableMachineCollection somehow, that might allow for a better way to join the resources in the various places we intend to use them without having to pass in full maps of KubeadmConfigs and infrastructure resources in all the places that are consuming it currently.

@benmoss
Author

benmoss commented Jul 28, 2020

/test pull-cluster-api-test
/test pull-cluster-api-e2e

@benmoss
Author

benmoss commented Jul 28, 2020

> I'm wondering if it would make sense to push the logic for retrieving the KubeadmConfigs and related infrastructure resources into FilterableMachineCollection somehow, that might allow for a better way to join the resources in the various places we intend to use them without having to pass in full maps of KubeadmConfigs and infrastructure resources in all the places that are consuming it currently.

Yeah I thought about this too. We could change FilterableMachineCollection to be a collection of something like

type Machine struct {
    machine clusterv1.Machine
    infraMachine unstructured.Unstructured
    kubeadmConfig bootstrap.KubeadmConfig
}

to make it so we keep the connections between those objects in the object graph.
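
A rough sketch of what such a join could look like follows. It is only an illustration of the proposal, not code from the repository: `machine`, `infraMachine`, `kubeadmConfig`, `enrichedMachine`, and `joinMachines` are hypothetical, trimmed-down stand-ins for the real Cluster API types, and looking objects up by name is an assumption made for brevity.

```go
package main

// Hypothetical, simplified stand-ins for clusterv1.Machine, the unstructured
// infrastructure machine, and the bootstrap KubeadmConfig.
type machine struct {
	name       string
	infraName  string // name of the referenced infrastructure machine
	configName string // name of the referenced KubeadmConfig
}

type infraMachine struct{ name string }

type kubeadmConfig struct{ name string }

// enrichedMachine keeps a machine together with the objects it references,
// so callers do not need to carry separate lookup maps around.
type enrichedMachine struct {
	machine       machine
	infraMachine  *infraMachine  // nil if the referenced object was not found
	kubeadmConfig *kubeadmConfig // nil if the referenced object was not found
}

// joinMachines resolves each machine's references once, up front.
func joinMachines(
	machines []machine,
	infra map[string]infraMachine,
	configs map[string]kubeadmConfig,
) []enrichedMachine {
	out := make([]enrichedMachine, 0, len(machines))
	for _, m := range machines {
		e := enrichedMachine{machine: m}
		if im, ok := infra[m.infraName]; ok {
			e.infraMachine = &im
		}
		if kc, ok := configs[m.configName]; ok {
			e.kubeadmConfig = &kc
		}
		out = append(out, e)
	}
	return out
}
```

With a shape like this, filters over the collection could inspect `e.kubeadmConfig` or `e.infraMachine` directly instead of receiving the full maps as extra arguments.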

@vincepri
Member

@detiber wdyt of the above?

@detiber
Member

detiber commented Jul 29, 2020

+1 from me to enriching the FilterableMachineCollection in the way suggested

@vincepri
Member

@benmoss Any updates here?

@benmoss
Author

benmoss commented Jul 30, 2020

> +1 from me to enriching the FilterableMachineCollection in the way suggested

After looking into this a bit and spiking it a little here, I spoke with @detiber and we decided it was probably better to defer this refactoring since it's more involved than we imagined.

@detiber
Member

detiber commented Jul 30, 2020

/test pull-cluster-api-e2e
/lgtm

@k8s-ci-robot added the lgtm label Jul 30, 2020
@vincepri
Member

@benmoss Squash commits?

@k8s-ci-robot removed the lgtm label Jul 31, 2020
Member

@vincepri vincepri left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jul 31, 2020
@vincepri
Member

/assign @detiber @ncdc

@detiber
Member

detiber commented Jul 31, 2020

/lgtm

@k8s-ci-robot added the lgtm label Jul 31, 2020
@benmoss
Author

benmoss commented Jul 31, 2020

Looks like prow is just a lil busted right now

@vincepri
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 6311dc2 into kubernetes-sigs:master Jul 31, 2020
Labels
approved, cncf-cla: yes, lgtm, size/XL
Development

Successfully merging this pull request may close these issues.

KCP rollout causes uneven failure domain distribution
5 participants