enable setting failureDomain as part of a cluster's topology #5636

Closed
yastij opened this issue Nov 10, 2021 · 9 comments · Fixed by #5850
Assignees
Labels
area/clusterclass Issues or PRs related to clusterclass kind/feature Categorizes issue or PR as related to a new feature. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Milestone

Comments

@yastij
Member

yastij commented Nov 10, 2021

User Story

As a user, I would like to reference a ClusterClass in a cluster and configure the failure domain of workers in its topology, so that ClusterClass can be re-used more broadly.

Detailed Description

Today we don't offer users the ability to configure failureDomain for the various MachineDeployments that are part of a ClusterClass, which somewhat limits its re-usability. We likely also need to define what the default behaviour is when this field is not set in the cluster's topology.
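
To make the ask concrete, here is a minimal sketch of a per-MachineDeployment failure domain with one possible default. It is an illustration only: the `MachineDeploymentTopology` model, the `failure_domain` field, and the fallback to a class-level value are all assumptions made for this example, not the Cluster API types or the behaviour that was eventually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineDeploymentTopology:
    """Hypothetical model of one worker entry in a Cluster's topology."""
    name: str
    class_name: str
    replicas: int = 1
    # Proposed field (assumption); None means the user did not set it.
    failure_domain: Optional[str] = None


def resolve_failure_domain(md: MachineDeploymentTopology,
                           class_default: Optional[str] = None) -> Optional[str]:
    """Pick the failure domain for a MachineDeployment.

    Assumed default behaviour: a value set in the topology wins; otherwise
    fall back to whatever the class provides, which may be nothing at all.
    """
    if md.failure_domain is not None:
        return md.failure_domain
    return class_default


# Two workers built from the same (hypothetical) class, placed differently.
gpu = MachineDeploymentTopology(name="gpu-workers", class_name="gpu",
                                failure_domain="zone-a")
general = MachineDeploymentTopology(name="general", class_name="default")

print(resolve_failure_domain(gpu, class_default="zone-b"))      # zone-a
print(resolve_failure_domain(general, class_default="zone-b"))  # zone-b
print(resolve_failure_domain(general))                          # None
```

Returning `None` when nothing is set leaves the placement decision to whatever sits below the topology, which is one way to answer the "default behaviour" question; other defaults are equally possible.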

Anything else you would like to add:

Not the same, but #3358 might impact how we do this. We've had requests from CAPV users to spread MachineDeployments across failure domains (e.g. spread an MD that is GPU-enabled).

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 10, 2021
@yastij
Member Author

yastij commented Nov 10, 2021

@fabriziopandini
Member

/area topology
/milestone v1.1
@yastij if I got this right, you intend to work on this?

@k8s-ci-robot k8s-ci-robot added this to the v1.1 milestone Nov 10, 2021
@fabriziopandini
Member

fabriziopandini commented Nov 10, 2021

Should we also consider failure domains for control plane machines?

@vincepri
Member

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Nov 10, 2021
@yastij
Member Author

yastij commented Nov 10, 2021

@fabriziopandini - yeah, I'm planning to work on this. As for "should we include control plane machines?", I'm not sure we want to.

The options for the control plane (CP), as I see them, are:

  1. Don't specify anything and keep today's behaviour, which reads the failure domains from the cluster's status.
  2. Add a failureDomain field. That might not achieve what we want, as it would colocate everything.
  3. Add a list of failureDomains, which is a subset of what's listed in the cluster's status. This gives more control to users who don't want to consume an AZ listed in the status, but it would need changes to KCP.

For ControlPlane I'm leaning towards 1; for workers I'm leaning towards 3, as it has real use cases and value. I'm also not against aligning the options for both, for the sake of consistency.
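
Option 3 implies two checks that can be sketched roughly as follows: the requested list must be a subset of what the cluster's status reports, and replicas then get spread over the allowed domains. The function names and the round-robin policy are assumptions made for this illustration; nothing here reflects how KCP or the topology controller actually behaves.

```python
from typing import Dict, Iterable, List


def unknown_failure_domains(requested: Iterable[str],
                            available: Iterable[str]) -> List[str]:
    """Return requested failure domains that the cluster status does not list."""
    known = set(available)
    return [fd for fd in requested if fd not in known]


def spread(replicas: int, failure_domains: List[str]) -> Dict[str, int]:
    """Distribute replicas round-robin over the allowed failure domains."""
    if not failure_domains:
        raise ValueError("at least one failure domain is required")
    counts = {fd: 0 for fd in failure_domains}
    for i in range(replicas):
        counts[failure_domains[i % len(failure_domains)]] += 1
    return counts


# Hypothetical values standing in for the cluster's status.
status_failure_domains = ["az-1", "az-2", "az-3"]

# "az-4" is not reported by the cluster, so the request is not a valid subset.
print(unknown_failure_domains(["az-1", "az-4"], status_failure_domains))

# A valid subset: five replicas spread over two of the three domains.
print(spread(5, ["az-1", "az-2"]))
```

An empty result from `unknown_failure_domains` means the request is a valid subset of the status; a non-empty result is what a validation step could reject.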

@yastij
Member Author

yastij commented Nov 10, 2021

/assign

@vincepri
Member

+1 to keeping the control plane delegated to the control plane provider

@CecileRobertMichon
Contributor

+1

@fabriziopandini
Member

/unassign @yastij
/assign

6 participants