
Feature Request: Specify placement group with Hetzner Autoscaler #5919

Closed
dominic-p opened this issue Jul 1, 2023 · 15 comments · May be fixed by #6999
Labels
area/cluster-autoscaler · area/provider/hetzner (Issues or PRs related to Hetzner provider) · kind/feature (Categorizes issue or PR as related to a new feature) · lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@dominic-p

Which component are you using?:
cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Hetzner Placement Groups allow you to spread your VMs across different physical hardware which decreases the probability that some instances might fail together.

Describe the solution you'd like.:
It would be nice to be able to specify a given placement group for nodes managed by the autoscaler, just like we can currently specify things like the network or SSH key. Maybe a new env variable like HCLOUD_PLACEMENT_GROUP could be introduced.

Describe any alternative solutions you've considered.:
Of course, the status quo is fine. This would just make the cluster a bit more robust.
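For illustration, here is a minimal sketch of how such a variable might be resolved, assuming the provider would look it up through hcloud-go's PlacementGroupClient the same way it resolves other HCLOUD_* options today. The package and helper name are made up for this sketch, not existing autoscaler code:

package hetzner

import (
    "context"
    "fmt"
    "os"

    "github.com/hetznercloud/hcloud-go/hcloud"
)

// resolvePlacementGroup is a hypothetical helper: it reads HCLOUD_PLACEMENT_GROUP
// and resolves it to an existing placement group by name or ID.
func resolvePlacementGroup(ctx context.Context, client *hcloud.Client) (*hcloud.PlacementGroup, error) {
    idOrName := os.Getenv("HCLOUD_PLACEMENT_GROUP")
    if idOrName == "" {
        return nil, nil // variable unset: keep current behaviour, no placement group
    }
    pg, _, err := client.PlacementGroup.Get(ctx, idOrName)
    if err != nil {
        return nil, fmt.Errorf("failed to look up placement group %q: %w", idOrName, err)
    }
    if pg == nil {
        return nil, fmt.Errorf("placement group %q not found", idOrName)
    }
    return pg, nil
}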

@dominic-p dominic-p added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 1, 2023
@apricote
Member

Nice suggestion. Placement Groups have a limit of 10 instances, so using them for all your nodes might become a problem.

This could be introduced more cleanly in the new JSON format for node group configuration: HCLOUD_NODE_CONFIG

/area provider/hetzner

@k8s-ci-robot k8s-ci-robot added the area/provider/hetzner Issues or PRs related to Hetzner provider label Oct 20, 2023
@dominic-p
Author

dominic-p commented Oct 23, 2023

Thanks for the reply! I wasn't aware of the new env variable. Yes, that does seem like a good place to configure it. It's not entirely clear to me how it will interact with the existing variables it's replacing. If both HCLOUD_CLOUD_INIT and HCLOUD_CLUSTER_CONFIG are set, which one will take precedence?

So, if we were to add support for placement groups (acknowledging the 10 node limit), the JSON might look like this?

{
    "nodeConfigs": {
        "placementGroup": "name or ID here"
    }
}

Could a warning be shown in the logs and the setting disabled if a placement group is configured for a node pool that could have more than 10 nodes?

@apricote
Member

apricote commented Nov 20, 2023

HCLOUD_CLUSTER_CONFIG will override HCLOUD_CLOUD_INIT:

cloudInit := n.manager.clusterConfig.LegacyConfig.CloudInit
if n.manager.clusterConfig.IsUsingNewFormat {
    cloudInit = n.manager.clusterConfig.NodeConfigs[n.id].CloudInit
}

The JSON should look like this:

{
  "nodeConfigs": {
    "pool-1": {
      "placementGroup": "name or ID here"
    }
  }
}

Could a warning be shown in the logs and the setting disabled if a placement group is configured for a node pool that could have more than 10 nodes?

I would prefer that we fail more loudly by just refusing to start. This is how the config validation is done right now, see klog.Fatalf:

klog.Fatalf("Failed to parse pool spec `%s` provider: %v", nodegroupSpec, err)
}
validNodePoolName.MatchString(spec.name)
servers, err := manager.allServers(spec.name)
if err != nil {
klog.Fatalf("Failed to get servers for for node pool %s error: %v", nodegroupSpec, err)
}
if manager.clusterConfig.IsUsingNewFormat {
_, ok := manager.clusterConfig.NodeConfigs[spec.name]
if !ok {
klog.Fatalf("No node config present for node group id `%s` error: %v", spec.name, err)

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 18, 2024
@dominic-p
Author

Not stale for me.

@apricote
Member

@dominic-p are you interested in implementing this? I can help out if you have any questions :)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2024
@dominic-p
Author

Thanks for following up. I can take a stab at it. Go isn't my language, unfortunately, but the change may be trivial enough for me to muddle through. Can you point me in the right direction (e.g. which file(s) I should start looking at)?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2024
@dominic-p
Author

Still not stale for me. Still struggling to find the time to work on it.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 19, 2024
@dominic-p
Author

Just checking in on this again.

If someone can point me in the right direction, I can take a stab at a PR for this. As I mentioned before, Go isn't a language I know, but it should be pretty straightforward.

@apricote
Member

Some steps (a rough sketch follows after this list):

  1. Add new field PlacementGroup string to struct NodeConfig in hetzner_manager.go
  2. Add new field PlacementGroup *hcloud.PlacementGroup to struct hetznerNodeGroup in hetzner_node_group.go
  3. In hetzner_cloud_provider.go, in BuildHetzner around line 217: if a PlacementGroup is specified in NodeConfigs[spec.name], check with the API that the placement group exists, then add it to the hetznerNodeGroup created in line 219.
  4. After the loop over all nodeGroups (line 230): check that, for each unique placement group, the sum of maxSize over all nodeGroups that specify it does not exceed 10.
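
A very rough sketch of steps 1, 3 and 4 under those assumptions follows. The field names, the manager's client and nodeGroups members, and the quoted line numbers are taken on faith or invented for illustration, so treat this as a starting point rather than the final shape:

// Step 1: new field on NodeConfig in hetzner_manager.go.
type NodeConfig struct {
    CloudInit string
    // ... existing fields ...
    PlacementGroup string // name or ID of an existing placement group (assumed new field)
}

// Step 3, inside BuildHetzner in hetzner_cloud_provider.go: resolve and validate it.
if cfg, ok := manager.clusterConfig.NodeConfigs[spec.name]; ok && cfg.PlacementGroup != "" {
    pg, _, err := manager.client.PlacementGroup.Get(context.Background(), cfg.PlacementGroup)
    if err != nil || pg == nil {
        klog.Fatalf("Placement group %q for node group %s not found: %v", cfg.PlacementGroup, spec.name, err)
    }
    nodeGroup.PlacementGroup = pg // field added to hetznerNodeGroup in step 2
}

// Step 4, after the loop over all node groups: enforce the 10-instance limit
// across every node group that shares the same placement group.
totalMax := map[string]int{}
for _, ng := range manager.nodeGroups {
    if ng.PlacementGroup != nil {
        totalMax[ng.PlacementGroup.Name] += ng.maxSize
    }
}
for name, sum := range totalMax {
    if sum > 10 {
        klog.Fatalf("Placement group %q is referenced by node groups with a combined maxSize of %d; the limit is 10 instances", name, sum)
    }
}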

@dominic-p
Author

Thank you so much for taking the time to lay that out for me. I took a shot at a PR. Let me know what you think.

As I said, Go is not my language, but I think I was able to muddle through the steps you gave me.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 2, 2024