
Feature Request: Specify placement group with Hetzner Autoscaler #5919

Closed
dominic-p opened this issue Jul 1, 2023 · 15 comments · May be fixed by #6999
Labels
area/cluster-autoscaler · area/provider/hetzner (Issues or PRs related to Hetzner provider) · kind/feature (Categorizes issue or PR as related to a new feature) · lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed)

Comments

@dominic-p

Which component are you using?:
cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Hetzner Placement Groups allow you to spread your VMs across different physical hardware which decreases the probability that some instances might fail together.

Describe the solution you'd like.:
It would be nice to be able to specify a given placement group for nodes managed by the autoscaler, just like we can currently specify things like the network or SSH key. Maybe a new env variable like HCLOUD_PLACEMENT_GROUP could be introduced.

Describe any alternative solutions you've considered.:
Of course, the status quo is fine. This would just make the cluster a bit more robust.
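For illustration, here is a minimal sketch of how such a variable might be resolved, assuming the provider would look it up through hcloud-go's PlacementGroupClient the same way it resolves other HCLOUD_* options today. The package and helper name are made up for this sketch, not existing autoscaler code:

package hetzner

import (
    "context"
    "fmt"
    "os"

    "github.com/hetznercloud/hcloud-go/hcloud"
)

// resolvePlacementGroup is a hypothetical helper: it reads HCLOUD_PLACEMENT_GROUP
// and resolves it to an existing placement group by name or ID.
func resolvePlacementGroup(ctx context.Context, client *hcloud.Client) (*hcloud.PlacementGroup, error) {
    idOrName := os.Getenv("HCLOUD_PLACEMENT_GROUP")
    if idOrName == "" {
        return nil, nil // variable unset: keep current behaviour, no placement group
    }
    pg, _, err := client.PlacementGroup.Get(ctx, idOrName)
    if err != nil {
        return nil, fmt.Errorf("failed to look up placement group %q: %w", idOrName, err)
    }
    if pg == nil {
        return nil, fmt.Errorf("placement group %q not found", idOrName)
    }
    return pg, nil
}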

@dominic-p dominic-p added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 1, 2023
@apricote
Member

Nice suggestion. Placement Groups have a limit of 10 instances, so using them for all your nodes might become a problem.

This could be introduced more cleanly in the new JSON format for node group configuration: HCLOUD_NODE_CONFIG

/area provider/hetzner

@k8s-ci-robot k8s-ci-robot added the area/provider/hetzner Issues or PRs related to Hetzner provider label Oct 20, 2023
@dominic-p
Author

dominic-p commented Oct 23, 2023

Thanks for the reply! I wasn't aware of the new env variable. Yes, that does seem like a good place to configure it. It's not entirely clear to me how it will interact with the existing variables it's replacing. If both HCLOUD_CLOUD_INIT and HCLOUD_CLUSTER_CONFIG are set, which one will take precedence?

So, if we were to add support for placement groups (acknowledging the 10 node limit), the JSON might look like this?

{
    "nodeConfigs": {
        "placementGroup": "name or ID here"
    }
}

Could a warning be shown in the logs and the setting disabled if a placement group is configured for a node pool that could have more than 10 nodes?

@apricote
Member

apricote commented Nov 20, 2023

HCLOUD_CLUSTER_CONFIG will override HCLOUD_CLOUD_INIT:

cloudInit := n.manager.clusterConfig.LegacyConfig.CloudInit
if n.manager.clusterConfig.IsUsingNewFormat {
    cloudInit = n.manager.clusterConfig.NodeConfigs[n.id].CloudInit
}

The JSON should look like this:

{
  "nodeConfigs": {
    "pool-1": {
      "placementGroup": "name or ID here"
    }
  }
}

Could a warning be shown in the logs and the setting disabled if a placement group is configured for a node pool that could have more than 10 nodes?

I would prefer that we fail more loudly by just refusing to start. This is how the config validation is done right now, see klog.Fatalf:

klog.Fatalf("Failed to parse pool spec `%s` provider: %v", nodegroupSpec, err)
}
validNodePoolName.MatchString(spec.name)
servers, err := manager.allServers(spec.name)
if err != nil {
klog.Fatalf("Failed to get servers for for node pool %s error: %v", nodegroupSpec, err)
}
if manager.clusterConfig.IsUsingNewFormat {
_, ok := manager.clusterConfig.NodeConfigs[spec.name]
if !ok {
klog.Fatalf("No node config present for node group id `%s` error: %v", spec.name, err)

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 18, 2024
@dominic-p
Author

Not stale for me.

@apricote
Member

@dominic-p are you interested in implementing this? I can help out if you have any questions :)

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 20, 2024
@dominic-p
Author

Thanks for following up. I can take a stab at it. Go isn't my language, unfortunately, but the change may be trivial enough for me to muddle through. Can you point me in the right direction (e.g. which file(s) I should start looking at)?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2024
@dominic-p
Author

Still not stale for me. Still struggling to find the time to work on it.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 19, 2024
@dominic-p
Author

Just checking in on this again.

If someone can point me in the right direction, I can take a stab at a PR for this. As I mentioned before, Go isn't a language I know, but it should be pretty straightforward.

@apricote
Member

Some steps (a rough sketch follows after this list):

  1. Add new field PlacementGroup string to struct NodeConfig in hetzner_manager.go
  2. Add new field PlacementGroup *hcloud.PlacementGroup to struct hetznerNodeGroup in hetzner_node_group.go
  3. In hetzner_cloud_provider.go, in BuildHetzner around line 217: if a PlacementGroup is specified in NodeConfigs[spec.name], check with the API that the placement group exists, then add it to the hetznerNodeGroup created in line 219.
  4. After the loop over all nodeGroups (line 230): check that, for each unique placement group, the sum of maxSize over all nodeGroups that specify it does not exceed 10.
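
A very rough sketch of steps 1, 3 and 4 under those assumptions follows. The field names, the manager's client and nodeGroups members, and the quoted line numbers are taken on faith or invented for illustration, so treat this as a starting point rather than the final shape:

// Step 1: new field on NodeConfig in hetzner_manager.go.
type NodeConfig struct {
    CloudInit string
    // ... existing fields ...
    PlacementGroup string // name or ID of an existing placement group (assumed new field)
}

// Step 3, inside BuildHetzner in hetzner_cloud_provider.go: resolve and validate it.
if cfg, ok := manager.clusterConfig.NodeConfigs[spec.name]; ok && cfg.PlacementGroup != "" {
    pg, _, err := manager.client.PlacementGroup.Get(context.Background(), cfg.PlacementGroup)
    if err != nil || pg == nil {
        klog.Fatalf("Placement group %q for node group %s not found: %v", cfg.PlacementGroup, spec.name, err)
    }
    nodeGroup.PlacementGroup = pg // field added to hetznerNodeGroup in step 2
}

// Step 4, after the loop over all node groups: enforce the 10-instance limit
// across every node group that shares the same placement group.
totalMax := map[string]int{}
for _, ng := range manager.nodeGroups {
    if ng.PlacementGroup != nil {
        totalMax[ng.PlacementGroup.Name] += ng.maxSize
    }
}
for name, sum := range totalMax {
    if sum > 10 {
        klog.Fatalf("Placement group %q is referenced by node groups with a combined maxSize of %d; the limit is 10 instances", name, sum)
    }
}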

@dominic-p
Author

Thank you so much for taking the time to lay that out for me. I took a shot at a PR. Let me know what you think.

As I said, Go is not my language, but I think I was able to muddle through the steps you gave me.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 2, 2024