
v1beta1 Laundry List #1327

Closed
ellistarn opened this issue Feb 11, 2022 · 58 comments
Labels: feature (New feature or request), roadmap (We're thinking about next steps), v1 (Issues requiring resolution by the v1 milestone)

Comments

@ellistarn (Contributor) commented Feb 11, 2022

Tell us about your request
This issue contains a list of breaking API changes for v1beta1.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
ellistarn added the feature (New feature or request) label on Feb 11, 2022
@ellistarn (Contributor, Author) commented Feb 11, 2022

Fix casing on capacity type to conform w/ k8s value casing:
on-demand -> OnDemand
spot -> Spot

#1323 (comment)
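
For illustration, a requirement using the proposed casing might look like the following, assuming the v1alpha5 Provisioner shape (a sketch only; the values shown are the proposal, not the current API):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # proposed: values cased like other Kubernetes enums (OnDemand, Spot)
    # instead of the current lowercase on-demand / spot
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["OnDemand", "Spot"]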

@ellistarn (Contributor, Author) commented Feb 11, 2022

I'd like to explore separating the spec.provider into a separate vendor-specific CRD and using something like spec.providerRef.
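
As a rough sketch of what that separation could look like (the kind and field names below are illustrative assumptions, not a settled API):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # proposed: reference a vendor-specific resource instead of embedding spec.provider
  providerRef:
    apiVersion: karpenter.k8s.aws/v1alpha1
    kind: AWSNodeTemplate
    name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  # vendor-specific fields (subnets, security groups, AMIs, ...) move here
  subnetSelector:
    karpenter.sh/discovery: my-cluster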

@ellistarn (Contributor, Author) commented:

I'd like to rename spec.kubeletConfiguration to spec.kubelet

@ellistarn (Contributor, Author) commented:

I'd like to explore creating a decoupled Deprovisioner resource with spec.selector that encapsulates spec.ttlSecondsAfterEmpty, spec.ttlSecondsAfterExpired, and #1091.
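
A hypothetical shape for such a resource, purely to make the idea concrete (the kind, selector, and fields below are illustrative and do not exist today):

apiVersion: karpenter.sh/v1alpha5
kind: Deprovisioner              # hypothetical resource, not an existing API
metadata:
  name: default
spec:
  selector:                      # which provisioners (or nodes) this policy applies to
    matchLabels:
      karpenter.sh/provisioner-name: default
  ttlSecondsAfterEmpty: 30         # moved out of the Provisioner spec
  ttlSecondsAfterExpired: 2592000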

@felix-zhe-huang (Contributor) commented:

I'd like to explore adding spec.preferenceWeight to indicate which provisioner should be preferred when multiple are compatible with the pod spec.

@ellistarn (Contributor, Author) commented:

> I'd like to explore adding spec.preferenceWeight to indicate which provisioner should be preferred when multiple are compatible with the pod spec.

Check out the related discussion here: #783
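
To make the idea concrete, a weighted provisioner could look something like this (spec.preferenceWeight is the proposed field and does not exist yet; the rest follows the v1alpha5 shape):

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: arm64-preferred
spec:
  preferenceWeight: 100      # proposed: higher weight wins when multiple provisioners match a pod
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["arm64"]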

@felix-zhe-huang (Contributor) commented Feb 16, 2022

Another thought: remove the provisioner spec.labels. Everything would be described via spec.requirements instead. When Karpenter launches new nodes, a label would be added to the node for each requirement, using a valid value from the combined requirements (provisioner + pods).
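
To illustrate, a static label could instead be expressed as a single-value requirement, and the launched node would still receive the label (sketch only; the label key/value are placeholders):

spec:
  # today: a fixed label applied to every node this provisioner launches
  # labels:
  #   team: platform
  # proposed: express it as a requirement; Karpenter labels the node with team=platform
  requirements:
    - key: team
      operator: In
      values: ["platform"]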

@robertd (Contributor) commented Feb 18, 2022

Thanks for the transparency ... I'll definitely be monitoring this issue for changes that need to be applied to the cdk-karpenter: https://www.npmjs.com/package/cdk-karpenter 👍

njtran added the roadmap (We're thinking about next steps) label on Feb 28, 2022
@ellistarn (Contributor, Author) commented:

Remove kubeletconfiguration.clusterdnsip in favor of discovery #1241 (comment)

@stevehipwell (Contributor) commented:

> I'd like to explore separating the spec.provider into a separate vendor-specific CRD and using something like spec.providerRef.

Like IngressClassParams?

We currently set these options via LTs (good defaults could mitigate some of them); I'm happy to open separate issues for any of these that warrant it.

  • spec.kubelet.maxPods
  • spec.kubelet.kubeletReserved
  • Registry credentials (set in /var/lib/kubelet/config.json for AL2 and in TOML as of last BottleRocket release)
  • AWS provider container runtime for AL2

I've listed a couple of items above related to the ones already listed, but I think there needs to be some additional thought given to generic K8s settings vs provider settings. For example, I'd suggest that spec.kubelet.maxPods should default to 110 and spec.kubelet.kubeletReserved should default to a dynamic calculation (the GKE one) but be overridable; the AWS provider might override these defaults based on its settings (constraining down set values might also be desired).
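
For illustration, the kubelet-related settings discussed above might be surfaced roughly like this (field names follow the comment and the earlier spec.kubelet rename proposal; they are assumptions, not the final API):

spec:
  kubelet:
    maxPods: 110            # suggested generic default, overridable per provisioner
    kubeReserved:           # suggested default: a dynamic (GKE-style) calculation, overridable
      cpu: 100m
      memory: 1465Mi
      ephemeral-storage: 1Gi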

@vumdao commented Apr 1, 2022

I'd like to add a detailed property for provider in the CRD https://github.com/aws/karpenter/blob/main/charts/karpenter/crds/karpenter.sh_provisioners.yaml so that I can import it into the CDK8S API:

cdk8s import https://raw.githubusercontent.com/aws/karpenter/main/charts/karpenter/crds/karpenter.sh_provisioners.yaml

The current provider property is not useful to reference:

  /**
   * Provider contains fields specific to your cloudprovider.
   *
   * @schema ProvisionerSpec#provider
   */
  readonly provider?: any;

@ellistarn (Contributor, Author) commented:

@stevehipwell is #1327 (comment) still useful to you? We were curious about what the specific use case is. It's pretty straightforward to implement, but we weren't sure it was still necessary.

@ellistarn (Contributor, Author) commented:

Related: #967

@stevehipwell (Contributor) commented:

> @stevehipwell is #1327 (comment) still useful to you? We were curious about what the specific use case is. It's pretty straightforward to implement, but we weren't sure it was still necessary.

@ellistarn yes this is still very important to our use case.

@stevehipwell (Contributor) commented:

@ellistarn is the provisioner failure logic documented anywhere? And is the specific failure logic for AWS also documented?

@Cagafuego commented Apr 21, 2022

> @stevehipwell is #1327 (comment) still useful to you? We were curious about what the specific use case is. It's pretty straightforward to implement, but we weren't sure it was still necessary.

Just wanted to chime in on this with a use case I have, hope that is ok. We are setting up a multi-architecture cluster that supports both amd64 and arm64. Because of certain requirements we have on our end, we have to use custom launch templates and cannot allow Karpenter to generate them for us. Since we want to use multiple architectures, we then need to have multiple launch templates since the AMI will be different for each. That means we need multiple provisioners. If we have a pending pod that supports both architectures, we would like for Karpenter to prefer the arm64 nodes.

What we would ultimately want would be for Karpenter to select the cheapest option when multiple provisioners match with a pending pod or pods. However, just having a simple priority mechanism would be enough to get us most of the way there.

@stevehipwell (Contributor) commented:

@Cagafuego I think there is a discussion somewhere with requests to move capabilities out of custom LTs; you might want to add your requirements there?

@tzneal (Contributor) commented Apr 21, 2022

Thanks @stevehipwell. Yes, there is a PR with a doc at #1691 that talks about what we are looking at exposing, to hopefully eliminate the need to use custom LTs. Feel free to add questions/comments there.

Ping @suket22

@stevehipwell (Contributor) commented:

@tzneal I've looked over custom-user-data-and-amis.md and have some feedback.

@ellistarn (Contributor, Author) commented May 23, 2022

I hate to relitigate this, but the syntactic sugar of aws-ids doesn't sit well with me due to the "comma-separated-string" semantic. This creates nested formatting within our yaml, which is tricky to read, write, and parse. I'd like to explore strongly typing this with something like:

securityGroupIDs:
  - ...
  - ...
subnetIDs:
  - ...
  - ...
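
For contrast, the selector sugar being discussed packs multiple IDs into a single comma-separated string inside the selector map, roughly like this (the IDs are placeholders):

provider:
  subnetSelector:
    aws-ids: "subnet-aaaa,subnet-bbbb"     # comma-separated string nested in YAML
  securityGroupSelector:
    aws-ids: "sg-cccc,sg-dddd"

The strongly typed lists above would replace that nested string format with native YAML sequences.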

@stevehipwell (Contributor) commented:

> I hate to relitigate this, but the syntactic sugar of aws-ids doesn't sit well with me due to the "comma-separated-string" semantic. This creates nested formatting within our yaml, which is tricky to read, write, and parse.

@ellistarn I agree with your view on this; I'd also add that the ID should be preferred for AWS resources with an ID. I think the launch template lookup is currently on name only?

@ellistarn (Contributor, Author) commented:

> @ellistarn I agree with your view on this; I'd also add that the ID should be preferred for AWS resources with an ID. I think the launch template lookup is currently on name only?

We prefer user-controlled identifiers (names, tags) rather than system-controlled identifiers (IDs), since it removes a serialized creation requirement where each resource needs to know the output of its dependencies (i.e. "wire it up"). This enables static config files and minimizes logic for things like cfn/terraform.

The security group/subnet ID problem emerged from a limitation on shared VPCs where tags could not be read, which broke the selector semantic.
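
For example, the tag-based pattern lets the same manifest be written before any IDs exist (the tag key and value here are placeholders):

subnetSelector:
  karpenter.sh/discovery: my-cluster     # matched by tag; no subnet IDs wired into the config
securityGroupSelector:
  karpenter.sh/discovery: my-cluster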

@stevehipwell (Contributor) commented:

@ellistarn I'm not sure of any benefits to using names, but can see lots of limitations. By providing an ID I'm guaranteed that if something significant changes I'll get an error and not unexpected behaviour. Names are effectively sliding identifiers in AWS, so just as you wouldn't use the latest tag for an OCI image in production, a resource name shouldn't be used in a desired-state system.

In addition to the above, infrastructure is likely to be layered, so adding a tag to a subnet for Karpenter isn't going to be possible or even desirable for a lot of people. I personally tried adding Karpenter to an EKS cluster built with a complex Terraform module, and not being able to use IDs for everything breaks our architecture practices.

@jonathan-innis (Contributor) commented:

> Rename AWSNodeTemplate to NodeTemplate under the api group karpenter.k8s.aws. This allows for k get nodetemplates regardless of which cloud you're on, since the API group varies.

You could do this with short names, without having to rename the CRD.
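
A minimal sketch of the short-name approach on the CRD definition (names are illustrative; most of the CRD is omitted for brevity):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: awsnodetemplates.karpenter.k8s.aws
spec:
  group: karpenter.k8s.aws
  names:
    kind: AWSNodeTemplate
    plural: awsnodetemplates
    shortNames:
      - nodetemplate        # allows: k get nodetemplate, without renaming the kind
      - nodetemplates
  scope: Cluster
  # versions/schema omitted for brevity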

@ellistarn (Contributor, Author) commented Mar 2, 2023

@jonathan-innis, WDYT about the idea of moving from instanceProfile to iamRole for user configuration? Instance Profiles are kind of awkward, since they don't appear in the AWS console, and are specific to EC2 instance launches.

You could imagine that we automatically create an instance profile per AWSNodeTemplate, and attach the corresponding IAMRole to it. Users would specify the IAMRole at the global configmap or Provisioner CRD level.

The permission scope is relatively benign, since creating instance profiles isn't a scary permission, and we already have the PassRole permission required to attach an IAMRole to an InstanceProfile.

Related: #3051
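
A sketch of the proposed shape (the role field name is an assumption for illustration; only the relevant fields are shown):

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  # proposed: users supply the IAM role; Karpenter creates and manages the
  # instance profile for it behind the scenes
  role: KarpenterNodeRole-my-cluster
  # instanceProfile: KarpenterNodeInstanceProfile-my-cluster   # current approach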

@jonathan-innis (Contributor) commented:

> WDYT about the idea of moving from instanceProfile to iamRole for user configuration

Yeah, I think this makes good sense as a usability improvement, considering how many users I have seen with issues around creating and tracking the instance profile.

@sftim (Contributor) commented Mar 2, 2023

We can tag the instance profile too. I think that's a helpful thing to do.

I might want to have different roles for different provisioners though. For example, nodes made through the monkey-of-chaos provisioner are allowed to invoke ec2:SendDiagnosticInterrupt to themselves (and only to themselves). The other nodes aren't allowed to use that API at all.

Does this change limit my options there? I know I could use a DaemonSet, but IRSA gives all the Pods in a DaemonSet the same role - and no node-level identity.

@sftim (Contributor) commented Mar 2, 2023

(answering my own query - I think I can have 2 provisioners that use two different AWSNodeTemplates, and all is good)

@sidewinder12s (Contributor) commented:

The role should be attached to the node template. We use one role and do not attach any more permissions than needed, due to how painful configuring the aws-auth ConfigMap is.

@mbevc1 (Contributor) commented Mar 3, 2023

@ellistarn yeah, instanceProfile is an odd one, not being properly reflected across the API/console 🙄. Using IAM roles could work nicely and offer better visibility 👍. But at the same time, I'm not super bothered by profiles either 🤷

@tzneal (Contributor) commented Mar 14, 2023

Consider defaulting to Bottlerocket if not specified for v1beta1?

@sftim (Contributor) commented Mar 14, 2023

If it's feasible, leave the defaulting to be separate from the API - either default at admission time with a mutating admission controller, or have a config option for the controller, something like that.

I'm asking because I'd like the API to be especially unopinionated about cloud providers, whilst still giving cluster operators the option.
A Helm chart or whatever can absolutely have an opinion on Bottlerocket; I hope that the API itself does not.

@jonathan-innis (Contributor) commented:

> I hope that the API itself does not

I think the defaulting mechanism would live inside the AWS-specific cloudprovider defaulting logic so that it wouldn't be a default across cloudproviders.

> leave the defaulting to be separate from the API

What do you mean by this? Do you mean don't contain it in the CRD definition?

@jonathan-innis (Contributor) commented:

Consider pulling out tag-based AMI requirements. The assumption is that there are very few users doing this, if any at all, and those who want to achieve something similar should be able to do so by creating different provisioners.

@sftim (Contributor) commented Mar 15, 2023

> What do you mean by this? Do you mean don't contain it in the CRD definition?

Yep, exactly. APIs are not the best home for opinions.

@jonathan-innis (Contributor) commented:

> APIs are not the best home for opinions

I think in this case you would probably want defaulting to expose it, so that it's clearer what you are using. IMO, it seems more ambiguous if the default isn't propagated through to the api-server by the time the CRD is applied, i.e. having an empty string for AMIFamily and, in code, defaulting to Bottlerocket.

@engedaam (Contributor) commented Apr 4, 2023

Deprecate the following labels in favor of accelerator-based replacements:

  • karpenter.k8s.aws/instance-gpu-name -> karpenter.k8s.aws/instance-accelerator-name
  • karpenter.k8s.aws/instance-gpu-memory -> karpenter.k8s.aws/instance-accelerator-memory
  • karpenter.k8s.aws/instance-gpu-manufacturer -> karpenter.k8s.aws/instance-accelerator-manufacturer
  • karpenter.k8s.aws/instance-gpu-count -> karpenter.k8s.aws/instance-accelerator-count

billrayburn added the v1 (Issues requiring resolution by the v1 milestone) label on May 10, 2023
@ellistarn (Contributor, Author) commented:

Consider renaming karpenter.sh/capacity-type to karpenter.sh/purchase-type or karpenter.sh/purchase-option. Right now capacity-type implies a compute platform like EC2, Lambda, or Fargate.

@sftim (Contributor) commented May 14, 2023

> Consider renaming karpenter.sh/capacity-type to karpenter.sh/purchase-type or karpenter.sh/purchase-option. Right now capacity-type implies a compute platform like EC2, Lambda, or Fargate.

“purchase” isn't always right either. For example, you might be choosing between capitalized, on-prem equipment vs. bursted capacity in the cloud.

@jonathan-innis (Contributor) commented:

Change karpenter.sh/do-not-evict to karpenter.sh/do-not-disrupt. This aligns the concept of "disruption" across Karpenter and is more accurate, since the action that we are actually preventing with this annotation is voluntary disruption.
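
Usage would look like today's annotation with the new name (a sketch; the pod itself is illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: important-batch-job
  annotations:
    karpenter.sh/do-not-disrupt: "true"   # proposed rename of karpenter.sh/do-not-evict
spec:
  containers:
    - name: main
      image: busybox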

@ellistarn (Contributor, Author) commented:

Should we deprecate do-not-consolidate as part of kubernetes-sigs/karpenter#753?

@jonathan-innis (Contributor) commented:

Consider removing the karpenter-global-settings ConfigMap in favor of static deployment configuration in environment variables and CLI arguments.
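
For illustration, settings that live in the ConfigMap today could instead be passed on the controller Deployment (the variable names below are assumptions, not a confirmed interface):

# excerpt from the karpenter controller container spec
env:
  - name: CLUSTER_NAME
    value: my-cluster
  - name: BATCH_MAX_DURATION
    value: 10s
  - name: FEATURE_GATES
    value: Drift=true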

@ellistarn (Contributor, Author) commented:

> Change karpenter.sh/do-not-evict to karpenter.sh/do-not-disrupt. This aligns the concept of "disruption" across Karpenter and is more accurate, since the action that we are actually preventing with this annotation is voluntary disruption.

Worth considering this in the context of Cluster Autoscaler annotations.

@sftim (Contributor) commented Aug 9, 2023

Scheduling, preemption and eviction are things that happen to Pods; disruption happens to workloads.

The thing you disrupt (or not) is a set of Pods. So, I'd call the label exclude-from-disruption not do-not-disrupt.

@jonathan-innis (Contributor) commented:

FYI: This laundry list is frozen at this point. We've captured the items that are being tracked in v1beta1 here: https://github.com/aws/karpenter/blob/main/designs/v1beta1-api.md. We'll plan to close this issue when v1beta1 is fully available and then open an issue to start tracking v1 laundry list items 👍

@jonathan-innis (Contributor) commented:

v0.32.1 is the latest beta version! We've officially launched beta so I'm going to close this issue and make a v1 laundry list now!

@jonathan-innis (Contributor) commented Oct 31, 2023

If you have further breaking changes that you want to see made in Karpenter, they can be added to the new kubernetes-sigs/karpenter#758 issue.
