docs: Add Carbon Efficient design document #4686

Closed
wants to merge 33 commits into from
Changes from 24 commits
Commits
99c3c18
Add carbon-aware design document
JacobValdemar Sep 25, 2023
9920bda
Update designs/carbon-aware.md
JacobValdemar Sep 25, 2023
4ab9c26
Update designs/carbon-aware.md
JacobValdemar Sep 25, 2023
0ec5f12
Update designs/carbon-aware.md
JacobValdemar Sep 25, 2023
22f665e
Update designs/carbon-aware.md
JacobValdemar Sep 25, 2023
cfa42ce
Add option 3 to consolidation
JacobValdemar Sep 25, 2023
ba614e2
Update designs/carbon-aware.md
JacobValdemar Sep 25, 2023
4d73c8a
Add known limitation to emission data sources
JacobValdemar Sep 25, 2023
8d4304d
Add option 4, considerations, and re-structure doc
JacobValdemar Sep 26, 2023
216f45c
Merge branch 'main' into design/carbon-aware
JacobValdemar Sep 26, 2023
b4d938d
Add comment on launch strategy
JacobValdemar Sep 26, 2023
8b53ecd
Merge branch 'main' into design/carbon-aware
JacobValdemar Sep 28, 2023
f6184c0
Changes to data source
JacobValdemar Sep 28, 2023
54ee1c4
Changes to limitation
JacobValdemar Sep 28, 2023
5c8be31
Merge branch 'main' into design/carbon-aware
JacobValdemar Sep 28, 2023
81779a2
Improve document
JacobValdemar Oct 2, 2023
08364b0
Change featureGate versions
JacobValdemar Oct 2, 2023
3051b00
Merge branch 'main' into design/carbon-aware
JacobValdemar Oct 2, 2023
0a3a170
Fix footnote
JacobValdemar Oct 2, 2023
a56ece2
Fix option 4
JacobValdemar Oct 2, 2023
14657a8
Fix option 4
JacobValdemar Oct 2, 2023
b902dd5
Small fixes
JacobValdemar Oct 2, 2023
6757753
Merge branch 'main' into design/carbon-aware
JacobValdemar Oct 3, 2023
0ffcda7
Small revision
JacobValdemar Oct 3, 2023
a7134dd
Clarify instance lifetime
JacobValdemar Oct 4, 2023
af57352
Update designs/carbon-aware.md
JacobValdemar Nov 23, 2023
6a9948f
Update designs/carbon-aware.md
JacobValdemar Nov 23, 2023
c0cf244
Update designs/carbon-aware.md
JacobValdemar Nov 23, 2023
5465656
Update designs/carbon-aware.md
JacobValdemar Nov 23, 2023
e6038e6
Update designs/carbon-aware.md
JacobValdemar Nov 23, 2023
0ae48ea
Merge branch 'main' into design/carbon-aware
JacobValdemar Nov 23, 2023
653353b
Rephrase from Carbon Aware to Carbon Efficient
JacobValdemar Dec 3, 2023
d74b37b
Update section about BoaviztAPI limitations - discrepancy has been fixed
JacobValdemar Dec 3, 2023
197 changes: 197 additions & 0 deletions designs/carbon-aware.md
@@ -0,0 +1,197 @@
# Carbon Aware Karpenter: Optimizing Kubernetes Cluster Autoscaling for Environmental Sustainability
*Author: [@JacobValdemar](https://github.com/JacobValdemar)*

## Context & Problem
There is growing concern about the environmental impact of Kubernetes clusters. Karpenter's opportunities within environmental sustainability are referenced in multiple comments that back [`karpenter-core`'s move to CNCF](https://github.com/kubernetes/org/issues/4258).

I am currently working on my master's thesis in Computer Engineering (Master of Science in Engineering) at Aarhus University located in Denmark. The objective of the thesis is to enable Karpenter to minimize carbon emissions from Kubernetes clusters that run on cloud infrastructure (scoped to AWS).

RFC: https://github.com/aws/karpenter/issues/4630

## Fundamentals of Green Software
I will try to keep it simple. The reader should be familiar with the following.

A cluster's emissions are made up of two elements: embodied emissions and operational emissions. Adding them together gives the total emissions.

- **Embodied carbon emissions**: Manufacturing emissions (CO₂e) amortized over the instance lifetime (usually 4 years) and attributed in proportion to how long we use the instance


Can we clarify that "instance" in this sense refers to a percentage of a physical machine?

Author

@jackfrancis Good point. That is totally not obvious when reading the text. When we say instance lifetime we of course mean the lifetime of the physical machine that the instance is part of 👍


- **Operational carbon emissions**: Carbon emitted by the electricity grid to produce the electricity the instance consumes in the region where it runs, multiplied by the data center's PUE (Power Usage Effectiveness)

Contributor

Would this be different for regions that focus on more sustainable energy sources like hydroelectric or geothermal energy? Or are you forgoing the energy sources, and talking purely about the byproduct of the heat and other leftover elements in electricity grids?


There is a lot more to Green Software. If you want to learn more, I recommend visiting [Green Software Practitioner](https://learn.greensoftware.foundation/) (a Green Software Foundation project - an affiliate of the Linux Foundation).
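
As a rough illustration of the two components defined in this section, the following Go sketch computes embodied and operational emissions and adds them together. The function names and all input numbers are placeholders chosen for illustration; they are not part of Karpenter or Boaviztapi and are not measured data.

```go
package main

import "fmt"

// EmbodiedKg attributes a share of the manufacturing emissions (kgCO2e)
// to the time an instance is actually used.
func EmbodiedKg(manufacturingKg, lifetimeHours, usedHours float64) float64 {
	return manufacturingKg * usedHours / lifetimeHours
}

// OperationalKg is the energy drawn (kWh) times the grid's carbon
// intensity (kgCO2e/kWh), scaled by the data center's PUE.
func OperationalKg(energyKWh, gridKgPerKWh, pue float64) float64 {
	return energyKWh * gridKgPerKWh * pue
}

// TotalKg adds the two components together.
func TotalKg(embodiedKg, operationalKg float64) float64 {
	return embodiedKg + operationalKg
}

func main() {
	// Placeholder inputs for illustration only.
	const lifetimeHours = 4 * 365 * 24 // 4-year lifetime, as in the text
	embodied := EmbodiedKg(1200, lifetimeHours, 24)
	operational := OperationalKg(2.4, 0.3, 1.2)
	fmt.Printf("embodied=%.3f operational=%.3f total=%.3f kgCO2e\n",
		embodied, operational, TotalKg(embodied, operational))
}
```

A region with a cleaner grid would show up here as a lower `gridKgPerKWh`, which lowers only the operational term.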

## Solution

### Feature Gate
The feature is proposed to be controlled using a [feature gate](https://karpenter.sh/docs/concepts/settings/#feature-gates).

| **Feature** | **Default** | **Config Key** | **Stage** | **Since** | **Until** |
| :---------: | :---------: | :-----------------------------: | :-------: | :-------------: | :-------: |
| CarbonAware | false | featureGates.carbonAwareEnabled | Alpha | v0.32.0/v0.33.0 | |

Contributor

Small detail: we're planning to drop the config key so this feature flag would probably have the same format a-la Kubernetes upstream. If we take the last solution, this feature flag might look like PricingOverridesEnabled=true


### Carbon emissions data source
Currently the best option is to create estimates based on the methodology used in [Boaviztapi](https://github.com/Boavizta/boaviztapi).

[Try out Boaviztapi on the Datavizta demo website](https://datavizta.boavizta.org/cloudimpact).

#### Licensing
Boaviztapi is licensed under [`GNU Affero General Public License v3.0`](https://github.com/Boavizta/boaviztapi/blob/main/LICENSE). Therefore, as far as I know, we must license their data under the same license if used in the Karpenter repository.


Will there be conflicts in licensing their data according to these requirements and the CNCF guidance for licensing the karpenter-core source itself?

Author

@jackfrancis So I am in no way a lawyer, but I guess that there won't be licensing issues. We probably just have to put a different license notice at the top of files that are based on their work. Currently, I would expect the data with that license to live only in cloud provider repos (e.g. https://github.com/aws/karpenter) and not in karpenter-core.

Contributor

> So I am in no way a lawyer, but I guess that there won't be licensing issues

Famous last words.

Contributor

We'd definitely need to check with AWS legal for this (while AWS owns it) and consider how this interacts with the CNCF guidance since this would probably live in the karpenter-core repo if it was gen-ed. I could also see this living in its own separate repo that provided configuration plugins to the pricing overrides that we are thinking about here


The particular license that Boaviztapi chooses to use suggests that it's using the "copyleft malware" approach (for Good, not Evil, of course) to ensure that its definitions of freedom (as in both beer and speech) are enforced upon downstream projects. So that's the only reason to bring this up. AWS and CNCF are probably going to combine w/ such a license in their own unique interesting ways. Non-zero chance there will be friction.


#### Limitations
There is a discrepancy between the instance types known to Karpenter and those known to Boaviztapi. As it stands, it is not possible to get carbon emissions data for all instance types. This mostly affects new instance types such as m7g. Around 290 out of 700 instance types are missing data. See the full comparison in [this Gist](https://gist.github.com/JacobValdemar/e1342013c0f5c980126f6a1feb66b4a1).

I will attempt to eliminate this discrepancy, but it might not be possible. It will probably not always be possible to have an up-to-date list of estimated carbon emissions for all instance types, as AWS continues to release new ones. We should consider what to do with instance types that we do not have carbon emission estimates for.

Approaches to handle this:

Why not exclude instances that are not emission-priced when the carbonAwareEnabled feature gate is enabled? As a user if I were strict about this accounting I would not want the risk of polluting my reckoning with non-determinate inputs.

Author

@jackfrancis I agree. I also think exclusion is the best option in case we can't estimate the emissions accurately. The way I thought about excluding them was to assign them an absurdly high price so they will never be picked voluntarily. However, I want to improve the dataset that we depend on, so that we can use as many instance types as possible. See this issue for reference: Boavizta/boaviztapi#232


My personal take is that if I were a customer I would prefer not to use something at all that isn't carbon-priced.

  1. I would be biased to suspect that such a thing was more likely than not to be carbon-inefficient (i.e., a previous generation of tech)
  2. I think that exclusion is a better forcing function for incentivizing such things to do the work of calculating cost, and/or marginalizing them from the marketplace (i.e., replacing them w/ newer, more carbon-efficient generations of tech)

1. Estimate extremely high emissions to effectively filter out unknown instance types (recommended)
2. Estimate zero emissions

Contributor

Zero is probably wrong. There may be a third option here which would be to do some estimation function. What on? I haven't thought through it that hard; however, I agree with @jackfrancis that the easy way out is to just exclude for now.
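
To make the alternatives in this section concrete, here is a minimal Go sketch of the sentinel approach (approach 1) next to the stricter exclusion raised in the review comments. `UnknownEmissions`, `EmissionsOrSentinel`, `FilterKnown`, and the estimate values are hypothetical names and placeholder data, not existing Karpenter code.

```go
package main

import (
	"fmt"
	"math"
)

// UnknownEmissions is a sentinel so large that an instance type without
// data is never preferred over one with a real estimate (approach 1).
const UnknownEmissions = math.MaxFloat64

// EmissionsOrSentinel returns the estimate for name, or the sentinel when
// no estimate exists. It never reports zero for missing data (approach 2).
func EmissionsOrSentinel(estimates map[string]float64, name string) float64 {
	if v, ok := estimates[name]; ok {
		return v
	}
	return UnknownEmissions
}

// FilterKnown drops instance types without an estimate, which is the
// stricter exclusion discussed in the review comments.
func FilterKnown(estimates map[string]float64, names []string) []string {
	known := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := estimates[n]; ok {
			known = append(known, n)
		}
	}
	return known
}

func main() {
	// Placeholder estimate in kgCO2e per hour; not real data.
	estimates := map[string]float64{"m5.large": 0.012}
	fmt.Println(EmissionsOrSentinel(estimates, "m7g.large") == UnknownEmissions)
	fmt.Println(FilterKnown(estimates, []string{"m5.large", "m7g.large"}))
}
```

The sentinel keeps unknown instance types available as a last resort, while filtering removes them from consideration entirely.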


### Launch strategy
To enable emission-based prioritization, the launch strategy should be changed from `lowest-price` to `prioritized`.

Contributor

We probably want to do capacity-optimized-prioritized to properly handle spot


### Changes to consolidation (karpenter-core)
Single Machine Consolidation (`singlemachineconsolidation.go`), Multi Machine Consolidation (`multimachineconsolidation.go`), and `consolidation.go` currently consolidate nodes to reduce costs. We want to change this when Carbon Aware is enabled: they should consolidate to minimize carbon emissions.

### Changes to Provisioning
Currently, provisioning (roughly) filters instances based on requirements, sorts instances by price, and launches the cheapest instance. We want to change this when Carbon Aware is enabled: it should sort instances by carbon emissions and launch the instance with the lowest Global Warming Potential[^1].

Contributor

@ellistarn, Oct 4, 2023

Might we want a balanced approach to this that factors in price and emissions?

Author

I think it could be interesting, like the carbon tax discussed in option 3, but I think that is something we should consider adding later.


### Option 1: Use Carbon Aware provisioning and concolidation methods

#### Consolidation
Create two new consolidation methods `carbonawaresinglemachineconsolidation.go` and `carbonawaremultimachineconsolidation.go` that will be used when Carbon Aware is enabled.

<details>

<summary>Change to `karpenter-core/pkg/controllers/deprovisioning/controller.go`</summary>

```diff
-func NewController(clk clock.Clock, kubeClient client.Client, provisioner *provisioning.Provisioner,
-    cp cloudprovider.CloudProvider, recorder events.Recorder, cluster *state.Cluster) *Controller {
+func NewController(ctx context.Context, clk clock.Clock, kubeClient client.Client, provisioner *provisioning.Provisioner,
+    cp cloudprovider.CloudProvider, recorder events.Recorder, cluster *state.Cluster) *Controller {

+    if settings.FromContext(ctx).CarbonAwareEnabled {
+        return &Controller{
+            clock:         clk,
+            kubeClient:    kubeClient,
+            cluster:       cluster,
+            provisioner:   provisioner,
+            recorder:      recorder,
+            cloudProvider: cp,
+            lastRun:       map[string]time.Time{},
+            deprovisioners: []Deprovisioner{
+                NewExpiration(clk, kubeClient, cluster, provisioner, recorder),
+                NewDrift(kubeClient, cluster, provisioner, recorder),
+                NewEmptiness(clk),
+                NewEmptyMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
+                NewCarbonAwareMultiMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
+                NewCarbonAwareSingleMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
+            },
+        }
+    }

     return &Controller{
         clock:         clk,
         kubeClient:    kubeClient,
         cluster:       cluster,
         provisioner:   provisioner,
         recorder:      recorder,
         cloudProvider: cp,
         lastRun:       map[string]time.Time{},
         deprovisioners: []Deprovisioner{
             NewExpiration(clk, kubeClient, cluster, provisioner, recorder),
             NewDrift(kubeClient, cluster, provisioner, recorder),
             NewEmptiness(clk),
             NewEmptyMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
             NewMultiMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
             NewSingleMachineConsolidation(clk, cluster, kubeClient, provisioner, cp, recorder),
         },
     }
 }
```
</details>

#### Provisioning
In `karpenter-core`, create a new method `types.go/OrderByCarbonEmissions` and use that in `nodeclaimtemplate.go/ToMachine` and `nodeclaimtemplate.go/ToNodeClaim` instead of `types.go/OrderByPrice` when Carbon Aware is enabled.

In `karpenter`, create a new method `CarbonAwareCreate` in `pkg/providers/instance/instance.go` that is used in `pkg/cloudprovider/cloudprovider.go/Create` instead of `pkg/providers/instance/instance.go/Create` when Carbon Aware is enabled.
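
A minimal sketch of what an `OrderByCarbonEmissions` helper could look like is shown below. The `InstanceType` struct is a simplified stand-in and not Karpenter's real type, and the emission figures are placeholders. Using a stable sort is an assumption of this sketch: if the input is already ordered by price, instance types with equal emissions keep their price order.

```go
package main

import (
	"fmt"
	"sort"
)

// InstanceType is a simplified, hypothetical stand-in for illustration.
type InstanceType struct {
	Name               string
	EmissionsKgPerHour float64 // estimated kgCO2e per hour (placeholder data)
}

// OrderByCarbonEmissions returns a copy of its sorted by ascending
// estimated emissions. The input slice is left untouched.
func OrderByCarbonEmissions(its []InstanceType) []InstanceType {
	sorted := make([]InstanceType, len(its))
	copy(sorted, its)
	// Stable sort: entries with equal emissions keep their incoming order.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].EmissionsKgPerHour < sorted[j].EmissionsKgPerHour
	})
	return sorted
}

func main() {
	candidates := []InstanceType{
		{Name: "a.large", EmissionsKgPerHour: 0.03},
		{Name: "b.large", EmissionsKgPerHour: 0.01},
		{Name: "c.large", EmissionsKgPerHour: 0.01},
	}
	for _, it := range OrderByCarbonEmissions(candidates) {
		fmt.Println(it.Name, it.EmissionsKgPerHour)
	}
}
```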

#### Considerations
1. 👍 Current consolidation methods are unaffected.
1. 👎 There might be copy-paste of code from the original consolidation methods to the carbon aware consolidators.

### Option 2: Use Carbon Aware filtering/sorting methods

#### Consolidation
Create carbon aware implementations of low-level functions like `filterByPrice`, `filterOutSameType`, `getCandidatePrices`, etc. that are used when Carbon Aware is enabled. Callers of these functions may assume that they are getting prices, when in reality they are getting data about carbon emissions.

#### Provisioning
Use the same changes to provisioning as in [option 1](#option-1-use-carbon-aware-provisioning-and-concolidation-methods).

#### Considerations
1. 👍 Less code copy-paste.
1. 👍 Improvements to original consolidation methods also improve the Carbon Aware feature.
1. 👎 Has a risk of breaking undocumented invariants.
1. 👎 Adds complexity to the original consolidation methods.

### Option 3: Override instance price with carbon price (recommended)
Minimize carbon emissions by defining a price per kgCO₂e and overriding the instance price with the carbon price (USD/kgCO₂e). Using the `prioritized` launch strategy, carbon emissions will be minimized during provisioning. Consolidation will unknowingly consolidate to minimize carbon emissions.

The carbon price will depend on `region` and `instanceType` and assume constant resource utilization (e.g. always 80% utilization). The carbon price will be generated by a `hack` script and included as constants (the same method used for generating initial pricing[^2]). The carbon price / emission estimates can be updated with new versions.


Can you clarify how "assume constant resource utilization" concept affects carbon price?

Author

@JacobValdemar, Oct 4, 2023

@jackfrancis This is one place I am constrained in my knowledge about Karpenter, so please correct me if this is incorrect: as far as I know, Karpenter can not provide information about current resource utilization (CPU/RAM) nor about expected utilization for nodes considered to be created from provisioning or consolidation.

To calculate the operational Global Warming Potential (kgCO2e) we must know how much power (W) an instance ("server") consumes. The power consumption is not static, but depends on how large the workload (%) is (e.g. see image). Power consumption is not a linear function of the workload. Therefore, CPU power consumption profiles (based on empirical data, i.e. server testing) are used to estimate how much power a CPU consumes depending on the workload. To accurately estimate the operational footprint, we therefore need to know the current/expected workload. If we do not know the workload (%), we must make an assumption about how large it is to complete the calculation.

I can elaborate further on this in the call today (Wednesday) if you would like, because the calculation is a bit complex to explain in a comment ;) You can read more about the calculation in the Boavizta API (v.dev) docs here.

See an illustration of the concept in this picture (AGPL-3.0, Boavizta API):

[Image]

A later feature could add the carbon price to the instance price to simulate a [carbon tax](https://en.wikipedia.org/wiki/Carbon_tax). Administrators could configure a custom carbon price or use a default.
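
The difference between overriding the price and simulating a carbon tax can be sketched as below. `CarbonCost` and `EffectivePrice` are hypothetical helpers, and the emission estimate and carbon price in `main` are placeholder values, not generated data.

```go
package main

import "fmt"

// CarbonCost converts an emission estimate (kgCO2e per hour) into a cost
// per hour using a price per kgCO2e.
func CarbonCost(emissionsKgPerHour, usdPerKg float64) float64 {
	return emissionsKgPerHour * usdPerKg
}

// EffectivePrice returns the value that would be fed into the existing
// price-based logic. With tax=false the carbon cost replaces the instance
// price (Option 3 as proposed); with tax=true it is added on top of the
// instance price (the simulated carbon tax mentioned as a later feature).
func EffectivePrice(instanceUSDPerHour, carbonCostUSDPerHour float64, tax bool) float64 {
	if tax {
		return instanceUSDPerHour + carbonCostUSDPerHour
	}
	return carbonCostUSDPerHour
}

func main() {
	// Placeholder inputs for illustration only.
	cost := CarbonCost(0.012, 0.1)
	fmt.Printf("override: %.6f USD/h\n", EffectivePrice(0.096, cost, false))
	fmt.Printf("tax:      %.6f USD/h\n", EffectivePrice(0.096, cost, true))
}
```

In override mode the real price has no influence on ordering, while in tax mode a high enough carbon price lets emissions dominate it.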

#### Considerations
1. 👍 Change is constrained to the pricing domain, so most of Karpenter's logic remains unaffected.
1. 👍👍 A simulated carbon tax could be appealing for *Beta* or *General Availability*[^3] as it combines the real price with the carbon price.
1. 👎 Adds complexity to the *price* concept. Price is not *just* price, but rather becomes an optimization function.
1. 👎 Depending on implementation, the `karpenter_cloudprovider_instance_type_price_estimate` metric *may* represent more than just price when Carbon Aware is enabled.

### Option 4: Enable custom instance price overrides
Enable administrators to configure custom instance price overrides, e.g. in a ConfigMap. A configuration using emission factors (varying with region and instance type) masked as prices can be pre-generated. Administrators then copy-paste a Carbon Aware `priceOverride` into their environment.

```yaml
priceOverrides:
  - instanceType: "m5.large"
    region: "eu-west-1"
    capacityType: OnDemand
    price: 0.007712
  - instanceType: "m5.xlarge"
    region: "eu-west-1"
    capacityType: OnDemand
    price: 0.015424
```

Contributor (on the first `region` line)

I think there's some misunderstanding of region/zone going on.

Contributor (on the first `price` line)

I'd want to lean toward calling this cost now. We are starting to talk more about a "cost-optimization" function vs. a "price-optimization" function, especially if we are going to reason in terms of things that definitely aren't price

Author

I agree. The term price does not concisely express what the variable has/can/will become.

<details>
<summary>Alternative interface</summary>

Alternatively, a more flexible interface could be:

Contributor

@ellistarn, Oct 4, 2023

Oddly, I think of this as more rigid. We risk inventing a DSL for pricing in yaml.


```yaml
priceModification:
  operator: Add # Add or Override
  modifications:
    - instanceType: "m5.large"
      region: "eu-west-1"
      capacityType: OnDemand
      price: 0.007712
    - instanceType: "m5.xlarge"
      region: "eu-west-1"
      capacityType: OnDemand
      price: 0.015424
```
</details>
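
One way the `Add` and `Override` operators could be applied to a base price is sketched below. The `Key`, `Operator`, and `PriceModification` types are hypothetical and only mirror the YAML fields; loading the YAML itself is left out. The base price of 0.096 in `main` is a placeholder.

```go
package main

import "fmt"

// Key identifies one entry, mirroring the fields of the YAML sketch.
type Key struct {
	InstanceType, Region, CapacityType string
}

// Operator selects how a modification is combined with the base price.
type Operator string

const (
	Add      Operator = "Add"
	Override Operator = "Override"
)

// PriceModification is a hypothetical in-memory form of the configuration.
type PriceModification struct {
	Operator      Operator
	Modifications map[Key]float64
}

// Apply returns the price after modification. Entries without a matching
// key, and unknown operators, leave the base price untouched.
func (m PriceModification) Apply(k Key, base float64) float64 {
	v, ok := m.Modifications[k]
	if !ok {
		return base
	}
	switch m.Operator {
	case Add:
		return base + v
	case Override:
		return v
	default:
		return base
	}
}

func main() {
	k := Key{InstanceType: "m5.large", Region: "eu-west-1", CapacityType: "OnDemand"}
	pm := PriceModification{Operator: Override, Modifications: map[Key]float64{k: 0.007712}}
	fmt.Println(pm.Apply(k, 0.096)) // base price here is a placeholder
}
```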

A ConfigMap with price overrides for all combinations of instance types and regions would be very large. 632 instance types * 29 regions = 18,328 pairs. Four lines per pair gives a file with 73,312 lines. The file/ConfigMap would be approximately 2 MB in size, which exceeds the [`1 MiB` limit on ConfigMap size in Kubernetes](https://kubernetes.io/docs/concepts/configuration/configmap/#motivation).

Contributor

EC2 launches new instance types regularly, and we should expect this to grow.
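
The size estimate for the ConfigMap can be reproduced with a few lines of Go. The instance type count, region count, and lines per pair come from the text; the average of 28 bytes per YAML line is an assumption made here to arrive at the roughly 2 MB figure.

```go
package main

import "fmt"

const (
	instanceTypes = 632     // from the text
	regions       = 29      // from the text
	linesPerPair  = 4       // from the text
	avgLineBytes  = 28      // assumption: rough average YAML line length incl. newline
	configMapMax  = 1 << 20 // 1 MiB limit on ConfigMap size
)

// Pairs is the number of (instance type, region) combinations.
func Pairs() int { return instanceTypes * regions }

// Lines is the number of YAML lines needed for all pairs.
func Lines() int { return Pairs() * linesPerPair }

// EstimatedBytes is a rough size estimate based on avgLineBytes.
func EstimatedBytes() int { return Lines() * avgLineBytes }

func main() {
	fmt.Printf("pairs=%d lines=%d bytes=%d limit=%d exceeds=%t\n",
		Pairs(), Lines(), EstimatedBytes(), configMapMax, EstimatedBytes() > configMapMax)
}
```

Because the estimate is already about twice the limit, the result holds for any plausible line length, and it only grows as new instance types are launched.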


#### Considerations
1. 👍 Simple solution.
1. 👍 Can be used for other purposes.
1. 👎👎 ConfigMap cannot contain all data.
1. 👎 Hard to discover the carbon aware "feature".
1. 👎 Carbon emission price cannot be combined with actual price.
1. 👎 Carbon emissions are completely static, with no possibility of improving them in the future.
1. 👎 Feature cannot be enabled as a toggle.
1. 👎 Depending on implementation, the `karpenter_cloudprovider_instance_type_price_estimate` metric *may* represent more than just price when Carbon Aware is enabled.

[^1]: The potential impact of greenhouse gases on global warming. Measured in terms of CO₂e.
[^2]: See [prices_gen.go](/hack/code/prices_gen.go) and [zz_generated.pricing.go](/pkg/providers/pricing/zz_generated.pricing.go)
[^3]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/#feature-stages