KEP-3000: Image Promotion and Distribution Policy #3079
hh commented on Dec 7, 2021
- One-line PR description: KEP-3000: Artifact Promotion and Distribution Policy
- Issue link: [KEP] Sharing the load of K8s Artifacts distribution between vendors to reduce the cost burden on the Kubernetes community infrastructure #3055
/assign @justaugustus @dims

### Non-Goals

Anything related to creation of artifacts, bom, digital signatures.
artifacts
The title of the KEP implies that artifacts are in scope. We need to clarify which types of artifacts are non-goals.
Then the promotion process occurs
```

#### Cloud Customer - Installing K8s via kubeadm
Is it possible to consider non-cloud users?
The greatest percentage of cost comes from cloud customers, so we will focus on them for this KEP.
(Mentioned on Slack first)
@hh -- I've left a few tweaks (which will also fix the presubmits) here: https://github.com/justaugustus/enhancements/tree/MST-3000
/assign @spiffxp
/cc
I know there are more questions I am forgetting to ask.

### Goals

A policy and procedure for use by SIG Release to promote container images and release binaries to multiple registries and mirrors.
I think we should break this into 2 major phases:
- Container images
- Other artifacts
We may even want to break it into 2 KEPs so we can "finish" one.
Good thought! This KEP just focuses on container images now.

## Design Details

### Artifact Promotion
Let's document and consider two main approaches:
- Push. Each mirror provider gives us a mechanism and credentials to push container images. As we promote images from staging to prod, we push to all mirrors. We need to consider credential security and rotation.
- Pull. We publish a log (git repo?) of image changes and mirrors are expected to sync changes in a reasonable period of time (99p @ 10 mins?).
We opted for a push-based mechanism, where sig-k8s-infra manages the content of the buckets.
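For illustration only, here is a minimal sketch of what the push model could look like at promotion time. The mirror names and the `push` callable are hypothetical stand-ins for a real registry or bucket client; nothing here reflects the actual promoter tooling. It pushes one promoted image to every mirror and collects failures so they can be retried, rather than letting one bad mirror block the rest.

```python
from typing import Callable, Dict, List


def promote_to_mirrors(
    image: str,
    digest: str,
    mirrors: List[str],
    push: Callable[[str, str, str], None],
) -> Dict[str, str]:
    """Push one promoted image to every mirror; return failures keyed by mirror.

    `push(mirror, image, digest)` is a hypothetical stand-in for a real
    registry or bucket client and is expected to raise on failure.
    """
    failures: Dict[str, str] = {}
    for mirror in mirrors:
        try:
            push(mirror, image, digest)
        except Exception as err:
            # Keep going so one failing mirror does not block the others.
            failures[mirror] = str(err)
    return failures
```

A caller would retry or alert on any mirror left in the returned mapping, which is also a natural place to surface expired credentials.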

### Artifact Distribution

#### Policy
We need to detail how to on-board a mirror. E.g.:
- provide some guarantees of service (e.g. a contract with CNCF) and point of contact
- provide an emergency contact in case of outage
- provide a mapping of client IPs to mirror (maybe through a git repo)
Then we can add the mirror and have the front-end server start redirecting traffic.
We will want to periodically healthcheck each mirror (e.g. pull a random blob, measure latency). If HC fails, remove mirror until it passes N times. We need a site or something indicating which mirrors are healthy, maybe stats.
We will want to log all redirects and set up a PII-anonymizing process so we can publish some aggregated information about how much traffic is going to each mirror, top images globally, etc.
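To make the health-check rule above concrete, here is a rough sketch of the state it implies: a single failed check removes a mirror from rotation, and it returns only after N consecutive passes. The class name, field names, and the default of 3 are assumptions for illustration; the actual probe (pulling a random blob and measuring latency) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class MirrorHealth:
    """Tracks whether a mirror should receive redirected traffic."""

    required_passes: int = 3  # the "N" from the comment; 3 is an arbitrary choice
    healthy: bool = True
    consecutive_passes: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health-check result and return the new serving state."""
        if not check_passed:
            # A single failure removes the mirror from rotation.
            self.healthy = False
            self.consecutive_passes = 0
        elif not self.healthy:
            # While removed, count passes until N in a row are seen.
            self.consecutive_passes += 1
            if self.consecutive_passes >= self.required_passes:
                self.healthy = True
                self.consecutive_passes = 0
        return self.healthy
```

The redirector would consult `healthy` before sending traffic to a mirror, and the same state could feed a status page.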
provide some guarantees of service (e.g. a contract with CNCF) and point of contact
Most managed container registry services have SLAs. We need to agree on a minimum SLA level. We now plan to use object storage services.
Going off of what @ameukam says here, we are working closely with the providers who consume the most to bring up infra that we manage.

#### Policy

#### Process
We will need to detail how we turn up the new DNS name and redirector and how we plan to migrate users of the old GCR name to the new name.
Members of sig-k8s-infra have made PRs against projects like kops and Kubernetes to change the defaults.
There's also e2e testing through changing the domain for various jobs running in Prow.
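As a rough illustration of what changing a default involves, the sketch below rewrites an image reference from the old registry host to the new one. The thread does not name either host, so `k8s.gcr.io` and `registry.k8s.io` are assumptions here, and the helper itself is hypothetical rather than code from kops or Kubernetes.

```python
OLD_REGISTRY = "k8s.gcr.io"       # assumed old GCR name
NEW_REGISTRY = "registry.k8s.io"  # assumed new name


def rewrite_image(ref: str) -> str:
    """Point an image reference at the new registry host, if it uses the old one."""
    host, sep, rest = ref.partition("/")
    if sep and host == OLD_REGISTRY:
        return f"{NEW_REGISTRY}/{rest}"
    # References on other hosts, or without a path, are left untouched.
    return ref
```

Only the host component changes, so tags and digests in the rest of the reference are preserved.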

Anything related to creation of artifacts, bom, digital signatures.

## Proposal
We should also document that we're explicitly OK with a model where the management of the mirror is opaque to us as long as the other criteria are met.
approvers:
- "@ameukam"
- "@justaugustus"
Suggested change, from:
approvers:
- "@ameukam"
- "@justaugustus"
to:
approvers:
- "@ameukam"
- "@dims"
- "@justaugustus"
- "@saschagrunert"
updated in 6bfecc8
#### Cloud Customer - Installing K8s via kubeadm

```feature
As a CLOUD end-user
We should not tie the user story to a cloud environment. We don't want to break the existing way of consuming the container images we produce.
Since most of the spend comes from cloud users, it should be fine to focus on them for this KEP.
Pulled out User Stories for this merge

Given some compute resources at CLOUD
When I use kubeadm to deploy Kubernetes
Then I will be redirected to a local CLOUD registry
We should be clearer about the meaning of "local". Suggested change, from:
Then I will be redirected to a local CLOUD registry
to:
Then I will be redirected to the closest network endpoint of the registry
resolved in f960d6f
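One possible way to decide where a client gets redirected is the client-IP-to-mirror mapping proposed in the Policy thread. The sketch below is only an assumption about how such a lookup might work: the function name, the CIDR table, and the fallback value are all hypothetical. It picks the mirror for the most specific CIDR block containing the client address and falls back to a default otherwise.

```python
import ipaddress
from typing import Dict


def pick_mirror(client_ip: str, cidr_to_mirror: Dict[str, str], default: str) -> str:
    """Return the mirror for the most specific CIDR containing client_ip."""
    addr = ipaddress.ip_address(client_ip)
    best_net = None
    best_mirror = default
    for cidr, mirror in cidr_to_mirror.items():
        net = ipaddress.ip_network(cidr)
        # Skip blocks of the other IP version and blocks that do not match.
        if addr.version != net.version or addr not in net:
            continue
        # Prefer the longest prefix, i.e. the most specific block.
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_mirror = net, mirror
    return best_mirror
```

For example, with `{"10.0.0.0/8": "mirror-a", "10.1.0.0/16": "mirror-b"}`, a client at `10.1.2.3` would map to `mirror-b`, while an address outside both blocks would get the default.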

### How much is this going to save us?

![Cost of K8s Artifact hosting - Data Studio Graphs](https://i.imgur.com/LAn4UIE.png)
We should provide a link to the full report and not just a screenshot for better transparency.
Updated in 7030358 to show AWS involvement and add more context, without exposing company-specific usage patterns.

## Infrastructure Needed

It would be good to request some donations for some larger providers, including one in China, via [Cloud Native Credits program](https://www.cncf.io/credits/).
We should provide a list of exactly what infrastructure is needed to ensure this KEP can move to implementable.
source: Ben's doc (kubernetes-sigs/oci-proxy/cmd/archeio/docs/request-handling.md)
Co-Authored-By: Riaan Kleinhans <riaan@ii.coop>
Thanks!
@justaugustus @saschagrunert this is ready (all conversations resolved). Please take a look and approve if appropriate.
/approve
Let's merge and iterate. Thanks @BobyMCbobs! /approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull-request has been approved by: dims, hh, saschagrunert.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing