Signed-off-by: Adrian Reber <areber@redhat.com>
1 parent c7d895d, commit 5529fa7
Showing 2 changed files with 322 additions and 0 deletions.
# KEP-1999: Add checkpoint and restore to the API

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha -> Beta Graduation](#alpha---beta-graduation)
    - [Beta -> GA Graduation](#beta---ga-graduation)
    - [Removing a Deprecated Flag](#removing-a-deprecated-flag)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP is a first step towards container migration. In its simplest
form, container migration is the process of saving the state of a running
container to disk, transferring it to the migration destination, and
restarting the container from the saved state. Container migration therefore
needs two minimal primitives, *checkpoint* and *restore*, independent of
everything else. This KEP proposes to extend the API with a *checkpoint*
and a *restore* interface.

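The simplest form of migration described above (checkpoint, transfer, restore)
can be sketched as follows. This is a minimal in-memory illustration, not a
CRIU or runtime binding: `FakeHost`, `migrate`, and the dictionaries standing
in for memory and disk are hypothetical names introduced here, and the sketch
assumes the container stops running on the source once it is checkpointed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHost:
    """Hypothetical stand-in for a node; not a real runtime or CRIU API."""

    name: str
    running: dict = field(default_factory=dict)  # container id -> in-memory state
    disk: dict = field(default_factory=dict)     # container id -> saved checkpoint

    def checkpoint(self, container_id: str) -> None:
        # Save the state of the running container to "disk".
        # Assumption: the container no longer runs on this host afterwards.
        self.disk[container_id] = dict(self.running.pop(container_id))

    def restore(self, container_id: str) -> None:
        # Restart the container from the saved state.
        self.running[container_id] = dict(self.disk[container_id])


def migrate(container_id: str, source: FakeHost, destination: FakeHost) -> None:
    source.checkpoint(container_id)                                 # 1. checkpoint
    destination.disk[container_id] = source.disk.pop(container_id)  # 2. transfer
    destination.restore(container_id)                               # 3. restore
```

With a `source` host running container `c1`, calling
`migrate("c1", source, destination)` leaves `c1` running on `destination` with
the same state and nothing left behind on `source`. Only the first and last
step are what this KEP proposes to expose; the transfer step is out of scope.
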
Discussions about what container migration could look like are not part of
this KEP, to keep it simple.

Additionally, everything mentioned here relies on [CRIU](https://criu.org) and
its integration in runc or crun. In theory the proposal is independent of the
actual checkpoint/restore tool used by the OCI runtime, but currently CRIU
seems to be the only tool capable of what is needed for container migration.

## Motivation

The motivation for adding checkpoint and restore to the API is to come a step
closer to container migration. As mentioned above, this KEP is explicitly only
about checkpoint and restore, to keep it simple. Container migration is the
motivation for this KEP, but it is not the goal.

Container migration already exists in several container environments. The
following gives an overview of CRIU integrations that support container
migration. As this KEP would not be possible without CRIU, OpenVZ needs to be
mentioned first:

* CRIU was written to support container migration in OpenVZ, which
  ensures that CRIU based migration has been designed with containers
  in mind:
  * https://wiki.openvz.org/Checkpointing_and_live_migration

* LXC/LXD also provides the possibility to migrate containers from one
  host to another: `lxc move <container> <remote>:<container>`

  In addition to the simplest form of container migration (checkpoint,
  transfer, restore), LXD also supports optimizations that decrease the
  downtime during migration by using CRIU's pre-copy migration support.

  * https://archive.fosdem.org/2018/schedule/event/containers_optimized_migration/
  * https://lisas.de/~adrian/posts/2017-Dec-06-optimizing-live-container-migration-in-lxd.html

* Borg uses CRIU to live migrate containers between hosts to free up resources
  on hosts that run short of resources under load:

  * [Task Migration at Google Using CRIU](https://www.linuxplumbersconf.org/event/2/contributions/209/)
  * [Update on Task Migration at Google Using CRIU](https://linuxplumbersconf.org/event/4/contributions/508/)

  See especially the second presentation for the limitations Google is
  experiencing with CRIU based container migration in production.

* Podman supports container migration in its simplest form (checkpoint,
  transfer, restore): https://criu.org/Podman

  Podman is mentioned in this list because adding checkpoint and restore
  to Podman can be seen as preparation for this KEP. In particular,
  because Podman uses CNI for network configuration, it required changes
  to CRIU and runc to allow restoring into a previously set up network
  namespace instead of letting CRIU handle the network namespace restore:

  * hxxps://xithub.com/opencontainers/runc/pull/1849

  This work led to CRIU's support for restoring processes
  out of and into PID namespaces:

  * hxxps://xithub.com/checkpoint-restore/criu/pull/1056
  * hxxps://xithub.com/opencontainers/runc/pull/2525

  It also led to runc support for restoring into all kinds of
  namespaces: hxxps://xithub.com/opencontainers/runc/pull/2583

  The support for restoring into all kinds of namespaces is
  especially important when checkpointing and restoring containers
  out of and into pods.

The motivation is to get closer to container migration by taking the first
step and providing support for simple checkpoint and restore. The existing
checkpoint/restore/migration support in other container environments shows
that it is useful and production ready.

### Goals

The goal of this KEP is to introduce *checkpoint* and *restore* to the API.

### Non-Goals

High-level discussions about how to implement container migration are out of
scope for this KEP. It is only about the low-level primitives that add
*checkpoint* and *restore* to the API.

## Proposal

```patch
--- a/api.proto
+++ b/api.proto
@@ -87,6 +87,11 @@ service RuntimeService {
     // Once it returns error, new container log file MUST NOT be created.
     rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}
 
+    // CheckpointContainer checkpoints a container
+    rpc CheckpointContainer(CheckpointContainerRequest) returns (CheckpointContainerResponse) {}
+    // RestoreContainer restores a container
+    rpc RestoreContainer(RestoreContainerRequest) returns (RestoreContainerResponse) {}
+
     // ExecSync runs a command in a container synchronously.
     rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}
     // Exec prepares a streaming endpoint to execute a command in the container.
@@ -1308,3 +1313,23 @@ message ReopenContainerLogRequest {
 
 message ReopenContainerLogResponse{
 }
+
+message CheckpointContainerRequest {
+    // ID of the container to be checkpointed
+    string container_id = 1;
+}
+
+message CheckpointContainerResponse{
+}
+
+message RestoreContainerRequest {
+    // ID of the container to be restored
+    string container_id = 1;
+    // ID of the sandbox into which the container should be restored
+    // If this is not specified the container will be restored
+    // in the sandbox it was checkpointed from.
+    string pod_sandbox_id = 2;
+}
+
+message RestoreContainerResponse{
+}
```

Draft pull requests showing a possible implementation of this proposal are
already open:

* hxxps://xithub.com/cri-o/cri-o/pull/4199
* hxxps://xithub.com/kubernetes-sigs/cri-tools/pull/662

### User Stories (Optional)

As this KEP is explicitly not about container migration, the user stories
do not include migration-based stories. They can easily be added if that
would help to understand this KEP better.

#### Story 1

Although containers are supposed to be stateless, some containers either
take a long time to start up or hold cached data. If a system is rebooted
to update the kernel, these containers can be checkpointed first. Once the
system has rebooted into the new kernel, the containers can be restored and
continue from their previous memory state, without waiting for long startup
times or reloading cached data from disk.

#### Story 2

As in Story 1, the container is stateful, but here it needs to be moved to
another pod. Using checkpoint, the container can be written to disk, and
using restore it can be restored in another pod while keeping its state and
all data already loaded into memory.

### Notes/Constraints/Caveats (Optional)

Not sure, but probably not applicable as it is only an API change at this
stage.

### Risks and Mitigations

Not sure, but probably not applicable as it is only an API change at this
stage.

## Design Details

See Proposal. This adds *checkpoint* and *restore* to the API.
*checkpoint* currently takes one parameter: the container.
*restore* currently takes two parameters: the container and, optionally,
the pod into which the container should be restored.

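The optional pod parameter of *restore* can be illustrated with a small
sketch. It mirrors the two request messages from the Proposal as Python
dataclasses and assumes proto3 semantics, where an unset string field reads
as an empty string. The helper `resolve_restore_sandbox` and the
`checkpointed_from` mapping are hypothetical and only show one way a runtime
might apply the "restore into the original sandbox" default.

```python
from dataclasses import dataclass


@dataclass
class CheckpointContainerRequest:
    container_id: str = ""


@dataclass
class RestoreContainerRequest:
    container_id: str = ""
    # Assumption: an unset proto3 string is "", meaning "not specified".
    pod_sandbox_id: str = ""


def resolve_restore_sandbox(request: RestoreContainerRequest,
                            checkpointed_from: dict) -> str:
    """Return the sandbox ID the container should be restored into.

    checkpointed_from is a hypothetical record, kept by the runtime, that
    maps a container ID to the sandbox it was checkpointed from.
    """
    if not request.container_id:
        raise ValueError("container_id is required")
    if request.pod_sandbox_id:
        # An explicit target sandbox was given (Story 2).
        return request.pod_sandbox_id
    # Default: restore into the sandbox the container was checkpointed from.
    return checkpointed_from[request.container_id]
```

For a container `c1` checkpointed from `sandbox-a`, a request without
`pod_sandbox_id` resolves to `sandbox-a`, while a request that sets
`pod_sandbox_id` to `sandbox-b` resolves to `sandbox-b`.
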
### Test Plan

Not clear yet and not familiar enough with the existing tests, but happy to add
a test if possible.

### Graduation Criteria

Not clear yet and not familiar enough with the KEP process.

#### Alpha -> Beta Graduation

Not clear yet and not familiar enough with the KEP process.

#### Beta -> GA Graduation

Not clear yet and not familiar enough with the KEP process.

#### Removing a Deprecated Flag

Probably not required.

### Upgrade / Downgrade Strategy

Probably not required.

### Version Skew Strategy

Probably not required.

## Production Readiness Review Questionnaire

Not clear yet and not familiar enough with the KEP process.

## Implementation History

* 2020-09-16: Initial version of this KEP

## Drawbacks

Not sure.

## Alternatives

No other approaches considered so far.

title: KEP Template
kep-number: 1999
authors:
  - "@adrianreber"
owning-sig: sig-node
participating-sigs:
  - TBD
status: provisional
creation-date: 2020-09-16
reviewers:
  - TBD
approvers:
  - TBD
prr-approvers:
  - TBD
see-also:
  - TBD
replaces:
  - TBD

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.19"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.19"
  beta: "v1.20"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: TBD
    components:
      - TBD
disable-supported: true

# The following PRR answers are required at beta release
metrics:
  - TBD