Add checkpoint-restore KEP
Signed-off-by: Adrian Reber <areber@redhat.com>
adrianreber committed Sep 17, 2020
1 parent c7d895d commit 5529fa7
Showing 2 changed files with 322 additions and 0 deletions.
277 changes: 277 additions & 0 deletions keps/sig-node/1999-checkpoint-restore/README.md
@@ -0,0 +1,277 @@
# KEP-1999: Add checkpoint and restore to the API

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha -> Beta Graduation](#alpha---beta-graduation)
    - [Beta -> GA Graduation](#beta---ga-graduation)
    - [Removing a Deprecated Flag](#removing-a-deprecated-flag)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP is intended as a first step towards container migration. In its
simplest form, container migration is the process of saving the state of a
running container to disk, transferring it to the migration destination, and
restarting the container from the saved state. To enable container migration,
the minimal primitives *checkpoint* and *restore* are needed, independent of
everything else. Therefore this KEP proposes to extend the API to provide a
*checkpoint* and a *restore* interface.
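
The three steps named above (checkpoint, transfer, restore) can be illustrated
with a toy sketch. It is neither CRIU nor the proposed API: a plain Python
dictionary stands in for the container's memory state, JSON stands in for the
checkpoint image format, and a file copy between two temporary directories
stands in for the transfer between hosts. The function names `checkpoint`,
`transfer`, `restore`, and `migrate` are hypothetical.

```python
import json
import shutil
import tempfile
from pathlib import Path


def checkpoint(state: dict, directory: Path) -> Path:
    """Save the (toy) container state to disk and return the image path."""
    image = directory / "checkpoint.json"
    image.write_text(json.dumps(state))
    return image


def transfer(image: Path, destination: Path) -> Path:
    """Copy the checkpoint image to the (toy) migration destination."""
    return Path(shutil.copy(image, destination))


def restore(image: Path) -> dict:
    """Recreate the container state from the saved image."""
    return json.loads(image.read_text())


def migrate(state: dict) -> dict:
    """Run checkpoint, transfer, and restore in sequence."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        image = checkpoint(state, Path(src))
        moved = transfer(image, Path(dst))
        return restore(moved)
```

The point of the sketch is only the ordering of the steps: the state that comes
out of `restore` equals the state that went into `checkpoint`, even though it
has passed through a file on disk and a copy to another location.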

To keep this KEP simple, discussions about what container migration could
look like are not part of it.

Additionally, everything mentioned here relies on [CRIU](https://criu.org) and
its integration in runc or crun. In theory it should be independent of the
actual checkpoint/restore tool used by the OCI runtime, but currently CRIU
seems to be the only tool capable of what is needed for container migration.

## Motivation

The motivation for adding checkpoint and restore to the API is to come a step
closer to container migration. As mentioned above, this KEP is explicitly
limited to checkpoint and restore to keep it simple. Container migration is
the motivation for this KEP, but it is not its goal.

Container migration is already available in several container environments.
The following list gives an overview of how CRIU has been integrated to support
it. As this KEP would not be possible without CRIU, OpenVZ is mentioned first:

* CRIU was written to support container migration in OpenVZ, so
  CRIU-based migration was designed with containers in mind:

  * https://wiki.openvz.org/Checkpointing_and_live_migration

* LXC/LXD can also migrate containers from one host to another:
  `lxc move <container> <remote>:<container>`

  In addition to the simplest form of container migration (checkpoint,
  transfer, restore), LXD also supports optimizations that decrease the
  downtime during migration by using CRIU's pre-copy migration support.

  * https://archive.fosdem.org/2018/schedule/event/containers_optimized_migration/
  * https://lisas.de/~adrian/posts/2017-Dec-06-optimizing-live-container-migration-in-lxd.html

* Borg uses CRIU to live migrate containers between hosts to free up resources
  on hosts that run short of resources under load:

  * [Task Migration at Google Using CRIU](https://www.linuxplumbersconf.org/event/2/contributions/209/)
  * [Update on Task Migration at Google Using CRIU](https://linuxplumbersconf.org/event/4/contributions/508/)

  See especially the second presentation for the limitations Google is
  experiencing with CRIU-based container migration in production.

* Podman supports container migration in its simplest form (checkpoint,
  transfer, restore): https://criu.org/Podman

  Podman is mentioned in this list because adding checkpoint and restore
  to Podman can be seen as preparation for this KEP. In particular, because
  Podman uses CNI for network configuration, it required changes to CRIU
  and runc to allow restoring into a previously set up network namespace
  instead of letting CRIU handle the network namespace restore:

  * hxxps://xithub.com/opencontainers/runc/pull/1849

  This work led to CRIU's support for restoring processes
  out of and into PID namespaces:

  * hxxps://xithub.com/checkpoint-restore/criu/pull/1056
  * hxxps://xithub.com/opencontainers/runc/pull/2525

  It also led to runc's support for restoring into all kinds of
  namespaces: hxxps://xithub.com/opencontainers/runc/pull/2583

  The ability to restore into all kinds of namespaces is
  especially important when checkpointing and restoring containers
  out of and into pods.

In short, this KEP takes a first step towards container migration by
providing support for simple checkpoint and restore. The existing checkpoint,
restore, and migration support in the container environments listed above
suggests that the functionality is useful and usable in production.

### Goals

The goal of this KEP is to introduce *checkpoint* and *restore* to the API.

### Non-Goals

High-level discussions about how to implement container migration are out of
scope for this KEP. It covers only the low-level primitives needed to add
*checkpoint* and *restore* to the API.


## Proposal

```patch
--- a/api.proto
+++ b/api.proto
@@ -87,6 +87,11 @@ service RuntimeService {
     // Once it returns error, new container log file MUST NOT be created.
     rpc ReopenContainerLog(ReopenContainerLogRequest) returns (ReopenContainerLogResponse) {}

+    // CheckpointContainer checkpoints a container
+    rpc CheckpointContainer(CheckpointContainerRequest) returns (CheckpointContainerResponse) {}
+    // RestoreContainer restores a container
+    rpc RestoreContainer(RestoreContainerRequest) returns (RestoreContainerResponse) {}
+
     // ExecSync runs a command in a container synchronously.
     rpc ExecSync(ExecSyncRequest) returns (ExecSyncResponse) {}
     // Exec prepares a streaming endpoint to execute a command in the container.
@@ -1308,3 +1313,23 @@ message ReopenContainerLogRequest {

 message ReopenContainerLogResponse{
 }
+
+message CheckpointContainerRequest {
+    // ID of the container to be checkpointed
+    string container_id = 1;
+}
+
+message CheckpointContainerResponse{
+}
+
+message RestoreContainerRequest {
+    // ID of the container to be restored
+    string container_id = 1;
+    // ID of the sandbox into which the container should be restored
+    // If this is not specified the container will be restored
+    // in the sandbox it was checkpointed from.
+    string pod_sandbox_id = 2;
+}
+
+message RestoreContainerResponse{
+}

```

Draft pull requests showing a possible implementation of this proposal are
already open:

* hxxps://xithub.com/cri-o/cri-o/pull/4199
* hxxps://xithub.com/kubernetes-sigs/cri-tools/pull/662
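
To make the shape of the proposed messages concrete, the following sketch
mirrors them as Python dataclasses. This is not generated gRPC code and not
part of any existing client. The class and field names are copied from the
patch above, while `restore_target` is a hypothetical helper. The sketch
assumes proto3 semantics, where an unset string field reads as the empty
string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointContainerRequest:
    # ID of the container to be checkpointed (field 1 in the patch)
    container_id: str = ""


@dataclass(frozen=True)
class RestoreContainerRequest:
    # ID of the container to be restored (field 1 in the patch)
    container_id: str = ""
    # ID of the sandbox to restore into (field 2); empty means "not specified"
    pod_sandbox_id: str = ""


def restore_target(request: RestoreContainerRequest, original_sandbox_id: str) -> str:
    """Return the sandbox a restore should use.

    Hypothetical helper: falls back to the sandbox the container was
    checkpointed from when pod_sandbox_id is not specified.
    """
    return request.pod_sandbox_id or original_sandbox_id
```

With these definitions, `restore_target(RestoreContainerRequest("c1"), "pod-a")`
resolves to the original sandbox, while setting `pod_sandbox_id` overrides it.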

### User Stories (Optional)

As this KEP is explicitly not about container migration, the user stories
below do not cover migration scenarios. Such stories can easily be added if
they would help in understanding this KEP.

#### Story 1

Although containers are supposed to be stateless, there are still containers
that either take a long time to start up or hold cached data.
If a system is rebooted to update the kernel, these containers can be
checkpointed beforehand. Once the system has been rebooted into the new
kernel, the containers can be restored and continue from their previous memory
state without waiting for long start-up times or reloading cached data
from disk.

#### Story 2

As in Story 1, the container is stateful, but here it needs to be moved to
another pod. Using checkpoint, the container can be written to disk, and
using restore, it can be restored in another pod while keeping its state and
all data already loaded into memory.

### Notes/Constraints/Caveats (Optional)

Not sure, but probably not applicable as it is only an API change at this
stage.

### Risks and Mitigations


Not sure, but probably not applicable as it is only an API change at this
stage.

## Design Details

See [Proposal](#proposal). This adds *checkpoint* and *restore* to the API.
Currently, *checkpoint* takes a single parameter: the container.
*Restore* takes two parameters: the container and, optionally,
the pod into which the container should be restored.
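
How the two calls and their parameters might fit together can be sketched with
an in-memory stand-in for a container runtime. Everything here is
hypothetical: `FakeRuntime`, its method names, and its error behavior are
illustrative assumptions and do not describe CRI-O, runc, or CRIU. In
particular, the sketch assumes that a container stops running once it has been
checkpointed, which this KEP does not specify.

```python
class FakeRuntime:
    """In-memory stand-in for a runtime; not a real CRI implementation."""

    def __init__(self):
        self.running = {}      # container_id -> (pod_sandbox_id, state)
        self.checkpoints = {}  # container_id -> (pod_sandbox_id, state)

    def create(self, container_id, pod_sandbox_id, state):
        """Register a running container with some toy state."""
        self.running[container_id] = (pod_sandbox_id, dict(state))

    def checkpoint_container(self, container_id):
        """One parameter: the container.

        Assumption: the container is removed from the running set after
        its state has been saved.
        """
        if container_id not in self.running:
            raise KeyError(f"container {container_id!r} is not running")
        self.checkpoints[container_id] = self.running.pop(container_id)

    def restore_container(self, container_id, pod_sandbox_id=""):
        """Two parameters: the container and, optionally, the target pod."""
        if container_id not in self.checkpoints:
            raise KeyError(f"no checkpoint for container {container_id!r}")
        original_sandbox, state = self.checkpoints[container_id]
        # Fall back to the sandbox the container was checkpointed from.
        target = pod_sandbox_id or original_sandbox
        self.running[container_id] = (target, dict(state))
        return target
```

Calling `restore_container("c1")` without a second argument restores into the
original sandbox, while `restore_container("c1", "pod-b")` restores the same
saved state into a different one.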

### Test Plan

Not clear yet and not familiar enough with the existing tests, but happy to add
a test if possible.

### Graduation Criteria

Not clear yet and not familiar enough with the KEP process.

#### Alpha -> Beta Graduation

Not clear yet and not familiar enough with the KEP process.

#### Beta -> GA Graduation

Not clear yet and not familiar enough with the KEP process.

#### Removing a Deprecated Flag

Probably not required.

### Upgrade / Downgrade Strategy

Probably not required.

### Version Skew Strategy

Probably not required.

## Production Readiness Review Questionnaire

Not clear yet and not familiar enough with the KEP process.

## Implementation History

* 2020-09-16: Initial version of this KEP

## Drawbacks

Not sure.

## Alternatives

No other approaches considered so far.
45 changes: 45 additions & 0 deletions keps/sig-node/1999-checkpoint-restore/kep.yaml
@@ -0,0 +1,45 @@
title: Add checkpoint and restore to the API
kep-number: 1999
authors:
- "@adrianreber"
owning-sig: sig-node
participating-sigs:
- TBD
status: provisional
creation-date: 2020-09-16
reviewers:
- TBD
approvers:
- TBD
prr-approvers:
- TBD
see-also:
- TBD
replaces:
- TBD

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.19"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.19"
  beta: "v1.20"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
- name: TBD
  components:
  - TBD
disable-supported: true

# The following PRR answers are required at beta release
metrics:
- TBD
