Skip to content

Commit

Permalink
Add initial command line spec
Browse files Browse the repository at this point in the history
# Commands

## create

The --bundle [start-pr-bundle] and --pid-file options and ID argument
[runc-start-id] match runC's interface.

One benefit of the early-exit 'create' is that the exit code does not
conflate container process exits with "failed to setup the sandbox"
exits.  We can take advantage of that and use non-zero 'create' exits
to allow stderr writing (so the runtime can log errors while dying
without having to successfully connect to syslog or some such).
Trevor still likes the long-running 'create' API because it makes
collecting the exit code easier, see the entry under rejected-for-now
avenues at the end of this commit message.

### --pid-file

You can get the PID by calling 'state' [container-pid-from-state], and
container PIDs may not be portable [container-pid-not-portable].  But
it's a common way for interfacing with init systems like systemd
[systemd-pid], and for this first pass at the command line API folks
are ok with some Linux-centrism [linux-centric].

### Document LISTEN_FDS for passing open file descriptors

This landed in runC with [runc-listen-fds], but the bundle-author <->
runtime specs explicitly avoided talking about how this is set (since
the bundle-author didn't care about the runtime-caller <-> runtime
interface) [runtime-spec-caller-api-agnostic].  This commit steps away
from that agnosticism.

Trevor left LISTEN_PID [sd_listen_fds,listen-fds-description] out,
since he doesn't see how the runtime-caller would choose anything
other than 1 for its value.  It seems like something that a process
would have to set for itself (because guessing the PID of a child
before spawning it seems racy ;).  In any event, the runC
implementation seems to set this to 1 regardless of what systemd
passes to it [listen-fds-description].

We've borrowed Shishir's wording for the example
[listen-fds-description].

## state [state-pr]

Partially catch up with opencontainers/runtime-spec@7117ede7 (Expand
on the definition of our ops, 2015-10-13,
opencontainers#225, v0.4.0).  The state example is
adapted from runtime.md, but we defer the actual specification of the
JSON to that file.

The encoding for the output JSON (and all standard-stream activity) is
covered by the "Character encodings" section.  In cases where the
runtime ignores the SHOULD (still technically compliant), RFC 7159
makes encoding detection reasonably straightforward [rfc7159-s8.1].
The obsolete RFC 4627 has some hints as well [rfc4627-s3] (although
these were dropped in RFC 7518 [rfc7518-aA], probably as a result of
removing the constraint that "JSON text" be an object or array
[rfc7518-aA]).  The hints should still apply to the state output,
because we know it will be an object.  If that ends up being too dicey
and we want to certify runtimes that do not respect their
operating-system conventions, we can add an --encoding option later.

## kill

Partially catch up with opencontainers/runtime-spec@be594153 (Split
create and start, 2016-04-01, opencontainers#384).  The
interface is based on POSIX [posix-kill], util-linux
[util-linux-kill], and GNU coreutils [coreutils-kill].

The TERM/KILL requirement is a minimum portability requirement for
soft/hard stops.  Windows lacks POSIX signals [windows-signals], and
currently supports soft stops in Docker with whatever is behind
hcsshim.ShutdownComputeSystem [docker-hcsshim].  The docs we're
landing here explicitly allow that sort of substitution, because we
need to have soft/hard stop on those platforms but *can't* use POSIX
signals.  They borrow wording from
opencontainers/runtime-spec@35b0e9ee (config: Clarify MUST for
platform.os and .arch, 2016-05-19, opencontainers#441) to
recommend runtime authors document the alternative technology so
bundle-authors can prepare (e.g. by installing the equivalent to a
SIGTERM signal handler).

# Command style

Use imperative phrasing for command summaries, to follow the practice
recommended by Python's PEP 257 [pep257-docstring]:

  The docstring is a phrase ending in a period. It prescribes the
  function or method's effect as a command ("Do this", "Return that"),
  not as a description; e.g. don't write "Returns the pathname ...".

The commands have the following layout:

  ### {command name}

  {one-line description}

  * *Options:* ...
  ...
  * *Exit code:* ...

  {additional notes}

  #### Example

  {example}

The four-space list indents follow opencontainers/runtime-spec@7795661
(runtime.md: Fix sub-bullet indentation, 2016-06-08,
opencontainers#495).  From [markdown-syntax]:

  List items may consist of multiple paragraphs.  Each subsequent
  paragraph in a list item must be indented by either 4 spaces or one
  tab...

Trevor expects that's intended to be read with "block element" instead
of "paragraph", in which case it applies to nested lists too.

And while GitHub supports two-space indents [github-lists]:

  You can create nested lists by indenting lines by two spaces.

it seems that pandoc does not.

# Versioning

The command-line interface is largely orthogonal to the config format,
and config authors and runtime callers may be entirely different sets
of people.  Zhang Wei called for more explicit versioning for the CLI
[interface-versioning], and the approach taken here follows the
approach taken by Python's email package [python-email-version].

Wedging multiple, independently versioned entities into a single
repository can be awkward, but earlier proposals to put the CLI in its
own repository [separate-repository-proposed] were unsuccessful
because compliance testing requires both a CLI and a config
specification [separate-repository-refused].  Trevor doesn't think
that's a solid reason [separate-repository-refusal-rebutted], but
discussion along that line stalled out, so the approach taken here is
to keep both independently versioned entities in the same repository.

# Global options

This section is intended to allow runtimes to extend the command line
API with additional options and commands as they see fit without
interfering with the commands and options specified in this document.
The last line in this section makes it explicit that any later
specification (e.g. "MUST print the state JSON to its stdout") do not
apply to cases where the caller has included an unspecified option or
command (e.g. --format=protobuf).  For extensive discussion on this
point see [extensions-unspecified].

With regard to the statement "Command names MUST NOT start with
hyphens", the rationale behind this decision is to distinguish
unrecognized commands from unrecognized options
[distinguish-unrecognized-commands] because we want to allow (but not
require) runtimes to fail fast when faced with an unrecognized
command [optional-fail-fast].

# Long options

Use GNU-style long options to avoid ambiguous, one-character options
in the spec, while still allowing the runtime to support one-character
options with packing.  We don't specify one-character options in this
spec, because portable callers can use the long form, and not
specifying short forms leaves runtimes free to assign those as they
see fit.

# Character encodings

Punt to the operating system for character encodings.  Without this,
the character set for the state JSON or other command output seemed
too ambiguous.

Trevor wishes there were cleaner references for the
{language}.{encoding} locales like en_US.UTF-8 and UTF-8.  But
[wikipedia-utf-8,wikipedia-posix-locale] seems too glib, and he can't
find a more targetted UTF-8 link than just dropping folks into a
Unicode chapter (which is what [wikipedia-utf-8] does):

  The Unicode Standard, Version 6.0, §3.9 D92, §3.10 D95 (2011)

With the current v8.0 (2015-06-17), it's still §3.9 D92 and §3.10 D95.

The TR35 link is for:

  In addition, POSIX locales may also specify the character encoding,
  which requires the data to be transformed into that target encoding.

and the POSIX §6.2 link is for:

  In other locales, the presence, meaning, and representation of any
  additional characters are locale-specific.

# Standard streams

The "MUST NOT attempt to read from its stdin" means a generic caller
can safely exec the command with a closed or null stdin, and not have
to worry about the command blocking or crashing because of that.  The
stdout spec for start/delete is more lenient, because runtimes are
unlikely to change their behavior because they are unable to write to
stdout.  If this assumption proves troublesome, we may have to tighten
it up later.

Aleksa Sarai also raised concerns over the safety of potentially
giving the container process access to terminal ioctl escapes
[stdio-ioctl] and feels like the stdio file-descriptor pass-through is
surprising [stdio-surprise].

# Console socket protocol

Based on in-flight work by Aleksa in opencontainers/runc#1018, this
commit makes the following choices:

* SOCK_SEQPACKET instead of SOCK_STREAM, because this is a
  message-based protocol, so it seems more natural to use a
  message-oriented socket type.

* A string 'type' field for all messages, so we can add additional
  message types in the future without breaking backwards compatibility
  (new console-socket servers will still support old clients).  Aleksa
  favored splitting this identifier into an integer 'type' and
  'version' fields [runc-socket-type-version], but I don't see the
  point if they're both opaque integers without internal structure.
  And I expect this protocol to be stable enough that it's not worth
  involving SemVer and its structured versioning.

* Response messages, so the client can tell whether the request was
  received and processed successfully or not.  That gives the client a
  way to bail out if, for example, the server does not support the
  'terminal' message type.

* Add a sub-package specs-go/socket.  Even though there aren't many
  new types, these are fairly different from the rest of specs-go and
  that namespace was getting crowded.

# Event triggers

The "Callers MAY block..." wording is going to be hard to enforce, but
with the runC model, clients rely on the command exits to trigger
post-create and post-start activity.  The longer the runtime hangs
around after completing its action, the laggier those triggers will
be.

For an alternative event trigger approach, see the discussion of an
'event' command in the rejected-for-now avenues at the end of this
commit message.

# Lifecycle notes

These aren't documented in the current runtime-spec, and may no longer
be true.  But they were true at one point, and informed the
development of this specification.

## Process cleanup

On IRC on 2015-09-15 (with PDT timestamps):

  10:56 < crosbymichael> if the main process dies in the container,
    all other process are killed
  ...
  10:58 < julz> crosbymichael: I'm assuming what you mean is you kill
    everything in the cgroup -> everything in the container dies?
  10:58 < crosbymichael> julz: yes, that is how its implemented
  ...
  10:59 < crosbymichael> julz: we actually freeze first, send the
    KILL, then unfreeze so we don't have races

## Container IDs for namespace joiners

You can create a config that adds no isolation vs. the runtime
namespace or completely joins another set of existing namespaces.  It
seems odd to call that a new "container", but the ID is really more of
a process ID, and less of a container ID.  The "container" phrasing is
just a useful hint that there might be some isolation going on.  And
we're always creating a new "container process" with 'create'.

# Other changes

This commit also moves the file-descriptor docs from runtime-linux.md
into runtime.md and the command-line docs.  Both affect runtime
authors, but:

* The runtime.md entry is more useful for bundle authors than the old
  wording, because it gives them confidence that the runtime caller
  will have the power to set these up as they see fit (within POSIX's
  limits).  It is also API-agnostic, so bundle authors know they won't
  have to worry about which API will be used to launch the container
  before deciding whether it is safe to rely on runtime-caller
  file-descriptor control.

* The command line entry is more useful for runtime-callers than the
  old wording, because it tells you how to setup the file descriptors
  instead of just telling you that they MAY be setup.

I moved the bundle-author language from runtime-linux.md to runtime.md
because it's relying on POSIX primitives that aren't Linux-specific.

# Avenues pursued but rejected (for now)

* Early versions of this specification had 'start' taking '--config'
  and '--runtime', but this commit uses '--bundle' [start-pr-bundle].

  The single config file change [single-config-proposal] went through,
  but Trevor would also like to be able to pipe a config into the
  'funC start' command (e.g. via a /dev/fd/3 pseudo-filesystem path)
  [runc-config-via-stdin], and he has a working example that supports
  this without difficulty [ccon-config-via-stdin].  But since
  [runc-bundle-option] landed on 2015-11-16, runC has replaced their
  --config-file and --runtime-file flags with --bundle, and the
  current goal of this API is "keeping as much similarity with the
  existing runC command-line as possible", not "makes sense to Trevor"
  ;).  It looks like runC was reacting [runc-required-config-file] to
  strict wording in the spec [runtime-spec-required-config-file], so
  we might be able to revisit this if/when we lift that restriction.

* Having 'start' (now 'create') take a --state option to write state
  to a file [start-pr-state].  This is my preferred approach to
  sharing container state, since it punts a persistent state registry
  to higher-level tooling [punt-state-registry].  But runtime-spec
  currently requires the runtime to maintain such a registry
  [state-registry], and we don't need two ways to do that ;).

  On systems like Solaris, the kernel maintains a registry of
  container IDs directly, so they don't need an external registry
  [solaris-kernel-state].

* Having 'start' (now 'create') take an --id option instead of a
  required ID argument, and requiring the runtime to generate a unique
  ID if the option was not set.  When there is a long-running host
  process waiting on the container process to perform cleanup, the
  runtime-caller may not need to know the container ID.  However, runC
  has been requiring a user-specified ID since [runc-start-id], and
  this spec follows the early-exit 'create' from [runc-create-start],
  so we require one here.  We can revisit this if we regain a
  long-running 'create' process.

* Having 'create' take a '--console-socket PATH' option (required when
  process.terminal is true) with a path to a SOCK_SEQPACKET Unix
  socket for use with the console-socket protocol.  The current
  'LISTEN_FDS + 3' approach was proposed by Michael Crosby
  [console-socket-fd], but Trevor doesn't have a clear idea of the
  motivation for the change and would have preferred '--console-socket
  FD'.

* Having a long-running 'create' process.  Trevor is not a big fan of
  this early-exit 'create', which requires platform-specific magic to
  collect the container process's exit code.  The ptrace idea in this
  commit is from Mrunal [mrunal-ptrace].

  Trevor has a proposal for an 'event' operation [event] which would
  provide a convenient created trigger.  With 'event' in place, we
  don't need the 'create' process exit to serve as that trigger, and
  could have a long-running 'create' that collects the container
  process exit code using the portable waitid() family.  But the
  consensus after this week's meeting was to table that while we land
  docs for the runC API [mimic-runc].

* Having a 'version' command to make it easy for a caller to report
  which runtime they're using.  But we don't have a use-case that
  makes it strictly necessary for interop, so we're leaving it out for
  now [no-version].

* Using 'sh' syntax highlighting [syntax-highlighting] for the fenced
  code blocks.  The 'sh' keyword comes from [linguist-languages].  But
  the new fenced code blocks are shell sessions, not scripts, and we
  don't want shell-syntax highlighting in the command output.

[ccon-config-via-stdin]: https://github.com/wking/ccon/tree/v0.4.0#configuration
[console-socket-fd]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-10-19-21.00.log.html#l-30
[container-pid-from-state]: https://github.com/opencontainers/runtime-spec/pull/511/files#r70353376
  Subject: Add initial pass at a cmd line spec
[container-pid-not-portable]: opencontainers#459
  Subject: [ Runtime ] Allow for excluding pid from state
[coreutils-kill]: http://www.gnu.org/software/coreutils/manual/html_node/kill-invocation.html
[distinguish-unrecognized-commands]: https://github.com/wking/oci-command-line-api/pull/8/files#r46898167
  Subject: Clarity for commands vs global options
[docker-hcsshim]: https://github.com/docker/docker/pull/16997/files#diff-5d0b72cccc4809455d52aadc62329817R230
  moby/moby@bc503ca8 (Windows: [TP4] docker kill handling,
  2015-10-12, moby/moby#16997)
[event]: opencontainers#508
  Subject: runtime: Add an 'event' operation for subscribing to pushes
[extensions-unspecified]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-27.log.html#t2016-07-27T16:37:56
[github-lists]: https://help.github.com/articles/basic-writing-and-formatting-syntax/#lists
[interface-versioning]: opencontainers#513 (comment)
[linguist-languages]: https://github.com/github/linguist/blob/master/lib/linguist/languages.yml
[linux-centric]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-39
[listen-fds-description]: opencontainers/runc#231 (comment)
  Subject: Systemd integration with runc, for on-demand socket
    activation
[markdown-syntax]: http://daringfireball.net/projects/markdown/syntax#list
[mimic-runc]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-13-17.03.log.html#l-15
[mrunal-ptrace]: http://ircbot.wl.linuxfoundation.org/eavesdrop/%23opencontainers/%23opencontainers.2016-07-13.log.html#t2016-07-13T18:58:54
[no-version]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-75
[optional-fail-fast]: wking/oci-command-line-api@527f3c6#commitcomment-14835617
  Subject: Use RFC 2119's keywords (MUST, MAY, ...)
[pep257-docstring]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
[posix-kill]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
[punt-state-registry]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2015/opencontainers.2015-12-02-18.01.log.html#l-79
[python-email-version]: https://docs.python.org/3/library/email.html#package-history
[rfc4627-s3]: https://tools.ietf.org/html/rfc4627#section-3
[rfc7158-aA]: https://tools.ietf.org/html/rfc7158#appendix-A
  RFC 7518 is currently identical to 7519.
[rfc7159-s8.1]: https://tools.ietf.org/html/rfc7159#section-8.1
[runc-bundle-option]: opencontainers/runc#373
  Subject: adding support for --bundle
[runc-config-via-stdin]: opencontainers/runc#202
  Subject: Can runc take its configuration on stdin?
[runc-listen-fds]: opencontainers/runc#231
  Subject: Systemd integration with runc, for on-demand socket
    activation
[runc-required-config-file]: opencontainers/runc#310 (comment)
  Subject: specifying a spec file on cmd line?
[runc-socket-type-version]: opencontainers/runc#1018 (comment)
  Subject: Consoles, consoles, consoles.
[runc-start-id]: opencontainers/runc#541
  opencontainers/runc@a7278cad (Require container id as arg1,
  2016-02-08, opencontainers/runc#541)
[runtime-spec-caller-api-agnostic]: opencontainers#113 (comment)
  Subject: Add fd section for linux container process
[runtime-spec-required-config-file]: https://github.com/opencontainers/runtime-spec/pull/210/files#diff-8b310563f1c6f616aa98e6aeffc4d394R14
  106ec2d (Cleanup bundle.md, 2015-10-02,
  opencontainers#210)
[sd_listen_fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[separate-repository-proposed]: opencontainers#513 (comment)
[separate-repository-refused]: opencontainers#513 (comment)
[separate-repository-refusal-rebutted]: opencontainers#513 (comment)
[single-config-proposal]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/0QbyJDM9fWY
  Subject: Single, unified config file (i.e. rolling back specs#88)
  Date: Wed, 4 Nov 2015 09:53:20 -0800
  Message-ID: <20151104175320.GC24652@odin.tremily.us>
[solaris-kernel-state]: wking/oci-command-line-api#3 (comment)
  Subject: Drop exec, pause, resume, and signal
[start-pr-bundle]: wking/oci-command-line-api#11
  Subject: start: Change --config and --runtime to --bundle
[start-pr-state]: wking/oci-command-line-api#14
  Subject: start: Add a --state option
[state-pr]: wking/oci-command-line-api#16
  Subject: runtime: Add a 'state' command
[state-registry]: https://github.com/opencontainers/runtime-spec/pull/225/files#diff-b84a8d65d8ed53f4794cd2db7e8ea731R61
  7117ede (Expand on the definition of our ops, 2015-10-13,
  opencontainers#225)
[stdio-ioctl]: opencontainers#513 (comment)
[stdio-surprise]: opencontainers#513 (comment)
[syntax-highlighting]: https://help.github.com/articles/github-flavored-markdown/#syntax-highlighting
[systemd-pid]: http://ircbot.wl.linuxfoundation.org/meetings/opencontainers/2016/opencontainers.2016-07-20-21.03.log.html#l-69
[util-linux-kill]: http://man7.org/linux/man-pages/man1/kill.1.html
[wikipedia-utf-8]: https://en.wikipedia.org/wiki/UTF-8
[wikipedia-posix-locale]: https://en.wikipedia.org/wiki/Locale#POSIX_platforms
[windows-singals]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/PlGKu7QUwLE
  Subject: Fwd: Windows support for OCI stop/signal/kill (runtime-spec#356)
  Date: Thu, 26 May 2016 11:03:29 -0700
  Message-ID: <20160526180329.GL17496@odin.tremily.us>

Signed-off-by: Julian Friedman <julz.friedman@uk.ibm.com>
Hopefully-Signed-off-by: Mike Brown <brownwm@us.ibm.com>
Signed-off-by: W. Trevor King <wking@tremily.us>
Reviewed-by: Jesse Butler <jeeves.butler@gmail.com>
  • Loading branch information
julz authored and wking committed Feb 1, 2017
1 parent 3297cd5 commit d75087c
Show file tree
Hide file tree
Showing 9 changed files with 355 additions and 9 deletions.
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ DOC_FILES := \
bundle.md \
runtime.md \
runtime-linux.md \
command-line-interface.md \
config.md \
config-linux.md \
config-solaris.md \
Expand Down
260 changes: 260 additions & 0 deletions command-line-interface.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,260 @@
# Runtime Command Line Interface

This section defines the runtime command line interface version 1.0.0.
It is one of potentially several [runtime APIs defined by this specification](README.md).

## Versioning

The command line interface is versioned with [SemVer v2.0.0][semver].
The command line interface version is independent of the OCI Runtime Specification as a whole (which is tied to the [configuration format](config.md#specification-version)).
For example, if a caller is compliant with version 1.1 of the command line interface, they are compatible with all runtimes that support any 1.1 or later release of the command line interface, but are not compatible with a runtime that supports 1.0 and not 1.1.

## Global usage

The [runtime][] MUST provide an executable (called `funC` in the following examples).
That executable MUST support commands with the following template:

```
$ funC [global-options] <COMMAND> [command-specific-options] <command-specific-arguments>
```

## Global options

None are required, but the runtime MAY support options that start with at least one hyphen.
Global options MAY take positional arguments (e.g. `--log-level debug`).
Command names MUST NOT start with hyphens.
The option parsing MUST be such that `funC <COMMAND>` is unambiguously an invocation of `<COMMAND>` (even for commands not specified in this document).
If the runtime is invoked with an unrecognized command, it MUST exit with a nonzero exit code and MAY log a warning to stderr.
Beyond the above rules, the behavior of the runtime in the presence of commands and options not specified in this document is unspecified.

## Character encodings

This API specification does not cover character encodings, but runtimes SHOULD conform to their native operating system.
For example, POSIX systems define [`LANG` and related environment variables][posix-lang] for [declaring][posix-locale-encoding] [locale-specific character encodings][posix-encoding], so a runtime in an `en_US.UTF-8` locale SHOULD write its [state](#state) to stdout in [UTF-8][].

## Commands

### create

[Create][create] a container from a [bundle directory][bundle].

* *Arguments*
* *`<ID>`* Set the container ID to create.
* *Options*
* *`--bundle <PATH>`* Override the path to the [bundle directory][bundle] (defaults to the current working directory).
* *`--pid-file <PATH>`* The runtime MUST write the container PID to this path.
* *Standard streams:*
* If [`process.terminal`][process] is true:
* *stdin:* The runtime MUST NOT attempt to read from its stdin.
* *stdout:* The handling of stdout is unspecified.
* *stderr:* The runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* If [`process.terminal`][process] is not true:
* *stdin:* The runtime MUST pass its stdin file descriptor through to the container process without manipulation or modification.
"Without manipulation or modification" means that the runtime MUST not seek on the file descriptor, or close it, or read or write to it, or [`ioctl`][ioctl.3] it, or perform any other action on it besides passing it through to the container process.

When using a container to drop privileges, note that providing a privileged terminal's file descriptor may allow the container to [execute privileged operations via `TIOCSTI`][TIOCSTI-security] or other [TTY ioctls][tty_ioctl.4].
On Linux, [`TIOCSTI` requires `CAP_SYS_ADMIN`][capabilities.7] unless the target terminal is the caller's [controlling terminal][controlling-terminal].
* *stdout:* The runtime MUST pass its stdout file descriptor through to the container process without manipulation or modification.
* *stderr:* When `create` exists with a zero code, the runtime MUST pass its stderr file descriptor through to the container process without manipulation or modification.
When `create` exits with a non-zero code, the runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* *Environment variables*
* *`LISTEN_FDS`:* The number of file descriptors passed.
For example, `LISTEN_FDS=2` would mean that the runtime MUST pass file descriptors 3 and 4 to the container process (in addition to the standard streams) to support [socket activation][systemd-listen-fds].
* *Additional file descriptors*
* If [`process.terminal`][process] is true, the caller MUST provide an open [`AF_UNIX` socket][unix-socket] on file descriptor `$LISTEN_FDS + 3`.
The runtime MUST pass the [pseudoterminal master][posix_openpt.3] through the socket; the protocol is [described below](#console-socket).
* *Exit code:* Zero if the container was successfully created and non-zero on errors.

Callers MAY block on this command's successful exit to trigger post-create activity.

#### Console socket

The [`AF_UNIX`][unix-socket] used by [`--console-socket`](#create) handles request and response messages between a runtime and server.
The socket type MUST be [`SOCK_SEQPACKET`][socket-types] or [`SOCK_STREAM`][socket-types].
The server MUST send a single response for each runtime request.
The [normal data][socket-queue] ([`msghdr.msg_iov*`][socket.h]) of all messages MUST be [UTF-8][] [JSON](glossary.md#json).

There are [JSON Schemas](schema/README.md) and [Go bindings](specs-go/socket/socket.go) for the messages specified in this section.

##### Requests

All requests MUST contain a **`type`** property whose value MUST one of the following strings:

* `terminal`, if the request is passing a [pseudoterminal master][posix_openpt.3].
When `type` is `terminal`, the request MUST also contain the following properties:

* **`container`** (string, REQUIRED) The container ID, as set by [create](#create).

The message's [ancillary data][socket-queue] (`msg_control*`) MUST contain at least one [`cmsghdr`][socket.h]).
The first `cmsghdr` MUST have:

* `cmsg_type` set to [`SOL_SOCKET`][socket.h],
* `cmsg_level` set to [`SCM_RIGHTS`][socket.h],
* `cmsg_len` greater than or equal to `CMSG_LEN(sizeof(int))`, and
* `((int*)CMSG_DATA(cmsg))[0]` set to the pseudoterminal master file descriptor.

##### Responses

All responses MUST contain a **`type`** property whose value MUST one of the following strings:

* `success`, if the request was successfully processed.
* `error`, if the request was not successfully processed.

In addition, responses MAY contain any of the following properties:

* **`message`** (string, OPTIONAL) A phrase describing the response.

#### Example

```
# in a bundle directory with a process that echos "hello" and exits 42
$ test -t 1 && echo 'stdout is a terminal'
stdout is a terminal
$ funC create hello-1 <&- >stdout 2>stderr
$ echo $?
0
$ wc stdout
0 0 0 stdout
$ funC start hello-1
$ echo $?
0
$ cat stdout
hello
$ block-on-exit-and-collect-exit-code hello-1
$ echo $?
42
$ funC delete hello-1
$ echo $?
0
```

#### Container process exit

The [example's](#example) `block-on-exit-and-collect-exit-code` is platform-specific and is not specified in this document.
On Linux, it might involve an ancestor process which had set [`PR_SET_CHILD_SUBREAPER`][prctl.2] and collected the container PID [from the state][state], or a process that was [ptracing][ptrace.2] the container process for [`exit_group`][exit_group.2], although both of those race against the container process exiting before the watcher is monitoring.

### start

[Start][start] the user-specified code from [`process`][process].

* *Arguments*
* *`<ID>`* The container to start.
* *Standard streams:*
* *stdin:* The runtime MUST NOT attempt to read from its stdin.
* *stdout:* The handling of stdout is unspecified.
* *stderr:* The runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* *Exit code:* Zero if the container was successfully started and non-zero on errors.

Callers MAY block on this command's successful exit to trigger post-start activity.

See [create](#example) for an example.

### state

[Request][state-request] the container [state][state].

* *Arguments*
* *`<ID>`* The container whose state is being requested.
* *Standard streams:*
* *stdin:* The runtime MUST NOT attempt to read from its stdin.
* *stdout:* The runtime MUST print the [state JSON][state] to its stdout.
* *stderr:* The runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* *Exit code:* Zero if the state was successfully written to stdout and non-zero on errors.

#### Example

```
# in a bundle directory with a process that sleeps for several seconds
$ funC start --id sleeper-1 &
$ funC state sleeper-1
{
"ociVersion": "1.0.0-rc1",
"id": "sleeper-1",
"status": "running",
"pid": 4422,
"bundlePath": "/containers/sleeper",
"annotations" {
"myKey": "myValue"
}
}
$ echo $?
0
```

### kill

[Send a signal][kill] to the container process.

* *Arguments*
* *`<ID>`* The container being signaled.
* *Options*
* *`--signal <SIGNAL>`* The signal to send (defaults to `TERM`).
The runtime MUST support `TERM` and `KILL` signals with [the POSIX semantics][posix-signals].
The runtime MAY support additional signal names.
On platforms that support [POSIX signals][posix-signals], the runtime MUST implement this command using POSIX signals.
On platforms that do not support POSIX signals, the runtime MAY implement this command with alternative technology as long as `TERM` and `KILL` retain their POSIX semantics.
Runtime authors on non-POSIX platforms SHOULD submit documentation for their TERM implementation to this specificiation, so runtime callers can configure the container process to gracefully handle the signals.
* *Standard streams:*
* *stdin:* The runtime MUST NOT attempt to read from its stdin.
* *stdout:* The handling of stdout is unspecified.
* *stderr:* The runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* *Exit code:* Zero if the signal was successfully sent to the container process and non-zero on errors.
Successfully sent does not mean that the signal was successfully received or handled by the container process.

#### Example

```
# in a bundle directory with a process ignores TERM
$ funC start --id sleeper-1 &
$ funC kill sleeper-1
$ echo $?
0
$ funC kill --signal KILL sleeper-1
$ echo $?
0
```

### delete

[Release](#delete) container resources after the container process has exited.

* *Arguments*
* *`<ID>`* The container to delete.
* *Standard streams:*
* *stdin:* The runtime MUST NOT attempt to read from its stdin.
* *stdout:* The handling of stdout is unspecified.
* *stderr:* The runtime MAY print diagnostic messages to stderr, and the format for those lines is not specified in this document.
* *Exit code:* Zero if the container was successfully deleted and non-zero on errors.

See [create](#example) for an example.

[bundle]: bundle.md
[controlling-terminal]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap11.html#tag_11_01_03
[create]: runtime.md#create
[delete]: runtime.md#delete
[exit_group.2]: http://man7.org/linux/man-pages/man2/exit_group.2.html
[ioctl.3]: http://pubs.opengroup.org/onlinepubs/9699919799/
[kill]: runtime.md#kill
[kill.2]: http://man7.org/linux/man-pages/man2/kill.2.html
[process]: config.md#process-configuration
[posix-encoding]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_02
[posix-lang]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
[posix-locale-encoding]: http://www.unicode.org/reports/tr35/#Bundle_vs_Item_Lookup
[posix_openpt.3]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_openpt.html
[posix-signals]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html#tag_13_42_03
[prctl.2]: http://man7.org/linux/man-pages/man2/prctl.2.html
[ptrace.2]: http://man7.org/linux/man-pages/man2/ptrace.2.html
[runtime]: glossary.md#runtime
[semver]: http://semver.org/spec/v2.0.0.html
[socket-queue]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_10_11
[socket-types]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_10_06
[socket.h]: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
[standard-streams]: https://github.com/opencontainers/specs/blob/v0.1.1/runtime-linux.md#file-descriptors
[start]: runtime.md#start
[state]: runtime.md#state
[state-request]: runtime.md#query-state
[systemd-listen-fds]: http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[TIOCSTI-security]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=628843
[tty_ioctl.4]: http://man7.org/linux/man-pages/man4/tty_ioctl.4.html
[unix-socket]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_10_17
[UTF-8]: http://www.unicode.org/versions/Unicode8.0.0/ch03.pdf
6 changes: 0 additions & 6 deletions runtime-linux.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,5 @@
# <a name="linuxRuntime" />Linux Runtime

## <a name="runtimeLinuxFileDescriptors" />File descriptors

By default, only the `stdin`, `stdout` and `stderr` file descriptors are kept open for the application by the runtime.
The runtime MAY pass additional file descriptors to the application to support features such as [socket activation](http://0pointer.de/blog/projects/socket-activated-containers.html).
Some of the file descriptors MAY be redirected to `/dev/null` even though they are open.

## <a name="runtimeLinuxDevSymbolicLinks" /> Dev symbolic links

After the container has `/proc` mounted, the following standard symlinks MUST be setup within `/dev/` for the IO.
Expand Down
11 changes: 11 additions & 0 deletions runtime.md
Original file line number Diff line number Diff line change
Expand Up @@ -132,3 +132,14 @@ Once a container is deleted its ID MAY be used by a subsequent container.
## Hooks
Many of the operations specified in this specification have "hooks" that allow for additional actions to be taken before or after each operation.
See [runtime configuration for hooks](./config.md#hooks) for more information.

## Container environment

The following sections discuss portions of the container environment which are not configurable via the [configuration JSON](config.md).
Beyond the sections below, there are additional platform-specific requirements for [Linux](runtime-linux.md).

### Standard streams

All OCI-specified runtime APIs MUST allow runtime-callers to specify the file descriptors which will be inherited by the container process, with the [POSIX *exec* caveat][exec] that if file descriptor 0, 1, or 2 would otherwise be closed, the container process may have unspecified files opened for those file descriptors.

[exec]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html
2 changes: 2 additions & 0 deletions schema/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@ The layout of the files is as follows:
* [config-linux.json](config-linux.json) - the [Linux-specific configuration sub-structure](../config-linux.md)
* [config-solaris.json](config-solaris.json) - the [Solaris-specific configuration sub-structure](../config-solaris.md)
* [config-windows.json](config-windows.json) - the [Windows-specific configuration sub-structure](../config-windows.md)
* [socket-terminal-request-schema.json](socket-terminal-request-schema.json) - the primary entrypoint for the [`terminal` request](../command-line-interface.md#requests)
* [socket-response-schema.json](socket-response-schema.json) - the primary entrypoint for the [`success` and `error` responses](../command-line-interface.md#responses)
* [state-schema.json](state-schema.json) - the primary entrypoint for the [state JSON](../runtime.md#state) schema
* [defs.json](defs.json) - definitions for general types
* [defs-linux.json](defs-linux.json) - definitions for Linux-specific types
Expand Down
24 changes: 24 additions & 0 deletions schema/socket-response-schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
{
"description": "Open Container Runtime Socket Response Schema",
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "https://opencontainers.org/schema/runtime/socket/response",
"type": "object",
"properties": {
"type": {
"id": "https://opencontainers.org/schema/runtime/socket/response/type",
"type": "string",
"enum": [
"success",
"error"
]
},
"message": {
"id": "https://opencontainers.org/schema/runtime/socket/response/message",
"description": "A phrase describing the response.",
"type": "string"
}
},
"required": [
"type"
]
}
24 changes: 24 additions & 0 deletions schema/socket-terminal-request-schema.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
{
"description": "Open Container Runtime Socket Terminal Request Schema",
"$schema": "http://json-schema.org/draft-04/schema#",
"id": "https://opencontainers.org/schema/runtime/socket/terminal-request",
"type": "object",
"properties": {
"type": {
"id": "https://opencontainers.org/schema/runtime/socket/terminal-request/type",
"type": "string",
"enum": [
"terminal"
]
},
"container": {
"id": "https://opencontainers.org/schema/runtime/state/id",
"description": "the container's ID",
"type": "string"
}
},
"required": [
"type",
"container"
]
}
13 changes: 10 additions & 3 deletions spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,16 @@

The [Open Container Initiative](http://www.opencontainers.org/) develops specifications for standards on Operating System process and application containers.


Protocols defined by this specification are:
* Linux containers: [runtime.md](runtime.md), [config.md](config.md), [config-linux.md](config-linux.md), and [runtime-linux.md](runtime-linux.md).
* Solaris containers: [runtime.md](runtime.md), [config.md](config.md), and [config-solaris.md](config-solaris.md).
* Windows containers: [runtime.md](runtime.md), [config.md](config.md), and [config-windows.md](config-windows.md).

* Linux containers: [runtime.md](runtime.md), [config.md](config.md), [config-linux.md](config-linux.md), [runtime-linux.md](runtime-linux.md), and at least one API.
* Solaris containers: [runtime.md](runtime.md), [config.md](config.md), [config-solaris.md](config-solaris.md), and at least one API.
* Windows containers: [runtime.md](runtime.md), [config.md](config.md), [config-windows.md](config-windows.md), and at least one API.

APIs defined by this specification are:

* The [command line API](command-line-interface.md).

# Table of Contents

Expand All @@ -15,6 +21,7 @@ Protocols defined by this specification are:
- [Filesystem Bundle](bundle.md)
- [Runtime and Lifecycle](runtime.md)
- [Linux-specific Runtime and Lifecycle](runtime-linux.md)
- [Runtime Command Line Interface](command-line-interface.md)
- [Configuration](config.md)
- [Linux-specific Configuration](config-linux.md)
- [Solaris-specific Configuration](config-solaris.md)
Expand Down
Loading

0 comments on commit d75087c

Please sign in to comment.