Implement long running interactive session and sending build context incrementally #32677

tonistiigi · 2017-04-18T02:03:49Z

~~depends on #31984~~
fixes #31829

This PR adds a new POST /session endpoint to the daemon that can be used for running interactive long-running protocols between client and the daemon.

Not ready for code review yet, but open for feedback on the design. Currently, this is missing cache handling features and validations for the fs change stream.

Client calls this endpoint and exposes features that daemon can call back to. The first use case for this, also included in this PR, is the implementation to send incremental context changes for the builder. There is work going on to use this for accessing pull credentials. Later it could be used for accessing secrets from the client, transferring build artifacts back to client etc.

When client calls /session endpoint it requests upgrade to a different transport protocol it exposes possible callback features in the request headers. Only possible transport method currently supported is gRPC.

Client API package adds only one new method client.DialSession that is used for requesting a hijacked connection over the HTTP API. The other logic is in separate packages so that api package doesn't add any extra dependencies.

client/session (location tbd) contains the actual code for creating the interactive sessions. This is called by the cli and takes client.DialSession as a dependency. This package also defines the interfaces that need to be implemented to expose new features or new transports on top of the session.

The implementation for the incrementally sending filesystem changes is in client/session/fssession. This uses the fs changes stream algorithm from containerd that is modified to compare files from the client with the files from the previous build instead of comparing rwlayer to rolayer. It currently uses helper package tonistiigi/fsutil for the file checks, this package would likely go away in the future and the responsibility will be split by containerd and continuity repos.

fscache component implements the access point for the filesystem data has been transferred from external sources. After building the build context remains there so it can be used by the next invocations. GC routine of docker prune(tbd) can delete this data any time and would cause the client to resend full context next time.

Testing out:

The feature is behind experimental flag so to test it, daemon needs to be started with --experimental.

To invoke a builder with using the session stream instead of sending the tar context you have to set the --stream flag.

Currently, for testing individual components this will use the session but still transfer the files with a tar archive(so there shouldn't be meaningful performance gain). To use the protocol that can do rsync-like incremental updates, set env variable BUILD_STREAM_PROTOCOL to diffcopy.

# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon  109.3MB
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> a4c80257278d
Removing intermediate container 36029a58b3c4
Successfully built a4c80257278d

# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon      55B
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> Using cache
 ---> a4c80257278d
Successfully built a4c80257278d

# echo "foobar" > foo
# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon      90B
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> d50e63e1fe1d
Removing intermediate container 0561b43df53e
Successfully built d50e63e1fe1d

simonferquel · 2017-04-18T16:28:32Z

As discussed while working on credential pulls, the actual session ID should not be part of the docker remote context identifier: in the future we will want to leverage this session for other stuff than build context, and possibly we will have case where context streaming is off, but other features like credential pulls or copy-out would be required.
You can see a candidate design here: https://github.com/simonferquel/docker/tree/remote-context-credentials-ondemand.

Additionaly, client-side services seem a bit clunky both to declare and to discover. I would really like an api on client like: session.Expose(reflect.TypeOf((*myInterface)(nil)).Elem(), myImplementation) and on the backend, having svc, ok := session.Discover((reflect.TypeOf((*myInterface)(nil)).Elem()) (there might be a better way to get hold on an interface type though). That would certainly require describing shared interface in .proto files, but that would feel both more robust and cleaner than using strings for method names and maps of slices of strings for parameters).

dnephin · 2017-04-20T14:13:03Z

builder/dockerfile/builder.go

@@ -63,52 +70,94 @@ type Builder struct {
 	cacheBusted   bool
 	buildArgs     *buildArgs
 	escapeToken   rune
+	dockerfile    *parser.Result


I think maybe there was a bad merge here? This was just removed

tonistiigi · 2017-04-24T18:20:50Z

@simonferquel @dnephin @cpuguy83 @dmcgowan What's your thoughts on the transport interface for this? I only made the generic interface https://github.com/moby/moby/pull/32677/files#diff-8d234f28fa68cff9ee839315dbeaeb49R51 because I couldn't decide in favor of a single transport(grpc callbacks, websocket, single hijacked grpc stream, ssh etc). If we only support grpc then this could be described more with proto files. I still wouldn't prefer to use grpc generation directly because I think the callbacks should follow the object.method pattern instead of exposing functions. But we could solve it with a proto plugin that wraps grpc code and adds this and also automatically registers the headers for supported methods. With current testing, grpc seems to work quite well. One of the issues is that the proto parser currently does unnecessary allocations https://github.com/grpc/grpc-go/blob/38df39bad1892a821a4feac7f7506a19a13661f2/rpc_util.go#L231-L233 but that could be patched.

I also made another poc implementation using this, for exposing ssh signers to the builder with docker build --ssh . tonistiigi@a175773 Currently missing limiting auth socket to specific keys and forbidding key management requests.

stevvooe · 2017-04-24T18:44:44Z

client/session/grpctransport/grpc.go

+	return gc, nil
+}
+
+func (gc *grpcCaller) Call(ctx context.Context, id, method string, opt map[string][]string) (session.Stream, error) {


This is looking fairly low-level. This is a major risk in depending on grpc this tightly and will couple us with the internal implementation.

Is there any reason that interceptors won't work here?

On API level this doesn't depend on gRPC at all atm(but could change by the comment above). It only depends on the transport interface. The API exposes what method/transport combination is supported for the negotiation. If that is successful it is in gRPC control that they both use the same method type(currently it is fixed) and only configuration aspect passed in by the API is the URL that needs to match on both sides. I don't think there is a higher chance to break this because then it would break generated proto mismatches as well. If we would switch to gRPC only then this would go away and equivalent code will be generated from proto. Then the user needs to make sure that these proto files don't go out of sync.

Interceptors would let us not predefine the handlers and handle them lazily when a request comes in. That's not really an issue atm as the client would send the methods it supports anyway so that daemon components can make correct decisions.

@tonistiigi Sounds like I need a little more context here. Are we abstracting RPC calls to a central service some where?

@stevvooe The abstraction is just so that you can write isolated features(currently rsync context sending, client-side token queries and auth requests, ssh signing) that can be exposed through remote API, without the features knowing how they would be accessed on the wire. The remote API starts by making a POST request with exposed method names and then hijacks the tcp connection for a transport. The abstraction was added because no transport method seemed clearly better than rest. Currently, there are no plans to ship support for multiple transport implementations.

I am not sure we should abstract too much from gRPC. Actually we already have an hidden coupling to protocol buffer here (which seems worse than an explicit dependency) as the Stream interface implementation only works with proto messages (even if the interface does not hint to that). IMO, If we want a real transport abstraction, we should not have this restriction.
Additionaly, we already have the sync-ness of proto issue, at the message level. So I don't think it would be an issue to describe services in the proto files.

In another way, we could also want to abstract away from grpc, but still enforce protocol buffer as the message serialization mechanism. But if we go this way, we should not abstract the object stream, but only the way []byte payloads go accross the wire.

In a user code perspective, the transport API feels a bit too complex anyway. On the exposing side, the handler registration code seems overly complex (having to define a method, that takes a callback, that you must call for each method you want to expose feels so 80's in a world with introspection-enabled languages).
On the consuming side, having to call methods by putting their name in a string, passing arguments as a combination of a map of string slices and object streaming is much lower level that I would expect from a modern transport stack.

Ultimately what I want is a way to expose a simple request/response interface implementation like

type AuthDelegate interface{ GetBasicCredentials(registry string)(username, password string, err error) GetOAuthAccessToken(registry string, oauthParams OAuthParams) (accessToken string, err error) }

and for streaming based services like diffcopy, interfaces like:

type DiffCopyService interface{ Proceed(outputFilePackets chan<- FilePacket, inputFileContentRequests <-chan ContentRequest) error }

simonferquel · 2017-05-02T14:00:06Z

I have added 2 PRs directly on Tonis branch to fix issues on Windows:

Respect tar entries modes when rewriting them on Windows tonistiigi/docker#2 (which is basically a cherry-pick of Respect tar entries modes when rewriting them on Windows #32959, fixing issues with tarsums)
Fix an issue on Windows, where dockerfile is not removed from daemon-side build context tonistiigi/docker#3 which fixes an issue with dockerfile not being removed from the context on the daemon side. This should make all tests pass on Windows.

thaJeztah · 2017-05-02T22:37:26Z

Looks like this needs a rebase

tonistiigi · 2017-05-16T16:46:55Z

I've removed the WIP indicator, rebased after cli split and added the persistent cache and garbage collection. I removed the transport interface and replaced it with grpc only logic similar to what @simonferquel suggested. The discovery part is optional, independent layer. It is generated from proto but the grpc itself does not rely on it. gRPC itself only uses the generated proto code to connect a client with the handler.

# docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              3                   1                   479.5MB             479.5MB (100%)
Containers          2                   0                   0B                  0B
Local Volumes       0                   0                   0B                  0B
Build Cache                                                 178MB               178MB

I've separated out the implementation of client session itself and file syncing that is one example feature on top of that but they could be reviewed separately or the order which we use to merge actual features doesn't matter. I've also included the CLI update to ease testing, that I will remove before merge.

Base: master...tonistiigi:client-session-base-nocli
tonistiigi/docker-cli@master...tonistiigi:client-session-base-nocli

File syncing:
tonistiigi/docker@tonistiigi:client-session-base-nocli...client-session-fssession-nocli
tonistiigi/docker-cli@tonistiigi:client-session-base-nocli...client-session-fssession-nocli

@dnephin @simonferquel @dmcgowan @tiborvass

simonferquel · 2017-05-17T14:44:42Z

I am still not sure about the Supports method at the method level. If we want to keep it, we should at least encapsulate that in the fssession package (version negociation of client side services should happen at the protocol level). So from fscache we should just have a function call like:

fssyncProtocol, err := fssession.NegociateSyncProtocol(caller)

This way client code does not have to know about the notion of method url etc.

In term of versioning, I propose that we experiment a little with the following scenario:

Client and server have version n they both know only protocol a for file sync
Client and server have version n+1 they both know protocol a and b, and when they negotiate, they should use b as it is more efficient / provides more features
Client has version n and server has version n+1 they should negociate to find the best common protocol
Client has version n+1 and server has version n they should negociate to find the best common protocol

In the fs cache scenario, we have a quite easy scenario as both protocol provide the same value (with more or less efficiency), but if we take the exemple of authentication, we could have:

v1: only exposes GetRegistryAuthConfig(registry string) types.AuthConfig
v2: also exposes GetAccessToken(registry string, oauthParams OAuthParams) string

In the case where client cli only implements v1, a v2 daemon should implement v2 on top of a v1 client (getting oauth access token using AuthConfig provided by the v1 client), in a transparent manner

dnephin

LGTM

Testing nits can be handled later

dnephin · 2017-06-22T17:14:17Z

builder/fscache/fscache_test.go

+	defer fscache.Close()
+
+	err = fscache.RegisterTransport("test", &testTransport{})
+	assert.Nil(t, err)


Very minor nit: I generally use require.Nil(t, err) for errors. The difference is that assert is not fatal, it keeps running the rest of the function, where as require is fatal. In the case of errors I generally think it should end the test, because the state is probably going to make the rest of the assertions fail. Using assert here will make some test failures a bit harder to read, sometimes you end up with a nil panic after the assertions error.

I think it's fine for now.

dnephin · 2017-06-22T17:16:05Z

builder/fscache/fscache_test.go

+	assert.Nil(t, err)
+	assert.Equal(t, string(dt), "data")
+
+	// same id doesn't recalculate anything


It would be nice to have these as different cases so that the cause of failures are more obvious.

These do use the same state that is generated by previous cases.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

Also exposes shared cache and garbage collection/prune for the source data. Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

vieux · 2017-06-22T21:38:44Z

LGTM, let's take care the nits if needed in a following PR

GordonTheTurtle added the status/0-triage label Apr 18, 2017

vdemeester added status/1-design-review and removed status/0-triage labels Apr 18, 2017

vdemeester requested review from dnephin, tiborvass, simonferquel and thaJeztah April 18, 2017 10:16

tiborvass force-pushed the builder-remote-context-4 branch 2 times, most recently from 4bcb02b to 8a5114e Compare April 18, 2017 21:50

dnephin reviewed Apr 20, 2017

View reviewed changes

simonferquel mentioned this pull request Apr 21, 2017

[WIP] Implement long running interactive session and sending build context incrementally - with Windows fix #32760

Closed

stevvooe reviewed Apr 24, 2017

View reviewed changes

This was referenced May 1, 2017

Build Secrets #30637

Closed

builder: Proposal: Add support for RUN --mount #32507

Closed

simonferquel mentioned this pull request May 2, 2017

[WIP] Long running sessions with delegated authentication #32967

Closed

GordonTheTurtle assigned aluzzardi May 3, 2017

tonistiigi force-pushed the builder-remote-context-4 branch 2 times, most recently from d4a4fd1 to 60126d9 Compare May 3, 2017 04:54

simonferquel mentioned this pull request May 5, 2017

[WIP] Builder long running session - easier publishing and consuming #33047

Closed

tonistiigi force-pushed the builder-remote-context-4 branch 4 times, most recently from 65a7926 to cb5dfa6 Compare May 16, 2017 05:37

tonistiigi changed the title ~~[WIP] Implement long running interactive session and sending build context incrementally~~ Implement long running interactive session and sending build context incrementally May 16, 2017

dnephin approved these changes Jun 22, 2017

View reviewed changes

tonistiigi added 3 commits June 22, 2017 11:22

Add long-running client session endpoint

ec7b623

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

session: update swagger yaml

4f51ca1

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

vendor: update deps for fssession

7cfcf74

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

tonistiigi force-pushed the builder-remote-context-4 branch from f9f7f8a to 0dcbd38 Compare June 22, 2017 18:36

tonistiigi added 2 commits June 22, 2017 11:52

Implement incremental file sync using client session

5c3d2d5

Also exposes shared cache and garbage collection/prune for the source data. Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

Improve routes initialization

8f68adf

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

tonistiigi force-pushed the builder-remote-context-4 branch from 0dcbd38 to 8f68adf Compare June 22, 2017 18:58

vieux merged commit 050c1bb into moby:master Jun 22, 2017

tonistiigi mentioned this pull request Jun 22, 2017

Incrementally sending build context docker/cli#231

Merged

thaJeztah added impact/api impact/changelog impact/dockerfile labels Jun 22, 2017

thaJeztah mentioned this pull request Jun 22, 2017

Documentation updates for interactive sessions #33786

Merged

This was referenced Jun 23, 2017

Add prune command to cache manager moby/buildkit#25

Closed

Add mutable metadata for snapshots moby/buildkit#27

Closed

Port over interactive builder session moby/buildkit#35

Closed

Add path filtering to build session client #33859

Merged

thaJeztah mentioned this pull request Jul 6, 2017

com.docker.osxfs memory leak? docker/for-mac#1815

Closed

thaJeztah mentioned this pull request Jul 27, 2017

The build and copy steps take an absurdly long time #30414

Closed

thaJeztah mentioned this pull request Sep 25, 2017

docker build throws an error if a file anywhere in the build context tree can't be read by the current user, even if that file has nothing to do with the build. #34711

Closed

thaJeztah mentioned this pull request Oct 25, 2017

What is the meaning of "Uploading context" when executing docker build? #2342

Closed

dlorenc mentioned this pull request Mar 14, 2018

Only send the build context for changed files to the Docker daemon GoogleContainerTools/skaffold#195

Closed

tonistiigi mentioned this pull request Mar 23, 2018

Proposal: caching the context #9553

Closed

thaJeztah mentioned this pull request Jun 14, 2019

Proposal: add support for multiple (named) build-contexts #37129

Closed

This was referenced Oct 2, 2019

build: remove --stream (was experimental) docker/cli#2105

Merged

builder: remove legacy build's session handling #39983

Merged

thaJeztah added the kind/experimental label Jul 8, 2021

thaJeztah mentioned this pull request Jul 8, 2021

Update /system/df example swagger response #42605

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement long running interactive session and sending build context incrementally #32677

Implement long running interactive session and sending build context incrementally #32677

tonistiigi commented Apr 18, 2017 •

edited

Loading

simonferquel commented Apr 18, 2017

dnephin Apr 20, 2017

tonistiigi commented Apr 24, 2017

stevvooe Apr 24, 2017

tonistiigi Apr 24, 2017

stevvooe Apr 24, 2017

tonistiigi Apr 24, 2017

simonferquel May 3, 2017

simonferquel May 3, 2017

simonferquel commented May 2, 2017

thaJeztah commented May 2, 2017

tonistiigi commented May 16, 2017

simonferquel commented May 17, 2017

dnephin left a comment •

edited

Loading

dnephin Jun 22, 2017

dnephin Jun 22, 2017

tonistiigi Jun 22, 2017

vieux commented Jun 22, 2017

Implement long running interactive session and sending build context incrementally #32677

Implement long running interactive session and sending build context incrementally #32677

Conversation

tonistiigi commented Apr 18, 2017 • edited Loading

simonferquel commented Apr 18, 2017

Choose a reason for hiding this comment

tonistiigi commented Apr 24, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simonferquel commented May 2, 2017

thaJeztah commented May 2, 2017

tonistiigi commented May 16, 2017

simonferquel commented May 17, 2017

dnephin left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vieux commented Jun 22, 2017

tonistiigi commented Apr 18, 2017 •

edited

Loading

dnephin left a comment •

edited

Loading