Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement long running interactive session and sending build context incrementally #32677

Merged
merged 5 commits into from
Jun 22, 2017

Conversation

tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Apr 18, 2017

depends on #31984
fixes #31829

This PR adds a new POST /session endpoint to the daemon that can be used for running interactive long-running protocols between client and the daemon.

Not ready for code review yet, but open for feedback on the design. Currently, this is missing cache handling features and validations for the fs change stream.

Client calls this endpoint and exposes features that daemon can call back to. The first use case for this, also included in this PR, is the implementation to send incremental context changes for the builder. There is work going on to use this for accessing pull credentials. Later it could be used for accessing secrets from the client, transferring build artifacts back to client etc.

When client calls /session endpoint it requests upgrade to a different transport protocol it exposes possible callback features in the request headers. Only possible transport method currently supported is gRPC.

Client API package adds only one new method client.DialSession that is used for requesting a hijacked connection over the HTTP API. The other logic is in separate packages so that api package doesn't add any extra dependencies.

client/session (location tbd) contains the actual code for creating the interactive sessions. This is called by the cli and takes client.DialSession as a dependency. This package also defines the interfaces that need to be implemented to expose new features or new transports on top of the session.

The implementation for the incrementally sending filesystem changes is in client/session/fssession. This uses the fs changes stream algorithm from containerd that is modified to compare files from the client with the files from the previous build instead of comparing rwlayer to rolayer. It currently uses helper package tonistiigi/fsutil for the file checks, this package would likely go away in the future and the responsibility will be split by containerd and continuity repos.

fscache component implements the access point for the filesystem data has been transferred from external sources. After building the build context remains there so it can be used by the next invocations. GC routine of docker prune(tbd) can delete this data any time and would cause the client to resend full context next time.

Testing out:

The feature is behind experimental flag so to test it, daemon needs to be started with --experimental.

To invoke a builder with using the session stream instead of sending the tar context you have to set the --stream flag.

Currently, for testing individual components this will use the session but still transfer the files with a tar archive(so there shouldn't be meaningful performance gain). To use the protocol that can do rsync-like incremental updates, set env variable BUILD_STREAM_PROTOCOL to diffcopy.

# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon  109.3MB
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> a4c80257278d
Removing intermediate container 36029a58b3c4
Successfully built a4c80257278d

# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon      55B
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> Using cache
 ---> a4c80257278d
Successfully built a4c80257278d

# echo "foobar" > foo
# BUILD_STREAM_PROTOCOL=diffcopy docker build --stream .
Streaming build context to Docker daemon      90B
Step 1/2 : FROM busybox
 ---> d9551b4026f0
Step 2/2 : COPY . .
 ---> d50e63e1fe1d
Removing intermediate container 0561b43df53e
Successfully built d50e63e1fe1d

@simonferquel
Copy link
Contributor

As discussed while working on credential pulls, the actual session ID should not be part of the docker remote context identifier: in the future we will want to leverage this session for other stuff than build context, and possibly we will have case where context streaming is off, but other features like credential pulls or copy-out would be required.
You can see a candidate design here: https://github.com/simonferquel/docker/tree/remote-context-credentials-ondemand.

Additionaly, client-side services seem a bit clunky both to declare and to discover. I would really like an api on client like: session.Expose(reflect.TypeOf((*myInterface)(nil)).Elem(), myImplementation) and on the backend, having svc, ok := session.Discover((reflect.TypeOf((*myInterface)(nil)).Elem()) (there might be a better way to get hold on an interface type though). That would certainly require describing shared interface in .proto files, but that would feel both more robust and cleaner than using strings for method names and maps of slices of strings for parameters).

@tiborvass tiborvass force-pushed the builder-remote-context-4 branch 2 times, most recently from 4bcb02b to 8a5114e Compare April 18, 2017 21:50
@@ -63,52 +70,94 @@ type Builder struct {
cacheBusted bool
buildArgs *buildArgs
escapeToken rune
dockerfile *parser.Result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe there was a bad merge here? This was just removed

@tonistiigi
Copy link
Member Author

@simonferquel @dnephin @cpuguy83 @dmcgowan What's your thoughts on the transport interface for this? I only made the generic interface https://github.com/moby/moby/pull/32677/files#diff-8d234f28fa68cff9ee839315dbeaeb49R51 because I couldn't decide in favor of a single transport(grpc callbacks, websocket, single hijacked grpc stream, ssh etc). If we only support grpc then this could be described more with proto files. I still wouldn't prefer to use grpc generation directly because I think the callbacks should follow the object.method pattern instead of exposing functions. But we could solve it with a proto plugin that wraps grpc code and adds this and also automatically registers the headers for supported methods. With current testing, grpc seems to work quite well. One of the issues is that the proto parser currently does unnecessary allocations https://github.com/grpc/grpc-go/blob/38df39bad1892a821a4feac7f7506a19a13661f2/rpc_util.go#L231-L233 but that could be patched.

I also made another poc implementation using this, for exposing ssh signers to the builder with docker build --ssh . tonistiigi@a175773 Currently missing limiting auth socket to specific keys and forbidding key management requests.

return gc, nil
}

func (gc *grpcCaller) Call(ctx context.Context, id, method string, opt map[string][]string) (session.Stream, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking fairly low-level. This is a major risk in depending on grpc this tightly and will couple us with the internal implementation.

Is there any reason that interceptors won't work here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On API level this doesn't depend on gRPC at all atm(but could change by the comment above). It only depends on the transport interface. The API exposes what method/transport combination is supported for the negotiation. If that is successful it is in gRPC control that they both use the same method type(currently it is fixed) and only configuration aspect passed in by the API is the URL that needs to match on both sides. I don't think there is a higher chance to break this because then it would break generated proto mismatches as well. If we would switch to gRPC only then this would go away and equivalent code will be generated from proto. Then the user needs to make sure that these proto files don't go out of sync.

Interceptors would let us not predefine the handlers and handle them lazily when a request comes in. That's not really an issue atm as the client would send the methods it supports anyway so that daemon components can make correct decisions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi Sounds like I need a little more context here. Are we abstracting RPC calls to a central service some where?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevvooe The abstraction is just so that you can write isolated features(currently rsync context sending, client-side token queries and auth requests, ssh signing) that can be exposed through remote API, without the features knowing how they would be accessed on the wire. The remote API starts by making a POST request with exposed method names and then hijacks the tcp connection for a transport. The abstraction was added because no transport method seemed clearly better than rest. Currently, there are no plans to ship support for multiple transport implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure we should abstract too much from gRPC. Actually we already have an hidden coupling to protocol buffer here (which seems worse than an explicit dependency) as the Stream interface implementation only works with proto messages (even if the interface does not hint to that). IMO, If we want a real transport abstraction, we should not have this restriction.
Additionaly, we already have the sync-ness of proto issue, at the message level. So I don't think it would be an issue to describe services in the proto files.

In another way, we could also want to abstract away from grpc, but still enforce protocol buffer as the message serialization mechanism. But if we go this way, we should not abstract the object stream, but only the way []byte payloads go accross the wire.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a user code perspective, the transport API feels a bit too complex anyway. On the exposing side, the handler registration code seems overly complex (having to define a method, that takes a callback, that you must call for each method you want to expose feels so 80's in a world with introspection-enabled languages).
On the consuming side, having to call methods by putting their name in a string, passing arguments as a combination of a map of string slices and object streaming is much lower level that I would expect from a modern transport stack.

Ultimately what I want is a way to expose a simple request/response interface implementation like

type AuthDelegate interface{
 GetBasicCredentials(registry string)(username, password string, err error)
 GetOAuthAccessToken(registry string, oauthParams OAuthParams) (accessToken string, err error)
}

and for streaming based services like diffcopy, interfaces like:

type DiffCopyService interface{
 Proceed(outputFilePackets chan<- FilePacket, inputFileContentRequests <-chan ContentRequest) error
}

@simonferquel
Copy link
Contributor

I have added 2 PRs directly on Tonis branch to fix issues on Windows:

@thaJeztah
Copy link
Member

Looks like this needs a rebase

@tonistiigi tonistiigi force-pushed the builder-remote-context-4 branch 2 times, most recently from d4a4fd1 to 60126d9 Compare May 3, 2017 04:54
@tonistiigi tonistiigi force-pushed the builder-remote-context-4 branch 4 times, most recently from 65a7926 to cb5dfa6 Compare May 16, 2017 05:37
@tonistiigi tonistiigi changed the title [WIP] Implement long running interactive session and sending build context incrementally Implement long running interactive session and sending build context incrementally May 16, 2017
@tonistiigi
Copy link
Member Author

I've removed the WIP indicator, rebased after cli split and added the persistent cache and garbage collection. I removed the transport interface and replaced it with grpc only logic similar to what @simonferquel suggested. The discovery part is optional, independent layer. It is generated from proto but the grpc itself does not rely on it. gRPC itself only uses the generated proto code to connect a client with the handler.

# docker system df
TYPE                TOTAL               ACTIVE              SIZE                RECLAIMABLE
Images              3                   1                   479.5MB             479.5MB (100%)
Containers          2                   0                   0B                  0B
Local Volumes       0                   0                   0B                  0B
Build Cache                                                 178MB               178MB

I've separated out the implementation of client session itself and file syncing that is one example feature on top of that but they could be reviewed separately or the order which we use to merge actual features doesn't matter. I've also included the CLI update to ease testing, that I will remove before merge.

Base: master...tonistiigi:client-session-base-nocli
tonistiigi/docker-cli@master...tonistiigi:client-session-base-nocli

File syncing:
tonistiigi/docker@tonistiigi:client-session-base-nocli...client-session-fssession-nocli
tonistiigi/docker-cli@tonistiigi:client-session-base-nocli...client-session-fssession-nocli

@dnephin @simonferquel @dmcgowan @tiborvass

@simonferquel
Copy link
Contributor

I am still not sure about the Supports method at the method level. If we want to keep it, we should at least encapsulate that in the fssession package (version negociation of client side services should happen at the protocol level). So from fscache we should just have a function call like:

fssyncProtocol, err := fssession.NegociateSyncProtocol(caller)

This way client code does not have to know about the notion of method url etc.

In term of versioning, I propose that we experiment a little with the following scenario:

  • Client and server have version n they both know only protocol a for file sync
  • Client and server have version n+1 they both know protocol a and b, and when they negotiate, they should use b as it is more efficient / provides more features
  • Client has version n and server has version n+1 they should negociate to find the best common protocol
  • Client has version n+1 and server has version n they should negociate to find the best common protocol

In the fs cache scenario, we have a quite easy scenario as both protocol provide the same value (with more or less efficiency), but if we take the exemple of authentication, we could have:

  • v1: only exposes GetRegistryAuthConfig(registry string) types.AuthConfig
  • v2: also exposes GetAccessToken(registry string, oauthParams OAuthParams) string

In the case where client cli only implements v1, a v2 daemon should implement v2 on top of a v1 client (getting oauth access token using AuthConfig provided by the v1 client), in a transparent manner

Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Testing nits can be handled later

defer fscache.Close()

err = fscache.RegisterTransport("test", &testTransport{})
assert.Nil(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor nit: I generally use require.Nil(t, err) for errors. The difference is that assert is not fatal, it keeps running the rest of the function, where as require is fatal. In the case of errors I generally think it should end the test, because the state is probably going to make the rest of the assertions fail. Using assert here will make some test failures a bit harder to read, sometimes you end up with a nil panic after the assertions error.

I think it's fine for now.

assert.Nil(t, err)
assert.Equal(t, string(dt), "data")

// same id doesn't recalculate anything
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have these as different cases so that the cause of failures are more obvious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These do use the same state that is generated by previous cases.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Also exposes shared cache and garbage collection/prune
for the source data.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
@vieux
Copy link
Contributor

vieux commented Jun 22, 2017

LGTM, let's take care the nits if needed in a following PR

@vieux vieux merged commit 050c1bb into moby:master Jun 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Proposal: Incremental build context sending
10 participants