This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

kim built images are asynchronously replicated into k8s.io namespace #84

Open
milas opened this issue Oct 21, 2021 · 1 comment

Comments


milas commented Oct 21, 2021

Current Behavior

The kim agent watches for containerd image events:

func (a *Agent) syncImageContent(ctx context.Context, ctr *containerd.Client) {
	events, errors := ctr.EventService().Subscribe(ctx, `topic~="/images/"`)
	for {
		select {
		case <-ctx.Done():
			return
		case err, ok := <-errors:
			if !ok {
				return
			}
			logrus.Errorf("sync-image-content: %v", err)
		case evt, ok := <-events:
			if !ok {
				return
			}
			if evt.Namespace != buildkitNamespace {
				continue
			}
			if err := handleImageEvent(ctx, ctr, evt.Event); err != nil {
				logrus.Errorf("sync-image-content: handling %#v returned %v", evt, err)
			}
		}
	}
}

On image create/update events, the handler copies the new/updated image to the k8s.io namespace so that it's visible to CRI/usable by kubelet.

This all happens asynchronously / in its own goroutine. kim build is unaware this is happening and does not block on it.

Desired Behavior

It'd be nice to be able to (optionally?) wait for the sync to have finished when calling kim build to guarantee the image is ready for use.

Context

As it stands, it's possible to build an image with kim and attempt to use it in a Deployment before the sync has finished, resulting in errors/retries on the K8s side.

We're seeing this with Tilt, where we have a kim_build extension - Tilt calls kim and then applies the updated YAML to the cluster, resulting in some retries/backoff because the sync might not be done yet.

Contributor

dweomer commented Oct 22, 2021

I was thinking about this when I was working on #79 to fix #74 (one of the reasons I hadn't merged #79 yet: for the edge case I was attempting to fix, this asynchronicity became more pronounced). My idea to fix #79 is to refactor the content copy to happen on a calling context initiated by the client but still mediated by the backend agent, similar to how pull/fetch works. Then the default client implementation would be to block on copy progress (with a reasonable timeout), which would address the problem that you have encountered.
