Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use resolved digest for image pulls #4157

Merged
merged 1 commit into from
May 7, 2024

Conversation

amogh09
Copy link
Contributor

@amogh09 amogh09 commented Apr 25, 2024

Summary

This PR updates image pull logic to use a resolved image manifest digest if one is available. Image manifest digests are resolved during container transition to MANIFEST_PULLED state. The change ensures that the pulled image is the same as pointed by the resolved digest.

Implementation details

  • Add a new method TagImage to DockerClient interface and its implementation. The method tags an image on the host. The implementation performs retries using a new ConstantBackoff strategy that backs off the same duration every time. A constant backoff retry strategy is fine in this case as there is no external service involved.
  • Add a new ConstantBackoff backoff strategy under ecs-agent module. The strategy returns the same amount of backoff duration regardless of how many times its Duration method is called.
  • Update *DockerTaskEngine.pullAndUpdateContainerReference method that is used for pulling container images so that it uses the container's ImageDigest field to prepare a canonical reference to the image to be pulled. The method tags the pulled image with Container.Image if a different image reference was used to pull the image so that image caching and image cleanup continue to work as before.
  • Add a new method GetCanonicalRef to agent/utils/reference package that returns a canonical image reference given an image reference and a manifest digest.
  • Test updates and new tests.

Testing

  • Added a new integration test named TestPullContainerWithAndWithoutDigestInteg to check that *DockerTaskEngine.pullContainer can pull images for containers with and without an ImageDigest set.
  • Added a new integration test named TestPullContainerWithAndWithoutDigestConsistency to check that *DockerTaskEngine.pullContainer pulls the same image with or without a digest set and the image can be inspected with container.Image field in both cases.

In addition to the integration tests above, performed the following manual tests.

  • Ran a variety of tasks with Agent configured to use always image pull behavior. Checked that all tasks ran as expected. Images were pulled using digests and tagged with the image reference in the task definition. Images were cleaned up without any issues.
  • Ran a variety of tasks with Agent configured to use once and then prefer-cached image pull behavior. Checked that all tasks ran as expected. Cached images were used in both cases when found. Image cleanup worked as expected with once image pull behavior. Image pull is disabled when prefer-cached image pull behavior is used.
  • Ran a simple task multiple times with an Agent built with changes in this PR and again with an Agent built against master branch. Both Agents were configured to use always image pull behavior to force image pulls. Measured the task average start times (startedAt - createdAt) and task pull times (pullStoppedAt - pullStartedAt) for both cases. Changes to resolve image manifest digest in Implement transition function for MANIFEST_PULLED state #4152 caused an additional delay in task start times that ranged from 300ms (ECR) to 900ms (Dockerhub), however, with this PR the image pulls are now slightly faster. Pull time for an image that's already available on the host is reduced from ~700ms (Dockerhub), ~250ms (public ECR), and ~130ms (private ECR) to ~260ms (Dockerhub), ~100ms (public ECR), and ~50ms (private ECR). Combined with changes to resolve image manifest digests (Implement transition function for MANIFEST_PULLED state #4152) the overall average increase in task start time in my test environment is ~500ms (Dockerhub) and ~150ms (ECR).

New tests cover the changes: yes

Description for the changelog

Does this PR include breaking model changes? If so, Have you added transformation functions?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@amogh09 amogh09 changed the base branch from master to feature/digest-resolution April 25, 2024 00:10
@amogh09 amogh09 force-pushed the pull-with-digest branch 4 times, most recently from 122f9e0 to 78a1bfa Compare April 29, 2024 16:55
@amogh09 amogh09 changed the title Pull with digest Use resolved digest for image pulls Apr 29, 2024
@amogh09 amogh09 marked this pull request as ready for review April 29, 2024 21:47
@amogh09 amogh09 requested a review from a team as a code owner April 29, 2024 21:47
// Given an image reference and a manifest digest string, returns a canonical reference
// for the image.
// If the image reference has a digest then the canonical reference will still use the provided
// manifest digest overwriting the exiting digest in the image reference.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (non-blocking): typo in exiting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix this in a follow-up PR right after.

// Tag the pulled image so that it can be found using the image reference in the task.
ctx, cancel := context.WithTimeout(engine.ctx, tagImageTimeout)
defer cancel()
err := engine.client.TagImage(ctx, imageRef, container.Image)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are retagging the image with mutable tags (such as nginx:latest), and we have a subsequent task whose container uses the same image but using digest (such as nginx:shaXYZ)

  1. will the new task be able to discover that the local nginx:latest is identical to nginx:shaXYZ and reuse it?
  2. will Agent be able to keep the correct image reference counter on nginx:latest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @yinyic .

  1. will the new task be able to discover that the local nginx:latest is identical to nginx:shaXYZ and reuse it?

If nginx:latest still resolves to nginx:shaXYZ then docker image pull nginx:shaXYZ will behave just like docker image pull nginx:latest and pull will end after docker determines that all image layers are already present on the host. If nginx:shaXYZ is a different image than locally available nginx:latest then Docker will pull the image from the repository and two images will be available on the host after that.

In case of prefer-cached image pull behavior, Agent will do a docker image inspect nginx:shaXYZ. If previously pulled nginx:latest image had shaXYZ digest then inspect will succeed and Agent will use the local image again otherwise inspect will fail and Agent will pull nginx:shaXYZ image.

In case of once image pull behavior, Agent will not find nginx:shaXYZ image in its internal cache and pull the image. After the pull Agent will add nginx:shaXYZ to its internal cache.

The behavior explained above is the same as it exists currently and this PR does not intend to change the "net" image pull behavior of Agent.

  1. will Agent be able to keep the correct image reference counter on nginx:latest?

Agent's image reference counting will continue to work as before. If nginx:shaXYZ and nginx:latest image on the host have the same image ID then Agent will have two references for the same underlying image on the host. If nginx:shaXYZ has a different image ID than local nginx:latest then Agent will have one reference each for the two images. This is the current Agent behavior and this PR does not intend to make any changes to it.

@amogh09 amogh09 merged commit d2ae88e into aws:feature/digest-resolution May 7, 2024
40 checks passed
amogh09 added a commit that referenced this pull request May 7, 2024
amogh09 added a commit that referenced this pull request May 7, 2024
amogh09 added a commit that referenced this pull request May 13, 2024
amogh09 added a commit that referenced this pull request May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants