
Volume plugin changes to fix EFS unmount #4053

Merged: 13 commits merged into aws:dev from the volume-plugin-redesign branch on Dec 21, 2023


amogh09 (Contributor) commented Dec 8, 2023

Summary

This PR updates the Amazon ECS Volume Plugin, which provisions and deprovisions EFS volumes on ECS container instances. After this change, the plugin mounts volumes when it receives Mount requests and unmounts them when it receives Unmount requests, instead of doing so on Create and Remove requests, respectively.

For EFS tasks in awsvpc network mode, the task network namespace is torn down when the task stops. Later, at task cleanup time, the plugin receives a Remove request for the task's volumes, and unmounting the EFS volume at that point hangs. With this change, EFS volumes are unmounted from the host while the task is being stopped, when the task network namespace is still in place and configured.

After this change, Create, Mount, Unmount, and Remove requests will be handled by the plugin as described below.

  • Create - Create the mount path on the host and store information about the volume in plugin state, but don't mount the volume yet.
  • Mount - If the volume has no active mounts, mount it on the host. Add the mount to the volume's set of active mounts in plugin state.
  • Unmount - Remove the mount from the volume's set of active mounts in plugin state. If the volume's active mount count drops to zero, unmount the volume from the host.
  • Remove - If the volume is still mounted on the host, unmount it. This handles volumes that were created, and therefore mounted, by older versions of the plugin and are still in state. Then remove the mount path from the host and remove the volume from the plugin's state.
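The request handling above amounts to reference counting on each volume. The Go sketch below is minimal and rests on some assumptions. `Plugin`, `volumeInfo`, and the injected `mountFn`/`unmountFn` hooks are hypothetical names, not the plugin's actual code. The sketch also leaves out state persistence, mount-path creation and removal, and rollback.

```go
package main

import "fmt"

// volumeInfo is a hypothetical per-volume record kept in plugin state.
type volumeInfo struct {
	activeMounts map[string]struct{} // IDs of Mount requests currently using the volume
	mounted      bool                // whether the volume is mounted on the host
}

// Plugin is a hypothetical in-memory stand-in for the volume plugin.
// Host operations are injected so the logic can be exercised without EFS.
type Plugin struct {
	mountFn   func(name string) error // assumed host mount hook
	unmountFn func(name string) error // assumed host unmount hook
	volumes   map[string]*volumeInfo
}

func NewPlugin(mountFn, unmountFn func(string) error) *Plugin {
	return &Plugin{mountFn: mountFn, unmountFn: unmountFn, volumes: map[string]*volumeInfo{}}
}

// Create records the volume in state but does not mount it.
func (p *Plugin) Create(name string) error {
	if _, ok := p.volumes[name]; ok {
		return fmt.Errorf("volume %q already exists", name)
	}
	p.volumes[name] = &volumeInfo{activeMounts: map[string]struct{}{}}
	return nil
}

// Mount mounts on the host only for the first active mount, then records the ID.
func (p *Plugin) Mount(name, id string) error {
	v, ok := p.volumes[name]
	if !ok {
		return fmt.Errorf("volume %q not found", name)
	}
	if len(v.activeMounts) == 0 && !v.mounted {
		if err := p.mountFn(name); err != nil {
			return fmt.Errorf("mount %q: %w", name, err)
		}
		v.mounted = true
	}
	v.activeMounts[id] = struct{}{}
	return nil
}

// Unmount drops the ID and unmounts from the host once no active mounts remain.
func (p *Plugin) Unmount(name, id string) error {
	v, ok := p.volumes[name]
	if !ok {
		return fmt.Errorf("volume %q not found", name)
	}
	delete(v.activeMounts, id)
	if len(v.activeMounts) == 0 && v.mounted {
		if err := p.unmountFn(name); err != nil {
			return fmt.Errorf("unmount %q: %w", name, err)
		}
		v.mounted = false
	}
	return nil
}

// Remove unmounts a volume that is still mounted (for example, one mounted at
// Create time by an older plugin version) and then forgets it.
func (p *Plugin) Remove(name string) error {
	v, ok := p.volumes[name]
	if !ok {
		return fmt.Errorf("volume %q not found", name)
	}
	if v.mounted {
		if err := p.unmountFn(name); err != nil {
			return fmt.Errorf("unmount %q: %w", name, err)
		}
	}
	delete(p.volumes, name)
	return nil
}
```

Suppose two containers in one task share a volume. Only the first Mount and the last Unmount reach the host, and Remove then has nothing left to unmount.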

Implementation details

  • Moved the code for mounting an EFS volume from the Create method to the Mount method of AmazonECSVolumePlugin. Added relevant tests in the TestPluginMount function. Also added rollback steps to the Mount method so that its effects are atomic.
  • Added code for unmounting an EFS volume to the Unmount method of AmazonECSVolumePlugin. Added relevant tests in the TestPluginUnmount function. Also added rollback steps to the Unmount method so that its effects are atomic.
  • Updated the Remove method to unmount the volume only if it's currently mounted. Added a new IsMounted method to the VolumeDriver interface and implemented it for ECSVolumeDriver; it checks whether the volume is present in the driver's state.
  • Updated the Remove method of ECSVolumeDriver to swallow unmount errors caused by the volume not being mounted on the host. The method already swallowed errors containing "not mounted", but during my testing I found that umount also returns "no mount point specified" when the volume is not mounted. This keeps the Remove method from failing for volumes that are already unmounted.
  • Updated the Create method of ECSVolumeDriver so that it no longer checks its internal state for the volume's mount state before mounting the requested volume. ECSVolumeDriver does not track active mounts on a volume. If the plugin restarts, a volume that is unmounted but not yet removed reappears in the driver's state, and with the check in place the driver would refuse the mounts requested by the Mount request handler. The check serves no purpose currently, because the driver's internal state is completely controlled by AmazonECSVolumePlugin, which has its own checks to prevent duplicate mount attempts. In the future we should make ECSVolumeDriver completely stateless and handle all state in AmazonECSVolumePlugin only, but that's out of scope for this PR.

Testing

Did extensive manual testing with the new plugin and verified that:

  • awsvpc tasks clean up without issues ✅
  • Host mode tasks work fine ✅
  • Bridge mode tasks work fine ✅
  • The new version of the plugin is able to unmount and remove volumes created by an older version of the plugin ✅
  • A volume plugin restart between the Unmount and Remove requests for a volume doesn't cause an error during the Remove request ✅
  • The new plugin version is able to handle restarts without any issues ✅
  • The old version of the plugin is compatible with the state file created by the new version ✅
  • Tasks with multiple containers with EFS volumes work as expected ✅
  • Tasks with multiple EFS volumes work as expected ✅
  • A task with EFS volumes can be run multiple times on the same container instance without any issues ✅
  • Rollback works as expected for Mount requests. For this test I deleted the parent directory of the plugin's state file directory to make state save operations fail. ✅

A new functional test has been added that runs an awsvpc EFS task twice in succession on an instance with a cleanup duration of 1 second. The test fails on instances with the current version of the volume plugin but passes on instances with the version of the volume plugin in this PR.

Ran the Agent functional tests on instances with the new version of the volume plugin installed. All tests passed.

Added many unit tests.

New tests cover the changes: yes

Description for the changelog

Bugfix: Fix EFS unmount hanging issue for awsvpc tasks

Does this PR include breaking model changes? If so, have you added transformation functions?

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

amogh09 force-pushed the volume-plugin-redesign branch 3 times, most recently from 19d8343 to 534eac3, on December 15, 2023 16:21
amogh09 changed the title from "[WIP] Volume plugin changes to fix EFS unmount" to "Volume plugin changes to fix EFS unmount" on Dec 15, 2023
amogh09 marked this pull request as ready for review December 18, 2023 17:21
amogh09 requested a review from a team as a code owner December 18, 2023 17:21
sparrc (Contributor) commented Dec 18, 2023

“Handle mounting and unmounting of EFS volumes at container start and stop instead of task start and cleanup, respectively, to fix EFS volume unmounting issue for awsvpc tasks.”

Bit of a nitpick, but for the changelog I would word this as: Bugfix: <briefly describe bug being fixed here>

sparrc previously approved these changes Dec 18, 2023
ecs-init/volumes/ecs_volume_plugin.go (resolved)
ecs-init/volumes/ecs_volume_plugin.go (resolved)
ecs-init/volumes/ecs_volume_plugin.go (resolved)
ecs-init/volumes/ecs_volume_plugin.go (outdated, resolved)
amogh09 merged commit d995258 into aws:dev Dec 21, 2023
38 checks passed
mye956 mentioned this pull request Jan 9, 2024