Merge branch 'boltdb' into dev #2562

Merged
merged 22 commits into from
Aug 12, 2020

@fenxiong fenxiong commented Aug 11, 2020

Summary

Merge branch 'boltdb' into dev.

Implementation details

No merge conflict.

Testing

Functional test/manual test on the boltdb branch.

Description for the changelog

Enhancement - Agent's internal state management mechanism is changed from a custom JSON state file to boltdb. This change reduces the agent's resource consumption, especially under high task density or a high task mutation rate.

Licensing

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

fenxiong and others added 22 commits June 17, 2020 11:52
Added the new data interface for data management in agent.
Added dummy boltdb implementation of the interface and its initialization.
Details:
* data: implemented SaveContainer, SaveDockerContainer, DelContainer, GetContainers, SaveTask, DelTask, GetTasks. Added a few helper functions to cover common boltdb interaction.
* api/container: added a new field TaskARN to allow easier generation of key when saving a container to db. This field will be populated with correct value in PostUnmarshalTask in a later code change.
* utils: added a helper function to get task id from task arn.
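As an illustration of the last two bullets, here is a minimal sketch of deriving a task ID from a task ARN and using it in a container key. The function names `taskIDFromARN` and `containerKey`, and the `<task id>-<container name>` key layout, are assumptions made for this example, not the names or format used in the agent. The sketch only relies on the task ID being the last `/`-separated segment of an ECS task ARN.

```go
package main

import (
	"errors"
	"strings"
)

// taskIDFromARN (hypothetical name) returns the last path segment of a task
// ARN, e.g. "abc123" for "arn:aws:ecs:us-west-2:123456789012:task/my-cluster/abc123".
func taskIDFromARN(taskARN string) (string, error) {
	if !strings.HasPrefix(taskARN, "arn:") {
		return "", errors.New("not an ARN: " + taskARN)
	}
	idx := strings.LastIndex(taskARN, "/")
	if idx < 0 || idx == len(taskARN)-1 {
		return "", errors.New("no task id in ARN: " + taskARN)
	}
	return taskARN[idx+1:], nil
}

// containerKey (hypothetical name and key layout) builds a database key for a
// container from the TaskARN field stored on it and the container name.
func containerKey(taskARN, containerName string) (string, error) {
	id, err := taskIDFromARN(taskARN)
	if err != nil {
		return "", err
	}
	return id + "-" + containerName, nil
}
```

Storing the task ARN on the container means the key can be computed from the container alone, without looking up its owning task first.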
Changes are made in following packages:
* acs/handler: save task to boltdb after adding the task to task engine;
* api/task: populate task arn to container in PostUnmarshalTask. this is used to generate database key when saving the container;
* app/agent: initialize boltdb data client and pass it to task engine, acs handler and event handler;
* data: added a no-op client which is used in testing and when ECS_CHECKPOINT is set to false;
* engine:
  - added a file data.go which covers interaction with boltdb;
  - task engine: remove task and containers data from database when cleaning up the task; added a method SetDataClient to set the client similar to how the state manager was set;
  - task manager: save task to boltdb when its desired/known status changes and when resource known status changes; save container to boltdb when its desired/known status changes and when updating its metadata. these are done in handleDesiredStatusChange, handleContainerChange and handleResourceStateChange in task_manager.go;
* eventhandler: save task/container in boltdb after updating their sent status.
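The split between a real data client and a no-op client can be sketched roughly as follows. This is not the agent's code: the `Task` struct is reduced to two fields, `memClient` is an in-memory stand-in for the boltdb-backed client so that the example needs only the standard library, and `newClient` is a hypothetical constructor. Only the method names `SaveTask`, `DelTask` and `GetTasks` come from the commit messages above; their signatures here are assumed.

```go
package main

import "sync"

// Task is a reduced stand-in for the agent's task type.
type Task struct {
	Arn         string
	KnownStatus string
}

// Client is an assumed shape for the data interface.
type Client interface {
	SaveTask(task *Task) error
	DelTask(arn string) error
	GetTasks() ([]*Task, error)
}

// noopClient discards every write; used in tests and when checkpointing is off.
type noopClient struct{}

func (noopClient) SaveTask(*Task) error       { return nil }
func (noopClient) DelTask(string) error       { return nil }
func (noopClient) GetTasks() ([]*Task, error) { return nil, nil }

// memClient is an in-memory stand-in for the boltdb-backed client.
type memClient struct {
	mu    sync.Mutex
	tasks map[string]*Task
}

func (c *memClient) SaveTask(t *Task) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tasks[t.Arn] = t
	return nil
}

func (c *memClient) DelTask(arn string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.tasks, arn)
	return nil
}

func (c *memClient) GetTasks() ([]*Task, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := make([]*Task, 0, len(c.tasks))
	for _, t := range c.tasks {
		out = append(out, t)
	}
	return out, nil
}

// newClient (hypothetical) picks the implementation from the checkpoint setting.
func newClient(checkpoint bool) Client {
	if !checkpoint {
		return noopClient{}
	}
	return &memClient{tasks: make(map[string]*Task)}
}
```

With this shape, callers such as the task manager can save on every status change without checking whether checkpointing is enabled; the no-op client makes those calls free.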
Save various metadata to the metadata bucket in boltdb. Details:
* acs/handler: made changes to save task manifest seq num to boltdb;
* app: made changes to save agent version, availability zone, cluster name, container instance arn and ec2 instance id to boltdb; removed a redundant unit test TestDoStartHappyPath from agent_unix_test.go as it is covered by TestDoStartRegisterAvailabilityZone in agent_test.go which is basically the same, and renamed the latter as TestDoStartHappyPath;
* data: implemented SaveMetadata and GetMetadata.
…dated ImageManager to use data Client instead of state manager to persist image states.
For a task in awsvpc network mode, the task engine state holds a mapping between the task's local ip address and the task, and the mapping is saved as part of the state via state manager. With migration to boltdb, this mapping is not saved. So to maintain this information in boltdb, the ip address is added as a field of the task struct and it is saved together with the task.
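One way to picture this: once the IP address is stored on the task, the IP-to-task mapping no longer needs its own persistence and can be rebuilt from the loaded tasks. The sketch below assumes a field named `LocalIPAddress` and a helper named `buildIPToTask`; both are hypothetical and the `Task` struct is reduced to the two fields that matter here.

```go
package main

// Task is a reduced stand-in for the agent's task type.
type Task struct {
	Arn            string
	LocalIPAddress string // assumed field name; empty for non-awsvpc tasks
}

// buildIPToTask (hypothetical) rebuilds the in-memory IP -> task ARN mapping
// from tasks loaded from the database.
func buildIPToTask(tasks []*Task) map[string]string {
	ipToTask := make(map[string]string)
	for _, t := range tasks {
		if t.LocalIPAddress == "" {
			continue // task is not in awsvpc mode, or has no address yet
		}
		ipToTask[t.LocalIPAddress] = t.Arn
	}
	return ipToTask
}
```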
…. Updated agent to use data Client instead of state manager to persist eni attachment data
Implemented logic for loading data from boltdb upon startup, while preserving backward compatibility by falling back to loading from state file. Details:
* app:
 - In data.go, implement method `loadData` that loads data from the previous data file, either boltdb or state file. In the latter case, data is migrated to boltdb after loading. Three cases are considered:
    1. Agent starts from fresh instance (no previous state):
       (1) Try to load from boltdb, get nothing;
       (2) Try to load from state file, get nothing;
       (3) Return empty data.

    2. Agent starts with previous state stored in boltdb:
       (1) Try to load from boltdb, get the data;
       (2) Return loaded data.

    3. Agent starts with previous state stored in state file (i.e. it was just upgraded from an old agent that uses state file):
       (1) Try to load from boltdb, get nothing;
       (2) Try to load from state file, get something;
       (3) Save loaded data to boltdb;
       (4) Return loaded data.

 - In agent.go, invoke `loadData` method to load data, replacing the existing few lines of code that uses state manager to load data.
 - Update a few unit tests to use actual task engine state instead of mock one because with the changes, it would be tedious to list all the expected calls to the engine state.
* engine: added a method SaveState which saves the whole task engine state to boltdb.
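The three startup cases described above can be sketched as a single fallback function. This is a simplified illustration, not the agent's `loadData`: the `savedData` type, the `loader` function type and the parameter names are assumptions, and the boltdb and state file readers are passed in as plain functions so the example needs only the standard library.

```go
package main

// savedData is a reduced stand-in for everything the agent restores on startup.
type savedData struct {
	TaskARNs []string
}

// empty reports whether nothing was loaded.
func (d *savedData) empty() bool {
	return d == nil || len(d.TaskARNs) == 0
}

// loader reads previously saved data from one source (boltdb or state file).
type loader func() (*savedData, error)

// loadData tries boltdb first and falls back to the state file. Data found
// only in the state file is migrated to boltdb before being returned.
func loadData(fromBolt, fromStateFile loader, saveToBolt func(*savedData) error) (*savedData, error) {
	data, err := fromBolt()
	if err != nil {
		return nil, err
	}
	if !data.empty() {
		return data, nil // case 2: previous state is already in boltdb
	}

	data, err = fromStateFile()
	if err != nil {
		return nil, err
	}
	if data.empty() {
		return &savedData{}, nil // case 1: fresh instance, nothing to load
	}

	// Case 3: just upgraded from an agent that used the state file.
	if err := saveToBolt(data); err != nil {
		return nil, err
	}
	return data, nil
}
```

Checking boltdb first means the migration in case 3 runs at most once: after it succeeds, later restarts take the case 2 path and never read the state file.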
Also added logic to save attachment sent status as part of task state change, in agent/eventhandler/task_handler.go.
Previous commit lowers the test coverage by 0.1% without obvious reason. Raising code coverage in a poorly covered package instead.
Remove unnecessary db saves and use batch for db update
@fenxiong fenxiong added this to the 1.44.0 milestone Aug 11, 2020
@fenxiong fenxiong marked this pull request as ready for review August 12, 2020 18:39
@fenxiong fenxiong requested a review from a team August 12, 2020 18:40
@fenxiong fenxiong merged commit 8c9cc5b into aws:dev Aug 12, 2020
5 participants