-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add host resource manager methods #3700
Add host resource manager methods #3700
Conversation
"github.com/aws/amazon-ecs-agent/agent/ecs_client/model/ecs" | ||
"github.com/aws/amazon-ecs-agent/agent/logger" | ||
"github.com/aws/amazon-ecs-agent/agent/utils" | ||
) | ||
|
||
// TODO remove this once resource, consume are used | ||
//lint:file-ignore U1000 Ignore all unused code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This static check directive will be removed in a future change once consume and release are being used.
// false, err -> resources map has errors, something went wrong | ||
// true, nil -> successfully consumed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add what's going to happen to the task in those two scenarios, similar to the first case?
// false, nil -> did not consume, should stay pending | ||
// false, err -> resources map has errors, something went wrong | ||
// true, nil -> successfully consumed | ||
func (h *HostResourceManager) consume(taskArn string, resources map[string]*ecs.Resource) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be exported (and also "release" below?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see these functions to be used only in engine package. Unless there are good reasons to export them, I believe we can keep them as it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we put host resource manager in its own package? How are consumers of host resource manager supposed to know what methods form its public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We put it in engine (and not in it's own package) because we only see HostResourceManager
as being used to enable task scheduling, specifically in task_manager
and docker_task_engine
, and hence did not export the methods as well.
h.hostResourceManagerRWLock.Lock() | ||
defer h.hostResourceManagerRWLock.Unlock() | ||
|
||
defer logger.Debug("Consumed resources after task consume call", logger.Fields{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - can probably have a printResources
method that can pretty print a resource map
}) | ||
|
||
// Check if already consumed | ||
_, ok := h.consumedResource[taskArn] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we mean to check taskConsumed
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching it! will correct that
cpuResource, ok := resources["CPU"] | ||
if ok { | ||
if *(h.initialHostResource["CPU"].IntegerValue) < *(h.consumedResource["CPU"].IntegerValue)+*(cpuResource.IntegerValue) { | ||
return false, nil | ||
} | ||
} else { | ||
return false, fmt.Errorf("No CPU in task resources") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have multiple integer type resources, consider having a common checkConsumableIntType(resourceName string, resources map[string]*ecs.Resource)
Similar for string set type.
return false, nil | ||
} | ||
} else { | ||
return false, fmt.Errorf("No MEMORY in task resources") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who's going to be making sure the key is going to be present? the "converter" that takes a task and outputs the task resources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output of ToHostResources(#3706) will be used here, and that will always all these entries
defer logger.Debug("Consumed resources after task release call", logger.Fields{ | ||
"taskArn": taskArn, | ||
"CPU": *h.consumedResource["CPU"].IntegerValue, | ||
"MEMORY": *h.consumedResource["MEMORY"].IntegerValue, | ||
"PORTS_TCP": h.consumedResource["PORTS_TCP"].StringSetValue, | ||
"PORTS_UDP": h.consumedResource["PORTS_UDP"].StringSetValue, | ||
"GPU": *h.consumedResource["GPU"].IntegerValue, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a helper print method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will create the recommended printResources
method
cpuResource, ok := resources["CPU"] | ||
if ok { | ||
if *(h.initialHostResource["CPU"].IntegerValue) < *(h.consumedResource["CPU"].IntegerValue)+*(cpuResource.IntegerValue) { | ||
return false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a log here to indicate which resource type is the task blocked on
portsResource, ok = resources["PORTS_UDP"] | ||
if ok { | ||
taskPortsSlice := portsResource.StringSetValue | ||
for _, port := range taskPortsSlice { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All consumed ports for a single task are going to be in a contiguous sub-slice in consumedResource["PORTS_UDP"]
, since we add them together. Therefore, instead of removing ports one at a time, I think we can find the starting and ending index and remove all at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of implementing as a generic slice check, but can get this optimization in as well
initialHostResource map[string]*ecs.Resource | ||
consumedResource map[string]*ecs.Resource | ||
initialHostResource map[string]*ecs.Resource | ||
consumedResource map[string]*ecs.Resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of a meta question here. IIUC the keys to these two maps are resource types such as CPU, memory, and ports. Is the full range of resource types managed by HostResourceManager
type static or dynamic? If it is static, then have we thought about adding dedicated fields for each resource type and using new easy-to-use types? ecs.Resource
type seems quite error prone as it is not well typed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be 5 resource types accounted for in HostResourceManager currently (cpu, memory, ports_tcp, ports_udp, gpu). We are choose ecs.Resource
type as this is the same format agent uses to report host resources to ecs when registering instance (here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. IIUC the goal of HostResourceManager
is to manage the state of resources and ecs.Resource
type doesn't seem to be easy-to-use for that purpose. It is probably designed for communicating the resources state to the backend.
Also since each resource has its own mutation function (CPU is a continuous variable while ports are a set), it might be worthwhile to define an interface for resource and handle the state changes in the implementations.
This is a non-blocking comment and I am just sharing my opinion. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree a separate type for HostResourceManager
can be a bit more useful. On the other hand I also see that by re-using ecs.Resource
we are able to piggy back on an existing implementation which could be easy in it's own way as it is declared to define ecs resources (albeit to communicate to backend, perhaps). And using a map, accessing values becomes fairly easy.
If it is not much of a hassle, I will stay with ecs.Resource
implementation, as previous, parallel (and future) work is already using this type. But I will be willing to change this upon recommendation :)
d2831ee
to
0664d5f
Compare
return true, nil | ||
} | ||
|
||
ok, err := h.consumable(resources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it typically mean when we have 1 resource type errored out of the 5 resource types, and the other ones were fine? What will be our action when this happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should never happen, but putting in as a check. The task should fail to start with incorrect resource conversion as put in the comment above consume // false, err -> resources map has errors, task should fail as cannot schedule with 'wrong' resource map (this basically never happens)
// false, nil -> did not consume, should stay pending | ||
// false, err -> resources map has errors, something went wrong | ||
// true, nil -> successfully consumed | ||
func (h *HostResourceManager) consume(taskArn string, resources map[string]*ecs.Resource) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we put host resource manager in its own package? How are consumers of host resource manager supposed to know what methods form its public API?
58844e5
to
b9eb126
Compare
} | ||
|
||
// Checks all resources exists and their values are not nil | ||
func checkResourcesHealth(resources map[string]*ecs.Resource) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we necessarily want to check that all resource types exist in the map. Technically, for tasks that don't consume certain resource types, it should be okay for them to leave it out.
We might want to do check, instead:
for each key,val in resources:
does key exist in host resources? i.e. is it a recognized resource type? if not, throw error
if valid key, then check if value type matches
Lastly, check if the specific type of value is not nil
for ; end >= 0; end-- { | ||
if *s1[end] == *s2[len(s2)-1] { | ||
break | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have end = begin + len(s2)
?
f65e6f5
to
44f0d4d
Compare
44f0d4d
to
dbb2239
Compare
This reverts commit 40590fe.
This reverts commit bec1303.
* Revert "Revert "host resource manager initialization"" This reverts commit dafb967. * Revert "Revert "Add method to get host resources reserved for a task (#3706)"" This reverts commit 8d824db. * Revert "Revert "Add host resource manager methods (#3700)"" This reverts commit bec1303. * Revert "Revert "Remove task serialization and use host resource manager for task resources (#3723)"" This reverts commit cb54139. * Revert "Revert "add integ tests for task accounting (#3741)"" This reverts commit 61ad010. * Revert "Revert "Change reconcile/container update order on init and waitForHostResources/emitCurrentStatus order (#3747)"" This reverts commit 60a3f42. * Revert "Revert "Dont consume host resources for tasks getting STOPPED while waiting in waitingTasksQueue (#3750)"" This reverts commit 8943792.
* Revert reverted changes for task resource accounting (#3796) * Revert "Revert "host resource manager initialization"" This reverts commit dafb967. * Revert "Revert "Add method to get host resources reserved for a task (#3706)"" This reverts commit 8d824db. * Revert "Revert "Add host resource manager methods (#3700)"" This reverts commit bec1303. * Revert "Revert "Remove task serialization and use host resource manager for task resources (#3723)"" This reverts commit cb54139. * Revert "Revert "add integ tests for task accounting (#3741)"" This reverts commit 61ad010. * Revert "Revert "Change reconcile/container update order on init and waitForHostResources/emitCurrentStatus order (#3747)"" This reverts commit 60a3f42. * Revert "Revert "Dont consume host resources for tasks getting STOPPED while waiting in waitingTasksQueue (#3750)"" This reverts commit 8943792. * fix memory resource accounting for multiple containers in single task (#3782) * fix memory resource accounting for multiple containers * change unit tests for multiple containers, add unit test for awsvpc
Summary
HostResourceManager
keeps account of task resources for tasks running/scheduled to run on the instance Agent is running on. It will keep an account of cummulativeCPU
,MEMORY
,PORTS_TCP
,PORTS_UDP
andGPU
which are the same resources registered with ECS backend to schedule tasks on ECS instances. The overarching goal is to start a task on an instance as soon as the resources for it become available on the instance, and these resources will be kept track of by this manager.A previous PR (#3684) added initialization for
HostResourceManager
.This PR further adds following methods:
consume
method to keep account of resources of tasks being scheduled to startconsumable
method which given a set of resources returns whether those resources can be 'consumed' by thatHostResourceManager
release
method to keep account of resources of tasks being stopped and releasing resources and adding back to pool of available resourcesThese methods are being added here. These will be used to schedule tasks according to above description in further changes.
Related Containers Roadmap Issue
aws/containers-roadmap#325
Testing
New tests cover the changes: Yes
Description for the changelog
Add host resource manager methods
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.