Deriving resources from a scope #4

eleon · 2021-07-06T23:42:31Z

We need a function to add resources to a given scope. For example, if a user starts with an empty scope, she may want to add resources assigned to her that are part of the user scope.

int qv_scope_add(QV_handle qv,
	       QV_scope_t scope,
	       QV_obj_type_t obj,
	       int num_objs);

This function adds num_objs objects of type obj to the input scope.

eleon · 2021-10-12T23:28:00Z

Actually, rather than having a function to add resources to a scope, we need a function to derive or extract resources from an existing scope (and thus from resources owned). This function can be used when individual workers want to launch work on a specific set of resources.

int
qv_scope_extract(
    qv_context_t *ctx,
    qv_scope_t *scope,
    qv_hw_obj_type_t obj_type,
    int num_objs,
    int hint, 
    qv_scope_t **subscope
);

This function creates subscope with num_objs objects of type obj_type from scope. The hint parameter tells QV about a desired trait, such as proximity to a NIC. We still need to determine which hints would be useful and how to represent them.

eleon · 2021-10-19T22:38:48Z

We should keep track of what objects are given so that we do not give them again until they are freed. We can use scope_free to free the resources associated with a scope. I am not sure we would need qv_scope_nobjs_free(qv, scope, obj_type, nobjs).
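The tracking idea above could be sketched as follows. This is a minimal, hypothetical illustration and not QV code: `core_tracker_t`, `tracker_acquire`, and `tracker_release` are names invented here, cores are identified by their index within the parent scope, and the sketch assumes at most 64 cores. What should happen when too few cores are free is a policy choice; this version simply grants nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tracker (not part of the QV API): one bit per core in a
 * parent scope. A set bit means the core belongs to an active subscope. */
typedef struct {
    int ncores;      /* Number of cores in the parent scope (at most 64). */
    uint64_t in_use; /* Bit i set => core i is currently handed out. */
} core_tracker_t;

/* Hands out nobjs free cores, lowest indices first. Returns a mask of the
 * granted cores, or 0 if fewer than nobjs cores are free (in which case
 * nothing is reserved). */
static uint64_t
tracker_acquire(core_tracker_t *t, int nobjs)
{
    assert(t->ncores > 0 && t->ncores <= 64);
    if (nobjs <= 0) return 0;

    uint64_t granted = 0;
    int found = 0;
    for (int i = 0; i < t->ncores && found < nobjs; ++i) {
        const uint64_t bit = UINT64_C(1) << i;
        if (!(t->in_use & bit)) {
            granted |= bit;
            ++found;
        }
    }
    if (found < nobjs) return 0;
    t->in_use |= granted;
    return granted;
}

/* Returns cores to the pool, as freeing the associated subscope would. */
static void
tracker_release(core_tracker_t *t, uint64_t granted)
{
    t->in_use &= ~granted;
}
```

With a tracker like this, releasing everything a subscope holds in one call is enough, which is consistent with the doubt above about needing a separate per-object free function.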

eleon · 2021-10-21T21:19:22Z

New suggestion for the function name: qv_subscope_create()

samuelkgutierrez · 2022-05-13T15:44:21Z

@eleon, to maintain tracking on this issue please see progress in c460005. We can discuss the name once we are settled on its functionality.

eleon · 2022-05-13T17:36:43Z

Thank you, @samuelkgutierrez! The semantics look good based on test-mpi-scopes.c. Perhaps we can create an additional test that focuses solely on qv_scope_create. I'm thinking of testing that once a resource is given through this function, it is not given again (unless the scope is freed), and that GPUs are given correctly as well. As soon as I get an opportunity, I can write the test.

eleon · 2022-11-22T21:06:38Z

@samuelkgutierrez, I am trying out test-mpi-scopes and found strange behavior from the qv_scope_split operation. I am using a 2-socket architecture, each socket with 18 SMT-2 cores. I would have thought that splitting the node with 2 tasks would have resulted in one socket (18 cores) per task, but that is not the case:

leon@pascal30:qv$ QV_PORT=55996 srun -N1 -n2 quo-vadis/build-pascal/tests/test-mpi-scopes
[1] self_scope taskid is 0
[1] self_scope ntasks is 1
[0] self_scope taskid is 0
[0] self_scope ntasks is 1
[0] base_scope taskid is 0
[0] base_scope ntasks is 2
[1] base_scope taskid is 1
[1] base_scope ntasks is 2
[1] Number of PUs in base_scope is 72
[1] base GID is 1
[0] Number of PUs in base_scope is 72
[0] base GID is 0
[1] Number of PUs in sub_scope is 18
[1] sub_scope taskid is 0
[1] sub_scope ntasks is 1
[0] Number of PUs in sub_scope is 36
[0] sub_scope taskid is 0
[0] sub_scope ntasks is 1
[0] New cpubind is     0-17,36-53
[1] New cpubind is     9-17,45-53
[0] Popped cpubind is  0-17
[1] Popped cpubind is  18-35
[0] Number of PUs in create_scope is 2
[0] create_scope taskid is 0
[0] create_scope ntasks is 1
[1] Number of PUs in sub_sub_scope is 9
[0] Number of PUs in sub_sub_scope is 18

The strange behavior is apparent here:

[1] Number of PUs in sub_scope is 18
[0] Number of PUs in sub_scope is 36

eleon · 2022-11-22T21:08:03Z

Perhaps I need the AFFINITY_PRESERVING flag? If so, shouldn't it be the default? I guess I need to read a bit more to fully understand. I think this behavior may be related to Issue #9 rather than to this issue.
Sorry for the detour, I am now focusing on qv_scope_create :)

eleon · 2022-11-22T23:30:29Z

Tested qv_scope_create and added associated test test-mpi-scope-create.c
It works! Thanks, @samuelkgutierrez.
The only issue is that when a set of cores has been assigned to a scope, they can be reassigned to another scope even if the original scope has not been released:

leon@pascal30:qv$ QV_PORT=55996 srun -N1 -n2 quo-vadis/build-pascal/tests/test-mpi-scope-create 
[0] Base scope w/36 cores, running on 0-17
[1] Base scope w/36 cores, running on 18-35

===Scope split===
=> [0] Split: got 18 cores, running on 0-17,36-53
=> [1] Split: got 18 cores, running on 18-35,54-71

===Asking and not releasing 1,10 core scopes===

===Scope w/1 cores===
=> [0] Core scope: got 1 cores, running on 0,36
=> [1] Core scope: got 1 cores, running on 18,54
[0] Popped up to 0-17,36-53
[1] Popped up to 18-35,54-71

===Scope w/10 cores===
=> [0] Core scope: got 10 cores, running on 0-9,36-45
=> [1] Core scope: got 10 cores, running on 18-27,54-63
[0] Popped up to 0-17,36-53
[1] Popped up to 18-35,54-71

===Asking and releasing 5-core scopes===

===Scope w/5 cores===
=> [1] Core scope: got 5 cores, running on 18-22,54-58
=> [0] Core scope: got 5 cores, running on 0-4,36-40
[1] Popped up to 18-35,54-71
[0] Popped up to 0-17,36-53

===Scope w/5 cores===
=> [0] Core scope: got 5 cores, running on 0-4,36-40
=> [1] Core scope: got 5 cores, running on 18-22,54-58
[0] Popped up to 0-17,36-53
[1] Popped up to 18-35,54-71

eleon · 2022-11-22T23:34:33Z

In the example above, each task gets a scope with 1 core:

task 0: 0,36
task 1: 18,54

The scope is not released, and then each task asks for 10 cores:

task 0: 0-9,36-45
task 1: 18-27,54-63

In this case, cores 0,36 and 18,54 should not have been used for the second scope, because they are part of an active scope.

samuelkgutierrez · 2022-11-23T01:38:09Z

I don't think we want to exclude the possibility of resources being shared across scopes. We could certainly make better decisions when resource reference counting is implemented, but I don't like the idea of returning a resource exhaustion error code.

eleon · 2022-11-23T01:45:28Z

I agree @samuelkgutierrez, resources being shared across scopes is fine. The issue here is as follows:
Let's say a process has 18 cores in a scope. Then, threads of this process start requesting cores using qv_scope_create (one core per thread, for example). Even though the parent scope has 18 cores, all the threads will get the first core rather than each getting a different core. Like you said, perhaps this will be solved with reference counting :) Thanks, Sam.
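One way reference counting could address the scenario above is to prefer the least-referenced core, so that sharing only starts once every core is in use and no request fails with a resource exhaustion error. The sketch below is a hypothetical illustration, not the QV implementation: `core_refs_t`, `refs_acquire_one`, and `refs_release_one` are invented names, and cores are identified by their index within the parent scope.

```c
#include <assert.h>

#define MAX_CORES 64

/* Hypothetical bookkeeping (not part of the QV API). */
typedef struct {
    int ncores;          /* Number of cores in the parent scope. */
    int refs[MAX_CORES]; /* refs[i]: active subscopes that include core i. */
} core_refs_t;

/* Picks the least-referenced core (lowest index wins ties), increments
 * its reference count, and returns its index. Never fails: once all
 * cores are in use, cores are shared as evenly as possible. */
static int
refs_acquire_one(core_refs_t *r)
{
    assert(r->ncores > 0 && r->ncores <= MAX_CORES);
    int best = 0;
    for (int i = 1; i < r->ncores; ++i) {
        if (r->refs[i] < r->refs[best]) best = i;
    }
    r->refs[best]++;
    return best;
}

/* Drops one reference, as freeing a one-core subscope would. */
static void
refs_release_one(core_refs_t *r, int core)
{
    assert(core >= 0 && core < r->ncores && r->refs[core] > 0);
    r->refs[core]--;
}
```

Under this policy, 18 threads requesting one core each from an 18-core parent would receive cores 0 through 17, and only a 19th request would share core 0.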

eleon · 2024-01-31T00:21:31Z

Now that we have qv_scope_create (see below), we talked about implementing this functionality with a qv_scope_create_hint_t value named QV_SCOPE_CREATE_EXCLUSIVE. When this hint is used, resources given by qv_scope_create won't be given again until the associated scope is freed.

int
qv_scope_create(
    qv_context_t *ctx,
    qv_scope_t *scope,
    qv_hw_obj_type_t type,
    int nobjs,
    qv_scope_create_hint_t hint,
    qv_scope_t **subscope
);

eleon changed the title ~~Adding resources to a scope~~ Deriving resources from a scope Oct 12, 2021
