
[Core] Make Placement Group Wildcard and Indexed Resource Assignments Consistent #48088

Merged
merged 18 commits into from
Nov 7, 2024

Conversation

MengjinYan
Collaborator

@MengjinYan MengjinYan commented Oct 17, 2024

Why are these changes needed?

Currently, when we allocate resources to tasks or actors in a placement group:

  • If the resource request doesn't specify a placement group bundle index, we only subtract the resources from the wildcard resource and don't subtract them from the indexed resources.
  • In addition, if the bundle index is specified, the resource allocation is not consistent between the wildcard resource and the indexed resource for unit resources (e.g. GPU).

The behavior above leads to incorrect GPU assignments when a bundle index is specified. Also, a task can potentially be allocated across bundles if no bundle index is specified.

This PR fixes the above behavior and makes allocation on the wildcard resource and the indexed resources consistent, with the following changes:

  1. Updated the TryAllocate function in resource_instance_set.cc to add logic handling the pg resource allocation. If the resource requirement includes a resource in a placement group:
    (1) It always first tries to allocate on the indexed resources. For resource requirements without a bundle index, it iterates through all the bundles and tries to find one with enough space.
    (2) Then it tries to allocate the wildcard resource based on the indexed resource allocation.
  2. Added corresponding helper functions & auxiliary data structures.
  3. Added/modified tests for the new behavior.
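The two-phase allocation described above can be sketched roughly as follows. This is a minimal illustration of the intended behavior under simplified types, not the actual resource_instance_set.cc implementation; all names here are hypothetical:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Simplified view of one placement group's resources on a node:
// a wildcard pool (e.g. CPU_group_<pg_id>) plus per-bundle pools
// (e.g. CPU_group_<i>_<pg_id>).
struct PgResource {
  double wildcard_available;
  std::vector<double> bundle_available;
};

// Returns the bundle index the demand was served from, or std::nullopt
// if the demand cannot be satisfied by a single bundle.
std::optional<size_t> TryAllocate(PgResource &res,
                                  double demand,
                                  std::optional<size_t> bundle_index) {
  if (bundle_index) {
    // Bundle index specified: allocate from that bundle only.
    if (*bundle_index >= res.bundle_available.size() ||
        res.bundle_available[*bundle_index] < demand) {
      return std::nullopt;
    }
    res.bundle_available[*bundle_index] -= demand;
    res.wildcard_available -= demand;  // keep the wildcard view consistent
    return bundle_index;
  }
  // No bundle index: scan for a single bundle with enough space, so the
  // request is never silently split across bundles.
  for (size_t i = 0; i < res.bundle_available.size(); ++i) {
    if (res.bundle_available[i] >= demand) {
      res.bundle_available[i] -= demand;
      res.wildcard_available -= demand;
      return i;
    }
  }
  return std::nullopt;
}
```

The key property is that every successful allocation deducts from exactly one indexed pool and the wildcard pool by the same amount, so the two views can never diverge.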

Related issue number

Closes #29811

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@MengjinYan MengjinYan assigned MengjinYan and rkooo567 and unassigned MengjinYan Oct 17, 2024
@MengjinYan MengjinYan requested a review from rkooo567 October 18, 2024 16:18
@rkooo567
Contributor

can you run some pg related release tests @MengjinYan ?

@rkooo567 rkooo567 added the go add ONLY when ready to merge, run all tests label Oct 24, 2024
@rkooo567
Contributor

@MengjinYan there are some failures from the core tests. Can you take a look? I'm in the middle of reviewing.

@MengjinYan
Collaborator Author

@MengjinYan there are some failures from the core tests. Can you take a look? I'm in the middle of reviewing.

Yes. I took a look and found the bug. I didn't consider the case where multiple PGs coexist on a host, and the code allocated the indexed bundle regardless of the pg_id when the index is not specified. Will submit a fix for that soon.

…ces in the case where the bundle id is not specified

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@rkooo567 rkooo567 left a comment

Generally lgtm! Some concerns around performance (because I remember the scheduler was sensitive to performance for some large stress tests)

/*for_indexed_resource=*/true);
if (data) {
ResourceID original_resource_id(data->original_resource);
absl::flat_hash_set<ResourceID> &resource_set =
Contributor

I think if you get the data with brackets and the entry doesn't exist, it creates a new one.

Suggested change
if (auto it = pg_indexed_resources_.find(original_resource_id);
    it != pg_indexed_resources_.end()) {
  it->second.erase(...);
}

Collaborator Author

Good point! Will update.

/*for_wildcard_resource=*/false,
/*for_indexed_resource=*/true);
if (data) {
pg_indexed_resources_[ResourceID(data->original_resource)].emplace(resource_id);
Contributor

same comment as above

Collaborator Author

Actually, for this particular case, I'd like to create a default vector when the particular resource id doesn't exist, so I'm wondering if square brackets are okay here. I don't have a lot of context on C++ best practices, so please let me know what the best pattern is.
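For context on the thread above: with hash maps, operator[] default-constructs a value for a missing key, which is exactly what's wanted on insertion but leaves stray empty entries behind on a removal path. A small sketch (using std::unordered_map, which has the same operator[] semantics as absl::flat_hash_map here; names are illustrative, not from the Ray codebase):

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

using PgIndexedResources =
    std::unordered_map<std::string, std::unordered_set<int>>;

// On the insertion path, creating the entry on demand via operator[]
// is the intended behavior.
void AddIndexed(PgIndexedResources &m, const std::string &key, int idx) {
  m[key].insert(idx);
}

// On the removal path, find() avoids accidentally inserting an empty
// entry for a key that was never present.
void RemoveIndexed(PgIndexedResources &m, const std::string &key, int idx) {
  auto it = m.find(key);
  if (it != m.end()) {
    it->second.erase(idx);
    if (it->second.empty()) {
      m.erase(it);  // drop the now-empty entry
    }
  }
}
```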

@@ -43,6 +45,21 @@ bool NodeResourceInstanceSet::Has(ResourceID resource_id) const {

void NodeResourceInstanceSet::Remove(ResourceID resource_id) {
resources_.erase(resource_id);

// Remove from the pg_indexed_resources_ as well
auto data = ParsePgFormattedResource(resource_id.Binary(),
Contributor

I suspect this will add significant overhead to the scheduler (because parsing a string is much more expensive). Is there any way to do this only when we know the resource id is a pg resource?

Collaborator Author

Discussed offline, and I checked the code. The Remove function is only called in the DeleteLocalResource function in LocalResourceManager when returning the pg bundle, so the added overhead shouldn't impact task scheduling.

At the same time, the concerns about Set and TryAllocate are valid. One potential idea is adding a boolean to ResourceID to indicate whether it is a pg resource or not. Will explore the idea further.

Collaborator

I think the overhead will be smaller if PG resources start with a prefix, so we can check using startswith instead of regex.

Collaborator Author

Makes sense!

At the same time, I think the downside is: we need to make sure no custom resources start with the same prefix, and we still need to do the parsing for all the placement group allocations.

Also, as the release tests didn't show a noticeable performance regression, the code can be checked in as is. I'll add the idea as a TODO for future improvement.
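A rough sketch of the prefix idea discussed above. The prefix itself is hypothetical: the current pg resource format is `[original_resource_name]_group_[bundle_index]_[group_id]` (per the docstring quoted later in this PR), which has no reserved leading prefix, so this would require a format change:

```cpp
#include <string>

// Hypothetical reserved prefix for pg-formatted resource names; not
// Ray's actual format.
constexpr char kPgPrefix[] = "_PG_";

// A constant-time startswith check that could gate the expensive full
// parse, so ordinary (non-pg) resources skip parsing entirely.
bool MightBePgResource(const std::string &resource_name) {
  // rfind with pos 0 only matches the prefix at the very start.
  return resource_name.rfind(kPgPrefix, 0) == 0;
}
```

The trade-off noted above still holds: users' custom resource names must be forbidden from starting with the reserved prefix, and pg-formatted resources themselves still need the full parse after the cheap check passes.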

@@ -69,6 +86,14 @@ NodeResourceInstanceSet &NodeResourceInstanceSet::Set(ResourceID resource_id,
resources_.erase(resource_id);
} else {
resources_[resource_id] = std::move(instances);

// Populate the pg_indexed_resources_map_
auto data = ParsePgFormattedResource(resource_id.Binary(),
Contributor

same comment as above

@@ -87,6 +87,12 @@ class NodeResourceInstanceSet {
return resources_;
}

/// Only for testing.
Contributor

is it really used? I couldn't find it

Collaborator Author

Right, it is not used. I was following the pattern of the function above for resources_.

At the same time, I think it makes sense to remove the function and add it back when it is needed in a test.

src/ray/common/scheduling/resource_instance_set.cc (outdated, resolved)
src/ray/common/scheduling/resource_instance_set.cc (outdated, resolved)
allocations[resource_id] = std::move(*allocation);
auto data = ParsePgFormattedResource(resource_id.Binary(),
/*for_wildcard_resource*/ true,
/*for_indexed_resource*/ true);
Contributor

isn't this supposed to be false?

Collaborator Author

The idea here is to find all pg resources, including both wildcard resources and indexed resources.

And my understanding is that we need to set both "for_wildcard_resource" and "for_indexed_resource" to true to find the resource ids for both.

So here I think we should still set it to true.

return std::nullopt;
}
} else {
// The case where the bundle index is specified
Contributor

is there no case where the demands only include indexed resources, but not wildcard resources? I assume no?

Contributor

(feel like the logic assumes this to be correct. maybe add it to the docstring?)

Collaborator Author

IIUC, here:

// Note that all nodes that have placement group have a special
it will always include the wildcard resource.

At the same time, I'll add the assumption to the comment as well.

auto allocation = TryAllocate(pair.first, resource_demands.Get(pair.first));

if (allocation) {
wildcard_allocation = *allocation;
Contributor

maybe do this only once to avoid repetitive copy?

Collaborator Author

Good point! Currently, the same allocation is used for both the wildcard resource and the indexed resource.

Instead of a copy here plus two moves (for both the wildcard and indexed resources), it can be a copy for the indexed resource and a move for the wildcard resource.
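The "one copy plus one move" idea can be sketched as follows (the function and resource names are illustrative, not the actual Ray code):

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Allocations = std::unordered_map<std::string, std::vector<double>>;

// The same instance allocation is recorded under both the indexed and
// the wildcard resource name: take the parameter by value, copy it into
// the first entry, then move it into the last, avoiding a second copy.
void RecordAllocation(Allocations &allocations,
                      const std::string &indexed_name,
                      const std::string &wildcard_name,
                      std::vector<double> allocation) {
  allocations[indexed_name] = allocation;              // copy for indexed
  allocations[wildcard_name] = std::move(allocation);  // move for wildcard
}
```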

MengjinYan and others added 6 commits October 28, 2024 13:17
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Co-authored-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@MengjinYan
Collaborator Author

Generally lgtm! Some concerns around performance (because I remember the scheduler was sensitive to performance for some large stress tests)

Discussed offline and ran the "many tasks" benchmark test.

From the benchmark results, we didn't see a noticeable performance regression compared to results from tests run on the master branch. At the same time, the improvement idea we currently have would take non-trivial effort and is worth a separate follow-up.

I'll also run the stress_test_many_tasks test to further verify the latency. If there is still no noticeable regression, the code should be able to be checked in without further performance improvements to the string parsing.

At the same time, I added "TODO" comments with the improvement idea at the places where we do the parsing.

@MengjinYan
Collaborator Author

Performance benchmark result from stress_test_many_tasks:

[{"perf_metric_name":"stage_0_time","perf_metric_value":8.555947065353394,"perf_metric_type":"LATENCY"},
  {"perf_metric_name":"stage_1_avg_iteration_time","perf_metric_value":27.85337915420532,"perf_metric_type":"LATENCY"},
  {"perf_metric_name":"stage_2_avg_iteration_time","perf_metric_value":68.38944630622863,"perf_metric_type":"LATENCY"},
  {"perf_metric_name":"stage_3_creation_time","perf_metric_value":1.8722784519195557,"perf_metric_type":"LATENCY"},
  {"perf_metric_name":"stage_3_time","perf_metric_value":3337.7971575260162,"perf_metric_type":"LATENCY"},
  {"perf_metric_name":"stage_4_spread","perf_metric_value":0.35827938592076564,"perf_metric_type":"LATENCY"}]

Compared to the current results, there is no noticeable regression:
[screenshots: stress-test-many-tasks runs 1-3, 2024-10-30]

@@ -95,15 +95,15 @@ std::string BundleSpecification::DebugString() const {
}

std::string FormatPlacementGroupResource(const std::string &original_resource_name,
const PlacementGroupID &group_id,
const std::string &group_id_str,
Collaborator

group_id_hex?

/// [original_resource_name]_group_[bundle_index]_[group_id_str]
/// \return The corresponding formatted placement group resource string.
std::string FormatPlacementGroupResource(const std::string &original_resource_name,
const std::string &group_id_str,
Collaborator

group_id_hex?

src/ray/common/scheduling/resource_instance_set.h (outdated, resolved)

MengjinYan and others added 4 commits October 31, 2024 17:23
Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
@rkooo567
Contributor

rkooo567 commented Nov 7, 2024

premerge failures!

@jjyao
Collaborator

jjyao commented Nov 7, 2024

Test failures are unrelated rllib tests

@jjyao jjyao merged commit 8deca45 into master Nov 7, 2024
4 of 5 checks passed
@jjyao jjyao deleted the issue-29811 branch November 7, 2024 17:10
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
… Consistent (ray-project#48088)

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
… Consistent (ray-project#48088)

Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
Labels
go add ONLY when ready to merge, run all tests

Successfully merging this pull request may close these issues.

[Core] GPU placement group doesn't honor the bundle index