koord-manager: refactor gpu device resource plugin #1846

@saintube saintube commented Jan 16, 2024

Ⅰ. Describe what this PR does

koord-manager: refactor GPU resource calculator as a noderesource plugin.

Ⅱ. Does this pull request fix one issue?

Part of #1834.

It standardizes the workflow for updating GPU device resources, which in turn allows the plugin to be disabled if needed.
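
The benefit of moving the calculation behind a plugin can be illustrated with a minimal, self-contained sketch. The `Plugin` interface, the `Registry` type, the `gpuPlugin` stand-in, and the names `gpu-device-resource` and `example.com/gpu-core` are all hypothetical and are not the actual koord-manager noderesource framework API; the sketch only shows how a uniform plugin workflow makes it possible to switch one calculator off.

```go
package main

import "fmt"

// ResourceList maps a resource name to a quantity (simplified to int64).
type ResourceList map[string]int64

// Plugin is a hypothetical interface for a node resource calculator.
type Plugin interface {
	Name() string
	// Calculate returns the resources this plugin reports for a node.
	Calculate(nodeName string) ResourceList
}

// gpuPlugin is a stand-in GPU calculator that reports a fixed value.
type gpuPlugin struct{}

func (gpuPlugin) Name() string { return "gpu-device-resource" }

func (gpuPlugin) Calculate(nodeName string) ResourceList {
	return ResourceList{"example.com/gpu-core": 200}
}

// Registry runs every registered plugin that has not been disabled.
type Registry struct {
	plugins  []Plugin
	disabled map[string]bool
}

// NewRegistry builds a registry with all given plugins enabled.
func NewRegistry(plugins ...Plugin) *Registry {
	return &Registry{plugins: plugins, disabled: map[string]bool{}}
}

// Disable marks the named plugin so that Calculate skips it.
func (r *Registry) Disable(name string) { r.disabled[name] = true }

// Calculate merges the results of all enabled plugins.
func (r *Registry) Calculate(nodeName string) ResourceList {
	out := ResourceList{}
	for _, p := range r.plugins {
		if r.disabled[p.Name()] {
			continue
		}
		for name, quantity := range p.Calculate(nodeName) {
			out[name] = quantity
		}
	}
	return out
}

func main() {
	r := NewRegistry(gpuPlugin{})
	fmt.Println(r.Calculate("node-1")) // GPU resources are reported

	r.Disable("gpu-device-resource")
	fmt.Println(r.Calculate("node-1")) // the disabled plugin contributes nothing
}
```

Because every calculator goes through the same loop, disabling one is a matter of configuration rather than a code change in the controller.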

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

To keep the behavior consistent with the previous version, this PR only refactors the GPU device reconciliation logic and makes no improvements to it. Some of the resource updating logic should be revisited, e.g.

  • Could the device resources and labels be reset, e.g. when configured as disabled?
  • Is the set of resource names handled for the device enumerable?
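
These two questions are related: a reset is only straightforward if the set of resource names the plugin manages can be enumerated. The following is a minimal sketch under that assumption; `managedResources`, `resetManaged`, and the `example.com/...` resource names are hypothetical and are not taken from the koord-manager code.

```go
package main

import "fmt"

// ResourceList maps a resource name to a quantity (simplified to int64).
type ResourceList map[string]int64

// managedResources is a hypothetical, enumerable list of the resource names
// owned by the GPU device plugin.
var managedResources = []string{
	"example.com/gpu-core",
	"example.com/gpu-memory",
}

// resetManaged removes every managed resource from the given list and
// reports whether anything was removed. Unmanaged resources are untouched.
func resetManaged(allocatable ResourceList) bool {
	changed := false
	for _, name := range managedResources {
		if _, ok := allocatable[name]; ok {
			delete(allocatable, name)
			changed = true
		}
	}
	return changed
}

func main() {
	node := ResourceList{
		"cpu":                    8,
		"example.com/gpu-core":   200,
		"example.com/gpu-memory": 16384,
	}
	changed := resetManaged(node)
	fmt.Println(changed, node)
}
```

If the names cannot be enumerated up front, a reset would instead need some other way to recognize plugin-owned entries, which is part of what the question above leaves open.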

In addition, device resource updating should support NUMA-level calculation in the future.

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (c20d185) 67.08% compared to head (2f32371) 67.26%.
Report is 3 commits behind head on main.

Files | Patch % | Lines
...r/noderesource/plugins/gpudeviceresource/plugin.go | 95.53% | 3 Missing and 2 partials ⚠️
...controller/noderesource/noderesource_controller.go | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1846      +/-   ##
==========================================
+ Coverage   67.08%   67.26%   +0.17%     
==========================================
  Files         407      407              
  Lines       45702    45650      -52     
==========================================
+ Hits        30660    30707      +47     
+ Misses      12820    12731      -89     
+ Partials     2222     2212      -10     
Flag | Coverage Δ
unittests | 67.26% <94.95%> (+0.17%) ⬆️


@saintube saintube force-pushed the koord-manager-refactor-gpu-device-resource-plugin branch from ca8c2d8 to 1300ec8 Compare January 17, 2024 10:11
@koordinator-bot koordinator-bot bot added size/XL and removed size/L labels Jan 17, 2024
@saintube saintube force-pushed the koord-manager-refactor-gpu-device-resource-plugin branch from 1300ec8 to 9b26e92 Compare January 17, 2024 13:23
@saintube saintube changed the title [WIP] koord-manager: refactor gpu device resource plugin koord-manager: refactor gpu device resource plugin Jan 17, 2024
@saintube saintube force-pushed the koord-manager-refactor-gpu-device-resource-plugin branch from 9b26e92 to fd10089 Compare January 18, 2024 03:38
Signed-off-by: saintube <saintube@foxmail.com>

@eahydra eahydra left a comment

/lgtm

@zwzhang0107

/lgtm
/approve

@koordinator-bot
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zwzhang0107


@koordinator-bot koordinator-bot bot merged commit ad7cb9b into koordinator-sh:main Jan 19, 2024
20 checks passed