koord-scheduler: skip checking for nodes whose nodemetrics update time has expired #1563

Merged: 1 commit into koordinator-sh:main from fix-loadaware-sched on Aug 22, 2023

@lucming lucming commented Aug 21, 2023

Ⅰ. Describe what this PR does

In some cases nodemetric is not updated for a long time, e.g. when koordlet has been deleted and can no longer report. If the last load recorded in nodemetric was very high, the node will not be able to run any pods, even if its actual load is very low.

In this situation the loadaware scheduling plugin's calculation should not count toward the final scheduling result, because the plugin relies heavily on nodemetric.

  • In the filter phase: skip the check for nodes whose nodemetric has not been updated for a long time
  • In the score phase: score 0 and return directly

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@codecov

codecov bot commented Aug 21, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.16% 🎉

Comparison is base (4682194) 65.04% compared to head (152613d) 65.20%.
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1563      +/-   ##
==========================================
+ Coverage   65.04%   65.20%   +0.16%     
==========================================
  Files         352      352              
  Lines       36253    36326      +73     
==========================================
+ Hits        23580    23686     +106     
+ Misses      10931    10899      -32     
+ Partials     1742     1741       -1     
Flag Coverage Δ
unittests 65.20% <100.00%> (+0.16%) ⬆️

Files Changed Coverage Δ
pkg/scheduler/plugins/loadaware/load_aware.go 69.26% <100.00%> (ø)

... and 10 files with indirect coverage changes


@eahydra eahydra changed the title koord-scheduler: do nothing in loadaware scheduler plugin when nodemetric can't be updated for a long time koord-scheduler: skip checking for nodes whose nodemetrics update time has expired Aug 22, 2023
…tric is not updated for a long time

Signed-off-by: liuming6 <liuming6@360.cn>
@lucming lucming force-pushed the fix-loadaware-sched branch from f76a7ee to 152613d Compare August 22, 2023 08:23
@koordinator-bot koordinator-bot bot added size/M and removed size/S labels Aug 22, 2023
Member

@eahydra eahydra left a comment

/lgtm
/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit bb9907d into koordinator-sh:main Aug 22, 2023
Member

@jasonliu747 jasonliu747 left a comment

/lgtm

j4ckstraw pushed a commit to j4ckstraw/koordinator that referenced this pull request May 20, 2024
…e has expired (koordinator-sh#1563)

Signed-off-by: liuming6 <liuming6@360.cn>
Co-authored-by: liuming6 <liuming6@360.cn>

3 participants