koordlet: kill container after calling eviction api success #1759

Merged
merged 5 commits into from
Jun 3, 2024

Conversation

j4ckstraw
Contributor

@j4ckstraw j4ckstraw commented Nov 27, 2023

Ⅰ. Describe what this PR does

fix #1758

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test


codecov bot commented Nov 27, 2023

Codecov Report

Attention: Patch coverage is 73.17073%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.56%. Comparing base (eed98fa) to head (423e5b2).
Report is 15 commits behind head on main.

Files                                                    Patch %   Lines
...let/qosmanager/plugins/memoryevict/memory_evict.go    52.63%    9 Missing ⚠️
.../koordlet/qosmanager/plugins/cpuevict/cpu_evict.go    87.50%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1759      +/-   ##
==========================================
- Coverage   68.56%   68.56%   -0.01%     
==========================================
  Files         430      430              
  Lines       39383    39408      +25     
==========================================
+ Hits        27004    27020      +16     
- Misses      10043    10048       +5     
- Partials     2336     2340       +4     
Flag Coverage Δ
unittests 68.56% <73.17%> (-0.01%) ⬇️


@j4ckstraw j4ckstraw force-pushed the fix-evict-logic branch 2 times, most recently from 3651941 to 709eb82 Compare November 27, 2023 07:18
@saintube saintube changed the title fix: kill container after calling eviction api success koordlet: kill container after calling eviction api success Nov 27, 2023
@zwzhang0107
Contributor

zwzhang0107 commented Dec 6, 2023

Memory eviction is just like the OOM killer, which is not restricted by PDB.
By default, memory eviction should always kill containers, even if the eviction call fails (for example, because of an RPC timeout).
So it is better to add a flag to control the default logic.
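
For illustration, the behavior described above could be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not koordlet code: `releaseMemory`, `evictFunc`, `killFunc`, `errTimeout`, and the `killOnEvictFailure` parameter are all hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimeout simulates a failed eviction call (hypothetical, for illustration).
var errTimeout = errors.New("rpc timeout")

// evictFunc and killFunc are hypothetical stand-ins for the eviction API call
// and the direct container kill; they are not koordlet functions.
type evictFunc func(pod string) error
type killFunc func(pod string)

// releaseMemory calls the eviction API first and then decides whether to kill
// the pod's containers. With killOnEvictFailure set to true, containers are
// killed even when the eviction call fails, which mirrors OOM-killer-like
// behavior. With false, containers are killed only after a successful call.
// It reports whether the containers were killed.
func releaseMemory(pod string, evict evictFunc, kill killFunc, killOnEvictFailure bool) bool {
	if err := evict(pod); err != nil && !killOnEvictFailure {
		return false
	}
	kill(pod)
	return true
}

func main() {
	failing := func(string) error { return errTimeout }
	killed := 0
	kill := func(string) { killed++ }

	fmt.Println(releaseMemory("be-pod", failing, kill, true))  // kill despite the failure
	fmt.Println(releaseMemory("be-pod", failing, kill, false)) // skip the kill
	fmt.Println(killed)
}
```

The single boolean keeps the always-kill behavior as the default while letting operators opt into the stricter "kill only after a successful eviction call" path.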

@j4ckstraw
Contributor Author

How about adding a koordlet feature-gate named DoNotEvictPodIfCallEvictionAPIFailed? Maybe it's too long, WDYT?

@j4ckstraw j4ckstraw force-pushed the fix-evict-logic branch 2 times, most recently from d42f54e to c060754 Compare December 6, 2023 12:22
@zwzhang0107
Contributor

zwzhang0107 commented Dec 11, 2023

A feature-gate is meant for new features, which iterate from alpha (default=false) through beta (default=true) to GA.
So an argument named --only-evict-by-api (default=false) is better, so that only the eviction API is called when it is enabled.

Also, please add a TODO comment for supporting fine-grained eviction arguments, just like kubelet.
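
As a rough sketch of the suggested argument, the snippet below parses a boolean `--only-evict-by-api` with Go's standard `flag` package. This is an assumption for illustration only: the helper `parseOnlyEvictByAPI` and the flag-set name are hypothetical, and the actual koordlet argument handling may use a different flag library and layout.

```go
package main

import (
	"flag"
	"fmt"
)

// parseOnlyEvictByAPI parses the hypothetical --only-evict-by-api argument
// from args and returns its value. The default is false, so the existing
// behavior is unchanged unless the argument is set explicitly.
func parseOnlyEvictByAPI(args []string) (bool, error) {
	fs := flag.NewFlagSet("koordlet-sketch", flag.ContinueOnError)
	// TODO: support fine-grained eviction arguments, similar to kubelet.
	onlyAPI := fs.Bool("only-evict-by-api", false,
		"only call the eviction API and do not kill containers directly")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *onlyAPI, nil
}

func main() {
	def, _ := parseOnlyEvictByAPI(nil)
	on, _ := parseOnlyEvictByAPI([]string{"--only-evict-by-api=true"})
	fmt.Println(def, on)
}
```

Unlike a feature-gate, a plain argument like this carries no alpha/beta/GA lifecycle and simply stays off until an operator enables it.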

@j4ckstraw
Contributor Author

j4ckstraw commented Dec 12, 2023

Maybe you misunderstood me. I want to kill the container only when calling the eviction API returns OK, not to evict only by API. @zwzhang0107

@zwzhang0107
Contributor

Actually, after the eviction API returns OK, there is no need to kill the container if you want to solve the graceful termination or PDB problem.
It seems that defining a custom evictor argument (--evictor="default/only-eviction/...", where only-eviction means --only-evict-by-api) can solve your concerns?
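
One possible shape for such an evictor argument is sketched below. It is only an illustration under assumptions: the type `evictorMode`, the helpers `parseEvictorMode` and `killsContainers`, and the exact semantics of each mode are hypothetical, and only the two mode names spelled out above are modeled.

```go
package main

import "fmt"

// evictorMode is a hypothetical type for the value of an --evictor argument.
type evictorMode string

const (
	// modeDefault: assumed to keep the existing behavior of killing containers.
	modeDefault evictorMode = "default"
	// modeOnlyEviction: assumed to rely on the eviction API alone.
	modeOnlyEviction evictorMode = "only-eviction"
)

// parseEvictorMode validates the raw argument value and rejects unknown modes.
func parseEvictorMode(s string) (evictorMode, error) {
	switch m := evictorMode(s); m {
	case modeDefault, modeOnlyEviction:
		return m, nil
	default:
		return "", fmt.Errorf("unknown evictor mode %q", s)
	}
}

// killsContainers reports whether containers are killed directly in this mode.
func killsContainers(mode evictorMode) bool {
	return mode != modeOnlyEviction
}

func main() {
	for _, s := range []string{"default", "only-eviction", "bogus"} {
		mode, err := parseEvictorMode(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: kill containers directly = %v\n", mode, killsContainers(mode))
	}
}
```

A string-valued mode leaves room for further evictor variants later without adding one boolean argument per behavior.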

@j4ckstraw
Contributor Author

Got you, I will file a new patch later.

@j4ckstraw
Contributor Author

@zwzhang0107 PTAL


stale bot commented May 5, 2024

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

Thank you for your contributions.

@stale stale bot added the lifecycle/stale label May 5, 2024
@zwzhang0107
Contributor

/lgtm

@jasonliu747
Member

@j4ckstraw please resolve the code conflicts, and then we are good to go.

Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
@j4ckstraw
Contributor Author

@zwzhang0107 rebased

@zwzhang0107
Contributor

/lgtm
/approve

@koordinator-bot koordinator-bot bot added the lgtm label Jun 3, 2024
@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: zwzhang0107

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 4102bea into koordinator-sh:main Jun 3, 2024
20 checks passed
@j4ckstraw j4ckstraw deleted the fix-evict-logic branch June 3, 2024 03:04
Development

Successfully merging this pull request may close these issues.

[proposal] call eviction API before kill container
4 participants