test: add ut for performance test #3260

Open
wants to merge 1 commit into base: master
Conversation


@bysph bysph commented Dec 11, 2023

This MR aims to add a simple performance UT for Volcano.

Two reasons for supplementing UT:

  1. There are some questions about performance data, such as "Volcano performance verification" #3203.
  2. Performance-oriented UT helps Volcano make optimizations.

Run this case using the following commands:

cd pkg/scheduler
go test -v --run=TestRunOnc

I have run this case under the default configuration, without optimizations, on a machine (56 cores, 256 GB RAM), and got this result:
(results screenshot omitted)

I also added a "latencyrecorder" locally to track the execution time of each plugin. However, since it is implemented through code instrumentation it is not very elegant, so I haven't submitted it yet.

Can anyone take a look first and give some suggestions? thx~

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign thor-wl
You can assign the PR to them by writing /assign @thor-wl in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) on Dec 11, 2023
@bysph
Author

bysph commented Dec 11, 2023

@lowang-bh Could you please spare some time to review my code? thx~

@lowang-bh
Member

Thanks for your hard work. Can it cover all the scheduling logic in the real scheduler?
My idea is to use the k8s fake client to submit jobs/nodes/pods and then trigger a scheduler run.

@bysph
Author

bysph commented Dec 12, 2023

> Thanks for your hard work. Can it cover all the scheduling logic in the real scheduler? My idea is to use the k8s fake client to submit jobs/nodes/pods and then trigger a scheduler run.

I think it covers most of the scheduling logic, except for the part that calls the k8s API (if there are any errors in my statement, please correct me, as I'm still new to Volcano). As I understand it, the cache acts as a middle layer between k8s and the Volcano scheduling logic, so I chose to add a mock cache.

The reason I don't construct a real cache directly through a fake kube client is that certain parts of the cache implementation use a kubeconfig to create a new client (such as vcclient), and those scenarios are difficult to cover with a fake kube client. Do you have any good ideas on how to handle this?

@lowang-bh
Member

> The reason I don't construct a real cache directly through a fake kube client is that certain parts of the cache implementation use a kubeconfig to create a new client (such as vcclient), and those scenarios are difficult to cover with a fake kube client. Do you have any good ideas on how to handle this?

Yes, I have a plan for that. I will write a demo and then we can work together on that feature.

I think we can keep this PR as well.
@wangyang0616 @Monokaix

@bysph
Author

bysph commented Dec 12, 2023

> The reason I don't construct a real cache directly through a fake kube client is that certain parts of the cache implementation use a kubeconfig to create a new client (such as vcclient), and those scenarios are difficult to cover with a fake kube client. Do you have any good ideas on how to handle this?
>
> Yes, I have a plan for that. I will write a demo and then we can work together on that feature.
>
> I think we can keep this PR as well. @wangyang0616 @Monokaix

Thanks~ I hope I can help.

Signed-off-by: sph <shepenghui@corp.netease.com>
@Monokaix
Member

Monokaix commented Dec 14, 2023

> The reason I don't construct a real cache directly through a fake kube client is that certain parts of the cache implementation use a kubeconfig to create a new client (such as vcclient), and those scenarios are difficult to cover with a fake kube client. Do you have any good ideas on how to handle this?
>
> Yes, I have a plan for that. I will write a demo and then we can work together on that feature.
>
> I think we can keep this PR as well. @wangyang0616 @Monokaix

I think there are two processes we should be concerned with. The first is the scheduling logic in every session, which mainly consists of the predicate and nodeOrder plugins; the second is the kube-apiserver throughput in a real cluster. So your test can only cover the first part, and in my opinion there is no need to mock the whole cache: the cache is composed of many interfaces, so we just need to mock those interfaces, such as the kube client. :)

Also, kube-apiserver throughput has a great influence on scheduler performance, so we abstracted a bind interface that lets users register their own bind method, e.g. binding many pods in one bind request. You could also work on this to achieve higher performance.
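The batching idea behind such a bind abstraction can be sketched as follows. The interface and type names (`Binder`, `onePerRequestBinder`, `batchBinder`) are illustrative assumptions for this comment, not Volcano's actual API:

```go
package main

import "fmt"

// Binder is a sketch of the abstraction described above: the scheduler
// hands over a set of tasks, and the implementation decides how many
// API requests that turns into.
type Binder interface {
	Bind(tasks []string) error
}

// onePerRequestBinder issues one request per task (the naive approach).
type onePerRequestBinder struct{ requests int }

func (b *onePerRequestBinder) Bind(tasks []string) error {
	for range tasks {
		b.requests++ // one API call per pod
	}
	return nil
}

// batchBinder sends all tasks in a single request, reducing pressure
// on the kube-apiserver.
type batchBinder struct{ requests int }

func (b *batchBinder) Bind(tasks []string) error {
	if len(tasks) > 0 {
		b.requests++ // one API call for the whole batch
	}
	return nil
}

func main() {
	tasks := []string{"pod-a", "pod-b", "pod-c"}
	naive, batched := &onePerRequestBinder{}, &batchBinder{}
	naive.Bind(tasks)
	batched.Bind(tasks)
	fmt.Println(naive.requests, batched.requests) // prints: 3 1
}
```

Because both types satisfy the same interface, a user-registered batching implementation can be swapped in without touching the scheduler's call sites, which is the point of abstracting bind behind an interface.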

@bysph
Author

bysph commented Dec 14, 2023

> I think there are two processes we should be concerned with. The first is the scheduling logic in every session, which mainly consists of the predicate and nodeOrder plugins; the second is the kube-apiserver throughput in a real cluster. So your test can only cover the first part, and in my opinion there is no need to mock the whole cache: the cache is composed of many interfaces, so we just need to mock those interfaces, such as the kube client. :)
>
> Also, kube-apiserver throughput has a great influence on scheduler performance, so we abstracted a bind interface that lets users register their own bind method, e.g. binding many pods in one bind request. You could also work on this to achieve higher performance.

@Monokaix Thanks for your review~ You are absolutely right that this unit test does not cover the batch binding part, and indeed a mock-kubeclient unit test should be able to fully replace it.

However, with this unit test, Volcano users can already assess the performance of the scheduler's core module and carry out optimization work. If the mock-kubeclient unit test cannot be completed in the short term (it might involve some code modification or refactoring), would you be open to considering this unit test in the interim?

Furthermore, the bottleneck in the second part lies not only in binding but also in the performance of Kubernetes itself, so it may not be entirely coverable by unit testing and might require end-to-end testing.

@Monokaix
Member

Monokaix commented Dec 14, 2023

> @Monokaix Thanks for your review~ You are absolutely right that this unit test does not cover the batch binding part, and indeed a mock-kubeclient unit test should be able to fully replace it.
>
> However, with this unit test, Volcano users can already assess the performance of the scheduler's core module and carry out optimization work. If the mock-kubeclient unit test cannot be completed in the short term (it might involve some code modification or refactoring), would you be open to considering this unit test in the interim?
>
> Furthermore, the bottleneck in the second part lies not only in binding but also in the performance of Kubernetes itself, so it may not be entirely coverable by unit testing and might require end-to-end testing.

Yeah, you are welcome to work on this UT, but can we avoid mocking the whole cache and just use a fake interface where we need a mock? If that's feasible, give it a try; if it's a little hard, you can try wrapping a Cache and overriding the methods that are hard to mock. :)
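The wrap-and-override approach suggested here can be illustrated with a minimal Go embedding sketch. The `Cache` interface and its two methods below are simplified stand-ins for Volcano's real scheduler cache, chosen only to show the pattern:

```go
package main

import (
	"errors"
	"fmt"
)

// Cache is a simplified stand-in for the scheduler cache interface;
// the real one has many more methods.
type Cache interface {
	Snapshot() string
	Bind(pod string) error
}

// realCache mimics the production implementation: Snapshot is cheap to
// reuse in tests, but Bind would need a running API server.
type realCache struct{}

func (realCache) Snapshot() string { return "snapshot" }
func (realCache) Bind(pod string) error {
	return errors.New("no cluster available in unit tests")
}

// testCache embeds the real cache and overrides only the method that
// is hard to mock; all other methods are promoted from the embedded
// implementation unchanged.
type testCache struct {
	Cache
	bound []string
}

func (t *testCache) Bind(pod string) error {
	t.bound = append(t.bound, pod) // record the bind instead of calling the API server
	return nil
}

func main() {
	var c Cache = &testCache{Cache: realCache{}}
	fmt.Println(c.Snapshot())    // delegated to realCache
	fmt.Println(c.Bind("pod-a")) // overridden: no API call
}
```

Because Go promotes the embedded interface's methods, the test wrapper stays tiny even when the real interface is large: only the methods that touch external systems need overriding.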

@bysph
Author

bysph commented Dec 14, 2023

> Yeah, you are welcome to work on this UT, but can we avoid mocking the whole cache and just use a fake interface where we need a mock? If that's feasible, give it a try; if it's a little hard, you can try wrapping a Cache and overriding the methods that are hard to mock. :)

OK, I will fix that later.

@lowang-bh
Member

> Yes, I have a plan for that. I will write a demo and then we can work together on that feature.

Hi @bysph, I mocked the SchedulerCache in PR "add mock cache for ut and performance test" #3269. Perhaps you can continue your work on this feature on top of it.

@bysph
Author

bysph commented Dec 19, 2023

> Hi @bysph, I mocked the SchedulerCache in PR "add mock cache for ut and performance test" #3269. Perhaps you can continue your work on this feature on top of it.

Okay, I will submit a UT based on this. Should I wait for the code to be merged before submitting?

@Monokaix
Member

Monokaix commented Jan 4, 2024

> Okay, I will submit a UT based on this. Should I wait for the code to be merged before submitting?

Yes, you can try it after #3269 is merged.


stale bot commented Mar 17, 2024

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Mar 17, 2024