-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High cpu usage of powershell processes triggered by csi-proxy #193
Comments
/reopen |
@divyenpatel: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@pradeep-hegde change is partially fixing the issue and not getting much-needed performance improvement. |
@wdazhi2020: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@dazhiw: Reopened this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@divyenpatel did we compare with intree Windows? It is known that Windows consumes more resources than Linux. The question is do we think there's a regression between intree vs csi Windows? |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
@alexander-ding please post the results of your experiments here |
The experiments I ran were not related to resource usage unfortunately. I was measuring latency and throughput for CSI Proxy. The resource usage related experiments were on Linux nodes. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned |
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
Circling this back.. tracked from a lot of calls to
Dumping the stacktrace it shows a very odd call to the same function, what is very different from the normal behavior.
|
for VolumeStats, we could use |
The real answer to this is to stop using PowerShell, and use Win32 APIs where possible, falling back to WMI invocation. I have POC code for some of this that could be ported. Will take a while to complete, but happy to start taking this on? |
Moving from powershell to win32 API would be a nice change at the expense of making the code harder to read e.g. Line 190 in 07be14d
For the current issue, @knabben did a great analysis in #193 (comment) about the current implementation, and also opened a PR fixing it in #336 but we haven't been able to merge it because presubmit tests are broken because of a Windows module that was disabled in Github Actions. I was trying to fix it in #348 but didn't have luck. For the next steps, the plan of action was to:
Changes would be blocked by the presubmit error, we should try to fix the SMB tests first. |
If we're not keen on using Win32 API directly because of readability, then we should strongly consider using the MSFT_Disk & Win32_DiskPartition WMI classes, which is what PowerShell is wrapping, but without the .NET overhead. |
You may check how it would be like to switch to use WMI classes from this WIP branch. |
What happened:
We are doing some performance tests for vsphere-csi-driver (https://github.com/kubernetes-sigs/vsphere-csi-driver) on k8s cluster with 20 windows worker nodes. Each node has 4 CPUs and 16GB of memory. k8s version is 1.22.3, windows version is Windows Server 2019. the vsphere csi node plugin uses csi-proxy for privileged operations on host devices. The test repeatedly issues CreatePod followed by DeletePod. Each Pod just starts one container with image "mcr.microsoft.com/oss/kubernetes/pause:1.4.1", and mounts one persistent volume.
During the test we observed high cpu usage on the windows worker nodes, and most of the cpu usage came from the powershell processes triggered by csi-proxy. With 1.4 Pod creations per second, the average cpu usage on each node is about 75% of the total cpu capacity, the CPU cost of powershell processes triggered by csi-proxy is about 45% of the total CPU capacity; With 2.4 Pod creations per second, the cpu usage on each node reached almost 100%, and the CPU cost of powershell processes triggered by csi-proxy is about 68%. As a comparison, for vsphere-csi-driver on Linux nodes, with 3 Pod creations per second the cpu usage on each node is constantly below 10%.
What you expected to happen:
Reduce the cpu cost of csi-proxy.
How to reproduce it:
deploy vsphere-csi-driver on k8s cluster with windows nodes, and create Pod with PV mount, then delete Pod.
Anything else we need to know?:
Environment:
kubectl version
): 1.22.3uname -a
):The text was updated successfully, but these errors were encountered: