Add csi-proxy monitoring to health checker #556

Closed

Contributor

@mcshooter mcshooter commented May 12, 2021

CSI-Proxy is being integrated on Windows nodes, where it runs as a Windows service. It helps the CSI Node Plugin interact with the node, given that privileged containers are not yet supported on Windows. This change adds a check to NPD's health checker that verifies csi-proxy is running and available.

#461

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes and needs-ok-to-test labels May 12, 2021
@k8s-ci-robot
Contributor

Hi @mcshooter. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mcshooter
To complete the pull request process, please assign dchen1107 after the PR has been reviewed.
You can assign the PR to them by writing /assign @dchen1107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L label May 12, 2021
@mcshooter
Contributor Author

/cc @jeremyje

@k8s-ci-robot k8s-ci-robot requested a review from jeremyje May 12, 2021 22:37
@mcshooter mcshooter force-pushed the addCsiProxyProblemDetection branch from dd22a19 to e2e1626 Compare May 12, 2021 23:06
@@ -85,6 +85,13 @@ func getHealthCheckFunc(hco *options.HealthCheckerOptions) func() (bool, error)
			}
			return true, nil
		}
	case types.CsiProxyComponent:
		return func() (bool, error) {
			if _, err := powershell("Get-Process", types.CsiProxyComponent); err != nil {
Contributor

Does csi-proxy host an HTTP endpoint? I'd figure it would.

Contributor Author

I have been looking for one as well, something like what kube-proxy has, but I haven't found anything like it. For now, I am using this check to get some support in for csi-proxy. If I find an endpoint, I'll update the check.
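A minimal sketch of the process-existence check discussed here, in the same `func() (bool, error)` shape as the diff. The names `commandRunner`, `runPowershell`, `processCheck`, `fakeRunner`, and `errNotRunning` are hypothetical and not taken from NPD. The process name `csi-proxy` and the choice to report "unhealthy" rather than return an error when `Get-Process` fails are both assumptions. The runner is injected so the sketch can be exercised on a non-Windows machine.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// errNotRunning is a hypothetical sentinel returned by the fake runner below.
var errNotRunning = errors.New("process not running")

// commandRunner runs a PowerShell command and returns its output.
// It is a hypothetical seam for testing, not an NPD type.
type commandRunner func(name string, args ...string) ([]byte, error)

// runPowershell shells out to powershell.exe, so it only works on Windows.
func runPowershell(name string, args ...string) ([]byte, error) {
	cmdArgs := append([]string{"-NoProfile", "-NonInteractive", "-Command", name}, args...)
	return exec.Command("powershell.exe", cmdArgs...).CombinedOutput()
}

// processCheck returns a health check that reports whether the named
// process exists. Assumption: a failing Get-Process means "unhealthy".
func processCheck(run commandRunner, process string) func() (bool, error) {
	return func() (bool, error) {
		if _, err := run("Get-Process", process); err != nil {
			return false, nil
		}
		return true, nil
	}
}

// fakeRunner pretends that only the listed processes are running.
func fakeRunner(running ...string) commandRunner {
	return func(name string, args ...string) ([]byte, error) {
		for _, p := range running {
			if len(args) == 1 && args[0] == p {
				return []byte(p), nil
			}
		}
		return nil, errNotRunning
	}
}

func main() {
	// On a Windows node, pass runPowershell instead of the fake runner.
	check := processCheck(fakeRunner("csi-proxy"), "csi-proxy")
	healthy, err := check()
	fmt.Println("healthy:", healthy, "err:", err)
}
```

On a Windows node the fake runner would be replaced with `runPowershell`; everything else stays the same.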

Contributor

It's using gRPC via named pipes in Windows. https://github.com/kubernetes-csi/csi-proxy/blob/9e1e33da998e4c1e3c7c4c00ceae64cebca0c258/internal/server/server.go#L73-L98 You'll need to import those client libraries and connect to the csi proxy endpoint.

Member

@mauriciopoppe mauriciopoppe May 26, 2021

The list of named pipes can be obtained with: [System.IO.Directory]::GetFiles("\\.\\pipe\\"), in my dev env it shows as:

\\.\\pipe\\csi-proxy-filesystem-v1alpha1
\\.\\pipe\\csi-proxy-filesystem-v1beta1
\\.\\pipe\\csi-proxy-filesystem-v1beta2
\\.\\pipe\\csi-proxy-disk-v1alpha1
\\.\\pipe\\csi-proxy-disk-v1beta1
\\.\\pipe\\csi-proxy-disk-v1beta2
\\.\\pipe\\csi-proxy-disk-v1beta3
\\.\\pipe\\csi-proxy-volume-v1alpha1
\\.\\pipe\\csi-proxy-volume-v1beta1
\\.\\pipe\\csi-proxy-volume-v1beta2
\\.\\pipe\\csi-proxy-volume-v1beta3
\\.\\pipe\\csi-proxy-smb-v1alpha1
\\.\\pipe\\csi-proxy-smb-v1beta1
\\.\\pipe\\csi-proxy-smb-v1beta2
\\.\\pipe\\csi-proxy-system-v1alpha1
\\.\\pipe\\csi-proxy-iscsi-v1alpha1
\\.\\pipe\\csi-proxy-iscsi-v1alpha2

Unfortunately, there's no csi-proxy-healthz named pipe or similar; there are only versioned APIs for different API groups.

Checking that the process exists should be good enough, since csi-proxy crashes on an error and is restarted by the Windows service. Alternatively, if a new endpoint is needed, we can add something like \\.\\pipe\\csi-proxy-healthz; if we decide to go that route, you'd need to import the csi-proxy client library, as @jeremyje says.

cc @jingxu97
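To illustrate how the pipe names in the listing break down into an API group and a version, here is a small sketch that parses names of the form `csi-proxy-<group>-<version>` and reports which expected groups have no pipe at all. The helpers `parsePipe` and `missingGroups` are hypothetical, and using pipe presence as a health signal is an assumption for illustration, not something csi-proxy or NPD provides.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parsePipe splits a pipe path such as \\.\\pipe\\csi-proxy-disk-v1beta3
// into its API group ("disk") and version ("v1beta3").
func parsePipe(name string) (group, version string, ok bool) {
	// Keep only the part after the last backslash, so both the
	// \\.\pipe\ and \\.\\pipe\\ spellings are accepted.
	base := name[strings.LastIndex(name, `\`)+1:]
	if !strings.HasPrefix(base, "csi-proxy-") {
		return "", "", false
	}
	rest := strings.TrimPrefix(base, "csi-proxy-")
	i := strings.LastIndex(rest, "-v")
	if i <= 0 {
		return "", "", false
	}
	return rest[:i], rest[i+1:], true
}

// missingGroups returns the wanted API groups that have no pipe in the listing.
func missingGroups(pipes, want []string) []string {
	seen := map[string]bool{}
	for _, p := range pipes {
		if g, _, ok := parsePipe(p); ok {
			seen[g] = true
		}
	}
	var missing []string
	for _, g := range want {
		if !seen[g] {
			missing = append(missing, g)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// A few entries copied from the listing above.
	pipes := []string{
		`\\.\\pipe\\csi-proxy-filesystem-v1beta2`,
		`\\.\\pipe\\csi-proxy-disk-v1beta3`,
		`\\.\\pipe\\csi-proxy-volume-v1beta3`,
	}
	fmt.Println("missing groups:", missingGroups(pipes, []string{"disk", "filesystem", "smb", "volume"}))
}
```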

@jeremyje
Contributor

/sig windows
/sig node
/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test, sig/windows, and sig/node labels and removed the needs-ok-to-test label May 12, 2021
@mcshooter
Contributor Author

/retest

@mcshooter mcshooter force-pushed the addCsiProxyProblemDetection branch from e2e1626 to fffd841 Compare May 12, 2021 23:34
@mcshooter
Contributor Author

/retest

@@ -85,6 +85,13 @@ func getHealthCheckFunc(hco *options.HealthCheckerOptions) func() (bool, error)
			}
			return true, nil
		}
	case types.CsiProxyComponent:
		return func() (bool, error) {
			if _, err := powershell("Get-Process", types.CsiProxyComponent); err != nil {
Member

If we just check whether the process exists, wouldn't the Windows service already handle that?

Why do we need this health checker now, then?

Contributor Author

I am currently looking into a way for NPD's health checker to connect to the csi-proxy endpoint and test whether it is up and running. In the meantime, this process check was used for testing.
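A rough sketch of what an endpoint-based check could look like once a way to connect is found. The actual connection is abstracted behind a hypothetical `dialFunc`; on Windows it would presumably dial one of the csi-proxy named pipes, which is not shown here. All names (`dialFunc`, `endpointCheck`, `fakeDial`, `blockingDial`, `errDialFailed`, `shortTimeout`) are assumptions for illustration, not NPD or csi-proxy APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errDialFailed is a hypothetical error used by the fake dialer below.
var errDialFailed = errors.New("dial failed")

// shortTimeout is an arbitrary timeout chosen for this sketch.
const shortTimeout = 50 * time.Millisecond

// dialFunc tries to connect to the csi-proxy endpoint; nil means success.
type dialFunc func(ctx context.Context) error

// endpointCheck returns a health check that reports healthy only if
// dial succeeds before the timeout expires.
func endpointCheck(dial dialFunc, timeout time.Duration) func() (bool, error) {
	return func() (bool, error) {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()
		// Run the dial in a goroutine so a dialer that ignores ctx
		// cannot block the health check past the timeout.
		done := make(chan error, 1)
		go func() { done <- dial(ctx) }()
		select {
		case err := <-done:
			return err == nil, nil
		case <-ctx.Done():
			return false, nil
		}
	}
}

// fakeDial returns a dialer that immediately yields err.
func fakeDial(err error) dialFunc {
	return func(ctx context.Context) error { return err }
}

// blockingDial simulates an endpoint that never answers.
func blockingDial(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	healthy, _ := endpointCheck(fakeDial(nil), shortTimeout)()
	fmt.Println("reachable endpoint healthy:", healthy)
	healthy, _ = endpointCheck(blockingDial, shortTimeout)()
	fmt.Println("hanging endpoint healthy:", healthy)
}
```

Unlike the process-existence check, this shape would also catch a csi-proxy process that is alive but no longer accepting connections.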

@mcshooter mcshooter closed this Jun 10, 2021
Labels
cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
sig/node (Categorizes an issue or PR as relevant to SIG Node.)
sig/windows (Categorizes an issue or PR as relevant to SIG Windows.)
size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.)
5 participants