Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Lstat in plugin watcher to avoid Windows problem #99463

Merged
merged 1 commit into from
Mar 2, 2021

Conversation

jingxu97
Copy link
Contributor

@jingxu97 jingxu97 commented Feb 25, 2021

This PR tries to use Lstat in plugin watcher due to Windows issue of using os.Stat(). In some Windows tests with SAC 1909 Windows version, we notice that os.Stat() return error for domain socket in Windows.

\var\lib\kubelet\plugins_registry\pd.csi.storage.gke.io-reg.sock: The file cannot be accessed by the system.

But this behavior is not consistent, though.

Change-Id: I4f9b808829f1a56dc622e343c291d3ffc316f416

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


User Lstat in plugin watcher due to Windows issue

Change-Id: I4f9b808829f1a56dc622e343c291d3ffc316f416
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 25, 2021
@k8s-ci-robot
Copy link
Contributor

@jingxu97: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 25, 2021
@k8s-ci-robot k8s-ci-robot requested review from dims and mrunalp February 25, 2021 16:47
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 25, 2021
@ddebroy
Copy link
Member

ddebroy commented Feb 25, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Feb 25, 2021
@jingxu97 jingxu97 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 25, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. label Feb 25, 2021
@jingxu97 jingxu97 added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Feb 25, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Feb 25, 2021
@jingxu97
Copy link
Contributor Author

/retest

@saad-ali
Copy link
Member

saad-ali commented Feb 25, 2021

Lstat returns a FileInfo describing the named file. 
If the file is a symbolic link, the returned FileInfo describes the symbolic link.
***Lstat makes no attempt to follow the link.***
If there is an error, it will be of type *PathError.

Is that desired behavior?

@yujuhong
Copy link
Contributor

@jingxu97 could you describe or add more context about what this PR is fixing? Thanks.

@jingxu97
Copy link
Contributor Author

@jingxu97 could you describe or add more context about what this PR is fixing? Thanks.

I added more context in the commit. I have seen issues mentioned for Stat() in Windows like golang/go#34900
golang/go#29119

But they are not the exact behavior I am seeing here.

@yujuhong
Copy link
Contributor

@jingxu97 I think you edited @saad-ali's comment by mistake...

@saad-ali does Jing's explanation make sense to you?

@jingxu97
Copy link
Contributor Author

jingxu97 commented Feb 26, 2021

Lstat returns a FileInfo describing the named file. 
If the file is a symbolic link, the returned FileInfo describes the symbolic link.
***Lstat makes no attempt to follow the link.***
If there is an error, it will be of type *PathError.

Is that desired behavior?

Here Lstat is used for checking whether the file is a dir or not. If it is not a dir, it will further check whether it is a domain socket (windows and Linux use different ways to check it due to issue in Windows here golang/go#33357)

In this aspect, I think it is safe to use Lstat first to check whether it is a dir or not. A symlink is not considered a dir, then it will future to check whether it is a socket or not (in Linux, it will then use os.Stat() to check it).

@k8s-ci-robot k8s-ci-robot added the sig/windows Categorizes an issue or PR as relevant to SIG Windows. label Mar 2, 2021
@jsturtevant
Copy link
Contributor

We ran into this with the device manager: #97576

@marosset
Copy link
Contributor

marosset commented Mar 2, 2021

/test pull-kubernetes-e2e-aks-engine-azure-windows
/test pull-kubernetes-e2e-aks-engine-windows-contianerd

@marosset
Copy link
Contributor

marosset commented Mar 2, 2021

/lgtm
/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Mar 2, 2021
@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 2, 2021

ping @saad-ali @yujuhong on approving, thanks!

@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 2, 2021

/retest

@yujuhong
Copy link
Contributor

yujuhong commented Mar 2, 2021

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jingxu97, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2021
@k8s-ci-robot
Copy link
Contributor

@jingxu97: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-e2e-aks-engine-azure-windows 70e01c5 link /test pull-kubernetes-e2e-aks-engine-azure-windows
pull-kubernetes-e2e-kind 70e01c5 link /test pull-kubernetes-e2e-kind
pull-kubernetes-e2e-gce-ubuntu-containerd 70e01c5 link /test pull-kubernetes-e2e-gce-ubuntu-containerd

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 2, 2021

/retest

@saad-ali
Copy link
Member

saad-ali commented Mar 3, 2021

In this aspect, I think it is safe to use Lstat first to check whether it is a dir or not. A symlink is not considered a dir, then it will future to check whether it is a socket or not (in Linux, it will then use os.Stat() to check it).

Please make sure to add a release note to this PR. User visible change is that symlinks inside plugin directory for CSI and device plugins will no longer be followed.

@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 3, 2021

In this aspect, I think it is safe to use Lstat first to check whether it is a dir or not. A symlink is not considered a dir, then it will future to check whether it is a socket or not (in Linux, it will then use os.Stat() to check it).

Please make sure to add a release note to this PR. User visible change is that symlinks inside plugin directory for CSI and device plugins will no longer be followed.

I think no matter with or without this change, if the socket file is a symlink, for Linux, it will follow it. There are two checks for Linux,

In this aspect, I think it is safe to use Lstat first to check whether it is a dir or not. A symlink is not considered a dir, then it will future to check whether it is a socket or not (in Linux, it will then use os.Stat() to check it).

Please make sure to add a release note to this PR. User visible change is that symlinks inside plugin directory for CSI and device plugins will no longer be followed.

Actually with or without this change, the behavior of the code will not change. For linux, it uses os.Stat() twice

  1. The first os.Stat() is to check whether path is the dir or not: (The assumption here is if the path is a dir, it should not be a socket file)

Before fix, if path is a symlink, and target path is a dir, then the code will return immediately
After changing to os.Lstat(), if path is a symlink, it will continue to call IsUnixDomainSocket in side of this function it will call os.Stat() again in step 2

  1. The second os.Stat()
    This check will determine whether file is a socket file or not. We are not changing this os.Stat() here for Linux.
    Windows uses a different way to check domain socket file, since file Mode returned by os.Stat() does not work for Windows.

So the first os.Stat (now it changed to os.Lstat()) should not affect the final result. Actually the first check is not very necessary in my option since the actual check of domain socket is in the second check.

@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 3, 2021

@marosset Now we can confirm this os.Stat() is introduced in the new 20H2 Windows image. The older ones are fine. Could you share the link for opening Windows issue? Thanks!

@jsturtevant
Copy link
Contributor

@jingxu97
Copy link
Contributor Author

jingxu97 commented Mar 3, 2021

@jingxu97 https://github.com/microsoft/Windows-Containers

Thanks! I opened an issue there microsoft/Windows-Containers#97

@@ -159,7 +159,7 @@ func (w *Watcher) traversePluginDir(dir string) error {
func (w *Watcher) handleCreateEvent(event fsnotify.Event) error {
klog.V(6).Infof("Handling create event: %v", event)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this breaks plugins that register in the following way:

plugin sets up a dir on disk somewhere:
/myplugin/mysocket.sock

then symlinks its dir into the kubelet's plugin dir:
/var/plugindir/myplugin -> /myplugin

event.Name would be /var/plugindir/myplugin

previously:

  • os.Stat would follow, and get info for /var/plugindir/myplugin
  • fi.IsDir() would be true
  • we would traverse and find the plugin socket on line 185

with this change:

  • os.Lstat does not follow
  • fi.IsDir() is false
  • IsUnixDomainSocket is also false
  • we ignore and return "klog.V(5).Infof("Ignoring non socket file %s", fi.Name())"

If os.Stat has known issues with sockets on windows, I think this would be a better workaround:

fi, err := os.Stat(event.Name)
if err != nil && os.GOOS == "windows" {
  fi, err = os.Lstat(event.Name)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created a PR to address #99723
I missed the traverse part. Thank you for pointing it out!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this logic on Windows and found out in the case path is a symlink points to a dir, os.Stat() will not work for all Windows versions tested including LSTC and SAC

The error is "Access is Denied".
So this use case will not work for Windows before or after this change.

k8s-ci-robot added a commit that referenced this pull request Mar 11, 2021
#99723-upstream-release-1.19

Automated cherry pick of #99463: Use Lstat in plugin watcher to avoid Windows problem #99723:  Fix issue in checking domain socket for plugin watcher
k8s-ci-robot added a commit that referenced this pull request Mar 12, 2021
#99723-upstream-release-1.20

Automated cherry pick of #99463: Use Lstat in plugin watcher to avoid Windows problem #99723:  Fix issue in checking domain socket for plugin watcher
@liggitt liggitt added the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Apr 26, 2022
@liggitt liggitt removed the kind/regression Categorizes issue or PR as related to a regression from a prior release. label Sep 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.

8 participants