Extend NodeUnpublish test to verify cleanup of TargetPath (#336) #338

dobsonj · 2021-04-28T15:59:52Z

What type of PR is this?
/kind bug

What this PR does / why we need it:
kubernetes/kubernetes#101441 deprecates removal of the CSI NodePublish target_path by the kubelet. This must be done by the CSI plugin according to the CSI spec.
This PR adds a sanity test to verify that the CSI plugin removes target_path as part of NodeUnpublish.

Which issue(s) this PR fixes:

Fixes #336

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Yes

Removal of the CSI nodepublish path by the kubelet is deprecated. This must be done by the CSI plugin according to the CSI spec.

k8s-ci-robot · 2021-04-28T16:00:00Z

Welcome @dobsonj!

It looks like this is your first PR to kubernetes-csi/csi-test 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-test has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2021-04-28T16:00:00Z

Hi @dobsonj. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pohly · 2021-04-28T16:22:27Z

pkg/sanity/node.go

+
+			// NodeGetVolumeStats is expected to return NotFound
+			// if the TargetPath is removed by NodeUnpublishVolume
+			By("Checking NodeGetVolumeStats returns NotFound error")


This seems to make assumptions about how NodeGetVolumeStats is implemented. It could return an error because it tracks volume state and knows that NodeUnpublishVolume was called even when that function didn't remove the target path.

What you really need is to check on the node whether the directory is gone. Perhaps you can use CreateTargetDir (csi-test/pkg/sanity/sanity.go, line 124 in a251c44):

	CreateTargetDir func(path string) (string, error)

or whatever is configured (csi-test/pkg/sanity/sanity.go, line 143 in a251c44):

	CreateTargetPathCmd string

At least the internal implementation (csi-test/pkg/sanity/sanity.go, lines 362 to 367 in a251c44) fails when the directory exists:

	// Create the target path. Only the directory itself
	// and not its parents get created, and it is an error
	// if the directory already exists.
	if err := os.Mkdir(targetPath, 0755); err != nil {
		return "", err
	}

But that looks like a hack. A cleaner solution is to introduce a CheckPath feature (with a configurable callback and command) which returns "file", "directory", "other" and "not_found", or an error, and then use that here.

Thanks for the feedback @pohly, this was very helpful.
I implemented the CheckPath function, but with a couple of minor differences from what you described.

I used return codes instead of strings, but it does return the 4 conditions you described. The test in node.go only checks for found or not found though. Since the spec states "file or directory" I did not want to make assumptions about the file type in the sanity test.

I only did the configurable callback, not the command string that can override that behavior like in createMountTargetLocation() and removeMountTargetLocation(). This simplified the implementation, and I question whether the command string would be as useful in this context as it is for the create / remove commands. Translating those 4 conditions into error codes (or strings) would add some complexity to the command. If the command string supports an essential use case though, I can add it.

The command string is essential for those users who invoke the csi-sanity command to run tests. They can't extend that binary with Go callbacks.

Thanks for elaborating on this, I added the command string option.
I tested it with the following check_path.sh bash script:

	#!/bin/bash
	if [ -f "$1" ]; then
	    echo "1" # file
	elif [ -d "$1" ]; then
	    echo "2" # dir
	elif [ ! -e "$1" ]; then
	    echo "3" # not found
	else
	    echo "4" # other
	fi

And that passes against the mock driver using the following args:

$ csi-sanity -csi.checkpathcmd ./check_path.sh -csi.endpoint /tmp/csi.sock

Why use numbers as output? I find strings as in my original proposal easier to use because there's no chance that someone, say, returns 3 when they meant 4. The same strings can be string constants in Go for the Go callback, so it could be even made type safe there.

You're right, strings are a better approach here. When I added the Atoi call I started to question if that was really the best option. Updated.

pohly · 2021-04-30T12:36:35Z

pkg/sanity/sanity.go

+			return "", fmt.Errorf("check path command %s failed: %v", config.CheckPathCmd, err)
+		}
+		// Convert the command's stdout to an integer. This is expected to match
+		// the value for PathIsFile, PathIsDir, PathIsNotFound, or PathIsOther.


This comment is outdated, right?

Yes, good catch.

pohly · 2021-04-30T12:37:40Z

pkg/sanity/sanity.go

@@ -449,3 +460,68 @@ func PseudoUUID() string {
 func UniqueString(prefix string) string {
 	return prefix + uniqueSuffix
 }
+
+// Return codes for CheckPath


Please introduce a type (like type PathKind string) and use that to make the code more type safe.

pohly · 2021-04-30T12:38:38Z

pkg/sanity/sanity.go

+		}
+		// Convert the command's stdout to an integer. This is expected to match
+		// the value for PathIsFile, PathIsDir, PathIsNotFound, or PathIsOther.
+		outstr := strings.TrimSpace(string(out))


Here it would be good to check that the script has returned a valid string. If not, raise an error.

pohly · 2021-04-30T19:16:34Z

pkg/sanity/sanity.go

+	default:
+		err = fmt.Errorf("invalid PathType: %s", pk)
+	}
+	return err


This seems non-idiomatic. A shorter version is:

	func (pk PathKind) validate() error {
		switch pk {
		case PathIsFile, PathIsDir, PathIsNotFound, PathIsOther:
			return nil
		default:
			return fmt.Errorf("invalid PathType: %s", pk)
		}
	}

I also find it odd to first cast a string and then call validate. It works of course, but a func IsPathKind (in string) (PathKind, error) makes the intent clearer.

Makes sense, done. Appreciate your help with this.
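For readers following along, the conversion function suggested above could be sketched roughly as follows. The signature IsPathKind(in string) (PathKind, error) is taken from the review comment; parseCheckOutput is a hypothetical helper added here only to show how a command's stdout might be trimmed and validated, and the merged code may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// PathKind and its constants mirror the names used in the review.
type PathKind string

const (
	PathIsFile     PathKind = "file"
	PathIsDir      PathKind = "directory"
	PathIsNotFound PathKind = "not_found"
	PathIsOther    PathKind = "other"
)

// IsPathKind converts a raw string into a PathKind and rejects
// anything that is not one of the four known values.
func IsPathKind(in string) (PathKind, error) {
	pk := PathKind(in)
	switch pk {
	case PathIsFile, PathIsDir, PathIsNotFound, PathIsOther:
		return pk, nil
	default:
		return "", fmt.Errorf("invalid PathKind: %q", in)
	}
}

// parseCheckOutput (hypothetical) trims surrounding whitespace from a
// check command's stdout and validates the result.
func parseCheckOutput(out []byte) (PathKind, error) {
	return IsPathKind(strings.TrimSpace(string(out)))
}

func main() {
	// The second input imitates the earlier numeric output format,
	// which the string-based validation now rejects.
	for _, out := range []string{"directory\n", "3\n"} {
		kind, err := parseCheckOutput([]byte(out))
		fmt.Printf("%q %v\n", kind, err)
	}
}
```

Doing the cast and the validation in one function means callers never hold an unvalidated PathKind.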

pohly

Please squash your commits.

Except for some nits it looks okay, but I want to try it out before merging. I should get to that on Monday.

pohly · 2021-05-03T09:25:46Z

cmd/csi-sanity/main.go

@@ -81,6 +81,8 @@ func main() {
 	stringVar(&config.RemoveTargetPathCmd, "removemountpathcmd", "Command to run for target path removal")
 	stringVar(&config.RemoveStagingPathCmd, "removestagingpathcmd", "Command to run for staging path removal")
 	durationVar(&config.RemovePathCmdTimeout, "removepathcmdtimeout", "Timeout for the commands to remove target and staging paths, in seconds")
+	stringVar(&config.CheckPathCmd, "checkpathcmd", "Command to run to check a given path")


It's a bit hard for users of csi-sanity to find out what that command needs to do. Right now they have to dig into the source code to find the comment in pkg/sanity/sanity.go.

Perhaps add: It must print "file", "directory", "not_found", or "other" on stdout.
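As a rough illustration of the more descriptive help text, the sketch below registers the option with the standard flag package. The flag name csi.checkpathcmd appears in this thread; the config struct and newFlagSet are hypothetical stand-ins, since csi-sanity uses its own stringVar helper in cmd/csi-sanity/main.go.

```go
package main

import (
	"flag"
	"fmt"
)

// config is a hypothetical, one-field stand-in for the sanity test
// configuration.
type config struct {
	CheckPathCmd string
}

// newFlagSet registers the check path option with a usage string that
// tells users what the command has to print.
func newFlagSet(cfg *config) *flag.FlagSet {
	fs := flag.NewFlagSet("csi-sanity", flag.ContinueOnError)
	fs.StringVar(&cfg.CheckPathCmd, "csi.checkpathcmd", "",
		`Command to run to check a given path. It must print "file", "directory", "not_found", or "other" on stdout.`)
	return fs
}

func main() {
	var cfg config
	fs := newFlagSet(&cfg)
	if err := fs.Parse([]string{"-csi.checkpathcmd", "./check_path.sh"}); err != nil {
		panic(err)
	}
	fmt.Println(cfg.CheckPathCmd)
}
```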

pohly · 2021-05-03T09:46:55Z

pkg/sanity/node.go

@@ -449,6 +449,51 @@ var _ = DescribeSanity("Node Service", func(sc *TestContext) {
 			Expect(ok).To(BeTrue())
 			Expect(serverError.Code()).To(Equal(codes.InvalidArgument), "unexpected error: %s", serverError.Message())
 		})
+
+		It("should fail if target path was not removed by driver", func() {


Please replace with "should remove target path". The full test name then becomes "NodeUnpublishVolume should remove target path", which is a valid statement about the behavior of the driver.

pohly

Almost done. I tried this out in PMEM-CSI (intel/pmem-csi#948) and it worked as expected.

gnufied · 2021-05-03T14:27:30Z

/ok-to-test

gnufied · 2021-05-03T15:23:43Z

Once this merges, we should send out an email to the CSI announce list (https://github.com/container-storage-interface/community) so that CSI driver authors can fix their drivers.

pohly · 2021-05-03T18:10:09Z

Once this merges, we should send out an email to the CSI announce list (https://github.com/container-storage-interface/community) so that CSI driver authors can fix their drivers.

That change of behavior in Kubernetes was already announced in https://groups.google.com/g/container-storage-interface-drivers-announce/c/GgtP9kiv5Qc/m/CO2yyOJvAAAJ

That announcement did not specifically talk about deleting the directory, though, because it follows from the spec. Pointing out the change of behavior of the csi-sanity suite is still worthwhile to avoid surprises.

pohly · 2021-05-03T18:12:16Z

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

pohly · 2021-05-03T18:13:29Z

That the test checks the path when it should exist is useful because it informs users of csi-sanity when their check path configuration isn't working. Please don't remove that.

dobsonj · 2021-05-03T18:15:04Z

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

The mock driver works okay, but pull-kubernetes-csi-csi-test uses hostpath. I'm looking into that.

pohly · 2021-05-03T18:19:53Z

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

The mock driver works okay, but pull-kubernetes-csi-csi-test uses hostpath. I'm looking into that.

Then this might be exactly the issue that I mentioned: the check path feature must be configured so that it checks the path where hostpath is deployed, not where the csi-sanity test runs.

The test assertion failure could be enhanced. I'll leave a comment on the code line.

pohly · 2021-05-03T18:23:15Z

pkg/sanity/node.go

+			By("Checking the target path exists")
+			pa, err := CheckPath(volpath, sc.Config)
+			Expect(err).NotTo(HaveOccurred())
+			Expect(pa).NotTo(Equal(PathIsNotFound))


When this assertion fails, the resulting error is not very helpful:

	/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:453
	Expected
	    <sanity.PathKind>: not_found
	not to equal
	    <sanity.PathKind>: not_found
	/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:477

You can add a description to it:

	Expect(err).NotTo(HaveOccurred(), "checking path %q", volpath)
	Expect(pa).NotTo(Equal(PathIsNotFound), "path %q should have been created by CSI driver and the test config should enable testing for that path", volpath)

pohly · 2021-05-03T18:23:29Z

pkg/sanity/node.go

+			By("Checking the target path was removed")
+			pa, err = CheckPath(volpath, sc.Config)
+			Expect(err).NotTo(HaveOccurred())
+			Expect(pa).To(Equal(PathIsNotFound))


Similar here.

dobsonj · 2021-05-03T18:46:20Z

Then this might be exactly the issue that I mentioned: the check path feature must be configured so that it checks the path where hostpath is deployed, not where the csi-sanity test runs.

Yeah, now I see, it needs to be configured here:

csi-test/release-tools/prow.sh

Lines 976 to 985 in a251c44

    
	run_with_loggers "${CSI_PROW_WORK}/csi-sanity" \
	    -ginkgo.v \
	    -csi.junitfile "${ARTIFACTS}/junit_sanity.xml" \
	    -csi.endpoint "dns:///$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' csi-prow-control-plane):$(kubectl get "services/${CSI_PROW_SANITY_SERVICE}" -o "jsonpath={..nodePort}")" \
	    -csi.stagingdir "/tmp/staging" \
	    -csi.mountdir "/tmp/mount" \
	    -csi.createstagingpathcmd "${CSI_PROW_WORK}/mkdir_in_pod.sh" \
	    -csi.createmountpathcmd "${CSI_PROW_WORK}/mkdir_in_pod.sh" \
	    -csi.removestagingpathcmd "${CSI_PROW_WORK}/rmdir_in_pod.sh" \
	    -csi.removemountpathcmd "${CSI_PROW_WORK}/rmdir_in_pod.sh" \

I'll follow up with a fix for this and the other 2 comments about improving the error messages.

pohly · 2021-05-04T07:12:15Z

I'll follow up with a fix for this and the other 2 comments about improving the error messages.

release-tools cannot be modified here. Instead, csi-release-tools has to be modified and then gets imported into various projects.

This failure shows that the new test is a breaking change: users of csi-sanity have to do something when updating, otherwise they get test failures. In this repo that causes a circular dependency: we first have to update csi-release-tools, but that then only works with an updated csi-test.

Let's avoid this breaking change. Two options:

Don't set any default for the check path feature: if unset, skip the test.
If creating the path is not the default, expect path checking also to be reconfigured, otherwise skip the test.

The first option looks simpler at first glance, but then the csi-sanity command needs a parameter for setting the builtin default implementation. This could be hard to explain.

Therefore I prefer the second option.

dobsonj · 2021-05-04T14:28:39Z

release-tools cannot be modified here. Instead, csi-release-tools has to be modified and then gets imported into various projects.

This failure shows that the new test is a breaking change: users of csi-sanity have to do something when updating, otherwise they get test failures. In this repo that causes a circular dependency: we first have to update csi-release-tools, but that then only works with an updated csi-test.

Let's avoid this breaking change. Two options:

Don't set any default for the check path feature: if unset, skip the test.

If creating the path is not the default, expect path checking also to be reconfigured, otherwise skip the test.

The first option looks simpler at first glance, but then the csi-sanity command needs a parameter for setting the builtin default implementation. This could be hard to explain.

Therefore I prefer the second option.

This makes sense, I'll go with the second option as you suggest.

I noticed after I pushed those changes that make test started failing the verify-subtree.sh check with Directory 'release-tools' contains non-upstream changes. One thing that still puzzles me though is why csi-sanity still failed on the same line after that last change. Could be an issue with the change that I just pushed to prow.sh, or an issue with the hostpath driver (though I didn't notice an obvious problem when reading the code). But is it possible that my change to the local copy of release-tools was somehow ignored during the test? Just trying to better understand the full process.

In any case, I'll revert the change to release-tools and go with option 2 for this current PR.

pohly · 2021-05-04T14:53:31Z

But is it possible that my change to the local copy of release-tools was somehow ignored during the test?

No, I see csi-sanity being called with -csi.checkpathcmd /home/prow/go/pkg/csiprow.ZLUhMcmtDc/checkdir_in_pod.sh.

pohly · 2021-05-04T14:56:47Z

release-tools/prow.sh

+    cat >"${CSI_PROW_WORK}/checkdir_in_pod.sh" <<EOF
+#!/bin/sh
+OUT=\$(kubectl exec "${CSI_PROW_SANITY_POD}" -c "${CSI_PROW_SANITY_CONTAINER}" -- stat --printf="%F" "\$@" 2>&1)
+if [ \$? -ne 0 ]; then


I don't see any obvious issue with this, but treating any error (including failures in kubectl) as "not_found" doesn't look right.

For example, does stat in the container support --printf?

I briefly considered using stat, but then decided against it as too brittle. Instead I went with a sequence of standard shell if checks in https://github.com/intel/pmem-csi/pull/948/files#diff-67bdd13daabc0c2eb2feaba824f0b9668e60c4cd9a68cad31357eaa568c50e8a

I see, thank you, that seems the most likely explanation for the failure in the last test run (i.e. returning non-zero for some unexpected error).

dobsonj · 2021-05-04T19:26:09Z

I dropped the commit that changed release-tools/prow.sh, made minor changes to the error messages in node.go, and then implemented the skip test check as we discussed. This change also exposed a place in hack/e2e.sh that needed to be updated to use the new --csi.checkpathcmd argument to avoid skipping the test.

dobsonj · 2021-05-04T19:29:09Z

CI passed this time, I'll wait to squash the commits again until the review is otherwise complete.

pohly · 2021-05-04T21:23:23Z

pkg/sanity/node.go

+				Skip("CreateTargetPathCmd was set, but CheckPathCmd was not. Please update your testing configuration to enable CheckPathCmd.")
+			}
+			if sc.Config.CreateTargetDir != nil && sc.Config.CheckPath == nil {
+				Skip("CreateTargetDir was set, but CheckPath was not. Please update your testing configuration to enable CheckPath.")


Both should always be non-nil because they have defaults.

What you need to check for here is "sc.Config.CreateTargetDir not default and sc.Config.CheckPath not default".

Also, I think this only needs to be checked when sc.Config.CreateTargetPathCmd is unset. Otherwise we check and use sc.Config.CheckPathCmd.

createMountTargetLocation has 2 separate paths, one for the custom function, but if that is nil then it calls mkdir. sc.Config.CreateTargetDir is nil by default.
https://github.com/dobsonj/csi-test/blob/a2091104af4d4e1df72bc6cfa8de4d955afa411a/pkg/sanity/sanity.go#L363-L378
Originally I did try to compare sc.Config.CheckPath != defaultCheckPath here, but got the following compile error:

pkg/sanity/node.go:463:63: invalid operation: sc.Config.CheckPath != defaultCheckPath (func can only be compared to nil)

Which is interesting... I think that should work fine in C, but apparently not in Go.
So I ended up changing CheckPath to follow the same model as createMountTargetLocation and leave sc.Config.CheckPath set to nil by default, specifically so the check on line 463 would work.
https://github.com/dobsonj/csi-test/blob/a2091104af4d4e1df72bc6cfa8de4d955afa411a/pkg/sanity/sanity.go#L539-L545

If sc.Config.CreateTargetPathCmd is set, we should hit the skip statement on line 461 and not make it down to line 464. Unless CreateTargetPathCmd, CheckPathCmd, and CreateTargetDir are all set at the same time, which seems strange. I can add a check for that though if that's what you're pointing out.

So I ended up changing CheckPath to follow the same model as createMountTargetLocation and leave sc.Config.CheckPath set to nil by default, specifically so the check on line 463 would work.

That's the part that I had missed. Then the current code is fine.
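The compile error quoted above follows from the Go rule that function values can only be compared to nil, which is why leaving the callback nil by default makes the skip check possible. A minimal sketch of that pattern, assuming a trimmed-down Config whose field names follow this thread (the callback signatures, the first skip condition, and skipReason itself are hypothetical):

```go
package main

import "fmt"

// Config is a hypothetical, reduced stand-in for the sanity test
// configuration. Both callbacks are nil unless the user sets them.
type Config struct {
	CreateTargetPathCmd string
	CheckPathCmd        string
	CreateTargetDir     func(path string) (string, error)
	CheckPath           func(path string) (string, error)
}

// skipReason returns a non-empty message when target path creation was
// customized but path checking was not, in which case the new test
// cannot know where to look and should be skipped.
func skipReason(c *Config) string {
	if c.CreateTargetPathCmd != "" && c.CheckPathCmd == "" {
		return "CreateTargetPathCmd was set, but CheckPathCmd was not"
	}
	// Comparing a func to nil is allowed; comparing two funcs is not.
	if c.CreateTargetDir != nil && c.CheckPath == nil {
		return "CreateTargetDir was set, but CheckPath was not"
	}
	return ""
}

func main() {
	defaults := &Config{}
	custom := &Config{CreateTargetPathCmd: "./mkdir_in_pod.sh"}

	fmt.Printf("defaults: %q\n", skipReason(defaults))
	fmt.Printf("custom:   %q\n", skipReason(custom))
}
```

With everything left at its default, the test runs against the local file system; it is skipped only for configurations that create the path somewhere else without saying how to check it.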

pohly

Looks good now, but please squash before we merge it.

…-csi#336) The CSI spec states that the SP MUST delete the file or directory that it created as part of NodeUnpublishVolume. This adds a new scenario to the sanity test to verify that NodeUnpublishVolume removes TargetPath.

dobsonj · 2021-05-05T15:48:16Z

Looks good now, but please squash before we merge it.

Done, thanks.

pohly

/lgtm
/approve

k8s-ci-robot · 2021-05-05T17:57:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pohly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the release-note, kind/bug, and cncf-cla: yes labels Apr 28, 2021

k8s-ci-robot requested review from msau42 and pohly April 28, 2021 15:59

k8s-ci-robot added the needs-ok-to-test and size/M labels Apr 28, 2021

pohly reviewed Apr 28, 2021

k8s-ci-robot added the size/L label and removed the size/M label Apr 29, 2021

pohly requested changes Apr 30, 2021

pohly reviewed Apr 30, 2021

pohly requested changes Apr 30, 2021

dobsonj force-pushed the 336-NodeUnpublish-test branch from c19211e to 887c56f April 30, 2021 20:43

pohly reviewed May 3, 2021

pohly mentioned this pull request May 3, 2021: test: update csi-test to 4.x intel/pmem-csi#948 (Merged)

pohly reviewed May 3, 2021

k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label May 3, 2021

pohly reviewed May 3, 2021

pohly reviewed May 4, 2021

dobsonj force-pushed the 336-NodeUnpublish-test branch from 490217e to a209110 May 4, 2021 19:20

pohly reviewed May 4, 2021

pohly requested changes May 5, 2021

dobsonj force-pushed the 336-NodeUnpublish-test branch from a209110 to f531b5b May 5, 2021 14:14

pohly reviewed May 5, 2021

k8s-ci-robot assigned pohly May 5, 2021

k8s-ci-robot added the lgtm label May 5, 2021

k8s-ci-robot added the approved label May 5, 2021

k8s-ci-robot merged commit f2a4ecd into kubernetes-csi:master May 5, 2021

dobsonj deleted the 336-NodeUnpublish-test branch May 5, 2021 20:07

dobsonj mentioned this pull request May 7, 2021: prow.sh: enable -csi.checkpathcmd option in csi-sanity kubernetes-csi/csi-release-tools#148 (Merged)

dobsonj mentioned this pull request Jun 21, 2021: REQUEST: New membership for dobsonj kubernetes/org#2799 (Closed)
