Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend NodeUnpublish test to verify cleanup of TargetPath (#336) #338

Merged
merged 1 commit into from
May 5, 2021

Conversation

dobsonj
Copy link
Member

@dobsonj dobsonj commented Apr 28, 2021

What type of PR is this?
/kind bug

What this PR does / why we need it:
kubernetes/kubernetes#101441 deprecates removal of the CSI NodePublish target_path by the kubelet. This must be done by the CSI plugin according to the CSI spec.
This PR adds a sanity test to verify that the CSI plugin removes target_path as part of NodeUnpublish.

Which issue(s) this PR fixes:

Fixes #336

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Yes

Removal of the CSI nodepublish path by the kubelet is deprecated. This must be done by the CSI plugin according to the CSI spec.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 28, 2021
@k8s-ci-robot k8s-ci-robot requested review from msau42 and pohly April 28, 2021 15:59
@k8s-ci-robot
Copy link
Contributor

Welcome @dobsonj!

It looks like this is your first PR to kubernetes-csi/csi-test 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/csi-test has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @dobsonj. Thanks for your PR.

I'm waiting for a kubernetes-csi member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 28, 2021

// NodeGetVolumeStats is expected to return NotFound
// if the TargetPath is removed by NodeUnpublishVolume
By("Checking NodeGetVolumeStats returns NotFound error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to make assumptions about how NodeGetVolumeStats is implemented. It could return an error because it tracks volume state and knows that NodeUnpublishVolume was called even when that function didn't remove the target path.

What you really need is to check on the node whether the directory is gone. Perhaps you can use CreateTargetDir (

CreateTargetDir func(path string) (string, error)
) or whatever is configured (
CreateTargetPathCmd string
). At least the internal implementation (
// Create the target path. Only the directory itself
// and not its parents get created, and it is an error
// if the directory already exists.
if err := os.Mkdir(targetPath, 0755); err != nil {
return "", err
}
) fails when the directory exists.

But that looks like a hack. A cleaner solution is to introduce a CheckPath feature (with a configurable callback and command) which returns "file", "directory", "other" and "not_found", or an error, and then use that here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @pohly, this was very helpful.
I implemented the CheckPath function, but with a couple of minor differences from what you described.

  1. I used return codes instead of strings, but it does return the 4 conditions you described. The test in node.go only checks for found or not found though. Since the spec states "file or directory" I did not want to make assumptions about the file type in the sanity test.
  2. I only did the configurable callback, not the command string that can override that behavior like in createMountTargetLocation() and removeMountTargetLocation(). This simplified the implementation, and I question whether the command string would be as useful in this context as it is for create / remove commands. Translating those 4 conditions into error codes (or strings) would add some complexity to the command. If the command string supports an essential use case though, I can add in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command string is essential for those users who invoke the csi-sanity command to run tests. They can't extend that binary with Go callbacks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for elaborating on this, I added the command string option.
I tested it with the following check_path.sh bash script:

#!/bin/bash
if [ -f "$1" ]; then
        echo "1" # file
elif [ -d "$1" ]; then
        echo "2" # dir
elif [ ! -e "$1" ]; then
        echo "3" # not found
else
        echo "4" # other
fi

And that passes against the mock driver using the following args:

$ csi-sanity -csi.checkpathcmd ./check_path.sh -csi.endpoint /tmp/csi.sock

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use numbers as output? I find strings as in my original proposal easier to use because there's no chance that someone, say, returns 3 when they meant 4. The same strings can be string constants in Go for the Go callback, so it could be even made type safe there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, strings are a better approach here. When I added the Atoi call I started to question if that was really the best option. Updated.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 29, 2021
return "", fmt.Errorf("check path command %s failed: %v", config.CheckPathCmd, err)
}
// Convert the command's stdout to an integer. This is expected to match
// the value for PathIsFile, PathIsDir, PathIsNotFound, or PathIsOther.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is out-dated, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

@@ -449,3 +460,68 @@ func PseudoUUID() string {
func UniqueString(prefix string) string {
return prefix + uniqueSuffix
}

// Return codes for CheckPath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please introduce a type (like type PathKind string) and use that to make the code more type safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
// Convert the command's stdout to an integer. This is expected to match
// the value for PathIsFile, PathIsDir, PathIsNotFound, or PathIsOther.
outstr := strings.TrimSpace(string(out))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it would be good to check that the script has return a valid string. If not, raise an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

default:
err = fmt.Errorf("invalid PathType: %s", pk)
}
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems non-idiomatic. A shorter version is:

func (pk PathKind) validate() error {
	switch pk {
	case PathIsFile, PathIsDir, PathIsNotFound, PathIsOther:
              return nil
         default:
		return fmt.Errorf("invalid PathType: %s", pk)
	}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also find it odd to first cast a string and then call validate. It works of course, but a func IsPathKind (in string) (PathKind, error) makes the intent clearer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, done. Appreciate your help with this.

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please squash your commits.

Except for some nits it looks okay, but I want to try it out before merging. I should get to that on Monday.

@dobsonj dobsonj force-pushed the 336-NodeUnpublish-test branch from c19211e to 887c56f Compare April 30, 2021 20:43
@@ -81,6 +81,8 @@ func main() {
stringVar(&config.RemoveTargetPathCmd, "removemountpathcmd", "Command to run for target path removal")
stringVar(&config.RemoveStagingPathCmd, "removestagingpathcmd", "Command to run for staging path removal")
durationVar(&config.RemovePathCmdTimeout, "removepathcmdtimeout", "Timeout for the commands to remove target and staging paths, in seconds")
stringVar(&config.CheckPathCmd, "checkpathcmd", "Command to run to check a given path")
Copy link
Contributor

@pohly pohly May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit hard for users of csi-sanity to find out what that command needs to do. Right now they have to dig into the source code to find the comment in pkg/sanity/sanity.go.

Perhaps add It must print "file", "directory", "not_found", "other" on stdout.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -449,6 +449,51 @@ var _ = DescribeSanity("Node Service", func(sc *TestContext) {
Expect(ok).To(BeTrue())
Expect(serverError.Code()).To(Equal(codes.InvalidArgument), "unexpected error: %s", serverError.Message())
})

It("should fail if target path was not removed by driver", func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace with "should remove target path". The full test name then becomes `NodeUnpublishVolume should remove target path", which is a valid statement about the behavior of the driver.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done. I tried this out in PMEM-CSI (intel/pmem-csi#948) and it worked as expected.

@gnufied
Copy link
Contributor

gnufied commented May 3, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 3, 2021
@gnufied
Copy link
Contributor

gnufied commented May 3, 2021

Once this merges - we should send out an email to CSI announce list on - https://github.com/container-storage-interface/community so as CSI driver authors can fix their drivers.

@pohly
Copy link
Contributor

pohly commented May 3, 2021

Once this merges - we should send out an email to CSI announce list on - https://github.com/container-storage-interface/community so as CSI driver authors can fix their drivers.

That change of behavior in Kubernetes was already announced in https://groups.google.com/g/container-storage-interface-drivers-announce/c/GgtP9kiv5Qc/m/CO2yyOJvAAAJ

That announcement did not specifically talk about deleting the directory, though, because it follows from the spec. Pointing out the change of behavior of the csi-sanity suite is still worthwhile to avoid surprises.

@pohly
Copy link
Contributor

pohly commented May 3, 2021

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

@pohly
Copy link
Contributor

pohly commented May 3, 2021

That the test checks the path when it should exist is useful because it informs users of csi-sanity when their check path configuration isn't working. Please don't remove that.

@dobsonj
Copy link
Member Author

dobsonj commented May 3, 2021

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

The mock driver works okay, but pull-kubernetes-csi-csi-test uses hostpath. I'm looking into that.

@pohly
Copy link
Contributor

pohly commented May 3, 2021

Is the mock driver not creating the target path? That might have to be added for pull-kubernetes-csi-csi-test to pass.

The mock driver works okay, but pull-kubernetes-csi-csi-test uses hostpath. I'm looking into that.

Then this might be exactly the issue that I mentioned: the check path feature must be configured so that it checks the path where hostpath is deployed, not where the csi-sanity test runs.

The test assertion failure could be enhanced. I'll leave a comment on the code line.

By("Checking the target path exists")
pa, err := CheckPath(volpath, sc.Config)
Expect(err).NotTo(HaveOccurred())
Expect(pa).NotTo(Equal(PathIsNotFound))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this assertion fails, the resulting errors is not very helpful:

/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:453
Expected
    <sanity.PathKind>: not_found
not to equal
    <sanity.PathKind>: not_found
/home/prow/go/src/github.com/kubernetes-csi/csi-test/pkg/sanity/node.go:477

You can add a description to it:

Expect(err).NotTo(HaveOccurred(), "checking path %q", volpath)
Expect(pa).NotTo(Equal(PathIsNotFound), "path %q should have been created by CSI driver and the test config should enabling testing for that path", volpath)

By("Checking the target path was removed")
pa, err = CheckPath(volpath, sc.Config)
Expect(err).NotTo(HaveOccurred())
Expect(pa).To(Equal(PathIsNotFound))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here.

@dobsonj
Copy link
Member Author

dobsonj commented May 3, 2021

Then this might be exactly the issue that I mentioned: the check path feature must be configured so that it checks the path where hostpath is deployed, not where the csi-sanity test runs.

Yeah, now I see, it needs to be configured here:

run_with_loggers "${CSI_PROW_WORK}/csi-sanity" \
-ginkgo.v \
-csi.junitfile "${ARTIFACTS}/junit_sanity.xml" \
-csi.endpoint "dns:///$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' csi-prow-control-plane):$(kubectl get "services/${CSI_PROW_SANITY_SERVICE}" -o "jsonpath={..nodePort}")" \
-csi.stagingdir "/tmp/staging" \
-csi.mountdir "/tmp/mount" \
-csi.createstagingpathcmd "${CSI_PROW_WORK}/mkdir_in_pod.sh" \
-csi.createmountpathcmd "${CSI_PROW_WORK}/mkdir_in_pod.sh" \
-csi.removestagingpathcmd "${CSI_PROW_WORK}/rmdir_in_pod.sh" \
-csi.removemountpathcmd "${CSI_PROW_WORK}/rmdir_in_pod.sh" \

I'll follow up with a fix for this and the other 2 comments about improving the error messages.

@pohly
Copy link
Contributor

pohly commented May 4, 2021

I'll follow up with a fix for this and the other 2 comments about improving the error messages.

release-tools cannot be modified here. Instead, csi-release-tools has to be modified and then gets imported into various projects.

This failure shows that the new test is a breaking change: users of csi-sanity have to do something when updating, otherwise they get test failures. In this repo that causes a circular dependency: we first have to update csi-release-tools, but that then only works with an update csi-test.

Let's avoid this breaking change. Two options:

  • Don't set any default for the check path feature: if unset, skip the test.
  • If creating the path is not the default, expect path checking also to be reconfigured, otherwise skip the test.

The first options looks simpler at first glance, but then the csi-sanity command needs a parameter for setting the builtin default implementation. This could be hard to explain.

Therefore I prefer the second option.

@dobsonj
Copy link
Member Author

dobsonj commented May 4, 2021

release-tools cannot be modified here. Instead, csi-release-tools has to be modified and then gets imported into various projects.

This failure shows that the new test is a breaking change: users of csi-sanity have to do something when updating, otherwise they get test failures. In this repo that causes a circular dependency: we first have to update csi-release-tools, but that then only works with an update csi-test.

Let's avoid this breaking change. Two options:

  • Don't set any default for the check path feature: if unset, skip the test.
  • If creating the path is not the default, expect path checking also to be reconfigured, otherwise skip the test.

The first options looks simpler at first glance, but then the csi-sanity command needs a parameter for setting the builtin default implementation. This could be hard to explain.

Therefore I prefer the second option.

This makes sense, I'll go with the second option as you suggest.

I noticed after I pushed those changes that make test started failing the verify-subtree.sh check with Directory 'release-tools' contains non-upstream changes. One thing that still puzzles me though is why csi-sanity still failed on the same line after that last change. Could be an issue with the change that I just pushed to prow.sh, or an issue with the hostpath driver (though I didn't notice an obvious problem when reading the code). But is it possible that my change to the local copy of release-tools was somehow ignored during the test? Just trying to better understand the full process.

In any case, I'll revert the change to release-tools and go with option 2 for this current PR.

@pohly
Copy link
Contributor

pohly commented May 4, 2021

But is it possible that my change to the local copy of release-tools was somehow ignored during the test?

No, I see csi-sanity being called with -csi.checkpathcmd /home/prow/go/pkg/csiprow.ZLUhMcmtDc/checkdir_in_pod.sh.

cat >"${CSI_PROW_WORK}/checkdir_in_pod.sh" <<EOF
#!/bin/sh
OUT=\$(kubectl exec "${CSI_PROW_SANITY_POD}" -c "${CSI_PROW_SANITY_CONTAINER}" -- stat --printf="%F" "\$@" 2>&1)
if [ \$? -ne 0 ]; then
Copy link
Contributor

@pohly pohly May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious issue with this, but treating any error (including failures in kubectl) as "not_found" doesn't look right.

For example, does stat in the container support --printf?

I briefly considered using stat, but then decided against it as too brittle. Instead I went with a sequence of standard shell if checks in https://github.com/intel/pmem-csi/pull/948/files#diff-67bdd13daabc0c2eb2feaba824f0b9668e60c4cd9a68cad31357eaa568c50e8a

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you, that seems the most likely explanation for the failure in the last test run (i.e. returning non-zero for some unexpected error).

@dobsonj dobsonj force-pushed the 336-NodeUnpublish-test branch from 490217e to a209110 Compare May 4, 2021 19:20
@dobsonj
Copy link
Member Author

dobsonj commented May 4, 2021

I dropped the commit that changed release-tools/prow.sh, made minor changes to the error messages in node.go, and then implemented the skip test check as we discussed. This change also exposed a place in hack/e2e.sh that needed to be updated to use the new --csi.checkpathcmd argument to avoid skipping the test.

@dobsonj
Copy link
Member Author

dobsonj commented May 4, 2021

CI passed this time, I'll wait to squash the commits again until the review is otherwise complete.

Skip("CreateTargetPathCmd was set, but CheckPathCmd was not. Please update your testing configuration to enable CheckPathCmd.")
}
if sc.Config.CreateTargetDir != nil && sc.Config.CheckPath == nil {
Skip("CreateTargetDir was set, but CheckPath was not. Please update your testing configuration to enable CheckPath.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both should always be non-nil because they have defaults.

What you need to check for here is "sc.Config.CreateTargetDir not default and sc.Config.CheckPath not default".

Also, I think this only needs to be checked when sc.Config.CreateTargetPathCmd is unset. Otherwise we check and use sc.Config.CheckPathCmd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createMountTargetLocation has 2 separate paths, one for the custom function, but if that is nil then it calls mkdir. sc.Config.CreateTargetDir is nil by default.
https://github.com/dobsonj/csi-test/blob/a2091104af4d4e1df72bc6cfa8de4d955afa411a/pkg/sanity/sanity.go#L363-L378
Originally I did try to compare sc.Config.CheckPath != defaultCheckPath here, but got the following compile error:

pkg/sanity/node.go:463:63: invalid operation: sc.Config.CheckPath != defaultCheckPath (func can only be compared to nil)

Which is interesting... I think that should work fine in C, but apparently not in Go.
So I ended up changing CheckPath to follow the same model as createMountTargetLocation and leave sc.Config.CheckPath set to nil by default, specifically so the check on line 463 would work.
https://github.com/dobsonj/csi-test/blob/a2091104af4d4e1df72bc6cfa8de4d955afa411a/pkg/sanity/sanity.go#L539-L545

If sc.Config.CreateTargetPathCmd is set, we should hit the skip statement on line 461 and not make it down to line 464. Unless CreateTargetPathCmd, CheckPathCmd, and CreateTargetDir are all set at the same time, which seems strange. I can add a check for that though if that's what you're pointing out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I ended up changing CheckPath to follow the same model as createMountTargetLocation and leave sc.Config.CheckPath set to nil by default, specifically so the check on line 463 would work.

That's the part that I had missed. Then the current code is fine.

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now, but please squash before we merge it.

…-csi#336)

The CSI spec states that the SP MUST delete the file or directory that
it created as part of NodeUnpublishVolume. This adds a new scenario to
the sanity test to verify that NodeUnpublishVolume removes TargetPath.
@dobsonj dobsonj force-pushed the 336-NodeUnpublish-test branch from a209110 to f531b5b Compare May 5, 2021 14:14
@dobsonj
Copy link
Member Author

dobsonj commented May 5, 2021

Looks good now, but please squash before we merge it.

Done, thanks.

Copy link
Contributor

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 5, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2021
@k8s-ci-robot k8s-ci-robot merged commit f2a4ecd into kubernetes-csi:master May 5, 2021
@dobsonj dobsonj deleted the 336-NodeUnpublish-test branch May 5, 2021 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extend NodeUnpublish test to verify cleanup of node_publish path
4 participants