Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support wildcards in reverse mode --exclude #367

Closed
wants to merge 18 commits into from
Closed

Support wildcards in reverse mode --exclude #367

wants to merge 18 commits into from

Conversation

ekalin
Copy link
Contributor

@ekalin ekalin commented Feb 16, 2019

This adds support for gitignore-like wildcards and exclude patters in reverse mode. It (somewhat) fixes #273: no regexp support, but the syntax should be powerful enough to satisfy most needs.

Also, since adding a lot of --exclude options can be tedious, it adds the --exclude-from option to read patterns from a file (or files).


Here are the steps that I've mapped:

  • Store exclusion patterns as plaintext
    Since there will be wildcards, it's not possible anymore to save a list of encrypted paths (calculated during initialization) and match against this list. Instead, in isExcluded the path is decrypted and matched against the plaintext excluded paths.
  • Try avoid duplicate calls to decryptPath
    In Open, for example, the path is decrypted in the isExcluded check, and then when newFile is called to read the real file, the path is decrypted again. It might be possible to optimize the code to avoid decrypting it twice in this and the other Fuse callbacks. Also newFile calls isExcluded as a safeguard, generating yet another decryption.
  • Add support for wildcards when checking for exclusions
    I'm thiking of using https://github.com/sabhiram/go-gitignore to do the checking.
  • Add support for --exclude-from command-line option
  • Document new command-line options and behaviors.

@ekalin
Copy link
Contributor Author

ekalin commented Feb 16, 2019

"Store exclusion patterns as plaintext" is done.

rfs.cExclude is redundant now, but it'll probably become an object of the go-gitignore library in the future.

This broke internal/fusefrontend_reverse/isexcluded_test.go. Since isExcluded now tries to decrypt the name, it needs a nameTransform. Ideally I'd replace it with some kind of mock object just to run the test, but I couldn't find a way to do that.

But in reality that test isn't particularly useful. It's not possible to exclude the root directory (there's a check on line 67 of internal/fusefrontend_reverse/rfs.go). And once matching is done with the gitignore library, it'll probably make no sense to try to add "" as an exclusion. So I'm inclined to delete that test, but I'd like to check with you first.

Edit: Never mind, that test needs to be changed when using the gitignore matcher.

@rfjakob
Copy link
Owner

rfjakob commented Feb 17, 2019

Thanks for working on this! gitignore format seems like a good choice.

The only thing I'm a little concerned about is that this could break existing users of "--exclude".

@ekalin
Copy link
Contributor Author

ekalin commented Feb 17, 2019

There is indeed a change in functionality, previously '--exclude some_file.txt' only excluded some_file.txt in the root of the filesystem, now it will exclude a file with that name anywhere.

However, by adding a '/' before the paths should work as before: '--exclude /some_file.txt' only excludes some_file.txt in the root. So it should be a simple migration.

@ekalin
Copy link
Contributor Author

ekalin commented Feb 17, 2019

I have removed the duplicate calls to decryptPath (once when checking for exclusion and then in the actual Fuse callback), but it's really not elegant: isExcluded returns the outputs of decryptPath.

If you think this is really bad, feel free to skip this commit. The support for --exclude-from should be independent of this change (as a matter of fact, should be independent of all the other changes.)

@rfjakob
Copy link
Owner

rfjakob commented Feb 19, 2019

I think we should add a new flag, -e2, for the new syntax. And when the old -e foo flag is used, we convert it to -e2 /foo.

@ekalin
Copy link
Contributor Author

ekalin commented Feb 19, 2019

It can be done, but my plan is also to add an option to read from a file, and the user would be able to combine this new option with the existing ones is he wants to. It might be confusing if --exclude-from supports one syntax and the command line ones support two. There could be two --exclude-from variants, but then the number of options might get out of hand :-)

And -e2 seems a bad name, but I can't think of a better one...

But anyway, in the end it's your call. If you still prefer to keep -e as a compatibility as it works currently, I'll add another flag for the new syntax.

@rfjakob
Copy link
Owner

rfjakob commented Feb 21, 2019

The -e option is part of the stable ABI, we cannot change it easily. Yes please add a new flag. Suggestions for a name are welcome!

Idea:

  • -exclude-wildcard / -ew

Since exclusions will support wildcards, it's not possible to store
the list of exclusions as a list of encrypted paths (calculated during
initialization). Instead, the plain paths are stored, and the path to
be checked is decrypted at the time of isExcluded check and compared
with the stored plaintext paths.
isExcluded needs to decryptPath, so it was made to return the results
of this function to that the Fuse callbacks don't need to call
decryptPath again.
-exclude keeps the original behaviour by matching from the root of the
mounted filesystem.

The new option -exclude-wildcard (or -ew) matches anywhere (unless tha
pattern begins with '/').
It allows the user to specify a file containg exclusion patters. The
patterns are "wildcard" ones, that is, matching anywhere.
@ekalin ekalin changed the title WIP: Support wildcards in reverse mode --exclude Support wildcards in reverse mode --exclude Feb 24, 2019
@ekalin
Copy link
Contributor Author

ekalin commented Feb 24, 2019

I believe the implementation is ready.

// from decryptPath for convenience.
func (rfs *ReverseFS) isExcludedCipher(relPath string) (bool, string, error) {
if rfs.isTranslatedConfig(relPath) || rfs.isDirIV(relPath) {
return false, "", nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since isDirIV only checks the base name and not the full path, an attacker might be able to check for the existence of a directory by accessing DIRNAME/gocryptfs.diriv.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume the user has specified that directory DIRNAME should be excluded. With the old exclusion logic, all files and subdirectories (e.g., DIRNAME/gocryptfs.diriv) were excluded as well.

With the new code, isExcludedCipher("DIRNAME/gocryptfs.diriv") will return excluded=false since the check is skipped for gocryptfs.diriv files (see implementation of isDirIV). I didn't find any easy way to exploit that, but the new behavior does not seem correct to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't exploit it either. Putting some debugging calls, I see that if I run stat pHIb3vvfW_V4E9G1svEJlw/gocryptfs.diriv where pHIb3vvfW_V4E9G1svEJlw is the encrypted name of an excluded directory, I see that GetAttr (and then isExcludedCiper) are called first for the parent directory (which returns false).

If there are several directories in the path, GetAttr is called sequentially for each directory. I'm not sure where this behavior comes from.

That said, there might be a way to skip this parent directory checking. So I suppose that in isExcludedCipher, if the file is a special file (DirIV or NameFile) I should check if the containing directory is excluded. For TranslatedConfig I think this is unnecessary as it only appears in the root directory, and for regular files the actual pattern matching handles the exclusion correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where this behavior comes from.

The kernel is responsible for the GetAttr call, however, I think its a bad idea to rely on it as a security feature. There might be ways to prevent it from happening (the kernel caches such information under specific circumstances, and does not issue another request before some time has passed), and it also wouldn't be called when gocryptfs is embedded into a third party app that uses the FS interface directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For DirIV files, I've added a check whether the parent directory is excluded. For this case, the previous behavior should be back.

@@ -304,10 +266,10 @@ func (rfs *ReverseFS) openDirPlaintextnames(relPath string, entries []fuse.DirEn

// OpenDir - FUSE readdir call
func (rfs *ReverseFS) OpenDir(cipherPath string, context *fuse.Context) ([]fuse.DirEntry, fuse.Status) {
if rfs.isExcluded(cipherPath) {
excluded, relPath, err := rfs.isExcludedCipher(cipherPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this patch applied, special paths (translated configs, dir IVs and name files) will be translated to incorrect paths. Opening any of the special paths should fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide an example what could be going wrong here, assume cipherPath ends with .name. In practice this shouldn't happen because name-files are regular files and not directories, and FUSE will check the file type before calling any operations - however, for now assume that the user can somehow trick FUSE into issuing such an operation. With the old code decrypting a filename would have failed.

With the updated code, isExcludedCipher will silently remove the .name-extension, and behave like the user opened some different path. A FUSE request to open the translated config or a gocryptfs.diriv file will now also behave different, and try to open the root Cipherdir (since isExcludedCipher will translate those paths to relPath = "").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems even harder to exploit, but I believe that the check if the parent directory is excluded should also prevent this case and also the Readlink one below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think checking the parent directory is sufficient here. Opening a .name file with OpenDir should never work, no matter if the parent directory is excluded or not.

Previously rfs.decryptPath failed for such paths. Maybe we should now add explicit checks and disallow operations that do not make any sense? Maybe isExcludedCipher could even return the file type (regular file, name file, diriv file, ...)? @rfjakob: Any opinions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's not necessary to return the file type; just a boolean indicating if it's a virtual file should be enough to add a check in OpenDir and Readlink for those kinds of files and stop access to them. Or do you see the need for different behavior for each kind of special file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For OpenDir and Readlink it would be sufficient, but in some functions the file type is required anyway - see, e.g., the Open() FUSE call. I am not sure what a suitable design would be, though.

return entries
}
filtered = make([]fuse.DirEntry, 0, len(entries))
for _, entry := range entries {
// filepath.Join handles the case of cipherPath="" correctly:
// Join("", "foo") -> "foo". This does not: cipherPath + "/" + name"
p := filepath.Join(cDir, entry.Name)
if rfs.isExcluded(p) {
if excluded, _, _ := rfs.isExcludedCipher(p); excluded {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit inefficient: The paths are first encrypted and then decrypted just to check if they are excluded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed.

@@ -417,10 +380,14 @@ func (rfs *ReverseFS) StatFs(relPath string) *fuse.StatfsOut {

// Readlink - FUSE call
func (rfs *ReverseFS) Readlink(relPath string, context *fuse.Context) (string, fuse.Status) {
if rfs.isExcluded(relPath) {
excluded, pPath, err := rfs.isExcludedCipher(relPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior here is also changed for special files (translated config, dirIV and name files).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty much the same issue as above. Previously decryptPath (in openBackingDir) would have failed for files ending with .name or other special files (e.g., gocryptfs.diriv). Such files should never be symlinks.

With the new code, however, the .name extension is silently dropped. If the user would somehow trick FUSE into issuing a Readlink operation on a name file, it would now return the result as if the command would have been applied on the file without the .name extension.

Files are excluded before their names are encrypted, to avoid having
the names decrypted again when checking for exclusions.
@ekalin
Copy link
Contributor Author

ekalin commented Feb 25, 2019

As for the other items in the review, I don't quite understand the problems. Could you perhaps describe them in some more detail, if possible showing an example?

Copy link
Contributor

@slackner slackner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some more detailed explanations, please let me know if there is still anything unclear.

// from decryptPath for convenience.
func (rfs *ReverseFS) isExcludedCipher(relPath string) (bool, string, error) {
if rfs.isTranslatedConfig(relPath) || rfs.isDirIV(relPath) {
return false, "", nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assume the user has specified that directory DIRNAME should be excluded. With the old exclusion logic, all files and subdirectories (e.g., DIRNAME/gocryptfs.diriv) were excluded as well.

With the new code, isExcludedCipher("DIRNAME/gocryptfs.diriv") will return excluded=false since the check is skipped for gocryptfs.diriv files (see implementation of isDirIV). I didn't find any easy way to exploit that, but the new behavior does not seem correct to me.

@@ -304,10 +266,10 @@ func (rfs *ReverseFS) openDirPlaintextnames(relPath string, entries []fuse.DirEn

// OpenDir - FUSE readdir call
func (rfs *ReverseFS) OpenDir(cipherPath string, context *fuse.Context) ([]fuse.DirEntry, fuse.Status) {
if rfs.isExcluded(cipherPath) {
excluded, relPath, err := rfs.isExcludedCipher(cipherPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide an example what could be going wrong here, assume cipherPath ends with .name. In practice this shouldn't happen because name-files are regular files and not directories, and FUSE will check the file type before calling any operations - however, for now assume that the user can somehow trick FUSE into issuing such an operation. With the old code decrypting a filename would have failed.

With the updated code, isExcludedCipher will silently remove the .name-extension, and behave like the user opened some different path. A FUSE request to open the translated config or a gocryptfs.diriv file will now also behave different, and try to open the root Cipherdir (since isExcludedCipher will translate those paths to relPath = "").

@@ -417,10 +380,14 @@ func (rfs *ReverseFS) StatFs(relPath string) *fuse.StatfsOut {

// Readlink - FUSE call
func (rfs *ReverseFS) Readlink(relPath string, context *fuse.Context) (string, fuse.Status) {
if rfs.isExcluded(relPath) {
excluded, pPath, err := rfs.isExcludedCipher(relPath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty much the same issue as above. Previously decryptPath (in openBackingDir) would have failed for files ending with .name or other special files (e.g., gocryptfs.diriv). Such files should never be symlinks.

With the new code, however, the .name extension is silently dropped. If the user would somehow trick FUSE into issuing a Readlink operation on a name file, it would now return the result as if the command would have been applied on the file without the .name extension.

ekalin added 2 commits March 8, 2019 18:58
This was done to allow it to be replace with a mock object in tests
(specifically in excluder_test.go).
Assuming a directory SOMEDIR (possibly deep in the filesystem
hierarchy) is excluded, now SOMEDIR/gocryptfs.diriv is considered
excluded.

If it were not excluded, it might be possible to infer the existence
of SOMEDIR by checking for SOMEDIR/gocryptfs.diriv.

In practice I've not been able to actually exploit this, as fuse or
the kernel tries to open each directory in a path before actually
trying to open (or stat) a file, but it's safer if gocryptfs makes a
further check.
@rfjakob
Copy link
Owner

rfjakob commented Mar 10, 2019

@ekalin do you think the issues that @slackner mentioned are resolved?

@ekalin
Copy link
Contributor Author

ekalin commented Mar 10, 2019

I think adding check in OpenDir and Readlink if the file is not a virtual file created by gocryptfs (diriv or .name for long file names) is a valid idea, even if it's something hard to exploit.

But I didn't really understood @slackner's last comment about Open or other functions. They already handle the virtual files, and for regular files (that represent a real file in the backing filesystem) the FS call to the backing filesystem should handle error responses (trying to open a directory, or a nonexistent file, etc). But perhaps he can clarify what he had in mind.

All that said, this is my first time working with Fuse, so I'm sure people with more experience can see some things I might be missing.

@ekalin
Copy link
Contributor Author

ekalin commented Mar 11, 2019

@slackner, was your idea of returning the file type from isExcludeCipher just to avoid having to check the type again (as in Open, for example), or were you thinking of something else?

@slackner
Copy link
Contributor

@ekalin do you think the issues that @slackner mentioned are resolved?

Unfortunately, I don't think all of the issues are resolved yet. From my point of view, the main issue is still that special files (dirIV files, name files, ...) are not handled correctly in all FUSE calls (e.g., OpenDir or Readlink). Previously decryptPath failed for special files, but now that the decoding happens in isExcludeCipher, .name extensions, for example, are just silently dropped. Some other paths (dirIV and config paths) are silently mapped to "", so they might operate on the root directory (instead of returning an error, as expected).

@slackner, was your idea of returning the file type from isExcludeCipher just to avoid having to check the type again (as in Open, for example), or were you thinking of something else?

I suggested this for two reasons. The first one is to avoid the duplicate checks. Even more important, however, is that we explicitly think about which file types are allowed in which FUSE call. Returning the file type from isExcludeCipher would ensure that we do not interpret the returned plaintext path without also considering the file type.

@ekalin
Copy link
Contributor Author

ekalin commented Mar 12, 2019

Open, Access and GetAttr check the type of file and act accordingly. They also check for and error in the return of isExcludedCipher, and the decrypted path is only used for regular files. I don't see any problem with those functions.

OpenDir and Readlink need to check if it's a special file, and fail if that's the case. I'll add that check shortly.

Finally, StatFs doesn't really do anything with the file it's passed. But it wouldn't hurt to check if isExcludedCipher returns an error.

As for returning the type in isExcludedCipher, it would avoid duplicate checks for special files in the functions. These checks are just string matches, but they might be called a lot (imagine running find in a huge directory hierarchy or something like that), and it seems that to access a file, each directory in its path is opened sequentially.

So there might be a performance gain. But perhaps this gain is negligible in comparison to the time necessary to do disk I/O to read the file or its metadata. Do any of you have any idea if avoiding the duplicate file type check would actually be helpful?

One disadvantage of doing that is that isExcludedCipher would do even more, and return yet another parameter. Perhaps it should be renamed to something more generic like getFileInfo.

@slackner
Copy link
Contributor

So there might be a performance gain. But perhaps this gain is negligible in comparison to the time necessary to do disk I/O to read the file or its metadata. Do any of you have any idea if avoiding the duplicate file type check would actually be helpful?

One disadvantage of doing that is that isExcludedCipher would do even more, and return yet another parameter. Perhaps it should be renamed to something more generic like getFileInfo.

I would expect that the performance gain is negligible, however, having checks at two different places seems a bit error-prone to me. Assume we introduce new file types at some point, then a check like if type != FileTypeRegular { will still work. Duplicating the checks means we have to update a lot more places. Note that I am not the maintainer of this project though, maybe @rfjakob has a different opinion on how this should be solved.

I agree that we should probably try to come up with a better name when the function includes even more logic.

@rfjakob
Copy link
Owner

rfjakob commented Mar 17, 2019

Looks pretty solid to me now. As a merge preparation, I have rebased to master and squashed: https://github.com/rfjakob/gocryptfs/commits/better-exclude

@rfjakob
Copy link
Owner

rfjakob commented Mar 26, 2019

Merged as 3bc100a , thanks!

@rfjakob rfjakob closed this Mar 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exclude in reverse-mode with regex
3 participants