Skip to content
This repository has been archived by the owner on Nov 9, 2019. It is now read-only.

path_get refactor and implementation of missing path_ hostcalls on Windows #41

Merged
merged 23 commits into from
Aug 8, 2019
Merged

path_get refactor and implementation of missing path_ hostcalls on Windows #41

merged 23 commits into from
Aug 8, 2019

Conversation

kubkon
Copy link
Member

@kubkon kubkon commented Jul 23, 2019

This PR moves path_get helper out of the sys submodule and puts it in platform-independent hostcalls_impl::fs_helpers module. It requires readlinkat to be implemented which currently is missing on Windows, hence, this PR breaks WASI tutorial on Windows.

It also strives at adding missing path_ hostcalls so that we can enable more tests on Windows.

TODO:

  • readlinkat
  • path_create_directory
  • path_remove_directory
  • path_readlink
  • path_symlink
  • path_unlink_file

EDIT: some basics of the above syscalls are now drafted out. I'm still uneasy about symlink handling on Windows for a couple of reasons:

  1. creating symlinks requires that the user has either the necessary privileges or the OS is in developer mode -- not much we can do about this I guess!
  2. Windows differentiates between file symlinks and dir symlinks whereas AFAIK *nix does not -- for that reason, unlinking a symlink done via path_unlink_file is rather messy as we need to check whether we are dealing with a file or dir symlink and invoke appropriately either std::fs::remove_file or std::fs::remove_dir
  3. AFAICT dangling symlinks are impossible to create on Windows, not to mention creating a symlink look -- for that reason, to temporarily overcome the former until a better solution uncovers (on that below), I've added a hacky routine create_dangling_symlink which essentially creates a target directory, creates a symlink, and then deletes the target directory thus making the symlink dangling
  4. opening files as directories with path_open needs more love ;-)

My current idea for symlinks on Windows is to create a thin, virtual wrapper Symlink (or similar) which will imitate *nix's functionality thus allowing for symlink loops and dangling symlinks until the symlink is actually made solid (by pointing to an actual resource). Then, the Symlink should know whether it points to a file or a dir thus making the cleanup simpler.

@kubkon kubkon changed the title path_get refactor and implementation of missing path_ hostcalls readlinkat on Windows path_get refactor and implementation of missing path_ hostcalls on Windows Jul 23, 2019
@kubkon kubkon marked this pull request as ready for review August 1, 2019 21:21
@kubkon
Copy link
Member Author

kubkon commented Aug 1, 2019

cc @sunfishcode @alexcrichton

OK, so I guess the PR is ready for review and I'd really welcome your feedback here guys. I'm actually kinda stumped with some of this stuff, and especially handling of symlinks on Windows for the reasons mentioned in the PR's description. I've actually gone ahead and created a thin wrapper type, Symlink, around Windows symlinks that essentially tries to imitate *nix symlinks so that self loops and dangling symlinks are possible (otherwise the tests are broken). I'm not sure this is the way we want to go though. In fact, I'd welcome any and all discussion here on the best way to proceed.

Finally, if we decide to scrap this PR as-is, I'm OK with that as well. We should still be able to salvage the implementation of at least some of the path_ syscalls I guess.

@kubkon
Copy link
Member Author

kubkon commented Aug 2, 2019

@sunfishcode here's a thought on the WASI spec: do you think it would make sense to disallow creation of symlink loops and dangling symlinks in WASI? Additionally, would it also make sense to split path_symlink syscall into two: path_symlink_file and path_symlink_dir? Those two changes to the spec would clean up the implementation on Windows a lot (to the best of my knowledge that is! @alexcrichton could you weigh in here as well please?), since then we'd only ever be allowed to open a symlink to an existing resource (which immediately precludes a symlink loop), and having separate syscalls for symlinking files and dirs would match Windows syscalls symlink_file and symlink_dir, whereas on *nix, they'd be one and the same.

This still doesn't address the problem of privileges on Windows: creating symlinks requires that the user has either the necessary privileges or the OS is in developer mode, but I guess it'd clean up the implementation a lot. Anyhow, I'd love to hear your thoughts on all of this!

@alexcrichton
Copy link
Collaborator

Without looking too much into the implementation, this overall looks pretty reasonable. I agree that symlink_file and symlink_dir should probably be added (or at least considered) for the spec itself, but for now having this sort of emulation for Windows seems reasonable. I suspect that removing looping/dangling symlinks may be harder to do, but supporting it the best we can for now seems totally reasonable as well.

@kubkon
Copy link
Member Author

kubkon commented Aug 5, 2019

@sunfishcode did you perhaps have a look at the PR? I'd like to merge it into master before #46, #47 and #54 so as to avoid rebase conflicts and to give @marmistrz ability to enable+add more tests. :-)

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! path_get is some of the most complex code, and it's nice to have more of it factored out into common code.

) -> Result<(File, String)> {
const MAX_SYMLINK_EXPANSIONS: usize = 128;

if path.contains("\0") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a character literal instead of a string literal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be indeed, thanks!

@@ -515,6 +520,8 @@ pub(crate) fn path_link(
let new_dirfd = wasi_ctx
.get_fd_entry(new_dirfd, host::__WASI_RIGHT_PATH_LINK_TARGET, 0)
.and_then(|fe| fe.fd_object.descriptor.as_file())?;
let (old_dirfd, old_path) = path_get(old_dirfd, 0, old_path, false)?;
let (new_dirfd, new_path) = path_get(new_dirfd, 0, new_path, false)?;

hostcalls_impl::path_link(old_dirfd, new_dirfd, old_path, new_path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it might be a little error prone, where one could forget to call path_get and in some cases it would still compile and seem to work. Would it make sense to create a struct to replace path_get's tuple return value? Then you could make the hostcalls_impl functions use that struct for their arguments, which would help ensure that path_get has been called. Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that sounds like a good idea. Let me have a play with it and I'll report back to this thread :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've encapsulated the return value of path_get inside a PathGet struct. Have a look and let me know if this is what you've had in mind :-) Also, I can see several places we could refactor a lot of impl by extending this struct but I don't want to pile onto this PR too much. Instead, we can for sure refactor in smaller steps in subsequent PRs.

@kubkon
Copy link
Member Author

kubkon commented Aug 7, 2019

@sunfishcode if you're happy with the rest of PR as is, I guess I'll go ahead and squash and merge so that we can move forward with other PRs?

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not yet comfortable with the path_symlink implementation. I made a few comments below, but looking at the big picture, if symlink creation requires special privileges or a special OS mode, it suggests to me we should either:

  • remove path_symlink from WASI, or
  • document that support for path_symlink may be subject to limitations of the underyling filesystem, and maybe even add a way to query a directory to determine what form of symlinks may be created inside it

My suggestion would be to remove the SYMLINKS singleton here, since it's kind of misleading -- it helps emulate the desired functionality for the program that creates the symlinks, but they're not real and other processes wouldn't see them. I propose it's better to just have path_symlink fail in the cases that Windows doesn't support, and we'll take up the remaining questions at the WASI standards level. How does that sound?

let target_path: &Path = target.as_ref();

if source_path.exists() {
Err(host::__WASI_EEXIST)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to check if the source_path exists up front, or can we defer this and only check it in the non-file non-dir case?

SymlinkKind::File
} else if target_path.is_dir() {
symlink_dir(target_path, source_path).map_err(errno_from_ioerror)?;
SymlinkKind::Dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How crazy would it be to do this: instead of checking is_file and is_dir, just start by calling symlink_file and if that fails with the right error code, then try symlink_dir, and if that fails, check for the loop/dangling cases?

@kubkon
Copy link
Member Author

kubkon commented Aug 7, 2019

I'm not yet comfortable with the path_symlink implementation. I made a few comments below, but looking at the big picture, if symlink creation requires special privileges or a special OS mode, it suggests to me we should either:

* remove `path_symlink` from WASI, or

* document that support for `path_symlink` may be subject to limitations of the underyling filesystem, and maybe even add a way to query a directory to determine what form of symlinks may be created inside it

My suggestion would be to remove the SYMLINKS singleton here, since it's kind of misleading -- it helps emulate the desired functionality for the program that creates the symlinks, but they're not real and other processes wouldn't see them. I propose it's better to just have path_symlink fail in the cases that Windows doesn't support, and we'll take up the remaining questions at the WASI standards level. How does that sound?

Yep, sounds good to me! As I said, it's been a roller-coaster ride for me trying to get it to work on Windows. The emulation also made me somewhat uneasy since emulating a dangling symlink would not get committed to the OS until it became solid, which, if the app died prematurely, etc., would potentially never happen, leaving other processes none-the-wiser. Anyhow, is the following plan of action aligned with what you've had in mind:

  1. creating dangling and self-loops will result in an error on Windows (no need for SYMLINKS singleton then)
  2. if the OS is not set to allow for symlink creation, we return a __WASI_ENOTCAPABLE error from path_symlink
  3. creating proper symlinks (pointing at existing resources) would be done in two stages: try symlink_file, if fails, try symlink_dir, otherwise, trip with an error
  4. adapt test cases (whichever seem reasonable) to not use dangling symlinks

Those are just some loose thoughts but, in principle, I reckon I should be able to refactor the code to make it work.

Also, do you want me to address the comments relating SYMLINKS singleton if it's going to go anyhow? I'd imagine we just leave them be if that's the case.

@kubkon
Copy link
Member Author

kubkon commented Aug 8, 2019

@sunfishcode OK, so here are the changes as per your suggestion:

  1. removed SYMLINKS singleton altogether
  2. creating dangling and self-loops now results in an error on Windows
  3. if the user lacks permissions to create symlinks, we return a __WASI_ENOTCAPABLE error from path_symlink
  4. creating proper symlinks (pointing at existing resources) is done in two stages: try symlink_file, if fails, try symlink_dir, otherwise, trip with an error
  5. the test cases are adapted as follows:
  • readlink_no_buffer is ignored on Windows as it relates to interpreting what a dangling symlink is pointing at
  • symlink_loop related to symlink loops, hence it's also turned off on Windows
  • nofollow_errors are now split into two testcases: dangling_symlink and nofollow_errors; in the former, we check if opening a dangling symlink yields a __WASI_ELOOP error but only on *nix, and in the latter, we have most of the logic of the original nofollow_errors testcase but without any dangling symlinks.

Have a look and let me know what you reckon!

There is an additional caveat here that I've already mentioned in this PR: Windows does make a distinction between file and directory symlinks. I've adapted the nofollow_errors testcase so that we never try and unlink a symlink pointing at a dir and then link it to a file instead as on Windows that would not make much sense. Question and perhaps a suggestion here for WASI: should we perhaps have two methods path_symlink_file and path_symlink_dir instead of simply path_symlink in the spec? Alternatively, if we won't go down this road, I guess we'll need to ensure that in the docs, we emphasise that on Windows, the apps should not try and swap the target of a symlink as it might yield erroneous results on Windows host.

@marmistrz
Copy link
Member

marmistrz commented Aug 8, 2019

If path_symlink still requires some work, could we possibly merge this change without path_symlink and add the support for this syscall later on? I'd like to fix the conflicts of #46 with this PR and go ahead with path_filestat_get.

Cargo.toml Outdated
@@ -16,6 +16,7 @@ libc = "0.2"
rand = "0.6"
cfg-if = "0.1.9"
log = "0.4"
lazy_static = "1.3.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dependency is no longer needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, thanks!

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the comment above, this LGTM, so feel free to merge when ready.

@sunfishcode
Copy link
Member

I don't yet have an opinion about splitting path_symlink into file vs directory forms. My impression is that the technique of doing symlink_file and falling back to symlink_dir as needed is workable, though it is a little suboptimal. However, the problem of users trying to swap out the target of a symlink exists even if we split path_symlink into two forms.

@kubkon
Copy link
Member Author

kubkon commented Aug 8, 2019

I don't yet have an opinion about splitting path_symlink into file vs directory forms. My impression is that the technique of doing symlink_file and falling back to symlink_dir as needed is workable, though it is a little suboptimal. However, the problem of users trying to swap out the target of a symlink exists even if we split path_symlink into two forms.

Unfortunately, that's very true on Windows. Perhaps this will require some more thought. :-)

@kubkon kubkon merged commit e18175c into CraneStation:master Aug 8, 2019
@kubkon kubkon deleted the path_get_refactor branch August 8, 2019 15:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants