Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow colons in Windows host paths #235

Merged
merged 1 commit into from
Aug 6, 2019

Conversation

mrowqa
Copy link
Contributor

@mrowqa mrowqa commented Aug 1, 2019

This PR allows calls like wasmtime --mapdir=.:\\?\C:\Users\user\workspace cp.wasm a.txt b.txt.

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 1, 2019

Note: this notation could lead to errors like --mapdir=C:\some\path (\some\path is valid Windows path). We can change the separator if there's a need. Double colon should be safe.

@kubkon
Copy link
Member

kubkon commented Aug 1, 2019

This looks good! Although I'm kinda worried about the possibility of errors you mentioned above, but I think I like the idea of having a :: as a separator. Do you think you could introduce that in this PR, or you reckon we should open another PR with this proposed change?

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 2, 2019

I think there's no need for another PR with just changed separator. We should decide which one we want (@sunfishcode ?), update this PR and then merge it. If we decide to stay with single colon, then we could think about some warning, but we'd need to think about the check itself (doesn't look at first glance as something easy to check exhaustively).

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 4, 2019

Another note which I should have put here earlier. I'm still confused. Since, if I remember correctly, call like --dir=/some/path/ cp.wasm /some/path/a /some/path/b worked on WSL/Linux, but analogous call on Windows --dir=C:\some\path cp.wasm C:\some\path\a C:\some\path\b didn't. But it should, as far as I know (the stdlib implementation targeting wasi should consume known prefix of path and translate it to some preopened dirfd). And if that's the case, then we should use '::' as separator, because proposed code in this PR would parse incorrectly call like --mapdir=C:\path:C:\another.

@tschneidereit
Copy link
Member

Since, if I remember correctly, call like --dir=/some/path/ cp.wasm /some/path/a /some/path/b worked on WSL/Linux, but analogous call on Windows --dir=C:\some\path cp.wasm C:\some\path\a C:\some\path\b didn't. But it should, as far as I know (the stdlib implementation targeting wasi should consume known prefix of path and translate it to some preopened dirfd).

Agreed: this should definitely work. An interesting question (for WASI standardization) is whether it should be allowed to use Windows-style paths inside a WASI module at all. The alternative being to standardize on *nix-style paths: no drive letters, / as separator. Given that code has to be reasonably portable to compile to WASI in the first place, this doesn't seem like it'd be much of a burden.

As for which separator to use, one option would be to let the user choose: require the separator to also be mentioned at the beginning of the mapping pattern, and maybe at the end:
--mapdir="|/c/path|C:\path|".

@kubkon
Copy link
Member

kubkon commented Aug 4, 2019

@tschneidereit excellent point! As I’ve already discussed offline with @mrowqa, with CraneStation/wasi-common#41 landing, any path with slashes works as expected:

C:/some/path

And:

/some/path

Note that both are equivalent on Windows.

As @mrowqa aptly pointed out, paths with backslashes however are somewhat of a mystery ATM, but we’re currently trying figure it out.

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 5, 2019

@tschneidereit I also considered standardization of the paths. However, I don't know much about what programs we are currently compiling successfully and what may break, so it's hard for me to favor some specific option.

--mapdir="|/c/path|C:\path|"
That's one good option, but it looks unnatural to me. Having static separator is nicer and I have done actually research which separator should be safe coming up with ::[1]. There's just one rare thing specific to PowerShell that uses ::, so (i) apps probably don't support it, and (ii) shell should inspect the argument anyway.

Other option is to have syntax like --map-from-host=C:\path --map-to-guest=/c/path, but it's long and we probably can't validate it properly with current parser. One time when I was telling @sunfishcode how docopt (current parser) is bugged, he told that we're planning to change it to clap anyway. Taking a quick look at clap we probably can get interface like --mapdir C:\path /c/path --mapdir D:\another /d/another (I haven't dig into the details).

So, I suggest: use temporarily '::' just for the simplicity, and once we change the command line arguments parser, we can try using optional arguments with two values if that's possible.

@kubkon

Note that both are equivalent on Windows.

They are not. Look at [1]. The second one is actually a relative path. Windows remembers current drive letter (which is used if none is specified), and for each drive, it remembers current path (so c:file.txt is equivalent to file.txt if current drive is C:.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

@tschneidereit
Copy link
Member

I also considered standardization of the paths. However, I don't know much about what programs we are currently compiling successfully and what may break, so it's hard for me to favor some specific option.

I think we just fundamentally have to do something here. We can't support OS-specific paths, at least for preopened files, i.e. where the path is set before runtime. Whether we also need to normalize all dynamic paths is a different question. If a user gets a list of all files in a directory, do we try to normalize their names? I'm not sure that's really viable.

So, I suggest: use temporarily '::' just for the simplicity, and once we change the command line arguments parser, we can try using optional arguments with two values if that's possible.

That sounds good to me, yes. Thank you for the explanation, and for the research!

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 5, 2019

I think we just fundamentally have to do something here. We can't support OS-specific paths, at least for preopened files, i.e. where the path is set before runtime. Whether we also need to normalize all dynamic paths is a different question. If a user gets a list of all files in a directory, do we try to normalize their names? I'm not sure that's really viable.

I haven't worked on WASI, so I don't know how exactly it's implemented. @kubkon ?
And I don't understand all of the terminology: is dynamic path a path obtained during runtime, i.e. when program operates on Path structs? Is it host or guest path? Do you mean something specific by normalization of file names? Some kind of remapping their names to potentially something different on host system, or maybe some kind of canonical form just for internal use?

That sounds good to me, yes. Thank you for the explanation, and for the research!

You're welcome :) . I'll update the PR in just a moment.

@kubkon
Copy link
Member

kubkon commented Aug 6, 2019

@tschneidereit I also considered standardization of the paths. However, I don't know much about what programs we are currently compiling successfully and what may break, so it's hard for me to favor some specific option.

--mapdir="|/c/path|C:\path|"
That's one good option, but it looks unnatural to me. Having static separator is nicer and I have done actually research which separator should be safe coming up with ::[1]. There's just one rare thing specific to PowerShell that uses ::, so (i) apps probably don't support it, and (ii) shell should inspect the argument anyway.

Other option is to have syntax like --map-from-host=C:\path --map-to-guest=/c/path, but it's long and we probably can't validate it properly with current parser. One time when I was telling @sunfishcode how docopt (current parser) is bugged, he told that we're planning to change it to clap anyway. Taking a quick look at clap we probably can get interface like --mapdir C:\path /c/path --mapdir D:\another /d/another (I haven't dig into the details).

So, I suggest: use temporarily '::' just for the simplicity, and once we change the command line arguments parser, we can try using optional arguments with two values if that's possible.

@kubkon

Note that both are equivalent on Windows.

They are not. Look at [1]. The second one is actually a relative path. Windows remembers current drive letter (which is used if none is specified), and for each drive, it remembers current path (so c:file.txt is equivalent to file.txt if current drive is C:.

[1] https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

Ah, that's actually good to know, thanks for clarifying this!

@tschneidereit
Copy link
Member

I don't understand all of the terminology:

That's because I'm not an expert in this domain and am probably not using the right terminology myself :) I'll explain what I meant more specifically:

is dynamic path a path obtained during runtime, i.e. when program operates on Path structs?

That's what I meant, yes, as opposed to a preopened path. (See this comment for some more thoughts on how to handle those.)

Is it host or guest path?

Good question! A lot, perhaps all, of this only matters if we're talking about both, really: how do host and guest paths map to each other.

Do you mean something specific by normalization of file names? Some kind of remapping their names to potentially something different on host system, or maybe some kind of canonical form just for internal use?

What I meant by "normalization" is a way to ensure that you're talking about the same path regardless of the host system. Unfortunately I think that's just not possible. For absolute paths, we might be able to handle that by treating them as opaque and just leaving it up to the host to map them to something useful. However, for relative paths used in openat, we're running into unresolvable issues, I think. There are probably more, but case sensitivity is the most obvious example:

Say you have two calls to openat

openat(dir_fd, "FileName");
openat(dir_fd, "filename");

These will result in either the same or two different files being opened, depending on whether the underlying file system is case sensitive or not. That means code that uses both of these—perhaps at entirely different places in the code base—will behave differently e.g. on (stock-) Windows and macOS vs (stock-) Linux.

@tschneidereit
Copy link
Member

I think we should land this. If we need to change things later, we always can.

One thing that'd be good is to have some kinds of tests, ideally ones that work across all supported OSs. Not sure how doable that is with out current test harness.

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 6, 2019

(stock-) Windows and macOS vs (stock-) Linux.

macOS can be made to behave case sensitive too using taskpolicy -x: https://worthdoingbadly.com/casesensitive-iossim/

@tschneidereit
Copy link
Member

macOS can be made to behave case sensitive too using taskpolicy -x

What happens if such a process tries to create two files with the names FileName and filename in the same directory?

In any case, that doesn't seem to help us much, given that WASI needs to work across a wide range of different OSs, with different filesystem semantics (and quirks.)

@bjorn3
Copy link
Contributor

bjorn3 commented Aug 6, 2019

What happens if such a process tries to create two files with the names FileName and filename in the same directory?

I don't know. You also have to have to be root to use taskpolicy -x anyway, so it is not useful in this case.

@sunfishcode
Copy link
Member

This PR looks good to me; please also update the documentation in docs/WASI-tutorial.md for the new command-line syntax.

@mrowqa
Copy link
Contributor Author

mrowqa commented Aug 6, 2019

@tschneidereit thank you for the explanation! Now, it sounds familiar - I had discussed parts of it with @sunfishcode and @kubkon (even opened https://github.com/CraneStation/wasi-common/issues/44).

What exactly would like to test? The --mapdir option? Or is it a general idea for testing the "WASI filesystem"? If this is the second case, it might be a good idea to open an issue.

@sunfishcode oh, I missed that one occurrence. I have updated it.

@programmerjake
Copy link

one (probably not very good) solution is to manually check if there are files with the same case-insensitive name and either prohibit creating a differently-cased name, rename to the case on disk, or rename to the case specified in the filesystem call, with a check to bypass the case-checking/case-conversion if there are already 2 or more files with matching names. when opening an existing file, it would always fail if the case doesn't match.

this would be kinda the intersection between case-sensitive and case-insensitive filesystems.

@sunfishcode
Copy link
Member

@programmerjake Yeah, that's a good point. We could do some amount of emulation if we wanted a greater degree of portability. However we'd also pay a fair amount of overhead -- if I understand what you're saying, we'd have to try 2N files where N is the number of alphabetical characters, or scanning all the filenames in the directory, and we'd also have to think about racing with other processes.

@tschneidereit I agree that we want tests, though in this case we don't have tests for the old --mapdir function either, and we don't yet have a good way to add them. I expect #220 will be a step in a direction which will make this easier, but in the meantime, I think it's useful to land this PR now so that we can unblock people working on Windows.

@sunfishcode sunfishcode merged commit 17d676e into bytecodealliance:master Aug 6, 2019
@mrowqa mrowqa deleted the mapdir-windows branch August 7, 2019 00:22
@programmerjake
Copy link

@programmerjake Yeah, that's a good point. We could do some amount of emulation if we wanted a greater degree of portability. However we'd also pay a fair amount of overhead -- if I understand what you're saying, we'd have to try 2N files where N is the number of alphabetical characters, or scanning all the filenames in the directory, and we'd also have to think about racing with other processes.

at a minimum, we could have open fail when the case doesn't match. that's relatively inexpensive to implement and would probably catch most non-portable cases.
on Win32, GetFileInformationByHandleEx would probably work, though I don't know if it uses the case from the filesystem or from the createfile call. I don't have a Windows installation readily available to test on.

@tschneidereit
Copy link
Member

at a minimum, we could have open fail when the case doesn't match. that's relatively inexpensive to implement and would probably catch most non-portable cases.

@programmerjake do you mean we'd fail if we try to open Filename on a case-insensitive filesystem where filename exists? That does indeed seem reasonably cheap, but I'm not sure how much it'd help with portability.

@programmerjake
Copy link

at a minimum, we could have open fail when the case doesn't match. that's relatively inexpensive to implement and would probably catch most non-portable cases.

@programmerjake do you mean we'd fail if we try to open Filename on a case-insensitive filesystem where filename exists? That does indeed seem reasonably cheap, but I'm not sure how much it'd help with portability.

@tschneidereit yes, open would fail in that case.

I think it'll help portability since, in my (admittedly limited) experience, the most common case-sensitivity issue I've encountered is code that refers to other files using the wrong case when opening for reading.

Common examples include resource paths and #include directives in C-style languages (not that compiling code using a webassembly compiler would be common).

@programmerjake
Copy link

@tschneidereit one additional reason I think it will help is that it will catch issues during development on case-insensitive filesystems as opposed to when the code is tested on a case-sensitive filesystem (which it may not be, some people only ever test on Windows)

@programmerjake
Copy link

@tschneidereit one additional reason I think it will help is that it will catch issues during development on case-insensitive filesystems as opposed to when the code is tested on a case-sensitive filesystem (which it may not be, some people only ever test on Windows)

created an issue for case-sensitivity: https://github.com/WebAssembly/WASI/issues/72

@tschneidereit
Copy link
Member

@programmerjake excellent, thank you!

@indolering
Copy link

@programmerjake @sunfishcode

Note that the runtime behavior of checking for case-folded filenames is not O(n^chars), it is O(files) or O(log files): you just convert all of the filenames in a directory to a canonical case-folded name (toCasefold(NFC(filename))) before you compare. This is known as caseless matching in Unicode parlance, which is fast and vectorizable.

Wine and Samba run into issues with deeply nested directories: /foo/bar/readme.txt, /Foo/bar/readme.txt. But that's because they are providing a compatibility layer for unmodified binaries. In our case, we could just throw an error when there are multiple casefolded matches.

grishasobol pushed a commit to grishasobol/wasmtime that referenced this pull request Nov 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants