-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional tests related to pathToFileURL #248
Conversation
The node documentation says that this is incorrect... > var u = new URL('/foo#1', 'file:');
> u
URL {
href: 'file:///foo#1',
origin: 'null',
protocol: 'file:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/foo',
search: '',
searchParams: URLSearchParams {},
hash: '#1'
} Well, it is a matter of intent. If you intention that the path be > var u = new URL('file:')
> u.pathname = "/foo#1"
'/foo#1'
> u
URL {
href: 'file:///foo%231',
origin: 'null',
protocol: 'file:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/foo%231',
search: '',
searchParams: URLSearchParams {},
hash: ''
} |
The claim in the node documentation that the following is incorrect... > new URL('/some/path%.c', 'file:');
URL {
href: 'file:///some/path%.c',
origin: 'null',
protocol: 'file:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/some/path%.c',
search: '',
searchParams: URLSearchParams {},
hash: ''
} ... I don't understand. This looks correct. Now, the issue is whether this should be allowed... > var u = new URL('file:')
> u.pathname = "/some/path%.c"
'/some/path%.c'
> u
URL {
href: 'file:///some/path%.c',
origin: 'null',
protocol: 'file:',
username: '',
password: '',
host: '',
hostname: '',
port: '',
pathname: '/some/path%.c',
search: '',
searchParams: URLSearchParams {},
hash: ''
} It looks fine to me but depending on how you read the spec... it is a validation error, suggesting that one ought to log an error. |
* This function ensures that path is resolved absolutely, | ||
* and that the URL control characters are correctly encoded when converting into a File URL. | ||
* | ||
* new URL('/foo#1', 'file:'); // Incorrect: file:///foo#1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Safari both this example and new URL("file:///foo#1")
has a href value of file:///foo#1
. In both of them #1
is considered a hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whereas in let u = new URL('file:///test'); u.pathname = "/foo#1";
foo#1
is considered as query and percent encoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anonrig Right. So if you have a path that is "/foo#1", then it needs to be percent encoded. You can either do it yourself, or use ada:
ada::result url = ada::parse("file://");
if(!url->set_pathname("/foo#1")) { return false; }
if(url->get_href() != "file:///foo%231") { return false; }
So there is no need for a special function (e.g., pathToFileURL) in this instance. Just use the path setter from ada.
Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path setter would do it. I also assume that path setter changes %
to %25
right? Since it's the other requirement of the function. If yes, we don't have to do a for loop in here: https://github.com/nodejs/node/blob/main/src/node_url.cc#L332
* new URL('/foo#1', 'file:'); // Incorrect: file:///foo#1 | ||
* pathToFileURL('/foo#1'); // Correct: file:///foo%231 (POSIX) | ||
* | ||
* new URL('/some/path%.c', 'file:'); // Incorrect: file:///some/path%.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safari returns "file:///some/path%.c"
for href for this value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Firefox too...
I get file:///some/path%.c
. It does not try to percent encode the '%' character.
* new URL('/some/path%.c', 'file:'); // Incorrect: file:///some/path%.c | ||
* pathToFileURL('/some/path%.c'); // Correct: file:///some/path%25.c (POSIX) | ||
* | ||
* It is unclear why node states that 'file:///some/path%.c' is incorrect. Any path part of URL should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. This is the commit that added the function: nodejs/node@eef072f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing is done on the C++ side too: https://github.com/nodejs/node/blob/main/src/node_url.cc#L332
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found this response on Stack Overflow: https://stackoverflow.com/questions/21526313/use-character-in-url-inside-file-name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a refactor on the Ada migration pull request. Originally it was added in this commit: nodejs/node@cf340fe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Stack Overflow answer makes sense, but I don't see how it helps us or Node. If the user has a path with '%' in it, what is Node going to do ? Percent encoding it does not make it go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably it percent encodes it, and later decodes it. There are 2 functions in Node:
pathToFileURL
and fileURLToPath
Hello Guy. Can you help us understand better the node pathToFileURL function? It is currently implemented in JavaScript within Node. Node relies on the C++ URL parser ada. For performance reasons, it seems that we could simply ask Node to call the path setter from ada. However, before changing anything, it is best to understand what is already present (Chesterton's fence). Specifically...
In effect, Guy, why don't we just use a standard-compliant path setter ? (I am asking honestly, I am not being argumentative.) |
@jasnell We are wondering about the pathToFileURL in Node. If you have any insights, please share. |
|
Thanks @guybedford. That's useful. @TimothyGu can you help? Here what the path setter in ada does right now... ada::result url = ada::parse("file://");
url->set_pathname(....)
url->get_href()....
It seems to me (but I could be wrong) that all we need for ada->set_pathname to be a replacement for Another issue that you can see is that But if we had something like get_pathname_decoded, we could round trip and allow systems such as Node to quickly retrieve the path. It would also make testing possible: we could have a routine that inputs all sorts of valid paths and we could make sure that ada encodes and decodes them. It seems to be related to these issues...
Motivation: Pushing these computations in ada (instead of leaving them to JavaScript) could improve the performance. |
Not entirely sure what the question here is. I think "well-formed" file URLs should not have non-percent-encoding Also, you should be careful about UNC paths on Windows, which is a combination of hostname, port, and pathname. |
Let me make it precise. The spec. provides us with 'pathname setters' (search 'pathname setter' in https://url.spec.whatwg.org/#url-parsing). The 'pathname setters' will do percent encoding for you (see below) based on a percent-encoding set. What do we think it should do in the following cases: ada::result url = ada::parse("file://");
url->set_pathname("/base/Some%20File.js")
url->get_href() == ? E.g., is it ada::result url = ada::parse("file://");
url->set_pathname("/base/ab%xy.txt")
url->get_href() == ? Do we get That is my question in short. Let me elaborate. The path percent-encode set is U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>), U+003F (?), U+0060 (`), U+007B ({), and U+007D (}), C0 controls: range U+0000 NULL to U+001F INFORMATION SEPARATOR ONE, inclusive and all code points greater than U+007E (~). The U+0025 (%) is not part of the path percent-encode set. I agree that the spec. says that It is interesting to look into why it is a problem. We have that URL-path-segment string must be one of the following: zero or more URL units excluding U+002F (/) and U+003F (?). In turn, the URL units are URL code points (URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters) and percent-encoded bytes (A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits.) Here is what the spec says about that puzzle... "Of the possible values for the percentEncodeSet argument only two end up encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument — which happen to be used by the URL parser — leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented." It specifically says that the parser leaves % untouched !!! The component percent-encode set is the path percent-encode set plus U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,), t) and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|). See how it adds U+0025? So it is deliberately the case that U+0025 is to be left untouched. The specification provides an example...
So clearly, it suggests that in some cases, '%' is not itself percent encoded. Reference: |
Here's what Node.js v18 (pre-Ada) does, and also what the spec expects: > u = new URL('file://'); u.pathname = '/base/Some%20File.js'; u.href
'file:///base/Some%20File.js'
> u = new URL('file://'); u.pathname = '/base/Some%xyFile.js'; u.href
'file:///base/Some%xyFile.js' This behavior is consistent with the parser itself: > u = new URL('file:///base/Some%20File.js'); u.href
'file:///base/Some%20File.js'
> u = new URL('file:///base/Some%xyFile.js'); u.href
'file:///base/Some%xyFile.js' Note, the second URL is, as you note, "invalid". In practice, the "valid" equivalent would have Now, the spec is only binding for the JavaScript API that Node.js and browsers. Whether or not you think this is a good idea for Ada's C++ API is up to you – but I think it's quite reasonable to stick with the JS API. However, it's important to note its limitations: it would be a mistake to call |
That's the key point. |
File path <-> URL conversion is not standardised. See whatwg/fetch#1338 POSIX style paths are fairly straightforward. Windows paths are more complex. For my own URL library (a Swift implementation of the WHATWG standard), I created a fairly comprehensive test suite based on experimenting with the platform's native The implementation is here. I'm not sure how easy it is to follow if you're not used to Swift, but I've tried to document everything thoroughly. I'd love for everybody to come together to precisely document how this conversion should work. |
Closing this PR in favour of issue #309 |
Node.js has a distinct pathToFileURL function. This PR adds some related tests.