-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additional tests related to pathToFileURL #248
Changes from all commits
5ab7650
610d8dd
67fa563
c7e6696
a52f82c
0bb3273
53bb06a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
|
||
#include <cstdlib> | ||
#include <iostream> | ||
|
||
#include "ada.h" | ||
|
||
|
||
/** | ||
* Node.js has a function called url.pathToFileURL(path). | ||
* https://nodejs.org/api/url.html#urlpathtofileurlpath | ||
* | ||
* It is explained as follows... | ||
* | ||
* This function ensures that path is resolved absolutely, | ||
* and that the URL control characters are correctly encoded when converting into a File URL. | ||
* | ||
* new URL('/foo#1', 'file:'); // Incorrect: file:///foo#1 | ||
* pathToFileURL('/foo#1'); // Correct: file:///foo%231 (POSIX) | ||
* | ||
* new URL('/some/path%.c', 'file:'); // Incorrect: file:///some/path%.c | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Safari returns There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. Firefox too... |
||
* pathToFileURL('/some/path%.c'); // Correct: file:///some/path%25.c (POSIX) | ||
* | ||
* It is unclear why node states that 'file:///some/path%.c' is incorrect. Any path part of URL should | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree. This is the commit that added the function: nodejs/node@eef072f There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same thing is done on the C++ side too: https://github.com/nodejs/node/blob/main/src/node_url.cc#L332 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Found this response on Stack Overflow: https://stackoverflow.com/questions/21526313/use-character-in-url-inside-file-name There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did a refactor on the Ada migration pull request. Originally it was added in this commit: nodejs/node@cf340fe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The Stack Overflow answer makes sense, but I don't see how it helps us or Node. If the user has a path with '%' in it, what is Node going to do ? Percent encoding it does not make it go away. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably it percent encodes it, and later decodes it. There are 2 functions in Node:
|
||
* be percent decoded prior to intepreting it as a path. Yet both /some/path%.c and /some/path%25.c | ||
* should, after percent decoding, become /some/path%.c. Maybe the intention is that the path are to be | ||
* treated verbatim? | ||
* | ||
* For reference, the path percent-encode set is: | ||
* | ||
* U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>), U+003F (?), U+0060 (`), U+007B ({), and U+007D (}). | ||
* C0 controls: range U+0000 NULL to U+001F INFORMATION SEPARATOR ONE, inclusive. | ||
* and all code points greater than U+007E (~). | ||
* | ||
* Thus '%' is allowed as a path character in a URL. | ||
* | ||
* What does the reference to POSIX means? | ||
* | ||
* Under many operating systems, including common Linux distributions and macos, | ||
* both file names path%.c and path%25.c are allowed as one can easily check. | ||
* | ||
* Type the following commands in a shell: | ||
* | ||
* touch path%.c | ||
* touch path%25.c | ||
* ls path* | ||
* | ||
* You can distinguish between the two in path if you use percent encoding: path%25.c and path%2525.c. | ||
* | ||
* Which path names are legal typically does not just depend on the operating system, it also depends | ||
* critical on the file system. | ||
*/ | ||
bool check_path_setters() { | ||
// It seems that the spec allows '%' without two Hex next to it, as long as it is not part | ||
// of a setter. | ||
ada::result url_direct1 = ada::parse("file:///some/path%.c"); | ||
if(url_direct1->get_href() != "file:///some/path%.c") { return false; } | ||
|
||
// This is the equivalent: | ||
ada::result url_direct2 = ada::parse("file:///some/path%25.c"); | ||
if(url_direct2->get_href() != "file:///some/path%25.c") { return false; } | ||
|
||
// Are url_direct1 and url_direct2 the same URL? | ||
|
||
|
||
|
||
ada::result url = ada::parse("file://"); | ||
if(!url->set_pathname("/foo#1")) { return false; } | ||
if(url->get_href() != "file:///foo%231") { return false; } | ||
|
||
url->set_pathname("/some/path%25.c"); | ||
if(url->get_href() != "file:///some/path%25.c") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
|
||
/** | ||
* https://url.spec.whatwg.org/#path-state | ||
* If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error. | ||
* A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, | ||
* are encouraged to report them somewhere. | ||
* A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, | ||
* e.g., through a return statement. | ||
*/ | ||
url->set_pathname("/some/path%.c"); | ||
if(url->get_href() != "file:///some/path%.c") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
url->set_pathname("/base/Some%20File.js"); | ||
if(url->get_href() != "file:///base/Some%20File.js") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
std::cout << url->get_pathname() <<std::endl; | ||
|
||
url->set_pathname("/base/Some%2520File.js"); | ||
if(url->get_href() != "file:///base/Some%2520File.js") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
std::cout << url->get_pathname() <<std::endl; | ||
|
||
url->set_pathname("/base/love?.js"); | ||
if(url->get_href() != "file:///base/love%3F.js") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
std::cout << url->get_pathname() <<std::endl; | ||
|
||
url->set_pathname("/base/love#.js"); | ||
if(url->get_href() != "file:///base/love%23.js") { | ||
std::cerr << url->get_href() << std::endl; | ||
return false; | ||
} | ||
std::cout << url->get_pathname() <<std::endl; | ||
|
||
std::cout << "Ok." << std::endl; | ||
return true; | ||
} | ||
|
||
int main() { | ||
return check_path_setters() ? EXIT_SUCCESS : EXIT_FAILURE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Safari both this example and
new URL("file:///foo#1")
has a href value offile:///foo#1
. In both of them#1
is considered a hash.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whereas in
let u = new URL('file:///test'); u.pathname = "/foo#1";
foo#1
is considered as query and percent encoded.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anonrig Right. So if you have a path that is "/foo#1", then it needs to be percent encoded. You can either do it yourself, or use ada:
So there is no need for a special function (e.g., pathToFileURL) in this instance. Just use the path setter from ada.
Correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Path setter would do it. I also assume that path setter changes
%
to%25
right? Since it's the other requirement of the function. If yes, we don't have to do a for loop in here: https://github.com/nodejs/node/blob/main/src/node_url.cc#L332