Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional tests related to pathToFileURL #248

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/url-setters.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ namespace ada {

bool url::set_pathname(const std::string_view input) {
if (has_opaque_path) { return false; }
path = "";
path.clear();
return parse_path(input);
}

Expand Down
1 change: 1 addition & 0 deletions tests/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,4 +20,5 @@ endif()
add_cpp_test(basic_fuzzer)
add_cpp_test(from_file_tests)
add_cpp_test(basic_tests)
add_cpp_test(pathtourl)

96 changes: 96 additions & 0 deletions tests/pathtourl.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@

#include <cstdlib>
#include <iostream>

#include "ada.h"


/**
* Node.js has a function called url.pathToFileURL(path).
* https://nodejs.org/api/url.html#urlpathtofileurlpath
*
* It is explained as follows...
*
* This function ensures that path is resolved absolutely,
* and that the URL control characters are correctly encoded when converting into a File URL.
*
* new URL('/foo#1', 'file:'); // Incorrect: file:///foo#1
Copy link
Member

@anonrig anonrig Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Safari both this example and new URL("file:///foo#1") has a href value of file:///foo#1. In both of them #1 is considered a hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whereas in let u = new URL('file:///test'); u.pathname = "/foo#1"; foo#1 is considered as query and percent encoded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anonrig Right. So if you have a path that is "/foo#1", then it needs to be percent encoded. You can either do it yourself, or use ada:

     ada::result url = ada::parse("file://");
     if(!url->set_pathname("/foo#1")) { return false; }
     if(url->get_href() != "file:///foo%231") { return false; }

So there is no need for a special function (e.g., pathToFileURL) in this instance. Just use the path setter from ada.

Correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path setter would do it. I also assume that path setter changes % to %25 right? Since it's the other requirement of the function. If yes, we don't have to do a for loop in here: https://github.com/nodejs/node/blob/main/src/node_url.cc#L332

* pathToFileURL('/foo#1'); // Correct: file:///foo%231 (POSIX)
*
* new URL('/some/path%.c', 'file:'); // Incorrect: file:///some/path%.c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safari returns "file:///some/path%.c" for href for this value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Firefox too...
I get file:///some/path%.c. It does not try to percent encode the '%' character.

* pathToFileURL('/some/path%.c'); // Correct: file:///some/path%25.c (POSIX)
*
* It is unclear why node states that 'file:///some/path%.c' is incorrect. Any path part of URL should
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. This is the commit that added the function: nodejs/node@eef072f

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@lemire lemire Feb 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. So that's your code here... why do you change '%' to '%25'?
Screenshot 2023-02-28 at 6 40 59 PM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a refactor on the Ada migration pull request. Originally it was added in this commit: nodejs/node@cf340fe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Stack Overflow answer makes sense, but I don't see how it helps us or Node. If the user has a path with '%' in it, what is Node going to do ? Percent encoding it does not make it go away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it percent encodes it, and later decodes it. There are 2 functions in Node:

pathToFileURL and fileURLToPath

* be percent decoded prior to intepreting it as a path. Yet both /some/path%.c and /some/path%25.c
* should, after percent decoding, become /some/path%.c. Maybe the intention is that the path are to be
* treated verbatim?
*
* For reference, the path percent-encode set is:
*
* U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>), U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
* C0 controls: range U+0000 NULL to U+001F INFORMATION SEPARATOR ONE, inclusive.
* and all code points greater than U+007E (~).
*
* Thus '%' is allowed as a path character in a URL.
*
* What does the reference to POSIX means?
*
* Under many operating systems, including common Linux distributions and macos,
* both file names path%.c and path%25.c are allowed as one can easily check.
*
* Type the following commands in a shell:
*
* touch path%.c
* touch path%25.c
* ls path*
*
* You can distinguish between the two in path if you use percent encoding: path%25.c and path%2525.c.
*
* Which path names are legal typically does not just depend on the operating system, it also depends
* critical on the file system.
*/
bool check_path_setters() {
// It seems that the spec allows '%' without two Hex next to it, as long as it is not part
// of a setter.
ada::result url_direct1 = ada::parse("file:///some/path%.c");
if(url_direct1->get_href() != "file:///some/path%.c") { return false; }

// This is the equivalent:
ada::result url_direct2 = ada::parse("file:///some/path%25.c");
if(url_direct2->get_href() != "file:///some/path%25.c") { return false; }

// Are url_direct1 and url_direct2 the same URL?



ada::result url = ada::parse("file://");
if(!url->set_pathname("/foo#1")) { return false; }
if(url->get_href() != "file:///foo%231") { return false; }

url->set_pathname("/some/path%25.c");
if(url->get_href() != "file:///some/path%25.c") {
std::cerr << url->get_href() << std::endl;
return false;
}

/**
* https://url.spec.whatwg.org/#path-state
* If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.
* A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers,
* are encouraged to report them somewhere.
* A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly,
* e.g., through a return statement.
*/
if(url->set_pathname("/some/path%.c")) {
std::cerr << "A percent character in 'set_pathname' should be followed by two ASCII hex digits." << std::endl;
}
if(url->get_href() != "file:///some/path%.c") {
std::cerr << url->get_href() << std::endl;
return false;
}
return true;
}

int main() {
return check_path_setters() ? EXIT_SUCCESS : EXIT_FAILURE;
}