-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: file: URL separator deduplication #552
Comments
To be fair, It is a bit different, as #232 collapsed a leading sequence of slashes only, somewhat mimicking the ignoring of multiple slashes before the host in other special URLs. This seems to call for collapsing all slashes in the path. There also were interactions with the host that made things rather complex, especially to find a good trade-off between matching the different browser behaviours. We recently considered in #606 to add a percent-encoding-equivalence / normalisation to the URL equivalence section. The same could be done for (some necessarily limited form of) file-path normalisation. Then the parser (ao.) would not become more complex. Whether such a thing should be applied by default, is a question that I don't feel qualified to answer though. Browsers in any case, don't do this. @karwa has done a lot of research on file paths, posted in #621 (note #618 as well). I think this would be part of that project. It's some amount of work though! |
I think the proposal would be to just dedupe |
If you don't consider the hostname or windows drive letters? I don't see how there would be interactions with other parts of the URL. But it would equate more file URLs. I'd be surprised to find systems that allow empty directory names, but I would want to be careful about normalising too much. File path normalisation is quite a thing on its own. |
Certainly agreed, I think for the most part we've accepted the state of percent encoding at this point. This issue was specifically about the |
Given browser behavior I would tend toward this being something you do after parsing if you need it. |
I don't think it is an unreasonable proposal. But it requires answering a universal question about file-paths and that makes it a difficult issue. Specifically: does this normalisation agree with all file-paths, for all systems, and are we willing to impose it on future systems as well? The normalisation itself isn't complicated and quite natural, so if that question can be answered with a convincing "yes" then I wouldn't be opposed, at least. Related: RFC8089 |
RFC 8089 lists two cases, which doesn't play well which separator deduplication, see: https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.3.2
If I understand this proposal correctly, then separator deduplication will change their meaning, as they become: |
Thanks |
@rmisev the UNC string in a file URL, as far as I'm aware, is always of the form |
The quote with these examples is taken from RFC 8089: https://datatracker.ietf.org/doc/html/rfc8089#appendix-E.3.2 |
@rmisev in Node.js we explicitly create |
I guess this isn't a blocker though - it just complicates the story a little:
It still seems like it wouldn't be that complex? |
Right now, with the exception of drive letters, pathnames are pretty consistent between different schemes. I think it'd be unfortunate to have the slash-collapsing behavior just for Additionally, parsing I would really want to preserve this level of interop, which is quite rare in |
Reclassifying this as something we could potentially offer on top of #729, but won't change the parser for. Hope that's acceptable. |
I would like to propose that file URLs include separator deduplication in the spec as a rule to match the file system identifiers themselves more closely.
The reason for this is that Node.js, Deno, and other platforms, build tools and tooling use
file:
URLs as their key in the module registry when loading ES modules. Whenever there are duplicate separators, this causes a double entry in the registry which can lead to duplicate module execution and a lack of code sharing.These platforms and build tools are now the primary consumers of
file:
URLs since browsers no longer encourage the use case at all, therefore it is important that this spec can aim to meet their needs as primary consumers of this aspect of the spec going forward.URL resolution already converts
\
into/
. This leaves only casing, percent encoding, query parameters and hashes as the comprehensive list of remaining non-unique components of the file URL representation's associated path representation.I understand the main consideration for such a change is considering the potentially major change in browsers, but perhaps it would not be so difficult since browsers so carefully discourage file URLs and the issue is rather that until now there hasn't been pressure on this use case from these new file modules platform directions.
This would smooth a number of environment bugs that otherwise need custom code to handle.
In Node.js we effectively add this renormalization step to all module resolution usage of file URLs. Other cases of fileURL resolution would benefit.
References:
The text was updated successfully, but these errors were encountered: