UPath formatting of empty netloc breaks compatibility #151

Closed
tehunter opened this issue Sep 29, 2023 · 1 comment · Fixed by #162
Labels
enhancement 🚀 New feature or request

Comments

@tehunter
Contributor

tehunter commented Sep 29, 2023

UPath._format_parsed_parts currently drops the netloc portion completely when netloc is empty. This leads to a shortened URI with single slash format.

For example, _format_parsed_parts currently shortens protocol:///path to protocol:/path. This unfortunately breaks compatibility with RFC 3986 format, which is assumed by fsspec and third-party libraries that use it (e.g. pandas). Per RFC 3986, an empty host/authority field can be valid.

UPath should offer an attribute that lets implementations specify whether empty netlocs are valid. Where they are valid, an empty-string netloc shouldn't be treated any differently from a non-empty one. Without a flag like this, implementations need to override _format_parsed_parts just to change one line.

I am building a UPath implementation for boxfs, and I have to do just that. Netloc (URI host/authority) doesn't have a meaning in this "protocol", so I plan to leave it blank.

@ap--
Collaborator

ap-- commented Sep 30, 2023

Hi @tehunter

Thanks for opening the issue!

Your suggestion definitely sounds like a useful addition. At some point it would also make sense to normalize UPath URIs to the at-least-2-slash form by default, for better compatibility with fsspec, although I need to give this a bit more thought together with relative URIs.

My current understanding of RFC 3986 is that a protocol:/path URI is within the spec and equivalent to protocol:///path, unless a scheme-specific RFC says otherwise (see RFC 3986, Section 3, where hier-part would be path-absolute or one of the alternatives below it).
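As a quick illustration of that equivalence, the sketch below parses both spellings with the standard library's `urllib.parse.urlsplit`. Using `urlsplit` here is only an assumption made for demonstration; it is not meant to describe how UPath parses URIs internally, and `protocol` is the placeholder scheme from this thread.

```python
from urllib.parse import urlsplit

# The two spellings discussed above, with a placeholder scheme.
single = urlsplit("protocol:/path")
triple = urlsplit("protocol:///path")

# Both parse to an empty netloc and the same absolute path, so the
# parsed form alone cannot tell which spelling was used.
for parts in (single, triple):
    print(parts.scheme, repr(parts.netloc), parts.path)
```

Since the two forms are indistinguishable after parsing, the choice between them has to be made when the URI is formatted back into a string.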

For now I would say we should add:

class UPath(...):
    _uri_keep_empty_authority: bool = False

Would you want to implement a PR with the changes? (I'm currently working on Python 3.12 support.)

Cheers,
Andreas 😃
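A minimal sketch of how the proposed flag could behave, under stated assumptions: `SketchPath`, `BoxLikePath`, and `format_uri` are hypothetical names made up for this example and are not part of universal_pathlib. Only the attribute name `_uri_keep_empty_authority` and its default of `False` come from the comment above. The real `_format_parsed_parts` has a different signature and may be implemented quite differently.

```python
class SketchPath:
    """Hypothetical stand-in for UPath; not the real class."""

    # Flag name and default taken from the proposal in this thread.
    _uri_keep_empty_authority: bool = False

    @classmethod
    def format_uri(cls, scheme: str, netloc: str, path: str) -> str:
        # Emit "//" + netloc when an authority is present, or when the
        # class opts in to keeping an empty one.
        if netloc or cls._uri_keep_empty_authority:
            return f"{scheme}://{netloc}{path}"
        # Otherwise fall back to the shortened single-slash form.
        return f"{scheme}:{path}"


class BoxLikePath(SketchPath):
    # Hypothetical subclass for a protocol where the authority is meaningless.
    _uri_keep_empty_authority = True


print(SketchPath.format_uri("protocol", "", "/path"))   # protocol:/path
print(BoxLikePath.format_uri("protocol", "", "/path"))  # protocol:///path
```

With a class-level flag, an implementation opts in by setting one attribute instead of overriding the whole formatting method, which is the inconvenience described in the original report.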
