unstrip_protocol not implemented correctly #619

Open
alexdmiller opened this issue Apr 29, 2024 · 4 comments

@alexdmiller
Contributor

fs.unstrip_protocol should return a URI that starts with gs://..., but instead it returns gcs://..., which is not a valid GCS URI.

@martindurant
Member

At the point of running unstrip, we no longer know what protocol prefix was used in the original string, of course. "gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users (see also examples in the documentation). The aim of unstrip is to produce a URL which fsspec will recognise, so either is "valid".
I believe our use of full URLs of the style "gcs://bucket/path/file" may predate Google's.
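
To illustrate the point about lost prefixes, here is a minimal sketch using a toy class. It is a simplified model, not the actual fsspec or GCSFS code: `ToyFileSystem` and its method bodies are assumptions made for illustration, and the `("gcs", "gs")` ordering is assumed from the behaviour reported in this issue.

```python
class ToyFileSystem:
    """Simplified stand-in for a filesystem class; not the real GCSFS code."""

    # Assumed ordering, matching the behaviour reported in this issue.
    protocol = ("gcs", "gs")

    @classmethod
    def strip_protocol(cls, url: str) -> str:
        """Drop any known '<protocol>://' prefix, forgetting which one was used."""
        for proto in cls.protocol:
            prefix = f"{proto}://"
            if url.startswith(prefix):
                return url[len(prefix):]
        return url

    def unstrip_protocol(self, path: str) -> str:
        """Re-attach a prefix; with no memory of the original, use the first one."""
        for proto in self.protocol:
            if path.startswith(f"{proto}://"):
                return path  # already a full URL, leave it alone
        return f"{self.protocol[0]}://{path}"


fs = ToyFileSystem()
stripped = fs.strip_protocol("gs://my-bucket/data.csv")
restored = fs.unstrip_protocol(stripped)
print(stripped)
print(restored)
```

In this model the round trip turns a gs:// URL into a gcs:// one, because the stripped path carries no record of the original scheme and the first listed protocol is used.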

@alexdmiller
Contributor Author

Thanks for your reply!

> "gcs" and "gs" are both allowed by GCSFS, and the former is by far the more popular with users

How are you measuring that? I believe gs:// is more popular amongst the general public.

> ...(see also examples in the documentation)

Do you mean the fsspec docs or the Google docs? I can't find a single mention of gcs:// in the Google docs.

> The aim of unstrip is to produce a URL which fsspec will recognise

I believe the aim should also be to produce URIs that are digestible by other tools. Currently, fsspec produces bespoke non-standard URIs not recognized by other tools, including Google's official gsutil CLI:

$ gsutil ls gcs://my-bucket
InvalidUrlError: Unrecognized scheme "gcs".

> I believe our use of full URLs of the style "gcs://bucket/path/file" may predate Google's.

That may be true, and I apologize for not having the context on fsspec. I'm just a new client of the library trying to print out URIs which can be consumed by others on my team.

I should say that fsspec is an absolutely lovely library to work with and I'm such a fan. That's why I so badly want this little kink to be ironed out. Thanks for your hard work!
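
As a possible stopgap for printing URIs that other tools accept, the scheme can be rewritten after the fact. This is only a sketch: `to_gs_uri` is a hypothetical helper, not part of fsspec or gcsfs, and it assumes the only change needed is swapping a leading gcs:// for gs://.

```python
def to_gs_uri(url: str) -> str:
    """Rewrite a leading 'gcs://' to 'gs://'; leave any other string untouched."""
    prefix = "gcs://"
    if url.startswith(prefix):
        return "gs://" + url[len(prefix):]
    return url


print(to_gs_uri("gcs://my-bucket/path/file"))
print(to_gs_uri("gs://my-bucket/path/file"))
```

Applied to the output of fs.unstrip_protocol, this would yield a string in the form that gsutil accepts, without changing anything inside the library.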

@martindurant
Member

> I believe gs:// is more popular amongst the general public.

but not among our user base, for obvious reasons.
Would you be prepared to switch the order of the entries in gcsfs.core.GCSFileSystem.protocol for yourself at runtime? You could propose this as a PR, and we can maybe get a feeling for whether it is disruptive for people here.
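
A rough sketch of what such a runtime switch could look like is below. It uses a stand-in class so that it runs without gcsfs installed; `ToyGCSFileSystem` and `prefer_protocol` are hypothetical names, and whether the real gcsfs.core.GCSFileSystem reacts the same way to reordering its protocol attribute is an assumption here.

```python
class ToyGCSFileSystem:
    """Stand-in for gcsfs.core.GCSFileSystem; only the protocol logic is modelled."""

    protocol = ("gcs", "gs")  # assumed ordering, as reported in this issue

    def unstrip_protocol(self, path: str) -> str:
        # Simplified: always prepend the first listed protocol.
        return f"{self.protocol[0]}://{path}"


def prefer_protocol(fs_class, preferred: str) -> None:
    """Hypothetical helper: move `preferred` to the front of fs_class.protocol."""
    current = fs_class.protocol
    protos = (current,) if isinstance(current, str) else tuple(current)
    if preferred not in protos:
        raise ValueError(f"{preferred!r} is not one of {protos!r}")
    fs_class.protocol = (preferred,) + tuple(p for p in protos if p != preferred)


before = ToyGCSFileSystem().unstrip_protocol("my-bucket/file")
prefer_protocol(ToyGCSFileSystem, "gs")
after = ToyGCSFileSystem().unstrip_protocol("my-bucket/file")
print(before)
print(after)
```

Both prefixes stay in the tuple, so in this model URLs written with either scheme would still be recognised; only the prefix chosen when re-attaching changes.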

@alexdmiller
Contributor Author

Sure, PR is here: #620
