[HFFS] Handle "refs/convert/parquet" and "refs/pr/(\d)+" (PRs) revision correctly #1710

Closed
Wauplin opened this issue Oct 4, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@Wauplin
Contributor

Wauplin commented Oct 4, 2023

Mentioned in #1707 (comment).

In HfFileSystem, a revision containing a "/" is not resolved correctly unless every "/" in it is URL-encoded.

Example

"hf://datasets/lambdalabs/pokemon-blip-captions@refs/convert/parquet/default/train/000.parquet"

is not a valid path to file ./default/train/000.parquet in revision refs/convert/parquet.

The correct path is

"hf://datasets/lambdalabs/pokemon-blip-captions@refs%2Fconvert%2Fparquet/default/train/000.parquet"
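Until this is handled by the library, the encoded form can be built by hand. A minimal sketch using only the standard library; `build_hf_path` and its parameters are hypothetical names for illustration and not part of `huggingface_hub`:

```python
from urllib.parse import quote


def build_hf_path(repo_id: str, revision: str, path_in_repo: str,
                  repo_type_prefix: str = "datasets/") -> str:
    """Build an hf:// path with a URL-encoded revision (hypothetical helper)."""
    # safe="" forces "/" to be encoded as %2F; by default quote() leaves it as-is.
    encoded_revision = quote(revision, safe="")
    return f"hf://{repo_type_prefix}{repo_id}@{encoded_revision}/{path_in_repo}"


path = build_hf_path(
    "lambdalabs/pokemon-blip-captions",
    "refs/convert/parquet",
    "default/train/000.parquet",
)
print(path)
```

With the inputs above, this produces the same string as the correct path quoted in this issue.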

Since refs/convert/parquet is a rather special revision, let's handle it separately (and do the same for PR revisions like refs/pr/1).

Note: this will make repos that contain a file at the literal path refs/convert/parquet/default/train/000.parquet less convenient to request. However, that is very unlikely to happen compared to the normal use of these revisions.
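One way the special-casing could look, as a rough sketch only. This is not the code from the actual fix; `split_revision_and_path` and `SPECIAL_REFS` are hypothetical names, and the input is assumed to be the part of the path that follows the "@":

```python
import re
from urllib.parse import unquote

# Revisions that contain "/" and are recognized without URL-encoding.
SPECIAL_REFS = re.compile(r"^(refs/convert/parquet|refs/pr/\d+)(?:/(.*))?$")


def split_revision_and_path(rest: str) -> tuple[str, str]:
    """Split the text after '@' into (revision, path_in_repo)."""
    match = SPECIAL_REFS.match(rest)
    if match:
        # Special revision: everything after it is the path in the repo.
        return match.group(1), match.group(2) or ""
    # Default: the revision ends at the first "/", and may be URL-encoded.
    revision, _, path_in_repo = rest.partition("/")
    return unquote(revision), path_in_repo


print(split_revision_and_path("refs/convert/parquet/default/train/000.parquet"))
print(split_revision_and_path("refs%2Fconvert%2Fparquet/default/train/000.parquet"))
print(split_revision_and_path("refs/pr/1/data.csv"))
```

Checking the special revisions first is what creates the trade-off noted above: a path that starts with refs/convert/parquet/ is always read as the revision, never as a file path on an unencoded "refs" revision.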

cc @mariosasko @Hakimovich99

@lhoestq
Member

lhoestq commented Oct 5, 2023

Note that you can use the ~parquet alias:

dd.read_parquet("hf://datasets/jamescalam/llama-2-arxiv-papers-chunked@~parquet")

@Wauplin
Contributor Author

Wauplin commented Oct 5, 2023

Resolved by #1712.

@Wauplin Wauplin closed this as completed Oct 5, 2023