bug: read_parquet mangles huggingface prefix #9689

Closed
1 task done
cboettig opened this issue Jul 24, 2024 · 1 comment · Fixed by #9691
Labels
bug Incorrect behavior inside of ibis

cboettig commented Jul 24, 2024

What happened?

duckdb supports reading data directly from huggingface! https://duckdb.org/2024/05/29/access-150k-plus-datasets-from-hugging-face-with-duckdb.html

But read_parquet() in ibis doesn't recognize the hf:// prefix, and decides that it must be some kind of local file path.

What version of ibis are you using?

9.10

What backend(s) are you using, if any?

DuckDB!

Relevant log output

parquet = "hf://datasets/boettiger-lab/gbif/usa_h3/*/*.parquet"
df = con.read_parquet(parquet)
FileNotFoundError: /home/rstudio/huggingface/spaces/gbif/hf:/datasets/boettiger-lab/gbif/usa_h3/*.parquet
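The `hf:/` in the error (one slash, prefixed by the working directory) is what you would get if the URL were treated as a relative local path and made absolute. The sketch below reproduces just that effect with the standard library. It is not ibis's actual code: `naive_absolute` and the example working directory are hypothetical names used for illustration only.

```python
import posixpath


def naive_absolute(path: str, cwd: str) -> str:
    """Hypothetical helper: treat `path` as relative to `cwd` and normalize it.

    Normalization collapses the "//" after the scheme, so a URL like
    "hf://..." ends up looking like a local file under `cwd`.
    """
    return posixpath.normpath(posixpath.join(cwd, path))


url = "hf://datasets/boettiger-lab/gbif/usa_h3/*/*.parquet"
mangled = naive_absolute(url, "/home/user/project")  # assumed working directory
print(mangled)
```

Once the prefix has been rewritten this way, DuckDB no longer sees an `hf://` URL at all, only a local glob that matches nothing.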

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cboettig cboettig added the bug Incorrect behavior inside of ibis label Jul 24, 2024
Member

cpcloud commented Jul 24, 2024

I suspect we should avoid hard-coding the list of ignored prefixes. We use that list so that we can turn file paths into absolute paths, but I don't think that conversion is necessary either.

I'll remove the hardcoded list, which allows hf and any other prefix someone comes up with to work, and see what breaks.
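One way to picture a fix that does not depend on a fixed list of prefixes: leave anything that looks like `scheme://...` untouched and only make plain paths absolute. This is a minimal sketch of that idea, not the change that was actually merged. `has_url_scheme` and `normalize_source` are hypothetical names, and the `cwd` default is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def has_url_scheme(path: str) -> bool:
    """Return True for strings of the form "<scheme>://...", whatever the scheme."""
    return bool(urlsplit(path).scheme) and "://" in path


def normalize_source(path: str, cwd: str = "/work") -> str:
    """Hypothetical helper: pass URLs through unchanged, absolutize local paths."""
    if has_url_scheme(path):
        return path  # hf://, s3://, https://, or any future prefix
    return posixpath.normpath(posixpath.join(cwd, path))


for source in (
    "hf://datasets/boettiger-lab/gbif/usa_h3/*/*.parquet",
    "s3://bucket/key.parquet",
    "data/*.parquet",
):
    print(source, "->", normalize_source(source))
```

Requiring `://` as well as a parsed scheme keeps strings such as Windows drive paths (`C:\data`) from being mistaken for URLs.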
