pl.read_parquet cannot read a signed GCS URL on 0.20, but can on <0.19 #14908
Comments
Hi @stinodego we are having the same issue with the new version. Any idea when this might be prioritized? If you have any pointers on where we can start looking, happy to do that as well. Thanks! |
It stopped working because If you cannot live without signed URL support, you'll have to use fsspec for now to load the data and feed it to Polars, e.g. something like:
    import fsspec
    import polars as pl

    # Open the signed URL as a binary file object and let Polars read from it.
    with fsspec.open(url) as f:
        df = pl.read_parquet(f)
|
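For cases where adding fsspec is not an option, the same idea (fetch the bytes yourself, then hand them to Polars) can be sketched with the standard library alone. The helper name `fetch_to_buffer` and the 30-second timeout are assumptions made for illustration, not part of Polars or of this thread, and the sketch reads the whole object into memory, so it only suits files that fit in RAM.

```python
import io
import urllib.request


def fetch_to_buffer(url: str, timeout: float = 30.0) -> io.BytesIO:
    """Download the whole object behind `url` into an in-memory buffer.

    Hypothetical helper: the query string of a signed URL is sent as-is,
    so the signature parameters are not inspected or rewritten.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return io.BytesIO(response.read())
```

The returned buffer is a file-like object, so it could then be passed to `pl.read_parquet(buf)` in the same way as the fsspec file handle above.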
I have the same issue with |
Hi @stinodego just following up on this again in case you had any ideas. Thanks! |
@stinodego friendly ping on this - would really appreciate it if you had any ideas about why |
@nameexhaustion can you take a look here? |
My guess is that polars sees the "=" and then infers that it is a hive-partitioned path, and so tries to get a list of files, which is where the 405 Method Not Allowed is coming from. This is just a wild guess though; I'm on mobile so haven't verified it at all. |
The only "=" I see in the URL are for the query params (e.g ?X-Amz-Algorithm=x&X-Amz-Credential=y) - everything else is already url encoded. But the theory makes sense to me, as I can read and scan public http urls which don't have query params just fine. |
I think this needs to be more robust. Perhaps have it split by "/" and then not look for "=" in the last part (the file name and query parameters). That, or use a full-on URL parsing function to recognize query parameters. |
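The suggestion above can be sketched in Python with the standard `urllib.parse` module. This is only an illustration of the idea, not the Polars implementation (which lives in Rust); the function name `path_looks_hive_partitioned` and the example URLs are made up for the sketch.

```python
from urllib.parse import urlsplit


def path_looks_hive_partitioned(url: str) -> bool:
    """Return True if any directory segment of the URL path contains "=".

    The query string and the final path segment (the file name) are ignored,
    so signed-URL parameters such as X-Amz-Algorithm=... do not count.
    """
    path = urlsplit(url).path           # drops "?query" and "#fragment"
    directories = path.split("/")[:-1]  # everything except the file name
    return any("=" in segment for segment in directories)


signed = "https://example.com/bucket/data/file.parquet?X-Amz-Algorithm=x&X-Amz-Credential=y"
hive = "https://example.com/bucket/year=2024/month=01/file.parquet"

print(path_looks_hive_partitioned(signed))  # False
print(path_looks_hive_partitioned(hive))    # True
```

Parsing the URL first means the check never sees the query parameters at all, which is the more robust of the two options mentioned.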
In case this helps, I tried calling Without query params (and hence no "="), it does a |
The thing I thought was needed is here
After further review, it looks like object_store makes a PROPFIND request when trying to list a directory. On the polars side it tries to list when it thinks there's a glob pattern.
polars/crates/polars-io/src/cloud/glob.rs Line 188 in 7888d3b
I can't tell on mobile why it thinks it's a glob pattern. |
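As an illustration of how a URL with query parameters could be mistaken for a glob pattern, the sketch below compares a naive check on the raw string with one that only inspects the path component. Whether this is what actually happens inside Polars is not confirmed anywhere in this thread; the function names and the set of glob characters are assumptions chosen for the example.

```python
from urllib.parse import urlsplit

# Characters commonly treated as glob wildcards (an assumption for this sketch).
GLOB_CHARS = frozenset("*?[")


def naive_has_glob(url: str) -> bool:
    """Check the raw string; the "?" that starts a query string is a false positive."""
    return any(ch in GLOB_CHARS for ch in url)


def path_has_glob(url: str) -> bool:
    """For http(s) URLs, look for glob characters in the path component only."""
    parts = urlsplit(url)
    candidate = parts.path if parts.scheme in ("http", "https") else url
    return any(ch in GLOB_CHARS for ch in candidate)


signed = "https://example.com/bucket/file.parquet?X-Amz-Algorithm=x&X-Amz-Credential=y"

print(naive_has_glob(signed))  # True
print(path_has_glob(signed))   # False
print(path_has_glob("https://example.com/bucket/*.parquet"))  # True
```

One limitation of the path-only check: a literal "?" wildcard in an http(s) path cannot be told apart from the start of a query string, so only "*" and "[" remain usable as wildcards there.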
Thank you for the prompt fix! I'll report back once this change is released. |
I've just tried this with 1.4.1 and unfortunately it still does not work, though with a different error (the same URL works fine with pandas):
I'm able to create a LazyFrame, but when I collect it:
|
I've tried this with a CSV file from the same object store and weirdly, |
Hello, still getting the problem reading signed URLs from S3. I think there are two issues here and only the one concerning path expansion was resolved. The other problem with signed URLs remains. I believe this has happened when Polars switched to using |
Checks
Reproducible example
This is on 0.20.14
Log output
Issue description
This URI is from a signed URL generated via the google.cloud.storage.Client package.
Expected behavior
On 0.19.12, for example, the reproducible example code works.
Installed versions