Mishandling of wildcard characters in read_parquet #564

Closed
Malkard opened this issue Feb 17, 2021 · 4 comments
Labels: documentation (Improvement or bugfixes on docs), ready to release

Comments

Malkard commented Feb 17, 2021

Describe the bug
When trying to read a single file whose name contains wildcard characters (in our case, '['), the read_parquet function fails to find the file:

wr.s3.read_parquet(path="s3://bucket/some_file[something].parquet")

results in a NoFilesFound exception
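
A likely explanation, sketched under the assumption that the path is matched with Unix shell-style rules like those of Python's fnmatch module (the library's own matching code is not shown in this issue): the bracketed part is read as a character class, so the pattern does not match its own literal name.

```python
import fnmatch

name = "some_file[something].parquet"

# "[something]" is parsed as a character class that matches ONE of the
# characters s, o, m, e, t, h, i, n, g, so the literal "[" never matches.
matches_itself = fnmatch.fnmatchcase(name, name)

# The same pattern does match a name with a single character from that set.
matches_other = fnmatch.fnmatchcase("some_files.parquet", name)

print(matches_itself, matches_other)  # False True
```

Here fnmatch is only a stand-in to illustrate the pattern semantics; "some_files.parquet" is a made-up file name.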

To Reproduce
Version 2.4.0
Probably exists since the introduction of wildcards with #322

To reproduce: Put a Parquet file whose name contains square brackets in a bucket and try to read it.

The implementation should accept a preceding '\' to indicate the literalness of a special character.

Workaround
Replace each wildcard character in the path with '?' (which matches any single character) before passing it to read_parquet.
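
A minimal sketch of this workaround. The helper name brackets_to_question_marks is hypothetical, and fnmatch is used only as an assumed stand-in for the library's matching rules:

```python
import fnmatch


def brackets_to_question_marks(path: str) -> str:
    """Hypothetical helper: swap '[' and ']' for the one-character wildcard '?'."""
    return path.translate(str.maketrans({"[": "?", "]": "?"}))


original = "s3://bucket/some_file[something].parquet"
pattern = brackets_to_question_marks(original)
print(pattern)  # s3://bucket/some_file?something?.parquet

# '?' matches any single character, including the literal brackets.
matches_original = fnmatch.fnmatchcase(original, pattern)

# Caveat: the pattern is looser than the literal name, so other keys
# with any character in those two positions would match as well.
also_matches = fnmatch.fnmatchcase("s3://bucket/some_file_something_.parquet", pattern)
```

The resulting pattern could then be passed as the path argument, keeping in mind that it may pick up more than the one intended file.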

Malkard added the bug (Something isn't working) label Feb 17, 2021

maxispeicher (Contributor) commented

As a quick fix, you could call glob.escape() on the path before passing it:

import glob
import awswrangler as wr

s3_path = "s3://bucket/some_file[something].parquet"
escaped_s3_path = glob.escape(s3_path)
wr.s3.read_parquet(path=escaped_s3_path)
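
To see what the escaping does without touching S3, the escaped string can be inspected directly. This sketch uses only the standard library; the fnmatch check at the end assumes the library applies shell-style matching, which this thread does not confirm:

```python
import fnmatch
import glob

s3_path = "s3://bucket/some_file[something].parquet"
escaped_s3_path = glob.escape(s3_path)

# glob.escape wraps each of the special characters '*', '?' and '['
# in brackets, so "[" becomes the one-character class "[[]".
print(escaped_s3_path)  # s3://bucket/some_file[[]something].parquet

# Under shell-style matching, the escaped pattern matches the literal name.
matches_literal = fnmatch.fnmatchcase(s3_path, escaped_s3_path)
```

The closing "]" is left as-is because it has no special meaning outside a character class.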


Malkard commented Feb 18, 2021

Thanks, this works too, and is somewhat cleaner.

Since the intent of the "path" argument of read_parquet is to handle wildcards, this could be considered the proper way to deal with this use case, instead of my suggestion. In that case, a brief mention in the documentation would be valuable.

igorborgest added the documentation (Improvement or bugfixes on docs) label and removed the bug (Something isn't working) label Feb 24, 2021
igorborgest added this to the 2.5.0 milestone Feb 24, 2021

igorborgest (Contributor) commented

Released on version 2.5.0!

jtlz2 commented Mar 2, 2023

This still doesn't work for me (REF: https://stackoverflow.com/questions/75614048/how-do-i-merge-several-parquet-files-into-one-using-awswrangler) - what am I doing wrong? Thanks!
