
awswrangler.s3.read_parquet_table fails on data which was created by spark.saveAsTable #495

Closed
ghost opened this issue Dec 30, 2020 · 2 comments

@ghost

ghost commented Dec 30, 2020

Describe the bug
When I try to load the data via read_parquet_table, I get an error that the Parquet magic bytes are missing:
ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
The table was created by PySpark write.saveAsTable and can be read by PySpark or Athena. Maybe Wrangler tries to read non-Parquet files such as _SUCCESS (see below).
When I read the same table with read_parquet:
awswrangler.s3.read_parquet(path='path', path_suffix='.parquet')
it works, but when I omit path_suffix I get the same error.

[screenshot of the S3 listing omitted]

To Reproduce
The data is private and cannot be shared.

awswrangler.s3.read_parquet_table(table='table', database='database')
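For context, a minimal, self-contained sketch of the filtering that the path_suffix='.parquet' workaround relies on: Spark's saveAsTable writes sidecar objects (e.g. _SUCCESS) alongside the Parquet part files, and reading those as Parquet fails the magic-byte check. The object keys below are hypothetical examples, not from the reporter's bucket.

```python
def parquet_keys(keys):
    """Return only the keys that end with '.parquet',
    mirroring what path_suffix='.parquet' selects."""
    return [k for k in keys if k.endswith(".parquet")]

# Hypothetical listing of a table prefix written by spark.saveAsTable:
listing = [
    "warehouse/table/_SUCCESS",
    "warehouse/table/part-00000-1a2b.snappy.parquet",
    "warehouse/table/part-00001-3c4d.snappy.parquet",
]

print(parquet_keys(listing))  # only the two part files remain
```

With the sidecar files filtered out, the call from the report above, awswrangler.s3.read_parquet(path='path', path_suffix='.parquet'), reads successfully.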
@ghost ghost added the bug Something isn't working label Dec 30, 2020
@igorborgest igorborgest added enhancement New feature or request and removed bug Something isn't working labels Dec 30, 2020
@igorborgest igorborgest added this to the 2.3.0 milestone Dec 30, 2020
@igorborgest
Contributor

@hanan-vian Thanks for reporting that. We will address it in version 2.3.0.

@igorborgest
Contributor

Released on version 2.3.0 🚀
