
awswrangler.s3.read_parquet_table fails on data which was created by spark.saveAsTable #495

Closed
ghost opened this issue Dec 30, 2020 · 2 comments

@ghost

ghost commented Dec 30, 2020

Describe the bug
When I try to load the data via read_parquet_table, I get an error that the Parquet magic bytes are missing:
ArrowInvalid: Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
The table was created by PySpark write.saveAsTable and can be read by PySpark or Athena. Maybe Wrangler tries to read non-Parquet files such as _SUCCESS (see below).
When I read the same table with read_parquet:
awswrangler.s3.read_parquet(path='path', path_suffix='.parquet')
it works, but when I omit path_suffix I get the same error.

[screenshot of the S3 listing omitted]

To Reproduce
The data is private and cannot be shared.

awswrangler.s3.read_parquet_table(table='table', database='database')
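For context, a minimal, self-contained sketch of the filtering that the path_suffix='.parquet' workaround relies on: Spark's saveAsTable writes sidecar objects (e.g. _SUCCESS) alongside the Parquet part files, and reading those as Parquet fails the magic-byte check. The object keys below are hypothetical examples, not from the reporter's bucket.

```python
def parquet_keys(keys):
    """Return only the keys that end with '.parquet',
    mirroring what path_suffix='.parquet' selects."""
    return [k for k in keys if k.endswith(".parquet")]

# Hypothetical listing of a table prefix written by spark.saveAsTable:
listing = [
    "warehouse/table/_SUCCESS",
    "warehouse/table/part-00000-1a2b.snappy.parquet",
    "warehouse/table/part-00001-3c4d.snappy.parquet",
]

print(parquet_keys(listing))  # only the two part files remain
```

With the sidecar files filtered out, the call from the report above, awswrangler.s3.read_parquet(path='path', path_suffix='.parquet'), reads successfully.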
@ghost ghost added the bug Something isn't working label Dec 30, 2020
@igorborgest igorborgest added enhancement New feature or request and removed bug Something isn't working labels Dec 30, 2020
@igorborgest igorborgest added this to the 2.3.0 milestone Dec 30, 2020
@igorborgest
Contributor

@hanan-vian Thanks for reporting that. We will address it in version 2.3.0.

@igorborgest
Contributor

Released on version 2.3.0 🚀
