Error parsing nested struct types when using s3.to_parquet #480

Closed
nivf33 opened this issue Dec 14, 2020 · 5 comments
@nivf33

nivf33 commented Dec 14, 2020

Hi,

I'm using awswrangler 1.10.0.
I've encountered the following issue when using s3.to_parquet.
I'm trying to append a row, as a DataFrame, to an Athena table in which one of the fields has the following type:
{'Name': 'test', 'Type': 'array<struct<a:struct<id:string,name:string>,b:struct<id:string,name:string>>>'}

I'm getting the following error:
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/awswrangler/_data_types.py", line 58, in athena2pyarrow
    raise exceptions.UnsupportedType(f"Unsupported Athena type: {dtype}")
awswrangler.exceptions.UnsupportedType: Unsupported Athena type: strin

I've debugged the code, and the issue seems to come from line 55 of _data_types.py.
On each iteration, this code strips off another layer of the wrapping elements, but it doesn't support nested ones. That is why splitting on ',' breaks the whole structure and produces the "strin" error.

The issue seems to happen when the table already exists and I'm trying to append more data to it.
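
To illustrate the failure mode described above, here is a minimal, standalone sketch of a depth-aware parser for Athena type strings. It is not the awswrangler implementation and not the fix that shipped; the names split_top_level and parse_athena_type are hypothetical, and the nested-tuple output format is chosen only for illustration. The idea is to split struct fields on commas only when they sit outside any nested <...> pair.

```python
def split_top_level(text, sep=","):
    """Split text on sep, skipping separators nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def parse_athena_type(dtype):
    """Parse an Athena type string into nested tuples (illustrative format).

    Returns ("array", inner), ("struct", [(name, type), ...]),
    or the primitive type name as a plain string.
    """
    dtype = dtype.strip()
    if dtype.startswith("array<") and dtype.endswith(">"):
        return ("array", parse_athena_type(dtype[len("array<"):-1]))
    if dtype.startswith("struct<") and dtype.endswith(">"):
        fields = []
        for field in split_top_level(dtype[len("struct<"):-1]):
            name, _, field_type = field.partition(":")
            fields.append((name.strip(), parse_athena_type(field_type)))
        return ("struct", fields)
    return dtype


reported = "array<struct<a:struct<id:string,name:string>,b:struct<id:string,name:string>>>"
print(parse_athena_type(reported))

# One plausible way a truncated "strin" can arise (an assumption, not a trace
# of the library code): a plain split(",") cuts through the nested struct, and
# stripping the wrapper from the resulting fragment removes a real character.
fragment = "a:struct<id:string,name:string>,b:int".split(",")[0]
print(fragment.partition(":")[2][len("struct<"):-1])  # id:strin
```

Tracking the bracket depth keeps each nested struct intact as a single field, so the recursion only ever sees well-formed type strings.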

@nivf33 nivf33 changed the title Error handling nested array of dict when using s3.to_parquet Error handling nested array of dict when using s3.to_parquet - 1.10.0 Dec 14, 2020
@nivf33 nivf33 changed the title Error handling nested array of dict when using s3.to_parquet - 1.10.0 Error handling nested array of dict when using s3.to_parquet - awswrangler 1.10.0 Dec 14, 2020
@nivf33 nivf33 changed the title Error handling nested array of dict when using s3.to_parquet - awswrangler 1.10.0 Error handling nested object when using s3.to_parquet Dec 14, 2020
@nivf33 nivf33 changed the title Error handling nested object when using s3.to_parquet Error parsing nested struct types when using s3.to_parquet Dec 14, 2020
@igorborgest igorborgest self-assigned this Dec 15, 2020
@igorborgest igorborgest added minor release Will be addressed in the next minor release WIP Work in progress labels Dec 15, 2020
@igorborgest igorborgest added this to the 2.1.0 milestone Dec 15, 2020
@igorborgest
Contributor

igorborgest commented Dec 15, 2020

Hi @nivf33, thanks for reporting it.

It was already fixed in our development branch (PR above 👆). Could you give it a try?

pip install git+https://github.com/awslabs/aws-data-wrangler.git@feature/nested-struct

@igorborgest
Contributor

Released in version 2.1.0.

@igorborgest igorborgest removed the WIP Work in progress label Dec 21, 2020
@nivf33
Author

nivf33 commented Dec 21, 2020

Hi Igor, sorry for the delay.
I tried to run the command you mentioned, but I'm getting the following:
WARNING: Did not find branch or tag 'feature/nested-struct', assuming revision or ref.

Am I missing anything?
Thanks

@igorborgest
Contributor

Oh sorry, the development branch was already deleted. You can now get the fix directly from our official release, version 2.1.0.
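
A quick way to confirm that an environment has a release containing the fix is to compare the installed version against 2.1.0. The snippet below is a minimal sketch using only the Python standard library (importlib.metadata, available from Python 3.8); the helper names are hypothetical, and the version parsing is deliberately simplistic (it only reads the leading digits of the first three dot-separated parts).

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_FIXED = (2, 1, 0)  # release named in this thread as containing the fix


def release_tuple(text):
    """Turn a version string such as '2.1.0' into a tuple such as (2, 1, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_nested_struct_fix(installed):
    """True if the given version string is at least 2.1.0."""
    return release_tuple(installed) >= MIN_FIXED


def installed_awswrangler():
    """Return the installed awswrangler version string, or None if absent."""
    try:
        return version("awswrangler")
    except PackageNotFoundError:
        return None


current = installed_awswrangler()
if current is None:
    print("awswrangler is not installed")
else:
    print(current, "has the fix:", has_nested_struct_fix(current))
```

If the check fails, upgrading with pip (for example `pip install --upgrade awswrangler`) should pull in a release that includes the fix.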

@nivf33
Author

nivf33 commented Dec 21, 2020

Great, it works, thanks!
