Release AWS Data Wrangler 2.8.0 · aws/aws-sdk-pandas

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
Clarified docs around potential in-place mutation of dataframe when using to_parquet #669

Enhancements

Enable parallel S3 downloads (~20% speedup) 🚀 #644
Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
Enable LOCK before concurrent COPY calls in Redshift #665
Make use of PyArrow iter_batches (>= 3.0.0 only) #660
Enable additional options when overwriting Redshift table (drop, truncate, cascade) #671
Reuse S3 client across threads for S3 range requests #684
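
To illustrate the idea behind the parallel download and range request items above, here is a minimal sketch of splitting an object into byte ranges and fetching them concurrently. It is not the library's implementation: `split_ranges`, `parallel_download`, and `fake_fetch` are hypothetical names, and an in-memory payload stands in for S3. In a real setup, the fetch callable would presumably wrap a single shared boto3 client and pass an HTTP `Range` value such as `bytes=0-99` to `get_object`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) byte ranges, HTTP Range style."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, size) - 1)
        for start in range(0, size, chunk_size)
    ]


def parallel_download(
    fetch_range: Callable[[int, int], bytes],
    size: int,
    chunk_size: int,
    max_workers: int = 4,
) -> bytes:
    """Fetch every range with one shared callable and reassemble the parts in order."""
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the parts line up.
        parts = pool.map(lambda r: fetch_range(*r), ranges)
        return b"".join(parts)


# Hypothetical stand-in for an S3 object: 1024 bytes held in memory.
payload = bytes(range(256)) * 4


def fake_fetch(start: int, end: int) -> bytes:
    """Return the inclusive byte range [start, end] of the in-memory payload."""
    return payload[start : end + 1]


result = parallel_download(fake_fetch, len(payload), chunk_size=100)
assert result == payload
```

Sharing one fetch callable (and, by extension, one client) across worker threads avoids creating a new client per range, which is the motivation the last item in the list describes.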

Bug Fix

Add dtypes for empty CTAS Athena queries #659
Add SerDe properties when creating CSV table #672
Pass SSL properties from Glue Connection to MySQL #554

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
