AWS Data Wrangler 2.8.0

@jaidisido jaidisido released this 19 May 13:40
b13fcd8

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler
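After installing with the pin above, it can be useful to confirm which PyArrow major version actually ended up in the environment. The following is a minimal sketch using only the Python standard library (Python 3.8 or newer); the helper names `major_version` and `installed_pyarrow_major` are hypothetical and not part of awswrangler.

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


def installed_pyarrow_major() -> Optional[int]:
    """Return the installed PyArrow major version, or None if PyArrow is absent."""
    try:
        return major_version(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        # PyArrow is not installed in this environment.
        return None


if __name__ == "__main__":
    print("PyArrow major version:", installed_pyarrow_major())
```

On a platform pinned as described above, the helper would be expected to report major version 2.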

Documentation

  • Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
  • Clarified docs around potential in-place mutation of the DataFrame when using to_parquet #669

Enhancements

  • Enable parallel S3 downloads (~20% speedup) 🚀 #644
  • Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
  • Enable LOCK before concurrent COPY calls in Redshift #665
  • Make use of PyArrow iter_batches (>= 3.0.0 only) #660
  • Enable additional options when overwriting Redshift table (drop, truncate, cascade) #671
  • Reuse S3 client across threads for S3 range requests #684
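
As a rough illustration of the Redshift items above (#665 and #671), the sketch below builds the keyword arguments one might pass to `wr.redshift.copy`. It assumes the options are exposed as `mode`, `overwrite_method` and `lock`; check those names against the API reference for your installed version. The helper `redshift_overwrite_options` is hypothetical, and the bucket, table and schema in the usage comment are placeholders.

```python
from typing import Any, Dict

# Overwrite strategies named in these release notes.
OVERWRITE_METHODS = ("drop", "truncate", "cascade")


def redshift_overwrite_options(overwrite_method: str = "drop", lock: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for an overwrite-style Redshift load.

    Hypothetical helper: the keyword names mirror what awswrangler is
    assumed to accept, they are not confirmed by the release notes.
    """
    if overwrite_method not in OVERWRITE_METHODS:
        raise ValueError(
            f"overwrite_method must be one of {OVERWRITE_METHODS}, got {overwrite_method!r}"
        )
    return {"mode": "overwrite", "overwrite_method": overwrite_method, "lock": lock}


# Usage (requires awswrangler, a Redshift connection `con` and a DataFrame `df`;
# the path, table and schema are placeholders):
#
#   import awswrangler as wr
#   wr.redshift.copy(
#       df=df,
#       path="s3://my-bucket/stage/",
#       con=con,
#       table="my_table",
#       schema="public",
#       **redshift_overwrite_options("truncate", lock=True),
#   )
```

Validating the method up front keeps a typo from reaching the database, and setting `lock=True` is intended for cases where several writers may issue COPY against the same table at once.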

Bug Fix

  • Add dtypes for empty CTAS Athena queries #659
  • Add SerDe properties when creating CSV table #672
  • Pass SSL properties from Glue Connection to MySQL #554

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!