
Releases: aws/aws-sdk-pandas

AWS Data Wrangler 1.9.3

08 Sep 22:53

Bug Fix

  • Fix a wr.s3.read_parquet() bug related to timezone offsets. #385

Thanks

We thank the following contributors/users for their work on this release:

@chrisrana, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 1.9.2

07 Sep 16:26

Bug Fix

  • Fix issues reading Parquet files with timezone-aware timestamp columns. #382 #383

Thanks

We thank the following contributors/users for their work on this release:

@tasq-inc, @chrisrana, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 1.9.1

05 Sep 15:50

Enhancements

  • Significant Amazon S3 I/O speed-up for big files #377
  • Support creating Parquet datasets with CamelCase column names #380

Bug Fix

  • Fix a Parquet read error for some files created by DMS #376

Docs

  • A few updates.

Thanks

We thank the following contributors/users for their work on this release:

@jarretg, @chrisrana, @vikramshitole, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 1.9.0

01 Sep 01:20

Breaking changes

  • The global configuration s3fs_block_size was replaced by s3_block_size. #370

New Functionalities

  • Automatic recovery of Pandas indexes from Parquet files. #366
  • Automatic recovery of Pandas time zones from Parquet files. #366
  • Schema evolution can now be optionally disabled through the new schema_evolution argument. #353
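Why time zone recovery matters can be sketched with the standard library alone. Parquet writers commonly store timezone-aware timestamps as UTC-normalised instants and keep the original zone as metadata (an assumption about the writer, not a statement about awswrangler internals). The example below, using an arbitrary fixed UTC-3 offset, shows that the instant is preserved either way, but only re-applying the recovered zone gives back the original wall-clock value.

```python
from datetime import datetime, timedelta, timezone

# Example zone: a fixed UTC-3 offset (arbitrary choice for illustration).
local_tz = timezone(timedelta(hours=-3))

# A timezone-aware timestamp as the user originally wrote it.
original = datetime(2020, 9, 8, 19, 53, tzinfo=local_tz)

# What a UTC-normalised store would hold: the same instant expressed in UTC.
stored_utc = original.astimezone(timezone.utc)

# Reading back and dropping the zone yields the UTC wall-clock time...
naive_read = stored_utc.replace(tzinfo=None)

# ...while re-applying the recovered zone restores the original wall-clock time.
recovered = stored_utc.astimezone(local_tz)

print(stored_utc.isoformat())  # 2020-09-08T22:53:00+00:00
print(recovered.isoformat())   # 2020-09-08T19:53:00-03:00
```

Aware datetimes compare by instant, so `recovered == original` holds; the difference only shows up in the displayed wall-clock value and offset.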

Enhancements

  • The s3fs dependency was replaced by built-in code. #370
  • Significant Amazon S3 I/O speed-up for high-latency environments (e.g. local, on-premises). #370

Bug Fix

  • Improve NaN handling. #362
  • Sanitise table names when inserting partitions. #360
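Athena and the Glue Catalog treat table and column names case-insensitively and store them in lowercase, which is why names get sanitised before they are written to the catalog. A minimal sketch of one possible rule follows; `sanitize_name` is a hypothetical helper, and the exact rules awswrangler applies (see `wr.catalog.sanitize_table_name`) may differ.

```python
import re


def sanitize_name(name: str) -> str:
    """Hypothetical sanitiser: CamelCase -> snake_case, other symbols -> '_', lowercase."""
    # Insert '_' at lowercase/digit -> uppercase boundaries ("MyTable" -> "My_Table").
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Replace each run of characters that are not letters, digits or '_' with one '_'.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", name)
    # Catalog names are case-insensitive, so normalise to lowercase.
    return name.lower()


print(sanitize_name("MyTable"))        # my_table
print(sanitize_name("sales-2020.09"))  # sales_2020_09
```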

Docs

  • A few updates.

Thanks

We thank the following contributors/users for their work on this release:

@isrsal, @bppont, @weishao-aws, @alexifm, @Digma, @samcon, @TerrellV, @msantino, @alvaropc, @luigift, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload them and run!

AWS Data Wrangler 1.8.1

11 Aug 18:26
c6e4ff0

Bug Fix

  • Fix handling of NaN values in wr.athena.read_sql_*(). #351

Docs

Thanks

We thank the following contributors/users for their work on this release:

@czagoni, @josecw, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!

AWS Data Wrangler 1.8.0

09 Aug 20:03

New Functionalities

  • wr.s3.to_parquet() now has a max_rows_by_file argument. #283
  • Support for Unix path pattern matching (*, ?, [seq], [!seq]) for any list/read/delete/copy function on S3. #322
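The wildcards listed above follow Unix shell-style semantics, the same ones implemented by Python's standard `fnmatch` module. The sketch below filters a list of invented S3 keys locally to show what each pattern selects; it illustrates the matching rules only, not how awswrangler lists objects. `fnmatchcase` is used because S3 keys are case-sensitive.

```python
from fnmatch import fnmatchcase
from typing import List

# Invented example keys; a real listing would come from S3.
keys = [
    "s3://bucket/data/2020-08-01.csv",
    "s3://bucket/data/2020-08-02.csv",
    "s3://bucket/data/2020-09-01.csv",
    "s3://bucket/data/readme.txt",
]


def match(pattern: str) -> List[str]:
    """Return the keys matching a shell-style pattern (case-sensitive)."""
    return [k for k in keys if fnmatchcase(k, pattern)]


# '*' matches any run of characters (in fnmatch it also crosses '/').
csv_keys = match("s3://bucket/data/*.csv")  # the three .csv keys
# '?' matches exactly one character.
first_of_month = match("s3://bucket/data/2020-0?-01.csv")  # 2020-08-01 and 2020-09-01
# '[seq]' matches one character from the set; '[!seq]' one character not in it.
aug_not_first = match("s3://bucket/data/2020-0[8]-0[!1].csv")  # 2020-08-02 only
```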

Enhancements

  • Mypy is now applied in strict mode.

Bug Fix

  • Fix unnecessary creation of table versions (Glue Catalog) by wr.s3.to_parquet() during appends. #342
  • Fix missing sanitisation of index names in wr.s3.to_parquet()/wr.s3.to_csv(). #343

Docs

Thanks

We thank the following contributors/users for their work on this release:

@Thiago-Dantas, @andre-marcos-perez, @ericct, @marcelo-vilela, @edvorkin, @nicholas-miles, @chrispruitt, @rparthas, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!

AWS Data Wrangler 1.7.0

30 Jul 13:58

Breaking changes

  • Reading partitioned Parquet datasets now uses a different approach for pushdown filters. For details, check the tutorial.

New Functionalities

Enhancements

  • Support for PyArrow 1.0.0 #337
  • Support for Pandas 1.1.0
  • Support writing an encrypted Redshift COPY manifest to S3 #327
  • wr.athena.read_sql_*() now accepts empty results #299
  • Allow connect_args to be passed when creating an SQL engine from a Glue connection #309
  • Add skip_header_line_count argument to wr.catalog.create_csv_table() #338

Bug Fix

  • Add missing type annotations and fix types in docstrings. #321
  • Fix KeyError: 'StatementType' when using max_cache_seconds with Athena #323
  • Fix slow wr.s3.read_csv() with chunksize #324
  • Fix wr.s3.read_csv() with chunksize not forwarding the encoding pandas_kwargs #330
  • Ensure DataFrame mutability for wr.athena.read_sql_*() with ctas_approach=True #335
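The #330 fix is about forwarding `encoding` when reading in chunks. A stdlib-only sketch of why that matters: when a file is consumed chunk by chunk, every chunk must be decoded with the caller's encoding, otherwise non-ASCII values are garbled or decoding fails outright. `read_csv_chunks` is a hypothetical helper, not the awswrangler implementation, and the Latin-1 sample data is invented.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List


def read_csv_chunks(raw: bytes, chunksize: int, encoding: str = "utf-8") -> Iterator[List[List[str]]]:
    """Hypothetical helper: yield lists of at most `chunksize` rows decoded with `encoding`."""
    # One text stream decodes every chunk with the same, caller-supplied encoding.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    reader = csv.reader(text)
    while True:
        chunk = list(islice(reader, chunksize))
        if not chunk:
            break
        yield chunk


data = "name,city\nJosé,São Paulo\nAnne,Zürich\nLi,Köln\n".encode("latin-1")
for chunk in read_csv_chunks(data, chunksize=2, encoding="latin-1"):
    print(chunk)
```

Decoding the same bytes with the default `utf-8` would raise `UnicodeDecodeError`, which is the kind of failure a dropped `encoding` argument produces.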

Docs

  • Several small updates.

Thanks

We thank the following contributors/users for their work on this release:

@kylepierce, @davidszotten, @meganburger, @erikcw, @JPFrancoia, @zacharycarter, @DavideBossoli88, @c-line, @anand086, @jasadams, @mrtns, @schot, @koiker, @flaviomax, @bryanyang0528, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!

AWS Data Wrangler 1.6.3

12 Jul 14:40

New Functionalities

  • Add wr.catalog.get_partitions(). #305

Enhancements

  • Improved Decimal casting.

Bug Fix

  • Fix support for boto3 >= 1.14.18. 🐞 #315

Docs

  • Add Spark Table Interoperability tutorial.
  • General small updates.

Thanks

We thank the following contributors/users for their work on this release:

@jasadams, @bryanyang0528, @qemtek, @igorborgest.


P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!

AWS Data Wrangler 1.6.2

01 Jul 19:39

Enhancements

  • Columns are now cast before appending to an existing table only when necessary (wr.s3.to_parquet()).
  • Add a retry mechanism for InternalError on S3 object deletion.
  • Add handling of immutable NumPy arrays (flags.writeable == False).
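S3 can occasionally answer with a transient `InternalError`, and retrying with a short backoff is the usual remedy. A minimal, hypothetical sketch of such a retry loop follows; `TransientError`, `delete_with_retry` and the backoff values are illustrative assumptions, not awswrangler's code or a boto3 API.

```python
import time
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical stand-in for a retryable S3 'InternalError' response."""


def delete_with_retry(delete: Callable[[], None], attempts: int = 3, base_delay: float = 0.1) -> None:
    """Call `delete`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            delete()
            return
        except TransientError:
            if attempt == attempts:
                raise  # Give up and surface the error after the final attempt.
            # Sleep base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: a fake delete that fails twice, then succeeds on the third call.
calls: List[int] = []


def flaky_delete() -> None:
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("InternalError")


delete_with_retry(flaky_delete, base_delay=0.01)
print(len(calls))  # 3
```

Exponential backoff spreads retries out so that a briefly overloaded endpoint is not hit again immediately.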

P.S. Lambda Layer's zip file and Glue's wheel/egg files are available below. Just upload them and run!

P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no support for Glue PySpark for now (only Glue Python Shell).

AWS Data Wrangler 1.6.1

26 Jun 02:55

Enhancements

  • Support casting any column type to string using the dtype argument of wr.s3.to_parquet()

Bug Fix

  • Fix general bugs related to the Athena cache. 🐞

Docs

  • General small updates.

P.S. Lambda Layer's zip file and Glue's wheel/egg files are available below. Just upload them and run!

P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no support for Glue PySpark for now (only Glue Python Shell).