Releases: aws/aws-sdk-pandas
AWS Data Wrangler 1.9.3
Bug Fix
- Fix bug for `wr.s3.read_parquet()` with timezone offset. #385
Thanks
We thank the following contributors/users for their work on this release:
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.9.2
Bug Fix
Thanks
We thank the following contributors/users for their work on this release:
@tasq-inc, @chrisrana, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.9.1
Enhancements
- Significant Amazon S3 I/O speed up for big files #377
- Create Parquet Datasets with columns with CamelCase names #380
Bug Fix
- Read Parquet error for some files created by DMS #376
Docs
- A few updates.
Thanks
We thank the following contributors/users for their work on this release:
@jarretg, @chrisrana, @vikramshitole, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.9.0
Breaking changes
- Global configuration `s3fs_block_size` was replaced by `s3_block_size`. #370
New Functionalities
- Automatic recovery of Pandas indexes from Parquet files. #366
- Automatic recovery of Pandas time zones from Parquet files. #366
- Optional schema evolution disabling through the new `schema_evolution` argument. #353
Enhancements
- `s3fs` dependency was replaced by builtin code. #370
- Significant Amazon S3 I/O speed up for high latency environments (e.g. local, on-premises). #370
Bug Fix
Docs
- A few updates.
Thanks
We thank the following contributors/users for their work on this release:
@isrsal, @bppont, @weishao-aws, @alexifm, @Digma, @samcon, @TerrellV, @msantino, @alvaropc, @luigift, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload it and run!
AWS Data Wrangler 1.8.1
Bug Fix
- Fix NaN values handling for `wr.athena.read_sql_*()`. #351
Docs
- Instructions for installation in AWS Glue PySpark Jobs. #46
Thanks
We thank the following contributors/users for their work on this release:
@czagoni, @josecw, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload it and run!
AWS Data Wrangler 1.8.0
New Functionalities
- `wr.s3.to_parquet()` now has a `max_rows_by_file` argument. #283
- Support for Unix path pattern matching (`*`, `?`, `[seq]`, `[!seq]`) for any list/read/delete/copy function on S3. #322
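These are the standard Unix shell wildcard forms. As a rough sketch of their semantics (illustrated locally with Python's stdlib `fnmatch` against hypothetical key names, not an actual S3 call):

```python
import fnmatch

# Hypothetical S3 object keys used only to illustrate the patterns.
keys = [
    "data/sales_2020.parquet",
    "data/sales_2021.parquet",
    "data/sales_draft.parquet",
]

# '*' matches any run of characters
assert fnmatch.filter(keys, "data/sales_*.parquet") == keys

# '?' matches exactly one character
assert fnmatch.filter(keys, "data/sales_202?.parquet") == keys[:2]

# '[seq]' matches one character from the set
assert fnmatch.filter(keys, "data/sales_202[01].parquet") == keys[:2]

# '[!seq]' matches one character NOT in the set
assert fnmatch.filter(keys, "data/sales_202[!0].parquet") == ["data/sales_2021.parquet"]
```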
Enhancements
- Mypy applied with strict mode.
Bug Fix
- Fix unnecessary table versioning (Glue Catalog) creation for `wr.s3.to_parquet()` during appends. #342
- Lack of sanitisation in index names for `wr.s3.to_parquet/csv()`. #343
Docs
- New Who uses AWS Data Wrangler? section!!!
Thanks
We thank the following contributors/users for their work on this release:
@Thiago-Dantas, @andre-marcos-perez, @ericct, @marcelo-vilela, @edvorkin, @nicholas-miles, @chrispruitt, @rparthas, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload it and run!
AWS Data Wrangler 1.7.0
Breaking changes
- Partitioned Parquet reading now takes a different approach to push-down filters. For details, check the tutorial.
New Functionalities
- Global configuration module - TUTORIAL
- Concurrently partitions write - TUTORIAL
- Flexible Partitions Filter (PUSH-DOWN) - TUTORIAL
- Add Athena query metadata to Pandas DataFrames returned by `wr.athena.read_sql_*()` - TUTORIAL #331
- `wr.athena.describe_table()` #329
- `wr.athena.show_create_table()` #334
- Add `path_ignore_suffix` argument to all read functions. #326
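The new partition push-down filter is a plain Python callable: it receives a dict of partition values (all strings) and returns `True` to keep that partition. A minimal local sketch of the predicate's semantics, evaluated against hypothetical partition dicts rather than a real `wr.s3.read_parquet(..., dataset=True, partition_filter=...)` call:

```python
# The filter sees each partition's values as strings and decides
# whether that partition should be read.
partition_filter = lambda x: x["year"] == "2020" and x["month"] in {"01", "02"}

# Hypothetical partitions of a dataset partitioned by year/month.
partitions = [
    {"year": "2020", "month": "01"},
    {"year": "2020", "month": "03"},
    {"year": "2019", "month": "01"},
]
kept = [p for p in partitions if partition_filter(p)]
assert kept == [{"year": "2020", "month": "01"}]
```

Because the values arrive as strings, comparisons like `x["year"] == 2020` (an int) would silently keep nothing.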
Enhancements
- Support for PyArrow 1.0.0. #337
- Support for Pandas 1.1.0.
- Support writing encrypted Redshift copy manifest to S3. #327
- `wr.athena.read_sql_*()` now accepts empty results. #299
- Allow `connect_args` to be passed when creating an SQL engine from a Glue connection. #309
- Add `skip_header_line_count` argument to `wr.catalog.create_csv_table()`. #338
Bug Fix
- Add missing type annotations and fix types in docstrings. #321
- KeyError: 'StatementType' with Athena using `max_cache_seconds`. #323
- `wr.s3.read_csv()` slow with `chunksize`. #324
- `wr.s3.read_csv()` with `chunksize` does not forward the `pandas_kwargs` `encoding`. #330
- Ensure DataFrame mutability for `wr.athena.read_sql_*()` with `ctas_approach=True`. #335
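The two `chunksize` fixes above build on pandas' chunked-reading behavior: with `chunksize` set, `read_csv` yields an iterator of DataFrames, and every keyword (such as `encoding`) must be forwarded to each underlying read, which is what #330 corrected for `wr.s3.read_csv()`. A small pandas-only illustration of that iterator shape:

```python
import io
import pandas as pd

csv_text = "a,b\n1,x\n2,y\n3,z\n"

# chunksize turns the result into an iterator of DataFrames.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=2))
assert len(chunks) == 2           # 3 rows split into chunks of 2 and 1
assert chunks[0].shape == (2, 2)
assert chunks[1].shape == (1, 2)
```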
Docs
- Several small updates.
Thanks
We thank the following contributors/users for their work on this release:
@kylepierce, @davidszotten, @meganburger, @erikcw, @JPFrancoia, @zacharycarter, @DavideBossoli88, @c-line, @anand086, @jasadams, @mrtns, @schot, @koiker, @flaviomax, @bryanyang0528, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload it and run!
AWS Data Wrangler 1.6.3
New Functionalities
- Add `wr.catalog.get_partitions()`. #305
Enhancements
- Improved Decimal casting.
Bug Fix
- Fix support for boto3 >= 1.14.18. 🐞 #315
Docs
- Add Spark Table Interoperability tutorial.
- General small updates.
Thanks
We thank the following contributors/users for their work on this release:
@jasadams, @bryanyang0528, @qemtek, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload it and run!
AWS Data Wrangler 1.6.2
Enhancements
- Now casting columns before append on an existing table only when necessary (`wr.s3.to_parquet()`).
- Add retry mechanism for InternalError on S3 object deletion.
- Add handling of immutable NumPy arrays (`flags.writeable == False`).
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload it and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no support for Glue PySpark for now (only Glue Python Shell).
AWS Data Wrangler 1.6.1
Enhancements
- Casting support for any column type to string using the `dtype` argument on `wr.s3.to_parquet()`.
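The `dtype` argument maps column names to the Athena/Glue type to cast to on write. The casting effect itself is ordinary column-wise string conversion, sketched here with plain pandas (`astype`) rather than an actual S3 write:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "price": [9.99, 5.0]})

# wr.s3.to_parquet(df, path, dtype={"id": "string"}) would cast the
# column during the write; the equivalent local cast with pandas:
casted = df.astype({"id": str, "price": str})
assert casted["id"].tolist() == ["1", "2"]
assert casted["price"].tolist() == ["9.99", "5.0"]
```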
Bug Fix
- General bugs related to Athena Cache. 🐞
Docs
- General small updates.
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload it and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no support for Glue PySpark for now (only Glue Python Shell).