Releases: aws/aws-sdk-pandas
AWS Data Wrangler 1.0.2
New Functionalities
Enhancements
- Add validate_schema to wr.s3.to_parquet() #167
Bug Fix
- Add CSV Dataset utilities to wr.s3.to_csv #170
- Fix CSV decompression #175
- Fix missing boto3_session #172
Thanks
We thank the following contributors/users for their work on this release:
@vfrank66, @JPFrancoia, @jewelltp, @hjuhel-cdpq, @jar-no1, @rmlove, @josecw, @igorborgest.
P.S. Lambda Layer's bundle and Glue's wheel/egg are available below. Just upload it and run!
P.P.S. AWS Data Wrangler counts on compiled dependencies (C/C++) so there is no support for Glue PySpark by now (Only Glue Python Shell).
AWS Data Wrangler 1.0.1
New Functionalities
- categories arg in s3.read_parquet, db.unload_redshift, athena.read_sql_query [#160]
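As a rough illustration of what a categories argument is for, the sketch below uses plain pandas only (no AWS calls) to show a column held as a categorical, which typically uses less memory for low-cardinality values. The data and column names are made up, and the assumption that the readers above return the listed columns as pandas categoricals is inferred from the argument name, not confirmed by these notes.

```python
import pandas as pd

# Hypothetical data: a low-cardinality string column.
df = pd.DataFrame(
    {"id": range(6), "country": ["BR", "US", "BR", "US", "BR", "CA"]}
)

# Convert the column to a categorical. This is presumably the shape a reader
# honoring categories=["country"] would hand back (assumption).
df["country"] = df["country"].astype("category")

print(df.dtypes)
print(list(df["country"].cat.categories))
```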
Enhancements
- Athena's table and columns names sanitisation revisited [#161]
Bug Fix
- Add support for Athena queries on workgroups without encryption [#159]
Thanks
We thank the following contributors/users for their work on this release:
@vfrank66, @nitin-kakkar, @sapientderek, @nagomiso, @igorborgest.
AWS Data Wrangler 1.0.0
1.0.0 🎉
Check out the brand new documentation page!
AWS Data Wrangler 0.3.2
New Functionalities
- Add header and filename arguments to Pandas.to_csv()
Enhancements
- Pandas.read_parquet() will return Int64 for integers with null values mixed #132
- Pandas.to_redshift() now is able to cast Int64 for integers with null values mixed #132
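The Int64 behavior mentioned above refers to the pandas nullable integer dtype. A minimal sketch in plain pandas (not the Wrangler API itself) of why it matters: with the default NumPy-backed dtype, one missing value turns an integer column into floats, while Int64 keeps the integers and marks the gap as missing.

```python
import pandas as pd

# With the default NumPy dtype, a missing value forces integers to float64.
as_float = pd.Series([1, 2, None])

# The nullable extension dtype keeps integers and stores the gap as <NA>.
as_int64 = pd.Series([1, 2, None], dtype="Int64")

print(as_float.dtype)
print(as_int64.dtype)
print(as_int64)
```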
Bug Fixes
- s3.head_object_with_retry() public again #133
P.S. Have you never used Layers? Check the step-by-step guide.
AWS Data Wrangler 0.3.1
New Functionalities
- Add pandas.read_fwf(), read_fwf_list(), read_fwf_prefix() for fixed-width files #131
- Support for compressed files for pandas.read_csv(), read_csv_list() and read_csv_prefix() #129
- Support for consistent view on emr.create_cluster() #130
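For readers unfamiliar with fixed-width files, here is a small sketch using pandas.read_fwf directly on an in-memory string. It assumes, without confirmation from these notes, that the Wrangler read_fwf helpers fetch the objects from S3 and parse them in a similar way. The sample content and column positions are hypothetical.

```python
import io

import pandas as pd

# Hypothetical fixed-width content: columns are aligned by character
# position, not separated by a delimiter.
raw = (
    "id  name   amount\n"
    "1   alice    10.5\n"
    "2   bob       7.0\n"
)

# colspecs gives the half-open [start, end) character range of each column.
df = pd.read_fwf(io.StringIO(raw), colspecs=[(0, 4), (4, 11), (11, 17)])

print(df)
```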
Enhancements
- Support for Python 3.8
- Bumping Pandas version to 1.0.1
- Bumping PyArrow version to 0.16.0
Docs
- New documentation page
AWS Data Wrangler 0.3.0
Enhancements
- Support for Pandas 1.0.0
- Support for all pandas.read_csv() arguments
- Support for custom VARCHAR length for Aurora and Redshift
AWS Data Wrangler 0.2.6
Enhancements
- Smaller Lambda layers #113
- Support for categorical partitions for Pandas.to_parquet() #115
- Support for RangeIndex for Pandas.to_parquet() #111
- Add columns parameter for Pandas.to_csv() #110
- Add columns parameter for Pandas.to_aurora() #110
- Improving NaN handling during Pandas.read_sql_athena()
- Small performance improvements
Bugfixes
- Fixing bug to unload null values from Aurora #114
AWS Data Wrangler 0.2.5
Enhancements
- Pandas.to_aurora() improvements
- Pandas.to_redshift() improvements
- Pandas.read_sql_athena(ctas_approach=True) improvements
- Pandas.read_parquet() improvements
AWS Data Wrangler 0.2.1
Enhancements
- Support for empty dataframe for Pandas.read_sql_athena(ctas_approach=True)
- Cleaning temp S3 files for Pandas.read_sql_athena(ctas_approach=True)
- Inverting file format and file compression extensions (key suffix) (Hadoop/Spark/Hive compatibility)
- Aurora ingestion revisited
- Bumping dependencies version
- Add Pandas.read_csv_prefix()
- Improve Athena._normalize_name() rules
- Improving autocomplete support
- Simplifying everything on Sagemaker
- Adding Glue.get_connection()
- Adapt read_sql_athena(ctas_approach=True) for eventual consistency caveats.
Bugfixes
- Fixing bug to fetch Glue tables comments
- Fixing Spark for default Session
Docs
- Add athena_nested.ipynb tutorial
- Add catalog_and_metadata.ipynb tutorial
AWS Data Wrangler 0.2.0
Enhancements
- Add description, parameters and column comments as arguments to all methods that create Glue tables (METADATA).
- Add several methods to explore the Glue Catalog.