Preparing release 2.9.0 (#751)
* Preparing release 2.9.0
* 3.6 compatibility fixes
kukushking authored Jun 17, 2021
1 parent c685abb commit 89b459d
Showing 24 changed files with 112 additions and 91 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2.8.0
current_version = 2.9.0
commit = False
tag = False
tag_name = {new_version}
6 changes: 3 additions & 3 deletions CONTRIBUTING_COMMON_ERRORS.md
@@ -13,9 +13,9 @@ Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in ./.venv/lib/python3.7/site-
Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
Installing collected packages: awswrangler, python-Levenshtein
Attempting uninstall: awswrangler
Found existing installation: awswrangler 2.8.0
Uninstalling awswrangler-2.8.0:
Successfully uninstalled awswrangler-2.8.0
Found existing installation: awswrangler 2.9.0
Uninstalling awswrangler-2.9.0:
Successfully uninstalled awswrangler-2.9.0
Running setup.py develop for awswrangler
Running setup.py install for python-Levenshtein ... error
ERROR: Command errored out with exit status 1:
60 changes: 30 additions & 30 deletions README.md
@@ -8,7 +8,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo

> An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com
[![Release](https://img.shields.io/badge/release-2.8.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Release](https://img.shields.io/badge/release-2.9.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -24,7 +24,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
| **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️ `pip install pyarrow==2 awswrangler`

Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
@@ -42,7 +42,7 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http

Installation command: `pip install awswrangler`

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️`pip install pyarrow==2 awswrangler`

```py3
@@ -96,17 +96,17 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3

## [Read The Docs](https://aws-data-wrangler.readthedocs.io/)

- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.8.0/what.html)
- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html)
- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#pypi-pip)
- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#conda)
- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#emr)
- [From source](https://aws-data-wrangler.readthedocs.io/en/2.8.0/install.html#from-source)
- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.9.0/what.html)
- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html)
- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#pypi-pip)
- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#conda)
- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#emr)
- [From source](https://aws-data-wrangler.readthedocs.io/en/2.9.0/install.html#from-source)
- [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials)
- [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/001%20-%20Introduction.ipynb)
- [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -136,22 +136,22 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
- [026 - Amazon Timestream](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/026%20-%20Amazon%20Timestream.ipynb)
- [027 - Amazon Timestream 2](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/027%20-%20Amazon%20Timestream%202.ipynb)
- [028 - Amazon DynamoDB](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/028%20-%20DynamoDB.ipynb)
- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html)
- [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#postgresql)
- [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#mysql)
- [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#sqlserver)
- [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.8.0/api.html#aws-secrets-manager)
- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html)
- [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#postgresql)
- [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#mysql)
- [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#sqlserver)
- [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.9.0/api.html#aws-secrets-manager)
- [**License**](https://github.com/awslabs/aws-data-wrangler/blob/main/LICENSE.txt)
- [**Contributing**](https://github.com/awslabs/aws-data-wrangler/blob/main/CONTRIBUTING.md)
- [**Legacy Docs** (pre-1.0.0)](https://aws-data-wrangler.readthedocs.io/en/0.3.3/)
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "2.8.0"
__version__: str = "2.9.0"
__license__: str = "Apache License 2.0"
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -605,11 +605,11 @@ def read_sql_query(
**Related tutorial:**
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are two approaches to be defined through ctas_approach parameter:**
@@ -657,7 +657,7 @@ def read_sql_query(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
@@ -872,11 +872,11 @@ def read_sql_table(
**Related tutorial:**
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are two approaches to be defined through ctas_approach parameter:**
@@ -921,7 +921,7 @@ def read_sql_table(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.8.0/
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.9.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
2 changes: 1 addition & 1 deletion awswrangler/s3/_read_parquet.py
@@ -788,7 +788,7 @@ def read_parquet_table(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns : List[str], optional
Names of columns to read from the file(s).
validate_schema:
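
The `partition_filter` callback described in the docstring above can be sketched in plain Python without touching S3. This is only an illustration of the callback contract stated in the docstring (a function that receives a partition as a dict of strings and returns a bool); the function name `keep_january_2020` and the sample partition values are assumptions made up for this example, not part of the library.

```python
from typing import Dict, List


def keep_january_2020(partition: Dict[str, str]) -> bool:
    """Return True to read the partition, False to ignore it."""
    # Values are compared as strings, as in the docstring example.
    # The comparison already yields a bool, so no `True if ... else False` is needed.
    return partition["year"] == "2020" and partition["month"] == "1"


# Hypothetical partition values, invented for this illustration.
partitions: List[Dict[str, str]] = [
    {"year": "2019", "month": "12"},
    {"year": "2020", "month": "1"},
    {"year": "2020", "month": "2"},
]

# Keep only the partitions the callback accepts.
selected = [p for p in partitions if keep_january_2020(p)]
print(selected)
```

Because the comparison is on strings, a partition stored as `"01"` would not match `"1"` in this sketch.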
6 changes: 3 additions & 3 deletions awswrangler/s3/_read_text.py
@@ -241,7 +241,7 @@ def read_csv(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs :
KEYWORD arguments forwarded to pandas.read_csv(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
@@ -389,7 +389,7 @@ def read_fwf(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_fwf(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
@@ -541,7 +541,7 @@ def read_json(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_json(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
6 changes: 3 additions & 3 deletions awswrangler/s3/_write_parquet.py
@@ -298,18 +298,18 @@ def to_parquet( # pylint: disable=too-many-arguments,too-many-locals
concurrent_partitioning: bool
If True will increase the parallelism level during the partitions writing. It will decrease the
writing time and increase the memory usage.
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
mode: str, optional
``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
For details check the related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.8.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
https://aws-data-wrangler.readthedocs.io/en/2.9.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
catalog_versioning : bool
If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
schema_evolution : bool
If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/014%20-%20Schema%20Evolution.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/014%20-%20Schema%20Evolution.html
database : str, optional
Glue/Athena catalog: Database name.
table : str, optional
4 changes: 2 additions & 2 deletions awswrangler/s3/_write_text.py
@@ -175,11 +175,11 @@ def to_csv( # pylint: disable=too-many-arguments,too-many-locals,too-many-state
concurrent_partitioning: bool
If True will increase the parallelism level during the partitions writing. It will decrease the
writing time and increase the memory usage.
https://aws-data-wrangler.readthedocs.io/en/2.8.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
https://aws-data-wrangler.readthedocs.io/en/2.9.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
mode : str, optional
``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
For details check the related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.8.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
https://aws-data-wrangler.readthedocs.io/en/2.9.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
catalog_versioning : bool
If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
database : str, optional
2 changes: 1 addition & 1 deletion building/lambda/Dockerfile
@@ -19,7 +19,7 @@ RUN pip3 install -r /root/requirements.txt

ADD requirements-dev.txt /root/
# Removing "-e ." installation
RUN head -n -2 /root/requirements-dev.txt > /root/temp.txt
RUN head -n -3 /root/requirements-dev.txt > /root/temp.txt
RUN mv /root/temp.txt /root/requirements-dev.txt
RUN pip3 install -r /root/requirements-dev.txt
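
For readers unfamiliar with the negative count: GNU `head -n -N` prints everything except the last N lines, so the change from `-2` to `-3` drops one more trailing line of `requirements-dev.txt`. A rough Python equivalent is sketched below; the helper name `drop_last_lines` and the sample text are hypothetical, since the actual contents of `requirements-dev.txt` are not shown in this diff.

```python
from typing import List


def drop_last_lines(text: str, count: int) -> str:
    """Return `text` without its last `count` lines, like GNU `head -n -count`."""
    lines: List[str] = text.splitlines(keepends=True)
    # A slice of [:-0] would be empty, so a count of zero keeps every line.
    kept = lines[:-count] if count > 0 else lines
    return "".join(kept)


# Placeholder contents, invented for this illustration.
sample = "pkg-a\npkg-b\ntrailing-1\ntrailing-2\ntrailing-3\n"

print(drop_last_lines(sample, 2))  # behaviour before the change
print(drop_last_lines(sample, 3))  # behaviour after the change
```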

6 changes: 3 additions & 3 deletions docs/source/install.rst
@@ -62,7 +62,7 @@ Go to your Glue PySpark job and create a new *Job parameters* key/value:

To install a specific version, set the value for above Job parameter as follows:

* Value: ``pyarrow==2,awswrangler==2.8.0``
* Value: ``pyarrow==2,awswrangler==2.9.0``

.. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why a previous installation of pyarrow 2 is required.

@@ -95,7 +95,7 @@ Here is an example of how to reference the Lambda layer in your CDK app:
"wrangler-bucket",
bucket_arn="arn:aws:s3:::aws-data-wrangler-public-artifacts",
),
key="releases/2.8.0/awswrangler-layer-2.8.0-py3.8.zip",
key="releases/2.9.0/awswrangler-layer-2.9.0-py3.8.zip",
),
layer_version_name="aws-data-wrangler"
)
@@ -190,7 +190,7 @@ complement Big Data pipelines.
sudo pip install pyarrow==2 awswrangler
.. note:: Make sure to freeze the Wrangler version in the bootstrap for productive
environments (e.g. awswrangler==2.8.0)
environments (e.g. awswrangler==2.9.0)
.. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why a previous installation of pyarrow 2 is required.