Preparing release 2.11.0 (#885)
* Adding Data Api tutorial and fixing docs

* [skip ci] - Minor - Incorrect bump

* CB Tests 2.11.0

* Minor - Clarifying Glue Spark docs
jaidisido authored Sep 1, 2021
1 parent 11e58a8 commit e216d53
Showing 24 changed files with 198 additions and 87 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2.10.0
current_version = 2.11.0
commit = False
tag = False
tag_name = {new_version}
6 changes: 3 additions & 3 deletions CONTRIBUTING_COMMON_ERRORS.md
@@ -13,9 +13,9 @@ Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in ./.venv/lib/python3.7/site-
Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed.
Installing collected packages: awswrangler, python-Levenshtein
Attempting uninstall: awswrangler
Found existing installation: awswrangler 2.10.0
Uninstalling awswrangler-2.10.0:
Successfully uninstalled awswrangler-2.10.0
Found existing installation: awswrangler 2.11.0
Uninstalling awswrangler-2.11.0:
Successfully uninstalled awswrangler-2.11.0
Running setup.py develop for awswrangler
Running setup.py install for python-Levenshtein ... error
ERROR: Command errored out with exit status 1:
60 changes: 30 additions & 30 deletions README.md
@@ -8,7 +8,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo

> An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com
[![Release](https://img.shields.io/badge/release-2.10.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Release](https://img.shields.io/badge/release-2.11.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
[![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -23,7 +23,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
| **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️ `pip install pyarrow==2 awswrangler`

Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
@@ -42,7 +42,7 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http

Installation command: `pip install awswrangler`

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs), MWAA):**<br>
➡️`pip install pyarrow==2 awswrangler`

```py3
@@ -96,17 +96,17 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3

## [Read The Docs](https://aws-data-wrangler.readthedocs.io/)

- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/what.html)
- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html)
- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#pypi-pip)
- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#conda)
- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr)
- [From source](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#from-source)
- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/what.html)
- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html)
- [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#pypi-pip)
- [Conda](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#conda)
- [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr)
- [From source](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#from-source)
- [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials)
- [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/001%20-%20Introduction.ipynb)
- [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -136,22 +136,22 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
- [026 - Amazon Timestream](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/026%20-%20Amazon%20Timestream.ipynb)
- [027 - Amazon Timestream 2](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/027%20-%20Amazon%20Timestream%202.ipynb)
- [028 - Amazon DynamoDB](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/028%20-%20DynamoDB.ipynb)
- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html)
- [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#postgresql)
- [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#mysql)
- [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#sqlserver)
- [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-secrets-manager)
- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html)
- [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#postgresql)
- [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#mysql)
- [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#sqlserver)
- [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-secrets-manager)
- [**License**](https://github.com/awslabs/aws-data-wrangler/blob/main/LICENSE.txt)
- [**Contributing**](https://github.com/awslabs/aws-data-wrangler/blob/main/CONTRIBUTING.md)
- [**Legacy Docs** (pre-1.0.0)](https://aws-data-wrangler.readthedocs.io/en/0.3.3/)
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
@@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "2.10.0"
__version__: str = "2.11.0"
__license__: str = "Apache License 2.0"
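
A release bump like this one repeats the same version string across several files. The following is a minimal sketch of a consistency check, assuming the two inlined snippets mirror the post-commit contents of `.bumpversion.cfg` and `awswrangler/__metadata__.py`. The helper names `cfg_version` and `metadata_version` are hypothetical and are not part of the repository.

```python
import configparser
import re

# Post-commit file contents, inlined so the sketch runs without the repository.
BUMPVERSION_CFG = """\
[bumpversion]
current_version = 2.11.0
commit = False
tag = False
tag_name = {new_version}
"""

METADATA_PY = '''\
__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "2.11.0"
__license__: str = "Apache License 2.0"
'''


def cfg_version(cfg_text: str) -> str:
    """Read current_version from a bumpversion-style INI document (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser["bumpversion"]["current_version"]


def metadata_version(source: str) -> str:
    """Extract the string assigned to __version__ in Python source text (hypothetical helper)."""
    match = re.search(r'^__version__(?::\s*\w+)?\s*=\s*"([^"]+)"', source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# True when both files agree on the release version.
versions_match = cfg_version(BUMPVERSION_CFG) == metadata_version(METADATA_PY)
print(versions_match)
```

In a real checkout the two strings would be read from disk instead of inlined; the comparison itself stays the same.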
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
@@ -617,11 +617,11 @@ def read_sql_query(
**Related tutorial:**
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are two approaches to be defined through ctas_approach parameter:**
@@ -669,7 +669,7 @@ def read_sql_query(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
@@ -890,11 +890,11 @@ def read_sql_table(
**Related tutorial:**
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Amazon Athena <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Athena Cache <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
- `Global Configurations <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/021%20-%20Global%20Configurations.html>`_
**There are two approaches to be defined through ctas_approach parameter:**
@@ -939,7 +939,7 @@ def read_sql_table(
/athena.html#Athena.Client.get_query_execution>`_ .
For a practical example check out the
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.10.0/
`related tutorial <https://aws-data-wrangler.readthedocs.io/en/2.11.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
2 changes: 2 additions & 0 deletions awswrangler/data_api/rds.py
@@ -139,6 +139,8 @@ def read_sql_query(sql: str, con: RdsDataApi, database: Optional[str] = None) ->
----------
sql: str
SQL query to run.
con: RdsDataApi
A RdsDataApi connection instance
database: str
Database to run query on - defaults to the database specified by `con`.
2 changes: 2 additions & 0 deletions awswrangler/data_api/redshift.py
@@ -189,6 +189,8 @@ def read_sql_query(sql: str, con: RedshiftDataApi, database: Optional[str] = Non
----------
sql: str
SQL query to run.
con: RedshiftDataApi
A RedshiftDataApi connection instance
database: str
Database to run query on - defaults to the database specified by `con`.
2 changes: 1 addition & 1 deletion awswrangler/s3/_read_parquet.py
@@ -788,7 +788,7 @@ def read_parquet_table(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns : List[str], optional
Names of columns to read from the file(s).
validate_schema:
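
The `partition_filter` docstrings touched in this commit describe a callback that receives the partition values and must return a bool. Below is a minimal sketch of that contract in plain Python, with no S3 access. The partition list and the `select_partitions` helper are hypothetical stand-ins for what the library does internally, and the string comparison follows the docstring's own example, which compares against `"2020"` and `"1"`.

```python
from typing import Callable, Dict, List


def keep_january_2020(partition: Dict[str, str]) -> bool:
    """Return True to read the partition, False to ignore it."""
    # The docstring example compares against strings, so this does too.
    return partition["year"] == "2020" and partition["month"] == "1"


def select_partitions(
    partitions: List[Dict[str, str]],
    partition_filter: Callable[[Dict[str, str]], bool],
) -> List[Dict[str, str]]:
    """Hypothetical helper: keep only the partitions the callback accepts."""
    return [p for p in partitions if partition_filter(p)]


# Hypothetical partition values, standing in for values parsed from S3 paths.
partitions = [
    {"year": "2019", "month": "12"},
    {"year": "2020", "month": "1"},
    {"year": "2020", "month": "2"},
]

selected = select_partitions(partitions, keep_january_2020)
print(selected)
```

The named function is equivalent to the docstring's `lambda x: True if ... else False`; returning the boolean expression directly avoids the redundant conditional.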
6 changes: 3 additions & 3 deletions awswrangler/s3/_read_text.py
@@ -241,7 +241,7 @@ def read_csv(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs :
KEYWORD arguments forwarded to pandas.read_csv(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
@@ -389,7 +389,7 @@ def read_fwf(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_fwf(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
@@ -541,7 +541,7 @@ def read_json(
This function MUST return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
pandas_kwargs:
KEYWORD arguments forwarded to pandas.read_json(). You can NOT pass `pandas_kwargs` explicit, just add valid
Pandas arguments in the function call and Wrangler will accept it.
6 changes: 3 additions & 3 deletions awswrangler/s3/_write_parquet.py
@@ -279,18 +279,18 @@ def to_parquet( # pylint: disable=too-many-arguments,too-many-locals
concurrent_partitioning: bool
If True will increase the parallelism level during the partitions writing. It will decrease the
writing time and increase the memory usage.
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
mode: str, optional
``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
For details check the related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
catalog_versioning : bool
If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
schema_evolution : bool
If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/014%20-%20Schema%20Evolution.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/014%20-%20Schema%20Evolution.html
database : str, optional
Glue/Athena catalog: Database name.
table : str, optional
6 changes: 3 additions & 3 deletions awswrangler/s3/_write_text.py
@@ -174,18 +174,18 @@ def to_csv( # pylint: disable=too-many-arguments,too-many-locals,too-many-state
concurrent_partitioning: bool
If True will increase the parallelism level during the partitions writing. It will decrease the
writing time and increase the memory usage.
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html
mode : str, optional
``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True.
For details check the related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet
catalog_versioning : bool
If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it.
schema_evolution : bool
If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/014%20-%20Schema%20Evolution.html
https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/014%20-%20Schema%20Evolution.html
database : str, optional
Glue/Athena catalog: Database name.
table : str, optional
2 changes: 2 additions & 0 deletions docs/source/api.rst
@@ -183,6 +183,7 @@ Data API Redshift
.. autosummary::
:toctree: stubs

RedshiftDataApi
connect
read_sql_query

@@ -194,6 +195,7 @@ Data API RDS
.. autosummary::
:toctree: stubs

RdsDataApi
connect
read_sql_query
