diff --git a/.bumpversion.cfg b/.bumpversion.cfg index 5636ff7d4..b60afc641 100644 --- a/.bumpversion.cfg +++ b/.bumpversion.cfg @@ -1,5 +1,5 @@ [bumpversion] -current_version = 2.10.0 +current_version = 2.11.0 commit = False tag = False tag_name = {new_version} diff --git a/CONTRIBUTING_COMMON_ERRORS.md b/CONTRIBUTING_COMMON_ERRORS.md index 92f8aa94d..5d820d28e 100644 --- a/CONTRIBUTING_COMMON_ERRORS.md +++ b/CONTRIBUTING_COMMON_ERRORS.md @@ -13,9 +13,9 @@ Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in ./.venv/lib/python3.7/site- Using legacy 'setup.py install' for python-Levenshtein, since package 'wheel' is not installed. Installing collected packages: awswrangler, python-Levenshtein Attempting uninstall: awswrangler - Found existing installation: awswrangler 2.10.0 - Uninstalling awswrangler-2.10.0: - Successfully uninstalled awswrangler-2.10.0 + Found existing installation: awswrangler 2.11.0 + Uninstalling awswrangler-2.11.0: + Successfully uninstalled awswrangler-2.11.0 Running setup.py develop for awswrangler Running setup.py install for python-Levenshtein ... error ERROR: Command errored out with exit status 1: diff --git a/README.md b/README.md index 22f86e0a0..01b595472 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo > An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com -[![Release](https://img.shields.io/badge/release-2.10.0-brightgreen.svg)](https://pypi.org/project/awswrangler/) +[![Release](https://img.shields.io/badge/release-2.11.0-brightgreen.svg)](https://pypi.org/project/awswrangler/) [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) @@ -23,7 +23,7 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo | **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` | | **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` | -> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs), MWAA):**
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs), MWAA):**
➡️ `pip install pyarrow==2 awswrangler` Powered By [](https://arrow.apache.org/powered_by/) @@ -42,7 +42,7 @@ Powered By [](http Installation command: `pip install awswrangler` -> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs), MWAA):**
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs), MWAA):**
 ➡️`pip install pyarrow==2 awswrangler`
 
 ```py3
@@ -96,17 +96,17 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 
 ## [Read The Docs](https://aws-data-wrangler.readthedocs.io/)
 
-- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/what.html)
-- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html)
-  - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#pypi-pip)
-  - [Conda](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#conda)
-  - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-lambda-layer)
-  - [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-python-shell-jobs)
-  - [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs)
-  - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook)
-  - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook-lifecycle)
-  - [EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr)
-  - [From source](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#from-source)
+- [**What is AWS Data Wrangler?**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/what.html)
+- [**Install**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html)
+  - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#pypi-pip)
+  - [Conda](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#conda)
+  - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-lambda-layer)
+  - [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-python-shell-jobs)
+  - [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs)
+  - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook)
+  - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook-lifecycle)
+  - [EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr)
+  - [From source](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#from-source)
 - [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/main/tutorials)
   - [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/001%20-%20Introduction.ipynb)
   - [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -136,22 +136,22 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
   - [026 - Amazon Timestream](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/026%20-%20Amazon%20Timestream.ipynb)
   - [027 - Amazon Timestream 2](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/027%20-%20Amazon%20Timestream%202.ipynb)
   - [028 - Amazon DynamoDB](https://github.com/awslabs/aws-data-wrangler/blob/main/tutorials/028%20-%20DynamoDB.ipynb)
-- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html)
-  - [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-s3)
-  - [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-glue-catalog)
-  - [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-athena)
-  - [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-redshift)
-  - [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#postgresql)
-  - [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#mysql)
-  - [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#sqlserver)
-  - [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#dynamodb)
-  - [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-timestream)
-  - [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-emr)
-  - [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-cloudwatch-logs)
-  - [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-chime)
-  - [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#amazon-quicksight)
-  - [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-sts)
-  - [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html#aws-secrets-manager)
+- [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html)
+  - [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-s3)
+  - [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-glue-catalog)
+  - [Amazon Athena](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-athena)
+  - [Amazon Redshift](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-redshift)
+  - [PostgreSQL](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#postgresql)
+  - [MySQL](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#mysql)
+  - [SQL Server](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#sqlserver)
+  - [DynamoDB](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#dynamodb)
+  - [Amazon Timestream](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-timestream)
+  - [Amazon EMR](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-emr)
+  - [Amazon CloudWatch Logs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-cloudwatch-logs)
+  - [Amazon Chime](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-chime)
+  - [Amazon QuickSight](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#amazon-quicksight)
+  - [AWS STS](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-sts)
+  - [AWS Secrets Manager](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html#aws-secrets-manager)
 - [**License**](https://github.com/awslabs/aws-data-wrangler/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/awslabs/aws-data-wrangler/blob/main/CONTRIBUTING.md)
 - [**Legacy Docs** (pre-1.0.0)](https://aws-data-wrangler.readthedocs.io/en/0.3.3/)
diff --git a/awswrangler/__metadata__.py b/awswrangler/__metadata__.py
index ec682bbe5..4872e3912 100644
--- a/awswrangler/__metadata__.py
+++ b/awswrangler/__metadata__.py
@@ -7,5 +7,5 @@
 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "2.10.0" +__version__: str = "2.11.0" __license__: str = "Apache License 2.0" diff --git a/awswrangler/athena/_read.py b/awswrangler/athena/_read.py index 1229bba88..cd828ccf8 100644 --- a/awswrangler/athena/_read.py +++ b/awswrangler/athena/_read.py @@ -617,11 +617,11 @@ def read_sql_query( **Related tutorial:** - - `Amazon Athena `_ - - `Athena Cache `_ - - `Global Configurations `_ **There are two approaches to be defined through ctas_approach parameter:** @@ -669,7 +669,7 @@ def read_sql_query( /athena.html#Athena.Client.get_query_execution>`_ . For a practical example check out the - `related tutorial `_! @@ -890,11 +890,11 @@ def read_sql_table( **Related tutorial:** - - `Amazon Athena `_ - - `Athena Cache `_ - - `Global Configurations `_ **There are two approaches to be defined through ctas_approach parameter:** @@ -939,7 +939,7 @@ def read_sql_table( /athena.html#Athena.Client.get_query_execution>`_ . For a practical example check out the - `related tutorial `_! diff --git a/awswrangler/data_api/rds.py b/awswrangler/data_api/rds.py index 71b34be51..e95dc5692 100644 --- a/awswrangler/data_api/rds.py +++ b/awswrangler/data_api/rds.py @@ -139,6 +139,8 @@ def read_sql_query(sql: str, con: RdsDataApi, database: Optional[str] = None) -> ---------- sql: str SQL query to run. + con: RdsDataApi + A RdsDataApi connection instance database: str Database to run query on - defaults to the database specified by `con`. diff --git a/awswrangler/data_api/redshift.py b/awswrangler/data_api/redshift.py index d3947d91d..a6a5cc3a8 100644 --- a/awswrangler/data_api/redshift.py +++ b/awswrangler/data_api/redshift.py @@ -189,6 +189,8 @@ def read_sql_query(sql: str, con: RedshiftDataApi, database: Optional[str] = Non ---------- sql: str SQL query to run. + con: RedshiftDataApi + A RedshiftDataApi connection instance database: str Database to run query on - defaults to the database specified by `con`. diff --git a/awswrangler/s3/_read_parquet.py b/awswrangler/s3/_read_parquet.py index 660363a52..6b4ba0c54 100644 --- a/awswrangler/s3/_read_parquet.py +++ b/awswrangler/s3/_read_parquet.py @@ -788,7 +788,7 @@ def read_parquet_table( This function MUST return a bool, True to read the partition or False to ignore it. Ignored if `dataset=False`. E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False`` - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html columns : List[str], optional Names of columns to read from the file(s). validate_schema: diff --git a/awswrangler/s3/_read_text.py b/awswrangler/s3/_read_text.py index c6b4e9042..a51dd4ed6 100644 --- a/awswrangler/s3/_read_text.py +++ b/awswrangler/s3/_read_text.py @@ -241,7 +241,7 @@ def read_csv( This function MUST return a bool, True to read the partition or False to ignore it. Ignored if `dataset=False`. E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False`` - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html pandas_kwargs : KEYWORD arguments forwarded to pandas.read_csv(). You can NOT pass `pandas_kwargs` explicit, just add valid Pandas arguments in the function call and Wrangler will accept it. 
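
The `partition_filter` argument touched throughout these docstrings is easiest to grasp with a concrete call. A minimal sketch, assuming a hypothetical bucket `my-bucket` holding a Hive-style partitioned CSV dataset (`year=`/`month=` prefixes):

```py3
import awswrangler as wr

# Partition values reach the callable as strings; return True to read
# a partition, False to skip it. Only year=2020/month=1 survives here.
df = wr.s3.read_csv(
    path="s3://my-bucket/dataset/",
    dataset=True,
    partition_filter=lambda x: x["year"] == "2020" and x["month"] == "1",
)
```

The same callable shape applies to `wr.s3.read_fwf`, `wr.s3.read_json` and `wr.s3.read_parquet_table`; tutorial 023 (patched later in this diff) covers it end to end.
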
@@ -389,7 +389,7 @@ def read_fwf( This function MUST return a bool, True to read the partition or False to ignore it. Ignored if `dataset=False`. E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False`` - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html pandas_kwargs: KEYWORD arguments forwarded to pandas.read_fwf(). You can NOT pass `pandas_kwargs` explicit, just add valid Pandas arguments in the function call and Wrangler will accept it. @@ -541,7 +541,7 @@ def read_json( This function MUST return a bool, True to read the partition or False to ignore it. Ignored if `dataset=False`. E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False`` - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html pandas_kwargs: KEYWORD arguments forwarded to pandas.read_json(). You can NOT pass `pandas_kwargs` explicit, just add valid Pandas arguments in the function call and Wrangler will accept it. diff --git a/awswrangler/s3/_write_parquet.py b/awswrangler/s3/_write_parquet.py index 3f61e582b..0eee9961d 100644 --- a/awswrangler/s3/_write_parquet.py +++ b/awswrangler/s3/_write_parquet.py @@ -279,18 +279,18 @@ def to_parquet( # pylint: disable=too-many-arguments,too-many-locals concurrent_partitioning: bool If True will increase the parallelism level during the partitions writing. It will decrease the writing time and increase the memory usage. - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html mode: str, optional ``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True. For details check the related tutorial: - https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet + https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet catalog_versioning : bool If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it. schema_evolution : bool If True allows schema evolution (new or missing columns), otherwise a exception will be raised. (Only considered if dataset=True and mode in ("append", "overwrite_partitions")) Related tutorial: - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/014%20-%20Schema%20Evolution.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/014%20-%20Schema%20Evolution.html database : str, optional Glue/Athena catalog: Database name. table : str, optional diff --git a/awswrangler/s3/_write_text.py b/awswrangler/s3/_write_text.py index 3d352ca06..bcfef1855 100644 --- a/awswrangler/s3/_write_text.py +++ b/awswrangler/s3/_write_text.py @@ -174,18 +174,18 @@ def to_csv( # pylint: disable=too-many-arguments,too-many-locals,too-many-state concurrent_partitioning: bool If True will increase the parallelism level during the partitions writing. It will decrease the writing time and increase the memory usage. 
- https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/022%20-%20Writing%20Partitions%20Concurrently.html mode : str, optional ``append`` (Default), ``overwrite``, ``overwrite_partitions``. Only takes effect if dataset=True. For details check the related tutorial: - https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet + https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet catalog_versioning : bool If True and `mode="overwrite"`, creates an archived version of the table catalog before updating it. schema_evolution : bool If True allows schema evolution (new or missing columns), otherwise a exception will be raised. (Only considered if dataset=True and mode in ("append", "overwrite_partitions")) Related tutorial: - https://aws-data-wrangler.readthedocs.io/en/2.10.0/tutorials/014%20-%20Schema%20Evolution.html + https://aws-data-wrangler.readthedocs.io/en/2.11.0/tutorials/014%20-%20Schema%20Evolution.html database : str, optional Glue/Athena catalog: Database name. table : str, optional diff --git a/docs/source/api.rst b/docs/source/api.rst index 9058fefea..4233f72cd 100644 --- a/docs/source/api.rst +++ b/docs/source/api.rst @@ -183,6 +183,7 @@ Data API Redshift .. autosummary:: :toctree: stubs + RedshiftDataApi connect read_sql_query @@ -194,6 +195,7 @@ Data API RDS .. autosummary:: :toctree: stubs + RdsDataApi connect read_sql_query diff --git a/docs/source/install.rst b/docs/source/install.rst index d846a6c9b..d71bda417 100644 --- a/docs/source/install.rst +++ b/docs/source/install.rst @@ -62,7 +62,7 @@ Go to your Glue PySpark job and create a new *Job parameters* key/value: To install a specific version, set the value for above Job parameter as follows: -* Value: ``pyarrow==2,awswrangler==2.10.0`` +* Value: ``cython==0.29.21,pg8000==1.21.0,pyarrow==2,pandas==1.3.0,awswrangler==2.11.0`` .. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why a previous installation of pyarrow 2 is required. @@ -95,7 +95,7 @@ Here is an example of how to reference the Lambda layer in your CDK app: "wrangler-bucket", bucket_arn="arn:aws:s3:::aws-data-wrangler-public-artifacts", ), - key="releases/2.10.0/awswrangler-layer-2.10.0-py3.8.zip", + key="releases/2.11.0/awswrangler-layer-2.11.0-py3.8.zip", ), layer_version_name="aws-data-wrangler" ) @@ -190,7 +190,7 @@ complement Big Data pipelines. sudo pip install pyarrow==2 awswrangler .. note:: Make sure to freeze the Wrangler version in the bootstrap for productive - environments (e.g. awswrangler==2.10.0) + environments (e.g. awswrangler==2.11.0) .. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why a previous installation of pyarrow 2 is required. diff --git a/docs/source/what.rst b/docs/source/what.rst index 12e6995bd..d1b741f96 100644 --- a/docs/source/what.rst +++ b/docs/source/what.rst @@ -8,4 +8,4 @@ SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL Built on top of other open-source projects like `Pandas `_, `Apache Arrow `_ and `Boto3 `_, it offers abstracted functions to execute usual ETL tasks like load/unload data from **Data Lakes**, **Data Warehouses** and **Databases**. -Check our `tutorials `_ or the `list of functionalities `_. \ No newline at end of file +Check our `tutorials `_ or the `list of functionalities `_. 
\ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index cbedcd806..4372928d2 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "awswrangler" -version = "2.10.0" +version = "2.11.0" description = "Pandas on AWS." authors = ["Igor Tavares"] license = "Apache License 2.0" diff --git a/test_infra/pyproject.toml b/test_infra/pyproject.toml index 5b0f7191f..e6dda67cb 100644 --- a/test_infra/pyproject.toml +++ b/test_infra/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "awswrangler - test infrastructure" -version = "2.10.0" +version = "2.11.0" description = "CDK test infrastructure for AWS" authors = ["Igor Tavares"] license = "Apache License 2.0" diff --git a/tests/test_metadata.py b/tests/test_metadata.py index d63273380..4031f5a86 100644 --- a/tests/test_metadata.py +++ b/tests/test_metadata.py @@ -2,7 +2,7 @@ def test_metadata(): - assert wr.__version__ == "2.10.0" + assert wr.__version__ == "2.11.0" assert wr.__title__ == "awswrangler" assert wr.__description__ == "Pandas on AWS." assert wr.__license__ == "Apache License 2.0" diff --git a/tutorials/001 - Introduction.ipynb b/tutorials/001 - Introduction.ipynb index bf5a9be54..2ef8932cf 100644 --- a/tutorials/001 - Introduction.ipynb +++ b/tutorials/001 - Introduction.ipynb @@ -19,7 +19,7 @@ "\n", "Built on top of other open-source projects like [Pandas](https://github.com/pandas-dev/pandas), [Apache Arrow](https://github.com/apache/arrow) and [Boto3](https://github.com/boto/boto3), it offers abstracted functions to execute usual ETL tasks like load/unload data from **Data Lakes**, **Data Warehouses** and **Databases**.\n", "\n", - "Check our [list of functionalities](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html)." + "Check our [list of functionalities](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html)." 
    ]
   },
   {
@@ -30,15 +30,15 @@
    "\n",
    "The Wrangler runs almost anywhere over Python 3.6, 3.7, 3.8 and 3.9, so there are several different ways to install it in the desired enviroment.\n",
    "\n",
-    " - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#pypi-pip)\n",
-    " - [Conda](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#conda)\n",
-    " - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-lambda-layer)\n",
-    " - [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-python-shell-jobs)\n",
-    " - [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#aws-glue-pyspark-jobs)\n",
-    " - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook)\n",
-    " - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#amazon-sagemaker-notebook-lifecycle)\n",
-    " - [EMR Cluster](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#emr-cluster)\n",
-    " - [From source](https://aws-data-wrangler.readthedocs.io/en/2.10.0/install.html#from-source)\n",
+    " - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#pypi-pip)\n",
+    " - [Conda](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#conda)\n",
+    " - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-lambda-layer)\n",
+    " - [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-python-shell-jobs)\n",
+    " - [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#aws-glue-pyspark-jobs)\n",
+    " - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook)\n",
+    " - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#amazon-sagemaker-notebook-lifecycle)\n",
+    " - [EMR Cluster](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#emr-cluster)\n",
+    " - [From source](https://aws-data-wrangler.readthedocs.io/en/2.11.0/install.html#from-source)\n",
    "\n",
    "Some good practices for most of the above methods are:\n",
    " - Use new and individual Virtual Environments for each project ([venv](https://docs.python.org/3/library/venv.html))\n",
diff --git a/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server.ipynb b/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server.ipynb
index fdd15458a..41797521f 100644
--- a/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server.ipynb
+++ b/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server.ipynb
@@ -10,14 +10,14 @@
    "\n",
    "[Wrangler](https://github.com/awslabs/aws-data-wrangler)'s Redshift, MySQL and PostgreSQL have two basic function in common that tries to follow the Pandas conventions, but add more data type consistency.\n",
    "\n",
-    "- [wr.redshift.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.redshift.to_sql.html)\n",
-    "- [wr.redshift.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.redshift.read_sql_query.html)\n",
-    "- [wr.mysql.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.mysql.to_sql.html)\n",
-    "- [wr.mysql.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.mysql.read_sql_query.html)\n",
-    "- [wr.postgresql.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.postgresql.to_sql.html)\n",
-    "- [wr.postgresql.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.postgresql.read_sql_query.html)\n",
-    "- [wr.sqlserver.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.sqlserver.to_sql.html)\n",
-    "- [wr.sqlserver.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.sqlserver.read_sql_query.html)"
+    "- [wr.redshift.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.redshift.to_sql.html)\n",
+    "- [wr.redshift.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.redshift.read_sql_query.html)\n",
+    "- [wr.mysql.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.mysql.to_sql.html)\n",
+    "- [wr.mysql.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.mysql.read_sql_query.html)\n",
+    "- [wr.postgresql.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.postgresql.to_sql.html)\n",
+    "- [wr.postgresql.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.postgresql.read_sql_query.html)\n",
+    "- [wr.sqlserver.to_sql()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.sqlserver.to_sql.html)\n",
+    "- [wr.sqlserver.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.sqlserver.read_sql_query.html)"
    ]
   },
   {
@@ -41,10 +41,10 @@
    "source": [
    "## Connect using the Glue Catalog Connections\n",
    "\n",
-    "- [wr.redshift.connect()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.redshift.connect.html)\n",
-    "- [wr.mysql.connect()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.mysql.connect.html)\n",
-    "- [wr.postgresql.connect()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.postgresql.connect.html)\n",
-    "- [wr.sqlserver.connect()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.sqlserver.connect.html)"
+    "- [wr.redshift.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.redshift.connect.html)\n",
+    "- [wr.mysql.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.mysql.connect.html)\n",
+    "- [wr.postgresql.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.postgresql.connect.html)\n",
+    "- [wr.sqlserver.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.sqlserver.connect.html)"
    ]
   },
   {
diff --git a/tutorials/014 - Schema Evolution.ipynb b/tutorials/014 - Schema Evolution.ipynb
index d3dcff769..a48b202cb 100644
--- a/tutorials/014 - Schema Evolution.ipynb
+++ b/tutorials/014 - Schema Evolution.ipynb
@@ -10,8 +10,8 @@
    "\n",
    "Wrangler support new **columns** on Parquet Dataset through:\n",
    "\n",
-    "- [wr.s3.to_parquet()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet)\n",
-    "- [wr.s3.store_parquet_metadata()](https://aws-data-wrangler.readthedocs.io/en/2.10.0/stubs/awswrangler.s3.store_parquet_metadata.html#awswrangler.s3.store_parquet_metadata) i.e. \"Crawler\""
+    "- [wr.s3.to_parquet()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet)\n",
+    "- [wr.s3.store_parquet_metadata()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.s3.store_parquet_metadata.html#awswrangler.s3.store_parquet_metadata) i.e. \"Crawler\""
    ]
   },
   {
diff --git a/tutorials/021 - Global Configurations.ipynb b/tutorials/021 - Global Configurations.ipynb
index b990873c7..39615e993 100644
--- a/tutorials/021 - Global Configurations.ipynb
+++ b/tutorials/021 - Global Configurations.ipynb
@@ -13,7 +13,7 @@
    "- **Environment variables**\n",
    "- **wr.config**\n",
    "\n",
-    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html) to see if your function has some argument that can be configured through Global configurations.*\n",
+    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html) to see if your function has some argument that can be configured through Global configurations.*\n",
    "\n",
    "*P.P.S. One exception to the above mentioned rules is the `botocore_config` property. It cannot be set through environment variables\n",
    "but only via `wr.config`. It will be used as the `botocore.config.Config` for all underlying `boto3` calls.\n",
diff --git a/tutorials/022 - Writing Partitions Concurrently.ipynb b/tutorials/022 - Writing Partitions Concurrently.ipynb
index 3f4f1d127..ecd861ec2 100644
--- a/tutorials/022 - Writing Partitions Concurrently.ipynb
+++ b/tutorials/022 - Writing Partitions Concurrently.ipynb
@@ -13,7 +13,7 @@
    "    If True will increase the parallelism level during the partitions writing. It will decrease the\n",
    "    writing time and increase the memory usage.\n",
    "\n",
-    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html) to see it has some argument that can be configured through Global configurations.*"
+    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html) to see if it has some argument that can be configured through Global configurations.*"
    ]
   },
   {
diff --git a/tutorials/023 - Flexible Partitions Filter.ipynb b/tutorials/023 - Flexible Partitions Filter.ipynb
index d162c9656..c1c54377d 100644
--- a/tutorials/023 - Flexible Partitions Filter.ipynb
+++ b/tutorials/023 - Flexible Partitions Filter.ipynb
@@ -16,7 +16,7 @@
    "      - Ignored if `dataset=False`.\n",
    "      \n",
    "\n",
-    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.10.0/api.html) to see it has some argument that can be configured through Global configurations.*"
+    "*P.S. Check the [function API doc](https://aws-data-wrangler.readthedocs.io/en/2.11.0/api.html) to see if it has some argument that can be configured through Global configurations.*"
    ]
   },
   {
diff --git a/tutorials/030 - Data Api.ipynb b/tutorials/030 - Data Api.ipynb
new file mode 100644
index 000000000..ed8cceaf2
--- /dev/null
+++ b/tutorials/030 - Data Api.ipynb
@@ -0,0 +1,107 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.7"
+  },
+  "orig_nbformat": 2,
+  "kernelspec": {
+   "name": "pythonjvsc74a57bd0e4beff3b9c91951bd870e0e6d1ba9dfdd106cfe45c6f3d0f8d31550063fd3386",
+   "display_name": "Python 3.7.7 64-bit ('.env': venv)"
+  },
+  "metadata": {
+   "interpreter": {
+    "hash": "e4beff3b9c91951bd870e0e6d1ba9dfdd106cfe45c6f3d0f8d31550063fd3386"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "source": [
+    "[![AWS Data Wrangler](_static/logo.png \"AWS Data Wrangler\")](https://github.com/awslabs/aws-data-wrangler)"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "# 30 - Data Api"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "The Data API simplifies access to Amazon Redshift and RDS by removing the need to manage database connections and credentials. Instead, you can execute SQL commands to an Amazon Redshift cluster or Amazon Aurora cluster by simply invoking an HTTPS API endpoint provided by the Data API. It takes care of managing database connections and returning data. Since the Data API leverages IAM user credentials or database credentials stored in AWS Secrets Manager, you don’t need to pass credentials in API calls."
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Connect to the cluster\n",
+    "- [wr.data_api.redshift.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.data_api.redshift.connect.html)\n",
+    "- [wr.data_api.rds.connect()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.data_api.rds.connect.html)"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "import awswrangler as wr\n",
+    "\n",
+    "con_redshift = wr.data_api.redshift.connect(\n",
+    "    cluster_id=\"aws-data-wrangler-1xn5lqxrdxrv3\",\n",
+    "    database=\"test_redshift\",\n",
+    "    secret_arn=\"arn:aws:secretsmanager:us-east-1:111111111111:secret:aws-data-wrangler/redshift-ewn43d\"\n",
+    ")\n",
+    "\n",
+    "con_mysql = wr.data_api.rds.connect(\n",
+    "    cluster_id=\"arn:aws:rds:us-east-1:111111111111:cluster:mysql-serverless-cluster-wrangler\",\n",
+    "    database=\"test_rds\",\n",
+    "    secret_arn=\"arn:aws:secretsmanager:us-east-1:111111111111:secret:aws-data-wrangler/mysql-23df3\"\n",
+    ")"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },
+  {
+   "cell_type": "markdown",
+   "source": [
+    "## Read from database\n",
+    "- [wr.data_api.redshift.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.data_api.redshift.read_sql_query.html)\n",
+    "- [wr.data_api.rds.read_sql_query()](https://aws-data-wrangler.readthedocs.io/en/2.11.0/stubs/awswrangler.data_api.rds.read_sql_query.html)"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "df = wr.data_api.redshift.read_sql_query(\n",
+    "    sql=\"SELECT * FROM public.test_table\",\n",
+    "    con=con_redshift,\n",
+    ")\n",
+    "\n",
+    "df = wr.data_api.rds.read_sql_query(\n",
+    "    sql=\"SELECT * FROM test.test_table\",\n",
+    "    con=con_mysql,\n",
+    ")"
+   ],
+   "outputs": [],
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
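
The `mode` and `schema_evolution` flags documented in the `to_parquet`/`to_csv` hunks earlier in this diff are easiest to see together. A minimal sketch, assuming hypothetical `my-bucket`, `my_db` and `my_table` names (and that the `my_db` database already exists in the Glue catalog):

```py3
import pandas as pd

import awswrangler as wr

# First write: creates the dataset and registers the table in the Glue catalog.
wr.s3.to_parquet(
    df=pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]}),
    path="s3://my-bucket/dataset/",
    dataset=True,
    mode="overwrite",
    database="my_db",
    table="my_table",
)

# Append with an extra column: accepted because schema_evolution=True;
# with schema_evolution=False the new "flag" column would raise an exception.
wr.s3.to_parquet(
    df=pd.DataFrame({"id": [3], "value": ["bar"], "flag": [True]}),
    path="s3://my-bucket/dataset/",
    dataset=True,
    mode="append",
    schema_evolution=True,
    database="my_db",
    table="my_table",
)
```

Tutorial 014 (patched above) walks through the same schema-evolution flow in more depth.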