Add PyArrow 3 caveats on the docs. #546 #547
igorborgest committed Feb 4, 2021
1 parent 4b9f270 commit ea92e5c
Showing 2 changed files with 27 additions and 7 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -23,6 +23,8 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
| **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):** `pip install pyarrow==2 awswrangler`
Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)

## Table of contents
@@ -38,6 +40,8 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http

Installation command: `pip install awswrangler`

> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):** `pip install pyarrow==2 awswrangler`
```py3
import awswrangler as wr
import pandas as pd
30 changes: 23 additions & 7 deletions docs/source/install.rst
@@ -1,7 +1,7 @@
Install
=======

**AWS Data Wrangler** runs with Python ``3.6``, ``3.7`` and ``3.8``
**AWS Data Wrangler** runs with Python ``3.6``, ``3.7``, ``3.8`` and ``3.9``
and on several platforms (AWS Lambda, AWS Glue Python Shell, EMR, EC2,
on-premises, Amazon SageMaker, local, etc).

@@ -57,10 +57,13 @@ AWS Glue PySpark Jobs
Go to your Glue PySpark job and create a new *Job parameters* key/value:

* Key: ``--additional-python-modules``
* Value: ``awswrangler==2.3.0``
* Value: ``pyarrow==2,awswrangler``

P.S. By now AWS Glue PySpark Jobs does not support PyArrow +3.0.0.
Please use awswrangler==2.3.0 that uses PyArrow 2.0.0 to overcome this limitation.
To install a specific version, set the value of the Job parameter above as follows:

* Value: ``pyarrow==2,awswrangler==2.4.0``

.. note:: PyArrow 3 is not currently supported in Glue PySpark Jobs, which is why PyArrow 2 must be installed ahead of awswrangler.

`Official Glue PySpark Reference <https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html#reduced-start-times-new-features>`_
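
The job parameter described above can also be assembled programmatically. The following is a minimal sketch, not part of the library: the helper name ``additional_python_modules`` and its arguments are hypothetical, and it only builds the comma-separated string shown in the docs (PyArrow pin first, then awswrangler).

```python
def additional_python_modules(wrangler_version=None, pyarrow_pin="pyarrow==2"):
    """Build the value for the --additional-python-modules job parameter.

    Hypothetical helper for illustration only. The PyArrow pin is listed
    first, matching the order used in the documentation above.
    """
    if wrangler_version is None:
        wrangler = "awswrangler"
    else:
        wrangler = f"awswrangler=={wrangler_version}"
    return ",".join([pyarrow_pin, wrangler])


# Key/value pair as it would be entered under the job's *Job parameters*.
job_parameters = {
    "--additional-python-modules": additional_python_modules("2.4.0"),
}
print(job_parameters)
```

Calling the helper without a version yields the unpinned form, ``pyarrow==2,awswrangler``.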

@@ -115,7 +118,7 @@ AWS Data Wrangler could be a good helper to
complement Big Data pipelines.
- Configure Python 3 as the default interpreter for
PySpark under your cluster configuration
PySpark in your cluster configuration (only required for EMR < 6)
.. code-block:: json
@@ -135,15 +138,28 @@ complement Big Data pipelines.
- Keep the bootstrap script above on S3 and reference it on your cluster.
- For EMR Release < 6
.. code-block:: sh
#!/usr/bin/env bash
set -ex
sudo pip-3.6 install awswrangler
sudo pip-3.6 install pyarrow==2 awswrangler
- For EMR Release >= 6
.. code-block:: sh
#!/usr/bin/env bash
set -ex
sudo pip install pyarrow==2 awswrangler
.. note:: Make sure to pin the Wrangler version in the bootstrap script for production
environments (e.g. awswrangler==1.8.1)
environments (e.g. awswrangler==2.4.0)
.. note:: PyArrow 3 is not currently supported in the default EMR image, which is why PyArrow 2 must be installed ahead of awswrangler.
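
After the bootstrap script has run, it may be useful to confirm that the PyArrow pin took effect. The following is a minimal sketch under stated assumptions: the helper names ``pyarrow_major`` and ``is_supported_pyarrow`` are hypothetical, and the only PyArrow attribute used is ``pyarrow.__version__``.

```python
def pyarrow_major(version):
    """Return the major component of a version string such as '2.0.0'."""
    return int(version.split(".")[0])


def is_supported_pyarrow(version, max_major=2):
    """True when the major version does not exceed max_major (hypothetical check)."""
    return pyarrow_major(version) <= max_major


# Read the installed version if PyArrow is importable on this node.
try:
    import pyarrow
    installed = pyarrow.__version__
except ImportError:
    installed = None

if installed is None:
    print("pyarrow is not installed")
elif is_supported_pyarrow(installed):
    print(f"pyarrow {installed} is within the pinned range")
else:
    print(f"pyarrow {installed} is newer than 2.x; the pin may not have applied")
```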
From Source
-----------