From d8b3e5d525cb8a35a6814708617414465d87b247 Mon Sep 17 00:00:00 2001 From: Colebow Date: Tue, 19 Jul 2022 13:30:21 -0700 Subject: [PATCH] Add Athena partition projection docs --- docs/src/main/sphinx/connector/hive.rst | 105 ++++++++++++++++++++++++ 1 file changed, 105 insertions(+) diff --git a/docs/src/main/sphinx/connector/hive.rst b/docs/src/main/sphinx/connector/hive.rst index 17c53fa2ebb9..473dea155e80 100644 --- a/docs/src/main/sphinx/connector/hive.rst +++ b/docs/src/main/sphinx/connector/hive.rst @@ -250,6 +250,26 @@ security options in the Hive connector. .. _hive_configuration_properties: +Accessing tables with Athena partition projection metadata +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`Partition projection `_ +is a feature of AWS Athena often used to speed up query processing with highly +partitioned tables. + +Trino reads the partition projection table properties stored in the metastore +and reimplements this functionality. Compared to AWS Athena, date projection +in Trino is currently limited to interval units of ``DAYS``, ``HOURS``, +``MINUTES``, and ``SECONDS``. + +If compatibility issues block access to a requested table when partition +projection is enabled, you can set the ``partition_projection_ignore`` table +property to ``true`` for that table to ignore its projection metadata and +bypass the errors. + +Refer to :ref:`hive_table_properties` and :ref:`hive_column_properties` for +configuration of partition projection. + Hive configuration properties ----------------------------- @@ -408,6 +428,8 @@ Property Name Description managed tables. See the :ref:`hive_table_properties` for more information on auto_purge. 
+ +``hive.partition-projection-enabled`` Enables Athena partition projection support ``false`` ================================================== ============================================================ ============ ORC format configuration properties @@ -1143,6 +1165,89 @@ See the :ref:`hive_examples` for more information. Requires ORC format. This property may be shown as true for insert-only tables created using older versions of Hive. - + * - ``partition_projection_enabled`` + - Enables partition projection for the selected table. + Mapped from the AWS Athena table property + `projection.enabled `_. + - + * - ``partition_projection_ignore`` + - Ignores any partition projection properties stored in the metastore for + the selected table. This is a Trino-only property that allows you to + work around compatibility issues on a specific table. If enabled, + Trino ignores all other configuration options related to partition + projection. + - + * - ``partition_projection_location_template`` + - Projected partition location template, such as + ``s3a://test/name=${name}/``. Mapped from the AWS Athena table property + `storage.location.template `_. + - ``${table_location}/${partition_name}`` + +.. _hive_column_properties: + +Column properties +----------------- + +.. list-table:: Hive connector column properties + :widths: 20, 60, 20 + :header-rows: 1 + + * - Property name + - Description + - Default + * - ``partition_projection_type`` + - Defines the type of partition projection to use on this column. + May be used only on partition columns. Available types: + ``ENUM``, ``INTEGER``, ``DATE``, ``INJECTED``. + Mapped from the AWS Athena table property + `projection.${columnName}.type `_. + - + * - ``partition_projection_values`` + - Used with ``partition_projection_type`` set to ``ENUM``. Contains a static + list of values used to generate partitions. + Mapped from the AWS Athena table property + `projection.${columnName}.values `_. 
+ - + * - ``partition_projection_range`` + - Used with ``partition_projection_type`` set to ``INTEGER`` or ``DATE`` to + define a range. It is a two-element array, describing the minimum and + maximum range values used to generate partitions. Generation starts from + the minimum, then increments by the defined + ``partition_projection_interval`` to the maximum. For example, the format + is ``['1', '4']`` for a ``partition_projection_type`` of ``INTEGER`` and + ``['2001-01-01', '2001-01-07']`` or ``['NOW-3DAYS', 'NOW']`` for a + ``partition_projection_type`` of ``DATE``. Mapped from the AWS Athena + table property + `projection.${columnName}.range `_. + - + * - ``partition_projection_interval`` + - Used with ``partition_projection_type`` set to ``INTEGER`` or ``DATE``. It + represents the interval used to generate partitions within + the given range ``partition_projection_range``. Mapped from the AWS Athena + table property + `projection.${columnName}.interval `_. + - + * - ``partition_projection_digits`` + - Used with ``partition_projection_type`` set to ``INTEGER``. + The number of digits to be used with integer column projection. + Mapped from the AWS Athena table property + `projection.${columnName}.digits `_. + - + * - ``partition_projection_format`` + - Used with ``partition_projection_type`` set to ``DATE``. + The date column projection format, defined as a string such as ``yyyy MM`` + or ``MM-dd-yy HH:mm:ss`` for use with the + `Java DateTimeFormatter class `_. + Mapped from the AWS Athena table property + `projection.${columnName}.format `_. + - + * - ``partition_projection_interval_unit`` + - Used with ``partition_projection_type`` set to ``DATE``. + The unit of the date column projection interval + given in ``partition_projection_interval``. + Mapped from the AWS Athena table property + `projection.${columnName}.interval.unit `_. + - .. _hive_special_columns: