Today, when writing Hive tables, Presto arbitrarily distributes the data across the writer nodes. This results in good parallelization for the common case of a query writing a single partition, since each worker will write a separate file for the partition. For queries that need to write hundreds or thousands of partitions, this behavior causes problems, as each worker will end up writing a file to each partition, so there are hundreds (or thousands) of large output buffers and open file streams.
To resolve this issue, we should add a session property that, when set, changes the InsertLayout and CreateTableLayout in HiveMetadata to declare a partitioning based on the partition keys.
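A minimal, self-contained sketch of the idea (not Presto's actual SPI code): if rows are routed to writers by hashing the partition key values, each partition is written by exactly one worker, so a worker only holds open files for the partitions routed to it. The class and method names below are hypothetical, for illustration only.

```java
import java.util.*;

public class PartitionedWriterRouting {
    // Route a row to a writer node by hashing its partition key values.
    // With partition-key-based distribution, every partition lands on exactly
    // one worker, so total open files equals the partition count instead of
    // (writers x partitions) as with arbitrary distribution.
    static int writerFor(List<String> partitionKeyValues, int writerCount) {
        return Math.floorMod(partitionKeyValues.hashCode(), writerCount);
    }

    public static void main(String[] args) {
        int writers = 4;
        // Simulate rows targeting 1000 distinct partitions.
        Map<Integer, Set<String>> filesPerWriter = new HashMap<>();
        for (int day = 0; day < 1000; day++) {
            String partition = "ds=day-" + day; // hypothetical partition value
            int writer = writerFor(List.of(partition), writers);
            filesPerWriter.computeIfAbsent(writer, w -> new HashSet<>()).add(partition);
        }
        int totalFiles = filesPerWriter.values().stream().mapToInt(Set::size).sum();
        // Each partition is owned by one worker: 1000 files total.
        System.out.println("total files: " + totalFiles);
    }
}
```

The trade-off is that a single partition is no longer parallelized across workers, which is why this should be opt-in via a session property rather than the default.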