Add ANALYZE statement to collect column statistics on demand #11376
Conversation
Force-pushed: 19dff4f → 2f7dab0
Just skimmed.
@@ -400,6 +401,27 @@ protected Scope visitDelete(Delete node, Optional<Scope> scope)
        return createAndAssignScope(node, scope, Field.newUnqualified("rows", BIGINT));
    }

    @Override
    protected Scope visitAnalyze(Analyze node, Optional<Scope> scope)
are you going to update access controls also, so we could check if the user is privileged to analyze the table?

Will add it later.

Why do we need access control? If you can read the table, why can't you analyze it?

Because one may want to disable that for certain users.

ANALYZE changes the state of the table. It should require read+write access to the table. I am not sure if we need a separate "can analyze" privilege.
analyze("ANALYZE t1 WITH (p1 = 'p1')");

assertFails(DUPLICATE_PROPERTY, ".* Duplicate property: p1", "ANALYZE t1 WITH (p1 = 'p1', p2 = 'p2', p1 = 'p3')");
assertFails(DUPLICATE_PROPERTY, ".* Duplicate property: p1", "ANALYZE t1 WITH (p1 = 'p1', \"p1\" = 'p2')");
if you pass properties like p1=p1,p1=p1, would it pass?

Good question. I assume you mean p1='p1', p1='p1'. The query will fail and the analyzer will report the duplicate property p1. I think this behavior is expected.
public List<? extends Node> getChildren()
{
    return ImmutableList.<Node>builder()
            .addAll(properties).build();
- what about the table?
- move .build() to a separate line

Good catch. I missed that when changing from a QualifiedName member to a Table member.
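A minimal sketch of the corrected method, assuming the node now holds a Table field named table:

    public List<? extends Node> getChildren()
    {
        return ImmutableList.<Node>builder()
                .add(table)         // include the Table child that was previously missed
                .addAll(properties)
                .build();           // .build() moved to its own line, per the review
    }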
@@ -60,7 +60,8 @@
        @JsonSubTypes.Type(value = ExplainAnalyzeNode.class, name = "explainAnalyze"),
        @JsonSubTypes.Type(value = ApplyNode.class, name = "apply"),
        @JsonSubTypes.Type(value = AssignUniqueId.class, name = "assignUniqueId"),
        @JsonSubTypes.Type(value = LateralJoinNode.class, name = "lateralJoin"),
        @JsonSubTypes.Type(value = AnalyzeFinishNode.class, name = "analyzecommit")})
update name

The naming convention here is not consistent... I followed the one from TableFinishNode.class, but I don't think it's a good example (e.g. missing upper case and mismatched name).

Agreed, the existing ones are a mess. I would go with analyzeFinish, as that makes the name match and the camel case makes it slightly more readable.
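That is, the new entry would presumably become:

    @JsonSubTypes.Type(value = AnalyzeFinishNode.class, name = "analyzeFinish")})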
shouldn't this be …

@findepi I didn't use …

I didn't think about this. Did you check what it would take to remove this staticness constraint?
    return tableMetadata;
}

public ConnectorTableHandle getTableHandle()
I don't think we need this here. This analyze metadata object is created from a table handle, so the caller can hold onto it.

We need getAnalyzeTableMetadata() to return an updated table handle which has a dependency on the analyze properties (e.g. what range of the table to analyze). It will later be passed to getTableLayouts().
    return statisticsMetadata;
}

public ConnectorTableMetadata getTableMetadata()
Why do we need table metadata here? It seems more natural for the engine to ask for the table metadata separately, as normal.
Is there a use case for the metadata here to be different from the normal metadata (maybe dependent on the analyze properties)?

I used that mainly to save a getTableMetadata() call when planning the table scan for the analyze statement. The connector will need to call getTableMetadata() when generating TableStatisticsMetadata.
Alternatively, we can add ColumnMetadata to TableStatisticsMetadata to describe the properties of the columns that we are collecting statistics for. Then we can use that information when planning the table scan.
I don't see any use case where we would want to return a different TableMetadata for the analyze statement.

> I used that mainly to save a getTableMetadata() call when planning table scan for the analyze statement.

It should be cheap. When including the Metadata here you are doing all the expensive operations (like going to the metastore, and so on) anyway. You are using this metadata in a single place, so it really seems to be just a matter of Java virtual method invocation overhead. That is negligible, and the overhead is constant, so I wouldn't bother.

The idea is that, when generating the TableStatisticsMetadata, the connector has already done a getTableMetadata() which contains all the expensive operations. And in order to plan the table scan for ANALYZE, we need a list of column handles, which is not in TableStatisticsMetadata. So I included a TableMetadata object in the AnalyzeTableMetadata to reuse it and avoid one more call.
Like I said, an alternative is to include the column information in TableStatisticsMetadata to describe the properties of the columns we are collecting statistics for, so that we can later use it to plan the table scan for ANALYZE. Any thoughts?

I don't feel strongly about it. You can keep the metadata in place if you would like.
@@ -285,6 +286,30 @@ default TableStatisticsMetadata getStatisticsCollectionMetadata(ConnectorSession
        return TableStatisticsMetadata.empty();
    }

    /**
     * Get metadata for an ANALYZE query, describing what statistics to collect and how to collect them.
Nit: lowercase "analyze", since we are talking about logical functionality, not specific SQL syntax

Are we actually mentioning the SQL syntax here? It's similar to the documentation for beginQuery().

Based on @martint's comment below, something like "Get metadata for table analysis" seems more appropriate.
     */
    default ConnectorAnalyzeTableHandle beginAnalyze(ConnectorSession session, ConnectorTableHandle tableHandle)
    {
        throw new PrestoException(NOT_SUPPORTED, "This connector does not support analyze");
Should this be

    throw new PrestoException(GENERIC_INTERNAL_ERROR, "ConnectorMetadata getAnalyzeTableMetadata() is implemented without beginAnalyze()");
public ConnectorAnalyzeTableHandle beginAnalyze(ConnectorSession session, ConnectorTableHandle tableHandle)
{
    // do nothing
    return new ConnectorAnalyzeTableHandle() {};
I think this won't work because the handle class isn't returned from TpchHandleResolver (which is needed by the handle serialization system).
Add an ANALYZE query in TestTpchDistributedQueries.

Thanks. I added it for the TestLogicalPlanner and didn't think about the consequences of actual execution. Will fix it.
 */
package com.facebook.presto.spi;

public interface ConnectorAnalyzeTableHandle
Nit: write this as

    @SuppressWarnings("MarkerInterface")
    public interface ConnectorAnalyzeTableHandle {}
PlanNode child = node.getSource();
child.accept(this, context);

AnalyzeFinishNode.AnalyzeHandle materializedHandle =
Should this be analyzeHandle?
false,
value -> ImmutableList.copyOf(((Collection<?>) value).stream()
        .map(partition -> ImmutableList.copyOf(((Collection<?>) partition).stream()
                .map(name -> ((String) name).toLowerCase(ENGLISH)).collect(Collectors.toList())))

Use toImmutableList() in both places
List.class,
ImmutableList.of(),
false,
value -> ImmutableList.copyOf(((Collection<?>) value).stream()

Why do a copy of the collection before calling stream()? We don't do that in other places. Just casting should be sufficient.
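Putting both suggestions together, the lambda might look roughly like this (a sketch; toImmutableList() is Guava's ImmutableList.toImmutableList() collector):

    value -> ((Collection<?>) value).stream()
            .map(partition -> ((Collection<?>) partition).stream()
                    .map(name -> ((String) name).toLowerCase(ENGLISH))
                    .collect(toImmutableList()))
            .collect(toImmutableList())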
{
    Set<ColumnStatisticMetadata> columnStatistics = columns.stream()
            .filter(column -> !partitionedBy.contains(column.getName()))
            .filter(column -> !column.isHidden())
            .map(this::getColumnStatisticMetadata)
            .flatMap(List::stream)
            .collect(toImmutableSet());
    return new TableStatisticsMetadata(columnStatistics, ImmutableSet.of(), partitionedBy);

    if (!includeRowCount) {

I think this would be cleaner as

    Set<TableStatisticType> statTypes = includeRowCount ? ImmutableSet.of(ROW_COUNT) : ImmutableSet.of();
    return new TableStatisticsMetadata(columnStatistics, statTypes, partitionedBy);

This removes the duplication and makes it easy to see which part is variable.
        .collect(toList());

Iterable<HivePartition> partitionsIterable = () -> partitionValues.stream()
        .map(partitionValue -> {

It might be cleaner to extract a method for this lambda
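For example (a sketch; toHivePartition is a hypothetical name for the extracted method, whose body would be the original lambda moved verbatim):

    Iterable<HivePartition> partitionsIterable = () -> partitionValues.stream()
            .map(this::toHivePartition)
            .iterator();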
@@ -27,14 +28,27 @@
{
    private final String schemaName;
    private final String tableName;
    private final Optional<HiveAnalyzePropertiesHandle> analyzeHandle;
Calling this inner object a "handle" is a bit confusing since it's not a handle, it's just a bean. It would probably be cleaner to inline it here as analyzePartitions.

I made it a separate class because I thought we might add new analyze properties in the future, and it would be good if we could wrap all of them in a class. I can inline analyzePartitions since we don't have the need now.

It's fine to have a class if you or anyone else has specific ideas of what we would add. In that case, let's name it HiveAnalyzeProperties and analyzeProperties, as "handle" is confusing. Otherwise, inline it, since that makes it easier to read and we avoid adding stuff speculatively, as it often goes unused and just clutters the code.
}

@Override
public String toString()
{
    return schemaName + ":" + tableName;
    return schemaName + ":" + tableName + ":" + analyzeHandle;
This toString() is used when printing the query plan. We probably don't want to include the partition values.

I think we may want to print it.
If the user is analyzing the whole table, the partition values field here will be empty, so nothing would be printed.
If the user specifies a list of partitions to analyze, having them in the plan can help the user verify that we actually analyzed those partitions. The plan will be as long as the analyze query itself.
An alternative way is to print up to N partitions here.
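A sketch of the capped variant (the constant name and the shape of the partition values field are assumptions; joining is Collectors.joining):

    // hypothetical cap on how many partition values to include in the plan output
    private static final int MAX_PRINTED_PARTITIONS = 10;

    @Override
    public String toString()
    {
        // print at most MAX_PRINTED_PARTITIONS partition values so the plan stays readable
        String partitions = analyzePartitionValues.stream()
                .limit(MAX_PRINTED_PARTITIONS)
                .map(Object::toString)
                .collect(joining(", "));
        return schemaName + ":" + tableName + ":" + partitions;
    }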
public synchronized void overwritePartitionStatistics(Table table, Map<List<String>, PartitionStatistics> partitionStatisticsMap)
{
    setExclusive((delegate, hdfsEnvironment) -> {
        for (Map.Entry<List<String>, PartitionStatistics> entry : partitionStatisticsMap.entrySet()) {

Use forEach on Map as that lets you name the key/value
List<String> columnNames = table.getPartitionColumns().stream()
        .map(Column::getName)
        .collect(toImmutableList());
return makePartName(columnNames, partitionValues);
}

private String getPartitionName(String databaseName, String tableName, List<String> partitionValues)
Nit: move this method above, as you usually chain methods down rather than up (the root method is at the bottom of the file)
@@ -61,6 +61,7 @@
    HIVE_TABLE_NOT_READABLE(34, USER_ERROR),
    HIVE_TABLE_DROPPED_DURING_QUERY(35, EXTERNAL),
    // HIVE_TOO_MANY_BUCKET_SORT_FILES(36) is deprecated
    HIVE_PARTITION_DOES_NOT_EXIST(37, USER_ERROR),
HIVE_PARTITION_NOT_FOUND would be more consistent with other error codes
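That is, the entry would become:

    HIVE_PARTITION_NOT_FOUND(37, USER_ERROR),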
@findepi Today, when a connector registers properties (e.g. session properties, table properties, column properties), it registers both the name and the type of the property. If we are going to remove that static constraint, we will need to change those things. But I am more worried about the semantics than the implementation. Should we ever allow dynamic property types? What are the other use cases?
Neat. Thanks for doing this.
@@ -508,6 +530,7 @@ protected Scope visitRenameSchema(RenameSchema node, Optional<Scope> scope)
    protected Scope visitCreateTable(CreateTable node, Optional<Scope> scope)
    {
        validateProperties(node.getProperties(), scope);
supernit: extra change
public class AnalyzeTableMetadata
{
    private final TableStatisticsMetadata statisticsMetadata;
    private final TableMetadata tableMetadata;

Why do we need both here? TableHandle and TableMetadata? We can get TableMetadata by calling getTableMetadata(..., tableHandle).

See above.
Collection<ComputedStatistics> computedStatistics = computedStatisticsBuilder.build();
analyzeFinisher.finishAnalyze(computedStatistics);
return new Page(0);
Did we decide to go with an empty page? Maybe we should follow the convention for other DDLs and return a single boolean column "result" with a value of true?

I am changing it to return null. But I am not sure if we should add an OutputNode to the ANALYZE query plan in the first place.
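For reference, the single-boolean-column convention mentioned above could be produced with something like this (a sketch built on the SPI's BOOLEAN type and BlockBuilder):

    // one row, one boolean column "result" with value true
    BlockBuilder builder = BOOLEAN.createBlockBuilder(null, 1);
    BOOLEAN.writeBoolean(builder, true);
    return new Page(builder.build());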
analysis.getParameters());

AnalyzeTableMetadata analyzeTableMetadata = metadata.getAnalyzeTableMetadata(session, targetTable, analyzeProperties);
// Replace the target table with the one from analyze table metadata

Add a comment explaining why we need this. Say that the TableHandle returned by getAnalyzeTableMetadata may contain some additional information required for ANALYZE.
        .map(partitionValue -> getPartitionNamesByParts(metastore, tableName, partitionValue))
        .flatMap(partitionNames -> partitionNames.stream()
                .map(partitionName -> parseValuesAndFilterPartition(tableName, partitionName, partitionColumns, partitionTypes, alwaysTrue())))
        .filter(Optional::isPresent)

Don't filter, but verify. It should never return empty, since you are passing an alwaysTrue() domain.
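One way to turn the filter into a verification, replacing both .filter(Optional::isPresent) and the following .map(Optional::get) (a sketch; VerifyException is Guava's com.google.common.base.VerifyException):

    .map(partition -> partition.orElseThrow(
            () -> new VerifyException("partition must be present when an alwaysTrue() domain is passed")))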
        .map(Optional::get)
        .iterator();

return new HivePartitionResult(partitionColumns, partitionsIterable, all(), all(), all(), hiveBucketHandle, Optional.empty());

all, all, none
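That is, presumably:

    return new HivePartitionResult(partitionColumns, partitionsIterable, all(), all(), none(), hiveBucketHandle, Optional.empty());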
        table.getDatabaseName(),
        table.getTableName(),
        getPartitionName(table, partitionValues),
        statistics -> partitionStatistics)));

Preserve basic statistics here
@@ -293,6 +293,22 @@ public synchronized void renameDatabase(String source, String target)
    setExclusive((delegate, hdfsEnvironment) -> delegate.renameDatabase(source, target));
}

public synchronized void overwriteTableStatistics(Table table, PartitionStatistics tableStatistics)
{
    setExclusive((delegate, hdfsEnvironment) -> delegate.updateTableStatistics(table.getDatabaseName(), table.getTableName(), statistics -> tableStatistics));

preserve basic statistics here
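If PartitionStatistics exposes its basic and column statistics separately (an assumption about the shape of that class), the update function could keep the existing basic statistics while overwriting only the column statistics:

    // keep the existing basic statistics (row count, file count, etc.), replace the column statistics
    statistics -> new PartitionStatistics(statistics.getBasicStatistics(), tableStatistics.getColumnStatistics())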
@@ -2776,6 +2776,210 @@ public void testCollectColumnStatisticsOnInsert()
    assertUpdate(format("DROP TABLE %s", tableName));
}

@Test
@jessesleeping Let me know if you need any help with figuring out product tests.

@jessesleeping @arhimondr I am concerned about removing analyze-on-hive from product tests. We test with different Hive versions, and they can set different table/partition properties during ANALYZE (now, or in some future versions, like Hive 3). We should have tests ensuring proper interop between Hive's ANALYZE and Presto's reading of statistics.
However, as we have our own ANALYZE now, we could try the contrary -- but I don't know if Hive has a useful SHOW STATS (or equivalent).

> I am concerned about removing analyze-on-hive from product tests

I was so looking forward to it. Running MapReduce jobs to analyze tables in Hive used to slow down the overall test run time a lot =) I don't feel strongly about it, but it seems a little bit extra to test Hive's analyze.

We are not testing Hive's analyze. We are testing that Presto can handle table properties such as those set by Hive's analyze.
What about defining some 3 tables (unanalyzed, analyzed, analyzed with columns) directly in the image and analyzing them during image build time?
Of course, having test data set up in the image is not elastic (e.g. when we need to cover a new data type), but maybe the saved time pays off overall.
Or, maybe, we can speed up analyze in Hive by using a different execution framework?
While adding a product test for …, the test still passed successfully. Is this a known bug?
@jessesleeping The test asserts what Hive is expected to produce as NDV for these columns. As with any estimate, this can deviate from the actual values. In Hive, I observed the NDV estimate deviating from the actual number of distinct values by as much as 100% for small tables.
Force-pushed: b15f3e9 → caa95cc
Fixed: …
Pending fixes: …
Pending discussions: …
Force-pushed: caa95cc → e3c2731
Force-pushed: e3c2731 → b88b714
Updated the PR with fixes. Pending commit: …
Force-pushed: e638d7c → ad13914
Add error message on column count mismatch when making partition name
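A sketch of such a check, using Guava's checkArgument (message wording and variable names assumed):

    checkArgument(columnNames.size() == partitionValues.size(),
            "Partition value count does not match partition column count for table '%s.%s': expected %s, got %s",
            databaseName, tableName, columnNames.size(), partitionValues.size());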
LGTM
Fix a bug where creating an empty partition using the CALL statement throws exceptions when using the file-based metastore implementation. Extracted-From: prestodb/presto#11376
Force-pushed: 013dc7f → ffa1b60
Force-pushed: ffa1b60 → b7079d3
Added the following SQL statement:

ANALYZE qualifiedName (WITH properties)?

Users can trigger column statistics collection on a table by calling this statement. The connector decides what statistics to collect and how to store the result. The connector can also support certain WITH properties to customize the statistics collection. If the WITH property is not specified, the default behavior is to collect default statistics for the whole table.

The third commit implements the SPI in the Hive connector. It supports an analyze property named partitions, which is expected to be of type ARRAY[ARRAY[VARCHAR]]. The value of this property is a list of partitions, where each partition is represented by a list of partition column values as varchars.
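For illustration, an invocation against a hypothetical partitioned table (the table name and partition values are made up) could look like:

    ANALYZE hive.web.page_views WITH (partitions = ARRAY[ARRAY['2018-08-01', 'US'], ARRAY['2018-08-02', 'US']]);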