Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate Hive stats definition to use standard aggregations #14294

Merged
merged 4 commits into from
Sep 28, 2022

Conversation

findepi
Copy link
Member

@findepi findepi commented Sep 26, 2022

This updates Hive to use standard aggregation functions provided with
Trino for stats aggregation (where possible), instead of using the
ColumnStatisticType enum. The now-unused options in that enum are
deprecated.

Follows #14233 and replaces #14220

@findepi findepi added the no-release-notes This pull request does not require release notes entry label Sep 26, 2022
@cla-bot cla-bot bot added the cla-signed label Sep 26, 2022
@alexjo2144
Copy link
Member

Seems reasonable, is the hope to eventually remove the code that handles the deprecated ColumnStatisticType enum values?

@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

Seems reasonable, is the hope to eventually remove the code that handles the deprecated ColumnStatisticType enum values?

That's my goal, yes.

Introduce HiveColumnStatisticType with only the set of values used by
Hive connector.
This updates Hive to use standard aggregation functions provided with
Trino for stats aggregation (where possible), instead of using the
`ColumnStatisticType` enum. The now-unused options in that enum are
deprecated.
@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

CI #13946 and one related

@findepi findepi force-pushed the findepi/hive-stats-type branch from fb9992b to 11cdd73 Compare September 27, 2022 08:47
@findepi findepi requested a review from ebyhr September 27, 2022 08:48
@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

CI https://github.com/trinodb/trino/actions/runs/3134207114/jobs/5088871945

tests               | 2022-09-27 16:18:32 INFO: FAILURE     /    io.trino.tests.product.hive.TestRoles.testAdminCanAddColumnToAnyTable (Groups: authorization, profile_specific_tests, roles) took 0.5 seconds
tests               | 2022-09-27 16:18:32 SEVERE: Failure cause:
tests               | io.trino.tempto.query.QueryExecutionException: java.sql.SQLException: Query failed (#20220927_103332_00246_dkfr9): line 1:1: Table 'hive.default.test_table' of unsupported type already exists
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:119)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQuery(JdbcQueryExecutor.java:84)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.lambda$executeQuery$0(QueryExecutors.java:60)
tests               | 	at net.jodah.failsafe.Functions.lambda$get$0(Functions.java:48)
tests               | 	at net.jodah.failsafe.RetryPolicyExecutor.lambda$supply$0(RetryPolicyExecutor.java:62)
tests               | 	at net.jodah.failsafe.Execution.executeSync(Execution.java:129)
tests               | 	at net.jodah.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
tests               | 	at net.jodah.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:67)
tests               | 	at io.trino.tests.product.utils.QueryExecutors$1.executeQuery(QueryExecutors.java:60)
tests               | 	at io.trino.tests.product.hive.TestRoles.testAdminCanAddColumnToAnyTable(TestRoles.java:652)
tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
tests               | 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
tests               | 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
tests               | 	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
tests               | 	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:104)
tests               | 	at org.testng.internal.Invoker.invokeMethod(Invoker.java:645)
tests               | 	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:851)
tests               | 	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1177)
tests               | 	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:129)
tests               | 	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:112)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:833)
tests               | Caused by: java.sql.SQLException: Query failed (#20220927_103332_00246_dkfr9): line 1:1: Table 'hive.default.test_table' of unsupported type already exists
tests               | 	at io.trino.jdbc.AbstractTrinoResultSet.resultsException(AbstractTrinoResultSet.java:1937)
tests               | 	at io.trino.jdbc.TrinoResultSet.getColumns(TrinoResultSet.java:319)
tests               | 	at io.trino.jdbc.TrinoResultSet.create(TrinoResultSet.java:62)
tests               | 	at io.trino.jdbc.TrinoStatement.internalExecute(TrinoStatement.java:262)
tests               | 	at io.trino.jdbc.TrinoStatement.execute(TrinoStatement.java:240)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:128)
tests               | 	at io.trino.tempto.query.JdbcQueryExecutor.execute(JdbcQueryExecutor.java:112)
tests               | 	... 22 more
tests               | 	Suppressed: java.lang.Exception: Query: CREATE TABLE hive.default.test_table (foo BIGINT)
tests               | 		at io.trino.tempto.query.JdbcQueryExecutor.executeQueryNoParams(JdbcQueryExecutor.java:136)
tests               | 		... 23 more
tests               | Caused by: io.trino.spi.TrinoException: line 1:1: Table 'hive.default.test_table' of unsupported type already exists
tests               | 	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:48)
tests               | 	at io.trino.sql.analyzer.SemanticExceptions.semanticException(SemanticExceptions.java:43)
tests               | 	at io.trino.execution.CreateTableTask.internalExecute(CreateTableTask.java:135)
tests               | 	at io.trino.execution.CreateTableTask.execute(CreateTableTask.java:119)
tests               | 	at io.trino.execution.CreateTableTask.execute(CreateTableTask.java:85)
tests               | 	at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
tests               | 	at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:249)
tests               | 	at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
tests               | 	at io.trino.$gen.Trino_397_58_g2172ab3____20220927_102755_2.run(Unknown Source)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
tests               | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
tests               | 	at java.base/java.lang.Thread.run(Thread.java:833)

& other TestRoles failures.

i don't know yet how this is related, investigating.

@ebyhr
Copy link
Member

ebyhr commented Sep 27, 2022

The upstream job looks facing the same issue. https://github.com/trinodb/trino/actions/runs/3134131222/jobs/5088779482

Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.
Why not migrate the remaining two also?

@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

Why not migrate the remaining two also?

This should be done leveraging #14222 or equivalent
See also #14233 (comment)

@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

The upstream job looks facing the same issue. https://github.com/trinodb/trino/actions/runs/3134131222/jobs/5088779482

@ebyhr thanks. I noticed too that master is currently failing.
I run the tests locally and couldn't reproduce the failure though.

@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

CI #13288 , #11140

@findepi
Copy link
Member Author

findepi commented Sep 27, 2022

CI also TestRoles -> potential fix #14323

@losipiuk
Copy link
Member

Why not migrate the remaining two also?

This should be done leveraging #14222 or equivalent See also #14233 (comment)

Agreed. Eventually we should migrate to aggregations on composite expressions. But we could use already defined $internal$max_data_size_for_stats and $internal$sum_data_size_for_stats for now, right?
Ok for me both ways anyway.

@findepi
Copy link
Member Author

findepi commented Sep 28, 2022

But we could use already defined $internal$max_data_size_for_stats and $internal$sum_data_size_for_stats for now, right?

I would rename them before letting connectors use them (in one of the PRs incarnations i did that).
But then we would expose them only to get rid of them later... Maybe we can decompose them and never expose them?

@findepi findepi merged commit 2a56ea9 into trinodb:master Sep 28, 2022
@findepi findepi deleted the findepi/hive-stats-type branch September 28, 2022 15:35
@github-actions github-actions bot added this to the 398 milestone Sep 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed no-release-notes This pull request does not require release notes entry
Development

Successfully merging this pull request may close these issues.

4 participants