Update BigQuery dependencies to support HOURLY partitioning of tables #4968

Merged

Contributor

@wendigo wendigo commented Aug 25, 2020

I did not update to the latest BOM version, to minimize the impact of this change.

Hourly partitioning was added in googleapis/java-bigquery@90f9980#diff-3dfcf884473c6d47c7c3debc58ce20c0

@losipiuk
Member

LGTM - it would be great to add some test coverage for this one. PT?

@findepi
Member

findepi commented Aug 25, 2020

@losipiuk We have no tests for BigQuery currently.
cc @davidrabinowitz

@losipiuk
Member

> @losipiuk We have no tests for BigQuery currently.

Yeah - I know :)

cc @davidrabinowitz

@ebyhr
Member

ebyhr commented Aug 25, 2020

I'm testing this PR manually, and it throws the following exception whether or not the table is hourly partitioned.

CREATE TABLE
  ebyhr.t4968d (transaction_id INT64,
    transaction_ts TIMESTAMP)
PARTITION BY
  TIMESTAMP_TRUNC(transaction_ts, HOUR)

presto> select * from bigquery.ebyhr.t4968d;
Query 20200825_134423_00000_k6umf failed: io/grpc/protobuf/ProtoUtils
java.lang.NoClassDefFoundError: io/grpc/protobuf/ProtoUtils
	at com.google.cloud.bigquery.storage.v1beta1.stub.GrpcBigQueryStorageStub.<clinit>(GrpcBigQueryStorageStub.java:62)
	at com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:109)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.<init>(BigQueryStorageClient.java:144)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.create(BigQueryStorageClient.java:125)
	at io.prestosql.plugin.bigquery.BigQueryStorageClientFactory.createBigQueryStorageClient(BigQueryStorageClientFactory.java:55)
	at io.prestosql.plugin.bigquery.ReadSessionCreator.create(ReadSessionCreator.java:82)
	at io.prestosql.plugin.bigquery.BigQuerySplitManager.readFromBigQuery(BigQuerySplitManager.java:108)
	at io.prestosql.plugin.bigquery.BigQuerySplitManager.getSplits(BigQuerySplitManager.java:90)
	at io.prestosql.split.SplitManager.getSplits(SplitManager.java:87)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:195)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:177)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:154)
	at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:131)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:129)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:101)
	at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:451)
	at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:375)
	at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:245)
	at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:132)
	at io.prestosql.$gen.Presto_testversion____20200825_134414_3.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.ClassNotFoundException: io.grpc.protobuf.ProtoUtils
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
	... 24 more

presto>
presto> select * from bigquery.ebyhr.t4968d;
Query 20200825_134427_00001_k6umf failed: Could not initialize class com.google.cloud.bigquery.storage.v1beta1.stub.GrpcBigQueryStorageStub
java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.bigquery.storage.v1beta1.stub.GrpcBigQueryStorageStub
	at com.google.cloud.bigquery.storage.v1beta1.stub.EnhancedBigQueryStorageStub.create(EnhancedBigQueryStorageStub.java:109)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.<init>(BigQueryStorageClient.java:144)
	at com.google.cloud.bigquery.storage.v1beta1.BigQueryStorageClient.create(BigQueryStorageClient.java:125)
	at io.prestosql.plugin.bigquery.BigQueryStorageClientFactory.createBigQueryStorageClient(BigQueryStorageClientFactory.java:55)
	at io.prestosql.plugin.bigquery.ReadSessionCreator.create(ReadSessionCreator.java:82)
	at io.prestosql.plugin.bigquery.BigQuerySplitManager.readFromBigQuery(BigQuerySplitManager.java:108)
	at io.prestosql.plugin.bigquery.BigQuerySplitManager.getSplits(BigQuerySplitManager.java:90)
	at io.prestosql.split.SplitManager.getSplits(SplitManager.java:87)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:195)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:177)
	at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:154)
	at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:131)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:129)
	at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:101)
	at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:451)
	at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:375)
	at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:245)
	at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:132)
	at io.prestosql.$gen.Presto_testversion____20200825_134414_3.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
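
The two traces above differ because of how the JVM handles a failed static initializer: the first use of the class surfaces the original error, and every later use reports only "Could not initialize class". The following is a minimal, self-contained sketch of that behavior. `StaticInitFailureDemo` and `Stub` are hypothetical names, and the thrown error merely simulates the missing gRPC class; none of this is code from the BigQuery client.

```java
// Hypothetical demo; simulates a class whose static initializer fails
// because a dependency is missing from the classpath.
final class StaticInitFailureDemo {

    static final class Stub {
        // The field initializer runs during class initialization (<clinit>).
        static final Object DEPENDENCY = load();

        private static Object load() {
            // Simulated failure; a real JVM raises this when a referenced
            // class cannot be found at run time.
            throw new NoClassDefFoundError("io/grpc/protobuf/ProtoUtils");
        }

        static void create() {
            // Never reached, because initialization of Stub always fails.
        }
    }

    // Triggers (or re-triggers) initialization of Stub and reports the error.
    static String describeFailure() {
        try {
            Stub.create();
            return "ok";
        } catch (LinkageError e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // First use: the original error from the static initializer.
        System.out.println(describeFailure());
        // Second use: the class is marked as failed, so only a
        // "Could not initialize class ..." message is reported.
        System.out.println(describeFailure());
    }
}
```

In practice this means only the first trace identifies the missing dependency; the second one is a consequence of it and carries no additional information.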

@findepi
Member

findepi commented Aug 25, 2020

@losipiuk @ebyhr I put it wrong.
We do have tests. We just do not run them automatically.

@ebyhr could you add some notes to TestBigQueryIntegrationSmokeTest, or to a README,
explaining how to run it?

@ebyhr
Member

ebyhr commented Aug 25, 2020

@findepi Sure. Basically, I just run BigQueryRunner with the VM option -Dbigquery.credentials-file=<path-to-credential-json>.
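
For anyone scripting this, a fail-fast check on that VM option might look like the sketch below. It is only an illustration: `CredentialsFileCheck` is a hypothetical helper, not part of BigQueryRunner, and the only thing taken from the comment above is the property name `bigquery.credentials-file`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the Presto BigQuery plugin.
final class CredentialsFileCheck {
    // Property name as passed with -D on the JVM command line.
    static final String PROPERTY = "bigquery.credentials-file";

    // Returns the configured credentials path, or fails with a clear message.
    static Path requireCredentialsFile() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException(
                    "Missing VM option -D" + PROPERTY + "=<path-to-credential-json>");
        }
        Path path = Paths.get(value);
        if (!Files.isReadable(path)) {
            throw new IllegalStateException("Credentials file is not readable: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println("Using credentials file: " + requireCredentialsFile());
    }
}
```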

@wendigo wendigo force-pushed the serafin/update-bigquery-dependencies branch from f8ed562 to 0625d7f Compare August 26, 2020 09:43
@wendigo
Contributor Author

wendigo commented Aug 26, 2020

@ebyhr you were right after all :)

I tested the previous version of this PR, and there was indeed a problem with missing classes/dependencies. I've updated further, to version 8.0.0 of the BOM, and it seems to work fine. I was able to query an hourly partitioned table created like this (taken from the docs):

CREATE TABLE
   test.wendigo (transaction_id INT64,
     transaction_ts TIMESTAMP)
 PARTITION BY
   TIMESTAMP_TRUNC(transaction_ts, HOUR)
 OPTIONS
   ( partition_expiration_days=3,
     description="a table partitioned by transaction_ts" )

It's displayed in the UI as hourly partitioned:

[Screenshot 2020-08-26 at 11 45 02]

I was able to query it successfully using presto-cli (the first query was run before bumping the dependencies):

presto:test> select * from wendigo;

Query 20200826_093322_00000_kss8b, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
4.94 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20200826_093322_00000_kss8b failed: com/google/cloud/bigquery/storage/v1beta1/BigQueryStorageGrpc

presto:test> select * from wendigo;
 transaction_id |       transaction_ts
----------------+-----------------------------
           1337 | 2020-08-26 09:25:34.016 UTC
(1 row)

Query 20200826_094100_00000_nm56m, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
6.23 [1 rows, 12B] [0 rows/s, 2B/s]
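
To make the partitioning concrete, the sketch below shows which hourly bucket the row above would fall into under `TIMESTAMP_TRUNC(transaction_ts, HOUR)`. It is an illustration only: `HourlyPartitionSketch` is a hypothetical class, the truncation is done locally with `java.time` rather than by BigQuery, and the `yyyyMMddHH` partition ID format is an assumption based on how BigQuery labels hourly partitions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Hypothetical illustration; does not call BigQuery.
final class HourlyPartitionSketch {
    // Assumed hourly partition ID format (UTC).
    private static final DateTimeFormatter PARTITION_ID =
            DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    // Local equivalent of truncating a timestamp to the hour.
    static Instant truncateToHour(Instant ts) {
        return ts.truncatedTo(ChronoUnit.HOURS);
    }

    // Formats the hour bucket as a partition ID string.
    static String partitionId(Instant ts) {
        return PARTITION_ID.format(truncateToHour(ts));
    }

    public static void main(String[] args) {
        // The row returned by the successful query above.
        Instant ts = Instant.parse("2020-08-26T09:25:34.016Z");
        System.out.println(truncateToHour(ts)); // 2020-08-26T09:00:00Z
        System.out.println(partitionId(ts));    // 2020082609
    }
}
```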

@findepi
Member

findepi commented Aug 26, 2020

> I've been able to query HOUR-ly partitioned table created like this (taken from docs):
>
> CREATE TABLE
>    test.wendigo (transaction_id INT64,
>      transaction_ts TIMESTAMP)
>  PARTITION BY
>    TIMESTAMP_TRUNC(transaction_ts, HOUR)
>  OPTIONS
>    ( partition_expiration_days=3,
>      description="a table partitioned by transaction_ts" )
>
> It's displayed in UI as HOUR-ly partitioned:

Can this be added to TestBigQueryIntegrationSmokeTest?

@wendigo wendigo force-pushed the serafin/update-bigquery-dependencies branch from 0f4e5e6 to 89b8782 Compare August 26, 2020 12:44
@findepi findepi requested a review from ebyhr August 27, 2020 13:36
Member

@ebyhr ebyhr left a comment

LGTM except for minor comments.

@wendigo wendigo requested a review from ebyhr August 31, 2020 10:42
@wendigo wendigo force-pushed the serafin/update-bigquery-dependencies branch from 89b8782 to 87963bd Compare August 31, 2020 10:42
@losipiuk
Member

losipiuk commented Sep 1, 2020

#2390

@losipiuk losipiuk merged commit ae6c21b into trinodb:master Sep 1, 2020
@wendigo wendigo deleted the serafin/update-bigquery-dependencies branch September 1, 2020 13:50
@losipiuk losipiuk mentioned this pull request Sep 3, 2020
9 tasks
@martint martint added this to the 341 milestone Sep 9, 2020