Not able to use Z-Order spark extension #2612

Dearkano · 2022-05-10T00:20:23Z

Hi Kyuubi Community,

I'm trying to use Kyuubi's Z-Order extension on its own to optimize a Hive table stored in S3, but I ran into this issue.

1. I downloaded the jar from the release package and copied it into the jars folder in $SPARK_HOME.
2. In the Python script, I configured the session builder with the extension:
   spark = SparkSession.builder.config("spark.sql.extensions", "org.apache.kyuubi.sql.KyuubiSparkSQLExtension").getOrCreate()
3. I ran the OPTIMIZE command:
   OPTIMIZE parquet.`s3a://my_bucket/my_db/my_table/` ZORDER BY id, name

When I run the script, the following error occurs:
mismatched input 'OPTIMIZE' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

Kyuubi: 1.5.1
Spark: 3.2.1

I've been struggling with this for quite a while and can't find any clues. Any help would be much appreciated!
Thank you!

Answered by ulysses-you

May 10, 2022

hi @Dearkano look at the error message, it seems the kyuubi extension does not work, please make sure the config spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension actually apply. In case your spark session is from get rather than create, you can add the config into conf/spark-defaults.conf and restart the pyspark application.

Another thing you may need to know is, for now Kyuubi only support optimize Hive table. That said, we do not support the optimize sqlOnFiles which is your case.

Here is the Z-Order docs.

View full answer

ulysses-you · 2022-05-10T01:28:46Z
Collaborator

Hi @Dearkano, judging from the error message, the Kyuubi extension is not taking effect. Please make sure the config spark.sql.extensions=org.apache.kyuubi.sql.KyuubiSparkSQLExtension is actually applied. If your Spark session is an existing one returned by getOrCreate rather than a newly created one, the extension config has no effect; in that case, add the config to conf/spark-defaults.conf and restart the PySpark application.
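A quick way to see what the session believes its extensions are is to read spark.sql.extensions back (in PySpark, something like `spark.conf.get("spark.sql.extensions", "")`) and look for the Kyuubi class. The sketch below is a hypothetical helper (`has_extension` is not part of Spark or Kyuubi) that only parses the comma-separated value. Treat a match as necessary but not sufficient: when getOrCreate hands back an already-running session, the option may still appear in the conf even though the extension was never injected, so the definitive test remains whether OPTIMIZE parses.

```python
KYUUBI_EXTENSION = "org.apache.kyuubi.sql.KyuubiSparkSQLExtension"


def has_extension(extensions_value, class_name=KYUUBI_EXTENSION):
    """Check a spark.sql.extensions value for a given extension class.

    spark.sql.extensions takes a comma-separated list of class names, so the
    value is split on commas and each entry is compared after trimming spaces.
    Hypothetical helper, for illustration only.
    """
    if not extensions_value:
        return False
    return any(entry.strip() == class_name for entry in extensions_value.split(","))


# In a live PySpark session the value would come from something like
#   spark.conf.get("spark.sql.extensions", "")
print(has_extension(KYUUBI_EXTENSION))
print(has_extension("io.delta.sql.DeltaSparkSessionExtension, " + KYUUBI_EXTENSION))
print(has_extension(""))
```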

One more thing you should know: for now, Kyuubi only supports optimizing Hive tables. That means we do not support OPTIMIZE on path-based (SQL-on-files) queries, which is your case.
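Since only metastore (Hive) tables are supported, it can help to build the statement from a table name and reject path-based targets up front. This is a hypothetical helper (`build_optimize_sql` is our own name, not a Kyuubi API), and its identifier rules are a simplifying assumption rather than Spark SQL's real grammar. It only emits the `OPTIMIZE database.table ZORDER BY col1, col2` form used in this thread; the resulting string would then be passed to `spark.sql(...)` in a session where the extension is active.

```python
import re

# Simplified identifier rules (an assumption, not Kyuubi's grammar):
# optional "db." prefix, then letters, digits and underscores.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")
_COLUMN = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_optimize_sql(table, columns):
    """Return an OPTIMIZE ... ZORDER BY statement for a metastore table."""
    if "`" in table or "://" in table:
        # Path-based targets such as format.`s3a://bucket/path/` are
        # SQL-on-files, which the Kyuubi extension does not support.
        raise ValueError(f"path-based table not supported: {table}")
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table}")
    if not columns:
        raise ValueError("ZORDER BY needs at least one column")
    for col in columns:
        if not _COLUMN.match(col):
            raise ValueError(f"unexpected column name: {col}")
    return f"OPTIMIZE {table} ZORDER BY {', '.join(columns)}"


print(build_optimize_sql("my_db.my_table", ["id", "name"]))
# -> OPTIMIZE my_db.my_table ZORDER BY id, name
try:
    build_optimize_sql("parquet.`s3a://my_bucket/my_db/my_table/`", ["id", "name"])
except ValueError as err:
    print(err)
```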

Here are the Z-Order docs.

Dearkano · 2022-05-11T04:24:03Z
Author

Hi @ulysses-you, thank you for your reply!
I checked my code again; this time I tried it on AWS EMR with Spark 3.2.
I downloaded the jar and copied it into $SPARK_HOME/jars, then added
spark.sql.extensions org.apache.kyuubi.sql.KyuubiSparkSQLExtension
(with a space instead of =; I actually tried both)
and then ran the script to optimize the table:
OPTIMIZE database.table ZORDER BY col1, col2
The same error occurred again:
mismatched input 'OPTIMIZE' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 0)

I also confirmed that the jar is in the right place: if I remove the jar and run the script again, the following error occurs instead:

22/05/11 04:34:38 WARN SparkSession: Cannot use org.apache.kyuubi.sql.KyuubiSparkSQLExtension to configure session extensions.
java.lang.ClassNotFoundException: org.apache.kyuubi.sql.KyuubiSparkSQLExtension
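The ClassNotFoundException above shows that the class disappears when the jar is removed, so the jar itself is being picked up. To see exactly which jar(s) in a directory provide the class, for example to spot a second copy, or a file name suggesting a build for a different Spark version, a small standard-library scan is enough. `jars_containing` is a hypothetical helper, and it inspects only the one directory you give it, not the full classpath Spark assembles.

```python
import glob
import os
import zipfile

# Path of org.apache.kyuubi.sql.KyuubiSparkSQLExtension inside a jar archive
KYUUBI_CLASS_ENTRY = "org/apache/kyuubi/sql/KyuubiSparkSQLExtension.class"


def jars_containing(jar_dir, entry=KYUUBI_CLASS_ENTRY):
    """Return the jars in jar_dir whose archive contains the given class file."""
    hits = []
    for path in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid zip/jar archives
    return hits


if __name__ == "__main__":
    # Scans $SPARK_HOME/jars; prints an empty list if nothing matches.
    spark_home = os.environ.get("SPARK_HOME", "")
    print(jars_containing(os.path.join(spark_home, "jars")))
```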

Also, regarding your point that Kyuubi only supports Hive tables: I'm using Hive tables on S3 with Glue as the metastore, queried by Spark/Trino (no HiveServer or Thrift service). My intention is to take advantage of Kyuubi's Z-Order optimization on Parquet files without deploying the whole of Kyuubi. Is this supported by Kyuubi? Thank you again!
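For background on what ZORDER BY aims for: the usual idea behind Z-ordering is to sort rows by a Morton code, obtained by interleaving the bits of the sort columns, so that rows close in every column tend to land in the same files and per-file min/max statistics stay narrow on all of them, which helps data skipping on Parquet. The sketch below computes that code for two non-negative integer columns only. It illustrates the general technique and is not Kyuubi's implementation; it does not show how other column types, such as the string `name` column above, are mapped.

```python
def interleave_bits(x, y, bits=16):
    """Return the Z-order (Morton) value of two non-negative integers.

    Bit i of x goes to position 2*i + 1 and bit i of y to position 2*i,
    so both values must fit in `bits` bits.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if x >= 1 << bits or y >= 1 << bits:
        raise ValueError(f"values must fit in {bits} bits")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


# Hypothetical (col1, col2) rows; sorting by the Z-value clusters them
# on both columns at once instead of on the first column only.
rows = [(3, 1), (0, 0), (1, 3), (2, 2), (1, 0), (0, 1)]
for row in sorted(rows, key=lambda r: interleave_bits(*r)):
    print(row, interleave_bits(*row))
```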

1 reply

ulysses-you May 11, 2022
Collaborator

then I added spark.sql.extensions org.apache.kyuubi.sql.KyuubiSparkSQLExtension

Can you explain the details? I looked at the AWS EMR Spark docs (sorry, I have not used EMR myself). Have you tried putting the config into the spark-defaults classification, which sets values in the spark-defaults.conf file?
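For reference, Spark reads spark-defaults.conf with java.util.Properties rules, under which a key can be separated from its value by whitespace, `=` or `:` (Spark then trims the value), so both forms tried above should be read the same way. The parser below is a simplified sketch of those rules, without escapes or line continuations, handy for checking which key and value a given line actually yields. `parse_spark_defaults` is a hypothetical helper, not Spark's loader.

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict (simplified).

    The key ends at the first whitespace, '=' or ':'; surrounding whitespace
    and one separator are dropped. Escapes and line continuations are NOT
    handled, unlike the real java.util.Properties loader.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        end = 0
        while end < len(line) and line[end] not in " \t=:":
            end += 1
        key = line[:end]
        rest = line[end:].lstrip(" \t")
        if rest[:1] in ("=", ":"):
            rest = rest[1:].lstrip(" \t")
        props[key] = rest
    return props


sample = """
# Both separator styles; the memory value is a made-up example
spark.sql.extensions org.apache.kyuubi.sql.KyuubiSparkSQLExtension
spark.executor.memory=4g
"""
conf = parse_spark_defaults(sample)
print(conf["spark.sql.extensions"])
print(conf["spark.executor.memory"])
```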

Also, could you provide a screenshot of the Environment tab in the Spark UI?

Dearkano · 2022-05-12T00:13:51Z
Author

Hi @ulysses-you, thank you for investigating!
Just one quick question: since Z-Order only supports Hive tables, that means we cannot optimize Delta tables in Delta Lake, right?

2 replies

ulysses-you May 12, 2022
Collaborator

That's right. However, Delta already plans to support Z-Order optimization; see its roadmap, delta-io/delta#920. It can be expected in the future.

Dearkano May 12, 2022
Author

Thank you, this is very helpful!
