
Table schema is not written to hive metastore #695

Closed
bn2302 opened this issue Jun 11, 2021 · 14 comments
Labels
acknowledged This issue has been read and acknowledged by Delta admins

Comments


bn2302 commented Jun 11, 2021

Description
When writing a Delta table using PySpark, the table schema is not written into the Hive metastore. When querying the table through the Spark Thrift Server via JDBC, I can't see the columns.


Steps
The table is created using:

df.write.format("delta").saveAsTable("mytable")

Results
The information in the Hive metastore is as follows:

SELECT * FROM TABLE_PARAMS

| TBL_ID | PARAM_KEY | PARAM_VALUE |
|---|---|---|
| 5 | spark.sql.create.version | 3.1.2 |
| 5 | numFiles | 6 |
| 5 | spark.sql.sources.provider | delta |
| 5 | transient_lastDdlTime | 1623398918 |
| 5 | totalSize | 2688705839 |
| 5 | spark.sql.partitionProvider | catalog |
| 5 | spark.sql.sources.schema.numParts | 1 |
| 5 | spark.sql.sources.schema.part.0 | {"type":"struct","fields":[]} |

with the following warning in the spark session:

21/06/11 07:08:37 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table default.myname into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.

Environments
Tested configurations:

PySpark: 3.1.2
Hadoop: 3.2.0
Delta: 1.0.0
Hive metastore: 3.0.0

PySpark: 3.1.2
Hadoop: 3.2.0
Delta: 1.0.0
Hive metastore: 2.7.3

PySpark: 3.1.1
Hadoop: 3.2.0
Delta: 1.0.0
Hive metastore: 2.7.3

Settings
conf.set("spark.hadoop.hive.metastore.client.connect.retry.delay", "5")
conf.set("spark.hadoop.hive.metastore.client.socket.timeout", "1800")
conf.set("spark.hadoop.hive.metastore.uris", "thrift://metastore.hive-metastore.svc.cluster.local:9083")
conf.set("spark.hadoop.hive.input.format", "io.delta.hive.HiveInputFormat")
conf.set("spark.hadoop.hive.tez.input.format", "io.delta.hive.HiveInputFormat")
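
For context, a rough PySpark sketch of how a session with these settings and Delta enabled is typically assembled; the app name and sample DataFrame are hypothetical, and the Delta extension/catalog configs are the standard ones rather than values confirmed in this report:

```python
from pyspark.sql import SparkSession

# Minimal session sketch: external Hive metastore plus the standard Delta Lake
# extension and catalog settings (assumed here, not taken from the report).
spark = (
    SparkSession.builder
    .appName("delta-metastore-repro")  # hypothetical app name
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .config("spark.hadoop.hive.metastore.uris",
            "thrift://metastore.hive-metastore.svc.cluster.local:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# Writing a DataFrame as a Delta table registers it in the metastore, but only
# in Spark's own format: the persisted schema parts come out empty, as the
# TABLE_PARAMS dump above shows.
df = spark.range(5).withColumnRenamed("id", "value")
df.write.format("delta").mode("overwrite").saveAsTable("mytable")
```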


azsefi commented Jun 11, 2021

This happens because the Thrift Server's SparkGetColumnsOperation currently uses the old SessionCatalog, which relies on the Hive metastore. Instead, TableCatalog should be used, which delegates to the correct catalog for different table providers (the Delta provider in your case).
https://github.com/apache/spark/blob/e958833c727442fc9efa4fc92f93db16cd5c8476/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala#L57
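
To make the two code paths concrete, a rough PySpark illustration (a sketch using the `mytable` name from the report above, not something verified here):

```python
# Inside the Spark session, the Delta source resolves the schema from the
# table's _delta_log, so the columns show up fine:
spark.table("mytable").printSchema()
spark.sql("DESCRIBE TABLE mytable").show(truncate=False)

# The JDBC GetColumns metadata call, however, is answered by
# SparkGetColumnsOperation from the Hive-backed SessionCatalog entry, whose
# persisted schema (spark.sql.sources.schema.part.0 above) is empty, hence
# the blank column list in the client.
```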


bn2302 commented Jun 11, 2021

As a workaround, I created a view on the Delta table:

spark.sql("CREATE OR REPLACE VIEW mytable_view AS SELECT * FROM mytable;")

This allows me to view the columns and also query the data from Power BI.
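
For anyone sanity-checking the view workaround from within the session, a rough sketch (not part of the original comment):

```python
# Recreate the workaround view and confirm its columns resolve before
# pointing JDBC clients (e.g. Power BI via the Thrift Server) at it.
spark.sql("CREATE OR REPLACE VIEW mytable_view AS SELECT * FROM mytable")
spark.sql("DESCRIBE mytable_view").show(truncate=False)
```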


ghost commented Jun 13, 2021

The view creation worked for me using DBeaver, though I have a problem with PowerBI reading the views. Can you tell me the driver and settings you are using to connect PowerBI?


bn2302 commented Jun 15, 2021

I use the following settings:
Server: myserver:10000
Protocol: Standard
Data connection mode: DirectQuery

Since I haven't configured any auth for the STS yet, I just put in an arbitrary user/password combination.

@Data-drone

> This happens because the Thrift Server's SparkGetColumnsOperation currently uses the old SessionCatalog, which relies on the Hive metastore. Instead, TableCatalog should be used, which delegates to the correct catalog for different table providers (the Delta provider in your case).
> https://github.com/apache/spark/blob/e958833c727442fc9efa4fc92f93db16cd5c8476/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala#L57

Is there a way to fix it through the way we start the Thrift Server?


azsefi commented Jun 22, 2021 via email


zsxwing commented Jul 9, 2021

Since this is a Spark issue, it would be great to open a new ticket with the Spark community instead. You can raise a ticket at https://issues.apache.org/jira/projects/SPARK

@dennyglee

By any chance, @bn2302, were you able to create a new ticket in the Spark community? It would be great if you could link it here prior to closing. Thanks!

@dennyglee dennyglee added acknowledged This issue has been read and acknowledged by Delta admins need author feedback Issue is waiting for the author to respond labels Oct 12, 2021
@hanna-liashchuk

I've created a Spark issue here: https://issues.apache.org/jira/browse/SPARK-37648
Meanwhile, @azsefi, could you please share how you fixed it on your side? I could probably use it until a proper fix lands.

@dennyglee dennyglee removed the need author feedback Issue is waiting for the author to respond label Dec 16, 2021
@dennyglee

Thanks, @hanna-liashchuk - appreciate you creating the issue. And yes, @azsefi, if you could share what you did, that would be super helpful. Thanks!


pan3793 commented Dec 20, 2021

Hi @bn2302, we have a workaround for this issue in Apache Kyuubi (Incubating): apache/kyuubi#1476
Kyuubi can be considered a more powerful Spark Thrift Server; it's worth a try.

@dennyglee

Closing this issue as it is a Spark bug; please re-open if this is incorrect. Thanks!

@nitindatta

Hi, I looked at this issue in Spark and it is still open. Apart from creating a view over the table, is anyone aware of any other way to get the Delta table schema directly into Hive?

@felipepessoto

Isn't this fixed by #2409?

Related: #1478
