Table schema is not written to hive metastore #695
Comments
It is because the Thrift server's SparkGetColumnsOperation currently uses the old SessionCatalog object, which relies on the Hive metastore. Instead, TableCatalog should be used, which would delegate to the correct catalog for different table providers (the delta provider in your case): https://github.com/apache/spark/blob/e958833c727442fc9efa4fc92f93db16cd5c8476/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala#L57
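To illustrate the distinction: the schema is visible from Spark SQL itself, which resolves the Delta table through its own catalog path; it is only the Thrift server's column-metadata operation that misses it. A minimal sketch, assuming the mytable example from the description below:

```python
# Hypothetical check from the same PySpark session that wrote the table:
# DESCRIBE goes through Spark's own table resolution, so the Delta columns
# show up here even though a JDBC getColumns call against the Thrift server
# returns nothing for the same table.
spark.sql("DESCRIBE TABLE default.mytable").show(truncate=False)
```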
As a workaround, I created a view on the Delta table. This allows me to view the columns and also query the data from Power BI.
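A minimal sketch of that view workaround (the view name and the spark.sql wrapping are assumptions for illustration, not what the commenter actually ran):

```python
# Expose the Delta table through a plain view so that JDBC clients such as
# Power BI can read column metadata from the Thrift server.
spark.sql(
    "CREATE OR REPLACE VIEW default.mytable_view AS SELECT * FROM default.mytable"
)
```

The same CREATE VIEW statement can also be issued over JDBC (e.g. from beeline or DBeaver) against the Thrift server.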
The view creation worked for me using DBeaver, though I have a problem with Power BI reading the views. Can you tell me the driver and settings you are using to connect Power BI?
I use the following settings: since I didn't configure any auth for the STS yet, I just put in an arbitrary user/password combination.
Is there a way to fix it through the way we start the Thrift server?
I don't think so. We've changed SparkGetColumnsOperation.scala to use TableCatalog internally.
Since this is a Spark issue, it would be great to open a new ticket with the Spark community instead. You can raise a ticket at https://issues.apache.org/jira/projects/SPARK
By any chance, @bn2302, were you able to create a new ticket with the Spark community? It would be great if you could tag it here prior to closing. Thanks!
I've created the Spark issue here: https://issues.apache.org/jira/browse/SPARK-37648
Thanks, @hanna-liashchuk - appreciate you creating the issue. And yes, @azsefi - if you could share what you did, that would be super helpful. Thanks!
Hi @bn2302, we have a workaround for this issue in Apache Kyuubi (Incubating): apache/kyuubi#1476
Closing this issue as it is a Spark bug; please re-open if this is incorrect. Thanks!
Hi, I looked at this issue in Spark and it is still open. Is anyone aware of any other way, apart from creating a view over the table, to directly get the Delta table schema in Hive?
Description
When writing a Delta table using PySpark, the table schema is not written to the Hive metastore. When querying the table through the Spark Thrift Server via JDBC, I can't see the columns.
Steps
The table is created using:
df.write.format("delta").saveAsTable("mytable")
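For completeness, a self-contained sketch of the reproduction step (the session setup and sample data are assumptions for illustration; the report only states the saveAsTable call):

```python
from pyspark.sql import SparkSession

# Assumed session setup: Delta Lake extensions enabled and Hive support on,
# so that saveAsTable registers the table in the external Hive metastore.
spark = (
    SparkSession.builder
    .appName("delta-schema-repro")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical sample data; any DataFrame reproduces the behaviour.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# The step from the issue: write the DataFrame as a Delta table registered
# in the metastore.
df.write.format("delta").saveAsTable("mytable")
```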
Results
The information in the Hive metastore is as follows:
SELECT * FROM TABLE_PARAMS
with the following warning in the spark session:
21/06/11 07:08:37 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `default`.`myname` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
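The warning suggests Spark persisted the table in its own format rather than as Hive-compatible column metadata. One way to inspect what was actually stored (a sketch, assuming the mytable example and the session from the reproduction snippet above; not verified output from the reporter's environment):

```python
# Inspect the metadata Spark registered for the table. Which fields are
# populated here, versus in the Hive-level column list, is exactly what the
# warning above is about; treat this as an exploratory check.
spark.sql("DESCRIBE EXTENDED default.mytable").show(truncate=False)
spark.sql("SHOW TBLPROPERTIES default.mytable").show(truncate=False)
```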
Environments
Tested configurations
Pyspark 3.1.2, Hadoop 3.2.0, Delta 1.0.0, Hive Metastore 3.0.0
Pyspark 3.1.2, Hadoop 3.2.0, Delta 1.0.0, Hive Metastore 2.7.3
Pyspark 3.1.1, Hadoop 3.2.0, Delta 1.0.0, Hive Metastore 2.7.3
Settings
conf.set("spark.hadoop.hive.metastore.client.connect.retry.delay", "5")
conf.set("spark.hadoop.hive.metastore.client.socket.timeout", "1800")
conf.set("spark.hadoop.hive.metastore.uris", "thrift://metastore.hive-metastore.svc.cluster.local:9083")
conf.set("spark.hadoop.hive.input.format", "io.delta.hive.HiveInputFormat")
conf.set("spark.hadoop.hive.tez.input.format", "io.delta.hive.HiveInputFormat")
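For context, a sketch of how such settings are typically attached to the session (the SparkConf and builder wiring are assumptions; only the conf.set calls above are from the report):

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Assumed wiring: collect the reported settings on a SparkConf and hand it
# to the session builder, with Hive support enabled so the external
# metastore at the configured thrift URI is used.
conf = SparkConf()
conf.set("spark.hadoop.hive.metastore.client.connect.retry.delay", "5")
conf.set("spark.hadoop.hive.metastore.client.socket.timeout", "1800")
conf.set("spark.hadoop.hive.metastore.uris",
         "thrift://metastore.hive-metastore.svc.cluster.local:9083")
conf.set("spark.hadoop.hive.input.format", "io.delta.hive.HiveInputFormat")
conf.set("spark.hadoop.hive.tez.input.format", "io.delta.hive.HiveInputFormat")

spark = (
    SparkSession.builder
    .config(conf=conf)
    .enableHiveSupport()
    .getOrCreate()
)
```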