
Can I create hive table on top of delta? #18

Closed
skp33 opened this issue Apr 29, 2019 · 11 comments
Labels
question Questions on how to use Delta Lake

Comments

@skp33
Contributor

skp33 commented Apr 29, 2019

We have a data lake based on a lambda architecture that handles both real-time and batch data sinks, and we use Hive as the datastore.

So my question is: if I use Delta, is it possible to create a Hive table on top of it?

@tdas
Contributor

tdas commented Apr 29, 2019

Hive metastore support is not yet available in 0.1.0, but we want to make Delta work well with the Hive metastore soon.

@tdas tdas added the question Questions on how to use Delta Lake label Apr 29, 2019
@skp33
Contributor Author

skp33 commented May 2, 2019

Thanks @tdas, looking forward to seeing this feature.

@nonsleepr

You can create an external Hive table manually, though you would have to add the partitions manually as well.

E.g.:

CREATE EXTERNAL TABLE delta_table (
  col1 STRING,
  col2 INT)
PARTITIONED BY (`date` INT, hour INT)
STORED AS PARQUET
LOCATION '/path/to/delta/table/location';

ALTER TABLE delta_table ADD PARTITION (`date`=20190506, hour=10);
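Each partition has to be registered with its own statement like the one above. As a minimal sketch, assuming the partition directories use Hive's `column=value` naming under the table location and that every partition value is an integer, a small Python helper could generate one statement per directory. The name `add_partition_ddl` is hypothetical, not part of Hive or Delta.

```python
from pathlib import PurePosixPath


def add_partition_ddl(table: str, partition_dir: str) -> str:
    """Build a Hive ALTER TABLE ... ADD PARTITION statement for one
    partition directory such as 'date=20190506/hour=10'.

    Hypothetical helper. Assumes every path segment is `column=value` and
    every value is a non-negative integer literal (string partition values
    would need quoting, which this sketch does not do).
    """
    specs = []
    for segment in PurePosixPath(partition_dir).parts:
        column, sep, value = segment.partition("=")
        if not sep or not column or not value.isdigit():
            raise ValueError(f"unexpected partition segment: {segment!r}")
        # Backticks protect reserved words such as `date`.
        specs.append(f"`{column}`={value}")
    return f"ALTER TABLE {table} ADD PARTITION ({', '.join(specs)});"


if __name__ == "__main__":
    for d in ["date=20190506/hour=10", "date=20190506/hour=11"]:
        print(add_partition_ddl("delta_table", d))
```

Hive's `MSCK REPAIR TABLE delta_table` is the built-in alternative for discovering partition directories in bulk; either way, this only automates registration and says nothing about which files belong to the current version of the Delta table.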

@mukulmurthy
Collaborator

Hi @nonsleepr,

Reading a Delta table by looking at the files directly is not guaranteed to return a consistent snapshot of the table. Currently, the only canonical way to read a Delta table is through the streaming or batch DataFrame reader APIs.
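To see why listing files can disagree with the table, consider how a Delta snapshot is defined: the transaction log under `_delta_log/` records `add` and `remove` actions, and the current version consists of the files that were added and not later removed. A removed file stays on disk until it is vacuumed, so a Hive table pointed at the directory would read it too. Below is a minimal sketch that replays simplified newline-delimited JSON commits; it ignores checkpoints and every other action type, and `replay_log` is a hypothetical name, not part of Delta's API.

```python
import json


def replay_log(commits: list[str]) -> set[str]:
    """Return the data files referenced by the latest snapshot.

    Each element of `commits` is the text of one commit file, with one JSON
    action per line, in version order. Only `add` and `remove` actions are
    handled; checkpoints, metadata, and protocol actions are ignored.
    """
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


if __name__ == "__main__":
    commits = [
        # Version 0: initial write of two files.
        '{"add": {"path": "part-0000.parquet"}}\n'
        '{"add": {"path": "part-0001.parquet"}}',
        # Version 1: logically deletes part-0000 and adds part-0002,
        # as an overwrite or compaction might.
        '{"remove": {"path": "part-0000.parquet"}}\n'
        '{"add": {"path": "part-0002.parquet"}}',
    ]
    on_disk = {"part-0000.parquet", "part-0001.parquet", "part-0002.parquet"}
    snapshot = replay_log(commits)
    print(sorted(snapshot))            # ['part-0001.parquet', 'part-0002.parquet']
    print(sorted(on_disk - snapshot))  # ['part-0000.parquet'], stale but still readable by Hive
```

A reader that goes through the log sees only the two live files, while an external table over the directory would also read the stale `part-0000.parquet` and return duplicate or outdated rows.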

@spmp

spmp commented Jul 23, 2019

Quoting @tdas: "Hive metastore support is still not yet available in 0.1.0 but we want to make it work well with Hive Metastore soon."

@tdas, can you give any indication of how soon "soon" is, please? Is this work happening in a branch somewhere that we can try out and possibly contribute to?

Otherwise, is there a recommended intermediate path, such as Delta -> JDBC -> Hive?

Cheers

@prabgemini

Any update on when this feature will be available?

Also, now that Delta Lake supports vacuum, can we assume that if we run vacuum on a Delta table and then create an external Hive table on top of it, the Hive table will be consistent?

@tdas
Contributor

tdas commented Aug 22, 2019

We are working with the Spark community to add all the pluggable interfaces needed for Delta table support. We hope this will be available with Spark 3.0.0, which is targeted for Q4.

You could use "Vacuum with 0 retention" to clean off all data not needed by the latest version. After that, it is possible to treat it as an external Parquet table in the Hive metastore. Just do not run vacuum while a concurrent write to the table is in progress, as yet-to-be-committed files may get deleted.
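Both halves of this advice follow from what a zero-retention vacuum does: it deletes every file in the table directory that the latest snapshot does not reference, and a file written by an in-flight transaction is not referenced yet. The toy simulation below works on sets of file names rather than a real file system. `vacuum_zero_retention` is a hypothetical helper, not Delta's implementation, which also applies a retention threshold based on file age that this sketch ignores.

```python
def vacuum_zero_retention(
    files_on_disk: set[str], snapshot: set[str]
) -> tuple[set[str], set[str]]:
    """Toy model of vacuum with a 0-hour retention period.

    Every file not referenced by the latest snapshot is deleted
    immediately, regardless of its age. Returns (kept, deleted).
    """
    deleted = files_on_disk - snapshot
    kept = files_on_disk & snapshot
    return kept, deleted


if __name__ == "__main__":
    snapshot = {"part-0001.parquet", "part-0002.parquet"}
    on_disk = {
        "part-0000.parquet",  # removed by an earlier commit, still on disk
        "part-0001.parquet",
        "part-0002.parquet",
        "part-0003.parquet",  # written by a concurrent writer, not yet committed
    }
    kept, deleted = vacuum_zero_retention(on_disk, snapshot)
    # The directory now matches the snapshot, so an external Parquet
    # table over it sees exactly the latest version...
    assert kept == snapshot
    # ...but the in-flight writer's file is deleted before its commit lands.
    print(sorted(deleted))  # ['part-0000.parquet', 'part-0003.parquet']
```

Once the concurrent writer commits `part-0003.parquet`, the log would reference a file that no longer exists, which is why vacuum should not overlap with writes.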

@ivoson

ivoson commented Jan 19, 2020

Hi @tdas,
do we have any update on this topic? It would be great if you could share anything new. Thanks!

@tdas
Contributor

tdas commented Jan 20, 2020

It's hard to share anything before Spark 3.0 has been finalized in some form, since it is necessary for Hive metastore support to work. Spark 3.0 is still a few months away from release. That said, once there is an RC for Spark 3.0, we will try to have a branch in this repo with all the changes needed to migrate Delta to Spark 3.0. Then there may be something for you to try out.

LantaoJin added a commit to LantaoJin/delta that referenced this issue Mar 24, 2020
@pranavanand
Contributor

#85 is the issue tracking metastore support.

@tdas
Contributor

tdas commented Mar 26, 2020

I am closing this issue in favor of #85. Please subscribe to that issue.

@tdas tdas closed this as completed Mar 26, 2020
LantaoJin added a commit to LantaoJin/delta that referenced this issue Mar 12, 2021
tdas pushed a commit to tdas/delta that referenced this issue May 31, 2023
Add build support for Scala 2.11.

Closes delta-io#18

8 participants