Can I create a Hive table on top of Delta? #18

We have a data lake based on the lambda architecture to handle both real-time and batch data sinks, and we are using Hive as the datastore. So my question is: if I use Delta, is it possible to create a Hive table on top of it?
Hive metastore support is not yet available in 0.1.0, but we want to make Delta work well with the Hive metastore soon.
Thanks @tdas, looking forward to seeing this feature.
You can manually create an external Hive table; you would have to create the partitions manually as well, though. E.g.:

-- External table over the Delta table's data files, read as plain Parquet
CREATE EXTERNAL TABLE delta_table (
  col1 STRING,
  col2 INT)
PARTITIONED BY (`date` INT, hour INT)
STORED AS PARQUET
LOCATION '/path/to/delta/table/location';

-- Partitions are not discovered automatically; each one must be added by hand
ALTER TABLE delta_table ADD PARTITION (`date`=20190506, hour=10);
Hi @nonsleepr. Reading a Delta table by looking at its files directly is not guaranteed to return a consistent snapshot of the table. The only currently canonical way to read a Delta table is to go through the streaming or batch DataFrame reader APIs.
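For concreteness, here is a minimal Scala sketch of those two reader paths, assuming Delta Lake is on the classpath; the table path is the same placeholder used in the DDL above:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-read-sketch")
  .getOrCreate()

// Batch read: returns a consistent snapshot of the Delta table.
val snapshot = spark.read
  .format("delta")
  .load("/path/to/delta/table/location")
snapshot.show()

// Streaming read: tails the table as new commits arrive.
val changes = spark.readStream
  .format("delta")
  .load("/path/to/delta/table/location")
changes.writeStream
  .format("console")
  .start()
```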
@tdas, can you give any indication of how soon "soon" is, please? Is this work happening in a branch somewhere that we could try out and possibly contribute to? Otherwise, is there a recommended intermediate path, such as Delta -> JDBC -> Hive? Cheers
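For what it's worth, the "Delta -> JDBC -> Hive" idea would look roughly like the sketch below; the JDBC URL, table name, and credentials are hypothetical placeholders, and this copies a one-off snapshot rather than keeping Hive in sync:

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-to-jdbc-sketch")
  .getOrCreate()

// Take a consistent snapshot of the Delta table through the batch reader...
val snapshot = spark.read
  .format("delta")
  .load("/path/to/delta/table/location")

// ...and copy it into a JDBC-accessible database that Hive can also reach.
// The URL, table name, and credentials below are placeholders.
val props = new Properties()
props.setProperty("user", "warehouse_user")
props.setProperty("password", "warehouse_password")
snapshot.write
  .mode("overwrite")
  .jdbc("jdbc:postgresql://db-host:5432/warehouse", "delta_table_snapshot", props)
```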
Any update on when this feature will be available? Also, now that Delta Lake supports vacuum, can we assume that if we run vacuum on a Delta table and then create an external Hive table on top of it, the result will be consistent?
We are working with the Spark community to add all the pluggable interfaces needed to add table support for Delta. We are hoping this will be available with Spark 3.0.0, which is targeted for Q4. You could use vacuum with 0 retention to clean off all data not needed by the latest version. After that, it is possible to treat the table as an external Parquet table in the Hive metastore. Just do not run vacuum while a concurrent write to the table is in progress, as yet-to-be-committed files may get deleted.
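As a sketch of that "vacuum with 0 retention" step, assuming a Delta Lake release that ships the Scala `DeltaTable` API: the safety check that normally rejects short retention periods has to be disabled explicitly.

```scala
import io.delta.tables.DeltaTable

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("vacuum-zero-retention-sketch")
  .getOrCreate()

// By default Delta refuses retention periods shorter than 7 days;
// 0-hour retention requires disabling that check. Only do this when no
// concurrent writes are running, or uncommitted files may be deleted.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

// Remove every file not referenced by the latest version of the table.
DeltaTable.forPath(spark, "/path/to/delta/table/location").vacuum(0)
```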
Hi, @tdas
It's hard to share anything before Spark 3.0 has been finalized in any form, since that is necessary for all the Hive metastore support to work. Spark 3.0 is still a few months away from being released. That said, once there is an RC for Spark 3.0, we will try to have a branch in this repo with all the changes needed to migrate Delta to Spark 3.0. Then there may be something for you to try out.
#85 is the issue tracking metastore support.
I am closing this issue in favor of #85. Please subscribe to that issue.