[QST] Does Spark RAPIDS support Delta Lake? #5340

Answered by revans2
Niharikadutta asked this question in General

We have not been explicitly testing Delta Lake. Reading should, for the most part, just work. I am not sure about writing, though.

Delta Lake stores the data in Parquet format internally, along with metadata stored in a combination of JSON and Parquet. When reading the data, the metadata is queried and cached. This metadata query often involves JSON data, which the RAPIDS Accelerator does not yet support, so the query ends up running either entirely off the GPU or only partially on the GPU. The amount of metadata is generally small, so this has little impact on the overall performance of the read.
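To make the metadata step concrete, here is a minimal sketch of how a reader could work out which Parquet files make up a table by replaying the JSON commit files under `_delta_log`. It is an illustration only, not how Spark or the RAPIDS Accelerator implements it: the function names `live_data_files` and `write_demo_log` are hypothetical, the demo commits carry only the `path` field of each `add`/`remove` action, and Parquet checkpoint files are ignored entirely.

```python
import json
import tempfile
from pathlib import Path


def live_data_files(table_path):
    """Replay the JSON commits in _delta_log and return the data files still in the table.

    Simplified sketch: only `add` and `remove` actions are handled, and
    Parquet checkpoints are not read.
    """
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Commit files are named with a zero-padded version number, so sorting
    # by name yields them in commit order.
    commits = sorted(p for p in log_dir.glob("*.json") if p.stem.isdigit())
    for commit in commits:
        # Each commit file holds one JSON action per line.
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)


def write_demo_log(table_path):
    """Hypothetical helper: write two tiny commits so the sketch can run without Spark."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True)
    commits = [
        # Version 0 adds two data files.
        [{"add": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0001.parquet"}}],
        # Version 1 replaces the first file with a new one.
        [{"remove": {"path": "part-0000.parquet"}},
         {"add": {"path": "part-0002.parquet"}}],
    ]
    for version, actions in enumerate(commits):
        body = "\n".join(json.dumps(a) for a in actions)
        (log_dir / f"{version:020d}.json").write_text(body)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_demo_log(tmp)
        print(live_data_files(tmp))
```

The log is small line-delimited JSON, while the files it points to are ordinary Parquet. That is why the metadata step can stay off the GPU without much cost while the bulk of the read is still a Parquet scan.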

As for writes, it has been a while, so I am not sure I remember exactly what happens. I'll try to test it a…

Replies: 5 comments

Answer selected by sameerz
Labels
question Further information is requested
2 participants
Converted from issue

This discussion was converted from issue #3185 on April 27, 2022 16:49.