
[QST] ETL performance #5379

Answered by revans2
rlu-aa asked this question in General

So there are several issues here and we are working on all of them in one form or another. The reason for the initial slowness comes down to moving data between the GPU and the CPU.

In Spark 3.0 there is no way to accelerate dataframe.persist, so all we can do is convert the data back to rows and let the existing Spark code handle it. In Spark 3.1 we were able to work with the Spark community to make this part of Spark pluggable, and we are in the process of implementing code for that. But because of the API requirements, this will not be available until after Spark 3.1 is released.
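
For anyone wondering what opting in to a pluggable cache could look like once it is available, here is a minimal sketch. It assumes the static Spark 3.1 setting `spark.sql.cache.serializer` and the serializer class name documented for the RAPIDS Accelerator (`com.nvidia.spark.ParquetCachedBatchSerializer`); neither is spelled out in the answer above, so check both against the versions you run. The helper `cache_serializer_conf` and the script name `etl_job.py` are hypothetical and only build `spark-submit` flags.

```python
from typing import List


def cache_serializer_conf(enable_gpu_cache: bool = True) -> List[str]:
    """Hypothetical helper: build spark-submit --conf flags for a GPU-enabled job."""
    conf = {
        # Load the RAPIDS Accelerator plugin and turn on GPU SQL acceleration.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
    }
    if enable_gpu_cache:
        # Assumption: this is a static conf, so it has to be set at launch,
        # before the SparkSession exists. Without it, persist() uses the
        # default Spark cache path, which means a columnar-to-row conversion.
        conf["spark.sql.cache.serializer"] = (
            "com.nvidia.spark.ParquetCachedBatchSerializer"
        )
    flags: List[str] = []
    for key, value in conf.items():
        flags += ["--conf", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Print the full command line instead of running it.
    print(" ".join(["spark-submit", *cache_serializer_conf(), "etl_job.py"]))
```

Passing `enable_gpu_cache=False` leaves the serializer unset, which makes it easy to compare a run with and without the cache path when measuring the ETL job.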

The UDF also has a similar issue in that we do not have a generic compiler from Java to CUDA. This is a very difficul…

Replies: 4 comments

Answer selected by sameerz
Labels
question Further information is requested
3 participants
Converted from issue

This discussion was converted from issue #1021 on April 28, 2022 22:59.