[QST]Which operations of Spark DataFrame are suitable for GPU acceleration? #5377

Answered by jlowe
YeahNew asked this question in General

I tried adding timestamps before and after an operation, such as filter, join, or agg, to calculate the operation's execution time.

Note that Spark normally executes in a row-by-row fashion, while the RAPIDS Accelerator operates on columnar batches. Can you elaborate more on how you isolated the timing for these operations? It's easy to accidentally measure more than what was intended (i.e.: also the cost of the operations producing the input).
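A further complication is that Spark DataFrame transformations are lazily evaluated: wrapping filter or join in timestamps typically measures only query-plan construction, while timing the action that triggers execution measures the entire upstream pipeline. The same pitfall can be sketched in plain Python with generators (no Spark APIs; the row counts and sleep are illustrative):

```python
import time

def slow_source(n):
    """Simulates an expensive upstream stage (e.g., a file scan)."""
    for i in range(n):
        time.sleep(0.001)  # pretend each row is costly to produce
        yield i

# "Transformation": lazy, does no work yet (analogous to DataFrame.filter).
t0 = time.perf_counter()
filtered = (x for x in slow_source(500) if x % 2 == 0)
build_time = time.perf_counter() - t0

# "Action": forces the WHOLE pipeline to run (analogous to .count()).
t1 = time.perf_counter()
count = sum(1 for _ in filtered)
run_time = time.perf_counter() - t1

# build_time is near zero; run_time includes the upstream "scan",
# so it is NOT the cost of the filter alone.
print(f"build: {build_time:.4f}s  run: {run_time:.4f}s  count: {count}")
```

In Spark terms, isolating one operator's cost usually means looking at per-operator metrics in the SQL tab of the Spark UI rather than bracketing API calls with timestamps.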

Also the scale factor of the data is fairly low. GPUs do not excel at processing small amounts of data. You will probably see better performance by increasing the amount of data each task sees (e.g.: increasing spark.sql.files.maxPartitionBytes, …
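For context, spark.sql.files.maxPartitionBytes controls the maximum bytes of input packed into a single partition when reading files (Spark's default is 128 MB), so raising it hands each task a larger chunk of data per columnar batch. A hedged illustration; the 512m value and job name are arbitrary choices, not recommendations from this thread:

```
# Illustrative only: larger input splits per task for GPU-friendly batch sizes.
spark-submit \
  --conf spark.sql.files.maxPartitionBytes=512m \
  my_job.py
```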

Replies: 5 comments

Answer selected by sameerz
Labels
question Further information is requested
This discussion was converted from issue #1043 on April 28, 2022 22:56.