[QST]Spark-Rapids with Java is way slower than Spark-Rapids with Python #5330

Answered by SoumyaB57
SoumyaB57 asked this question in General

I found the issue.
In my Java code I was reading the CSV file like this:
Dataset<Row> df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .load("countries.csv");
This read behaved unpredictably: in some queries the CSV file was scanned on the CPU, which was much slower than scanning it on the GPU.
After I changed how the CSV file is read, it worked:
Dataset<Row> df = spark.read()
    .format("csv")
    .option("header", "true")
    .load("countries.csv");
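
For anyone hitting the same thing, one way to spot a CPU fallback is to look at the physical plan text for scan operators that lack the `Gpu` prefix. The helper below is only a rough sketch: `PlanCheck` and `cpuScans` are hypothetical names, the sample plan string is illustrative, and it assumes that CPU scans print as `FileScan` or `BatchScan` while GPU operators are prefixed with `Gpu`. It takes the plan as a plain string, which could come from something like `df.queryExecution().executedPlan().toString()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Spark or the RAPIDS plugin.
final class PlanCheck {
    // Assumption: CPU scans appear as "FileScan" or "BatchScan" in the plan text.
    // The leading \b keeps "GpuBatchScan" from matching, because there is no
    // word boundary between "Gpu" and "BatchScan".
    private static final Pattern CPU_SCAN =
        Pattern.compile("\\b(FileScan|BatchScan)\\b");

    /** Returns the trimmed plan lines that contain a scan without the Gpu prefix. */
    static List<String> cpuScans(String planText) {
        List<String> hits = new ArrayList<>();
        for (String line : planText.split("\\R")) {
            if (CPU_SCAN.matcher(line).find()) {
                hits.add(line.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative plan text only; a real plan has more detail per operator.
        String plan = String.join("\n",
            "GpuColumnarToRow false",
            "+- FileScan csv [name#0]");
        for (String line : cpuScans(plan)) {
            System.out.println("CPU scan: " + line);
        }
    }
}
```

The plugin can also report this itself: setting `spark.rapids.sql.explain` to `NOT_ON_GPU` should log the operators that stay on the CPU along with the reason.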

Answer selected by sameerz

This discussion was converted from issue #5210 on April 27, 2022 15:22.