[QST] Spark hangs during initialization #5412

Answered by tgravescs
SidWeng asked this question in General

I'm guessing it means you don't have any GPUs available to schedule.

You can find instructions on setting up standalone cluster here:
https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-on-prem.html#spark-standalone-cluster

One of the main things is making sure your worker is configured to have the GPU resources available:
SPARK_WORKER_OPTS="-Dspark.worker.resource.gpu.amount=1 -Dspark.worker.resource.gpu.discoveryScript=/opt/sparkRapidsPlugin/getGpusResources.sh"

Once you bring it up, you can check the Spark master UI to make sure your workers have a GPU available to hand out.
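If a worker shows no GPUs in the master UI, one thing worth checking is what the discovery script prints. Spark expects the script to write a JSON object with a `name` and a list of `addresses` to stdout. Below is a minimal sketch of a sanity check for that output. The helper names `parse_gpu_discovery_output` and `run_discovery_script` are made up for illustration and are not part of Spark or the RAPIDS plugin.

```python
import json
import subprocess


def parse_gpu_discovery_output(text):
    """Return the GPU addresses found in discovery-script output.

    Hypothetical helper: raises ValueError if the text is not the
    {"name": "gpu", "addresses": [...]} JSON object Spark expects.
    """
    try:
        info = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"discovery output is not valid JSON: {exc}") from exc

    if not isinstance(info, dict):
        raise ValueError("discovery output must be a JSON object")
    if info.get("name") != "gpu":
        raise ValueError(f"expected resource name 'gpu', got {info.get('name')!r}")

    raw = info.get("addresses")
    if not isinstance(raw, list):
        raise ValueError("'addresses' must be a JSON array")

    # Drop blank entries: a script that finds no GPUs may still print [""].
    addresses = [str(a) for a in raw if str(a).strip()]
    if not addresses:
        raise ValueError("discovery script reported no GPU addresses")
    return addresses


def run_discovery_script(path):
    """Run the script at `path` and validate what it prints (hypothetical helper)."""
    result = subprocess.run([path], capture_output=True, text=True, check=True)
    return parse_gpu_discovery_output(result.stdout)
```

Running `run_discovery_script("/opt/sparkRapidsPlugin/getGpusResources.sh")` on the worker host, as the same user that starts the worker, should return a non-empty list of addresses. A `ValueError` or a non-zero exit from the script suggests the worker has nothing to advertise to the master.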

The other possibility, though you don't usually see that error message, would be GPUs they are…

Answer selected by tgravescs
Labels
question Further information is requested
This discussion was converted from issue #5411 on May 03, 2022 13:21.