[QST] Understanding of "Maximum pool size exceeded" #5373

Answered by jlowe
martinstuder asked this question in General

"Maximum pool size exceeded" from RMM means the GPU memory pool has been exhausted, and it was unable to satisfy a GPU memory allocation request. There can be lots of causes. Try to run with too much GPU data generated per task or running too many tasks simultaneously on the GPU are primary causes, so setting spark.rapids.sql.concurrentGpuTasks=1 from a higher initial value will reduce at least some of that memory pressure.

Increasing the number of shuffle partitions should also help, assuming your processing does not have high key skew, causing most of the data to show up in only a few task partitions.
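As a rough illustration, both suggestions can be applied at submit time. spark.rapids.sql.concurrentGpuTasks is the RAPIDS Accelerator config named above, and spark.sql.shuffle.partitions is the standard Spark shuffle-partition setting; the values and the application jar name below are placeholders you would tune for your own workload and GPU memory size:

```shell
# Sketch only: lower GPU task concurrency and raise shuffle partitions
# to reduce per-task GPU memory pressure. Values are illustrative.
spark-submit \
  --conf spark.rapids.sql.concurrentGpuTasks=1 \
  --conf spark.sql.shuffle.partitions=400 \
  your_app.jar
```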

If I understand correctly, increasing spark.task.resource.gpu.amount (e.g. to 0.2 or …

Answer selected by sameerz
This discussion was converted from issue #1365 on April 28, 2022 22:51.