[BUG] very long tail task is observed when many tasks are contending for PrioritySemaphore #11573

Closed
binmahone opened this issue Oct 9, 2024 · 1 comment · Fixed by #11574
Labels: bug (Something isn't working)

@binmahone (Collaborator)

In some of our customer queries, very long tail tasks are observed when many tasks are contending for PrioritySemaphore, taking as long as 3 hours to finish (the whole stage lasts about 3 hours as well).
A long tail task occupies its CPU slot while doing nothing, which can harm CPU resource utilization.
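To see why contention on a priority semaphore can produce this pattern, consider a wait queue ordered by priority alone, with no stable tie-break among equal priorities. The sketch below is a hypothetical toy model, not the plugin's actual PrioritySemaphore (Waiter, the arrival field, and the newest-first tie-break are illustrative assumptions); it makes the unfairness deterministic so the starvation is visible:

import scala.collection.mutable

// Toy model of an unfair wait queue: priority is the primary key, and the
// secondary key deliberately favors the NEWEST arrival, mimicking an
// ordering with no stable tie-break among equal priorities.
object StarvationSketch {
  final case class Waiter(taskId: Long, priority: Long, arrival: Long)

  def main(args: Array[String]): Unit = {
    // Max-heap: higher priority first; among equal priorities, newest first.
    val unfair = Ordering.by((w: Waiter) => (w.priority, w.arrival))
    val waiters = mutable.PriorityQueue.empty[Waiter](unfair)

    waiters.enqueue(Waiter(taskId = 0, priority = 0, arrival = 0))
    for (round <- 1L to 5L) {
      // A fresh task with the same priority arrives before each permit grant.
      waiters.enqueue(Waiter(taskId = round, priority = 0, arrival = round))
      val winner = waiters.dequeue() // always the newest arrival, never task 0
      println(s"round $round: permit granted to task ${winner.taskId}")
    }
    println(s"still waiting: ${waiters.iterator.map(_.taskId).mkString(", ")}")
  }
}

Under sustained contention the oldest waiter is overtaken on every round, which is exactly the shape of the multi-hour tail described above.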

This bug can be reproduced with the following test code:

echo "reproduce long tail problem, at aggv3 latest" && bin/spark-shell    \
       --master 'local[16]'  --driver-memory 20g  --conf spark.rapids.sql.concurrentGpuTasks=2  \
       --conf spark.celeborn.client.shuffle.compression.codec=zstd --conf spark.io.compression.codec=zstd \
       --conf spark.rapids.memory.pinnedPool.size=10G --conf spark.rapids.memory.host.spillStorageSize=40G \
       --conf spark.sql.files.maxPartitionBytes=2g \
       --conf spark.driver.extraJavaOptions=-Dai.rapids.cudf.nvtx.enabled=true \
       --conf spark.plugins=com.nvidia.spark.SQLPlugin \
       --conf  spark.rapids.sql.metrics.level='DEBUG' \
       --conf spark.eventLog.enabled=true \
       --conf spark.shuffle.manager=org.apache.spark.shuffle.celeborn.SparkShuffleManager \
       --conf spark.celeborn.master.endpoints=10.19.129.151:9097 \
       --jars /home/hongbin/develop/spark-3.2.1-bin-hadoop2.7/rapids_jars/fresh.jar -i query_1009_long_tail_semaphore.scala  2>&1 | tee spill_`date +'%Y-%m-%d-%H-%M-%S'`.output

where query_1009_long_tail_semaphore.scala contains:

spark.conf.set("spark.rapids.sql.agg.singlePassPartialSortEnabled", false)

spark.time(
  spark.range(0, 9000000000L, 1, 100)
    .selectExpr(
      "cast(CAST(rand(0) * 100000000000 AS LONG) DIV 1 as string) as id",
      "id % 2 as data")
    .groupBy("id")
    .agg(count(lit(1)), avg(col("data")))
    .orderBy("id")
    .show())

System.exit(0)

The long tail tasks can be seen in the snapshot below:

[Screenshot: Spark UI showing the long tail tasks in the stage timeline]

@binmahone binmahone added ? - Needs Triage Need team to review and classify bug Something isn't working labels Oct 9, 2024
@binmahone binmahone self-assigned this Oct 9, 2024
@binmahone (Collaborator, Author)

This issue is fixed by #11574 + #11587

binmahone added a commit that referenced this issue Oct 11, 2024

    avoid long tail tasks due to PrioritySemaphore (#11574)

    * use task id as tie breaker
    * save threadlocal lookup
    * address Jason's comment

    Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
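The "use task id as tie breaker" change above can be illustrated with the same toy model: keep priority as the primary key, but break ties in favor of the lower (earlier) task id, making the ordering total so the oldest waiter wins ties. This is a minimal sketch of the idea under those assumptions, not the actual patch in #11574:

import scala.collection.mutable

// Same toy model, with a task-id tie-break: among equal priorities the
// LOWER (earlier) task id wins, so no waiter can be overtaken forever.
object TieBreakSketch {
  final case class Waiter(taskId: Long, priority: Long)

  def main(args: Array[String]): Unit = {
    // Max-heap: higher priority first; ties go to the lower task id.
    val fair = Ordering.by((w: Waiter) => (w.priority, -w.taskId))
    val waiters = mutable.PriorityQueue.empty[Waiter](fair)

    waiters.enqueue(Waiter(taskId = 0, priority = 0))
    for (round <- 1L to 5L) {
      waiters.enqueue(Waiter(taskId = round, priority = 0))
      val winner = waiters.dequeue() // the oldest waiter now wins each tie
      println(s"round $round: permit granted to task ${winner.taskId}")
    }
  }
}

With the secondary key in place, the grant order becomes FIFO among equal priorities, so each task's wait is bounded by the number of tasks ahead of it rather than by the arrival rate of new tasks.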
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Oct 11, 2024