In some of our customer queries, very long-tail tasks are observed when many tasks contend for the PrioritySemaphore, with a single task taking as long as 3 hours to finish (so the whole stage also lasts 3 hours).
The long-tail tasks occupy CPU slots while doing nothing, which could harm CPU resource utilization.
This bug can be reproduced with the following test code (the contents of query_1009_long_tail_semaphore.scala):
spark.conf.set("spark.rapids.sql.agg.singlePassPartialSortEnabled", false)
spark.time(spark.range(0, 9000000000L, 1, 100)
  .selectExpr("cast(CAST(rand(0) * 100000000000 AS LONG) DIV 1 as string) as id", "id % 2 as data")
  .groupBy("id").agg(count(lit(1)), avg(col("data")))
  .orderBy("id").show())
System.exit(0)
The long-tail tasks are visible in the snapshot below:
* avoid long tail tasks due to PrioritySemaphore (#11574)
* use task id as tie breaker
* save threadlocal lookup
* addressing jason's comment
Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
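To illustrate the "use task id as tie breaker" item, here is a minimal sketch of how a priority-ordered wait queue might break ties deterministically. It is not the spark-rapids implementation: `Waiter`, `TieBreakSketch`, and `admissionOrder` are hypothetical names, and the choice that the lower task id wins among equal priorities is an assumption made for the example.

```scala
import java.util.{Comparator, PriorityQueue}

// Hypothetical waiter record; the field names are illustrative only.
final case class Waiter(priority: Long, taskId: Long)

object TieBreakSketch {
  // Higher priority is admitted first. Among equal priorities the lower
  // task id goes first (an assumption for this sketch), which makes the
  // order deterministic instead of leaving ties to the queue's internals.
  val ordering: Comparator[Waiter] = new Comparator[Waiter] {
    override def compare(a: Waiter, b: Waiter): Int = {
      val byPriority = java.lang.Long.compare(b.priority, a.priority)
      if (byPriority != 0) byPriority
      else java.lang.Long.compare(a.taskId, b.taskId)
    }
  }

  // Drain the queue and return task ids in the order they would be admitted.
  def admissionOrder(waiters: Seq[Waiter]): List[Long] = {
    val queue = new PriorityQueue[Waiter](11, ordering)
    waiters.foreach(w => queue.add(w))
    val admitted = List.newBuilder[Long]
    while (!queue.isEmpty) admitted += queue.poll().taskId
    admitted.result()
  }
}
```

For example, `TieBreakSketch.admissionOrder(Seq(Waiter(0, 7), Waiter(0, 3), Waiter(5, 9), Waiter(0, 5)))` admits the priority-5 waiter first and then the three priority-0 waiters in ascending task id order. Without the tie breaker, the relative order of equal-priority waiters in a binary heap is unspecified, so one waiter could in principle be passed over repeatedly.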