[BUG] The rs. serialization time metric is misleading #11173

Closed
jihoonson opened this issue Jul 11, 2024 · 0 comments · Fixed by #11180

jihoonson commented Jul 11, 2024

Describe the bug
The rs. serialization time metric indicates how much time is spent serializing shuffle data in a stage. It is shown in the Spark UI when the metrics level is set to debug. The metric is computed here by measuring the time to execute the whole write logic and then excluding the input processing time (records.next) from it. However, we are currently not taking the time spent in records.hasNext into account, which is where the actual processing seems to happen rather than in records.next. Here is a breakdown of the time of a write call in a task.

  • Total time to execute the entire logic in the call: 56,780 ms (this time includes everything below)
  • records.hasNext: 56,757 ms
  • records.next: 0.629 ms
  • Time to prepare a columnar batch (incrementing its reference count and computing its size): 5 ms
  • Wait time on BytesInFlightLimiter: 1 ms
  • Write task enqueue time: 10 ms
  • Wait time for all write tasks to finish: 0.066 ms

As you can see, this write call spent 99.9% of its execution time in records.hasNext.
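
To make this concrete, here is a minimal sketch of the measurement pattern as I understand it (the names below are hypothetical stand-ins, not the actual spark-rapids code):

```scala
// Hypothetical reconstruction of the current measurement (not the actual code).
// Only the records.next time is subtracted; records.hasNext is counted as write time.
def measureWriteTimeNs[T](records: Iterator[T])(serializeAndEnqueue: T => Unit): Long = {
  val start = System.nanoTime()
  var inputTimeNs = 0L
  while (records.hasNext) {              // time spent here is NOT excluded today
    val fetchStart = System.nanoTime()
    val record = records.next()          // only this time is excluded
    inputTimeNs += System.nanoTime() - fetchStart
    serializeAndEnqueue(record)          // stand-in for the real write path
  }
  (System.nanoTime() - start) - inputTimeNs
}
```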

Steps/Code to reproduce bug
Set spark.rapids.sql.metrics.level to DEBUG and run any query that involves a non-trivial shuffle.

Expected behavior
We should exclude the records.hasNext time when measuring writeTimeNs, just as we do for records.next. This will make the current way of estimating the serialization time more accurate.
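
A minimal sketch of what the fix could look like, again with hypothetical names rather than the actual code:

```scala
// Hypothetical sketch of the proposed fix: time records.hasNext as well,
// and exclude it from writeTimeNs just like records.next.
def measureWriteTimeNsFixed[T](records: Iterator[T])(serializeAndEnqueue: T => Unit): Long = {
  val start = System.nanoTime()
  var inputTimeNs = 0L
  def timed[A](body: => A): A = {
    val t0 = System.nanoTime()
    val result = body
    inputTimeNs += System.nanoTime() - t0
    result
  }
  while (timed(records.hasNext)) {       // hasNext time is now excluded as well
    val record = timed(records.next())
    serializeAndEnqueue(record)          // stand-in for the real write path
  }
  (System.nanoTime() - start) - inputTimeNs
}
```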

Environment details (please complete the following information)

  • Environment location: Standalone
  • Spark configuration settings related to the issue: spark.rapids.sql.metrics.level should be set to DEBUG.

Additional context
The more I think about it, even after we fix this bug, the rs. serialization time metric will still be a bit misleading. Here is how we compute it today.

  1. We measure the time to write the shuffle data with multiple write threads. This is called writeTimeNs in the code.
  2. We measure the accumulated I/O time and record serialization time as if they happened sequentially.
  3. We compute the ratios of the I/O time and the record serialization time to the sum of the results from step 2.
  4. rs. serialization time is computed as writeTimeNs multiplied by the record serialization ratio from step 3.
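
Putting those steps into a rough formula (an illustrative sketch; the names are hypothetical):

```scala
// Illustrative reconstruction of steps 1-4 above; the names are hypothetical.
def rsSerializationTimeNs(
    writeTimeNs: Long,     // step 1: wall-clock time of the multi-threaded write
    totalSerTimeNs: Long,  // step 2: accumulated record serialization time
    totalIoTimeNs: Long    // step 2: accumulated I/O time
): Long = {
  // Step 3: fraction of the (sequentialized) total attributed to serialization.
  val serRatio = totalSerTimeNs.toDouble / (totalSerTimeNs + totalIoTimeNs)
  // Step 4: scale the overall write time by that fraction.
  (writeTimeNs * serRatio).toLong
}
```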

This logic assumes that the serialization and the I/O do not happen at the same time. In practice, however, they can overlap, since one thread can serialize records while another is writing already-serialized data to disk. As such, this metric is an approximation, and I'm not sure how useful it can be.

I suppose we could first make it crystal clear in the doc that this metric is an approximation, so that people use it only to get a sense of how expensive the serialization could be, rather than trusting the number as it is. However, I can't find any existing doc explaining this metric. Please correct me if I'm wrong.

As a long-term solution, we may want to come up with better metrics, maybe something like avgSerializationTimePerRecord, that would be more obvious and thus easier to interpret.
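
For example, such a metric could roughly look like this (purely a sketch of the idea; nothing like this exists today):

```scala
// Purely illustrative: an average serialization time per record would not
// depend on how serialization and I/O overlap across write threads.
def avgSerializationTimePerRecordNs(totalSerTimeNs: Long, recordCount: Long): Double =
  if (recordCount == 0) 0.0 else totalSerTimeNs.toDouble / recordCount
```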

@jihoonson jihoonson added bug Something isn't working ? - Needs Triage Need team to review and classify labels Jul 11, 2024
@jihoonson jihoonson self-assigned this Jul 12, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Jul 16, 2024