More accurate estimation for the result serialization time in RapidsShuffleThreadedWriterBase #11180

Merged 6 commits into NVIDIA:branch-24.08 on Jul 18, 2024

Conversation

jihoonson
Collaborator

Fixes #11173.

The shuffle result serialization time metric currently includes input data processing time as well, which is misleading. This PR excludes the processing time from the serialization time estimation.
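
One way to separate the two costs is to wrap the input iterator and accumulate the time spent inside its `hasNext`/`next` calls, then subtract that from the total elapsed time. The following is a minimal sketch under that assumption, not the code from this PR: `TimedIterator`, `TimedIteratorExample`, and the injectable `nanoClock` parameter are hypothetical names introduced here for illustration.

```scala
// Hypothetical sketch, not the class from this PR: wraps an iterator and
// accumulates the time spent inside the upstream hasNext/next calls.
class TimedIterator[T](
    underlying: Iterator[T],
    nanoClock: () => Long = () => System.nanoTime()) extends Iterator[T] {

  private var upstreamNs: Long = 0L

  /** Total nanoseconds spent inside the wrapped iterator so far. */
  def upstreamTimeNs: Long = upstreamNs

  // Runs `body` and adds its duration to the upstream total,
  // even if `body` throws.
  private def timed[A](body: => A): A = {
    val start = nanoClock()
    try body finally upstreamNs += nanoClock() - start
  }

  override def hasNext: Boolean = timed(underlying.hasNext)
  override def next(): T = timed(underlying.next())
}

object TimedIteratorExample {
  /** Returns (total elapsed, upstream time, estimated serialization time). */
  def run(records: Iterator[Int], nanoClock: () => Long): (Long, Long, Long) = {
    val timedRecords = new TimedIterator(records, nanoClock)
    val start = nanoClock()
    timedRecords.foreach(_ => ()) // stand-in for serializing each record
    val total = nanoClock() - start
    // Whatever was not spent upstream is attributed to serialization.
    (total, timedRecords.upstreamTimeNs, total - timedRecords.upstreamTimeNs)
  }
}
```

Passing the clock as a parameter is only a convenience of this sketch so the accounting can be checked with a fake clock; real code would likely just call `System.nanoTime()`.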

@jihoonson jihoonson changed the title from "Exclude the processing time in records.hasNext from the serialization time estimation" to "Exclude input data processing time from the result serialization time estimation in RapidsShuffleThreadedWriterBase" on Jul 12, 2024
… time estimation

Signed-off-by: Jihoon Son <ghoonson@gmail.com>
Member

@jlowe jlowe left a comment

What about the time spent blocking on the limiter -- is that still desired in the serialization time?

@jihoonson
Collaborator Author

> What about the time spent blocking on the limiter -- is that still desired in the serialization time?

Good point. It should be excluded as well. In fact, there are other things we may want to exclude from the serialization time estimation. They were trivial in my testing, as seen in #11173, but could have a larger impact with different cluster settings or data sets. I will fix it soon.

@jihoonson
Collaborator Author

jihoonson commented Jul 12, 2024

Alright, the batch size computing time and the wait time on the limiter are now both excluded from the serialization time estimation. The former is usually trivial, but may become non-trivial in some cases when you have lots of columns. It is not expensive to compute anyway.
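
As a rough illustration of the accounting described above, the estimate can be thought of as the total elapsed write time minus each excluded component. This is a sketch only: `WriteTimings` and every field name other than `waitTimeOnLimiterNs` are assumptions made for this example, and clamping the result at zero is a choice of the sketch rather than something stated in the PR.

```scala
// Hypothetical accounting sketch; the names other than waitTimeOnLimiterNs
// are assumptions and are not taken from the PR.
final case class WriteTimings(
    totalElapsedNs: Long,        // wall time of the whole write call
    inputProcessingNs: Long,     // time spent pulling records from upstream
    batchSizeComputeNs: Long,    // time spent computing batch sizes
    waitTimeOnLimiterNs: Long) { // time spent blocked on the limiter

  /** Estimated serialization time; clamped so it is never negative. */
  def serializationNs: Long =
    math.max(
      0L,
      totalElapsedNs - inputProcessingNs - batchSizeComputeNs - waitTimeOnLimiterNs)
}
```

For example, `WriteTimings(1000L, 600L, 50L, 150L).serializationNs` evaluates to `200L`.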

@jihoonson jihoonson changed the title from "Exclude input data processing time from the result serialization time estimation in RapidsShuffleThreadedWriterBase" to "More accurate estimation for the result serialization time in RapidsShuffleThreadedWriterBase" on Jul 12, 2024
@abellina abellina self-requested a review July 12, 2024 20:54
Comment on lines 397 to 398
// writeTime is the amount of time it took to push bytes through the stream
// minus the amount of time it took to get the batch from the upstream execs
Member

This comment is out of date.

Collaborator Author

@jihoonson jihoonson Jul 12, 2024

Thanks, fixed now.

val recordWriteTime: AtomicLong = new AtomicLong(0L)
var computeTime: Long = 0L
// Time spent waiting on the limiter
var waitTimeOnLimiterNs: Long = 0L
Collaborator

For future work, we may want to expose waitTimeOnLimiterNs as a metric. Otherwise it's hard to tell that we are waiting on the limiter. Filed #11187.

Collaborator Author

Thanks. This will be useful!

Collaborator

@abellina abellina left a comment

Minor nit, otherwise looking good.

write(new TimeTrackingIterator(records))
}

def write(records: TimeTrackingIterator): Unit = {
Collaborator

Nit: let's mark this private. I like the addition of the new method.

Collaborator Author

Made this and a couple of others private.

@abellina
Collaborator

build

@jihoonson jihoonson merged commit f8439b4 into NVIDIA:branch-24.08 Jul 18, 2024
43 checks passed
@sameerz sameerz added the bug Something isn't working label Jul 19, 2024
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] The rs. serialization time metric is misleading
4 participants