More accurate estimation for the result serialization time in RapidsShuffleThreadedWriterBase #11180
Conversation
Force-pushed e931294 to 6cf48ab ("… time estimation", Signed-off-by: Jihoon Son <ghoonson@gmail.com>)
What about the time spent blocking on the limiter -- is that still desired in the serialization time?
Good point. It should be excluded as well. In fact, there are other things we may want to exclude from the serialization time estimation. They were trivial in my testing, as seen in #11173, but could have a larger impact with different cluster settings or data sets. I will fix it soon.
Alright, the batch size computation time and the wait time on the limiter are both excluded from the serialization time estimation now. The former is usually trivial, but may become non-trivial in some cases when there are lots of columns. It is not expensive to compute anyway.
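The exclusions discussed above can be sketched as a small timing helper: wrap the limiter acquisition and the batch-size computation in timed blocks, then subtract the accumulated spans from the overall write span. This is a hedged illustration only; `SerializationTimer`, `timedAcquire`, and `timedCompute` are hypothetical names, not the plugin's actual API, though `waitTimeOnLimiterNs` mirrors the field discussed in this thread.

```scala
// Illustrative sketch, not the plugin's real implementation.
// Accumulates time spans that should NOT count as serialization time.
class SerializationTimer {
  private var waitTimeOnLimiterNs: Long = 0L
  private var computeTimeNs: Long = 0L

  // Time spent blocking on the limiter (excluded from serialization time).
  def timedAcquire[T](acquire: => T): T = {
    val start = System.nanoTime()
    try acquire
    finally waitTimeOnLimiterNs += System.nanoTime() - start
  }

  // Time spent computing the batch size (excluded from serialization time).
  def timedCompute[T](body: => T): T = {
    val start = System.nanoTime()
    try body
    finally computeTimeNs += System.nanoTime() - start
  }

  // Serialization estimate = total write span minus the excluded spans.
  def estimatedSerializationNs(totalWriteNs: Long): Long =
    totalWriteNs - waitTimeOnLimiterNs - computeTimeNs
}
```

With this shape, any newly discovered non-serialization span can be excluded by adding one more timed wrapper, without touching the final subtraction.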
// writeTime is the amount of time it took to push bytes through the stream
// minus the amount of time it took to get the batch from the upstream execs
This comment is out of date
Thanks, fixed now.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/RapidsShuffleInternalManagerBase.scala (outdated, resolved)
val recordWriteTime: AtomicLong = new AtomicLong(0L)
var computeTime: Long = 0L
// Time spent waiting on the limiter
var waitTimeOnLimiterNs: Long = 0L
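The mixed field types in the diff above suggest a threading split: in a threaded writer, several serialization tasks may add their elapsed time concurrently, so `recordWriteTime` is an `AtomicLong`, while fields updated only from the single writer thread can stay plain `Long`s. The sketch below illustrates that pattern; `RecordWriteTiming` and `timedWrite` are hypothetical names, not code from the plugin.

```scala
import java.util.concurrent.atomic.AtomicLong

// Illustrative sketch of concurrent write-time accumulation.
object RecordWriteTiming {
  // Updated from multiple serialization threads, hence AtomicLong.
  val recordWriteTime: AtomicLong = new AtomicLong(0L)

  // Measures one record write and adds the elapsed nanoseconds atomically.
  def timedWrite(writeOneRecord: () => Unit): Unit = {
    val start = System.nanoTime()
    writeOneRecord()
    recordWriteTime.addAndGet(System.nanoTime() - start)
  }
}
```

`addAndGet` makes the accumulation safe without a lock, which matters when many small record writes complete at once.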
For future work, we may want to expose waitTimeOnLimiterNs as a metric. It's hard to tell that we are waiting on the limiter otherwise. Filed #11187.
Thanks. This will be useful!
Minor nit, looking good
write(new TimeTrackingIterator(records))
}

def write(records: TimeTrackingIterator): Unit = {
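A wrapper like `TimeTrackingIterator` would let the writer measure how long it spends fetching batches from upstream execs, so that span can be excluded from the serialization estimate. The sketch below is an assumption about its shape based on this thread; the actual class in the plugin may differ in fields, visibility, and element type.

```scala
// Hedged sketch: an iterator wrapper that accumulates the time spent
// pulling records from the upstream iterator.
class TimeTrackingIterator[T](delegate: Iterator[T]) extends Iterator[T] {
  private var fetchTimeNs: Long = 0L

  override def hasNext: Boolean = {
    val start = System.nanoTime()
    try delegate.hasNext
    finally fetchTimeNs += System.nanoTime() - start
  }

  override def next(): T = {
    val start = System.nanoTime()
    try delegate.next()
    finally fetchTimeNs += System.nanoTime() - start
  }

  // Total time spent fetching input, to subtract from the write span.
  def inputFetchTimeNs: Long = fetchTimeNs
}
```

Wrapping `records` once at the public `write` entry point means the private overload never has to distinguish fetch time from serialization time itself; it just reads `inputFetchTimeNs` at the end.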
nit, let's mark this private. I like the addition of the new method.
Made this and a couple of others private.
build
Fixes #11173.
The shuffle result serialization time metric currently includes the input data processing time as well, which is misleading. This PR excludes that processing time from the serialization time estimation.