-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the new chunked API from multi-get_json_object #11289
Conversation
Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nits
validPathsIndex += 1 | ||
} | ||
withResource(JSONUtils.getJsonObjectMultiplePaths(input.getBase, | ||
java.util.Arrays.asList(validPaths: _*), 4 * targetBatchSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment about memory budget is being removed. Consider re-adding or making 4 a mnemonic constant.
validPathsIndex += 1 | ||
} | ||
withResource(JSONUtils.getJsonObjectMultiplePaths(input.getBase, | ||
java.util.Arrays.asList(validPaths: _*), 4 * targetBatchSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
throughout this file constructs like java.util.Arrays.asList(validPaths: _*)
can be replaced with validPaths.asJava
if we import scala.collection.JavaConverters._
This depends on NVIDIA/spark-rapids-jni#2299
and technically fixes #11263
I think there is more that we could do to make it even better, but I think this is good enough in the short term.
The performance improvement appears to be about 9% for large numbers of paths. I want to spend some more time testing on other GPUs though.