Use the new chunked API from multi-get_json_object #11289

Merged on Aug 5, 2024: 1 commit into NVIDIA:branch-24.10 from the use_chunked_get_json_api branch

revans2 (Collaborator) commented Aug 2, 2024

This depends on NVIDIA/spark-rapids-jni#2299 and technically fixes #11263.

I think there is more that we could do to make it even better, but I think this is good enough in the short term.

The performance improvement appears to be about 9% for large numbers of paths. I want to spend some more time testing on other GPUs though.

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
revans2 (Collaborator, Author) commented Aug 5, 2024

build

gerashegalov (Collaborator) left a comment

LGTM, nits

    validPathsIndex += 1
  }
  withResource(JSONUtils.getJsonObjectMultiplePaths(input.getBase,
      java.util.Arrays.asList(validPaths: _*), 4 * targetBatchSize,

The comment about the memory budget is being removed. Consider re-adding it or making 4 a mnemonic (named) constant.
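As a rough sketch of the named-constant option, something like the following could work. The object, constant, and method names are hypothetical, and it is only an assumption (based on the removed comment) that `4 * targetBatchSize` is the memory budget handed to the chunked call.

```scala
// Hypothetical sketch: none of these names exist in the PR.
object MemoryBudgetSketch {
  // Assumption: the literal 4 in the reviewed snippet scales the target batch
  // size into a memory budget. Naming it records that intent where the
  // removed comment used to.
  val MemoryBudgetMultiplier: Long = 4L

  // Budget for a given target batch size, same arithmetic as
  // `4 * targetBatchSize` in the snippet above.
  def memoryBudget(targetBatchSize: Long): Long =
    MemoryBudgetMultiplier * targetBatchSize
}
```

The call site would then pass `MemoryBudgetSketch.memoryBudget(targetBatchSize)` instead of the bare expression, so the reason for the factor lives next to its definition.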

    validPathsIndex += 1
  }
  withResource(JSONUtils.getJsonObjectMultiplePaths(input.getBase,
      java.util.Arrays.asList(validPaths: _*), 4 * targetBatchSize,

nit: throughout this file, constructs like java.util.Arrays.asList(validPaths: _*) can be replaced with validPaths.asJava if we import scala.collection.JavaConverters._
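A minimal sketch of that equivalence, assuming `validPaths` is a Scala `Seq` (its real type is not visible in the snippet, so `Seq[String]` and the method names here are stand-ins):

```scala
import scala.collection.JavaConverters._

// Hypothetical sketch: compares the two ways of handing a Scala collection
// to a Java API that expects a java.util.List.
object AsJavaSketch {
  // Current style in the reviewed file: expand the Seq into Java varargs.
  def viaArrays(validPaths: Seq[String]): java.util.List[String] =
    java.util.Arrays.asList(validPaths: _*)

  // Suggested style: wrap the Seq using the JavaConverters decorators.
  def viaConverters(validPaths: Seq[String]): java.util.List[String] =
    validPaths.asJava
}
```

Both methods return lists with the same elements in the same order. If `validPaths` is actually an `Array`, a `.toSeq` before `.asJava` may be needed. On Scala 2.13, `scala.collection.JavaConverters` is deprecated in favor of `scala.jdk.CollectionConverters`, so the import may need to vary by Scala version.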

revans2 merged commit 93cdae1 into NVIDIA:branch-24.10 on Aug 5, 2024
44 checks passed
revans2 deleted the use_chunked_get_json_api branch on August 5, 2024 20:06
sameerz changed the title from "Use the new chunked API fro multi-get_json_object" to "Use the new chunked API from multi-get_json_object" on Aug 8, 2024
sameerz added the performance label (A performance related task/issue) on Aug 8, 2024
Labels: performance (A performance related task/issue)

Successfully merging this pull request may close these issues:
[FEA] Cluster/pack multi_get_json_object paths by common prefixes

4 participants