Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New implementation of getJsonObject #1893

Merged
merged 24 commits into from
Mar 27, 2024
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
0c67f0b
get-json-object: Add JSON parser and parser utility (#1836)
res-life Mar 10, 2024
e3dfdaa
get-json-object: match current field name (#1857)
res-life Mar 12, 2024
72d2994
get-json-object: add utility write_escaped_text for JSON generator (#…
res-life Mar 12, 2024
1f06b34
Add JNI for GetJsonObject (#1862)
thirtiseven Mar 20, 2024
304bf71
get-json-object: main flow (#1868)
res-life Mar 20, 2024
d23d1e3
Optimize memory usage in match_current_field_name (#1889)
thirtiseven Mar 23, 2024
5b922ce
get-json-object: Recursive to iterative (#1890)
res-life Mar 23, 2024
d312a7a
Fix bug
Mar 25, 2024
50481ab
Format
Mar 25, 2024
c971f61
Merge branch 'branch-24.04' into get-json-object-feature
thirtiseven Mar 25, 2024
e7801bd
Use uppercase for path_instruction_type
thirtiseven Mar 25, 2024
3c561f7
Add test cases from Baidu
Mar 25, 2024
848c92b
Fix escape char error; add test case
Mar 26, 2024
2ae268a
getJsonObject number normalization (#1897)
thirtiseven Mar 26, 2024
1febcdd
Add test case
Mar 26, 2024
a9bb79f
Fix a escape/unescape size bug
thirtiseven Mar 26, 2024
67ae598
Fix bug: handle leading zeros for number; Refactor
Mar 26, 2024
44d5d2d
Apply suggestions from code review
thirtiseven Mar 26, 2024
3abc3fc
Address comments
thirtiseven Mar 26, 2024
02d4cfa
fix java test
thirtiseven Mar 26, 2024
a204efa
Add test cases; Fix a bug
Mar 27, 2024
9686754
follow up escape/unescape bug fix
thirtiseven Mar 27, 2024
01e6029
Minor refactor
Mar 27, 2024
fa02fc9
Add a case; Fix bug
Mar 27, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/main/cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,7 @@ add_library(
src/GpuTimeZoneDBJni.cpp
src/HashJni.cpp
src/HistogramJni.cpp
src/JSONUtilsJni.cpp
src/MapUtilsJni.cpp
src/NativeParquetJni.cpp
src/ParseURIJni.cpp
Expand All @@ -170,6 +171,7 @@ add_library(
src/cast_string_to_float.cu
src/datetime_rebase.cu
src/decimal_utils.cu
src/get_json_object.cu
src/histogram.cu
src/map_utils.cu
src/murmur_hash.cu
Expand Down
69 changes: 69 additions & 0 deletions src/main/cpp/src/JSONUtilsJni.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
/*
* Copyright (c) 2024, NVIDIA CORPORATION.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#include "cudf_jni_apis.hpp"
#include "get_json_object.hpp"

#include <cudf/strings/strings_column_view.hpp>

#include <vector>

using path_instruction_type = spark_rapids_jni::path_instruction_type;

extern "C" {
JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_JSONUtils_getJsonObject(
JNIEnv* env, jclass, jlong input_column, jint size, jobjectArray path_instructions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: why do we need size if the path_instructions is an array that stores the size in it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

{
JNI_NULL_CHECK(env, input_column, "input column is null", 0);
JNI_NULL_CHECK(env, path_instructions, "path_instructions is null", 0);
try {
cudf::jni::auto_set_device(env);
auto const n_column_view = reinterpret_cast<cudf::column_view const*>(input_column);
auto const n_strings_col_view = cudf::strings_column_view{*n_column_view};

std::vector<std::tuple<path_instruction_type, std::string, int64_t>> instructions;
for (int i = 0; i < size; i++) {
jobject instruction = env->GetObjectArrayElement(path_instructions, i);
JNI_NULL_CHECK(env, instruction, "path_instruction is null", 0);
jclass instruction_class = env->GetObjectClass(instruction);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we have a follow on where we change this to be 3 arrays? one that is an int array for the type, another that is a string array, and the last one would be a long array? This would reduce the number of reflection calls that would need to be done. This is very minor.

Copy link
Collaborator

@ttnghia ttnghia Mar 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humnn, probably I was the one that asked for using array of structs instead of individual arrays. But yeah for data transfer from Java to C++ then individual arrays seem to be better. I'm sorry @res-life.

Edit: I have a better idea. We want to have array of structs because that is simpler to iterate through the instructions. With individual arrays, we still can use thrust::zip_iterator to zip together multiple arrays with just one iterator and iterate through them easily. Of course this is not required to be in this PR but future work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thrust::zip_iterator looks good, will do in a follow-on

JNI_NULL_CHECK(env, instruction_class, "instruction_class is null", 0);

jfieldID field_id = env->GetFieldID(instruction_class, "type", "I");
JNI_NULL_CHECK(env, field_id, "field_id is null", 0);
jint type = env->GetIntField(instruction, field_id);
path_instruction_type instruction_type = static_cast<path_instruction_type>(type);

field_id = env->GetFieldID(instruction_class, "name", "Ljava/lang/String;");
JNI_NULL_CHECK(env, field_id, "field_id is null", 0);
jstring name = (jstring)env->GetObjectField(instruction, field_id);
JNI_NULL_CHECK(env, name, "name is null", 0);
const char* name_str = env->GetStringUTFChars(name, JNI_FALSE);

field_id = env->GetFieldID(instruction_class, "index", "J");
JNI_NULL_CHECK(env, field_id, "field_id is null", 0);
jlong index = env->GetLongField(instruction, field_id);

instructions.emplace_back(instruction_type, name_str, index);

env->ReleaseStringUTFChars(name, name_str);
}

return cudf::jni::release_as_jlong(
spark_rapids_jni::get_json_object(n_strings_col_view, instructions));
}
CATCH_STD(env, 0);
}
}
Loading
Loading