Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Semi-structured data function check_json #4558

Closed
b41sh opened this issue Mar 24, 2022 · 13 comments · Fixed by #4606
Closed

Feature: Semi-structured data function check_json #4558

b41sh opened this issue Mar 24, 2022 · 13 comments · Fixed by #4606
Assignees
Labels
C-feature Category: feature community-take good first issue Category: good first issue

Comments

@b41sh
Copy link
Member

b41sh commented Mar 24, 2022

Summary

Add support Semi-structured Data Function CHECK_JSON to check the validity of a JSON document.

https://docs.snowflake.com/en/sql-reference/functions/check_json.html

@b41sh b41sh added C-feature Category: feature good first issue Category: good first issue labels Mar 24, 2022
@kevinw66
Copy link
Contributor

/assignme

@kevinw66
Copy link
Contributor

kevinw66 commented Mar 25, 2022

@b41sh
How to deal with variant type column
We only check the StringColumn and ObjectColumn which data type is string, and return NULL for all other columns and types?
And the ObjectColumn contains a vec value, we loop the vec and only check string, what if there are more than one invalid json string? should we concat all error message?

@kevinw66
Copy link
Contributor

/assign

@b41sh
Copy link
Member Author

b41sh commented Mar 25, 2022

@b41sh How to deal with variant type column We only check the StringColumn and ObjectColumn which data type is string, and return NULL for all other columns and types? And the ObjectColumn contains a vec value, we loop the vec and only check string, what if there are more than one invalid json string? should we concat all error message?

@kevinw66
If the data type is Null, Boolean and primitive types, like UInt8, Float32, etc. we can return NULL directly, as they are all valid JSON data.
Array and Struct data types are invalid JSON data, we can return the error message as Invalid argument types for function 'CHECK_JSON': (ARRAY).
Other types need to convert to string and check if it is a valid JSON string.

If an invalid JSON string is found, instead of returning an error message, return the error message as a string in a Nullable StringColumn.

@kevinw66
Copy link
Contributor

@b41sh How to deal with variant type column We only check the StringColumn and ObjectColumn which data type is string, and return NULL for all other columns and types? And the ObjectColumn contains a vec value, we loop the vec and only check string, what if there are more than one invalid json string? should we concat all error message?

@kevinw66 If the data type is Null, Boolean and primitive types, like UInt8, Float32, etc. we can return NULL directly, as they are all valid JSON data. Array and Struct data types are invalid JSON data, we can return the error message as Invalid argument types for function 'CHECK_JSON': (ARRAY). Other types need to convert to string and check if it is a valid JSON string.

If an invalid JSON string is found, instead of returning an error message, return the error message as a string in a Nullable StringColumn.

How to create array and struct type for test?
I try to use sql in snowflake document like create table t0(a array, b object), but I got an error.

And how to test null column, I can create a column which type is null, but I cannot insert any value.

@b41sh
Copy link
Member Author

b41sh commented Mar 25, 2022

@kevinw66
array and object are not SQL data types currently, so we cannot create such tables.
It is possible to create ArrayColumn and StructColumn for test, refer to the implementation here:
https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L89
https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L108

Create NullColumn can refer here:
https://github.com/datafuselabs/databend/blob/main/common/functions/tests/it/scalars/conditionals.rs#L91

@kevinw66
Copy link
Contributor

@kevinw66 array and object are not SQL data types currently, so we cannot create such tables. It is possible to create ArrayColumn and StructColumn for test, refer to the implementation here: https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L89 https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L108

Create NullColumn can refer here: https://github.com/datafuselabs/databend/blob/main/common/functions/tests/it/scalars/conditionals.rs#L91

I try to write tests like semi_structures, but when I use wrong expected value, the tests will exit unexpectedly instead of print an error message.

thread panicked while panicking. aborting.
error: test failed, to rerun pass '--test it'

Caused by:
process didn't exit successfully: /Users/changye/Develop/Projects/Github/databend/target/debug/deps/it-474f88849eeb6ffa 'scalars::semi_structureds::test_parse_json_function' --format=json --exact -Z unstable-options --show-output (signal: 6, SIGABRT: process abort signal)

Process finished with exit code 101

@kevinw66
Copy link
Contributor

@kevinw66 array and object are not SQL data types currently, so we cannot create such tables. It is possible to create ArrayColumn and StructColumn for test, refer to the implementation here: https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L89 https://github.com/datafuselabs/databend/blob/main/common/datavalues/tests/it/types/serializations.rs#L108
Create NullColumn can refer here: https://github.com/datafuselabs/databend/blob/main/common/functions/tests/it/scalars/conditionals.rs#L91

I try to write tests like semi_structures, but when I use wrong expected value, the tests will exit unexpectedly instead of print an error message.

thread panicked while panicking. aborting. error: test failed, to rerun pass '--test it'

Caused by: process didn't exit successfully: /Users/changye/Develop/Projects/Github/databend/target/debug/deps/it-474f88849eeb6ffa 'scalars::semi_structureds::test_parse_json_function' --format=json --exact -Z unstable-options --show-output (signal: 6, SIGABRT: process abort signal)

Process finished with exit code 101

It's the problem of json! macro.

@b41sh
Copy link
Member Author

b41sh commented Mar 27, 2022

@kevinw66 I'm not very sure of the specific reason, you can submit the PR first, so that we can check the problem together.

@kevinw66
Copy link
Contributor

@kevinw66 I'm not very sure of the specific reason, you can submit the PR first, so that we can check the problem together.

I can simply reproduce it by change expect value in semi_structures.test_parse_json_function(), like

ScalarFunctionTest {
            name: "parse_json_bool",
            columns: vec![Series::from_data(vec![true, false])],
            expect: Series::from_data(vec![json!(true), json!(false)]),
            error: "",
        },

to

ScalarFunctionTest {
            name: "parse_json_bool",
            columns: vec![Series::from_data(vec![true, false])],
            expect: Series::from_data(vec![json!(false), json!(false)]),
            error: "",
        },

@kevinw66
Copy link
Contributor

@b41sh I met a problem here.
When I use NullableColumnBuilder::<JsonValue>::with_capacity(input_rows) to call append_null(), I saw the code will append JsonValue::Null, But I got NullableColumn typeid: Variant len: 1 data: [NULL] here.
And when I call Series::from_data(vec![Some(JsonValue::Null)]), I got NullableColumn typeid: Variant len: 1 data: [Null]. I think append_null() should get the same value like this. But why the Null was all uppercase?

And my tests cannot pass because of this.

@kevinw66
Copy link
Contributor

@b41sh I met a problem here. When I use NullableColumnBuilder::<JsonValue>::with_capacity(input_rows) to call append_null(), I saw the code will append JsonValue::Null, But I got NullableColumn typeid: Variant len: 1 data: [NULL] here. And when I call Series::from_data(vec![Some(JsonValue::Null)]), I got NullableColumn typeid: Variant len: 1 data: [Null]. I think append_null() should get the same value like this. But why the Null was all uppercase?

And my tests cannot pass because of this.

Oh I got it, It's print problem because of impl std::fmt::Debug for NullableColumn, The different is actually inside validity, not column.

@b41sh
Copy link
Member Author

b41sh commented Mar 27, 2022

@kevinw66
The NullableColumn contains an inner column, such as JsonColumn. When a value in NullableColumn is Null, the corresponding value in the JsonColumn is useless. In this case, the JsonValue::Null is just a placeholder. Only when the value of NullableColumn is not Null, the value of data in JsonColumn is valid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature Category: feature community-take good first issue Category: good first issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants