Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(function): Support generic Array access elements by index #5244

Merged
merged 1 commit into from
May 11, 2022

Conversation

b41sh
Copy link
Member

@b41sh b41sh commented May 9, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Support accessing elements of generic Array by index

Changelog

  • New Feature
  • Build/Testing/CI
  • Documentation

Related Issues

Fixes #5224

@vercel
Copy link

vercel bot commented May 9, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Updated
databend ⬜️ Ignored (Inspect) May 11, 2022 at 4:51AM (UTC)

@mergify
Copy link
Contributor

mergify bot commented May 9, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@mergify mergify bot added pr-feature this PR introduces a new feature to the codebase pr-build this PR changes build/testing/ci steps pr-doc-fix labels May 9, 2022
@b41sh b41sh marked this pull request as ready for review May 9, 2022 11:44
@b41sh b41sh requested a review from BohuTANG as a code owner May 9, 2022 11:44
@b41sh b41sh requested review from sundy-li and fkuner May 9, 2022 11:44
}
} else {
return Err(ErrorCode::IllegalDataType(format!(
"Array column only support accessed by index, but got {:#?}",
Copy link
Member

@sundy-li sundy-li May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If array only support accessed by index, we should not use path_keys.

I think the better way is to parse the path column into integer index column.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of using path_keys is to reuse the parse_path_keys function. It seems that the access method of the array column and variant column is somewhat different. We can rewrite a function to generate an index path key.

@b41sh b41sh force-pushed the array-access branch 3 times, most recently from ab22b1b to ace0b7f Compare May 9, 2022 22:54
@fkuner
Copy link
Contributor

fkuner commented May 10, 2022

What's the purpose of get_path of array? It seems same as normal get.

@b41sh
Copy link
Member Author

b41sh commented May 10, 2022

What's the purpose of get_path of array? It seems same as normal get.

get_path is a snowflake semistructured function, we can use it to access element variant by multiple index path, such as '[0][1][2]', while get can only access by one index. But it's a bit too complicated for access the data of array, I'll simplify it.

@b41sh
Copy link
Member Author

b41sh commented May 10, 2022

I have removed ArrayGetPathFunction, array map access will be converted to the recursive get function.
for example:

select arr[0][1] from test;

will be converted to

select get(get(arr, 0), 1) from test;

PTAL @sundy-li @fkuner

let inner_column: &<$T as Scalar>::ColumnType = Series::check_get(array_column.values())?;
let mut builder = NullableColumnBuilder::<$T>::with_capacity(input_rows);

for index in indexes.iter() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's incorrect, what if index's size is not 1 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the index is from a table column, we will have multi indexes.
for example:

mysql> create table t (index uint64);
mysql> insert into t values(0),(1),(2);
mysql> select get([1,2,3,4], index) from t;
+--------------------------+
| get([1, 2, 3, 4], index) |
+--------------------------+
|                        1 |
|                        2 |
|                        3 |
+--------------------------+
3 rows in set (0.05 sec)

@@ -68,4 +72,10 @@ select id, json_extract_path_text(str, '["a"]') from t3;
select id, json_extract_path_text(str, 'b.c') from t3;
select id, json_extract_path_text(str, '["b"]["c"]') from t3;

select '==get from array table==';
select id, get(arr, 0) from t4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get(arr, number % 2 ) from t4;

@fkuner
Copy link
Contributor

fkuner commented May 10, 2022

LGTM

@b41sh b41sh force-pushed the array-access branch 2 times, most recently from e0d41cd to 8239842 Compare May 10, 2022 15:51
let data_type = args[0];
let path_type = args[1];

if !data_type.data_type_id().is_array() || !path_type.data_type_id().is_integer() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if index is negative?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, we don't support index by a negative number, a error will return.

mysql> select get(v, -1) from arr;
ERROR 1105 (HY000): Code: 1010, displayText = Unexpected type:Int64 to get u64 number (while in processor thread 0).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I have not seen as_u64 method.

let index_column: &PrimitiveColumn<$T2> = Series::check_get(columns[1].column())?;
let mut offset = 0;
for i in 0..input_rows {
let index = index_column.get(i).as_u64()? as usize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use get(i) to construct a datavalue and convert it into as_u64

for (row, index) in index_column.iter().enumerate() {
      if index < 0 { ...}
      
      ....

}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@BohuTANG
Copy link
Member

@mergify update

1 similar comment
@BohuTANG
Copy link
Member

@mergify update

@mergify
Copy link
Contributor

mergify bot commented May 11, 2022

update

✅ Branch has been successfully updated

@mergify
Copy link
Contributor

mergify bot commented May 11, 2022

update

❌ Base branch update has failed

expected head sha didn’t match current head ref.
err-code: D6356

@BohuTANG BohuTANG merged commit 883e768 into databendlabs:main May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-build this PR changes build/testing/ci steps pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: support get item by index to generic Array column
5 participants