[DISCUSSION] Properly support array values in new engine #1300
We had a team discussion outside of GitHub, and I'll list the ideas we came up with along with notes on them.
Return all values as an array when there is a mix of
@Yury-Fridlyand Thanks for sharing the notes! Have we also considered the idea from Presto/Trino: #442 (comment)?
Following that, the response would look like:
{
"schema": [
{
"name": "myNum",
"type": "long",
"array": true
}
],
"total": 2,
"datarows": [
[
5
],
[
[
3,
4
]
]
],
"size": 2,
"status": 200
}
Why not?
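With a response shaped like the one above, a client could rely on the `"array": true` flag to read every cell of `myNum` as a list, wrapping a bare scalar such as `5` into a one-element array. This is a minimal sketch with hypothetical names (`ArraySchemaReader`, `readCell`, `readFirstColumn`); it assumes the JSON has already been parsed into `Map`/`List`/`Long` objects and uses no real JSON or JDBC API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side reader for the proposed response format.
// JSON is assumed to be parsed into Java objects already (Map, List, Long).
class ArraySchemaReader {

    // Reads one cell of a column whose schema entry carries "array": true.
    @SuppressWarnings("unchecked")
    static List<Long> readCell(Map<String, Object> schemaEntry, Object cell) {
        if (!Boolean.TRUE.equals(schemaEntry.get("array"))) {
            throw new IllegalStateException("column is not flagged as an array");
        }
        if (cell instanceof List) {
            return (List<Long>) cell;   // already an array, e.g. [3, 4]
        }
        return List.of((Long) cell);    // a bare scalar such as 5 becomes [5]
    }

    // Reads the first column of every row in "datarows".
    static List<List<Long>> readFirstColumn(Map<String, Object> schemaEntry,
                                            List<List<Object>> datarows) {
        List<List<Long>> column = new ArrayList<>();
        for (List<Object> row : datarows) {
            column.add(readCell(schemaEntry, row.get(0)));
        }
        return column;
    }

    public static void main(String[] args) {
        Map<String, Object> myNum =
                Map.<String, Object>of("name", "myNum", "type", "long", "array", true);
        List<List<Object>> datarows = List.of(
                List.<Object>of(5L),
                List.<Object>of(List.of(3L, 4L)));
        System.out.println(readFirstColumn(myNum, datarows)); // prints [[5], [3, 4]]
    }
}
```

The point of the flag is that the client no longer has to guess: every cell of a flagged column is read the same way.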
This also affects fields passed into functions, since only the last value in the array is used. Also note the incorrect type in the example below.
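To make the "only the last value is used" point concrete, here is a small illustration contrasting collapsing an array to its last element before applying a function with applying the function to every element. The helper names are hypothetical and this is not the engine's code; `abs` just stands in for any scalar function.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration only: these helpers are hypothetical and do not come from the SQL plugin.
class LastValueIllustration {

    // Mimics the reported behavior: the array collapses to its last element first.
    static long absOfLastValue(List<Long> values) {
        return Math.abs(values.get(values.size() - 1));
    }

    // One possible alternative: apply the function to every element and keep all results.
    static List<Long> absOfEachValue(List<Long> values) {
        return values.stream()
                .map(v -> Math.abs(v.longValue()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> myNum = List.of(-3L, -4L);
        System.out.println(absOfLastValue(myNum)); // prints 4 (the -3 is silently dropped)
        System.out.println(absOfEachValue(myNum)); // prints [3, 4]
    }
}
```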
This is currently a blocker for implementing array support in the JDBC driver, so it would be a great feature to have. Is there any status update on its priority?
I'm going to research possible solutions for this and share them for discussion.
Search result:
{
"took" : 361,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "dbg",
"_id" : "uDBnoYgB5NyEnr3HyanK",
"_score" : 1.0,
"_source" : {
"obj" : [
[
1,
2
],
[
3,
4
],
5
]
}
}
]
}
}
Mapping:
{
"dbg" : {
"aliases" : { },
"mappings" : {
"properties" : {
"obj" : {
"type" : "long"
}
}
},
"settings" : {
"index" : {
"creation_date" : "1686168574104",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "LN-lYUQFSsi9MVpa5VT-Zg",
"version" : {
"created" : "136297827"
},
"provided_name" : "dbg"
}
}
}
}
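The `_source` above keeps the nested structure `[[1, 2], [3, 4], 5]` even though `obj` is mapped as `long` (the mapping has no dedicated array type). Assuming the values a reader cares about are simply the numeric leaves of that structure, the sketch below shows the flattening a consumer would need to get a plain list of longs. `SourceFlattener` is a hypothetical name, and nothing here calls an OpenSearch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the numeric leaves of a parsed _source value.
// Models an assumption about multi-valued long fields; it does not call OpenSearch.
class SourceFlattener {

    static List<Long> flatten(Object value) {
        List<Long> leaves = new ArrayList<>();
        collect(value, leaves);
        return leaves;
    }

    private static void collect(Object value, List<Long> leaves) {
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                collect(item, leaves); // recurse into nested arrays such as [1, 2]
            }
        } else if (value instanceof Number) {
            leaves.add(((Number) value).longValue());
        } else if (value != null) {
            throw new IllegalArgumentException("not a number: " + value);
        }
        // null entries fall through and are skipped
    }

    public static void main(String[] args) {
        Object obj = List.of(List.of(1L, 2L), List.of(3L, 4L), 5L); // _source.obj from above
        System.out.println(flatten(obj)); // prints [1, 2, 3, 4, 5]
    }
}
```

Flattening loses the original grouping, which is exactly why a response that declares `long` cannot faithfully represent this document without some array notion.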
Please proceed with discussion in #1733.
Supporting all of the above use cases will take multiple iterations, and each should be handled separately; we can split the use cases into individual issues. To solve the problem of a primitive/array value expanding into multiple rows, we should try to use metadata (cues) to determine whether the data is treated as a primitive or as an array (we cannot easily do both). Doing something like what Presto/Trino supports in the index mapping (https://trino.io/docs/current/connector/elasticsearch.html#array-types) would be simple: it would indicate that the mapped symbol should treat the data record as an array (or as an array of one element if the data is not defined as an array). We should build a PoC and see whether it solves #1733 (comment).
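A minimal sketch of the mapping-driven idea, assuming the per-field hints (similar in spirit to the `isArray` flag described in the linked Trino docs) have already been read from the index mapping into a set of field names. `ArrayHintNormalizer` and its behavior are hypothetical: a scalar in a flagged field becomes an array of one element, while unflagged fields pass through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical mapping-driven normalization, loosely modeled on Trino's "isArray" hint.
// arrayFields is assumed to have been read from the index mapping metadata already.
class ArrayHintNormalizer {

    static Map<String, Object> normalize(Map<String, Object> source, Set<String> arrayFields) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            Object value = entry.getValue();
            boolean flagged = arrayFields.contains(entry.getKey());
            if (flagged && value != null && !(value instanceof List)) {
                row.put(entry.getKey(), List.of(value)); // scalar -> array of one element
            } else {
                row.put(entry.getKey(), value);          // arrays, nulls and unflagged fields pass through
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Set<String> arrayFields = Set.of("myNum");
        System.out.println(normalize(Map.<String, Object>of("myNum", 5L), arrayFields));
        System.out.println(normalize(Map.<String, Object>of("myNum", List.of(3L, 4L)), arrayFields));
    }
}
```

With the hint in place, both documents from the bug report yield the same shape for `myNum`; what to do with an array stored in an unflagged field is left open here.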
…ect#3095) (opensearch-project#3120) Signed-off-by: Norman Jordan <norman.jordan@improving.com> (cherry picked from commit e109417) Co-authored-by: normanj-bitquill <78755797+normanj-bitquill@users.noreply.github.com>
What is the bug?
The new engine does not return values as an array, while the legacy engine returns all values in a row as an array. Implementing the same support as V1 is not the right way, because the legacy engine produces inconsistent values.
How can one reproduce the bug?
Steps to reproduce the behavior:
curl -XDELETE 'http://localhost:9200/dbg'
curl -X POST "localhost:9200/dbg/_doc/?pretty" -H 'Content-Type: application/json' -d '{"myNum": 5}'
select * from dbg
Not bad so far.
curl -X POST "localhost:9200/dbg/_doc/?pretty" -H 'Content-Type: application/json' -d '{"myNum": [3, 4]}'
curl -X GET "localhost:9200/dbg?pretty"
curl -s -XPOST http://localhost:9200/_plugins/_sql -H 'Content-Type: application/json' -d '{"query": "select * from dbg"}'
(if you have only the second doc in the index)
curl -s -XPOST http://localhost:9200/_plugins/_sql -H 'Content-Type: application/json' -d '{"query": "select * from dbg", "fetch_size": 20}'
What is the expected behavior?
TBD
Why is the legacy response incorrect?
It declares the data type as long, but returns both a number and an array of numbers. Imagine a user has a parser for the response: what should the parser do with such values?
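A sketch of the naive client described above, which trusts the declared `long` type. The cells are assumed to be already-parsed JSON values mirroring the legacy response (a number in one row, an array in another); `StrictLongParser` is a hypothetical name, not part of the JDBC driver.

```java
import java.util.List;

// Hypothetical naive client that trusts the declared column type "long".
class StrictLongParser {

    static long parseLong(Object cell) {
        if (cell instanceof Long) {
            return (Long) cell;
        }
        throw new IllegalStateException("declared type is long but got: " + cell);
    }

    public static void main(String[] args) {
        // Cells mirroring the legacy response: a number in one row, an array in another.
        List<Object> cells = List.<Object>of(5L, List.of(3L, 4L));
        for (Object cell : cells) {
            try {
                System.out.println("ok: " + parseLong(cell));
            } catch (IllegalStateException e) {
                System.out.println("failed: " + e.getMessage());
            }
        }
    }
}
```

The first row parses and the second fails, so a client has to either special-case lists for every column or reject them.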
You can try our JDBC driver as an example of a customer application.
What is your host/environment?
main @ 6108ca1