-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework on OpenSearchDataType
: parse, store and use mapping information
#180
Rework on OpenSearchDataType
: parse, store and use mapping information
#180
Conversation
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
… update tests. Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
This comment was marked as abuse.
This comment was marked as abuse.
Should |
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Outdated
Show resolved
Hide resolved
* Add data types for those classes are not defined in `ExprCoreType`. * Add fixes to make all IT pass. * Complete some TODOs. Address #180 (comment) Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
OpenSearchDataType
: parse, store and use mapping informationOpenSearchDataType
: parse, store and use mapping information
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
#180 * Update `IndexMapping::parseMapping` function. * Rename `OpenSearchDataType.Type` to `OpenSearchDataType.MappingType`. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Update tests. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice addition, Yury!
My main feedback is to re-factor OpenSearchDataType-derived classes to avoid creating multiple copies of small identical objects -- since types are used every where, this will have a big impact.
I think we can move traverse and flatten and the notion of fields and properties to OpenSearchDescribeIndexRequest.getFieldTypes.
public enum Type { | ||
Text("text"), | ||
Keyword("keyword"), | ||
Ip("ip"), | ||
GeoPoint("geo_point"), | ||
Binary("binary"), | ||
Date("date"), | ||
Nested("nested"), | ||
Byte("byte"), | ||
Short("short"), | ||
Integer("integer"), | ||
Long("long"), | ||
Float("float"), | ||
HalfFloat("half_float"), | ||
ScaledFloat("scaled_float"), | ||
Double("double"), | ||
Boolean("boolean"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make these separate classes.
Polymorphism is an implementation of this pattern, of using enums to indicate "type of thing" and then using a switch statement based on an enum value to change behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in b1ffaa3.
@@ -22,7 +21,8 @@ public class ScriptUtils { | |||
* Limitation: assume inner field name is always "keyword". | |||
*/ | |||
public static String convertTextToKeyword(String fieldName, ExprType fieldType) { | |||
if (fieldType == OPENSEARCH_TEXT_KEYWORD) { | |||
if (fieldType instanceof OpenSearchTextType | |||
&& ((OpenSearchTextType) fieldType).getFields().size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it safe to assume that every text column will have a keyword
field?
What will happen if that's not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not safe. In case if user makes search on text field without fields
/fielddata
, OpenSearch engine raises an exception like
Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [] in order to load field data by uninverting the inverted index.
opensearch/src/test/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataTypeTest.java
Outdated
Show resolved
Hide resolved
if (cachedFieldTypes == null) { | ||
cachedFieldTypes = new OpenSearchDescribeIndexRequest(client, indexName).getFieldTypes(); | ||
} | ||
return OpenSearchDataType.traverseAndFlatten(cachedFieldTypes).entrySet().stream() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could cache the return of traverseAndFlatten
instead of re-running it each time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it is called only once, but then getFieldOpenSearchTypes()
is called which requires unflattened map.
I'll do some tracking on traverseAndFlatten
usage to optimize it possible. Thanks for the idea.
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Outdated
Show resolved
Hide resolved
// object and nested types have properties - info about embedded types | ||
@Getter | ||
@EqualsAndHashCode.Exclude | ||
Map<String, OpenSearchDataType> properties = new LinkedHashMap<>(); | ||
|
||
// text could have fields | ||
@Getter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need examples of mapping and data to play with. I asked community for examples, but there is no response yet.
Current implementation can parse fields
and properties
both, but it maybe not used properly.
* @return A list of all `OpenSearchDataType`s from given map on the same nesting level (1). | ||
* Nested object names are prefixed by names of their host. | ||
*/ | ||
public static Map<String, OpenSearchDataType> traverseAndFlatten( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks similar to OpenSearchExprValueFactory.construct/parse/parseArray -- can they be combined?
* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to #180 (comment) * Minor comment fix. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
* `typeName` and `legacyTypeName` to return different type names. * Update unit test. * Revert changes done in IT. * Change `typeof` function and corresponding tests. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
integ-test/src/test/java/org/opensearch/sql/sql/SystemFunctionIT.java
Outdated
Show resolved
Hide resolved
@EqualsAndHashCode.Exclude | ||
Map<String, OpenSearchDataType> properties = new LinkedHashMap<>(); | ||
|
||
// text could have fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now. No need to make assumptions for OpenSearch
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/ScriptUtils.java
Outdated
Show resolved
Hide resolved
opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Show resolved
Hide resolved
|
||
// TODO remove, used in tests only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove. test-only code is bad
// type.subtype : Object | ||
// type.subtype.subsubtype : Object | ||
// type.subtype.subsubtype.textWithKeywordType : Text | ||
// |- keyword : Keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a keyword in a text in an object...
is this flattened?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Object is flattened, but text-with-keyword remains text-with-keyword. This test shows that.
opensearch/src/test/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataTypeTest.java
Outdated
Show resolved
Hide resolved
// type.keyword : Keyword | ||
// ================== | ||
// Objects are flattened, but Texts aren't | ||
// TODO Arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? Isn't this part of the nested work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the difference at that point between nested
and object
. Should be both flattened the same manner?
…`. Update tests. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Something to possibly investigate is if all This may be a bit complicated because of this bug in the new engine: opensearch-project#1300 |
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
…etones as much as possible. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
* Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. * Change `OpenSearchDataType::cloneEmpty` function to avoid using reflection; update derived classes to override it. Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
…ion (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address #180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. #180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to #180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com> Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
…ion (#1314) * Rework on `OpenSearchDataType`: parse, store and use mapping information (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address Bit-Quill#180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. Bit-Quill#180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com> Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
…ion (#1314) * Rework on `OpenSearchDataType`: parse, store and use mapping information (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address Bit-Quill#180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. Bit-Quill#180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com> Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com> (cherry picked from commit d44cd39)
…ion (#1314) (#1455) * Rework on `OpenSearchDataType`: parse, store and use mapping information (#180) Rework on `OpenSearchDataType`: * Add data types for those classes are not defined in `ExprCoreType`. * Address Bit-Quill#180 (comment) * Remove `TextKeywordValue`. * Add changes according to the PR review. Bit-Quill#180 * Update `IndexMapping::parseMapping` function. * Add `OpenSearchDataType::resolve` function. * Add new constructor for `OpenSearchTextType`. * Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser. * Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests. * Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment) * Minor comment fix. * A fix to avoid breaking changes. * `typeName` and `legacyTypeName` to return different type names. * Change `typeof` function and corresponding tests. * Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests. * Update UT for `typeof` function. * Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible. * Make string representations of all `ExprType`s uppercase. * Remove functions from `IndexMapping` used in tests only. Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com> Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com> (cherry picked from commit d44cd39) Co-authored-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Description
Rework on
OpenSearchDataType
according to #169 (comment)OpenSearchDataType
is not a enum anymore, it is a class which stores all required mapping information.TODOs
Fixes could be done in scope of this PR or in follow-up fixes:
ExprCoreType
. Type resolution is hardcoded and doesn't consider real column types:opensearch-project-sql/core/src/main/java/org/opensearch/sql/data/type/ExprCoreType.java
Line 86 in e4d89de
traverseAndFlatten
- do recursive traverse onlyARRAY
opensearch-project-sql/opensearch/src/main/java/org/opensearch/sql/opensearch/data/value/OpenSearchExprValueFactory.java
Line 244 in 16d7ce0
Further future improvements
Features that should be done in scope of other PRs:
text
is not completely supported as a separate type yet.geo_shape
,ip_range
.alias
is not resolved.Map<String, ExprType> getFieldTypes()
inOpenSearchIndex
should be replaced byMap<String, OpenSearchDataType> getFieldOpenSearchTypes()
. This requires base interface change.ExprCoreType
.Issues Resolved
Assuming that these changes would fix opensearch-project#794, opensearch-project#1038, opensearch-project#1112 and opensearch-project#1113. Relevant for opensearch-project#1111 too.
Notes
opensearch-project-sql/opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java
Line 28 in ae11c22
added to bypass Ignore switch case default cases which can't be covered jacoco/jacoco#1211. See also https://www.javaspecialists.eu/archive/Issue272-Hacking-Enums-Revisited.html
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.