Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework on OpenSearchDataType: parse, store and use mapping information #180

Merged

Conversation

Yury-Fridlyand
Copy link

@Yury-Fridlyand Yury-Fridlyand commented Dec 2, 2022

Description

Rework on OpenSearchDataType according to #169 (comment)
OpenSearchDataType is not a enum anymore, it is a class which stores all required mapping information.

TODOs

Fixes could be done in scope of this PR or in follow-up fixes:

  1. Types of columns added by aggregation are based on ExprCoreType. Type resolution is hardcoded and doesn't consider real column types:
  2. Get rid of traverseAndFlatten - do recursive traverse only
  3. Support more types in ARRAY

Further future improvements

Features that should be done in scope of other PRs:

  1. text is not completely supported as a separate type yet.
  2. Date formats are ignored (draft in #639: allow metadata fields (_id) to be used as QualifiedName #142).
  3. Some types are not supported, e.g. geo_shape, ip_range.
  4. alias is not resolved.
  5. Function Map<String, ExprType> getFieldTypes() in OpenSearchIndex should be replaced by Map<String, OpenSearchDataType> getFieldOpenSearchTypes(). This requires base interface change.
  6. Probably, completely remove ExprCoreType.

Issues Resolved

Assuming that these changes would fix opensearch-project#794, opensearch-project#1038, opensearch-project#1112 and opensearch-project#1113. Relevant for opensearch-project#1111 too.

Notes

  1. Dummy enum value
    added to bypass Ignore switch case default cases which can't be covered jacoco/jacoco#1211. See also https://www.javaspecialists.eu/archive/Issue272-Hacking-Enums-Revisited.html

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
… update tests.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
@codecov

This comment was marked as abuse.

@Yury-Fridlyand
Copy link
Author

Should OpenSearchDataType inherit ExprType?

@Yury-Fridlyand
Copy link
Author

CC: @penghuo @dai-chen

* Add data types for those classes are not defined in `ExprCoreType`.
* Add fixes to make all IT pass.
* Complete some TODOs.

Address #180 (comment)

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@Yury-Fridlyand Yury-Fridlyand changed the title [WIP] Rework on OpenSearchDataType: parse, store and use mapping information Rework on OpenSearchDataType: parse, store and use mapping information Jan 6, 2023
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@Yury-Fridlyand Yury-Fridlyand marked this pull request as ready for review January 6, 2023 20:44
#180

* Update `IndexMapping::parseMapping` function.
* Rename `OpenSearchDataType.Type` to `OpenSearchDataType.MappingType`.
* Add `OpenSearchDataType::resolve` function.
* Add new constructor for `OpenSearchTextType`.
* Update tests.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@Yury-Fridlyand Yury-Fridlyand requested review from penghuo and dai-chen and removed request for penghuo and dai-chen January 14, 2023 00:53
Copy link

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition, Yury!

My main feedback is to re-factor OpenSearchDataType-derived classes to avoid creating multiple copies of small identical objects -- since types are used every where, this will have a big impact.

I think we can move traverse and flatten and the notion of fields and properties to OpenSearchDescribeIndexRequest.getFieldTypes.

Comment on lines 26 to 42
public enum Type {
Text("text"),
Keyword("keyword"),
Ip("ip"),
GeoPoint("geo_point"),
Binary("binary"),
Date("date"),
Nested("nested"),
Byte("byte"),
Short("short"),
Integer("integer"),
Long("long"),
Float("float"),
HalfFloat("half_float"),
ScaledFloat("scaled_float"),
Double("double"),
Boolean("boolean");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make these separate classes.

Polymorphism is an implementation of this pattern, of using enums to indicate "type of thing" and then using a switch statement based on an enum value to change behaviour.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b1ffaa3.

@@ -22,7 +21,8 @@ public class ScriptUtils {
* Limitation: assume inner field name is always "keyword".
*/
public static String convertTextToKeyword(String fieldName, ExprType fieldType) {
if (fieldType == OPENSEARCH_TEXT_KEYWORD) {
if (fieldType instanceof OpenSearchTextType
&& ((OpenSearchTextType) fieldType).getFields().size() > 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that every text column will have a keyword field?

What will happen if that's not the case?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is not safe. In case if user makes search on text field without fields/fielddata, OpenSearch engine raises an exception like

Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [] in order to load field data by uninverting the inverted index.

if (cachedFieldTypes == null) {
cachedFieldTypes = new OpenSearchDescribeIndexRequest(client, indexName).getFieldTypes();
}
return OpenSearchDataType.traverseAndFlatten(cachedFieldTypes).entrySet().stream()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could cache the return of traverseAndFlatten instead of re-running it each time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it is called only once, but then getFieldOpenSearchTypes() is called which requires unflattened map.

I'll do some tracking on traverseAndFlatten usage to optimize it possible. Thanks for the idea.

Comment on lines 147 to 153
// object and nested types have properties - info about embedded types
@Getter
@EqualsAndHashCode.Exclude
Map<String, OpenSearchDataType> properties = new LinkedHashMap<>();

// text could have fields
@Getter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields and properties can exist on the same type.

See this question, especially this comment.

We'd need to understand how they can interact together.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need examples of mapping and data to play with. I asked community for examples, but there is no response yet.

Current implementation can parse fields and properties both, but it maybe not used properly.

* @return A list of all `OpenSearchDataType`s from given map on the same nesting level (1).
* Nested object names are prefixed by names of their host.
*/
public static Map<String, OpenSearchDataType> traverseAndFlatten(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to OpenSearchExprValueFactory.construct/parse/parseArray -- can they be combined?

* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser.
* Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests.
* Rewrite `traverseAndFlatten` according to #180 (comment)
* Minor comment fix.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
* `typeName` and `legacyTypeName` to return different type names.
* Update unit test.
* Revert changes done in IT.
* Change `typeof` function and corresponding tests.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@EqualsAndHashCode.Exclude
Map<String, OpenSearchDataType> properties = new LinkedHashMap<>();

// text could have fields

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now. No need to make assumptions for OpenSearch


// TODO remove, used in tests only

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove. test-only code is bad

// type.subtype : Object
// type.subtype.subsubtype : Object
// type.subtype.subsubtype.textWithKeywordType : Text
// |- keyword : Keyword

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a keyword in a text in an object...
is this flattened?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Object is flattened, but text-with-keyword remains text-with-keyword. This test shows that.

// type.keyword : Keyword
// ==================
// Objects are flattened, but Texts aren't
// TODO Arrays

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? Isn't this part of the nested work?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the difference at that point between nested and object. Should be both flattened the same manner?

…`. Update tests.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@GumpacG
Copy link

GumpacG commented Jan 27, 2023

Something to possibly investigate is if all Nested types are ExprCoreType.ARRAY (this case is fine) and if all ExprCoreType.ARRAY are Nested type(this case may be an issue). The latter may be an issue if the value is an array but not of Nested type. For example, myNum in https://github.com/opensearch-project/sql/blob/45fc371ea19c0f6c1e0107d294f30c97f9ff2002/integ-test/src/test/resources/nested_objects.json#L10.

This may be a bit complicated because of this bug in the new engine: opensearch-project#1300

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
…etones as much as possible.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
* Make string representations of all `ExprType`s uppercase.
* Remove functions from `IndexMapping` used in tests only.
* Change `OpenSearchDataType::cloneEmpty` function to avoid using reflection; update derived classes to override it.

Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
@Yury-Fridlyand Yury-Fridlyand merged commit 5e6b90a into integ-spike-rework-mapping-use Feb 1, 2023
@Yury-Fridlyand Yury-Fridlyand deleted the dev-spike-rework-mapping-use branch February 1, 2023 18:40
Yury-Fridlyand added a commit that referenced this pull request Feb 2, 2023
…ion (#180)

Rework on `OpenSearchDataType`:
* Add data types for those classes are not defined in `ExprCoreType`.
* Address #180 (comment)
* Remove `TextKeywordValue`.
* Add changes according to the PR review. #180
* Update `IndexMapping::parseMapping` function.
* Add `OpenSearchDataType::resolve` function.
* Add new constructor for `OpenSearchTextType`.
* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser.
* Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests.
* Rewrite `traverseAndFlatten` according to #180 (comment)
* Minor comment fix.
* A fix to avoid breaking changes.
* `typeName` and `legacyTypeName` to return different type names.
* Change `typeof` function and corresponding tests.
* Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests.
* Update UT for `typeof` function.
* Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible.
* Make string representations of all `ExprType`s uppercase.
* Remove functions from `IndexMapping` used in tests only.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Yury-Fridlyand added a commit to opensearch-project/sql that referenced this pull request Mar 21, 2023
…ion (#1314)

* Rework on `OpenSearchDataType`: parse, store and use mapping information (#180)

Rework on `OpenSearchDataType`:
* Add data types for those classes are not defined in `ExprCoreType`.
* Address Bit-Quill#180 (comment)
* Remove `TextKeywordValue`.
* Add changes according to the PR review. Bit-Quill#180
* Update `IndexMapping::parseMapping` function.
* Add `OpenSearchDataType::resolve` function.
* Add new constructor for `OpenSearchTextType`.
* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser.
* Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests.
* Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment)
* Minor comment fix.
* A fix to avoid breaking changes.
* `typeName` and `legacyTypeName` to return different type names.
* Change `typeof` function and corresponding tests.
* Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests.
* Update UT for `typeof` function.
* Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible.
* Make string representations of all `ExprType`s uppercase.
* Remove functions from `IndexMapping` used in tests only.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
opensearch-trigger-bot bot pushed a commit to opensearch-project/sql that referenced this pull request Mar 21, 2023
…ion (#1314)

* Rework on `OpenSearchDataType`: parse, store and use mapping information (#180)

Rework on `OpenSearchDataType`:
* Add data types for those classes are not defined in `ExprCoreType`.
* Address Bit-Quill#180 (comment)
* Remove `TextKeywordValue`.
* Add changes according to the PR review. Bit-Quill#180
* Update `IndexMapping::parseMapping` function.
* Add `OpenSearchDataType::resolve` function.
* Add new constructor for `OpenSearchTextType`.
* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser.
* Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests.
* Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment)
* Minor comment fix.
* A fix to avoid breaking changes.
* `typeName` and `legacyTypeName` to return different type names.
* Change `typeof` function and corresponding tests.
* Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests.
* Update UT for `typeof` function.
* Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible.
* Make string representations of all `ExprType`s uppercase.
* Remove functions from `IndexMapping` used in tests only.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
(cherry picked from commit d44cd39)
Yury-Fridlyand added a commit to opensearch-project/sql that referenced this pull request Mar 21, 2023
…ion (#1314) (#1455)

* Rework on `OpenSearchDataType`: parse, store and use mapping information (#180)

Rework on `OpenSearchDataType`:
* Add data types for those classes are not defined in `ExprCoreType`.
* Address Bit-Quill#180 (comment)
* Remove `TextKeywordValue`.
* Add changes according to the PR review. Bit-Quill#180
* Update `IndexMapping::parseMapping` function.
* Add `OpenSearchDataType::resolve` function.
* Add new constructor for `OpenSearchTextType`.
* Make `fields` and `properties` in `OpenSearchDataType` readonly. Update tests and mapping parser.
* Move `getFields` from `OpenSearchDataType` to `OpenSearchTextType`. Update tests.
* Rewrite `traverseAndFlatten` according to Bit-Quill#180 (comment)
* Minor comment fix.
* A fix to avoid breaking changes.
* `typeName` and `legacyTypeName` to return different type names.
* Change `typeof` function and corresponding tests.
* Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType`. Update tests.
* Update UT for `typeof` function.
* Make all instances of `OpenSearchDataType` and of derived types singletones as much as possible.
* Make string representations of all `ExprType`s uppercase.
* Remove functions from `IndexMapping` used in tests only.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yury.fridlyand@improving.com>
(cherry picked from commit d44cd39)

Co-authored-by: Yury-Fridlyand <yury.fridlyand@improving.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants