Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for arrays being in separated rows or returned as null #213

Closed
wants to merge 7 commits into from

Conversation

GumpacG
Copy link

@GumpacG GumpacG commented Jan 31, 2023

Description

The main idea for the change is to support returning array values when the arrays contain objects. As a side effect, it fixed bugs with object types and partiQL syntax for the new engine.

  • Integration tests have been added as a result of the fixed bug.
  • Old tests were also removed as they were expecting the wrong result. For example:
  public void testSelectObjectFieldOfArrayValuesItself() {
    JSONObject response = new JSONObject(query("SELECT accounts FROM %s"));

    // Only the first element of the list of is returned.
    verifyDataRows(response, rows(new JSONObject("{\"id\": 1}")));
  }

Old behaviour of object fields with arrays of objects:

Screenshot 2023-01-30 at 7 38 44 PM

New behaviour of object fields with arrays of objects:

Screenshot 2023-01-30 at 7 29 19 PM

Old behaviour of Nested Function with arrays of objects:

Disregard the row of nulls here, it was to force the plugin to run on the legacy engine. Note the added row and duplicated "ab" value.
Screenshot 2023-01-30 at 7 37 41 PM

New behaviour of Nested Function with arrays of objects:

Screenshot 2023-01-30 at 7 30 27 PM

Old behaviour of PartiQL syntax with arrays of objects:

Screenshot 2023-01-30 at 7 35 30 PM

New behaviour of PartiQL syntax with arrays of objects:

Screenshot 2023-01-30 at 7 31 21 PM

Test data mapping for nested_simple:

{
  "mappings": {
    "properties": {
      "message": {
        "type": "nested",
        "properties": {
          "info": {
            "type": "keyword",
            "index": "true"
          },
          "author": {
            "type": "keyword",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            },
            "index": "true"
          },
          "dayOfWeek": {
            "type": "long"
          }
        }
      },
      "comment": {
        "type": "nested",
        "properties": {
          "data": {
            "type": "keyword",
            "index": "true"
          },
          "likes": {
            "type": "long"
          }
        }
      },
      "myNum": {
        "type": "long"
      },
      "someField": {
        "type": "keyword",
        "index": "true"
      }
    }
  }
}

Test data for nested_simple and nested_as_objects (to load object type, do not specify the mapping for the data):

{"index":{"_id":"1"}}
{"message":{"info":"a","author":"e","dayOfWeek":1},"comment":{"data":"ab","likes":3},"myNum":1,"someField":"b"}
{"index":{"_id":"2"}}
{"message":{"info":"b","author":"f","dayOfWeek":2},"comment":{"data":"aa","likes":2},"myNum":2,"someField":"a"}
{"index":{"_id":"3"}}
{"message":{"info":"c","author":"g","dayOfWeek":1},"comment":{"data":"aa","likes":3},"myNum":3,"someField":"a"}
{"index":{"_id":"4"}}
{"message":[{"info":"c","author":"h","dayOfWeek":4},{"info":"a","author":"i","dayOfWeek":5}],"comment":{"data":"ab","likes":1},"myNum":4,"someField":"b"}
{"index":{"_id":"5"}}
{"message": [{"info":"zz","author":"zz","dayOfWeek":6}],"comment":{"data":["aa","bb"],"likes":10},"myNum":[3,4],"someField":"a"}

Issues Resolved

opensearch-project#1305

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
…ues"

This reverts commit cea553e.

Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
…s with undesired expected results

Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
@GumpacG GumpacG requested a review from a team January 31, 2023 03:44
@@ -85,22 +85,6 @@ public void testSelectNestedFieldItself() {
);
}

@Test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can fix these rather than deleting them.

verifyDataRows(response, rows(new JSONArray(List.of(new JSONObject("{\"id\":1}"), new JSONObject("{\"id\":2}")))));
verifyDataRows(response, rows(new JSONArray(List.of(1,2))));

@@ -1,9 +1,9 @@
{"index":{"_id":"1"}}
{"message":[{"info":"a","author":{"name": "e", "address": {"street": "bc", "number": 1}},"dayOfWeek":1}]}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are updating the test data, maybe we could update the values to be better then one or two single letters.

Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
@codecov
Copy link

codecov bot commented Jan 31, 2023

Codecov Report

Merging #213 (a48587d) into dev-spike-nested (e994e67) will decrease coverage by 0.04%.
The diff coverage is 0.00%.

@@                  Coverage Diff                   @@
##             dev-spike-nested     #213      +/-   ##
======================================================
- Coverage               94.94%   94.91%   -0.04%     
  Complexity               3472     3472              
======================================================
  Files                     361      361              
  Lines                    9414     9417       +3     
  Branches                  682      682              
======================================================
  Hits                     8938     8938              
- Misses                    412      414       +2     
- Partials                   64       65       +1     
Flag Coverage Δ
query-workbench 62.76% <ø> (ø)
sql-engine 97.32% <0.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...opensearch/sql/expression/ReferenceExpression.java 82.35% <0.00%> (-17.65%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Comment on lines 108 to 113
ExprValue result = ExprValueUtils.collectionValue(new ArrayList<>());

for (ExprValue val: value.collectionValue()){
result.collectionValue().add(resolve(val, paths));
}
return result;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do this with single-liner like

return new ExprCollectionValue(value.collectionValue().stream()
      .map(val -> resolve(val, paths)).collect(Collectors.toList()))

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed in a48587d


@Test
public void nested_function_with_array_of_nested_field_test() {
String query = "SELECT nested(message.info), nested(comment.data) FROM " + TEST_INDEX_NESTED_TYPE;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plugin response is

{
  "schema": [
    {
      "name": "nested(message.info)",
      "type": "keyword"
    },
    {
      "name": "nested(comment.data)",
      "type": "keyword"
    }
  ],
  "total": 5,
  "datarows": [
    [
      "a",
      "ab"
    ],
    [
      "b",
      "aa"
    ],
    [
      "c",
      "aa"
    ],
    [
      [
        "c",
        "a"
      ],
      "ab"
    ],
    [
      [
        "zz"
      ],
      [
        "aa",
        "bb"
      ]
    ]
  ],
  "size": 5,
  "status": 200
}

It could be confusing when datatype is keyword, but values are arrays and keywords. Imagine a user has a parser for response, what should parser do with such values?
You can try our JDBC driver as an example.

I think we should return all arrays and value type should be array if we work with value which may be an array.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe our solution is to split the nested function for JDBC or other queries that have strict typing (really, any SQL language). For now, we should make sure that the JDBC and ODBC drivers should not output any undesired typing and break.

To ensure that typings aren't broken, we could introduce partiql like syntax for nested that would make sure that only strict typing is followed on output. For example:

SELECT n.name, m1.info AS info_keyword, m2.info AS info_array FROM nested as n, n.message AS m1 IS keyword, n.message as m2 IS array

SELECT n.name, CASE(m.info IS keyword) THEN m.info END AS info_keyword, CASE(m.info IS ARRAY) THEN m.info END AS info_array FROM nested as n, n.message AS m

For now, this can be marked as 'out of scope' as the user should be responsible for proper data setup and typing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It is a new feature reported in #1300.

Signed-off-by: Guian Gumpac <guian.gumpac@improving.com>
Copy link

@Yury-Fridlyand Yury-Fridlyand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of CI failures, perhaps you need to rebase

@GabeFernandez310
Copy link

I also believe you will probably have to rebase to get rid of some of the failing checks. I think you may also be failing on code coverage as well.

@@ -10,11 +10,15 @@

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this pass checkstyle?

import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;

import org.json.JSONArray;
import org.json.JSONObject;
import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this space between the imports pass checkstyle?

Comment on lines +3 to +9
import org.json.JSONArray;
import org.json.JSONObject;
import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;

import java.io.IOException;
import java.util.List;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these imports pass checkstyle with the extra space and them out of alphabetical order?

@@ -0,0 +1,50 @@
package org.opensearch.sql.sql;

import org.json.JSONArray;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkstyle passes?

@@ -100,6 +104,12 @@ public ExprValue resolve(ExprTupleValue value) {
}

private ExprValue resolve(ExprValue value, List<String> paths) {
// This case is to allow returning all values in an array to be in one row
if (value.type().equals(ExprCoreType.ARRAY)){
return new ExprCollectionValue(value.collectionValue().stream()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like code coverage is complaining about no test for when the if statement is true?

@GabeFernandez310
Copy link

I think it looks good. Just want to see more checks passing (or an explanation for why they can't pass at the moment if it's failing due to something from main)

@GumpacG
Copy link
Author

GumpacG commented Feb 1, 2023

The checks are failing due to some failing tests inherited from the base branch when the POC was implemented. Fixing actions and cleaning up the POC will be part of a different task.

@GumpacG
Copy link
Author

GumpacG commented Feb 2, 2023

Closing this PR as there will be another PR towards the updated nested function POC

@GumpacG GumpacG closed this Feb 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants