Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error reporting #108

Conversation

Yury-Fridlyand
Copy link

@Yury-Fridlyand Yury-Fridlyand commented Aug 18, 2022

Description

Note: contains changes from #88, it is not merged on upstream.
My changes are concentrated in opensearch/src/main/java/org/opensearch/sql/opensearch/storage/script/filter/lucene/relevance/* and some changes on Forest's work in RestSQLQueryAction.java, RestSqlAction.java, QueryContext.java and ErrorMessage.java + changes on Max's PR in RelevanceQuery.java.

1.
Added cast validation and user-friendly error messages

SELECT phrase, insert_time2 from phrase WHERE match_phrase(phrase, 'brown fox', slop = 0.5);

before

{'reason': 'Invalid SQL query', 'details': 'For input string: "0.5"', 'type': 'NumberFormatException'}

after

{'reason': 'There was internal problem at backend', 'details': "Invalid slop value: '0.5'. Accepts only integer values.", 'type': 'RuntimeException'}

2.
Cast validation and user-friendly errors for all enum types

select `key` from calcs where multi_match([str1], 'DVD', zero_terms_query=meow);

before

{'reason': 'Invalid SQL query', 'details': 'No enum constant org.opensearch.index.search.MatchQuery.ZeroTermsQuery.MEOW', 'type': 'IllegalArgumentException'}

after

{'reason': 'There was internal problem at backend', 'details': "Invalid zero_terms_query value: 'meow'. Available values are: NONE, ALL, NULL.", 'type': 'RuntimeException'}

3.
Extended error message when incorrect parameters specified

SELECT phrase, insert_time2 from phrase WHERE match(phrase, 'brown fox', slop = 0);

before

{'reason': 'Invalid SQL query', 'details': 'Parameter slop is invalid for match function', 'type': 'SemanticCheckException'}

after

{'reason': 'Invalid SQL query', 'details': 'Parameter slop is invalid for match function. Available parameters are: analyzer, auto_generate_synonyms_phrase_query, boost, fuzziness, fuzzy_rewrite, fuzzy_transpositions, lenient, minimum_should_match, max_expansions, operator, prefix_length, zero_terms_query', 'type': 'SemanticCheckException'}

4.
A bit better error on typo in function name

SELECT phrase, insert_time2 from phrase WHERE match_phrase2(phrase, 'brown fox', slop = 0);

before

TransportError(503, 'Exception', {'error': {'reason': 'There was internal problem at backend', 'details': '[701f78b8-da61-4eb1-a00a-18f089f5644d] Request SQLQueryRequest(jsonContent={"query":"SELECT phrase, insert_time2 from phrase WHERE match_phrase2(phrase, \'brown fox\', slop = 0)"}, query=SELECT phrase, insert_time2 from phrase WHERE match_phrase2(phrase, \'brown fox\', slop = 0), path=/_plugins/_sql/, format=jdbc, params={format=jdbc}, sanitize=true) is not supported and falling back to old SQL engine', 'type': 'Exception'}, 'status': 503})

after

{'reason': 'Invalid SQL query', 'details': "unsupported method: match_phrase2\nQuery failed on both V1 and V2 SQL parser engines. V2 SQL parser error following: \nFailed to parse query due to offending symbol [(] at: 'SELECT phrase, insert_time2 from phrase WHERE match_phrase2(' <--- HERE... More details: Expecting tokens in {<EOF>, ';'}", 'type': 'SqlParseException'}

Issues left

1.

select `key` from calcs where multi_match([str1], 'DVD', zero_termsquery=A);
SELECT phrase, insert_time2 from phrase WHERE match_phrase(phrase, 'brown fox', slope = 0);

These queries succeeds, but I expect a failure. It falls back to the old engine and devil knows how executed.
Proposal: disable this function in the old engine. Could be a breaking change...

2.
It is nice to have a enum with parameter names to avoid mistakes when adding new parameters/functions.
Code seems ugly, so I reverted this:

.put("analyzer", (b, v) -> b.analyzer(v.stringValue()))
.put(FunctionParameterNames.analyzer.toString(), (b, v) -> b.analyzer(v.stringValue()))

Not so relevant as far as we added almost all functions.

3.
I tried to add a mapping between functions, lists of their parameters and parameters' setters. The code was super-ugly...
Instead of having one line for each parameter for each function I got 4...

Ideas and proposals are welcome!

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Yury-Fridlyand Yury-Fridlyand changed the title Dev spike imrove error reporting Spike - improve error reporting Aug 18, 2022
@Yury-Fridlyand Yury-Fridlyand marked this pull request as ready for review August 18, 2022 01:37
@Bit-Quill Bit-Quill deleted a comment from codecov bot Aug 18, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Aug 18, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Aug 18, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Aug 18, 2022
Map.Entry<Integer, FunctionSignature> bestMatchEntry = functionMatchQueue.peek();
if (FunctionSignature.NOT_MATCH.equals(bestMatchEntry.getKey())) {
throw new ExpressionEvaluationException(
String.format("%s function expected %s, but get %s", functionName,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"%s function expected %s, but got %s"

Should probably be got not get.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change copied from opensearch-project#746, it is not too late to post your proposals

}

private String formatFunctions(Set<FunctionSignature> functionSignatures) {
return functionSignatures.stream().map(FunctionSignature::formatTypes)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this function, formatFunctions, do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It builds a list of function arguments in format like [STRING, STRING, INT]

@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 8, 2022
@Yury-Fridlyand Yury-Fridlyand force-pushed the dev-spike-imrove-error-reporting branch from f9757cf to ef19d06 Compare September 8, 2022 21:20
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 8, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 8, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 8, 2022
@Yury-Fridlyand Yury-Fridlyand marked this pull request as draft September 8, 2022 21:51
@Yury-Fridlyand
Copy link
Author

Yury-Fridlyand commented Sep 8, 2022

https://github.com/Bit-Quill/opensearch-project-sql/blob/dev-spike-imrove-error-reporting/core/src/main/java/org/opensearch/sql/expression/function/RelevanceFunctionResolver.java#L41-L48

  1. Should I check that there are at least 2 parameters given there? It is checked in RelevanceQuery.build
  2. Should I remove validation that parameters 1..N are STRINGs? We may want to place them in any order.
    // Check if all but the first parameter are of type STRING.
    for (int i = 1; i < paramTypes.size(); i++) {
      ExprType paramType = paramTypes.get(i);
      if (!ExprCoreType.STRING.equals(paramType)) {
        throw new SemanticCheckException(
            getWrongParameterErrorMessage(i, paramType, ExprCoreType.STRING));
      }
    }

@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 9, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 9, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 9, 2022
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 9, 2022
@Yury-Fridlyand Yury-Fridlyand force-pushed the dev-spike-imrove-error-reporting branch from 5861f36 to 5182e6f Compare September 9, 2022 01:40
@Bit-Quill Bit-Quill deleted a comment from codecov bot Sep 9, 2022
@Yury-Fridlyand Yury-Fridlyand marked this pull request as ready for review September 9, 2022 01:49
@Yury-Fridlyand Yury-Fridlyand changed the title Spike - improve error reporting Improve error reporting Sep 9, 2022
Comment on lines +37 to +44
// Aggregate parameters by name, so getting a Map<Name:String, List>
arguments.stream().collect(Collectors.groupingBy(a -> a.getArgName().toLowerCase()))
.forEach((k, v) -> {
if (v.size() > 1) {
throw new SemanticCheckException(
String.format("Parameter '%s' can only be specified once.", k));
}
});
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to aggregate all duplicates into one exception? Currently, function throws on first found duplicated parameter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we do other places...? I think we always throw on the first failure, but maybe you have an example where we aggregate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No examples. I don't think that we should worry about that, the probability that a user will specify multiple arguments more than once is extremely low I suppose. I just pointed that we have a room for further improvement.

@codecov

This comment was marked as spam.

Copy link

@MaxKsyunz MaxKsyunz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-branch to remove commits from upstream. Are they in the integ-spik-improve-error-reporting branch? This is also tripping up the DCO check.

Also, is the first issue under issues left still valid? I thought the changes to FunctionResolver changed that error message.

MaxKsyunz and others added 3 commits September 9, 2022 20:48
Co-authored-by: MaxKsyunz <maxk@bitquilltech.com>
Co-authored-by: forestmvey <forestv@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
@Yury-Fridlyand Yury-Fridlyand force-pushed the dev-spike-imrove-error-reporting branch from 98df515 to 1b0e047 Compare September 10, 2022 03:49
@Yury-Fridlyand
Copy link
Author

Yury-Fridlyand commented Sep 10, 2022

Please re-branch to remove commits from upstream. Are they in the integ-spik-improve-error-reporting branch? This is also tripping up the DCO check.

20:46:03 /mnt/c/G/ope $ git pull --all
Fetching origin
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 31 (delta 19), reused 31 (delta 19), pack-reused 0
Unpacking objects: 100% (31/31), 4.75 KiB | 39.00 KiB/s, done.
From github.com:Bit-Quill/opensearch-project-sql
   c751cc97..41df59e8  dev-Add-convert_tz-function -> origin/dev-Add-convert_tz-function
   73ca29e8..65776edd  dev-datetime-unix-convert   -> origin/dev-datetime-unix-convert
Fetching upstream
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (11/11), done.
remote: Total 20 (delta 10), reused 10 (delta 10), pack-reused 9
Unpacking objects: 100% (20/20), 1.99 KiB | 23.00 KiB/s, done.
From github.com:opensearch-project/sql
   53cde65a..b7b37da7  main       -> upstream/main
Already up to date.
20:48:33 /mnt/c/G/ope $ git switch dev-spike-imrove-error-reporting
Already on 'dev-spike-imrove-error-reporting'
Your branch is up to date with 'origin/dev-spike-imrove-error-reporting'.
20:48:41 /mnt/c/G/ope $ git rebase upstream/main
First, rewinding head to replay your work on top of it...
Applying: Rework on error reporting to make it more verbose and human-friendly.
Applying: Improve check for `field` and `fields`.
Applying: Reorder V1 and V2 error messages.
20:49:01 /mnt/c/G/ope $ git push -f
Enumerating objects: 204, done.
Counting objects: 100% (204/204), done.
Delta compression using up to 4 threads
Compressing objects: 100% (96/96), done.
Writing objects: 100% (143/143), 20.22 KiB | 265.00 KiB/s, done.
Total 143 (delta 71), reused 12 (delta 3)
remote: Resolving deltas: 100% (71/71), completed with 37 local objects.
To github.com:Bit-Quill/opensearch-project-sql.git
 + 98df5156...1b0e0476 dev-spike-imrove-error-reporting -> dev-spike-imrove-error-reporting (forced update)
20:49:08 /mnt/c/G/ope $ git switch integ-spike-imrove-error-reporting
Switched to branch 'integ-spike-imrove-error-reporting'
Your branch is up to date with 'origin/integ-spike-imrove-error-reporting'.
20:49:25 /mnt/c/G/ope $ git rebase upstream/main
First, rewinding head to replay your work on top of it...
Fast-forwarded integ-spike-imrove-error-reporting to upstream/main.
20:49:32 /mnt/c/G/ope $ git push -f
Total 0 (delta 0), reused 0 (delta 0)
To github.com:Bit-Quill/opensearch-project-sql.git
   53cde65a..b7b37da7  integ-spike-imrove-error-reporting -> integ-spike-imrove-error-reporting
20:49:36 /mnt/c/G/ope $

Now I have another commit shown in the list.

Also, is the first issue under issues left still valid? I thought the changes to FunctionResolver changed that error message.

Thanks, updated.

@Yury-Fridlyand
Copy link
Author

Can't avoid commits from upstream to be shown in my branches. I recreated branches and PR, see #116.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants