Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support match_phrase filter function in SQL and PPL #604

Merged
merged 58 commits into from
May 25, 2022
Merged
Show file tree
Hide file tree
Changes from 57 commits
Commits
Show all changes
58 commits
Select commit Hold shift + click to select a range
e7c45f5
Support match_phrase in AST expressions
Apr 22, 2022
a39f423
Support match_phrase in PPL syntax
Apr 26, 2022
8d0cd91
Added MATCH_QUERY to the list of supported functions in SQL parser.
Yury-Fridlyand Apr 26, 2022
7d451e5
Support slop parameter for relevancy functions
Apr 26, 2022
7001b87
Typo fix for previous commit. Expanded list of MATCH_* function argum…
Yury-Fridlyand Apr 26, 2022
f0ff81c
Integration tests for match_test in PPL
Apr 26, 2022
b883254
Support slop option
Apr 26, 2022
e1c071a
Tests to verify that PPL parser supports match_phrase
Apr 26, 2022
9d4f378
Refactor and style fixes
Apr 26, 2022
3c8bd62
Refactor and fix style in OpenSearchFunctions.java
Apr 26, 2022
4a0fe8b
Style fixes for OpenSearchFunctionsTest.java
Apr 26, 2022
659c4d7
Add tests for `MATCH` and `MATCH_PHRASE` functions.
Yury-Fridlyand Apr 26, 2022
017fbb7
Typo fix.
Yury-Fridlyand Apr 26, 2022
e93aa2b
Fix style and minor refactor in PPLSyntaxParserTest.java
Apr 26, 2022
f67b1ac
Minor test improvements
Yury-Fridlyand Apr 26, 2022
0e48992
Parameterized test for match_phrase parsing
Apr 28, 2022
cc2100b
Add `MATCH_PHRASE` function to the function repository.
Yury-Fridlyand Apr 27, 2022
23f8fe6
Finally added `MATCH_PHASE` query support to `FilterQueryBuilder`.
Yury-Fridlyand Apr 28, 2022
775cb20
Typo fix + EOL fix
Yury-Fridlyand Apr 28, 2022
2aaf047
Typo fix
Yury-Fridlyand Apr 29, 2022
2316173
Adding tests for `FilterQueryBuilder` for `MATCH_PHRASE`. Some tests …
Yury-Fridlyand Apr 29, 2022
e22879a
Address PR feedback.
Yury-Fridlyand May 2, 2022
a46d6ac
Add documentation for `MATCH_PHRASE` function.
Yury-Fridlyand May 2, 2022
30ef3ba
Remove duplicate definition of match_phrase in DSL.java
May 3, 2022
39ac1dd
Generate supported signatures for match and match_phrases
May 3, 2022
0ec0aec
Simplify match_phrase test.
May 3, 2022
a987aa3
Some sample queries with match_phrase for parser test
May 3, 2022
14cb1a5
Update PPL documentation.
May 3, 2022
b4f5ba9
Update query samples in docs. Add a new table to doctest to match the…
Yury-Fridlyand May 4, 2022
e4d155b
Recover `FUZZY_REWRITE` - was deleted by mistake.
Yury-Fridlyand May 4, 2022
c303381
Add missing file for `63009a05`.
Yury-Fridlyand May 4, 2022
5020d7f
Use constant seed to produce repeatable samples
May 4, 2022
82c6ab8
Enable integration tests for match_phrase in PPL
May 4, 2022
a0d56a8
Fix code style.
Yury-Fridlyand May 4, 2022
becdeac
Make generateAndTestQueries() product consistent results.
May 4, 2022
b30db7e
use generateQueries to generate samples for match_phrase
May 4, 2022
0e8d6da
Fix finction signature list. Typo fix for `5bf470c7`.
Yury-Fridlyand May 4, 2022
4b92454
Fix code styling.
Yury-Fridlyand May 4, 2022
5334c65
Minor style changes.
May 6, 2022
05c6acb
Add newline at the end of file.
May 6, 2022
80e3da0
Simplify MatchPhraseQuery and add unit tests.
May 9, 2022
8198323
Remove unused imports.
May 9, 2022
5c6e07f
Change data used by WhereCommandIT
May 9, 2022
0f7ea73
Style update for OpenSearchFunctions
May 9, 2022
026ac4c
100% test coverage for MatchPhraseQuery
May 9, 2022
e68ba62
Fix imports in WhereCommandIT
May 10, 2022
bcc0a82
Move final variables to class constants in OpenSearchFunctions
May 10, 2022
0127ee6
Fix checkstyle failure.
May 10, 2022
d6ea633
Support legacy syntax for match_phrase in the new SQL engine.
May 16, 2022
fedbbed
Merge pull request #59 from Bit-Quill/integ-match_phrase-#185-legacy-…
MaxKsyunz May 16, 2022
1fae665
Added missed license headers.
May 20, 2022
d3667ce
Add RelevanceQuery -- a base class for MatchQuery and MatchPhraseQuery.
May 20, 2022
7101d07
Use SyntaxCheckException from RelevanceQuery.build
May 20, 2022
b5bf44e
MatchPhraseQueryBuilder constructor requires a non-empty field name.
May 20, 2022
17f0ded
Add missing newline at the end of the file.
May 21, 2022
80f0deb
Merge branch 'main' into integ-match_phrase-#185
MaxKsyunz May 21, 2022
15775e8
Update tests from upstream/main to verify new behavior.
May 21, 2022
ee4e319
Verify exception messages in RelevanceQueryBuildTest
May 25, 2022
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 11 additions & 2 deletions core/src/main/java/org/opensearch/sql/expression/DSL.java
Original file line number Diff line number Diff line change
Expand Up @@ -651,7 +651,16 @@ public FunctionExpression castDatetime(Expression value) {
}

public FunctionExpression match(Expression... args) {
return (FunctionExpression) repository
.compile(BuiltinFunctionName.MATCH.getName(), Arrays.asList(args.clone()));
return compile(BuiltinFunctionName.MATCH, args);
}

public FunctionExpression match_phrase(Expression... args) {
return compile(BuiltinFunctionName.MATCH_PHRASE, args);
}

private FunctionExpression compile(BuiltinFunctionName bfn, Expression... args) {
return (FunctionExpression) repository.compile(bfn.getName(), Arrays.asList(args.clone()));
}


}
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,8 @@ public enum BuiltinFunctionName {
* Relevance Function.
*/
MATCH(FunctionName.of("match")),
MATCH_PHRASE(FunctionName.of("match_phrase")),
MATCHPHRASE(FunctionName.of("matchphrase")),

/**
* Legacy Relevance Function.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,11 +7,11 @@

import static org.opensearch.sql.data.type.ExprCoreType.STRING;

import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import lombok.ToString;
import lombok.experimental.UtilityClass;
import org.opensearch.sql.data.model.ExprValue;
import org.opensearch.sql.data.type.ExprCoreType;
Expand All @@ -23,56 +23,49 @@

@UtilityClass
public class OpenSearchFunctions {

public static final int MATCH_MAX_NUM_PARAMETERS = 12;
public static final int MATCH_PHRASE_MAX_NUM_PARAMETERS = 3;
public static final int MIN_NUM_PARAMETERS = 2;

/**
* Add functions specific to OpenSearch to repository.
*/
public void register(BuiltinFunctionRepository repository) {
repository.register(match());
// Register MATCHPHRASE as MATCH_PHRASE as well for backwards
// compatibility.
repository.register(match_phrase(BuiltinFunctionName.MATCH_PHRASE));
repository.register(match_phrase(BuiltinFunctionName.MATCHPHRASE));
}

private static FunctionResolver match() {
FunctionName funcName = BuiltinFunctionName.MATCH.getName();
return getRelevanceFunctionResolver(funcName, MATCH_MAX_NUM_PARAMETERS);
}

private static FunctionResolver match_phrase(BuiltinFunctionName matchPhrase) {
FunctionName funcName = matchPhrase.getName();
return getRelevanceFunctionResolver(funcName, MATCH_PHRASE_MAX_NUM_PARAMETERS);
}

private static FunctionResolver getRelevanceFunctionResolver(
FunctionName funcName, int maxNumParameters) {
return new FunctionResolver(funcName,
ImmutableMap.<FunctionSignature, FunctionBuilder>builder()
.put(new FunctionSignature(funcName, ImmutableList.of(STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList.of(STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList.of(STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING,
STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING,
STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING,
STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING,
STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.put(new FunctionSignature(funcName, ImmutableList
.of(STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING,
STRING, STRING, STRING, STRING, STRING)),
args -> new OpenSearchFunction(funcName, args))
.build());
getRelevanceFunctionSignatureMap(funcName, maxNumParameters));
}

private static Map<FunctionSignature, FunctionBuilder> getRelevanceFunctionSignatureMap(
FunctionName funcName, int numOptionalParameters) {
FunctionBuilder buildFunction = args -> new OpenSearchFunction(funcName, args);
var signatureMapBuilder = ImmutableMap.<FunctionSignature, FunctionBuilder>builder();
for (int numParameters = MIN_NUM_PARAMETERS;
numParameters <= MIN_NUM_PARAMETERS + numOptionalParameters;
numParameters++) {
List<ExprType> args = Collections.nCopies(numParameters, STRING);
signatureMapBuilder.put(new FunctionSignature(funcName, args), buildFunction);
}
return signatureMapBuilder.build();
}

private static class OpenSearchFunction extends FunctionExpression {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,15 @@
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.opensearch.sql.data.type.ExprCoreType.BOOLEAN;

import java.util.List;
import org.junit.jupiter.api.Test;
import org.opensearch.sql.expression.DSL;
import org.opensearch.sql.expression.ExpressionTestBase;
import org.opensearch.sql.expression.FunctionExpression;
import org.opensearch.sql.expression.NamedArgumentExpression;



public class OpenSearchFunctionsTest extends ExpressionTestBase {
private final NamedArgumentExpression field = new NamedArgumentExpression(
"field", DSL.literal("message"));
Expand All @@ -40,10 +43,14 @@ public class OpenSearchFunctionsTest extends ExpressionTestBase {
"operator", DSL.literal("OR"));
private final NamedArgumentExpression minimumShouldMatch = new NamedArgumentExpression(
"minimum_should_match", DSL.literal("1"));
private final NamedArgumentExpression zeroTermsQuery = new NamedArgumentExpression(
"zero_terms_query", DSL.literal("ALL"));
private final NamedArgumentExpression zeroTermsQueryAll = new NamedArgumentExpression(
"zero_terms_query", DSL.literal("ALL"));
private final NamedArgumentExpression zeroTermsQueryNone = new NamedArgumentExpression(
"zero_terms_query", DSL.literal("None"));
private final NamedArgumentExpression boost = new NamedArgumentExpression(
"boost", DSL.literal("2.0"));
private final NamedArgumentExpression slop = new NamedArgumentExpression(
"slop", DSL.literal("3"));

@Test
void match() {
Expand Down Expand Up @@ -98,16 +105,34 @@ void match() {

expr = dsl.match(
field, query, analyzer, autoGenerateSynonymsPhrase, fuzziness, maxExpansions, prefixLength,
fuzzyTranspositions, fuzzyRewrite, lenient, operator, minimumShouldMatch, zeroTermsQuery);
fuzzyTranspositions, fuzzyRewrite, lenient, operator, minimumShouldMatch,
zeroTermsQueryAll);
assertEquals(BOOLEAN, expr.type());

expr = dsl.match(
field, query, analyzer, autoGenerateSynonymsPhrase, fuzziness, maxExpansions, prefixLength,
fuzzyTranspositions, fuzzyRewrite, lenient, operator, minimumShouldMatch, zeroTermsQuery,
fuzzyTranspositions, fuzzyRewrite, lenient, operator, minimumShouldMatch, zeroTermsQueryAll,
boost);
assertEquals(BOOLEAN, expr.type());
}

@Test
void match_phrase() {
for (FunctionExpression expr : match_phrase_dsl_expressions()) {
assertEquals(BOOLEAN, expr.type());
}
}


List<FunctionExpression> match_phrase_dsl_expressions() {
return List.of(
dsl.match_phrase(field, query),
dsl.match_phrase(field, query, analyzer),
dsl.match_phrase(field, query, analyzer, zeroTermsQueryAll),
dsl.match_phrase(field, query, analyzer, zeroTermsQueryNone, slop)
);
}

@Test
void match_in_memory() {
FunctionExpression expr = dsl.match(field, query);
Expand Down
40 changes: 40 additions & 0 deletions docs/user/dql/functions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2195,3 +2195,43 @@ Another example to show how to set custom values for the optional parameters::
| Bond |
+------------+

MATCH_PHRASE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what should be done but should we distinguish between this one and https://github.com/opensearch-project/sql/blob/main/docs/user/beyond/fulltext.rst#match-phrase-query

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that match_phrase from the legacy SQL engine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i believe so, i'm thinking maybe add a note to the old one, or link it after "For backward compatibility" in the new one, so users know what the difference is

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshuali925 that's a good point.

Distinction between old and new SQL engines was explicitly removed as part of #58

I'd like to add a note to fulltext.rst that the document describes the legacy SQL engine and to refer to functions.rst for the new engine.

Once the new engine is on par with the old one, we can update fulltext.rst to describe the default engine.

What do you think?

-----

Description
>>>>>>>>>>>

``match_phrase(field_expression, query_expression[, option=<option_value>]*)``

The match_phrase function maps to the match_phrase query used in search engine, to return the documents that match a provided text with a given field. Available parameters include:

- analyzer
- slop
- zero_terms_query

For backward compatibility, matchphrase is also supported and mapped to match_phrase query as well.

Example with only ``field`` and ``query`` expressions, and all other parameters are set default values::

os> SELECT author, title FROM books WHERE match_phrase(author, 'Alexander Milne');
fetched rows / total rows = 2/2
+----------------------+--------------------------+
| author | title |
|----------------------+--------------------------|
| Alan Alexander Milne | The House at Pooh Corner |
| Alan Alexander Milne | Winnie-the-Pooh |
+----------------------+--------------------------+



Another example to show how to set custom values for the optional parameters::

os> SELECT author, title FROM books WHERE match_phrase(author, 'Alan Milne', slop = 2);
fetched rows / total rows = 2/2
+----------------------+--------------------------+
| author | title |
|----------------------+--------------------------|
| Alan Alexander Milne | The House at Pooh Corner |
| Alan Alexander Milne | Winnie-the-Pooh |
+----------------------+--------------------------+

40 changes: 40 additions & 0 deletions docs/user/ppl/functions/relevance.rst
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,46 @@ Another example to show how to set custom values for the optional parameters::
| Bond |
+------------+

MATCH_PHRASE
-----

Description
>>>>>>>>>>>

``match_phrase(field_expression, query_expression[, option=<option_value>]*)``

The match_phrase function maps to the match_phrase query used in search engine, to return the documents that match a provided text with a given field. Available parameters include:

- analyzer
- slop
- zero_terms_query

For backward compatibility, matchphrase is also supported and mapped to match_phrase query as well.

Example with only ``field`` and ``query`` expressions, and all other parameters are set default values::

os> source=books | where match_phrase(author, 'Alexander Milne') | fields author, title
fetched rows / total rows = 2/2
+----------------------+--------------------------+
| author | title |
|----------------------+--------------------------|
| Alan Alexander Milne | The House at Pooh Corner |
| Alan Alexander Milne | Winnie-the-Pooh |
+----------------------+--------------------------+



Another example to show how to set custom values for the optional parameters::

os> source=books | where match_phrase(author, 'Alan Milne', slop = 2) | fields author, title
fetched rows / total rows = 2/2
+----------------------+--------------------------+
| author | title |
|----------------------+--------------------------|
| Alan Alexander Milne | The House at Pooh Corner |
| Alan Alexander Milne | Winnie-the-Pooh |
+----------------------+--------------------------+

Limitations
>>>>>>>>>>>

Expand Down
2 changes: 2 additions & 0 deletions doctest/test_data/books.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
{"id": 1, "author": "Alan Alexander Milne", "title": "The House at Pooh Corner"}
{"id": 2, "author": "Alan Alexander Milne", "title": "Winnie-the-Pooh"}
4 changes: 3 additions & 1 deletion doctest/test_docs.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
PEOPLE = "people"
ACCOUNT2 = "account2"
NYC_TAXI = "nyc_taxi"
BOOKS = "books"


class DocTestConnection(OpenSearchConnection):
Expand Down Expand Up @@ -88,6 +89,7 @@ def set_up_test_indices(test):
load_file("people.json", index_name=PEOPLE)
load_file("account2.json", index_name=ACCOUNT2)
load_file("nyc_taxi.json", index_name=NYC_TAXI)
load_file("books.json", index_name=BOOKS)


def load_file(filename, index_name):
Expand Down Expand Up @@ -116,7 +118,7 @@ def set_up(test):

def tear_down(test):
# drop leftover tables after each test
test_data_client.indices.delete(index=[ACCOUNTS, EMPLOYEES, PEOPLE, ACCOUNT2, NYC_TAXI], ignore_unavailable=True)
test_data_client.indices.delete(index=[ACCOUNTS, EMPLOYEES, PEOPLE, ACCOUNT2, NYC_TAXI, BOOKS], ignore_unavailable=True)


docsuite = partial(doctest.DocFileSuite,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_ACCOUNT;
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK;
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_BANK_WITH_NULL_VALUES;
import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_PHRASE;
import static org.opensearch.sql.util.MatcherUtils.rows;
import static org.opensearch.sql.util.MatcherUtils.verifyDataRows;

Expand All @@ -23,6 +24,8 @@ public void init() throws IOException {
loadIndex(Index.ACCOUNT);
loadIndex(Index.BANK_WITH_NULL_VALUES);
loadIndex(Index.BANK);
loadIndex(Index.GAME_OF_THRONES);
loadIndex(Index.PHRASE);
}

@Test
Expand Down Expand Up @@ -110,4 +113,22 @@ public void testRelevanceFunction() throws IOException {
TEST_INDEX_BANK));
verifyDataRows(result, rows("Hattie"));
}

@Test
public void testMatchPhraseFunction() throws IOException {
JSONObject result =
executeQuery(
String.format(
"source=%s | where match_phrase(phrase, 'quick fox') | fields phrase", TEST_INDEX_PHRASE));
verifyDataRows(result, rows("quick fox"), rows("quick fox here"));
}

@Test
public void testMathPhraseWithSlop() throws IOException {
JSONObject result =
executeQuery(
String.format(
"source=%s | where match_phrase(phrase, 'brown fox', slop = 2) | fields phrase", TEST_INDEX_PHRASE));
verifyDataRows(result, rows("brown fox"), rows("fox brown"));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
import org.opensearch.sql.opensearch.storage.script.filter.lucene.RangeQuery.Comparison;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.TermQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.WildcardQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchPhraseQuery;
import org.opensearch.sql.opensearch.storage.script.filter.lucene.relevance.MatchQuery;
import org.opensearch.sql.opensearch.storage.serialization.ExpressionSerializer;

Expand All @@ -52,6 +53,8 @@ public class FilterQueryBuilder extends ExpressionNodeVisitor<QueryBuilder, Obje
.put(BuiltinFunctionName.GTE.getName(), new RangeQuery(Comparison.GTE))
.put(BuiltinFunctionName.LIKE.getName(), new WildcardQuery())
.put(BuiltinFunctionName.MATCH.getName(), new MatchQuery())
.put(BuiltinFunctionName.MATCH_PHRASE.getName(), new MatchPhraseQuery())
.put(BuiltinFunctionName.MATCHPHRASE.getName(), new MatchPhraseQuery())
.put(BuiltinFunctionName.QUERY.getName(), new MatchQuery())
.put(BuiltinFunctionName.MATCH_QUERY.getName(), new MatchQuery())
.put(BuiltinFunctionName.MATCHQUERY.getName(), new MatchQuery())
Expand Down
Loading