Adding access to noSubMatches and noOverlappingMatches in Hyphenation… #13895

@hasnain2808 hasnain2808 commented May 30, 2024

Description

This change exposes two new settings, noSubMatches and noOverlappingMatches, that were added to Lucene's HyphenationCompoundWordTokenFilter class.
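
For illustration, a minimal sketch of index settings that might enable both options on a hyphenation decompounder filter. The snake_case setting names `no_sub_matches` and `no_overlapping_matches` are assumed here, following the convention of the filter's existing options. The filter and analyzer names, the patterns file path, and the word list are hypothetical placeholders.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_decompounder": {
          "type": "hyphenation_decompounder",
          "hyphenation_patterns_path": "analysis/hyphenation_patterns.xml",
          "word_list": ["kaffee", "zucker", "tasse"],
          "no_sub_matches": true,
          "no_overlapping_matches": true
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_decompounder"]
        }
      }
    }
  }
}
```

Both options presumably default to false, so existing indexes should behave as before unless the settings are set explicitly.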

Related Issues

Resolves #8796
Based on #10765

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions bot added the enhancement, good first issue, low hanging fruit, Search, and Search:Relevance labels May 30, 2024
@hasnain2808 force-pushed the issue-8796/expose-new-lucene-filter-settings branch 2 times, most recently from 5abc8ec to 3d5ffdc May 30, 2024 14:16
Contributor

❌ Gradle check result for 5abc8ec: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 7b2142e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
@jainankitk
Collaborator

@hasnain2808 - It seems the spotless check is failing. Can you fix those?

Execution failed for task ':modules:analysis-common:spotlessJavaCheck'.
> The following files had format violations:
      src/test/java/org/opensearch/analysis/common/CompoundAnalysisTests.java
          @@ -35,7 +35,6 @@
           import·org.apache.lucene.analysis.Analyzer;
           import·org.apache.lucene.analysis.TokenStream;
           import·org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
          -import·org.junit.Before;
           import·org.opensearch.Version;
           import·org.opensearch.cluster.metadata.IndexMetadata;
           import·org.opensearch.common.settings.Settings;
          @@ -51,6 +50,7 @@
           import·org.opensearch.test.IndexSettingsModule;
           import·org.opensearch.test.OpenSearchTestCase;
           import·org.hamcrest.MatcherAssert;
          +import·org.junit.Before;
           
           import·java.io.IOException;
           import·java.io.InputStream;
  Run './gradlew :modules:analysis-common:spotlessApply' to fix these violations.

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
@jainankitk added the backport 2.x label Aug 13, 2024
@hasnain2808
Contributor Author

Done
Weird this error was missed

Collaborator

@jainankitk jainankitk left a comment

@msfroh @mch2 - Can one of you help merge this change?

Contributor

❌ Gradle check result for 6a88bb0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 8752b76: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@hasnain2808
Contributor Author

hasnain2808 commented Aug 19, 2024

@msfroh @mch2 could you please have a look at this mini pr 🙂

Contributor

✅ Gradle check result for 8752b76: SUCCESS

@hasnain2808
Contributor Author

I cannot merge even after approval 😢
Need your help again @msfroh 😄

@jainankitk jainankitk merged commit ce64fac into opensearch-project:main Aug 21, 2024
37 of 39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 21, 2024
#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: linting

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

---------

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
(cherry picked from commit ce64fac)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@jainankitk
Collaborator

Merged! :)

jainankitk pushed a commit that referenced this pull request Aug 21, 2024
#13895) (#15329)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

* Add Changelog Entry

* test: add hyphenation decompounder tests

* test: refactor tests

* test: reformat test files

* chore: add changelog entry for 2.X

* chore: remove 3.x changelog

* chore: commonify settingsarr

* chore: commonify settingsarr

* chore: linting

---------

(cherry picked from commit ce64fac)

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
opensearch-project#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: linting

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

---------

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
shiv0408 added a commit to shiv0408/OpenSearch that referenced this pull request Sep 2, 2024
* Optimize global ordinal includes/excludes for prefix matching (opensearch-project#14371)

* Optimize global ordinal includes/excludes for prefix matching

If an aggregation specifies includes or excludes based on a regular
expression, and the regular expression has a finite expansion followed
by .*, then we can optimize the global ordinal filter.

Specifically, in this case, we can expand the matching prefixes, then
include/exclude the range of global ordinals that start with each
prefix.

Signed-off-by: Michael Froh <froh@amazon.com>

* Add unit test

Signed-off-by: Michael Froh <froh@amazon.com>

* Add changelog entry

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Updated the unit test to be functionally equivalent, but it covers
more of the regex logic.

Signed-off-by: Michael Froh <froh@amazon.com>

* Improve test coverage

Signed-off-by: Michael Froh <froh@amazon.com>

* Fix bug in exclude-only case with no doc values in segment

Signed-off-by: Michael Froh <froh@amazon.com>

* Address comments from @mch2

Signed-off-by: Michael Froh <froh@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>

* Adding access to noSubMatches and noOverlappingMatches in Hyphenation… (opensearch-project#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: linting

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

---------

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>

* Add Settings related to Workload Management feature (opensearch-project#15028)

* add QueryGroup Service tests
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add PR to changelog
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* change the test directory
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* modify comments to be more specific
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add test coverage
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* remove QUERY_GROUP_RUN_INTERVAL_SETTING as we'll define it in QueryGroupService
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* Update affiliation for @nknize. (opensearch-project#15322)

Signed-off-by: dblock <dblock@amazon.com>

* Add log when download completes with file size (opensearch-project#15224)

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>

* Support Filtering on Large List encoded by Bitmap (version update) (opensearch-project#15352)

Signed-off-by: Andriy Redko <andriy.redko@aiven.io>

* Add support for index level slice count setting (opensearch-project#15336)

Signed-off-by: Ganesh Ramadurai <gramadur@amazon.com>

* Adding allowlist setting for ingest-useragent and ingest-geoip processors (opensearch-project#15325)

* Adding allowlist setting for user-agent, geo-ip and updated tests for ingest-common.

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Remove duplicate test in ingest-common

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Adding changelog

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

---------

Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>

* Add Delete QueryGroup API Logic (opensearch-project#14735)

* Add Delete QueryGroup API Logic
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* modify changelog
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* include comments from create pr
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* remove delete all
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* rebase and address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* rebase
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* address comments
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* add UT coverage
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

* [Star Tree] Lucene Abstractions for Star Tree File Formats  (opensearch-project#15278)

---------
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>

* [Star tree] Changes to handle derived metrics such as avg as part of star tree mapping (opensearch-project#15152)

---------
Signed-off-by: Bharathwaj G <bharath78910@gmail.com>

* relaxing the join validation for nodes which have only store disabled but only publication enabled

* relaxing the join validation for nodes which have only store disabled but only publication enabled

Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>

---------

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
Signed-off-by: Ganesh Ramadurai <gramadur@amazon.com>
Signed-off-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>
Signed-off-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
Co-authored-by: Michael Froh <froh@amazon.com>
Co-authored-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
Co-authored-by: Ruirui Zhang <mariazrr@amazon.com>
Co-authored-by: Daniel (dB.) Doubrovkine <dblock@amazon.com>
Co-authored-by: Gaurav Bafna <85113518+gbbafna@users.noreply.github.com>
Co-authored-by: Andriy Redko <andriy.redko@aiven.io>
Co-authored-by: Ganesh Krishna Ramadurai <gramadur@icloud.com>
Co-authored-by: Sarat Vemulapalli <vemulapallisarat@gmail.com>
Co-authored-by: Sarthak Aggarwal <sarthagg@amazon.com>
Co-authored-by: Bharathwaj G <bharath78910@gmail.com>
Co-authored-by: Rajiv Kumar Vaidyanathan <rajivkv@amazon.com>
akolarkunnu pushed a commit to akolarkunnu/OpenSearch that referenced this pull request Sep 10, 2024
opensearch-project#13895)

* Adding access to noSubMatches and noOverlappingMatches in HyphenationCompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>

* Add Changelog Entry

Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>

* test: add hyphenation decompounder tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: refactor tests

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* test: reformat test files

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: add changelog entry for 2.X

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: remove 3.x changelog

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: commonify settingsarr

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

* chore: linting

Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>

---------

Signed-off-by: Evan Kielley <evankielley@gmail.com>
Signed-off-by: Mohammad Hasnain Mohsin Rajan <hasnain2808@gmail.com>
Signed-off-by: Mohammad Hasnain <hasnain2808@gmail.com>
Co-authored-by: Evan Kielley <evankielley@gmail.com>
Labels
backport 2.x, enhancement, good first issue, low hanging fruit, Search:Relevance, Search
Development

Successfully merging this pull request may close these issues.

Provide access to new settings for HyphenationCompoundWordTokenFilter
5 participants