Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding access to noSubMatches and noOverlappingMatches #10765

Conversation

evankielley
Copy link
Contributor

@evankielley evankielley commented Oct 19, 2023

…CompoundWordTokenFilter

Description

This change adds support for / exposes two new settings (noSubMatches and noOverlappingMatches) that were added to Lucene's HyphenationCompoundWordTokenFilter class.

Related Issues

Resolves #8796

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…CompoundWordTokenFilter

Signed-off-by: Evan Kielley <evankielley@gmail.com>
@evankielley
Copy link
Contributor Author

@msfroh I closed my previous pull request #9836 (albeit not intentionally). This PR has the same content, I just need to update the CHANGELOG as well.

@github-actions
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 1a22b1b

Incompatible components

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@msfroh
Copy link
Collaborator

msfroh commented Oct 24, 2023

@msfroh I closed my previous pull request #9836 (albeit not intentionally). This PR has the same content, I just need to update the CHANGELOG as well.

Thanks @evankielley!

Can you please also click the link in the PR checklist to create a documentation issue? It looks like we have a list of analyzers with links to the Lucene 8.7 JavaDocs: https://opensearch.org/docs/latest/analyzers/token-filters/index/. We should probably update those links to point to whatever version of Lucene was used for a given release.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@evankielley
Copy link
Contributor Author

@msfroh I closed my previous pull request #9836 (albeit not intentionally). This PR has the same content, I just need to update the CHANGELOG as well.

Thanks @evankielley!

Can you please also click the link in the PR checklist to create a documentation issue? It looks like we have a list of analyzers with links to the Lucene 8.7 JavaDocs: https://opensearch.org/docs/latest/analyzers/token-filters/index/. We should probably update those links to point to whatever version of Lucene was used for a given release.

No problem! Issue created here: opensearch-project/documentation-website#5389

@noCharger
Copy link
Contributor

Let's add some test coverage for this change.

@evankielley
Copy link
Contributor Author

Let's add some test coverage for this change.

It looks like there's no existing unit tests for this class. Should I create a new file for unit tests in modules/analysis-common/src/test/java/org/opensearch/analysis/ or is there some reason why unit tests were skipped for this class in the first place?

@noCharger
Copy link
Contributor

noCharger commented Oct 27, 2023

Let's add some test coverage for this change.

It looks like there's no existing unit tests for this class. Should I create a new file for unit tests in modules/analysis-common/src/test/java/org/opensearch/analysis/ or is there some reason why unit tests were skipped for this class in the first place?

One likely reason is that unit testing around the lucene library itself does not add much value, hence it is non-blocking for this PR. There are some good examples on testing just for reference #9479

Let's rebase to fix the gradle check failure https://build.ci.opensearch.org/job/gradle-check/28931/console

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':qa:mixed-cluster:v2.12.0#mixedClusterTest'.
> There were failing tests. See the report at: file:///var/jenkins/workspace/gradle-check/search/qa/mixed-cluster/build/reports/tests/v2.12.0%23mixedClusterTest/index.html

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Nov 27, 2023
@ticheng-aws
Copy link
Contributor

Hi @evankielley, Is this being worked upon? Feel free to reach out to maintainers for further reviews.

@opensearch-trigger-bot opensearch-trigger-bot bot removed the stalled Issues that have stalled label Jan 6, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added the stalled Issues that have stalled label Feb 8, 2024
@jed326
Copy link
Collaborator

jed326 commented Apr 23, 2024

Closing stalled PR due to inactivity for over 6 months. Please re-open a new PR if you would still like to submit this change. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request good first issue Good for newcomers low hanging fruit Search:Relevance Search Search query, autocomplete ...etc stalled Issues that have stalled
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Provide access to new settings for HyphenationCompoundWordTokenFilter
5 participants