Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new configuration setting synonym_analyzer for synonym_graph and synonym. #16488

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

prudhvigodithi
Copy link
Contributor

@prudhvigodithi prudhvigodithi commented Oct 26, 2024

Description

Coming from #16263 (comment) this PR is not final and still need some discussion.

Related Issues

Part of #16263 and #16530.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

❌ Gradle check result for fb66d0f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines 199 to 204
try {
this.analysisModule = new AnalysisModule(environment,
List.of(this));
} catch (IOException e) {
throw new RuntimeException(e);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned with this approach of creating a new AnalysisModule with its own AnalysisRegistry.

If you define a custom analyzer, are you able to reference it from the synonym_analyzer parameter? I could be mistaken, but I think this module's registry only knows the built-in analyzers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that we could do is add a default method getTokenFilters(AnalysisModule) to the AnalysisPlugin interface that calls the existing getTokenFilters(). Then we could override the new method in CommonAnalysisModulePlugin. You'd also need to call the new method from setupTokenFilters.

It does mean that we would pass a reference to the partially-constructed AnalysisModule to getTokenFilters. As long as a consumer only holds onto the reference and doesn't try doing anything with it right away, it should be okay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @msfroh,

  • With creation of new AnalysisModule, I was able to reference it from the synonym_analyzer parameter, also in my test not only the built-in analyzers but custom analyzer also worked [BUG] Token Filter Order: word_delimiter_graph and synonym_graph #16263 (comment).

  • I agree with your suggestion we can re-use the AnalysisModule and dont have to instantiate a new one. In the new commit I have updated the AnalysisPlugin section where getTokenFilters takes AnalysisModule as input. I have also updated the CommonAnalysisModulePlugin and setupTokenFilters.

Copy link
Contributor Author

@prudhvigodithi prudhvigodithi Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I will revert the changes made to buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchNode.java, I was trying to test by adding config/hunspell/en_US/en-US.aff to config folder before startup (./gradlew run -PnumNodes=3) but OpenSearchNode.java does not copy the folder files, so was experimenting few things with OpenSearchNode.java I will open an issue separately for this topic.

Copy link
Contributor

❌ Gradle check result for 550756f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 7a5c00e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 1, 2024

❌ Gradle check result for 17737cf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Copy link
Contributor

github-actions bot commented Nov 2, 2024

❌ Gradle check result for 0332408: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Nov 2, 2024

✅ Gradle check result for 6feda4b: SUCCESS

Copy link

codecov bot commented Nov 2, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.97%. Comparing base (0363aa7) to head (db57995).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...rch/analysis/common/SynonymTokenFilterFactory.java 66.66% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16488      +/-   ##
============================================
- Coverage     72.00%   71.97%   -0.04%     
+ Complexity    65038    65006      -32     
============================================
  Files          5313     5313              
  Lines        303454   303483      +29     
  Branches      43910    43913       +3     
============================================
- Hits         218510   218427      -83     
- Misses        67040    67131      +91     
- Partials      17904    17925      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@prudhvigodithi
Copy link
Contributor Author

prudhvigodithi commented Nov 3, 2024

Hey @msfroh added tests similar to those described in the actual issue involving the synonym_graph filter used with word_delimiter_graph (see GitHub issue #16263) and updated the CHANGELOG accordingly. Please take a look at the PR and let me know if I'm missing anything.
Thank you
@getsaurabh02

Signed-off-by: Prudhvi Godithi <pgodithi@amazon.com>
Copy link
Contributor

github-actions bot commented Nov 3, 2024

✅ Gradle check result for db57995: SUCCESS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants