-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35224
Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35224
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nknize ! I left some comments
|
||
public class LowerCaseTokenizerFactory extends AbstractTokenizerFactory implements MultiTermAwareComponent { | ||
@Deprecated | ||
public class XLowerCaseTokenizerFactory extends AbstractTokenizerFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep the old name ?
@Override | ||
public Object getMultiTermComponent() { | ||
return this; | ||
return new XLowerCaseTokenizer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a LetterTokenizer followed by a LowerCaseFilter like explained in the deprecation javadocs ?
I don't think we need to keep an XLowerCaseTokenizer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hoped to do that, but unfortunately the contract of create
demands that we return a Tokenizer
, and I don't think there's an easy way of wrapping a Tokenizer
+ TokenFilter
combination here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks for explaining
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least restrict the usage to old indices (created before 7.0) in order to be able to remove it in 8 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with @romseygeek offline we'll handle the deprecation and removal in a follow up pr.
@@ -48,7 +48,7 @@ public CommonAnalysisFactoryTests() { | |||
tokenizers.put("edgengram", EdgeNGramTokenizerFactory.class); | |||
tokenizers.put("classic", ClassicTokenizerFactory.class); | |||
tokenizers.put("letter", LetterTokenizerFactory.class); | |||
tokenizers.put("lowercase", LowerCaseTokenizerFactory.class); | |||
// tokenizers.put("lowercase", XLowerCaseTokenizerFactory.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it commented ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests here are explicitly checking that we can load lucene analysis classes. LowercaseTokenizer
isn't there any more, so this needs to be removed - the commenting out was just to get tests to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, thanks
@@ -77,7 +77,6 @@ private static String toCamelCase(String s) { | |||
.put("edgengram", MovedToAnalysisCommon.class) | |||
.put("keyword", MovedToAnalysisCommon.class) | |||
.put("letter", MovedToAnalysisCommon.class) | |||
.put("lowercase", MovedToAnalysisCommon.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need this ? The tokenizer is deprecated, not yet removed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is checking for lucene classes that no longer exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nknize I pushed a fix for the failing tests and added a norelease comment regarding the LowercaseTokenizer deprecation/removal. We'll handle the deprecation/removal in a follow up to not block this pr. I am +1 to merge as is if the CI passes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 2
Thanks @jimczi I'm not familiar with the error just thrown:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the failing tests, let's see if the CI passes now.
Looks like a new failure. Reproduces for me.
|
looks like all tests passed but CentOS packaging is angry. Isn't that usually a flaky one anyway? |
…-agg * master: (528 commits) Register Azure max_retries setting (elastic#35286) add version 6.4.4 [Docs] Add painless context details for bucket_script (elastic#35142) Upgrade jline to 3.8.2 (elastic#35288) SQL: new SQL CLI logo (elastic#35261) Logger: Merge ESLoggerFactory into Loggers (elastic#35146) Docs: Add section about range query for range type (elastic#35222) [ILM] change remove-policy-from-index http method from DELETE to POST (elastic#35268) [CCR] Forgot missing return statement, SQL: Fix null handling for AND and OR in SELECT (elastic#35277) [TEST] Mute ChangePolicyForIndexIT#testChangePolicyForIndex Serialize ignore_throttled also to 6.6 after backport Check for java 11 in buildSrc (elastic#35260) [TEST] increase await timeout in RemoteClusterConnectionTests Add missing up-to-date configuration (elastic#35255) Adapt Lucene BWC version SQL: Introduce Coalesce function (elastic#35253) Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (elastic#35224) Fix failing ICU tests (elastic#35207) Prevent throttled indices to be searched through wildcards by default (elastic#34354) ...
This PR upgrades the master branch to lucene-8.0.0-snapshot-31d7dfe6b1. There are several changes that need to be reviewed:
MultiFields.getFields
and its impact onTermVectorsService
is the main issue that needs reviewed /cc @romseygeekOther changes include:
TokenStreamComponents
as a final class