Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35224

nknize · 2018-11-02T22:40:09Z

This PR upgrades the master branch to lucene-8.0.0-snapshot-31d7dfe6b1. There are several changes that need to be reviewed:

Removal of MultiFields.getFields and its impact on TermVectorsService is the main issue that needs to be reviewed. /cc @romseygeek

Other changes include:

migration of Points encoding to selective indexing
change to TokenStreamComponents as a final class
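
When a holder class becomes final, callers that used to subclass it and override a hook generally have to pass that behavior in instead. The following is a minimal plain-Java sketch of that pattern, not the code from this PR. `Source`, `Components`, `wrapping` and `upperCasing` are hypothetical stand-ins invented for illustration; they are not Lucene classes, and the `Consumer<Reader>` hook is an assumption.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins only; none of these types come from Lucene or Elasticsearch.
final class FinalComponentsSketch {

    /** Stand-in for a tokenizer: it only remembers its current input. */
    static final class Source {
        private Reader input = new StringReader("");

        void setReader(Reader reader) {
            this.input = reader;
        }

        String readAll() {
            try {
                StringBuilder sb = new StringBuilder();
                int c;
                while ((c = input.read()) != -1) {
                    sb.append((char) c);
                }
                return sb.toString();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Final holder: it cannot be subclassed, so the reader hook is injected. */
    static final class Components {
        private final Consumer<Reader> source;

        Components(Consumer<Reader> source) {
            this.source = source;
        }

        void setReader(Reader reader) {
            source.accept(reader);
        }
    }

    /** Composition in place of an overridden setReader: wrap, then delegate. */
    static Components wrapping(Source source, UnaryOperator<Reader> wrapper) {
        return new Components(reader -> source.setReader(wrapper.apply(reader)));
    }

    /** Example wrapper: upper-cases characters read one at a time. */
    static Reader upperCasing(Reader in) {
        return new FilterReader(in) {
            @Override
            public int read() throws IOException {
                int c = super.read();
                return c == -1 ? -1 : Character.toUpperCase(c);
            }
        };
    }

    public static void main(String[] args) {
        Source source = new Source();
        Components components = wrapping(source, FinalComponentsSketch::upperCasing);
        components.setReader(new StringReader("abc"));
        System.out.println(source.readAll());
    }
}
```

The wrapper only overrides the single-character `read()`, which is all this sketch's `readAll` uses; a real reader wrapper would also need to cover the bulk `read(char[], int, int)` path.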

jimczi

Thanks @nknize ! I left some comments

jimczi · 2018-11-05T08:01:22Z

...lysis-common/src/main/java/org/elasticsearch/analysis/common/XLowerCaseTokenizerFactory.java


-public class LowerCaseTokenizerFactory extends AbstractTokenizerFactory implements MultiTermAwareComponent {
+@Deprecated
+public class XLowerCaseTokenizerFactory extends AbstractTokenizerFactory {


Can we keep the old name?

jimczi · 2018-11-05T08:01:25Z

...lysis-common/src/main/java/org/elasticsearch/analysis/common/XLowerCaseTokenizerFactory.java

-    @Override
-    public Object getMultiTermComponent() {
-        return this;
+        return new XLowerCaseTokenizer();


Can we use a LetterTokenizer followed by a LowerCaseFilter, as explained in the deprecation javadocs?
I don't think we need to keep an XLowerCaseTokenizer.

I'd hoped to do that, but unfortunately the contract of create demands that we return a Tokenizer, and I don't think there's an easy way of wrapping a Tokenizer + TokenFilter combination here.

ok, thanks for explaining

Can we at least restrict its usage to old indices (created before 7.0) so that we can remove it in 8?

As discussed with @romseygeek offline, we'll handle the deprecation and removal in a follow-up PR.
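
To make the suggestion in this thread concrete, here is a minimal plain-Java sketch of why a letter tokenizer followed by a lowercase filter produces the same tokens as a single lowercasing tokenizer. It deliberately does not use Lucene classes: `letterTokens`, `lowerCaseFilter` and `lowerCaseTokens` are hypothetical helpers, and "split on non-letters" is an assumption about the tokenizer's behavior made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-ins for illustration only; none of these are Lucene classes.
final class LowerCaseTokenizingSketch {

    /** Splits text into maximal runs of letters (stage 1: tokenize). */
    static List<String> letterTokens(String text) {
        return split(text, false);
    }

    /** Lower-cases each token code point by code point (stage 2: filter). */
    static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder lowered = new StringBuilder();
            int i = 0;
            while (i < token.length()) {
                int cp = token.codePointAt(i);
                i += Character.charCount(cp);
                lowered.appendCodePoint(Character.toLowerCase(cp));
            }
            out.add(lowered.toString());
        }
        return out;
    }

    /** Single-pass variant: splits on non-letters and lower-cases while collecting. */
    static List<String> lowerCaseTokens(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Quick-Brown fox2Jumps";
        System.out.println(lowerCaseFilter(letterTokens(text)));
        System.out.println(lowerCaseTokens(text));
    }
}
```

The two-stage form is a pair of objects rather than one, which is the difficulty raised above: a factory whose create method must return a single tokenizer has no obvious way to hand back the pair.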

jimczi · 2018-11-05T08:01:46Z

...lysis-common/src/test/java/org/elasticsearch/analysis/common/CommonAnalysisFactoryTests.java

@@ -48,7 +48,7 @@ public CommonAnalysisFactoryTests() {
        tokenizers.put("edgengram", EdgeNGramTokenizerFactory.class);
        tokenizers.put("classic", ClassicTokenizerFactory.class);
        tokenizers.put("letter", LetterTokenizerFactory.class);
-        tokenizers.put("lowercase", LowerCaseTokenizerFactory.class);
+        // tokenizers.put("lowercase", XLowerCaseTokenizerFactory.class);


Why is it commented out?

The tests here explicitly check that we can load Lucene analysis classes. LowercaseTokenizer isn't there any more, so this needs to be removed; the commenting out was just to get the tests to pass.

jimczi · 2018-11-05T08:05:22Z

test/framework/src/main/java/org/elasticsearch/indices/analysis/AnalysisFactoryTestCase.java

@@ -77,7 +77,6 @@ private static String toCamelCase(String s) {
        .put("edgengram", MovedToAnalysisCommon.class)
        .put("keyword", MovedToAnalysisCommon.class)
        .put("letter", MovedToAnalysisCommon.class)
-        .put("lowercase", MovedToAnalysisCommon.class)


Don't we still need this? The tokenizer is deprecated, not yet removed.

Again, this is checking for Lucene classes that no longer exist.
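
The kind of check described in these replies can be sketched as a comparison between the names the library provides and the names the test maps. This is a rough plain-Java illustration, not the actual test code: `FactoryCoverageSketch`, `unmapped` and `stale` are hypothetical names, and whether the real test compares in both directions is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the real test classes are not reproduced here.
final class FactoryCoverageSketch {

    /** Names the library provides that the test map does not account for. */
    static Set<String> unmapped(Set<String> libraryNames, Map<String, Class<?>> testMappings) {
        Set<String> result = new TreeSet<>(libraryNames);
        result.removeAll(testMappings.keySet());
        return result;
    }

    /** Names the test map still lists although the library no longer provides them. */
    static Set<String> stale(Set<String> libraryNames, Map<String, Class<?>> testMappings) {
        Set<String> result = new TreeSet<>(testMappings.keySet());
        result.removeAll(libraryNames);
        return result;
    }

    public static void main(String[] args) {
        // After the upgrade the library no longer offers "lowercase".
        Set<String> libraryNames = new TreeSet<>(Arrays.asList("classic", "edgengram", "letter"));

        // Object.class is a placeholder for the mapped factory classes.
        Map<String, Class<?>> testMappings = new HashMap<>();
        testMappings.put("classic", Object.class);
        testMappings.put("edgengram", Object.class);
        testMappings.put("letter", Object.class);
        testMappings.put("lowercase", Object.class);

        System.out.println("unmapped: " + unmapped(libraryNames, testMappings));
        System.out.println("stale: " + stale(libraryNames, testMappings));
    }
}
```

Under that assumption, an entry such as "lowercase" shows up as stale once the library class is gone, even though the deprecated factory on the application side still exists.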

jimczi

@nknize I pushed a fix for the failing tests and added a norelease comment regarding the LowercaseTokenizer deprecation/removal. We'll handle the deprecation/removal in a follow-up so as not to block this PR. I am +1 to merge as is if CI passes.

s1monw

LGTM 2

nknize · 2018-11-05T17:39:40Z

Thanks @jimczi

I'm not familiar with the error just thrown:

11:25:39   1> [2018-11-05T17:25:37,187][WARN ][o.e.b.JNANatives         ] [[SUITE-AnnotatedTextHighlighterTests-seed#[496176D107CD889E]]] unable to install syscall filter: 
11:25:39   2> REPRODUCE WITH: ./gradlew :plugins:mapper-annotated-text:test -Dtests.seed=496176D107CD889E -Dtests.class=org.elasticsearch.search.highlight.AnnotatedTextHighlighterTests -Dtests.method="testAnnotatedTextStructuredMatch" -Dtests.security.manager=true -Dtests.locale=de-LU -Dtests.timezone=Etc/GMT+10 -Dcompiler.java=11 -Druntime.java=8
11:25:39   1> java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
11:25:39   1> 	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:342) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]

jimczi

I fixed the failing tests; let's see if CI passes now.

nknize · 2018-11-05T21:11:45Z

Looks like a new failure. Reproduces for me.

14:52:46 2> REPRODUCE WITH: ./gradlew :plugins:mapper-annotated-text:integTestRunner -Dtests.seed=638D69B8A84F3208 -Dtests.class=org.elasticsearch.index.mapper.annotatedtext.AnnotatedTextClientYamlTestSuiteIT -Dtests.method="test {yaml=mapper_annotatedtext/10_basic/annotated highlighter on annotated text}" -Dtests.security.manager=true -Dtests.locale=is -Dtests.timezone=Africa/Blantyre -Dcompiler.java=11 -Druntime.java=8

   > Throwable #1: java.lang.AssertionError: Failure at [mapper_annotatedtext/10_basic:38]: field [hits.hits.0.highlight.text.0] is null
   >    at __randomizedtesting.SeedInfo.seed([638D69B8A84F3208:EBD9566206B35FF0]:0)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:407)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:384)
   >    at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.lang.AssertionError: field [hits.hits.0.highlight.text.0] is null
   >    at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:79)
   >    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:76)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:400)
   >    ... 38 more

nknize · 2018-11-06T02:10:04Z

Looks like all tests passed, but CentOS packaging is angry. Isn't that usually a flaky one anyway?

jimczi · 2018-11-06T11:16:29Z

@nknize I merged this PR and restored #35225 in 6.x.

…-agg * master: (528 commits) Register Azure max_retries setting (elastic#35286) add version 6.4.4 [Docs] Add painless context details for bucket_script (elastic#35142) Upgrade jline to 3.8.2 (elastic#35288) SQL: new SQL CLI logo (elastic#35261) Logger: Merge ESLoggerFactory into Loggers (elastic#35146) Docs: Add section about range query for range type (elastic#35222) [ILM] change remove-policy-from-index http method from DELETE to POST (elastic#35268) [CCR] Forgot missing return statement, SQL: Fix null handling for AND and OR in SELECT (elastic#35277) [TEST] Mute ChangePolicyForIndexIT#testChangePolicyForIndex Serialize ignore_throttled also to 6.6 after backport Check for java 11 in buildSrc (elastic#35260) [TEST] increase await timeout in RemoteClusterConnectionTests Add missing up-to-date configuration (elastic#35255) Adapt Lucene BWC version SQL: Introduce Coalesce function (elastic#35253) Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (elastic#35224) Fix failing ICU tests (elastic#35207) Prevent throttled indices to be searched through wildcards by default (elastic#34354) ...

nknize and others added 8 commits November 2, 2018 13:46

upgrade to lucene-8.0.0-snapshot-ed8a395948

c861c14

Fix Annotated Text analyzer wrapping

7bd1dc9

Fix MatchQueryBuilderTests

c1ca8a8

upgrade to lucene-8.0.0-snapshot-31d7dfe6b1

a3c1282

remove unused imports

aa10fa5

hacky fixes to lowercase tokenizer tests

0781a49

change to pointIndexDimensions

2182fa1

remove unused imports

4b20850

nknize added >upgrade v7.0.0 labels Nov 2, 2018

nknize requested review from martijnvg and jimczi November 2, 2018 22:40

dnhatn mentioned this pull request Nov 5, 2018

Upgrade 6.x to lucene-7.6.0-snapshot-f9598f335b #35225

Merged

jimczi reviewed Nov 5, 2018

nknize requested a review from s1monw November 5, 2018 16:14

jimczi added 2 commits November 5, 2018 17:48

fix tests

6d5431b

add norelease comment regarding the LowercaseTokenizer deprecation/removal

1b358ee

jimczi approved these changes Nov 5, 2018

s1monw approved these changes Nov 5, 2018

fix another test

1ca6845

jimczi reviewed Nov 5, 2018

jimczi added 2 commits November 5, 2018 22:26

always wrap annotated highlighter analyzer

2c9e726

remove unused imports

88a0802

Merge branch 'master' into pr/35224

35a239f

jimczi merged commit a5e1f4d into elastic:master Nov 6, 2018

jimczi deleted the upgrade/lucene-8.0.0-31d7dfe6b1 branch November 6, 2018 10:55

jpountz added the >non-issue label Jan 28, 2019

jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
