Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 #35224

Merged: 14 commits, Nov 6, 2018

@nknize nknize commented Nov 2, 2018

This PR upgrades the master branch to lucene-8.0.0-snapshot-31d7dfe6b1. There are several changes that need to be reviewed:

  • removal of MultiFields.getFields and its impact on TermVectorsService is the main issue that needs review /cc @romseygeek

Other changes include:

  • migration of Points encoding to selective indexing
  • change to TokenStreamComponents as a final class
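
For the last item: when a class becomes final, callers can no longer subclass it to customise behaviour and typically have to pass that behaviour in instead. A minimal sketch of that pattern in plain Java follows. `Components`, `FinalComponentsSketch`, and `lastSeen` are hypothetical stand-ins, not Lucene classes, and the actual Lucene constructor is not shown in this PR.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

final class FinalComponentsSketch {
    // Records the reader most recently handed over, so the behaviour is observable.
    static Reader lastSeen;

    static Components build() {
        // Custom reader handling is supplied as a lambda rather than an overridden method.
        return new Components(reader -> lastSeen = reader, "sketch");
    }

    public static void main(String[] args) {
        Components components = build();
        Reader in = new StringReader("Hello World");
        components.setReader(in);
        System.out.println(lastSeen == in);
    }
}

// Hypothetical stand-in for a final holder class; NOT the Lucene class.
final class Components {
    private final Consumer<Reader> source;
    private final String name;

    Components(Consumer<Reader> source, String name) {
        this.source = source;
        this.name = name;
    }

    // Delegates to the injected callback; there is nothing to override.
    void setReader(Reader reader) {
        source.accept(reader);
    }

    String name() {
        return name;
    }
}
```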

@jimczi jimczi left a comment

Thanks @nknize! I left some comments.


-public class LowerCaseTokenizerFactory extends AbstractTokenizerFactory implements MultiTermAwareComponent {
+@Deprecated
+public class XLowerCaseTokenizerFactory extends AbstractTokenizerFactory {

Can we keep the old name?

@Override
public Object getMultiTermComponent() {
return this;
return new XLowerCaseTokenizer();

Can we use a LetterTokenizer followed by a LowerCaseFilter, as explained in the deprecation javadocs?
I don't think we need to keep an XLowerCaseTokenizer.

I'd hoped to do that, but unfortunately the contract of create demands that we return a Tokenizer, and I don't think there's an easy way of wrapping a Tokenizer + TokenFilter combination here?
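
To illustrate the trade-off under discussion: a single lowercasing tokenizer and a letter tokenizer followed by a lowercase filter produce the same tokens; the difference is whether one object or two chained stages does the work. The sketch below is plain Java with hypothetical method names and does not use the Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

final class LowerCaseTokenizingSketch {
    // Stage 1: emit maximal runs of letters, dropping everything else.
    static List<String> letterTokens(String text) {
        return split(text, false);
    }

    // Stage 2: lowercase each token produced by an earlier stage.
    static List<String> lowerCaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder sb = new StringBuilder();
            token.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
            out.add(sb.toString());
        }
        return out;
    }

    // Combined: split on non-letters and lowercase in a single pass.
    static List<String> lowerCaseTokens(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean lowerCase) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello, World 42 FooBar";
        // Both lines print [hello, world, foobar]
        System.out.println(lowerCaseTokens(text));
        System.out.println(lowerCaseFilter(letterTokens(text)));
    }
}
```

The two-stage form needs somewhere to chain the filter after the tokenizer, which is the difficulty raised above for a factory method that must return a single Tokenizer.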

ok, thanks for explaining

Can we at least restrict the usage to old indices (created before 7.0) in order to be able to remove it in 8?
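
One way such a restriction could look, as a rough sketch only: the class, method, and parameter names below are hypothetical, and the index version is reduced to a bare major number rather than the real Elasticsearch version type.

```java
final class LegacyTokenizerGateSketch {
    // Hypothetical check: allow the deprecated tokenizer only on indices created before 7.0.
    static void checkAllowed(String tokenizerName, int indexCreatedMajor) {
        if ("lowercase".equals(tokenizerName) && indexCreatedMajor >= 7) {
            throw new IllegalArgumentException(
                "tokenizer [lowercase] is only available on indices created before 7.0");
        }
    }

    public static void main(String[] args) {
        checkAllowed("lowercase", 6); // accepted: index predates 7.0
        try {
            checkAllowed("lowercase", 7);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```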

As discussed with @romseygeek offline, we'll handle the deprecation and removal in a follow-up PR.

@@ -48,7 +48,7 @@ public CommonAnalysisFactoryTests() {
 tokenizers.put("edgengram", EdgeNGramTokenizerFactory.class);
 tokenizers.put("classic", ClassicTokenizerFactory.class);
 tokenizers.put("letter", LetterTokenizerFactory.class);
-tokenizers.put("lowercase", LowerCaseTokenizerFactory.class);
+// tokenizers.put("lowercase", XLowerCaseTokenizerFactory.class);

Why is it commented out?

The tests here explicitly check that we can load Lucene analysis classes. LowerCaseTokenizer isn't there any more, so this needs to be removed; the commenting out was just to get tests to pass.
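
The kind of check described here can be sketched with plain reflection. The helper below is hypothetical and is not the actual test code; whether the Lucene class name resolves depends entirely on which Lucene version (if any) is on the classpath.

```java
final class ClassPresenceSketch {
    // Returns true if the named class can be found by this class's loader, without initialising it.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPresenceSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.lang.String"));
        // Result depends on the Lucene jar present at runtime.
        System.out.println(isLoadable("org.apache.lucene.analysis.core.LowerCaseTokenizer"));
    }
}
```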

ok, thanks

@@ -77,7 +77,6 @@ private static String toCamelCase(String s) {
 .put("edgengram", MovedToAnalysisCommon.class)
 .put("keyword", MovedToAnalysisCommon.class)
 .put("letter", MovedToAnalysisCommon.class)
-.put("lowercase", MovedToAnalysisCommon.class)

We still need this, don't we? The tokenizer is deprecated, not yet removed.

Again, this is checking for Lucene classes that no longer exist.

@nknize nknize requested a review from s1monw November 5, 2018 16:14

@jimczi jimczi left a comment

@nknize I pushed a fix for the failing tests and added a norelease comment regarding the LowerCaseTokenizer deprecation/removal. We'll handle the deprecation/removal in a follow-up so as not to block this PR. I am +1 to merge as is if the CI passes.

@s1monw s1monw left a comment

LGTM 2

nknize commented Nov 5, 2018

Thanks @jimczi

I'm not familiar with the error just thrown:

11:25:39   1> [2018-11-05T17:25:37,187][WARN ][o.e.b.JNANatives         ] [[SUITE-AnnotatedTextHighlighterTests-seed#[496176D107CD889E]]] unable to install syscall filter: 
11:25:39   2> REPRODUCE WITH: ./gradlew :plugins:mapper-annotated-text:test -Dtests.seed=496176D107CD889E -Dtests.class=org.elasticsearch.search.highlight.AnnotatedTextHighlighterTests -Dtests.method="testAnnotatedTextStructuredMatch" -Dtests.security.manager=true -Dtests.locale=de-LU -Dtests.timezone=Etc/GMT+10 -Dcompiler.java=11 -Druntime.java=8
11:25:39   1> java.lang.UnsupportedOperationException: seccomp unavailable: CONFIG_SECCOMP not compiled into kernel, CONFIG_SECCOMP and CONFIG_SECCOMP_FILTER are needed
11:25:39   1> 	at org.elasticsearch.bootstrap.SystemCallFilter.linuxImpl(SystemCallFilter.java:342) ~[elasticsearch-7.0.0-alpha1-SNAPSHOT.jar:7.0.0-alpha1-SNAPSHOT]

@jimczi jimczi left a comment

I fixed the failing tests, let's see if the CI passes now.

nknize commented Nov 5, 2018

Looks like a new failure. Reproduces for me.

14:52:46 2> REPRODUCE WITH: ./gradlew :plugins:mapper-annotated-text:integTestRunner -Dtests.seed=638D69B8A84F3208 -Dtests.class=org.elasticsearch.index.mapper.annotatedtext.AnnotatedTextClientYamlTestSuiteIT -Dtests.method="test {yaml=mapper_annotatedtext/10_basic/annotated highlighter on annotated text}" -Dtests.security.manager=true -Dtests.locale=is -Dtests.timezone=Africa/Blantyre -Dcompiler.java=11 -Druntime.java=8

   > Throwable #1: java.lang.AssertionError: Failure at [mapper_annotatedtext/10_basic:38]: field [hits.hits.0.highlight.text.0] is null
   >    at __randomizedtesting.SeedInfo.seed([638D69B8A84F3208:EBD9566206B35FF0]:0)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:407)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.test(ESClientYamlSuiteTestCase.java:384)
   >    at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.lang.AssertionError: field [hits.hits.0.highlight.text.0] is null
   >    at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:79)
   >    at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:76)
   >    at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:400)
   >    ... 38 more

nknize commented Nov 6, 2018

Looks like all tests passed, but CentOS packaging is angry. Isn't that usually a flaky one anyway?

@jimczi jimczi merged commit a5e1f4d into elastic:master Nov 6, 2018
@jimczi jimczi deleted the upgrade/lucene-8.0.0-31d7dfe6b1 branch November 6, 2018 10:55

jimczi commented Nov 6, 2018

@nknize I merged this PR and restored #35225 in 6x.

matarrese added a commit to matarrese/elasticsearch that referenced this pull request Nov 6, 2018
…-agg

* master: (528 commits)
  Register Azure max_retries setting (elastic#35286)
  add version 6.4.4
  [Docs] Add painless context details for bucket_script (elastic#35142)
  Upgrade jline to 3.8.2 (elastic#35288)
  SQL: new SQL CLI logo (elastic#35261)
  Logger: Merge ESLoggerFactory into Loggers (elastic#35146)
  Docs: Add section about range query for range type (elastic#35222)
  [ILM] change remove-policy-from-index http method from DELETE to POST (elastic#35268)
  [CCR] Forgot missing return statement,
  SQL: Fix null handling for AND and OR in SELECT (elastic#35277)
  [TEST] Mute ChangePolicyForIndexIT#testChangePolicyForIndex
  Serialize ignore_throttled also to 6.6 after backport
  Check for java 11 in buildSrc (elastic#35260)
  [TEST] increase await timeout in RemoteClusterConnectionTests
  Add missing up-to-date configuration (elastic#35255)
  Adapt Lucene BWC version
  SQL: Introduce Coalesce function (elastic#35253)
  Upgrade to lucene-8.0.0-snapshot-31d7dfe6b1 (elastic#35224)
  Fix failing ICU tests (elastic#35207)
  Prevent throttled indices to be searched through wildcards by default (elastic#34354)
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019